CN113190107B - Gesture recognition method and device and electronic equipment - Google Patents

Gesture recognition method and device and electronic equipment

Info

Publication number
CN113190107B
CN113190107B (application number CN202110282827.6A)
Authority
CN
China
Prior art keywords
gesture
sample
instruction
recognition model
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110282827.6A
Other languages
Chinese (zh)
Other versions
CN113190107A (en)
Inventor
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202110282827.6A
Publication of CN113190107A
Application granted
Publication of CN113190107B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a gesture recognition method, a gesture recognition device and electronic equipment, wherein the method comprises the following steps: receiving a first gesture track input by a user at a first moment; inputting the first gesture track into a target gesture recognition model to obtain a first gesture instruction; acquiring a second gesture instruction at a second moment, wherein the second moment is earlier than the first moment, and the second gesture instruction is obtained by predicting a second gesture track input by the user at the second moment through the target gesture recognition model; and obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction. The method enables the electronic equipment to flexibly and accurately obtain the target gesture instruction representing the intention of the user.

Description

Gesture recognition method and device and electronic equipment
Technical Field
The present disclosure relates to the field of gesture recognition technologies, and in particular, to a gesture recognition method and apparatus, and an electronic device; the application also relates to a training method of the gesture recognition model.
Background
When a user operates electronic devices such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) devices, human-computer interaction is generally realized by receiving the user's gestures and performing gesture recognition to obtain input instructions, which enhances the user's sense of immersion.
When existing electronic devices implement gesture recognition, a fixed set of gestures is generally defined, and the user learns these gestures; interaction in each scene is then realized by recognizing those gestures when the user operates the device.
However, in the process of implementing the present application, the inventor found that existing gesture recognition methods based on fixedly defined gestures are prone to recognition errors when the gesture input by the user is a non-standard gesture or a gesture expressing a special intention.
Disclosure of Invention
It is an object of the embodiments of the present disclosure to provide a new technical solution for gesture recognition to flexibly and accurately recognize a user gesture.
According to a first aspect of the present disclosure, there is provided a gesture recognition method, the method comprising:
receiving a first gesture track input by a user at a first moment;
inputting the first gesture track into a target gesture recognition model to obtain a first gesture instruction;
acquiring a second gesture instruction at a second moment, wherein the second moment is earlier than the first moment, and the second gesture instruction is obtained by predicting a second gesture track input by the user at the second moment through the target gesture recognition model;
and obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction.
Optionally, the target gesture recognition model comprises a first gesture recognition model and a second gesture recognition model, the first gesture recognition model is used for recognizing gesture tracks belonging to a first category; the second gesture recognition model is at least used for recognizing gesture tracks belonging to a second category and predicting gesture instructions of the user at the next moment;
the second gesture recognition model is obtained by training the following steps:
obtaining sample data, wherein the sample data comprises sample gesture characteristic information and user intention information, and the user intention information represents user intention corresponding to the sample gesture characteristic information;
and training by using the sample data to obtain the second gesture recognition model meeting a preset convergence condition.
Optionally, the sample gesture feature information is obtained by:
acquiring a sample gesture track input by any user in a sample time slice;
inputting the sample gesture track into the first gesture recognition model to obtain a candidate gesture instruction;
and obtaining the sample gesture characteristic information according to the candidate gesture instruction, the sample gesture track and the sample time slice.
Optionally, the obtaining the sample gesture feature information according to the candidate gesture instruction, the sample gesture trajectory, and the sample time slice includes:
acquiring at least one key point gesture corresponding to the candidate gesture instruction;
obtaining a sample gesture image corresponding to the key point gesture from the sample gesture track;
obtaining feature information corresponding to the sample gesture image by extracting the feature information of the sample gesture image; and
acquiring time information of the sample gesture image in the sample time slice;
and acquiring the sample gesture characteristic information by establishing characteristic representation information of the characteristic information and the time information.
Optionally, the user intention information is obtained by:
acquiring voice data which represents user intention and is input by a user when a sample gesture track is input, wherein the sample gesture feature information is obtained according to the sample gesture track;
and identifying semantic information of the voice data to obtain the user intention information.
Optionally, each of the first gesture instruction and the second gesture instruction comprises at least one gesture instruction;
the obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction comprises:
acquiring a matched gesture instruction from the first gesture instruction and the second gesture instruction to serve as a third gesture instruction;
and obtaining the target gesture instruction according to the third gesture instruction.
Optionally, when the third gesture instruction includes multiple gesture instructions, the obtaining the target gesture instruction according to the third gesture instruction includes:
obtaining confidence degrees corresponding to a plurality of gesture instructions in the third gesture instruction respectively;
and selecting a gesture instruction corresponding to the confidence coefficient with the numerical value meeting the preset condition from the third gesture instruction as the target gesture instruction.
According to a second aspect of the present disclosure, the present disclosure further provides a training method of a gesture recognition model, including:
obtaining sample data, wherein the sample data comprises sample gesture characteristic information and user intention information, and the user intention information represents user intention corresponding to the sample gesture characteristic information;
and training by using the sample data to obtain a second gesture recognition model meeting a preset convergence condition, wherein the second gesture recognition model is at least used for recognizing gesture tracks belonging to a second category and predicting gesture instructions input by the user at the next moment.
According to a third aspect of the present disclosure, the present disclosure also provides a gesture recognition apparatus, including:
the gesture track receiving module is used for receiving a first gesture track input by a user at a first moment;
the first gesture command obtaining module is used for inputting the first gesture track into a target gesture recognition model to obtain a first gesture command;
the second gesture instruction obtaining module is used for obtaining a second gesture instruction at a second moment, wherein the second moment is earlier than the first moment, and the second gesture instruction is obtained by predicting by the target gesture recognition model according to a second gesture track input by the user at the second moment;
and the target gesture instruction obtaining module is used for obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction.
According to a fourth aspect of the present disclosure, there is also provided an electronic device comprising the apparatus according to the third aspect of the present disclosure; or,
the electronic device includes: a memory for storing executable instructions; and a processor, configured to run the instructions to control the electronic device to perform the method according to the first aspect or the second aspect of the present disclosure.
The embodiments of the present disclosure have the beneficial effect that, after receiving a first gesture trajectory input by a user at a first moment, the electronic device can obtain a first gesture instruction by inputting the first gesture trajectory into a target gesture recognition model, acquire a second gesture instruction obtained at a second moment earlier than the first moment, and combine the two to obtain a target gesture instruction. During gesture recognition, the target gesture recognition model can both obtain the gesture instruction corresponding to the gesture trajectory input at the current moment and predict the gesture instruction the user may input at the next moment. Thus, when obtaining the target gesture instruction at the next moment, the electronic device can combine the second gesture instruction predicted at the previous moment with the first gesture instruction obtained by current recognition, and thereby flexibly and accurately obtain the target gesture instruction representing the user's intention.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of a gesture recognition method according to an embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of a training method for a gesture recognition model according to an embodiment of the present disclosure.
Fig. 3 is a schematic block diagram of a gesture recognition apparatus provided in an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< method embodiment I >
When a current electronic device, for example a VR device, implements gesture recognition, several gestures are generally fixedly defined and learned by the user, and interaction in various scenes is realized by recognizing these gestures when the user operates the device.
Although this approach can realize human-computer interaction based on user gestures, fixedly defining gestures has two drawbacks. On one hand, if too few gesture types are defined, human-computer interaction is limited; if too many are defined, the user may not remember them all, so mistaken gesture actions prevent the electronic device from correctly recognizing the corresponding gesture instructions, which degrades the user experience. On the other hand, no matter how many gesture types are defined, the electronic device only recognizes the fixed, standard gestures; it often cannot correctly recognize non-standard gestures input by the user or other gestures expressing special intentions, so recognition errors may occur.
In view of the above problems, an embodiment of the present disclosure provides a gesture recognition method; please refer to fig. 1, which is a schematic flowchart of the gesture recognition method provided in the embodiment of the present disclosure. The method can be applied to an electronic device, so that the electronic device can flexibly and accurately recognize a user gesture, obtain the corresponding gesture instruction, and make the corresponding response according to that instruction. The electronic device may be a server that receives a user gesture trajectory collected and sent by a terminal device, obtains a target gesture instruction according to the trajectory, and then controls the terminal device to execute the corresponding response according to the instruction; alternatively, the electronic device may itself be a terminal device, for example a VR device, an AR device, or an MR device, which is not limited here. In this embodiment, unless otherwise specified, the method is described taking a VR device as the electronic device.
As shown in FIG. 1, the method of the present embodiment may include steps S1100-S1400, which are described in detail below.
In step S1100, a first gesture track input by a user at a first time is received.
In this embodiment, the first gesture trajectory may be the trajectory corresponding to a gesture action made by the user at the first moment. The gesture action may be a first-category gesture action, that is, a gesture action preset in the electronic device for inputting a corresponding gesture instruction; for example, for a 'confirm' instruction, the gesture action may be a fist-making action. Alternatively, the gesture action may be a second-category gesture action, that is, a user-defined (non-standard) gesture action or a gesture action expressing the user's particular intention.
In specific implementation, the gesture trajectory input by the user may be obtained by the electronic device by capturing images of the user's hand through an image acquisition device connected to the electronic device, such as a camera, and analyzing those images; for example, in a VR device, the gesture trajectory may be obtained by analyzing images captured by one or more monochrome fisheye tracking cameras built into the device. Of course, in specific implementation, the gesture trajectory input by the user may also be obtained by other methods; for example, corresponding sensors may be arranged on the user's hand, and the trajectory obtained by analyzing the position information collected by the sensors, which is not repeated here.
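For illustration, the following is a minimal sketch of assembling a gesture trajectory from a sequence of camera frames; the `detect_hand_keypoints` helper is a hypothetical stand-in for the device's hand-tracking component, not an API from the patent.

```python
def detect_hand_keypoints(frame):
    """Hypothetical detector returning an (N, 3) array of 3D hand keypoints
    for one camera frame, or None when no hand is visible. A real device
    would back this with its tracking cameras or hand-mounted sensors."""
    raise NotImplementedError  # stand-in, not an API from the patent

def capture_gesture_trajectory(frames, timestamps):
    """Assemble a gesture trajectory as a time-ordered list of
    (timestamp, keypoints) pairs, skipping frames where no hand is seen."""
    trajectory = []
    for t, frame in zip(timestamps, frames):
        keypoints = detect_hand_keypoints(frame)
        if keypoints is not None:
            trajectory.append((t, keypoints))
    return trajectory
```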
Step S1200, inputting the first gesture track into a target gesture recognition model to obtain a first gesture command.
Currently, when a user performs gesture-based human-computer interaction with an electronic device such as a VR device, the interaction generally involves at least one of the following scenes at the application level:
1. Launching (Launcher) scene: generally used for selecting the intended application program; the corresponding gesture actions are mainly 'select' and 'confirm'. For example, in a VR device, under the main menu interface of the VR system, the user generally only needs the 'select' and 'confirm' gesture actions to choose the intended application program.
2. System settings scene: generally used for browsing system settings and clicking to confirm setting contents; the corresponding gesture actions are mainly 'pull down', 'flip up and slide', and 'confirm'.
3. Video viewing scene: generally used for choosing and controlling the video to be watched; the corresponding gesture actions mainly include 'confirm' for selecting a video to play, and playback-control actions such as 'pause', 'play', 'fast forward', 'rewind', and 'screen zoom'.
4. Web browsing scene: generally used for controlling webpage contents; the corresponding gesture actions may include browsing actions such as 'pull down', 'flip up', and 'confirm', and content-editing actions such as 'copy' and 'paste'.
5. Multi-user social scene: generally used for remote social interaction with other users.
6. Multi-person cinema scene: users in different geographical locations, for example in different cities or countries, watch a movie in a virtual scene at the same time and interact through voice and gesture input.
7. Game scene: mainly used for performing game operations through corresponding gesture actions while the user plays a game.
8. Photographing scene: may contain gesture actions for controlling photographing.
A small illustrative mapping from scenes to gesture actions is sketched below.
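For concreteness, here is a small scene-to-gesture mapping; the scene keys and gesture names are assumptions for illustration and are not the patent's actual gesture library.

```python
# Illustrative mapping from interaction scene to its preset gesture actions;
# all names here are assumptions, not the patent's actual gesture library.
SCENE_GESTURES = {
    "launcher":        ["select", "confirm"],
    "system_settings": ["pull_down", "flip_up_slide", "confirm"],
    "video_viewing":   ["confirm", "pause", "play", "fast_forward",
                        "rewind", "screen_zoom"],
    "web_browsing":    ["pull_down", "flip_up", "confirm", "copy", "paste"],
    "photographing":   ["shutter"],
}
```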
In specific implementation, the inventor found that, when a user interacts with an electronic device through gestures, one or more gesture actions can be preset for each interaction scene. Once the user's current interaction scene is determined, the electronic device can capture the user's gesture trajectory and accurately recognize the corresponding gesture instruction based on the gesture actions preset for that scene, and then execute the corresponding response. This improves the accuracy of gesture recognition to a certain extent. However, different users may deviate from the standard actions when performing a gesture, or may express the same intention through personalized gesture actions, so the electronic device may still misrecognize gestures, particularly second-category gestures such as non-standard and personalized gestures. In this embodiment, an intelligent target gesture recognition model is therefore trained in advance to solve these problems.
In this embodiment, the target gesture recognition model may include a plurality of sub-models, for example, a first gesture recognition model and a second gesture recognition model, where the first gesture recognition model is used to recognize gesture actions belonging to a first category; the second gesture recognition model is at least used for recognizing gesture actions belonging to a second category and predicting gesture instructions of the user at the next moment.
Specifically, in this embodiment, according to the interaction scenes between the user and the electronic device, for example the eight interaction scenes above, the gesture actions in each interaction scene are counted and summarized to obtain a predefined gesture action library, where the library may include a plurality of predefined gesture actions, each corresponding to an interaction scene; for example, in this embodiment, 85 general gesture actions may be predefined in the VR system based on the eight interaction scenes. Gesture trajectories of the 85 gesture actions made by a number of users, for example 260 people or more, are then collected as sample gesture trajectories for training the first gesture recognition model. After the sample gesture trajectories are collected to form a training data set, the first gesture recognition model can be trained on that data set to recognize gesture actions input by the user that belong to the first category, that is, common gesture actions. It should be noted that, in this embodiment, the first gesture recognition model and the second gesture recognition model may be convolutional neural network models, and their network structures are not specifically limited in this embodiment.
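Since the embodiment only states that the models may be convolutional neural networks without fixing a structure, the following is one possible sketch of the first gesture recognition model in PyTorch; the layer sizes and the 21-keypoint, 3-coordinate input assumption are illustrative.

```python
import torch.nn as nn

class FirstGestureRecognitionModel(nn.Module):
    """One possible sketch of the first gesture recognition model: a 1D CNN
    classifying a fixed-length gesture trajectory into one of the predefined
    gesture classes (e.g. the 85 general gestures). All sizes are assumed."""

    def __init__(self, num_coords=63, num_classes=85):
        # num_coords assumes 21 hand keypoints x 3 coordinates per frame
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(num_coords, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time dimension
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        # x: (batch, num_coords, sequence_length)
        return self.classifier(self.features(x).squeeze(-1))
```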
In specific implementation, after the first gesture recognition model is obtained through the above method, in order to solve the problem that second-category gesture actions, for example non-standard or personalized gestures, cannot be correctly recognized in the prior art, sample gesture feature information for training the second gesture recognition model may be obtained based on the output of the first gesture recognition model. Based on this sample gesture feature information, a second gesture recognition model can be trained that recognizes second-category gesture actions and predicts the gesture instruction the user will input at the next moment. How the second gesture recognition model is trained is described in detail below.
In one implementation, the second gesture recognition model may be obtained by training: obtaining sample data, wherein the sample data comprises sample gesture characteristic information and user intention information, and the user intention information represents user intention corresponding to the sample gesture characteristic information; and training by using the sample data to obtain the second gesture recognition model meeting a preset convergence condition.
Specifically, in order to improve the recognition accuracy of the personalized gesture, in the present embodiment, the second gesture recognition model may be trained and obtained using information representing the intention of the user and gesture feature information corresponding to the intention information as sample data.
In one embodiment, the user intention information is obtained by: acquiring voice data which represents user intention and is input by a user when a sample gesture track is input, wherein the sample gesture feature information is obtained according to the sample gesture track; and identifying semantic information of the voice data to obtain the user intention information.
In one embodiment, the target gesture recognition model may further include a Natural Language Understanding (NLU) model, and in implementation, when the second gesture recognition model is trained, and when the trained second gesture recognition model is used for recognizing the user gesture, the voice data representing the user's intention and input by the user may be recognized based on the Natural Language Understanding model to obtain the user intention information.
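As a deliberately simplified stand-in for the natural language understanding model, the following sketch maps a voice transcript to user intention information by keyword lookup; a real system would use a trained NLU model, and the keyword table and intent labels here are assumptions.

```python
# Simplified stand-in for the NLU model: map a voice transcript to an
# intent label by keyword lookup. Keywords and labels are illustrative.
INTENT_KEYWORDS = {
    "confirm": ("confirm", "ok", "yes"),
    "pause": ("pause", "hold on"),
    "fast_forward": ("fast forward", "skip ahead"),
}

def recognize_user_intention(transcript: str) -> str:
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return "unknown"
```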
In one embodiment, the sample gesture characteristic information may be obtained by: acquiring a sample gesture track input by any user in a sample time slice; inputting the sample gesture track into the first gesture recognition model to obtain a candidate gesture instruction; and obtaining the sample gesture characteristic information according to the candidate gesture instruction, the sample gesture track and the sample time slice.
A sample time slice represents the time range over which the user input the sample gesture trajectory; for example, when the user inputs a 'fist-making' gesture, the corresponding time range spans from the moment the gesture action starts to the moment it ends.
Specifically, in this embodiment, on the basis of the first gesture recognition model, the candidate gesture command obtained by recognizing the sample gesture trajectory according to the first gesture recognition model, and the sample time slice corresponding to the sample gesture trajectory input by the user may be obtained to obtain the sample gesture feature information.
In this embodiment, the obtaining the sample gesture feature information according to the candidate gesture instruction, the sample gesture trajectory, and the sample time slice includes: acquiring at least one key point gesture corresponding to the candidate gesture instruction; acquiring a sample gesture image corresponding to the key point gesture from the sample gesture track; obtaining feature information corresponding to the sample gesture image by extracting the feature information of the sample gesture image; and acquiring time information of the sample gesture image in the sample time slice; and acquiring the sample gesture characteristic information by establishing characteristic representation information of the characteristic information and the time information.
Specifically, in order to obtain more stable and reliable sample gesture feature information for training the second gesture recognition model, in this embodiment, after the first gesture recognition model recognizes the input sample gesture trajectory and obtains one or more candidate gesture instructions, at least one key-point gesture may be obtained according to the standard gesture action corresponding to each candidate gesture instruction; for example, for a 'fist-making' gesture, the key-point gestures may include a 'palm-opening' gesture representing the starting action, a 'palm-half-closed' gesture representing the intermediate action, and a 'fist' gesture representing the ending action. In specific implementation, after the key-point gestures corresponding to the candidate gesture instruction are obtained, a sample gesture image corresponding to each key-point gesture can be obtained from the sample gesture trajectory. Feature information is then extracted from each sample gesture image and associated with the image's time information within the sample time slice to obtain the sample gesture feature information, where the time information can carry context information representing the user's intention. Meanwhile, the user intention information input by the user can be further combined to understand the user intention corresponding to the sample gesture trajectory, improving the recognition accuracy of the second gesture recognition model.
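The following sketch illustrates one way the sample gesture feature information could be assembled from a candidate instruction, a trajectory, and a time slice; `extract_features` and `keypoint_frames_for` are hypothetical helpers standing in for components the embodiment describes but does not specify.

```python
import numpy as np

def build_sample_gesture_features(candidate_instruction, trajectory,
                                  time_slice, extract_features,
                                  keypoint_frames_for):
    """Sketch of assembling sample gesture feature information.
    `trajectory` is assumed to be a time-ordered list of (timestamp, image)
    pairs; `extract_features` (image -> 1D feature vector) and
    `keypoint_frames_for` (instruction, trajectory -> indices of frames
    matching the key-point gestures) are hypothetical helpers."""
    start, end = time_slice
    rows = []
    for idx in keypoint_frames_for(candidate_instruction, trajectory):
        timestamp, image = trajectory[idx]
        features = extract_features(image)
        # Associate the features with the image's relative position in the
        # sample time slice so the representation carries temporal context.
        rel_time = (timestamp - start) / (end - start)
        rows.append(np.append(features, rel_time))
    return np.stack(rows)
```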
After the second gesture recognition model is trained as above, it can recognize the gesture instruction corresponding to the gesture trajectory input at the current moment based on the user's intention, and can also predict the gesture instruction the user may input at the next, i.e. future, moment according to that intention. The electronic device can therefore comprehensively judge the target gesture instruction at the current moment from the first gesture instruction recognized at the current moment and the second gesture instruction predicted at the previous moment.
Specifically, after step S1200, step S1300 is executed to obtain a second gesture command at a second time, where the second time is earlier than the first time, and the second gesture command is predicted by the target gesture recognition model according to a second gesture trajectory input by the user at the second time.
And S1400, obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction.
In one embodiment, the first gesture instruction and the second gesture instruction may each include at least one gesture instruction, in which case, obtaining the target gesture instruction according to the first gesture instruction and the second gesture instruction includes: acquiring a matched gesture instruction from the first gesture instruction and the second gesture instruction to serve as a third gesture instruction; and obtaining the target gesture instruction according to the third gesture instruction.
In a specific implementation, in a case that the third gesture instruction includes a plurality of gesture instructions, the obtaining the target gesture instruction according to the third gesture instruction includes: obtaining confidence degrees corresponding to a plurality of gesture instructions in the third gesture instruction respectively; and selecting a gesture instruction corresponding to the confidence coefficient with the numerical value meeting the preset condition from the third gesture instruction as the target gesture instruction.
That is, when the first gesture instruction and the second gesture instruction include a plurality of matched gesture instructions, the first and second gesture recognition models may, when recognizing or predicting a gesture instruction, also output a confidence corresponding to that instruction. The confidence is a value representing how reliable a recognition result is, generally between 0 and 1; the larger the value, the more likely the result is correct.
In specific implementation, the confidence whose value meets the preset condition may be the confidence with the maximum value, or the condition may be set as needed; no particular limitation is made here.
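Putting steps S1300-S1400 together, a minimal sketch of matching the two instruction sets and selecting by confidence might look as follows; representing each instruction set as a name-to-confidence dict is an assumption, as is taking the larger of the two confidences for a matched instruction.

```python
def obtain_target_instruction(first_instructions, second_instructions):
    """Sketch of steps S1300-S1400: intersect the instructions recognized at
    the current moment with those predicted at the previous moment (the
    'third gesture instruction'), then select by confidence. Both arguments
    are assumed to be dicts mapping instruction name -> confidence in [0, 1]."""
    matched = {
        name: max(first_instructions[name], second_instructions[name])
        for name in first_instructions.keys() & second_instructions.keys()
    }
    if matched:
        # Preset condition assumed here: pick the maximum confidence.
        return max(matched, key=matched.get)
    # No matched instruction: fall back to the best currently
    # recognized instruction, as the description suggests.
    return max(first_instructions, key=first_instructions.get)
```

For example, `obtain_target_instruction({"confirm": 0.9, "select": 0.4}, {"confirm": 0.7})` returns `"confirm"`.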
Certainly, in specific implementation, when the third gesture instruction includes a plurality of gesture instructions, the target gesture instruction may also be obtained by other methods. For example, when the electronic device is a terminal device such as a VR device, obtaining the target gesture instruction according to the third gesture instruction may be: displaying the gesture instructions in the third gesture instruction, and taking the one selected by the user as the target gesture instruction.
Of course, the above are only the methods for obtaining the target gesture instruction provided in this embodiment; in specific implementation, the target gesture instruction may also be obtained by other methods, which are not specifically limited here.
It should be noted that the above describes how to obtain the target gesture instruction by taking as an example the case where the gesture instructions included in the first and second gesture instructions contain a matched third gesture instruction. In specific implementation, when the first and second gesture instructions do not include a matched third gesture instruction, the first gesture instruction recognized at the current first moment may be used directly as the target gesture instruction, or a gesture instruction meeting a preset condition may be selected from the gesture instructions included in the first gesture instruction as the target gesture instruction, which is not repeated here.
In summary, in the gesture recognition method provided in this embodiment, after receiving a first gesture trajectory input by a user at a first moment, the electronic device can obtain a first gesture instruction by inputting the first gesture trajectory into a target gesture recognition model, acquire a second gesture instruction obtained at a second moment earlier than the first moment, and combine the two to obtain a target gesture instruction. During gesture recognition, the target gesture recognition model can both obtain the gesture instruction corresponding to the gesture trajectory input at the current moment and predict the gesture instruction the user may input at the next moment. Thus, when obtaining the target gesture instruction at the next moment, the electronic device can combine the second gesture instruction predicted at the previous moment with the first gesture instruction obtained by current recognition, and thereby flexibly and accurately obtain the target gesture instruction representing the user's intention.
< method embodiment II >
Corresponding to the above method embodiment, this embodiment further provides a training method for a gesture recognition model; please refer to fig. 2, which is a schematic flowchart of the training method for a gesture recognition model provided in this embodiment. As shown in fig. 2, the method of the present embodiment may include steps S2100-S2200, which are described in detail below.
Step S2100, sample data is obtained, wherein the sample data comprises sample gesture feature information and user intention information, and the user intention information represents user intention corresponding to the sample gesture feature information.
In one embodiment, the sample gesture feature information is obtained by: acquiring a sample gesture track input by any user in a sample time slice; inputting the sample gesture track into a first gesture recognition model to obtain a candidate gesture instruction; and obtaining the sample gesture characteristic information according to the candidate gesture instruction, the sample gesture track and the sample time slice, wherein the first gesture recognition model is used for recognizing gesture actions belonging to a first category.
In this embodiment, the obtaining the sample gesture feature information according to the candidate gesture instruction, the sample gesture trajectory, and the sample time slice includes: acquiring at least one key point gesture corresponding to the candidate gesture instruction; acquiring a sample gesture image corresponding to the key point gesture from the sample gesture track; obtaining feature information corresponding to the sample gesture image by extracting the feature information of the sample gesture image; and acquiring time information of the sample gesture image in the sample time slice; and acquiring the sample gesture feature information by establishing feature representation information of the feature information and the time information.
In one embodiment, the user intention information is obtained by: acquiring voice data which represents user intention and is input by a user when a sample gesture track is input, wherein the sample gesture feature information is obtained according to the sample gesture track; and identifying semantic information of the voice data to obtain the user intention information.
Step S2200, training by using the sample data to obtain a second gesture recognition model meeting a preset convergence condition, wherein the second gesture recognition model is at least used for recognizing gesture actions belonging to a second category and predicting a gesture instruction at the next moment.
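A minimal training-loop sketch for step S2200 follows; the batches are assumed to be (sample gesture features, user intention label) pairs, and the loss function, optimizer, and loss-threshold convergence condition are assumptions, since the embodiment only requires "a preset convergence condition".

```python
import torch
import torch.nn as nn

def train_second_model(model, data_loader, max_epochs=50, loss_threshold=0.01):
    """Minimal training-loop sketch for step S2200: train the second gesture
    recognition model on sample data until a preset convergence condition
    (assumed here to be an average-loss threshold) is met."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        total = 0.0
        for features, intent_labels in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(features), intent_labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(data_loader) < loss_threshold:  # assumed condition
            break
    return model
```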
< apparatus embodiment >
Corresponding to the above method embodiments, the present embodiment further provides a gesture recognition apparatus, as shown in fig. 3, the apparatus 3000 may be applied to an electronic device, and specifically may include a gesture track receiving module 3100, a first gesture instruction obtaining module 3200, a second gesture instruction obtaining module 3300, and a target gesture instruction obtaining module 3400.
The gesture track receiving module 3100 is configured to receive a first gesture track input by a user at a first time.
The first gesture instruction obtaining module 3200 is configured to input the first gesture trajectory into the target gesture recognition model, so as to obtain a first gesture instruction.
The second gesture instruction obtaining module 3300 is configured to obtain a second gesture instruction at a second time, where the second time is earlier than the first time, and the second gesture instruction is obtained by predicting, by the target gesture recognition model, according to a second gesture trajectory input by the user at the second time.
The target gesture instruction obtaining module 3400 is configured to obtain a target gesture instruction according to the first gesture instruction and the second gesture instruction.
In one embodiment, the first gesture instruction and the second gesture instruction each include at least one gesture instruction; when obtaining the target gesture instruction according to the first gesture instruction and the second gesture instruction, the target gesture instruction obtaining module 3400 may be configured to: acquiring a matched gesture instruction from the first gesture instruction and the second gesture instruction to serve as a third gesture instruction; and obtaining the target gesture instruction according to the third gesture instruction.
In this embodiment, in a case that the third gesture instruction includes a plurality of gesture instructions, when obtaining the target gesture instruction according to the third gesture instruction, the target gesture instruction obtaining module 3400 may be configured to: obtaining confidence degrees corresponding to a plurality of gesture instructions in the third gesture instruction respectively; and selecting a gesture instruction corresponding to the confidence coefficient with the numerical value meeting the preset condition from the third gesture instruction as the target gesture instruction.
< device embodiment >
Corresponding to the above method embodiments, in this embodiment, an electronic device is further provided, which may include the gesture recognition apparatus 3000 according to any embodiment of the present disclosure, and is configured to implement the method according to any embodiment of the present disclosure.
As shown in fig. 4, the electronic device 4000 may further comprise a processor 4200 and a memory 4100, the memory 4100 being configured to store executable instructions; the processor 4200 is configured to operate the electronic device to perform a method according to any embodiment of the present disclosure under the control of instructions.
The various modules of the above apparatus 3000 may be implemented by the processor 4200 executing the instructions to perform a method according to any embodiment of the present disclosure.
The electronic device 4000 may be, for example, a VR, AR, MR device, etc., and is not particularly limited herein.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A gesture recognition method, comprising:
receiving a first gesture track input by a user at a first moment;
inputting the first gesture track into a target gesture recognition model to obtain a first gesture instruction; the target gesture recognition model comprises a first gesture recognition model and a second gesture recognition model, and the first gesture recognition model is used for recognizing gesture actions belonging to a first category; the second gesture recognition model is at least used for recognizing gesture actions belonging to a second category and predicting gesture instructions of the user at the next moment; the second gesture recognition model is obtained by training based on sample gesture feature information and user intention information used for representing the sample gesture feature information, and the sample gesture feature is determined according to output obtained by recognizing a sample gesture track through the first gesture recognition model and a sample time slice when the sample gesture track is input;
acquiring a second gesture instruction at a second moment, wherein the second moment is earlier than the first moment, and the second gesture instruction is obtained by predicting a second gesture track input by the user at the second moment by the target gesture recognition model;
and obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction.
2. The method of claim 1, the second gesture recognition model being trained by:
acquiring sample data, wherein the sample data comprises sample gesture feature information and user intention information;
and training by using the sample data to obtain the second gesture recognition model meeting a preset convergence condition.
3. The method of claim 2, the sample gesture feature information obtained by:
acquiring a sample gesture track input by any user in a sample time slice;
inputting the sample gesture track into the first gesture recognition model to obtain a candidate gesture instruction;
and obtaining the sample gesture characteristic information according to the candidate gesture instruction, the sample gesture track and the sample time slice.
4. The method of claim 3, the obtaining the sample gesture feature information from the candidate gesture instruction, the sample gesture trajectory, and the sample time slice, comprising:
acquiring at least one key point gesture corresponding to the candidate gesture instruction;
acquiring a sample gesture image corresponding to the key point gesture from the sample gesture track;
obtaining feature information corresponding to the sample gesture image by extracting the feature information of the sample gesture image; and
acquiring time information of the sample gesture image in the sample time slice;
and acquiring the sample gesture feature information by establishing feature representation information of the feature information and the time information.
5. The method of claim 2, the user intent information obtained by:
acquiring voice data which represents user intention and is input by a user when a sample gesture track is input, wherein the sample gesture feature information is obtained according to the sample gesture track;
and identifying semantic information of the voice data to obtain the user intention information.
6. The method of claim 1, the first gesture instruction and the second gesture instruction each including at least one gesture instruction;
the obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction comprises:
acquiring a matched gesture instruction from the first gesture instruction and the second gesture instruction to serve as a third gesture instruction;
and obtaining the target gesture instruction according to the third gesture instruction.
7. The method of claim 6, where the third gesture instruction comprises a plurality of gesture instructions, the obtaining the target gesture instruction according to the third gesture instruction comprising:
obtaining confidence degrees corresponding to a plurality of gesture instructions in the third gesture instructions respectively;
and selecting a gesture instruction corresponding to the confidence coefficient with the numerical value meeting the preset condition from the third gesture instruction as the target gesture instruction.
8. A training method of a gesture recognition model comprises the following steps:
obtaining sample data, wherein the sample data comprises sample gesture characteristic information and user intention information, and the user intention information represents user intention corresponding to the sample gesture characteristic information;
training by using the sample data to obtain a second gesture recognition model meeting a preset convergence condition, wherein the second gesture recognition model is at least used for recognizing gesture tracks belonging to a second category and predicting gesture instructions input by the user at the next moment; the sample gesture features are determined from output obtained by a first gesture recognition model recognizing a sample gesture trajectory and a sample time slice when the sample gesture trajectory is input; the first gesture recognition model is used for recognizing gesture actions belonging to a first category.
9. A gesture recognition apparatus comprising:
the gesture track receiving module is used for receiving a first gesture track input by a user at a first moment;
the first gesture command obtaining module is used for inputting the first gesture track into a target gesture recognition model to obtain a first gesture command; the target gesture recognition model comprises a first gesture recognition model and a second gesture recognition model, and the first gesture recognition model is used for recognizing gesture actions belonging to a first category; the second gesture recognition model is at least used for recognizing gesture actions belonging to a second category and predicting gesture instructions of the user at the next moment; the second gesture recognition model is obtained by training based on sample gesture feature information and user intention information used for representing the sample gesture feature information, and the sample gesture feature is determined according to output obtained by recognizing a sample gesture track through the first gesture recognition model and a sample time slice when the sample gesture track is input;
the second gesture instruction obtaining module is used for obtaining a second gesture instruction at a second moment, wherein the second moment is earlier than the first moment, and the second gesture instruction is obtained by predicting by the target gesture recognition model according to a second gesture track input by the user at the second moment;
and the target gesture instruction obtaining module is used for obtaining a target gesture instruction according to the first gesture instruction and the second gesture instruction.
10. An electronic device comprising the apparatus of claim 9; or,
the electronic device includes:
a memory for storing executable instructions;
a processor configured to execute the electronic device to perform the method according to the control of the instruction, wherein the method is as claimed in any one of claims 1 to 8.
CN202110282827.6A 2021-03-16 2021-03-16 Gesture recognition method and device and electronic equipment Active CN113190107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282827.6A CN113190107B (en) 2021-03-16 2021-03-16 Gesture recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110282827.6A CN113190107B (en) 2021-03-16 2021-03-16 Gesture recognition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113190107A (en) 2021-07-30
CN113190107B (en) 2023-04-14

Family

ID=76973305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282827.6A Active CN113190107B (en) 2021-03-16 2021-03-16 Gesture recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113190107B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886275A (en) * 2015-12-15 2017-06-23 比亚迪股份有限公司 The control method of car-mounted terminal, device and vehicle
CN108131808A (en) * 2017-12-08 2018-06-08 厦门瑞为信息技术有限公司 Air conditioning control device and method based on classification gesture identification
CN109960980A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Dynamic gesture identification method and device
CN111190520A (en) * 2020-01-02 2020-05-22 北京字节跳动网络技术有限公司 Menu item selection method and device, readable medium and electronic equipment
CN111273778A (en) * 2020-02-14 2020-06-12 北京百度网讯科技有限公司 Method and device for controlling electronic equipment based on gestures
CN112181582A (en) * 2020-11-02 2021-01-05 百度时代网络技术(北京)有限公司 Method, apparatus, device and storage medium for device control
CN112462940A (en) * 2020-11-25 2021-03-09 苏州科技大学 Intelligent home multi-mode man-machine natural interaction system and method thereof

Also Published As

Publication number Publication date
CN113190107A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN113190106B (en) Gesture recognition method and device and electronic equipment
CN111753822B (en) Text recognition method and device, electronic equipment and storage medium
CN109257645B (en) Video cover generation method and device
CN110837761B (en) Multi-model knowledge distillation method and device, electronic equipment and storage medium
CN110503689B (en) Pose prediction method, model training method and model training device
CN110914872A (en) Navigating video scenes with cognitive insights
CN113052328B (en) Deep learning model production system, electronic device, and storage medium
CN112668586B (en) Model training method, picture processing device, storage medium, and program product
CN111126108B (en) Training and image detection method and device for image detection model
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN109583389B (en) Drawing recognition method and device
EP3879451A2 (en) Image moderation method, image moderation apparatus, electronic device, and storage medium
CN111209417A (en) Information display method, server, terminal and storage medium
CN111435422B (en) Action recognition method, control method and device, electronic equipment and storage medium
US10448063B2 (en) System and method for perspective switching during video access
CN109726709A (en) Icon-based programming method and apparatus based on convolutional neural networks
CN110837766B (en) Gesture recognition method, gesture processing method and device
CN112559673A (en) Language processing model training method and device, electronic equipment and storage medium
CN113705653A (en) Model generation method and device, electronic device and storage medium
CN113190107B (en) Gesture recognition method and device and electronic equipment
CN112825256B (en) Guiding method, device, equipment and computer storage medium for recording voice packet function
CN109460458B (en) Prediction method and device for query rewriting intention
CN111638918B (en) Method and device for presenting information
CN114387622A (en) Animal weight recognition method and device, electronic equipment and storage medium
CN113220202A (en) Control method and device for Internet of things equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant