CN111103981B - Control instruction generation method and device

Control instruction generation method and device

Info

Publication number
CN111103981B
Authority
CN
China
Prior art keywords
gesture
control
frame
user
key point
Prior art date
Legal status
Active
Application number
CN201911329945.7A
Other languages
Chinese (zh)
Other versions
CN111103981A (en)
Inventor
Liu Siyang (刘思阳)
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911329945.7A priority Critical patent/CN111103981B/en
Publication of CN111103981A publication Critical patent/CN111103981A/en
Application granted granted Critical
Publication of CN111103981B publication Critical patent/CN111103981B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm
    • G06V40/113 - Recognition of static hand signs
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a control instruction generation method, a control instruction generation device, electronic equipment and a computer readable storage medium, and relates to the field of data processing. The method comprises the following steps: acquiring continuous two-dimensional image frames; calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints; identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames; and generating a corresponding control instruction according to the control intention. The application only needs to calculate the three-dimensional coordinates of each gesture key point of the user of each frame in the continuous two-dimensional image frames based on the acquired continuous two-dimensional image frames, and further determines the control intention of the user based on the three-dimensional coordinates of each gesture key point of the user of at least one frame in the continuous two-dimensional image frames, thereby realizing the recognition of the control intention of the user by adopting the two-dimensional image frames.

Description

Control instruction generation method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a control instruction generating method, a control instruction generating device, an electronic device, and a computer readable storage medium.
Background
Acquiring the control intention of a user from images and generating the corresponding control instruction improves the convenience and diversity of user operation, and this approach is therefore widely applied.
Currently, a depth data camera is generally used to acquire a depth image of a person, and a control intention of the person is identified based on the depth image. For example, an RGBD depth camera can acquire a depth image in addition to a normal color image, and based on the depth image acquired by the RGBD depth camera, the control intention of the person to be photographed is recognized.
However, in the prior art, a two-dimensional image acquired by a common camera cannot be used for identifying the control intention of a person.
Disclosure of Invention
The embodiment of the invention aims to provide a control instruction generation method, a control instruction generation device, electronic equipment and a computer readable storage medium, so as to solve the problem that a two-dimensional image acquired by a common camera cannot be used for identifying the control intention of a person. The specific technical scheme is as follows:
In a first aspect of the present invention, there is provided a control instruction generating method, including:
Acquiring continuous two-dimensional image frames;
calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints;
Identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and generating a corresponding control instruction according to the control intention.
Optionally, the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames includes:
Inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model in sequence to obtain a matching control gesture corresponding to at least one frame;
And determining the control intention corresponding to the user according to the matched control gesture corresponding to the at least one frame.
Optionally, inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model in sequence to obtain a matching control gesture corresponding to at least one frame, which includes:
Inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture in sequence into the gesture matching model to obtain the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
and if the matching confidence coefficient exceeds a preset threshold value, determining the current control gesture as the matching control gesture corresponding to the at least one frame.
Optionally, the determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame includes:
And if the matching control gesture is a static control gesture, determining the control intention corresponding to the matching control gesture according to the preset corresponding relation between the static control gesture and the control intention.
Optionally, the determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame includes:
if the matching control gesture is a dynamic control gesture, detecting the variation of a control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence;
and determining the control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the control quantity gesture key point.
Optionally, the detecting, according to the current frame and at least one frame after the current frame time sequence, a change amount of a control amount gesture key point in the dynamic control gesture includes:
And inputting the current frame and at least one frame after the time sequence of the current frame into a variation determining model corresponding to the dynamic control gesture, and outputting the variation of the control quantity gesture key point in the dynamic control gesture.
Optionally, the detecting, according to the current frame and at least one frame after the current frame time sequence, a change amount of a control amount gesture key point in the dynamic control gesture includes:
inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
and detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and a target frame with the same matching control gesture as the matching gesture of the current frame in at least one frame after the current frame time sequence.
Optionally, the gesture matching model includes: a first fully-connected network, a second fully-connected network, and a third fully-connected network; inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into the gesture matching model in sequence to obtain the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture, wherein the method comprises the following steps:
Inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network;
and after adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the output vectors into the third fully-connected network, and outputting the matching confidence degree between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network.
Optionally, the user gesture key point information includes hand gesture key point information. Before calculating the user gesture key point information of each frame in the continuous two-dimensional image frames, the method further includes:
detecting human body key points of each frame in the continuous two-dimensional image frames to obtain a recognition result of the human body key points;
Determining left elbow coordinates and/or right elbow coordinates in each frame according to the identification result of the human body key points;
Determining a gesture detection area in each frame according to the left elbow coordinates and/or the right elbow coordinates; the gesture detection area comprises a finger key point and a wrist key point of a left hand, and/or a finger key point and a wrist key point of a right hand;
the calculating the user gesture key point information of each frame in the continuous two-dimensional image frames comprises:
user gesture keypoint information in a gesture detection region in each of the successive two-dimensional image frames is calculated.
Optionally, before calculating the user gesture key point information of each frame in the continuous two-dimensional image frames, the method further includes:
Performing face recognition on each frame in the continuous two-dimensional image frames, and determining authorized users in each frame in the continuous two-dimensional image frames;
Deleting gesture key points of unauthorized users in each frame, and reserving the gesture key points of the authorized users;
the calculating the user gesture key point information of each frame in the continuous two-dimensional image frames comprises:
and calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames.
Optionally, before the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames, the method further includes:
Judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames;
the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames comprises the following steps:
And when the preparation gesture is received, identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
Optionally, the method further comprises:
The control instruction is sent to controlled equipment; the controlled device is used for executing the control instruction.
In a second aspect of the present invention, there is also provided a control instruction generating apparatus, the apparatus including:
The image acquisition module is used for acquiring continuous two-dimensional image frames;
A gesture key point information calculation module, configured to calculate user gesture key point information of each frame in the continuous two-dimensional image frames, where the user gesture key point information includes: three-dimensional coordinates of the user gesture keypoints;
The control intention recognition module is used for recognizing the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
And the control instruction generation module is used for generating a corresponding control instruction according to the control intention.
Optionally, the control intention identifying module includes:
The matching control gesture determining sub-module is used for inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model in sequence to obtain at least one frame of corresponding matching control gesture;
And the first control intention recognition sub-module is used for determining the control intention corresponding to the user according to the matched control gesture corresponding to the at least one frame.
Optionally, the matching control gesture determination submodule includes:
The matching confidence determining unit is used for inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into the gesture matching model in sequence to obtain the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
And the matching control gesture determining unit is used for determining the current control gesture as the matching control gesture corresponding to the at least one frame if the matching confidence exceeds a preset threshold.
Optionally, the first control intention identifying sub-module includes:
and the first control intention recognition unit is used for determining the control intention corresponding to the matched control gesture according to the preset corresponding relation between the static control gesture and the control intention if the matched control gesture is the static control gesture.
Optionally, the first control intention identifying sub-module includes:
A change amount detection unit, configured to detect, if the matching control gesture is a dynamic control gesture, a change amount of a control amount gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame timing;
And the second control intention recognition unit is used for determining the control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the control quantity gesture key point.
Optionally, the variation detecting unit includes:
And the first variation detection subunit is used for inputting the current frame and at least one frame after the current frame time sequence into a variation determination model corresponding to the dynamic control gesture and outputting the variation of the control quantity gesture key point in the dynamic control gesture.
Optionally, the variation detecting unit includes:
the gesture determining subunit is used for inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
And the second variation detection subunit is used for detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and a target frame with the same matching control gesture as the matching gesture of the current frame in at least one frame after the current frame time sequence.
Optionally, the gesture matching model includes: a first fully-connected network, a second fully-connected network, and a third fully-connected network; the matching confidence determining unit includes:
An input subunit, configured to input user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and input control gesture key point information corresponding to each preset control gesture into the second fully-connected network;
And the matching confidence determining subunit is used for adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the output vectors into the third fully-connected network, and outputting the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network.
Optionally, the user gesture key point information includes hand gesture key point information, and the apparatus further includes:
The human body key point identification module is used for carrying out human body key point detection on each frame in the continuous two-dimensional image frames to obtain an identification result of the human body key points;
The elbow coordinate determining module is used for determining the left elbow coordinate and/or the right elbow coordinate in each frame according to the identification result of the human body key points;
The gesture detection area determining module is used for determining a gesture detection area in each frame according to the left elbow coordinates and/or the right elbow coordinates; the gesture detection area comprises a finger key point and a wrist key point of a left hand, and/or a finger key point and a wrist key point of a right hand;
the gesture key point information calculation module includes:
And the gesture key point information first computing sub-module is used for computing the user gesture key point information in the gesture detection area in each frame of the continuous two-dimensional image frames.
Optionally, the apparatus further includes:
the face recognition module is used for recognizing the face of each frame in the continuous two-dimensional image frames and determining authorized users in each frame in the continuous two-dimensional image frames;
the deleting module is used for deleting the gesture key points of the unauthorized user in each frame and reserving the gesture key points of the authorized user;
the gesture key point information calculation module includes:
and the gesture key point information second calculation sub-module is used for calculating the gesture key point information of the authorized user reserved in each frame in the continuous two-dimensional image frames.
Optionally, the apparatus further includes:
a preparation gesture judging module, configured to judge whether a preparation gesture is received according to user gesture key point information of at least one frame of the continuous two-dimensional image frames;
The control intention recognition module includes:
and the second control intention recognition sub-module is used for recognizing the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames under the condition that the preparation gesture is received.
Optionally, the apparatus further includes:
The control instruction sending module is used for sending the control instruction to the controlled equipment; the controlled device is used for executing the control instruction.
In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing any one of the above control instruction generation methods when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform any of the control instruction generating methods described above.
In yet another aspect of the present invention there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the control instruction generating method of any of the above.
The embodiment of the invention provides a control instruction generation method and a device, wherein the method comprises the following steps: acquiring continuous two-dimensional image frames; calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints; identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames; according to the control intention, a corresponding control instruction is generated, so that the problem that the control intention of a person cannot be identified by adopting a two-dimensional image acquired by a common camera can be solved to a great extent.
In the embodiment of the invention, the three-dimensional coordinates of each gesture key point of the user in each frame in the continuous two-dimensional image frames are calculated and obtained only based on the acquired continuous two-dimensional image frames, and the control intention of the user is further determined based on the three-dimensional coordinates of each gesture key point of the user in at least one frame in the continuous two-dimensional image frames, so that the control intention of the user is identified by adopting the two-dimensional image frames.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of steps of a control instruction generation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a human body key point provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a left-hand gesture key point according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps of a method for generating a control instruction according to another embodiment of the present invention;
FIG. 5 is a flowchart of steps for determining a matching control gesture in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the operation of a gesture matching model in an embodiment of the present invention;
FIG. 7 is a flowchart of steps for calculating confidence in a match in an embodiment of the present invention;
FIG. 8 is a flowchart of steps for determining control intent in an embodiment of the present invention;
FIG. 9 is a flowchart illustrating steps of a method for generating a control instruction according to an embodiment of the present invention;
FIG. 10 is a flowchart illustrating steps for determining a gesture detection area according to an embodiment of the present invention;
FIG. 11 is a control instruction generating apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a control instruction generating apparatus according to an embodiment of the present invention;
FIG. 13 is a further control instruction generating apparatus according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a control instruction generating method according to an embodiment of the present invention. The method may be applied to a terminal or to a controller of the terminal; the specific use, type, etc. of the terminal is not specifically limited in this embodiment of the present invention. For example, the terminal may include a video playback device or the like; for instance, the method may be applied to a TVGuo (电视果) casting device. In the embodiment of the invention, the method mainly comprises the following steps:
step 101: successive two-dimensional image frames are acquired.
In the embodiment of the invention, the continuous two-dimensional image frames can be acquired with an ordinary, general-purpose camera or the like. The two-dimensional image frames may be color, black and white, etc.; this is not particularly limited in the embodiment of the present invention. The two-dimensional image frames may contain an image of the user. The continuous two-dimensional image frames may be a plurality of sequential two-dimensional image frames, and the number of frames they contain is not specifically limited.
Step 102: calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints.
In the embodiment of the invention, the gesture key points of the user can be body parts and the like capable of representing the gesture of the user. For example, the gesture key points may include human body key points, hand gesture key points, etc. of the user. The human body key points may specifically include: head bone key points, neck bone key points, right shoulder bone key points, left shoulder bone key points, right elbow bone key points, left elbow bone key points, right wrist bone key points, left wrist bone key points, right crotch bone key points, left crotch bone key points, right knee bone key points, left knee bone key points, right ankle bone key points, and left ankle bone key points. The hand gesture key points may be the wrist key points of the left and right hands and the fingertip, finger base, and knuckle points on each finger. Both the left hand and the right hand may include 21 gesture key points. For example, the gesture key points of the left hand may specifically include a wrist key point and the gesture key points on each finger; the gesture key points on each finger may in turn comprise four points: the fingertip, the finger base, and two knuckles.
For example, referring to fig. 2, fig. 2 is a schematic diagram of a key point of a human body according to an embodiment of the present invention. The skeletal points numbered 1 through 14 in fig. 2 may be human keypoints. There may be 14 human body key points.
For example, referring to fig. 3, fig. 3 is a schematic diagram of left-hand gesture key points according to an embodiment of the present invention. The points numbered 0 through 20 in fig. 3 are the 21 gesture key points of the left hand.
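For illustration only, the key point information described above could be organized as the following structure; the names and index ordering are assumptions, since the embodiment only specifies which points exist and their counts (the crotch points are labelled as hips here).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout: the embodiment fixes only the counts (14 body key points,
# 21 key points per hand), not the naming or ordering used here.
BODY_KEYPOINTS = [
    "head", "neck", "right_shoulder", "left_shoulder",
    "right_elbow", "left_elbow", "right_wrist", "left_wrist",
    "right_hip", "left_hip", "right_knee", "left_knee",
    "right_ankle", "left_ankle",
]  # 14 skeletal points, numbered 1 through 14 in fig. 2

HAND_KEYPOINT_COUNT = 21  # wrist + 5 fingers x (fingertip, finger base, 2 knuckles)

@dataclass
class UserGestureKeypoints:
    """Per-frame user gesture key point information (three-dimensional coordinates)."""
    body: List[Tuple[float, float, float]]        # 14 (x, y, z) human body key points
    left_hand: List[Tuple[float, float, float]]   # 21 (x, y, z) left-hand key points
    right_hand: List[Tuple[float, float, float]]  # 21 (x, y, z) right-hand key points
```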
In the embodiment of the invention, the three-dimensional coordinates of the user gesture key points are three-dimensional coordinates capable of reflecting the relative positional relationship between the gesture key points. The gesture key points of the user may first be identified in each frame of the continuous two-dimensional image frames, for example based on a Visual Geometry Group (VGG) network model or the like; this is not particularly limited in the embodiment of the present invention. The three-dimensional coordinates of each gesture key point may then be calculated according to the position of each gesture key point in the two-dimensional image frame, the imaging parameters of the camera that captured the frame, and the like. Alternatively, a three-dimensional coordinate recognition network for the user gesture key points may be trained in advance, each frame of the continuous two-dimensional image frames may be input into this network for three-dimensional modeling and the like, and the three-dimensional coordinates of the user gesture key points of each frame may thus be calculated. This is not particularly limited in the embodiment of the present invention.
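A minimal sketch of step 102 under these assumptions follows; `detect_keypoints_2d` and `lift_to_3d` are placeholder callables standing in for whichever detection network and three-dimensional coordinate recognition network are actually used, and neither name comes from the embodiment.

```python
import numpy as np

def compute_user_gesture_keypoints(frames, detect_keypoints_2d, lift_to_3d):
    """Step 102: compute the 3D coordinates of the user gesture key points per frame.

    frames              -- iterable of H x W x 3 two-dimensional image frames
    detect_keypoints_2d -- callable(frame) -> (N, 2) pixel coordinates (e.g. a VGG-style detector)
    lift_to_3d          -- callable(kp2d, frame) -> (N, 3) relative 3D coordinates,
                           e.g. from camera imaging parameters or a trained lifting network
    """
    keypoints_per_frame = []
    for frame in frames:
        kp2d = detect_keypoints_2d(frame)
        kp3d = lift_to_3d(kp2d, frame)
        keypoints_per_frame.append(np.asarray(kp3d, dtype=np.float32))
    return keypoints_per_frame
```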
Step 103: and identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
In the embodiment of the invention, the control intention of the user can be the operation that the user wants to perform on the terminal, or the like. The correspondence between user gesture key point information and control intentions may be established in advance, and the control intention corresponding to the user gesture key point information of at least one frame of the continuous two-dimensional image frames may be determined based on this correspondence.
Step 104: and generating a corresponding control instruction according to the control intention.
In the embodiment of the invention, the corresponding relation between the control intention and the control instruction can be set in advance. After the control intention of the user is acquired, a control instruction corresponding to the control intention of the user may be acquired in the correspondence relation. In the embodiment of the present invention, this is not particularly limited.
The control instruction may be used to control the terminal to perform a corresponding operation. For example, if the determined control intention is to turn off the terminal, the control instruction may be "off", and the terminal is turned off after acquiring the control instruction.
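As a minimal illustration of steps 103 and 104, the two correspondences could be simple lookups like the following; the gesture names, intent names, and instruction strings are hypothetical and only stand in for whatever tables are configured in advance.

```python
from typing import Optional

# Hypothetical correspondence tables; the concrete gestures, intents, and
# instruction strings below are assumptions for illustration only.
GESTURE_TO_INTENT = {
    "salute": "power_off",          # static control gesture -> control intention
    "c_shape_expand": "volume_up",  # dynamic control gesture -> control intention
}

INTENT_TO_INSTRUCTION = {
    "power_off": "OFF",
    "volume_up": "VOL+",
}

def generate_control_instruction(matching_gesture: str) -> Optional[str]:
    """Steps 103-104: map a matched control gesture to a control instruction."""
    intent = GESTURE_TO_INTENT.get(matching_gesture)
    if intent is None:
        return None                          # no control intention recognized
    return INTENT_TO_INSTRUCTION.get(intent)
```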
In the embodiment of the invention, the three-dimensional coordinates of each gesture key point of the user in each frame in the continuous two-dimensional image frames are calculated and obtained only based on the acquired continuous two-dimensional image frames, and the control intention of the user is further determined based on the three-dimensional coordinates of each gesture key point of the user in at least one frame in the continuous two-dimensional image frames, so that the control intention of the user is identified by adopting the two-dimensional image frames. The two-dimensional image can be acquired by the common camera without an expensive depth data camera, so that the cost for acquiring the intention of the user through image information is reduced; meanwhile, the control intention of the user is determined based on the three-dimensional coordinates of the key points of the gestures of the user, so that the determined control intention errors caused by different shooting visual angles can be avoided, and the accuracy is high.
Referring to fig. 4, fig. 4 is a flowchart illustrating steps of a method for generating a control instruction according to another embodiment of the present invention, and the method is equally applicable to a terminal or a controller of a terminal, with specific reference to the foregoing description. In the embodiment of the invention, the method mainly comprises the following steps:
step 201: successive two-dimensional image frames are acquired.
In the embodiment of the present invention, the step 201 may refer to the aforementioned step 101 for avoiding repetition, and will not be described herein.
Step 202: calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints.
In the embodiment of the present invention, the step 202 may refer to the foregoing step 102, and it should be noted that, before the step 202, the method may further include the following steps:
Step S1: and carrying out face recognition on each frame in the continuous two-dimensional image frames, and determining authorized users in each frame in the continuous two-dimensional image frames.
Step S2: and deleting the gesture key points of the unauthorized users in each frame, and reserving the gesture key points of the authorized users.
Specifically, face images of the authorized user and the like may be stored in advance, face recognition is performed on each of the continuous two-dimensional image frames based on the face images of the authorized user and the like stored in advance, and whether each of the continuous two-dimensional image frames includes the authorized user is first identified. If each of the continuous two-dimensional image frames does not include an authorized user, repeating the steps until the authorized user is identified in one of the continuous two-dimensional images. If the authorized user is identified, the authorized user and the unauthorized user in each frame of the continuous two-dimensional image frames are distinguished by using the face image of the authorized user and the like stored in advance. And deleting the gesture key points of the unauthorized user in each frame of the continuous two-dimensional image frames, only keeping the gesture key points of the authorized user, and then responding only to the authorized user, so that the operation of the terminal by the unauthorized user can be avoided, and the privacy of the authorized user and the like can be protected.
In the embodiment of the present invention, one or more authorized users corresponding to one terminal may be provided, which is not specifically limited in the embodiment of the present invention.
In an embodiment of the present invention, the step 202 may optionally include the following sub-steps: and calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames. Specifically, for each frame in the continuous two-dimensional image frames, only the gesture key points of the authorized user are reserved, only the gesture key point information of the authorized user reserved in each frame in the continuous two-dimensional image frames is calculated, and the gesture key point information of the unauthorized user in each frame in the continuous two-dimensional image frames is not calculated, so that on one hand, the operation amount is reduced, and the speed of calculating the gesture key point information is high; on the other hand, the control intention of the authorized user is recognized only according to the gesture key point information of the authorized user reserved in each frame in the continuous two-dimensional image frames, and the control intention of the unauthorized user is ignored, so that the operation of the terminal by the unauthorized user can be avoided, and the privacy of the authorized user is protected.
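For illustration, the pre-filtering in steps S1-S2 might look like the following sketch; the person detection, the face-box association, and the `recognize_face` helper are assumptions standing in for whatever face recognition method is actually used.

```python
def keep_authorized_keypoints(frame, persons, recognize_face, authorized_face_db):
    """Steps S1-S2: keep only the gesture key points of authorized users in a frame.

    persons            -- list of dicts such as {"face_box": ..., "gesture_keypoints": ...},
                          one per person detected in the frame (the association of key
                          points to persons is an assumption)
    recognize_face     -- callable(frame, face_box, face_db) -> True if the face matches
                          a pre-stored authorized user
    authorized_face_db -- pre-stored face images or features of authorized users
    """
    authorized = []
    for person in persons:
        if recognize_face(frame, person["face_box"], authorized_face_db):
            authorized.append(person["gesture_keypoints"])
        # gesture key points of unauthorized users are dropped and never processed further
    return authorized
```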
Step 203: and inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model in sequence to obtain a matching control gesture corresponding to at least one frame.
Specifically, a gesture matching model can be trained in advance. The gesture matching model is mainly used for outputting the similarity between two sets of gesture key point information; two sets of gesture key point information whose similarity exceeds a set similarity are determined as matched gesture key point information. Control gesture key point information corresponding to each control gesture may be preset in advance. The user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture are input into the gesture matching model in turn; the gesture matching model sequentially calculates the similarity between the user gesture key point information of each frame and the control gesture key point information corresponding to each preset control gesture, and when the similarity between the user gesture key point information of the current frame and the control gesture key point information corresponding to a preset current control gesture exceeds the set similarity, the current control gesture is determined as the matching control gesture corresponding to the current frame. By adopting the gesture matching model, the user's intention can be accurately identified.
In an embodiment of the present invention, optionally, referring to fig. 5, fig. 5 is a flowchart illustrating a step of determining a matching control gesture in an embodiment of the present invention. The step 203 may include the following steps:
Step 2031: inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture in sequence into the gesture matching model to obtain the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
step 2032: and if the matching confidence coefficient exceeds a preset threshold value, determining the current control gesture as the matching control gesture corresponding to the at least one frame.
Specifically, the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture may be sequentially input into the gesture matching model, the gesture matching model sequentially calculates a matching confidence between the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture, and determines the current control gesture as a matching control gesture corresponding to at least one frame in the continuous two-dimensional image frames when the matching confidence between the user gesture key point information of the current frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to the preset current control gesture exceeds a preset threshold. It should be noted that, the preset threshold may be set according to actual needs, which is not limited in particular in the embodiment of the present invention. And sequentially determining the matching confidence coefficient by sequentially combining the user gesture key point information of each frame in the continuous two-dimensional image frames with the control gesture key point information corresponding to each preset control gesture, thereby being beneficial to accurately determining the matching control gesture corresponding to at least one frame in the continuous two-dimensional image frames.
In the embodiment of the present invention, if none of the matching confidences corresponding to each frame in the continuous two-dimensional image frames exceeds the preset threshold, the continuous two-dimensional image frames can be considered invalid frames, or it can be considered that no matching control gesture was found for them. If only one frame in the continuous two-dimensional image frames has a matching confidence exceeding the preset threshold, the preset control gesture corresponding to that matching confidence is determined as the matching control gesture corresponding to the continuous two-dimensional image frames, or as the matching control gesture corresponding to that frame. If multiple frames in the continuous two-dimensional image frames have matching confidences exceeding the preset threshold and the preset control gestures corresponding to those confidences are the same, that preset control gesture can be determined as the matching control gesture corresponding to the continuous two-dimensional image frames; alternatively, the preset control gestures corresponding to the matching confidences exceeding the preset threshold can be determined, respectively, as the matching control gestures corresponding to each of those frames.
For example, the user gesture key point information of the first frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture are sequentially input into the gesture matching model, and the gesture matching model sequentially calculates each matching confidence between the user gesture key point information of the first frame and the control gesture key point information corresponding to each preset control gesture. Judging whether each matching confidence coefficient corresponding to the first frame has the matching confidence coefficient exceeding a threshold value, if so, determining a preset control gesture corresponding to the matching confidence coefficient exceeding a preset threshold value as the matching control gesture corresponding to the first frame, and if not, considering the first frame as an invalid frame or not finding the matching control gesture. And similar calculation processes are executed for the user gesture key point information of the second frame in the continuous two-dimensional image frames, and the like until the last frame is calculated. If the confidence of matching between the user gesture key point information of the fourth frame and the user gesture key point information of the second preset control gesture in the continuous two-dimensional image frames exceeds the preset threshold after the calculation is completed, and the confidence of matching between the user gesture key point information of the rest frames and the user gesture key point information of each preset control gesture does not exceed the preset threshold, the second preset control gesture can be determined to be the matching control gesture corresponding to the fourth frame in the continuous two-dimensional image frames. Or determining the second preset control gesture as a matching control gesture corresponding to the continuous two-dimensional image frame.
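Putting steps 2031 and 2032 together, the frame-by-frame matching described above could be sketched as follows; the threshold value and the break-on-first-match behaviour are assumptions, since the embodiment leaves the preset threshold to actual needs.

```python
def find_matching_gestures(frames_keypoints, control_gestures, gesture_matching_model,
                           threshold=0.8):
    """Steps 2031-2032: return {frame index: matching control gesture name}.

    frames_keypoints       -- per-frame user gesture key point information
    control_gestures       -- dict {gesture name: preset control gesture key point information}
    gesture_matching_model -- callable(user_kp, control_kp) -> matching confidence
    threshold              -- preset threshold; the value 0.8 is an assumption
    """
    matches = {}
    for i, user_kp in enumerate(frames_keypoints):
        for name, control_kp in control_gestures.items():
            confidence = gesture_matching_model(user_kp, control_kp)
            if confidence > threshold:
                matches[i] = name   # current control gesture becomes the matching control gesture
                break
        # frames with no confidence above the threshold are treated as invalid frames
    return matches
```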
In the embodiment of the present invention, referring to fig. 6, fig. 6 is a schematic working diagram of a gesture matching model in the embodiment of the present invention. The gesture matching model may include: a first fully connected network, a second fully connected network, and a third fully connected network. The specific structure of the first fully-connected network, the second fully-connected network, and the third fully-connected network is not particularly limited. Referring to fig. 7, fig. 7 is a flowchart illustrating steps for calculating a confidence level of a match in an embodiment of the present invention. The step 2031 may include the steps of:
Step 20311: inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network.
Step 20312: and after adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the output vectors into the third fully-connected network, and outputting the matching confidence degree between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network.
Specifically, the user gesture key point information of each frame in the continuous two-dimensional image frames may be input into the first fully-connected network, and the control gesture key point information corresponding to each preset control gesture may be input into the second fully-connected network. The first fully-connected network calculates the user gesture key point information of each frame and outputs a first vector. And the second fully-connected network calculates control gesture key point information corresponding to each preset control gesture and outputs a second vector. And adding the output first vector and the output second vector of the first fully-connected network and the second fully-connected network, inputting the added first vector and the added second vector into a third fully-connected network, and outputting the matching confidence degree between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network.
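The three fully-connected networks described above could be sketched in PyTorch as follows; the library choice, layer widths, depths, activation functions, and sigmoid output are assumptions, since the embodiment states that the specific structure of the networks is not limited.

```python
import torch
import torch.nn as nn

class GestureMatchingModel(nn.Module):
    """First FC network encodes the user gesture key points, second FC network encodes
    the preset control gesture key points; their output vectors are added and passed
    through a third FC network that outputs the matching confidence."""

    def __init__(self, keypoint_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.user_net = nn.Sequential(            # first fully-connected network
            nn.Linear(keypoint_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.control_net = nn.Sequential(         # second fully-connected network
            nn.Linear(keypoint_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.match_net = nn.Sequential(           # third fully-connected network
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, user_kp: torch.Tensor, control_kp: torch.Tensor) -> torch.Tensor:
        # add the output vectors of the first and second networks, then score them
        fused = self.user_net(user_kp) + self.control_net(control_kp)
        return self.match_net(fused)              # matching confidence in [0, 1]
```

For example, with the 21 three-dimensional key points of one hand flattened into a vector, `keypoint_dim` would be 63.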
Step 204: and determining the control intention corresponding to the user according to the matched control gesture corresponding to the at least one frame.
In the embodiment of the invention, the corresponding relation between the matching control gesture and the control intention can be established in advance, and the control intention corresponding to the user is determined according to the matching control gesture corresponding to at least one frame in the continuous two-dimensional image frames based on the corresponding relation.
In the embodiment of the present invention, the preset control gesture may include a static control gesture, and then the matching control gesture may also include a static control gesture. For example, the saluting gesture may be a static control gesture.
In the embodiment of the present invention, optionally, if the matching control gesture is a static control gesture, the control intent corresponding to the matching control gesture is determined according to a preset correspondence between the static control gesture and the control intent.
Specifically, the correspondence between static control gestures and control intentions may be established in advance. If the matching control gesture corresponding to at least one frame in the continuous two-dimensional image frames is a static control gesture, the control intention corresponding to that static control gesture can be determined, according to the preset correspondence between static control gestures and control intentions, as the control intention corresponding to the matching control gesture.
In the embodiment of the present invention, the preset control gesture may include a dynamic control gesture, and then the matching control gesture may also include a dynamic control gesture. The dynamic control gestures may be clearly distinguishable from the static control gestures, and the various dynamic control gestures may also be clearly distinguishable from one another. For example, forming the letter "C" with the thumb and index finger may be a dynamic control gesture.
In an embodiment of the present invention, optionally, referring to fig. 8, fig. 8 is a flowchart of a step of determining a control intention in an embodiment of the present invention. The step 204 may include the following steps:
Step 2041: and if the matching control gesture is a dynamic control gesture, detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence.
Step 2042: and determining the control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the control quantity gesture key point.
Specifically, the dynamic control gesture may include control quantity gesture key points, which may be the gesture key points that change readily among the gesture key points in the dynamic control gesture. The number of control quantity gesture key points in a dynamic control gesture may be one or more; this is not particularly limited in the embodiment of the present invention. For example, forming the letter "C" with the thumb and index finger may be a dynamic control gesture, and the thumb tip and the index finger tip may be the control quantity gesture key points of that dynamic control gesture.
If the matching control gesture corresponding to the current frame in the continuous two-dimensional image frames is a dynamic control gesture, the current frame and at least one frame after the current frame in time sequence can be compared to obtain the variation of the control quantity gesture key points between the current frame and the at least one later frame. The variation may be a change in the position of the control quantity gesture key points between the current frame and the at least one later frame, or the like; this is not particularly limited in the embodiment of the present invention. For example, the variation may be any one of a translation amount, a rotation amount, and a scaling amount of the control quantity gesture key points between the current frame and the at least one later frame.
For example, if the thumb and index finger form the letter "C", the thumb tip and the index finger tip may be the control quantity gesture key points of the dynamic control gesture. Suppose the matching control gesture corresponding to the current frame in the continuous two-dimensional image frames is this dynamic control gesture. The amount of change in the distance between the thumb tip and the index finger tip may then be detected in the current frame and at least one frame after the current frame in time sequence. If it is determined, according to the current frame and at least one later frame, that the distance between the user's thumb tip and index finger tip gradually increases from the initial 5 cm to 10 cm, then the variation of the control quantity gesture key points in the dynamic control gesture between the current frame and the at least one later frame is determined to be: the distance between the thumb tip and the index finger tip gradually increases from the initial 5 cm to 10 cm.
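For this "C"-shape example, the variation of the control quantity gesture key points could be measured as the change in fingertip distance, as in the sketch below; the key point indices follow a hypothetical left-hand numbering and are assumptions, not taken from fig. 3.

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8   # assumed indices of the two fingertip key points

def fingertip_distance(hand_kp3d: np.ndarray) -> float:
    """Euclidean distance between the thumb tip and the index finger tip (3D coordinates)."""
    return float(np.linalg.norm(hand_kp3d[THUMB_TIP] - hand_kp3d[INDEX_TIP]))

def control_quantity_variation(current_hand_kp3d: np.ndarray,
                               later_hand_kp3d: np.ndarray) -> float:
    """Variation of the control quantity gesture key points between the current frame
    and a later frame, e.g. +5.0 when the distance grows from 5 cm to 10 cm."""
    return fingertip_distance(later_hand_kp3d) - fingertip_distance(current_hand_kp3d)
```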
In the embodiment of the invention, the dynamic control gesture and the variation of the control quantity gesture key points in the dynamic control gesture can be combined, and the correspondence between each such combination and a control intention can be set in advance. Based on this correspondence, the control intention can be determined from the combination of the dynamic control gesture corresponding to the current frame and the variation of the control quantity gesture key points detected from the current frame and at least one frame after the current frame in time sequence.
In an embodiment of the present invention, optionally, the step 2041 may include: and inputting the current frame and at least one frame after the time sequence of the current frame into a variation determining model corresponding to the dynamic control gesture, and outputting the variation of the control quantity gesture key point in the dynamic control gesture.
Specifically, for each preset dynamic control gesture, a variation determination model corresponding to that dynamic control gesture may be trained in advance. The variation determination model is configured to receive the current frame and at least one later frame of the continuous two-dimensional image frames, compare them with respect to the dynamic control gesture, and output the variation of the control amount gesture key point in the dynamic control gesture across those frames. The current frame and the at least one later frame can thus be input into the pre-trained variation determination model corresponding to the dynamic control gesture, which outputs the variation of the control amount gesture key point. Through this model, the variation of the control amount gesture key points in the dynamic control gesture can be obtained accurately and quickly.
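A minimal sketch of how such pre-trained, per-gesture variation determination models might be invoked is given below; the registry, the callable interface, and the model itself are assumptions, since the embodiment does not fix a particular model structure.

# Hypothetical registry: one variation determination model per preset
# dynamic control gesture, trained in advance.
CHANGE_MODELS = {}

def register_change_model(gesture_name, model):
    CHANGE_MODELS[gesture_name] = model

def variation_via_model(gesture_name, current_frame, later_frames):
    # The model receives the current frame plus at least one later frame
    # and returns the variation of the control amount gesture key point.
    model = CHANGE_MODELS[gesture_name]
    return model([current_frame, *later_frames])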
In an embodiment of the present invention, optionally, the step 2041 may include: inputting at least one frame after the current frame in time sequence into the gesture matching model to obtain the matching control gesture corresponding to each of those frames; and detecting the variation of the control amount gesture key point in the dynamic control gesture according to the current frame and a target frame, among the later frames, whose matching control gesture is the same as that of the current frame.
Specifically, at least one frame after the current frame in the continuous two-dimensional image frames may be input into the gesture matching model to obtain the matching control gesture corresponding to each of those frames. Among those frames, target frames whose matching control gesture is the same as that of the current frame are selected, and the variation of the control amount gesture key point in the dynamic control gesture is calculated only over the current frame and those target frames. Because the current frame and its target frames correspond to the same dynamic control gesture, calculating the variation based only on them avoids false detection of the variation and improves its accuracy.
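The frame-selection logic described above can be sketched as follows; match_gesture and measure are placeholder callables standing in for the gesture matching model and for whatever quantity is being tracked (for example the fingertip distance), and are assumptions made for illustration.

def variation_from_target_frames(current_frame, later_frames, match_gesture, measure):
    # Keep only target frames whose matching control gesture equals the
    # current frame's matching control gesture.
    current_gesture = match_gesture(current_frame)
    targets = [f for f in later_frames if match_gesture(f) == current_gesture]
    if not targets:
        return None  # no comparable frame: report no variation rather than a spurious one
    # Variation of the tracked control amount between the current frame
    # and the last comparable target frame.
    return measure(targets[-1]) - measure(current_frame)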
Step 205: and generating a corresponding control instruction according to the control intention.
In the embodiment of the present invention, this step may refer to the foregoing step 104; to avoid repetition, it is not described again here.
Step 206: the control instruction is sent to controlled equipment; the controlled device is used for executing the control instruction.
In the embodiment of the invention, when the execution body of the method and the controlled device are not the same device, the control instruction can be sent to the controlled device. The controlled device may be any device capable of executing the control instruction; for example, it may be a video playback terminal such as a television. The execution body of steps 201 to 205 may exchange data with the controlled device in a wired or wireless manner, the control instruction may be sent to the controlled device, and the controlled device executes the control instruction so as to be controlled. This allows the controlled device to be operated in more diverse ways and makes operation more convenient for the user.
For example, for the above example, suppose the matching control gesture corresponding to at least one frame of the two-dimensional image frames is recognized as a saluting gesture, and the control intention corresponding to the saluting gesture is to control the controlled device to pause or resume playback. The control instruction determined according to this control intention may be: pause playback or resume playback. A pause/resume control instruction may then be sent to the controlled device, for example a television. If the television is currently playing a video normally, after receiving the control instruction it executes the pause instruction and stops playback of the current video. If the television is currently paused, after receiving the control instruction it executes the resume instruction and continues playback of the current video. The user can thus control the television conveniently through gesture key point information alone, which increases the variety of ways the user can operate the television.
For another example, in the above example, suppose the matching control gesture corresponding to at least one frame of the continuous two-dimensional image frames is recognized as the thumb and index finger forming the letter C, and, from the current frame and at least one later frame, the distance between the thumb tip and the index finger tip is determined to increase gradually from an initial 5 cm to 10 cm. If the control intention corresponding to the combination of this gesture and this increase in fingertip distance is to control the controlled device to turn the volume up by 5 dB, the control instruction determined according to this control intention may be: turn the volume up by 5 dB. This instruction may then be sent to the controlled device, for example a television. If the television's current playback volume is 20 dB, after receiving the control instruction it executes the instruction and adjusts its playback volume to 25 dB.
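To make the last two examples concrete, the sketch below maps a recognized control intention to a device instruction and sends it to the controlled device over a plain TCP connection. The intention names, the instruction strings, the host address, and the transport are all assumptions made for illustration; the embodiment does not prescribe a particular protocol.

import socket

# Hypothetical mapping from control intention to device instruction.
INSTRUCTIONS = {
    "toggle_play_pause": "PLAY_PAUSE",
    "volume_up_5db": "VOLUME_UP 5",
}

def send_instruction(intent, host="192.168.1.20", port=5000):
    instruction = INSTRUCTIONS[intent]
    # The controlled device (e.g. a television) is assumed to listen on
    # a TCP port and execute each instruction it receives.
    with socket.create_connection((host, port)) as conn:
        conn.sendall(instruction.encode("utf-8"))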
In the embodiment of the present invention, optionally, if the execution body of steps 201 to 205 and the controlled device are the same device, the control instruction need not be sent; the controlled device may execute it directly.
In the embodiment of the invention, the three-dimensional coordinates of the user's gesture key points in each frame are calculated based only on the acquired continuous two-dimensional image frames, and the user's control intention is then determined based on the three-dimensional coordinates of the user's gesture key points in at least one of those frames, so the control intention is recognized from two-dimensional image frames. Two-dimensional images can be captured by an ordinary camera without an expensive depth camera, which reduces the cost of obtaining the user's intention from image information; at the same time, because the control intention is determined from the three-dimensional coordinates of the gesture key points, errors in the determined control intention caused by different shooting angles can be avoided, giving high accuracy.
Referring to fig. 9, fig. 9 is a flowchart showing steps of a control instruction generating method according to an embodiment of the present invention, and the method is equally applicable to a terminal or a controller of the terminal, with specific reference to the foregoing description. In the embodiment of the invention, the method mainly comprises the following steps:
step 301: successive two-dimensional image frames are acquired.
In the embodiment of the present invention, the step 301 may refer to the aforementioned step 101, and in order to avoid repetition, the description is omitted here.
Step 302: the gesture keypoint information includes: and (3) carrying out human body key point detection on each frame in the continuous two-dimensional image frames according to the gesture key point information to obtain a recognition result of the human body key points.
In the embodiment of the invention, the user gesture key point information here refers to hand gesture key point information; that is, it includes the three-dimensional coordinates of the hand gesture key points. The gesture key points may be left-hand gesture key points and/or right-hand gesture key points. For the gesture key points, reference may be made to the foregoing description; to avoid repetition, they are not described again here.
In the embodiment of the invention, human body key point detection can be performed on each frame in the continuous two-dimensional image frames to obtain the recognition result of the human body key points in each frame; that is, the individual human body key points, or their coordinates, are identified in each of the continuous two-dimensional image frames.
Step 303: and determining the left elbow coordinates and/or the right elbow coordinates in each frame according to the identification result of the human body key points.
In the embodiment of the invention, the left elbow and/or the right elbow are human body key points, so the left elbow coordinates and/or the right elbow coordinates in each frame can be determined from the recognition result of the human body key points. The left elbow coordinates and/or right elbow coordinates may be two-dimensional or three-dimensional coordinates of the left elbow and/or right elbow in each frame, and this is not particularly limited in the embodiment of the present invention.
Step 304: determining a gesture detection area in each frame according to the left elbow coordinates and/or the right elbow coordinates; the gesture detection area comprises a finger key point and a wrist key point of a left hand, and/or a finger key point and a wrist key point of a right hand.
In the embodiment of the invention, the gesture detection area in each frame, which contains the finger key points and the wrist key point of the left hand and/or of the right hand, can be determined according to the left elbow coordinates and/or right elbow coordinates among the human body key points.
In an embodiment of the present invention, the step 304 may optionally include the following steps: the left wrist coordinates of the user and/or the right wrist coordinates of the user are determined in each frame. A gesture detection area is determined from the left wrist coordinates and the left elbow coordinates, and/or a gesture detection area is determined from the right wrist coordinates and the right elbow coordinates.
The user's left wrist coordinates and/or the user's right wrist coordinates may be determined in each frame. The left wrist coordinate and/or the right wrist coordinate may be two-dimensional coordinates or the like. Since the wrist is a gesture key point, the left wrist coordinate and/or the right wrist coordinate can be determined from each frame through the recognition result of the gesture key point. In the embodiment of the present invention, this is not particularly limited.
In the embodiment of the invention, the gesture detection area may cover all gesture key points of one hand while covering as little other content as possible; or it may cover part of the gesture key points of one hand while covering as little other content as possible; or it may cover all, or part, of the gesture key points of both hands, again while covering as little other content as possible.
For example, FIG. 3 may be a gesture detection area determined from a frame.
In an embodiment of the present invention, optionally, the gesture detection area may be square. The user's left elbow coordinates may include a first abscissa and a first ordinate; the wrist key points may include the left wrist and/or the right wrist; the user's left wrist coordinates may include a third abscissa and a third ordinate. The user's right elbow coordinates may include a second abscissa and a second ordinate, and the user's right wrist coordinates may include a fourth abscissa and a fourth ordinate.
Referring to fig. 10, fig. 10 is a flowchart illustrating the steps of determining a gesture detection area according to an embodiment of the present invention. The method can comprise the following steps:
Step S1: subtracting the first abscissa from 5 times the third abscissa to obtain a first difference.
Step S2: dividing the first difference by 4 to obtain the target abscissa of the center of the gesture detection area.
Step S3: subtracting the first ordinate from 5 times the third ordinate to obtain a second difference.
Step S4: dividing the second difference by 4 to obtain the target ordinate of the center of the gesture detection area.
Step S5: subtracting the third abscissa from the first abscissa to obtain a third difference.
Step S6: dividing the third difference by 2 to obtain the side length of the gesture detection area.
Specifically, the first abscissa of the user's left elbow is subtracted from 5 times the third abscissa of the user's left wrist to obtain the first difference, and the first difference is divided by 4 to obtain the target abscissa of the center of the gesture detection area. The first ordinate of the user's left elbow is subtracted from 5 times the third ordinate of the user's left wrist to obtain the second difference, and the second difference is divided by 4 to obtain the target ordinate of the center of the gesture detection area. The third abscissa of the user's left wrist is subtracted from the first abscissa of the user's left elbow to obtain the third difference, and the third difference is divided by 2 to obtain the side length of the gesture detection area. The gesture detection area calculated from the left elbow and left wrist coordinates is mainly used to frame the detection area of the left hand; similarly, a gesture detection area calculated from the right elbow and right wrist coordinates is mainly used to frame the detection area of the right hand. A gesture detection area obtained in this way accurately frames all the gesture key points while including little other content, so its accuracy is high.
For example, in a certain frame, suppose the first coordinate of the user's left elbow is (x1, y1) and the third coordinate of the user's left wrist is (x3, y3), and the gesture detection area is square. The target abscissa of the center of the gesture detection area is (5 × x3 - x1) / 4, the target ordinate of the center of the gesture detection area is (5 × y3 - y1) / 4, and the side length of the gesture detection area is (x1 - x3) / 2.
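A sketch of the area computation for the left hand follows. It reads steps S1 to S6 as center = (5 × wrist - elbow) / 4 and side = (elbow_x - wrist_x) / 2 in pixel coordinates, which is an interpretation of the translated wording, and the abs() guard on the side length is an added assumption.

def left_hand_detection_area(elbow_xy, wrist_xy):
    # elbow_xy = (x1, y1): left elbow; wrist_xy = (x3, y3): left wrist.
    x1, y1 = elbow_xy
    x3, y3 = wrist_xy
    cx = (5 * x3 - x1) / 4      # target abscissa of the area center (steps S1-S2)
    cy = (5 * y3 - y1) / 4      # target ordinate of the area center (steps S3-S4)
    side = (x1 - x3) / 2        # side length of the square area (steps S5-S6)
    half = abs(side) / 2        # abs() is an added guard against a negative difference
    # Returned as (left, top, right, bottom) so the frame can be cropped
    # before hand key point estimation.
    return (cx - half, cy - half, cx + half, cy + half)

The right-hand area would be obtained in the same way from the right elbow and right wrist coordinates.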
Step 305: calculating user gesture key point information in a gesture detection area in each frame of the continuous two-dimensional image frames; the user gesture keypoint information includes: three-dimensional coordinates of the user gesture keypoints.
In the embodiment of the invention, only the user gesture key point information inside the gesture detection area of each of the continuous two-dimensional image frames may be calculated. Because all, or part, of the gesture key points lie inside the gesture detection area, only the key point information inside this area needs to be considered; processing of irrelevant image information in each frame is avoided, which improves both the accuracy of the obtained gesture key point information and the efficiency of the calculation.
Step 306: and judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames.
In the embodiment of the invention, a preparation gesture can be set in advance, and the preparation gesture should be clearly distinguishable from the preset dynamic control gestures and the preset static control gestures. For example, the preparation gesture may be a fist, or all five fingers fully extended.
When the user gesture key point information of at least one frame among the acquired continuous two-dimensional image frames matches the preparation gesture, the preparation gesture can be considered received; when the user gesture key point information of every frame fails to match the preparation gesture, the preparation gesture is considered not received. Receiving the preparation gesture indicates that the user's subsequent gesture key point information is intended for recognizing a control intention.
Step 307: and when the preparation gesture is received, identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
Further, in the embodiment of the present invention, the control intention of the user is recognized from the user gesture key point information of at least one frame in the continuous two-dimensional image frames only after the preparation gesture has been received, which avoids false detection of the control intention.
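The gating behaviour of steps 306 and 307 might look like the following sketch, where match_gesture, is_preparation, and recognize_intent are placeholder callables standing in for the gesture matching model, the preparation-gesture check, and the intention recognition described above.

def control_intents(frames, match_gesture, is_preparation, recognize_intent):
    armed = False
    for frame in frames:
        gesture = match_gesture(frame)
        if not armed:
            # Ignore everything until the preparation gesture (e.g. a fist
            # or five fully extended fingers) has been seen.
            armed = is_preparation(gesture)
            continue
        intent = recognize_intent(gesture, frame)
        if intent is not None:
            yield intent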
Step 308: and generating a corresponding control instruction according to the control intention.
In the embodiment of the invention, the three-dimensional coordinates of the user's gesture key points in each frame are calculated based only on the acquired continuous two-dimensional image frames, and the user's control intention is then determined based on the three-dimensional coordinates of the user's gesture key points in at least one of those frames, so the control intention is recognized from two-dimensional image frames. Two-dimensional images can be captured by an ordinary camera without an expensive depth camera, which reduces the cost of obtaining the user's intention from image information; at the same time, because the control intention is determined from the three-dimensional coordinates of the gesture key points, errors in the determined control intention caused by different shooting angles can be avoided, giving high accuracy.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred, and that the acts are not necessarily all required in accordance with the embodiments of the application.
Fig. 11 is a control instruction generating apparatus according to an embodiment of the present invention, where the apparatus 400 may include:
an image acquisition module 401 for acquiring continuous two-dimensional image frames;
a pose keypoint information calculation module 402, configured to calculate user pose keypoint information for each of the consecutive two-dimensional image frames, where the user pose keypoint information includes: three-dimensional coordinates of the user gesture keypoints;
A control intention recognition module 403, configured to recognize a control intention of the user according to user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and the control instruction generating module 404 is configured to generate a corresponding control instruction according to the control intention.
Alternatively, on the basis of fig. 11, referring to fig. 12, the control intention identifying module 403 may include:
a matching control gesture determining submodule 4031, configured to input, into a gesture matching model, user gesture key point information of each frame in the continuous two-dimensional image frames and control gesture key point information corresponding to each preset control gesture in sequence, so as to obtain a matching control gesture corresponding to at least one frame;
The first control intention recognition submodule 4032 is configured to determine a control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame.
Optionally, the matching control gesture determination submodule 4031 may include:
A matching confidence determining unit 40311, configured to input, into the gesture matching model, user gesture key point information of each frame in the continuous two-dimensional image frames and control gesture key point information corresponding to each preset control gesture in turn, to obtain a matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
And a matching control gesture determining unit 40312, configured to determine the current control gesture as a matching control gesture corresponding to the at least one frame if the matching confidence exceeds a preset threshold.
Optionally, the first control intention recognition submodule 4032 may include:
the first control intention recognition unit 40321 is configured to determine, if the matching control gesture is a static control gesture, a control intention corresponding to the matching control gesture according to a preset correspondence between the static control gesture and the control intention.
Optionally, the first control intention recognition submodule 4032 may include:
A variation detecting unit 40322, configured to detect, if the matching control gesture is a dynamic control gesture, a variation of a control amount gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame timing;
and a second control intention recognition unit 40323, configured to determine, according to the dynamic control gesture and the amount of change of the control amount gesture key point, a control intention corresponding to the matching control gesture.
Optionally, the variation detecting unit 40322 may include:
And a first variation detecting subunit 403221, configured to input the current frame and at least one frame after the current frame time sequence into a variation determining model corresponding to the dynamic control gesture, and output a variation of a control amount gesture key point in the dynamic control gesture.
Optionally, the variation detecting unit 40322 may include:
the gesture determining subunit is used for inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
And the second variation detection subunit is used for detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and a target frame with the same matching control gesture as the matching gesture of the current frame in at least one frame after the current frame time sequence.
Optionally, the gesture matching model includes: a first fully-connected network, a second fully-connected network, and a third fully-connected network; the matching confidence determining unit 40311 may include:
An input subunit, configured to input user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and input control gesture key point information corresponding to each preset control gesture into the second fully-connected network;
And the matching confidence determining subunit is used for adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the output vectors into the third fully-connected network, and outputting the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network.
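A minimal PyTorch sketch of such a three-branch matching model is given below. The layer widths, the 63-dimensional input (21 hand key points × 3 coordinates), and the sigmoid output are assumptions; the embodiment only fixes the overall structure of two input fully-connected networks whose output vectors are added and fed to a third fully-connected network that outputs the matching confidence.

import torch
import torch.nn as nn

class GestureMatchingModel(nn.Module):
    def __init__(self, in_dim=63, hidden=128):
        super().__init__()
        # First fully-connected network: user gesture key point information.
        self.user_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Second fully-connected network: control gesture key point information.
        self.ctrl_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Third fully-connected network: outputs the matching confidence.
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, user_keypoints, control_keypoints):
        # Add the two branch output vectors, then score them.
        fused = self.user_branch(user_keypoints) + self.ctrl_branch(control_keypoints)
        return self.head(fused)

The confidence produced this way would then be compared with the preset threshold by the matching control gesture determining unit 40312.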
Alternatively, on the basis of fig. 11, referring to fig. 13, when the user gesture key point information includes hand gesture key point information, the apparatus may further include:
the human body key point identification module 405 is configured to perform human body key point detection on each frame in the continuous two-dimensional image frames, so as to obtain an identification result of the human body key points;
An elbow coordinate determining module 406, configured to determine a left elbow coordinate and/or a right elbow coordinate in each frame according to the identification result of the human body key points;
A gesture detection area determining module 407, configured to determine a gesture detection area in each frame according to the left elbow coordinates and/or the right elbow coordinates; the gesture detection area comprises a finger key point and a wrist key point of a left hand, and/or a finger key point and a wrist key point of a right hand;
the gesture keypoint information calculation module 402 may include:
the gesture key point information first calculation submodule 4021 is configured to calculate user gesture key point information in a gesture detection area in each of the continuous two-dimensional image frames.
Optionally, the apparatus may further include:
the face recognition module is used for recognizing the face of each frame in the continuous two-dimensional image frames and determining authorized users in each frame in the continuous two-dimensional image frames;
the deleting module is used for deleting the gesture key points of the unauthorized user in each frame and reserving the gesture key points of the authorized user;
the gesture keypoint information calculation module 402 may include:
and the gesture key point information second calculation sub-module is used for calculating the gesture key point information of the authorized user reserved in each frame in the continuous two-dimensional image frames.
Optionally, the apparatus may further include:
a preparation gesture determining module 408, configured to determine whether a preparation gesture is received according to user gesture key point information of at least one frame of the continuous two-dimensional image frames;
The control intention recognition module 403 may include:
The second control intention recognition submodule 4033 is configured to recognize, when a preparation gesture is received, a control intention of the user based on user gesture key point information of at least one frame in the continuous two-dimensional image frames.
Optionally, on the basis of fig. 11, referring to fig. 12, the apparatus further includes:
A control instruction sending module 409, configured to send the control instruction to a controlled device; the controlled device is used for executing the control instruction.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
The embodiment of the invention also provides an electronic device, as shown in fig. 14, comprising a processor 91, a communication interface 92, a memory 93 and a communication bus 94, wherein the processor 91, the communication interface 92 and the memory 93 communicate with each other through the communication bus 94,
A memory 93 for storing a computer program;
the processor 91 is configured to execute the program stored in the memory 93, and implement the following steps:
Acquiring continuous two-dimensional image frames;
calculating user gesture keypoint information for each of the successive two-dimensional image frames, the user gesture keypoint information comprising: three-dimensional coordinates of the user gesture keypoints;
Identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and generating a corresponding control instruction according to the control intention.
The communication bus mentioned above for the terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the control instruction generating method according to any one of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the control instruction generating method of any of the above embodiments is also provided.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (11)

1. A control instruction generation method, characterized in that the method comprises:
Acquiring continuous two-dimensional image frames;
Performing face recognition on each frame in the continuous two-dimensional image frames, and determining authorized users in each frame in the continuous two-dimensional image frames;
Deleting gesture key points of unauthorized users in each frame, and reserving the gesture key points of the authorized users;
Calculating user gesture keypoint information for each of the successive two-dimensional image frames, comprising: calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames; the user gesture keypoint information includes: three-dimensional coordinates of the user gesture keypoints;
Inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model in sequence to obtain a matching control gesture corresponding to at least one frame, wherein the gesture matching model includes: a first fully-connected network, a second fully-connected network, and a third fully-connected network; inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network; after adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the result into the third fully-connected network, and outputting the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network; if the matching confidence exceeds a preset threshold, determining the current control gesture as a matching control gesture corresponding to the at least one frame;
Determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame, including: determining a control intention corresponding to the user gesture key point information of at least one frame in the continuous two-dimensional image frames based on a corresponding relation between preset user gesture key point information and the control intention of the user; the control intention of the user is an operation which the user wants to execute on the terminal;
and generating a corresponding control instruction according to the control intention.
2. The method of claim 1, wherein the determining the control intent corresponding to the user based on the matching control gesture corresponding to the at least one frame comprises:
And if the matching control gesture is a static control gesture, determining the control intention corresponding to the matching control gesture according to the preset corresponding relation between the static control gesture and the control intention.
3. The method of claim 1, wherein the determining the control intent corresponding to the user based on the matching control gesture corresponding to the at least one frame comprises:
if the matching control gesture is a dynamic control gesture, detecting the variation of a control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence;
and determining the control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the control quantity gesture key point.
4. A method according to claim 3, wherein detecting the amount of change in the control amount gesture key point in the dynamic control gesture from the current frame and at least one frame subsequent to the current frame timing comprises:
And inputting the current frame and at least one frame after the time sequence of the current frame into a variation determining model corresponding to the dynamic control gesture, and outputting the variation of the control quantity gesture key point in the dynamic control gesture.
5. A method according to claim 3, wherein detecting the amount of change in the control amount gesture key point in the dynamic control gesture from the current frame and at least one frame subsequent to the current frame timing comprises:
inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
and detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and a target frame with the same matching control gesture as the matching gesture of the current frame in at least one frame after the current frame time sequence.
6. The method of claim 1, wherein the user gesture key point information comprises hand gesture key point information, and before calculating the user gesture keypoint information for each of the successive two-dimensional image frames, the method further comprises:
detecting human body key points of each frame in the continuous two-dimensional image frames to obtain a recognition result of the human body key points;
Determining left elbow coordinates and/or right elbow coordinates in each frame according to the identification result of the human body key points;
Determining a gesture detection area in each frame according to the left elbow coordinates and/or the right elbow coordinates; the gesture detection area comprises a finger key point and a wrist key point of a left hand, and/or a finger key point and a wrist key point of a right hand;
the calculating the user gesture key point information of each frame in the continuous two-dimensional image frames comprises:
user gesture keypoint information in a gesture detection region in each of the successive two-dimensional image frames is calculated.
7. The method of claim 1, wherein the identifying the control intent of the user based on user gesture keypoint information of at least one of the successive two-dimensional image frames further comprises:
Judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames;
the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames comprises the following steps:
And when the preparation gesture is received, identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
8. The method as recited in claim 1, further comprising:
sending the control instruction to a controlled device; wherein the controlled device is configured to execute the control instruction.
9. A control instruction generating apparatus, characterized in that the apparatus comprises:
The image acquisition module is used for acquiring continuous two-dimensional image frames;
a gesture key point information calculation module, configured to calculate user gesture key point information of each frame in the continuous two-dimensional image frames, including: calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames; the user gesture keypoint information includes: three-dimensional coordinates of the user gesture keypoints;
The control intention recognition module is configured to input, into a gesture matching model, user gesture key point information of each frame in the continuous two-dimensional image frames and control gesture key point information corresponding to each preset control gesture in sequence, to obtain a matching control gesture corresponding to at least one frame, wherein the gesture matching model includes: a first fully-connected network, a second fully-connected network, and a third fully-connected network; inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network; after adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the result into the third fully-connected network, and outputting the matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture through the third fully-connected network; if the matching confidence exceeds a preset threshold, determining the current control gesture as a matching control gesture corresponding to the at least one frame; determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame, including: determining a control intention corresponding to the user gesture key point information of at least one frame in the continuous two-dimensional image frames based on a corresponding relation between preset user gesture key point information and the control intention of the user; the control intention of the user is an operation which the user wants to execute on the terminal;
The control instruction generation module is used for generating a corresponding control instruction according to the control intention;
the face recognition module is used for recognizing the face of each frame in the continuous two-dimensional image frames and determining authorized users in each frame in the continuous two-dimensional image frames;
and the deleting module is used for deleting the gesture key points of the unauthorized user in each frame and reserving the gesture key points of the authorized user.
10. The electronic equipment is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
A memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-8 when executing a program stored on a memory.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-8.
CN201911329945.7A 2019-12-20 2019-12-20 Control instruction generation method and device Active CN111103981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911329945.7A CN111103981B (en) 2019-12-20 2019-12-20 Control instruction generation method and device

Publications (2)

Publication Number Publication Date
CN111103981A CN111103981A (en) 2020-05-05
CN111103981B true CN111103981B (en) 2024-06-11

Family

ID=70422774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911329945.7A Active CN111103981B (en) 2019-12-20 2019-12-20 Control instruction generation method and device

Country Status (1)

Country Link
CN (1) CN111103981B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947755A (en) * 2021-02-24 2021-06-11 Oppo广东移动通信有限公司 Gesture control method and device, electronic equipment and storage medium
CN113342170A (en) * 2021-06-11 2021-09-03 北京字节跳动网络技术有限公司 Gesture control method, device, terminal and storage medium
CN116935495B (en) * 2023-09-18 2024-01-05 深圳中宝新材科技有限公司 Intelligent key alloy wire cutting process user gesture detection method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194361A (en) * 2017-05-27 2017-09-22 成都通甲优博科技有限责任公司 Two-dimentional pose detection method and device
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN110147767A (en) * 2019-05-22 2019-08-20 深圳市凌云视迅科技有限责任公司 Three-dimension gesture attitude prediction method based on two dimensional image
CN110443154A (en) * 2019-07-15 2019-11-12 北京达佳互联信息技术有限公司 Three-dimensional coordinate localization method, device, electronic equipment and the storage medium of key point

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805017B2 (en) * 2012-12-13 2014-08-12 Intel Corporation Gesture pre-processing of video stream to reduce platform power

Also Published As

Publication number Publication date
CN111103981A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN111103981B (en) Control instruction generation method and device
EP3629129A1 (en) Method and apparatus of interactive display based on gesture recognition
WO2022166243A1 (en) Method, apparatus and system for detecting and identifying pinching gesture
CN104914989B (en) The control method of gesture recognition device and gesture recognition device
Tai et al. Sensor-based continuous hand gesture recognition by long short-term memory
JP2020067999A (en) Method of virtual user interface interaction based on gesture recognition and related device
CN111414837A (en) Gesture recognition method and device, computer equipment and storage medium
CN110837792A (en) Three-dimensional gesture recognition method and device
Rani et al. Hand gesture control of virtual object in augmented reality
WO2023273372A1 (en) Gesture recognition object determination method and apparatus
KR20230080938A (en) Method and apparatus of gesture recognition and classification using convolutional block attention module
CN116766213B (en) Bionic hand control method, system and equipment based on image processing
CN113282164A (en) Processing method and device
CN115565241A (en) Gesture recognition object determination method and device
CN113961067A (en) Non-contact graffiti drawing method and recognition interaction system based on deep learning
CN117389414A (en) Mapping parameter adjustment method, electronic device, and readable storage medium
CN114945949A (en) Avatar display device, avatar display system, avatar display method, and avatar display program
US20210326657A1 (en) Image recognition method and device thereof and ai model training method and device thereof
CN116543452A (en) Gesture recognition and gesture interaction method and device
US10114469B2 (en) Input method touch device using the input method, gesture detecting device, computer-readable recording medium, and computer program product
Siam et al. Human computer interaction using marker based hand gesture recognition
JP6209067B2 (en) Image recognition apparatus and image recognition method
TWI815593B (en) Method and system for detecting hand gesture, and computer readable storage medium
US20230168746A1 (en) User interface method system
CN111428665B (en) Information determination method, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant