CN117746489A - Gesture detection method, gesture detection device, electronic equipment and storage medium - Google Patents

Gesture detection method, gesture detection device, electronic equipment and storage medium

Info

Publication number
CN117746489A
Authority
CN
China
Prior art keywords
target object
image frame
target
model
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410108942.5A
Other languages
Chinese (zh)
Inventor
张程
赵玲玲
李中英
张伟
赵兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Lianbao Information Technology Co Ltd
Original Assignee
Hefei Lianbao Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Lianbao Information Technology Co Ltd filed Critical Hefei Lianbao Information Technology Co Ltd
Priority to CN202410108942.5A priority Critical patent/CN117746489A/en
Publication of CN117746489A publication Critical patent/CN117746489A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a gesture detection method, a gesture detection device, an electronic device and a storage medium. The gesture detection method is applied to an electronic device comprising a neural network processing unit (NPU) and includes the following steps: acquiring an image frame of a target object; inputting the image frame into a target detection model deployed in the NPU to obtain the position of a reference part of the target object in the image frame, wherein the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model; obtaining a reference parameter of the reference part based on the position of the reference part in the image frame, the reference parameter being used to characterize the gesture of the target object; and determining a gesture detection result of the target object based on the reference parameter of the reference part. Real-time detection of the pose of the target object can thereby be achieved.

Description

Gesture detection method, gesture detection device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a gesture detection method, a gesture detection device, an electronic device, and a storage medium.
Background
When a user studies or works with an electronic device such as a notebook computer for a long time, fatigue often leads the user to sit too close to the screen or to hold the head in an incorrect posture. Maintaining an incorrect posture over a long period can harm the user's eyesight, cervical spine, spine and other aspects of health. To avoid the health problems that prolonged posture irregularities may cause, it is desirable to detect the user's posture in real time. How to detect the user's posture in real time is therefore a technical problem to be solved.
Disclosure of Invention
The application provides a gesture detection method, a gesture detection device, electronic equipment and a storage medium, so as to at least solve the technical problems in the prior art.
According to a first aspect of the present application, there is provided a gesture detection method applied to an electronic device including a neural network processing unit NPU, the method comprising:
acquiring an image frame of a target object;
inputting an image frame of the target object to a target detection model deployed in an NPU to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model;
obtaining a reference parameter of a reference part of the target object based on the position of the reference part in the image frame; the reference parameters are used for representing the gesture of the target object;
and determining a gesture detection result of the target object based on the reference parameters of the reference part.
In the above scheme, the number of the reference parts of the target object is two; the reference parameters comprise the distance between two reference parts in the image frame and the inclination angle of the connecting line of the two reference parts relative to the reference line; the determining a gesture detection result of the target object based on the reference parameter of the reference part comprises the following steps:
Judging whether the distance between two reference parts in an image frame is greater than a distance preset threshold value or not, and judging whether the inclination angle of a connecting line of the two reference parts relative to a reference line is greater than an angle preset threshold value or not;
based on the determination result, a gesture detection result for the target object is determined.
In the above aspect, the determining, based on the determination result, a gesture detection result of the target object includes:
when the judging result indicates that the distance is larger than a distance preset threshold value and/or the inclination angle is larger than an angle preset threshold value, generating a detection result for indicating that the gesture of the target object is abnormal;
and when the distance is smaller than or equal to a distance preset threshold value and the inclination angle is smaller than or equal to an angle preset threshold value, generating a detection result used for representing that the posture of the target object is normal.
In the above scheme, the original model is composed of three networks, each network comprises a convolutional neural network CNN and a downsampling filter; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model, and comprises the following steps:
deleting part of the CNNs of some of the three networks of the original model, replacing the CNNs of the remaining networks in the three networks of the original model with depth-separable CNNs, and/or deleting part of the downsampling filters of each of the three networks of the original model, to obtain a detection model to be trained;
And training the detection model to be trained based on a preset training data set to obtain a target detection model.
In the above aspect, before the inputting the image frame of the target object into the target detection model deployed in the NPU, the method further includes:
converting the format of the target detection model into a target format supported by the NPU to obtain a target detection model after format conversion;
and deploying the target detection model after format conversion in the NPU so that the target detection model can detect the position of the reference part of the target object in the image frame.
In the above scheme, the method further comprises:
when the detection result indicates that the posture of the target object is abnormal, judging whether a first duration, for which the distance between the two reference parts of the target object remains greater than the distance preset threshold, is greater than a time preset threshold;
when the first duration is greater than the time preset threshold, generating first abnormality prompt information;
and/or judging whether a second duration, for which the inclination angle of the connecting line of the two reference parts of the target object relative to the reference line remains greater than the angle preset threshold, is greater than the time preset threshold;
and when the second duration is greater than the time preset threshold, generating second abnormality prompt information.
According to a second aspect of the present application, there is provided a gesture detection apparatus applied to an electronic device including a neural network processing unit NPU, the apparatus comprising:
a first acquisition unit configured to acquire an image frame for a target object;
the second acquisition unit is used for inputting the image frame of the target object into a target detection model deployed in the NPU to obtain the position of the reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model;
a third acquisition unit configured to obtain a reference parameter of a reference portion of the target object based on a position of the reference portion in the image frame; the reference parameters are used for representing the gesture of the target object;
and a first determining unit for determining a posture detection result of the target object based on the reference parameter of the reference part.
In the above scheme, the original model is composed of three networks, each network comprises a convolutional neural network CNN and a downsampling filter; the second obtaining unit is configured to delete a part of CNNs of a part of networks in the three networks of the original model, replace CNNs of the remaining networks in the three networks of the original model with depth separable CNNs, and/or delete a part of downsampling filters of each network in the three networks of the original model to obtain a to-be-trained detection model; and training the detection model to be trained based on a preset training data set to obtain a target detection model.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described herein.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method described herein.
In the method, an image frame of a target object is acquired and input to a target detection model deployed in the NPU to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model; a reference parameter of the reference part, used to characterize the gesture of the target object, is obtained based on the position of the reference part in the image frame; and a posture detection result of the target object is determined based on the reference parameter of the reference part. The position of the reference part is detected by the target detection model in combination with the NPU, the reference parameter characterizing the gesture is determined from the detected position, and the posture detection result is determined from the reference parameter. Real-time detection of the pose of the target object can thereby be achieved.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 shows a schematic flow chart of an implementation of a gesture detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a target object according to an embodiment of the present application in normal and abnormal postures;
FIG. 3 shows a flow block diagram of a gesture detection method of an embodiment of the present application;
fig. 4 is a schematic diagram showing the constitution of a posture detecting device according to an embodiment of the present application;
fig. 5 shows a schematic diagram of a composition structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In view of the above problems, an embodiment of the present application provides a gesture detection method. An image frame of a target object is acquired and input to a target detection model deployed in an NPU (Neural Processing Unit) to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model. A reference parameter of the reference part, used to characterize the gesture of the target object, is obtained based on the position of the reference part in the image frame, and a gesture detection result of the target object is determined based on the reference parameter. Efficient detection of the pose of the target object can thereby be achieved.
The gesture detection method according to the embodiment of the present application is described in detail below.
The gesture detection method is applied to electronic equipment, and the electronic equipment comprises an NPU. As shown in fig. 1, the method includes:
s101: an image frame for a target object is acquired.
In the present application, the electronic device is a device that includes an NPU, such as a smart notebook computer or a smart desktop computer. The target object is a user who sits in front of the electronic device for entertainment, study or work. In this step, an image frame of the target object is acquired through the front camera of the electronic device.
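By way of illustration only, a minimal sketch of this acquisition step is given below; OpenCV as the capture library and camera index 0 for the front camera are assumptions of the sketch, not requirements of the method.

```python
import cv2  # assumption: OpenCV is used to read frames from the front camera

cap = cv2.VideoCapture(0)   # index 0 is assumed to address the front camera
ok, frame = cap.read()      # one BGR image frame of the target object
cap.release()
if not ok:
    raise RuntimeError("failed to acquire an image frame")
```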
S102: inputting an image frame of the target object to a target detection model deployed in an NPU to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model.
In this step, the target detection model is a model for identifying the position of the reference part of the target object in the image frame. The target detection model is deployed in the NPU and is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model. In the present application, the reference part of the target object comprises the two eyes of the target object, and the position of the reference part in the image frame comprises the position coordinates of the two eyes in the image frame. The original model is a deep learning model for face recognition; the detection model to be trained is obtained by structurally simplifying it (the specific simplification process is described in detail below), and the target detection model is obtained by training the detection model to be trained.
A deep learning model is combined with the NPU to detect the face of the target object because, when the structure of the deep learning model is complex, a forward inference algorithm running on a CPU (Central Processing Unit) cannot meet the real-time requirement. The NPU is a dedicated neural network processing unit: it processes neural networks more efficiently, supports real-time detection, and better brings out the capability of a deep learning model. Compared with the NPU, a GPU (Graphics Processing Unit) consumes more energy and costs more, and when other processes in the electronic device occupy the GPU, it is difficult for the GPU to reallocate resources for face detection, so the overall efficiency is low. The NPU has the advantages of low power consumption, low cost and high processing speed. Therefore, combining the deep learning model with the NPU to detect the face of the target object can make full use of the computing resources of the electronic device, reduce energy consumption, realize real-time detection, and improve detection efficiency.
S103: obtaining a reference parameter of a reference part of the target object based on the position of the reference part in the image frame; the reference parameters are used to characterize the pose of the target object.
In this step, the reference parameters of the reference part include the binocular distance of the target object and the inclination angle of the binocular connecting line relative to a reference line. After the position of the reference part of the target object in the image frame, that is, the position coordinates of the user's two eyes in the image frame, is obtained in step S102, the distance between the user's eyes and the inclination angle of the binocular connecting line relative to the reference line can be computed from those coordinates. The binocular distance can characterize the distance between the user (the target object) and the screen of the electronic device: if the size of the image frames acquired by the camera is constant, then when the user is closer to the screen, the binocular distance in the acquired image frame is typically longer, and when the user is farther from the screen, the binocular distance is typically shorter. By this relation, a long binocular distance indicates that the user is close to the screen, and a short binocular distance indicates that the user is far from the screen.
The inclination angle of the user's binocular connecting line relative to the reference line can characterize the degree of head tilt. In practice, with a correct head posture the user faces the screen squarely and the binocular connecting line is parallel to the reference line (the inclination angle is 0). When the user's head leans to the left or right, the binocular connecting line forms an inclination angle with the reference line: the greater the head tilt, the larger the resulting angle; the smaller the head tilt, the smaller the angle (with a minimum of 0). Accordingly, a smaller inclination angle means a smaller head tilt and a more correct posture, while a larger inclination angle means a larger head tilt and a more irregular posture.
S104: and determining a gesture detection result of the target object based on the reference parameters of the reference part.
In this step, after the reference parameters of the reference part are obtained, that is, the distance between the user's eyes and the inclination angle of the binocular connecting line relative to the reference line, the posture detection result of the target object, i.e., whether its posture is normal or abnormal, can be obtained by comparing the two reference parameters with their preset thresholds (a distance preset threshold and an angle preset threshold).
In the scheme shown in steps S101 to S104, an image frame of a target object is acquired and input to a target detection model deployed in the NPU to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model; a reference parameter of the reference part, used to characterize the gesture of the target object, is obtained based on that position; and a posture detection result of the target object is determined based on the reference parameter. Detecting the position of the reference part with the target detection model in combination with the NPU makes full use of the computing resources of the electronic device, reduces energy consumption, enables real-time detection, and improves detection efficiency. Moreover, since the reference parameter characterizing the posture is derived from the detected position of the reference part, the method does not need to detect the complete posture of the target object as a whole: the posture detection result can be determined from the reference parameter of the reference part alone. The scheme is simple, easy to implement, and enables efficient detection of the posture of the target object.
In an alternative scheme, the number of the reference parts of the target object is two; the reference parameters comprise the distance between two reference parts in the image frame and the inclination angle of the connecting line of the two reference parts relative to the reference line; the determining a gesture detection result of the target object based on the reference parameter of the reference part comprises the following steps:
judging whether the distance between two reference parts in an image frame is greater than a distance preset threshold value or not, and judging whether the inclination angle of a connecting line of the two reference parts relative to a reference line is greater than an angle preset threshold value or not;
based on the determination result, a gesture detection result for the target object is determined.
In the present application, fig. 2 is a schematic diagram of the target object in a normal posture (left side of fig. 2) and an abnormal posture (right side of fig. 2). E1 and E2 are respectively the left eye and the right eye of the target object in the normal posture; E3 and E4 are respectively the left eye and the right eye in the abnormal posture. The distance between the two reference parts of the target object is the length of the connecting line between E1 and E2 (or E3 and E4), and the inclination angle of the connecting line of the two reference parts relative to the reference line is the inclination angle of the line between E1 and E2 (or E3 and E4) relative to the reference line (a horizontal line). The distance is calculated by formula (1), and the inclination angle by formula (2):

D = √((x₁ − x₂)² + (y₁ − y₂)²)  Formula (1)

θ = tan⁻¹((y₁ − y₂)/(x₁ − x₂))  Formula (2)

where D is the distance between the two reference parts E1 and E2 (or E3 and E4) of the target object; θ is the inclination angle of the connecting line of the two reference parts relative to the reference line; x₁ and x₂ are the abscissas, and y₁ and y₂ the ordinates, of the positions of the two reference parts in the image frame; and tan⁻¹ denotes the arctangent function.
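By way of illustration, a minimal sketch of computing the two reference parameters according to formulas (1) and (2) is given below; Python and the eye ordering (left eye first) are assumptions of the sketch.

```python
import math

def reference_parameters(left_eye, right_eye):
    """Return D per formula (1) and theta in degrees per formula (2).

    left_eye and right_eye are (x, y) position coordinates in the image frame.
    """
    x1, y1 = left_eye
    x2, y2 = right_eye
    d = math.hypot(x1 - x2, y1 - y2)  # formula (1): binocular distance
    # formula (2): for a left/right eye pair with x2 > x1, atan2 equals
    # tan^-1((y1 - y2)/(x1 - x2)) and stays defined if the eyes align vertically
    theta = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return d, theta
```

With a horizontal reference line, theta is 0 for a level head and grows in magnitude as the head tilts.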
It is judged whether the distance between the two reference parts of the target object is greater than the distance preset threshold, and whether the inclination angle of the connecting line of the two reference parts of the target object relative to the reference line is greater than the angle preset threshold. The angle preset threshold and the distance preset threshold are thresholds obtained in advance from a user gesture database. Based on the judgment result, a posture detection result for the target object is determined; the specific judgment process is described in the relevant parts below and is not repeated here.
In the method, the posture detection result for the target object is determined simply by judging whether the distance between the two reference parts is greater than the distance preset threshold and whether the inclination angle of the connecting line of the two reference parts relative to the reference line is greater than the angle preset threshold. The scheme is simple, easy to implement, consumes little time, and enables efficient detection of the posture of the target object.
In an optional aspect, the determining, based on the determination result, a gesture detection result of the target object includes:
when the judging result indicates that the distance is larger than a distance preset threshold value and/or the inclination angle is larger than an angle preset threshold value, generating a detection result for indicating that the gesture of the target object is abnormal;
and when the distance is smaller than or equal to a distance preset threshold value and the inclination angle is smaller than or equal to an angle preset threshold value, generating a detection result used for representing that the posture of the target object is normal.
In this application, as shown in fig. 2, when the distance between E1 and E2 (or E3 and E4) is greater than the distance preset threshold, the target object is too close to the screen of the electronic device, which is why the distance between the two reference parts in the acquired image frame exceeds the threshold; a detection result indicating that the posture of the target object is abnormal is then generated. And/or, when the inclination angle of the connecting line between E1 and E2 (or E3 and E4) relative to the reference line (the horizontal line) is greater than the angle preset threshold, the head tilt of the target object is too large and the posture is not upright; a detection result indicating that the posture of the target object is abnormal is likewise generated.
Conversely, when the distance between E1 and E2 (or E3 and E4) is smaller than or equal to the distance preset threshold, and the inclination angle of the connecting line between E1 and E2 (or E3 and E4) relative to the reference line (the horizontal line) is smaller than or equal to the angle preset threshold, the target object is at a normal distance from the screen of the electronic device and its head tilt is within the normal range, i.e., the posture is correct. A detection result indicating that the posture of the target object is normal is then generated.
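By way of illustration, the judgment described above can be sketched as follows; the threshold values are placeholders, since the actual distance and angle preset thresholds come from the user gesture database.

```python
D_PRESET = 120.0      # distance preset threshold in pixels (placeholder value)
ANGLE_PRESET = 10.0   # angle preset threshold in degrees (placeholder value)

def posture_result(d, theta):
    """Abnormal if either reference parameter exceeds its preset threshold."""
    if d > D_PRESET or abs(theta) > ANGLE_PRESET:
        return "abnormal"
    return "normal"
```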
In the method, whether the posture of the target object is normal or abnormal is determined by comparing the distance between the two reference parts with the distance preset threshold and comparing the inclination angle of their connecting line relative to the reference line with the angle preset threshold. This scheme is simple, easy to implement, consumes little time, and enables efficient detection of the posture of the target object.
In an alternative, the original model is made up of three networks, each comprising a convolutional neural network CNN and a downsampling filter; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model, and comprises the following steps:
deleting part of the CNNs of some of the three networks of the original model, replacing the CNNs of the remaining networks in the three networks of the original model with depth-separable CNNs, and/or deleting part of the downsampling filters of each of the three networks of the original model, to obtain a detection model to be trained;
and training the detection model to be trained based on a preset training data set to obtain a target detection model.
In the application, the original model may be an MTCNN (Multi-Task Convolutional Neural Network) model; the MTCNN algorithm is a face detection algorithm with both high accuracy and high speed. The algorithm consists of three networks, namely P-Net (P network), R-Net (R network) and O-Net (O network). Each network comprises several CNNs (convolutional neural networks) and downsampling filters (filters for short). The specific process of obtaining the detection model to be trained by structurally simplifying the original model is shown in Table 1:
TABLE 1
As can be seen from Table 1, the detection model to be trained is obtained by deleting part of the CNNs of some of the three networks of the original model and replacing the remaining CNNs of each network with depth-separable CNNs (e.g., deleting the CNN of convolution layer 3 of R-Net, and replacing with depth-separable CNNs the CNNs of convolution layers 1-3, the convolution classification layer and the candidate-box regression layer of the P network, of convolution layers 1-2 of the R network, and of convolution layers 1-4 of the O network), and/or by deleting part of the downsampling filters of each of the three networks (e.g., partially deleting the filters of convolution layers 1-3 in the P network, of convolution layers 1-2, the fully connected layer and convolution layer 3 in the R network, and of convolution layers 1-4 and the fully connected layer in the O network).
It can be appreciated that the original model is structurally simplified because it has too many parameters; deploying it directly in the NPU easily causes memory overflow. The detection model to be trained, obtained by structurally simplifying the original model as shown in Table 1, has a simpler structure, a greatly reduced parameter count, and higher forward inference efficiency, and thus meets the face detection requirements of low-power electronic devices. Although the structurally simplified model loses a small amount of inference accuracy, it gains greatly in inference speed, and for the gesture detection scenario of the embodiments of the application this loss of accuracy does not affect the final gesture detection result.
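As an illustration of the layer replacement described above, a depthwise-separable block that can stand in for a standard convolution is sketched below; PyTorch is assumed, as the application does not prescribe a framework.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Lower-parameter stand-in for nn.Conv2d(in_ch, out_ch, kernel_size)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=0):
        super().__init__()
        # depthwise step: one spatial filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   stride=stride, padding=padding, groups=in_ch)
        # pointwise step: a 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```

The parameter count drops from roughly in_ch x out_ch x k^2 for a standard convolution to in_ch x k^2 + in_ch x out_ch, which is where the reduction in model size comes from.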
After the detection model to be trained is obtained, it is trained with a preset training data set (which may be a public data set in the field) to obtain the target detection model. The training process is a conventional model training process, sketched below; see the related art for details, which are not repeated here.
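A minimal sketch of such a conventional training loop, assuming PyTorch, an already-built data loader, and a placeholder loss (all names are illustrative):

```python
import torch

def train(model, loader, epochs=10, lr=1e-3):
    """Conventional supervised training of the detection model to be trained."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()  # placeholder; MTCNN-style training combines several losses
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
    return model
```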
According to the method and the device, the to-be-trained detection model is obtained after the original model is structurally simplified, so that the problem of memory overflow caused by directly deploying the original model in the NPU can be effectively avoided, the to-be-trained detection model obtained after the structure is simplified is simpler in structure, the parameter quantity is greatly reduced, the forward reasoning execution efficiency is higher, and the high-efficiency detection of the gesture of the target object can be realized.
In an alternative aspect, before the inputting the image frame of the target object into the target detection model deployed in the NPU, the method further includes:
converting the format of the target detection model into a target format supported by the NPU to obtain a target detection model after format conversion;
and deploying the target detection model after format conversion in the NPU so that the target detection model can detect the position of the reference part of the target object in the image frame.
As shown in fig. 3, after an image frame of the target object is acquired by the front camera of the electronic device in the face acquisition module, the image frame is transferred through a CMOS (Complementary Metal Oxide Semiconductor) image sensor, which inputs it into the target detection model in the reference parameter calculation module. The format of the target detection model is converted into a target format supported by the NPU, such as the OpenVINO format, and the format-converted model is deployed in the NPU to obtain the position of the reference part of the target object in the image frame. The reference parameters are then obtained from that position and compared in the gesture detection module with the preset thresholds (the distance preset threshold and the angle preset threshold) to determine whether the posture of the target object is abnormal (see the foregoing description for the specific determination process, which is not repeated). When the abnormal posture lasts for a certain time (exceeding a time preset threshold), abnormality prompt information (the first and second abnormality prompt information described below) is generated.
In the method, after the format of the target detection model is converted into the target format supported by the NPU, the format-converted model is deployed in the NPU, so that the NPU can cooperate with the target detection model to detect the position of the reference part of the target object in the image frame. This greatly improves the inference capability of the target detection model and thus enables efficient detection of the gesture of the target object.
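By way of illustration, loading a converted model and compiling it for the NPU might look as follows; the OpenVINO Runtime Python API and the file name are assumptions of the sketch.

```python
from openvino import Core  # openvino.runtime.Core in older OpenVINO releases

core = Core()
model = core.read_model("target_detection_model.xml")    # converted IR file (illustrative name)
compiled = core.compile_model(model, device_name="NPU")  # assumes an NPU plugin is available

# compiled([image_tensor]) would then return the reference-part positions
```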
In an alternative, the method further comprises:
when the detection result indicates that the posture of the target object is abnormal, judging whether a first duration, for which the distance between the two reference parts of the target object remains greater than the distance preset threshold, is greater than a time preset threshold; and when the first duration is greater than the time preset threshold, generating first abnormality prompt information.
In the application, when the posture of the target object is abnormal and the duration (the first duration) for which the distance between the two reference parts exceeds the distance preset threshold is greater than the time preset threshold (usually 30 seconds to 1 minute), the target object has been too close to the screen of the electronic device for too long, and first abnormality prompt information such as "You are too close to the screen, please move back to a normal distance" is generated.
In an alternative, the method further comprises:
judging whether a second duration, for which the inclination angle of the connecting line of the two reference parts of the target object relative to the reference line remains greater than the angle preset threshold, is greater than the time preset threshold; and when the second duration is greater than the time preset threshold, generating second abnormality prompt information.
When the posture of the target object is abnormal and the duration (the second duration) for which the inclination angle of the connecting line of the two reference parts relative to the reference line exceeds the angle preset threshold is greater than the time preset threshold, the head of the target object has been tilted too far for too long, and second abnormality prompt information such as "Your head is tilted too far, please straighten your head posture" is generated.
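By way of illustration, the duration judgment for either reference parameter can be sketched as follows; the time preset threshold value and the per-frame update loop are assumptions of the sketch.

```python
import time

TIME_PRESET = 30.0  # time preset threshold in seconds (placeholder value)

class DurationMonitor:
    """Tracks how long a condition has held continuously."""
    def __init__(self):
        self._since = None

    def update(self, condition_met: bool) -> bool:
        """Return True once the condition has held longer than TIME_PRESET."""
        if not condition_met:
            self._since = None
            return False
        if self._since is None:
            self._since = time.monotonic()
        return time.monotonic() - self._since > TIME_PRESET
```

One monitor per reference parameter yields the first and second abnormality prompt information independently.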
In the method, when the posture of the target object is abnormal, abnormality prompt information (the first abnormality prompt information and/or the second abnormality prompt information) is generated, so that the abnormal posture of the target object can be corrected in time, avoiding the harm to the target object's eyesight, spine and other aspects of health caused by prolonged posture abnormality.
The embodiment of the application provides a gesture detection device, which is applied to electronic equipment, wherein the electronic equipment comprises a neural network processing unit NPU, as shown in fig. 4, and the device comprises:
a first acquisition unit 401 for acquiring an image frame for a target object;
a second obtaining unit 402, configured to input an image frame of the target object to a target detection model deployed in an NPU, and obtain a position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model;
a third obtaining unit 403, configured to obtain a reference parameter of a reference part of the target object based on a position of the reference part in the image frame; the reference parameters are used for representing the gesture of the target object;
A first determining unit 404 for determining a gesture detection result of the target object based on the reference parameter of the reference part.
In an alternative, the original model is made up of three networks, each comprising a convolutional neural network CNN and a downsampling filter; the second obtaining unit 402 is configured to delete a part of CNNs of a part of networks in the three networks of the original model, replace CNNs of the remaining networks in the three networks of the original model with depth separable CNNs, and/or delete a part of downsampling filters of each network in the three networks of the original model to obtain a to-be-trained detection model; and training the detection model to be trained based on a preset training data set to obtain a target detection model.
In an alternative scheme, the number of the reference parts of the target object is two; the reference parameters comprise the distance between two reference parts in the image frame and the inclination angle of the connecting line of the two reference parts relative to the reference line; the first determining unit 404 is configured to determine whether a distance between two reference parts in an image frame is greater than a distance preset threshold, and determine whether an inclination angle of a connecting line of the two reference parts with respect to a reference line is greater than an angle preset threshold; based on the determination result, a gesture detection result for the target object is determined.
In an alternative solution, the first determining unit 404 is configured to generate a detection result for characterizing that the posture of the target object is abnormal when the determination result characterizes that the distance is greater than a distance preset threshold and/or the inclination angle is greater than an angle preset threshold; and when the distance is smaller than or equal to a distance preset threshold value and the inclination angle is smaller than or equal to an angle preset threshold value, generating a detection result used for representing that the posture of the target object is normal.
In an alternative, the apparatus further comprises:
the conversion unit is used for converting the format of the target detection model into a target format supported by the NPU to obtain a target detection model after format conversion; and deploying the target detection model after format conversion in the NPU so that the target detection model can detect the position of the reference part of the target object in the image frame.
In an alternative, the method further comprises:
the prompting unit is used for: when the detection result indicates that the posture of the target object is abnormal, judging whether a first duration, for which the distance between the two reference parts of the target object remains greater than the distance preset threshold, is greater than a time preset threshold; when the first duration is greater than the time preset threshold, generating first abnormality prompt information; and/or judging whether a second duration, for which the inclination angle of the connecting line of the two reference parts relative to the reference line remains greater than the angle preset threshold, is greater than the time preset threshold; and when the second duration is greater than the time preset threshold, generating second abnormality prompt information.
It should be noted that, in the gesture detection device according to the embodiment of the present application, since the principle of solving the problem of the gesture detection device is similar to that of the gesture detection method described above, the implementation process, implementation principle and beneficial effects of the gesture detection device can be referred to the description of the implementation process, implementation principle and beneficial effects of the method described above, and the repetition is omitted.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 shows a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as a gesture detection method. For example, in some embodiments, the gesture detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the gesture detection method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the gesture detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A gesture detection method, wherein the method is applied to an electronic device, the electronic device comprising a neural network processing unit NPU, the method comprising:
acquiring an image frame of a target object;
inputting an image frame of the target object to a target detection model deployed in an NPU to obtain the position of a reference part of the target object in the image frame; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model;
obtaining a reference parameter of a reference part of the target object based on the position of the reference part in the image frame; the reference parameters are used for representing the gesture of the target object;
and determining a gesture detection result of the target object based on the reference parameters of the reference part.
2. The method of claim 1, wherein the number of reference sites of the target object is two; the reference parameters comprise the distance between two reference parts in the image frame and the inclination angle of the connecting line of the two reference parts relative to the reference line; the determining a gesture detection result of the target object based on the reference parameter of the reference part comprises the following steps:
Judging whether the distance between two reference parts in an image frame is greater than a distance preset threshold value or not, and judging whether the inclination angle of a connecting line of the two reference parts relative to a reference line is greater than an angle preset threshold value or not;
based on the determination result, a gesture detection result for the target object is determined.
3. The method according to claim 2, wherein determining the gesture detection result for the target object based on the determination result includes:
when the judging result indicates that the distance is larger than a distance preset threshold value and/or the inclination angle is larger than an angle preset threshold value, generating a detection result for indicating that the gesture of the target object is abnormal;
and when the distance is smaller than or equal to a distance preset threshold value and the inclination angle is smaller than or equal to an angle preset threshold value, generating a detection result used for representing that the posture of the target object is normal.
4. The method according to claim 1, characterized in that the raw model consists of three networks, each comprising a convolutional neural network CNN and a downsampling filter; the target detection model is obtained by training a detection model to be trained, which is obtained by simplifying the structure of an original model, and comprises the following steps:
Deleting part of the CNNs of some of the three networks of the original model, replacing the CNNs of the remaining networks in the three networks of the original model with depth-separable CNNs, and/or deleting part of the downsampling filters of each of the three networks of the original model, to obtain a detection model to be trained;
and training the detection model to be trained based on a preset training data set to obtain a target detection model.
5. The method according to claim 1, wherein before the inputting the image frame of the target object into the target detection model deployed in the NPU, the method further comprises:
converting the format of the target detection model into a target format supported by the NPU to obtain a format-converted target detection model; and
deploying the format-converted target detection model in the NPU, so that the target detection model can detect the position of the reference part of the target object in the image frame.
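
The patent does not name the target format in claim 5; as an assumption, many NPU toolchains accept ONNX as an exchange format, so a first conversion step might look like the following sketch (the model and input shape are placeholders):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # stand-in model
model.eval()
dummy = torch.randn(1, 3, 224, 224)  # illustrative input shape
torch.onnx.export(model, dummy, "target_detection.onnx", opset_version=12)
# A vendor-specific compiler would then convert the ONNX file into the
# format the NPU actually executes.
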
6. The method according to claim 3, further comprising:
when the detection result indicates that the gesture of the target object is abnormal, judging whether a first duration, for which the distance between the two reference parts of the target object is greater than the preset distance threshold, is greater than a preset time threshold, and generating first anomaly prompt information when the first duration is greater than the preset time threshold;
and/or judging whether a second duration, for which the inclination angle of the line connecting the two reference parts of the target object relative to the reference line is greater than the preset angle threshold, is greater than the preset time threshold, and generating second anomaly prompt information when the second duration is greater than the preset time threshold.
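
Claim 6's duration check can be read as "prompt only after the abnormal condition has held continuously for longer than the time threshold"; a small sketch under that reading, with illustrative thresholds:

import time

class DurationAlarm:
    # Tracks how long a condition has held continuously and fires once the
    # preset time threshold is exceeded (claim 6's first/second duration).
    def __init__(self, time_threshold_s):
        self.time_threshold_s = time_threshold_s
        self.since = None  # moment the condition first became true

    def update(self, condition, now):
        if not condition:
            self.since = None
            return False
        if self.since is None:
            self.since = now
        return (now - self.since) > self.time_threshold_s

distance_alarm = DurationAlarm(time_threshold_s=5.0)   # illustrative threshold
tilt_alarm = DurationAlarm(time_threshold_s=5.0)
if distance_alarm.update(condition=True, now=time.monotonic()):
    print("first anomaly prompt: distance over threshold for too long")
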
7. A gesture detection apparatus, wherein the apparatus is applied to an electronic device, the electronic device comprising a neural network processing unit (NPU), and the apparatus comprises:
a first acquisition unit configured to acquire an image frame of a target object;
a second acquisition unit configured to input the image frame of the target object into a target detection model deployed in the NPU to obtain a position of a reference part of the target object in the image frame, wherein the target detection model is obtained by training a to-be-trained detection model, and the to-be-trained detection model is obtained by simplifying the structure of an original model;
a third acquisition unit configured to obtain a reference parameter of the reference part based on the position of the reference part of the target object in the image frame, wherein the reference parameter is used for representing a gesture of the target object; and
a first determining unit configured to determine a gesture detection result of the target object based on the reference parameter of the reference part.
8. The apparatus according to claim 7, wherein the original model consists of three networks, each network comprising a convolutional neural network (CNN) and a downsampling filter; and the second acquisition unit is configured to delete part of the CNNs of some of the three networks of the original model, replace the CNNs of the remaining networks of the three networks with depth-separable CNNs, and/or delete part of the downsampling filters of each of the three networks of the original model to obtain the to-be-trained detection model, and train the to-be-trained detection model based on a preset training data set to obtain the target detection model.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
Priority Applications (1)

Application Number: CN202410108942.5A
Priority Date: 2024-01-25
Filing Date: 2024-01-25
Title: Gesture detection method, gesture detection device, electronic equipment and storage medium
Status: Pending

Publications (1)

Publication Number: CN117746489A
Publication Date: 2024-03-22

Family

ID: 90283538

Family Applications (1)

Application Number: CN202410108942.5A (Pending)

Country Status (1)

CN: CN117746489A

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination