CN110532984B - Key point detection method, gesture recognition method, device and system - Google Patents

Key point detection method, gesture recognition method, device and system

Info

Publication number
CN110532984B
Authority
CN
China
Prior art keywords
target
frame
image
merging
frames
Prior art date
Legal status: Active
Application number
CN201910830741.5A
Other languages
Chinese (zh)
Other versions
CN110532984A (en)
Inventor
孙晨
陈文科
姚聪
Current Assignee: Beijing Kuangshi Technology Co Ltd
Original Assignee: Beijing Kuangshi Technology Co Ltd
Application filed by Beijing Kuangshi Technology Co Ltd
Priority to CN201910830741.5A
Publication of CN110532984A
Application granted
Publication of CN110532984B


Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V20/10 Scenes; scene-specific elements: terrestrial scenes
    • G06V40/161 Recognition of human faces: detection; localisation; normalisation
    • G06V40/28 Recognition of movements or behaviour: recognition of hand or arm movements, e.g. recognition of deaf sign language


Abstract

The invention provides a key point detection method, a gesture recognition method, a device and a system, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring an image to be detected, the image to be detected containing target objects; performing target detection on the image to be detected to obtain a position frame of each target object; merging the position frames based on the position frame of each target object to obtain a target merging frame, where the image corresponding to the target merging frame contains at least two target objects that are in contact; performing key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams (heatmaps) for each target object, where different thermodynamic diagrams characterize key points located at different positions on the target object; and determining the key points of the contacted target objects based on the thermodynamic diagrams. The method can effectively improve the accuracy of key point detection.

Description

Key point detection method, gesture recognition method, device and system
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a key point detection method, a gesture recognition method, a device and a system.
Background
Detection of target objects such as gestures and human faces in images is an important application of artificial intelligence. Key point detection is a key step in target detection, and key point information helps to determine the posture of a target object more accurately. In recent years, with the development of deep learning methods for target object posture estimation, detecting target objects in combination with key point estimation has become the most common and computationally efficient approach to key point detection. However, such methods can only handle a single target object or multiple completely separated target objects; once target objects are in contact or interacting with each other, for example when two hands are crossed or one hand writes on the palm of the other, it is difficult to detect the key points of the target objects accurately.
Disclosure of Invention
In view of the above, the present invention provides a key point detection method, device and system, and a gesture recognition method, device and system, which can effectively improve the accuracy of key point detection.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a key point detection method, the method including: acquiring an image to be detected, wherein the image to be detected contains target objects; performing target detection on the image to be detected to obtain a position frame of each target object; merging the position frames based on the position frame of each target object to obtain a target merging frame, wherein the image corresponding to the target merging frame contains at least two target objects that are in contact; performing key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams for each target object, wherein different thermodynamic diagrams are used to characterize key points located at different positions on the target object; and determining the key points of each of the contacted target objects based on the thermodynamic diagrams.
Further, the step of merging the position frames based on the position frames of each target object to obtain a target merged frame includes: and repeatedly executing preset merging operation on the position frames based on the position frames of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold value, so as to obtain the target merging frame.
Further, the step of merging the position frames based on the position frames of each target object to obtain a target merged frame includes: repeatedly executing preset merging operation on the position frames based on the position frame of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold value, and obtaining candidate merging frames; and screening the candidate merging frames according to the number of the target objects contained in the candidate merging frames to obtain target merging frames.
Further, the merging operation includes: determining a position frame pair to be combined from a plurality of position frames; wherein the position frame pair comprises two position frames; calculating the overlapping degree between the position frames in the position frame pair; if the calculated overlapping degree is larger than a preset overlapping degree threshold value, combining the position frames in the position frame pair into a new position frame; and determining the boundaries of the new position frames according to the boundaries of the two position frames in the position frame pair.
Further, the step of determining a pair of position frames to be merged from the plurality of position frames includes: obtaining the confidence of the position frame; sequencing the position frames according to the confidence of each position frame to obtain a position frame sequencing result; and determining a position frame pair to be combined from a plurality of position frames according to the position frame sequencing result.
Further, the step of obtaining the confidence of the position frame includes: and obtaining the confidence degrees of the two position frames in the position frame pair, and obtaining the confidence degree of the new position frame according to the confidence degrees of the two position frames.
Further, the step of performing key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams of each target object includes: cropping a local image from the image to be detected based on the target merging frame, the local image comprising the contacted target objects; and resizing the local image, and performing key point detection on the resized local image to obtain a plurality of thermodynamic diagrams of each target object.
Further, the step of performing keypoint detection on the resized local image to obtain a plurality of thermodynamic diagrams of each target object includes: and performing key point detection on the local image after the size adjustment through the trained detection model to obtain a plurality of thermodynamic diagrams of each target object.
Further, the method further comprises: inputting a plurality of training images labeled with key point positions of target objects into a detection model to be trained, wherein the training images comprise at least two target objects that are in contact, and the overlapping degree between any two target objects reaches a preset overlapping degree threshold; detecting the training images through the detection model to be trained, and outputting a thermodynamic diagram of each target object in the training image; obtaining the positions of key points of the target objects in the training image based on the thermodynamic diagrams of the target objects in the training image; and performing parameter optimization on the detection model to be trained based on the key point positions obtained by the detection model to be trained and the labeled key point positions, and when the matching degree between the obtained and labeled key point positions reaches a preset matching degree, determining that training is finished to obtain the trained detection model.
Further, the step of determining key points of each of the target objects in contact based on the thermodynamic diagram includes: obtaining the brightness value of each pixel point in the thermodynamic diagram; wherein the brightness values are used to characterize confidence of corresponding keypoints in the thermodynamic diagram; filtering the thermodynamic diagram according to a preset key point brightness threshold value and the obtained maximum brightness value; and determining key points of the contacted target objects according to the filtered thermodynamic diagram.
In a second aspect, an embodiment of the present invention further provides a gesture recognition method, where the method includes: performing key point detection on a hand image to be detected by adopting the key point detection method of any one of the first aspect to obtain key points of each hand; and recognizing gesture categories according to the key points of the hands.
In a third aspect, an embodiment of the present invention further provides a key point detection device, the device including: an image acquisition module, used to acquire an image to be detected, wherein the image to be detected contains target objects; a target detection module, used to perform target detection on the image to be detected to obtain a position frame of each target object; a position frame merging module, used to merge position frames based on the position frame of each target object to obtain a target merging frame, wherein the image corresponding to the target merging frame contains at least two target objects that are in contact; a key point detection module, used to perform key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams for each target object, wherein different thermodynamic diagrams are used to characterize key points located at different positions on the target object; and a key point determining module, used to determine the key points of each contacted target object based on the thermodynamic diagrams.
In a fourth aspect, an embodiment of the present invention further provides a gesture recognition apparatus, where the apparatus includes: a hand key point detection module, configured to perform key point detection on a hand image to be detected by using the key point detection method according to any one of the first aspect, so as to obtain key points of each hand; and the gesture recognition module is used for recognizing gesture categories according to the key points of the hands.
In a fifth aspect, an embodiment of the present invention provides a system for detecting a keypoint, where the system includes: the device comprises an image acquisition device, a processor and a storage device; the image acquisition device is used for acquiring an image to be detected; the storage device has stored thereon a computer program which, when executed by the processor, performs the keypoint detection method of any of the first aspects and the gesture recognition method of the second aspect.
In a sixth aspect, an embodiment of the present invention provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps of the keypoint detection method according to any one of the above first aspects and the steps of the gesture recognition method according to the second aspect when executing the computer program.
In a seventh aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the keypoint detection method according to any one of the above first aspects and the steps of the gesture recognition method according to the second aspect.
Embodiments of the invention provide a key point detection method, a gesture recognition method, a device and a system. Target detection is first performed on the image to be detected to obtain a position frame of each target object; the position frames are then merged based on the position frame of each target object to obtain a target merging frame, where the image corresponding to the target merging frame contains at least two target objects that are in contact. Key point detection is performed on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams for each target object, where different thermodynamic diagrams characterize key points located at different positions on the target object; finally, the key points of the contacted target objects are determined based on the thermodynamic diagrams. Because key point detection is performed on the basis of a target merging frame corresponding to at least two contacted target objects, thermodynamic diagrams characterizing key points at different positions on the contacted target objects are obtained, and the key points of the target objects are determined from them, which effectively improves detection accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting a keypoint according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of two hands with three different degrees of contact provided by an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a method for performing a merge operation multiple times in accordance with an embodiment of the present invention;
FIG. 5 is a merged view of a location box according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating another method for performing a merge operation multiple times in accordance with an embodiment of the present invention;
FIG. 7 shows a schematic diagram of a hand thermodynamic diagram provided by an embodiment of the invention;
fig. 8 is a block diagram illustrating a structure of a keypoint detection apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Existing key point detection methods have difficulty accurately detecting the key points of target objects that are in contact or interaction. To improve on this, embodiments of the present invention provide a key point detection method, a gesture recognition method, a device and a system. The technique can be applied to various fields requiring key point detection, such as human-computer interaction and gesture recognition; for ease of understanding, the embodiments of the present invention are described in detail below.
Example one:
first, an example electronic device 100 for implementing the keypoint detection method, the gesture recognition method, the apparatus and the system according to the embodiments of the present invention is described with reference to fig. 1.
As shown in fig. 1, an electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and data, such as data used and/or generated by the applications, may also be stored on the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image capture device 110 may take images (e.g., photographs, videos, etc.) desired by the user and store the taken images in the storage device 104 for use by other components.
Exemplary electronic devices for implementing a key point detection method, a gesture recognition method, an apparatus and a system according to embodiments of the present invention may be implemented on smart terminals such as smart phones, tablet computers, VR devices, cameras, and the like.
Example two:
referring to a flowchart of a key point detection method shown in fig. 2, the method specifically includes the following steps:
step S202, an image to be detected is obtained, and the image to be detected comprises a target object. The image to be detected can be an original image shot by an image acquisition device, or an image downloaded by a network, stored locally or uploaded manually. At least two target objects which are in contact, such as people, human faces, hands, vehicles and the like, can be included in the image to be detected, and the at least two target objects which are in contact can refer to two hands with different contact degrees shown in fig. 3: the left side of fig. 3 shows two hands on top of each other, the middle shows two hands partially crossed together, and the right side shows two hands almost completely overlapped. Of course, other separate target objects may be included in addition to the target objects that are in contact, and are not limited herein.
And step S204, carrying out target detection on the image to be detected to obtain a position frame of each target object.
In some optional embodiments, target detection can be performed on the image to be detected based on a neural network model such as a Convolutional Neural Network (CNN), a Region-based CNN (R-CNN) or a SegNet model, to obtain the position frame of each target object in the image to be detected.
Step S206, merging the position frames based on the position frame of each target object to obtain a target merging frame; and the image corresponding to the target merging frame comprises at least two target objects which are in contact.
At least two contacted target objects can partially overlap each other, so that their position frames also overlap, and merging the mutually overlapping position frames of at least two target objects yields a target merging frame. It can be understood that, for a target merging frame obtained from mutually overlapping position frames, the number of target objects contained in the corresponding image region is not more than the number of contacted target objects in the image to be detected, and some target objects with smaller contact areas may not fall within the image region corresponding to the target merging frame.
Step S208, performing key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams of each target object; wherein different thermodynamic diagrams are used to characterize key points located at different positions on the target object.
In this embodiment, performing key point detection on the image to be detected based on the target merging frame can be understood as follows: the key points of each target object within the target merging frame of the image to be detected are detected in a bottom-up manner, obtaining the thermodynamic diagrams of each target object in the target merging frame. A thermodynamic diagram shows the locations of key points in a particular form, such as highlighting or color. When generating the thermodynamic diagrams of each target object based on the target merging frame, a separate thermodynamic diagram can be generated for each key point on the current target object; that is, each thermodynamic diagram characterizes only one key point. For example, if the target object is a hand, which is conventionally located with 21 key points, then 21 thermodynamic diagrams are obtained for each hand, and different thermodynamic diagrams correspond to key points at different positions of the hand.
Step S210, the key points of the contacted target objects are determined based on the thermodynamic diagrams. The position coordinates of the key point characterized by each thermodynamic diagram can be obtained first; the original position coordinates of the key points on the target object are then determined according to a preset mapping relationship between the thermodynamic diagram and the target object; finally, the key points of the contacted target objects are determined from the original position coordinates of each key point.
According to the key point detection method provided by the embodiment of the invention, target detection is first performed on the image to be detected to obtain the position frame of each target object; the position frames are then merged based on the position frame of each target object to obtain a target merging frame, where the image corresponding to the target merging frame contains at least two contacted target objects; key point detection is performed on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams for each target object, where different thermodynamic diagrams characterize key points at different positions on the target object; finally, the key points of the contacted target objects are determined based on the thermodynamic diagrams. Because key point detection is performed on the basis of the target merging frame corresponding to at least two contacted target objects, thermodynamic diagrams characterizing key points at different positions on the contacted target objects are obtained, and the key points of the target objects are determined from them, which effectively improves detection accuracy.
When the step S206 is executed, the following two position frame merging manners may be adopted according to the actual detection scene to obtain the target merging frame:
the first merging mode: and repeatedly executing preset merging operation on the position frames based on the position frame of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold value, so as to obtain a target merging frame. The merging method is suitable for simpler detection scenarios, such as when only one target position frame is obtained, or when more than two target position frames are obtained and the number of target objects contained in each target position frame is the same.
For a more complex detection scene, for example, the image to be detected contains multiple groups of contacted target objects: two contacted target objects A and B, and three contacted target objects C, D and E, where C, D and E are spaced apart from A and B. In this scenario, the target merging frame can be obtained by the second merging mode below.
The second merging mode: first, a preset merging operation is repeatedly executed on the position frames based on the position frame of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold, yielding candidate merging frames; there are at least two candidate merging frames, and they contain different numbers of target objects. The candidate merging frames are then screened according to the number of target objects they contain, to obtain the target merging frames. In this embodiment, the candidate merging frames can be classified according to the number of target objects contained, where candidate merging frames of the same class contain the same number of target objects, and the frames of a given class are determined as target merging frames; alternatively, the candidate merging frame containing the required number of target objects can be determined as the target merging frame according to the requirements of the actual detection task (e.g., specifying two target objects).
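A minimal sketch of this screening step, assuming position frames are plain (x1, y1, x2, y2) tuples; the containment test and the helper names are illustrative assumptions, not taken from the patent:

```python
def contains(merge_box, box, margin=0.0):
    """Hypothetical containment test: a detected position frame counts as
    inside a candidate merging frame when all four of its corners fall
    within the merging frame (optionally relaxed by a small margin)."""
    mx1, my1, mx2, my2 = merge_box
    x1, y1, x2, y2 = box
    return (x1 >= mx1 - margin and y1 >= my1 - margin and
            x2 <= mx2 + margin and y2 <= my2 + margin)

def screen_candidates(candidates, detections, wanted=2):
    """Keep only the candidate merging frames that contain exactly the
    required number of detected target objects (e.g. two contacted hands)."""
    return [c for c in candidates
            if sum(contains(c, d) for d in detections) == wanted]
```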
With the second merging mode, it is convenient when executing step S208 to select a key point detection method matched to the target merging frame. For example, when performing key point detection with a key point detection model based on a convolutional neural network, different detection models can be selected according to the number of target objects contained in the target merging frame, so that the selected key point detection model better matches the detection task to be executed, improving the accuracy of the detection result.
In order to facilitate understanding of the two merging modes, the present embodiment describes a possible implementation of the merging operation used by both. Referring to the flowchart of a method for performing the merging operation multiple times shown in fig. 4, the merging operation includes the following steps S402 to S406:
step S402, determining a position frame pair to be combined from a plurality of position frames; the position frame pair comprises two position frames, and when the first round executes the merging operation, the position frames of the two target objects are included in the position frame pair. And determining each two position frames in the plurality of position frames into a group of position frame pairs to be combined, wherein for example, the position frames comprise a position frame a, a position frame b, a position frame c and a position frame d, and the position frame pairs determined by combining the position frames in pairs comprise { ab }, { ac }, { ad }, { bc }, { bd } and { cd }.
Step S404, calculating the overlapping degree between the position frames in the position frame pair. For each pair of the position frames, the ratio of the area of the overlapped part between the two position frames in the position frame pair to the total area of the two position frames is calculated to obtain the overlapping degree of the two position frames.
Step S406, if the calculated overlapping degree is greater than the preset overlapping degree threshold, the position frames in the position frame pair are merged into a new position frame, whose boundary is determined from the boundaries of the two position frames in the pair. For example, if the overlapping degree of position frames a and b is greater than the preset overlapping degree threshold (e.g., 70%), position frames a and b are merged to generate a new position frame. As shown in the position frame merging schematic diagram of fig. 5, the boundaries of the new position frame are obtained by extending the boundaries of the non-overlapping portions of position frames a and b until they intersect.
It can be understood that, after the new position frame is obtained by executing the step S406, the step S402 is performed again, and the position frame pair to be merged is determined continuously from the new position frame and the other position frames, that is, the steps S402 to S406 are performed again on the position frames until the overlapping degree between any two position frames is not greater than the preset overlapping degree threshold, so as to obtain a candidate merging frame or directly obtain the target merging frame.
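A sketch of the repeated merging operation under the following assumptions: position frames are (x1, y1, x2, y2) tuples, and the overlapping degree is the intersection area divided by the total (union) area of the two frames, matching the ratio described in step S404:

```python
def overlap_degree(a, b):
    """Overlapping degree of two frames: intersection area divided by
    the total (union) area, as described in step S404."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(a, b):
    """New frame boundaries extend the outer boundaries of the two
    frames (the circumscribed rectangle), as in fig. 5."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def merge_until_converged(boxes, thresh=0.7):
    """Repeat the merging operation (steps S402 to S406) until no pair
    of frames overlaps by more than the preset threshold."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_degree(boxes[i], boxes[j]) > thresh:
                    new = merge_boxes(boxes[i], boxes[j])
                    boxes = [b for k, b in enumerate(boxes) if k not in (i, j)]
                    boxes.append(new)
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Depending on the detection scene, the frames returned here serve either directly as target merging frames (the first merging mode) or as candidate merging frames to be screened (the second merging mode).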
In practical applications, some position frames may be positioned inaccurately and deviate from the target object; merging such frames affects not only merging efficiency but also the accuracy of the merging result. Therefore, when determining the position frame pairs to be merged from the plurality of position frames in step S402, the confidence of the position frames can also be considered to improve on this, as follows:
first, the confidence of the location box is obtained. And when the target detection is carried out on the image to be detected through the neural network model, the confidence coefficient of the position frame of each target object can be generated.
Then, the position frames are sorted according to their confidences to obtain a position frame sorting result. When the position frames are ranked from high to low confidence, the frames ranked near the bottom have low confidence, indicating that they may deviate from the target and be unreliable, and some position frames can be screened out on this basis. In specific implementations, position frames with lower confidence can be filtered out based on a preset position frame confidence threshold, or the position frames ranked after a specified position can be filtered out from the original sorting result; the final position frame sorting result is then determined from the remaining position frames. Assume the sorting result, from high to low confidence, is position frame a, position frame b, position frame c and position frame d. Sorting the position frames by confidence helps reduce the merging of low-confidence position frames in the subsequent merging process, which effectively improves the efficiency and accuracy of position frame merging.
Finally, determining a position frame pair to be combined from the plurality of position frames according to the position frame sequencing result; wherein the position frame pair comprises two position frames. The position frame sequencing result can reflect the accuracy and reliability of each position frame, so that the position frame pair suitable for preferential combination can be determined according to the position frame sequencing result, and each position frame pair to be combined can be determined sequentially pair by pair, and the position frames are prevented from being missed. Referring to the above location box ordering results as the location box a, the location box b, the location box c, and the location box d, the location box pair to be merged in this embodiment may include { ab }, { ac }, { ad }, { bc }, { bd }, and { cd }.
On the basis of determining the position frame pair based on the confidence, the present embodiment may also provide another possible implementation manner of the merging operation, which may refer to another flowchart of a method for performing the merging operation multiple times as shown in fig. 6, where the merging operation includes the following steps S602 to S610:
step S602, a confidence of the position frame is obtained.
And step S604, sequencing the position frames according to the confidence degrees of the position frames to obtain a position frame sequencing result.
Step S606, determining the position frame pair to be merged from the plurality of position frames according to the position frame sorting result.
In step S608, the overlapping degree between the position frames in the position frame pair is calculated.
In step S610, if the calculated overlap is greater than the preset overlap threshold, the position frames in the position frame pair are merged into a new position frame.
It can be understood that, after the new position frame is obtained by performing the step S610, the process returns to the step S602, that is, the steps S602 to S610 are performed again on the position frames until the overlapping degree between any two position frames is not greater than the preset overlapping degree threshold, so as to obtain a candidate merged frame or directly obtain a target merged frame.
When a new round of step S602 is executed, the confidence of the new position frame may be obtained as follows: and obtaining the confidence degrees of the two position frames in the position frame pair, and obtaining the confidence degree of the new position frame according to the confidence degrees of the two position frames.
In view of that the result of ranking the position frames based on the confidence may be used to determine the position frame pair suitable for preferential merging, the present embodiment may determine the high confidence of the two position frames as the confidence of the new position frame. Of course, there are other ways to determine the confidence of the new location box, such as: and determining the low confidence coefficient of the confidence coefficients of the two position frames as the confidence coefficient of the new position frame, or randomly selecting one confidence coefficient from the confidence coefficients of the two position frames as the confidence coefficient of the new position frame, or determining the average value of the confidence coefficients of the two position frames as the confidence coefficient of the new position frame.
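Extending the previous sketch (it reuses overlap_degree and merge_boxes) with the confidence-ordered variant of steps S602 to S610; the confidence floor value is an illustrative assumption, and taking the higher parent confidence for the merged frame is one of the options described above:

```python
def merge_with_confidence(boxes, confs, thresh=0.7, conf_floor=0.3):
    """Confidence-ordered variant of steps S602 to S610. Frames below a
    confidence floor are filtered out first; the rest are kept sorted
    from high to low confidence, and a merged frame inherits the higher
    confidence of its two parents. conf_floor is illustrative only."""
    items = [(b, c) for b, c in zip(boxes, confs) if c >= conf_floor]
    items.sort(key=lambda bc: bc[1], reverse=True)
    merged = True
    while merged:
        merged = False
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if overlap_degree(items[i][0], items[j][0]) > thresh:
                    new = (merge_boxes(items[i][0], items[j][0]),
                           max(items[i][1], items[j][1]))
                    items = [it for k, it in enumerate(items) if k not in (i, j)]
                    items.append(new)
                    items.sort(key=lambda bc: bc[1], reverse=True)
                    merged = True
                    break
            if merged:
                break
    return items  # (frame, confidence) pairs; no pair overlaps > thresh
```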
In order to reduce the key point detection load on the image to be detected, in step S208 this embodiment can first process the image to be detected when performing key point detection based on the target merging frame, as follows: a local image is cropped from the image to be detected based on the target merging frame, the local image containing the contacted target objects. Specifically, the position parameters of the target merging frame in the image to be detected are obtained; these may include the coordinates (x, y) of the top-left vertex of the target merging frame and the height and width of the target merging frame. The position parameters of the target merging frame are computed using, for example, the cvRect function in OpenCV (Open Source Computer Vision Library), and the image delimited by these position parameters is output; this is the local image cropped from the image to be detected.
The cropped local images vary in size. To adapt to various key point detection scenarios, the local image can be resized, and key point detection is then performed on the resized local image to obtain a plurality of thermodynamic diagrams for each target object. In practical applications, the size (height and width) of the local image can be reset directly to the target size (target height and target width); this resizing is simple and efficient. Alternatively, to preserve the local image's original aspect ratio, the height of the local image is scaled to the target height h, and when the resulting width is insufficient, it is brought to the target width by zero padding, so that the adjusted image still respects the original aspect ratio.
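A sketch of the cropping and aspect-preserving resizing described above, using OpenCV and NumPy; the 256 x 256 target size and the returned scale/offset values (used later to map key points back) are illustrative assumptions:

```python
import cv2
import numpy as np

def crop_and_resize(image, merge_box, target_h=256, target_w=256):
    """Crop the local image at the target merging frame, scale it so its
    height equals the target height, and zero-pad the width if needed,
    preserving the original aspect ratio of the local image."""
    x1, y1, x2, y2 = [int(v) for v in merge_box]
    local = image[y1:y2, x1:x2]
    scale = target_h / local.shape[0]
    new_w = min(target_w, int(round(local.shape[1] * scale)))
    resized = cv2.resize(local, (new_w, target_h))
    padded = np.zeros((target_h, target_w, image.shape[2]), dtype=image.dtype)
    padded[:, :new_w] = resized        # remaining width stays zero-filled
    return padded, scale, (x1, y1)     # scale/offset map key points back
```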
By cropping local images and adjusting their height and width through resetting or padding, the key point detection model only needs to devote its learning capacity to fixed-size local images containing the target objects (such as hands), which effectively reduces the model's learning burden. On this basis, this embodiment performs key point detection on the resized local image to obtain a plurality of thermodynamic diagrams for each target object as follows: key point detection is performed on the resized local image through a trained detection model.
The detection model can be a key point detection model based on a convolutional neural network. Its input is a local image of size (h, w); assume the local image contains two contacted target objects. The output of the detection model is 2 x n thermodynamic diagrams of size (h/s, w/s); each thermodynamic diagram characterizes the confidence of one key point, and the position with the highest confidence in a diagram is the position of the corresponding key point. Here s is the downsampling rate of the thermodynamic diagram relative to the local image, which is coupled to the network structure design of the detection model, and n is the number of key points of a single target object, each target object having the same number of key points. The 2 x n thermodynamic diagrams characterize all key point positions of the two target objects (such as a left hand and a right hand): the first n diagrams characterize the key points of the first target object (e.g. the left hand), the last n diagrams those of the second target object (e.g. the right hand), and the key point that each of a target object's n thermodynamic diagrams characterizes is fixed. Since the number of thermodynamic diagrams output by the detection model is twice the number of key points of a single target object, this detection model can be called a dual-channel detection model. It can be understood that when the local image contains M contacted target objects (M = 3, 4, 5, ...), the number of thermodynamic diagrams output by the corresponding trained detection model is M times the number of key points of a single target object, and the detection model in this case can be called an M-channel detection model.
For ease of understanding, hands can be taken as the target objects to describe the thermodynamic diagrams and the key points they characterize. Referring to the schematic hand thermodynamic diagram shown in fig. 7, the 1st thermodynamic diagram of one hand (the right hand) may characterize the position of the thumb-tip key point; the (n+1)-th thermodynamic diagram then identifies the position of the thumb-tip key point of the other hand (the left hand). Correspondingly, the 2nd thermodynamic diagram characterizes the position of the right hand's index-finger-tip key point, and the (n+2)-th thermodynamic diagram that of the left hand.
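Given the fixed channel layout just described, splitting the dual-channel model output into per-hand heatmap groups is a one-liner (array shapes as assumed above):

```python
def split_heatmaps(heatmaps, n):
    """heatmaps: array of shape (2*n, h/s, w/s) from the dual-channel
    model; the first n channels belong to the first hand and the last n
    to the second hand, in a fixed key point order."""
    return heatmaps[:n], heatmaps[n:]
```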
The step of determining key points of the contacted target objects based on the thermodynamic diagram may include the following steps 1) to 3):
step 1), calculating the brightness value of each pixel point in the thermodynamic diagram; wherein the brightness value is used for characterizing the confidence of the corresponding key point in the thermodynamic diagram. The thermodynamic diagram is essentially a two-dimensional matrix h x w, wherein the larger the value of the corresponding position of the matrix is, the higher the brightness value of the pixel point in the thermodynamic diagram corresponding to visualization is. Finding the position coordinates of the pixel points with the maximum brightness value in the two-dimensional matrix by using the following formula (1):
(x, y) = argmax over (hx, hy) of heatmap[hx, hy]    (1)

where the maximum is taken over all coordinates (hx, hy) of the thermodynamic diagram, so that (x, y) are the coordinates of the pixel point with the maximum brightness value.
Step 2), the thermodynamic diagram is filtered according to a preset key point brightness threshold and the obtained maximum brightness value, with reference to the following formula (2); that is, if the position coordinates (x, y) of the pixel point with the maximum brightness value do not satisfy formula (2), the thermodynamic diagram to which the pixel point belongs is filtered out.
heatmap[x,y]>conf_thresh (2)
Wherein, heatmap [ x, y ] represents the acquired maximum brightness value, that is, the brightness value of the position coordinate (x, y), and conf _ thresh is a preset key point brightness threshold. If the brightness value of the position coordinate (x, y) is lower than a preset brightness threshold, it indicates that the brighter pixel point in the thermodynamic diagram is caused by interference factors such as noise and is not a key point of the target object.
And 3) determining key points of the contacted target objects according to the filtered thermodynamic diagram.
For each filtered thermodynamic diagram, the key point coordinates in the diagram are calculated using an algorithm such as the soft-argmax function, and the calculated coordinates are converted into original position coordinates on the target object according to the preset mapping relationship between the thermodynamic diagram and the target object; finally, the key points of each target object are determined from the converted original position coordinates of all key points, as combined in the sketch below.
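The following sketch puts steps 1) to 3) together in NumPy. It assumes heatmaps shaped (M * n, h/s, w/s) as described above; the softmax-based soft-argmax is one standard formulation, and the threshold value and the scale/offset parameters (which undo the resize and crop from the earlier sketch) are illustrative assumptions rather than values from the patent:

```python
import numpy as np

def decode_heatmaps(heatmaps, s, scale=1.0, offset=(0, 0), conf_thresh=0.1):
    """heatmaps: (M * n, h/s, w/s). For each thermodynamic diagram, find
    the maximum-brightness pixel (formula (1)), filter the diagram out if
    that brightness is below the threshold (formula (2)), refine with a
    soft-argmax, and map back to original image coordinates."""
    keypoints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)  # formula (1)
        if hm[y, x] <= conf_thresh:                       # formula (2)
            keypoints.append(None)  # brightness due to noise, not a key point
            continue
        # soft-argmax: softmax-weighted average of pixel coordinates
        w = np.exp(hm - hm.max())
        w /= w.sum()
        ys, xs = np.indices(hm.shape)
        sx, sy = (w * xs).sum(), (w * ys).sum()
        # undo the heatmap downsampling (s), the resize (scale) and the
        # crop offset from crop_and_resize to get original coordinates
        keypoints.append((sx * s / scale + offset[0],
                          sy * s / scale + offset[1]))
    return keypoints
```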
After the key points of each target object are determined, all the key points of the target object are subjected to posture restoration, so that the contents such as the posture, the expression and the like of the target object are recognized according to the restoration result.
In addition, the training process of the detection model can refer to the following four steps (a brief code sketch follows the steps):
firstly, inputting a plurality of training images marked with key point positions of target objects into a detection model to be trained, wherein the training images comprise at least two target objects which are in contact, and the overlapping degree between any two target objects reaches a preset overlapping degree threshold value.
Secondly, detecting the training image through the detection model to be trained, and outputting the thermodynamic diagram of each target object in the training image.
And thirdly, obtaining the positions of key points in each target object in the training image based on the thermodynamic diagram of each target object in the training image.
And fourthly, performing parameter optimization on the detection model to be trained based on the key point position obtained by the detection model to be trained and the labeled key point position until the matching degree between the key point position obtained by the detection model to be trained and the labeled key point position reaches a preset matching degree, determining that the training is finished, and obtaining the trained detection model.
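A minimal training-loop sketch consistent with these four steps, in PyTorch style; the MSE heatmap loss, the Adam optimizer and the peak-matching criterion are assumptions for illustration, since the patent only requires training until a preset matching degree between predicted and labeled key point positions is reached:

```python
import torch
import torch.nn.functional as F

def train_detection_model(model, loader, epochs=100, target_match=0.95):
    """loader yields (image, gt_heatmaps) pairs, where gt_heatmaps are
    rendered from the labeled key point positions and every training
    image contains at least two contacted target objects."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        matches, total = 0, 0
        for image, gt_heatmaps in loader:
            pred = model(image)                   # (B, M*n, h/s, w/s)
            loss = F.mse_loss(pred, gt_heatmaps)  # assumed heatmap loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            # matching degree: fraction of predicted heatmap peaks that
            # land on the labeled peak locations (illustrative criterion)
            p = pred.flatten(2).argmax(-1)
            g = gt_heatmaps.flatten(2).argmax(-1)
            matches += (p == g).sum().item()
            total += p.numel()
        if matches / total >= target_match:       # preset matching degree
            return model                          # training finished
    return model
```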
In summary, the above embodiments can perform key point detection based on the target merging frame corresponding to at least two contacted target objects, obtaining a plurality of thermodynamic diagrams that characterize key points at different positions on the contacted target objects, and determine the key points of the target objects from these thermodynamic diagrams, effectively improving the accuracy of key point detection for target objects in contact.
Example three:
for the key point detection method provided in the second embodiment, an embodiment of the present invention provides a gesture recognition method, including the following steps 1 and 2:
step 1, performing key point detection on the hand image to be detected by adopting the key point detection method provided in the second embodiment to obtain key points of each hand. For a brief description, the step of obtaining the key points of the hand can refer to the corresponding content in the second embodiment of the foregoing method.
Step 2, the gesture category is recognized according to the key points of each hand. In one specific implementation, posture restoration can be performed on all key points of each hand, so that the gesture category of the hand, such as waving, making a fist or grabbing, is recognized from the restoration result, as sketched below.
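As an illustration of step 2, a deliberately simple rule-based classifier over the 21 recovered hand key points; the key point indices follow a common hand-landmark convention and the distance rule is an assumption, since the patent does not prescribe a specific recognition algorithm (a learned classifier could equally be used):

```python
import numpy as np

def classify_gesture(keypoints):
    """keypoints: 21 (x, y) hand key points restored from the thermodynamic
    diagrams. Calls the hand a fist when every fingertip lies close to the
    wrist relative to the hand size; a simple illustrative rule."""
    pts = np.asarray(keypoints, dtype=float)
    wrist = pts[0]                    # index 0 as the wrist is an assumption
    tips = pts[[4, 8, 12, 16, 20]]    # fingertip indices (common convention)
    hand_size = np.linalg.norm(pts - wrist, axis=1).max()
    tip_dist = np.linalg.norm(tips - wrist, axis=1)
    return "fist" if (tip_dist < 0.6 * hand_size).all() else "open/other"
```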
With the gesture recognition method provided in this embodiment, the key points of each hand are first obtained using the key point detection method provided in embodiment two, and the gesture category is then recognized from the key points of each hand. Since the key point detection method improves the accuracy of key point detection, the accuracy of gesture category recognition is effectively improved.
Example four:
as to the key point detecting method provided in the second embodiment, an embodiment of the present invention provides a key point detecting device, referring to a structural block diagram of a key point detecting device shown in fig. 8, where the device includes the following modules:
the image obtaining module 802 is configured to obtain an image to be detected, where the image to be detected includes a target object.
And the target detection module 804 is configured to perform target detection on the image to be detected to obtain a position frame of each target object.
A location frame merging module 806, configured to perform location frame merging based on the location frame of each target object to obtain a target merging frame; and the image corresponding to the target merging frame comprises at least two target objects which are in contact.
The key point detection module 808 is configured to perform key point detection on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams of each target object; wherein different thermodynamic diagrams are used to characterize key points located at different positions on the target object.
And a key point determining module 810 for determining key points of the contacted target objects based on the thermodynamic diagram.
With the key point detection device provided by the embodiment of the invention, target detection is first performed on the image to be detected to obtain the position frame of each target object; position frame merging is then performed based on the position frame of each target object to obtain a target merging frame, where the image corresponding to the target merging frame contains at least two contacted target objects; key point detection is performed on the image to be detected based on the target merging frame to obtain a plurality of thermodynamic diagrams for each target object, where different thermodynamic diagrams characterize key points at different positions on the target object; finally, the key points of the contacted target objects are determined based on the thermodynamic diagrams. Because key point detection is performed on the basis of the target merging frame corresponding to at least two contacted target objects, thermodynamic diagrams characterizing key points at different positions on the contacted target objects are obtained, and the key points of the target objects are determined from them, which effectively improves detection accuracy.
In some embodiments, the location frame merging module 806 is further configured to: and repeatedly executing preset merging operation on the position frames based on the position frame of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold value, so as to obtain a target merging frame.
In some embodiments, the location frame merging module 806 is further configured to: repeatedly executing preset merging operation on the position frames based on the position frame of each target object until the overlapping degree between any two position frames is not greater than a preset overlapping degree threshold value, and obtaining candidate merging frames; and screening the candidate merging frames according to the number of the target objects contained in the candidate merging frames to obtain the target merging frames.
In some embodiments, the merge operation comprises determining a position box pair sub-operation and merging the new position box sub-operation, wherein determining the position box pair sub-operation comprises: determining a position frame pair to be combined from a plurality of position frames; wherein the position frame pair comprises two position frames; the merge new location box sub-operation includes: calculating the overlapping degree between the position frames in the position frame pair; if the calculated overlapping degree is larger than a preset overlapping degree threshold value, combining the position frames in the position frame pair into a new position frame; wherein, the boundary of the new position frame is determined according to the boundaries of the two position frames in the position frame pair.
In some embodiments, the determining the location box pair sub-operation further comprises: acquiring the confidence of the position frame; sorting the position frames according to the confidence of each position frame to obtain a position frame sorting result; and determining a position frame pair to be combined from the plurality of position frames according to the position frame sequencing result.
In some embodiments, the determining the location box pair sub-operation further comprises: and obtaining the confidence degrees of the two position frames in the position frame pair, and obtaining the confidence degree of the new position frame according to the confidence degrees of the two position frames.
In some embodiments, the keypoint detection module 808 is further configured to: crop a local image from the image to be detected based on the target merging frame, the local image comprising the contacted target objects; and resize the local image, and perform key point detection on the resized local image to obtain a plurality of thermodynamic diagrams of each target object.
In some embodiments, the keypoint detection module 808 is further configured to perform keypoint detection on the resized local image through a trained detection model to obtain the plurality of heatmaps for each target object.
In some embodiments, the keypoint detection device further includes a model training module (not shown in the figure), configured to: input a plurality of training images annotated with the keypoint positions of their target objects into the detection model to be trained, each training image containing at least two contacted target objects whose pairwise overlap reaches the preset overlap threshold; detect the training images with the model to be trained and output a heatmap for each target object; obtain the keypoint positions of each target object from those heatmaps; and optimize the model parameters against the annotated keypoint positions until the match between the predicted and annotated positions reaches a preset matching degree, at which point training is complete and the trained detection model is obtained.
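For illustration, one training step under common assumptions (PyTorch, ground-truth heatmaps rendered as Gaussians at the annotated positions, mean-squared-error loss on heatmaps, none of which the text mandates) might look like:

```python
import torch
import torch.nn.functional as F

def training_step(model, images, gt_heatmaps, optimizer):
    """One optimization step for the detection model to be trained.
    `gt_heatmaps` are target maps rendered from the annotated keypoint
    positions (e.g. 2-D Gaussians centered on each label); heatmap MSE
    is an assumed loss, common for this setup but not fixed by the text."""
    optimizer.zero_grad()
    pred_heatmaps = model(images)  # (N, K, H, W): one map per keypoint
    loss = F.mse_loss(pred_heatmaps, gt_heatmaps)
    loss.backward()
    optimizer.step()
    return loss.item()
```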
In some embodiments, the keypoint determination module 810 is further configured to: obtain the brightness value of each pixel in the heatmap, the brightness value representing the confidence of the corresponding keypoint; filter the heatmap according to a preset keypoint brightness threshold and the maximum brightness value obtained; and determine the keypoints of the contacted target objects from the filtered heatmap.
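A sketch of the brightness-based filtering: each heatmap's peak is kept only if its brightness clears a threshold defined relative to the maximum brightness observed, which is one plausible reading of filtering according to a preset brightness threshold and the obtained maximum brightness value:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps, brightness_ratio=0.5):
    """heatmaps: (K, H, W) array, one map per keypoint; pixel brightness
    encodes keypoint confidence. A peak is kept only if it reaches
    `brightness_ratio` of the maximum brightness over all maps; the
    relative form of the threshold is an assumption."""
    max_brightness = heatmaps.max()
    keypoints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        if hm[y, x] >= brightness_ratio * max_brightness:
            keypoints.append((int(x), int(y), float(hm[y, x])))
        else:
            keypoints.append(None)  # filtered out as low confidence
    return keypoints
```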
The device provided in this embodiment has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, reference may be made to the corresponding content of the second (method) embodiment.
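Taken together, the modules above suggest a pipeline along the following lines, reusing the sketches given earlier:

```python
def detect_keypoints(image, detector, model):
    """End-to-end sketch of the device's pipeline. `detector` stands in
    for the target detection module (returning (box, confidence) pairs)
    and `model` for the trained keypoint detection model; both are
    assumed interfaces, not the patent's."""
    frames = detector(image)                # position frames
    merged = merge_position_frames(frames)  # merging frames (all kept here;
                                            # screen_candidates could filter)
    results = []
    for box, conf in merged:
        local = crop_and_resize(image, box)
        heatmaps = model(local)             # one heatmap per keypoint
        results.append(keypoints_from_heatmaps(heatmaps))
    return results
```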
Example five:
For the gesture recognition method provided in the third embodiment, an embodiment of the present invention provides a gesture recognition apparatus, comprising:
a hand keypoint detection module, configured to perform keypoint detection on a hand image to be detected using the keypoint detection method of the second embodiment, obtaining the keypoints of each hand; and
a gesture recognition module, configured to recognize a gesture category from the keypoints of each hand.
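For illustration only, a toy classifier over the detected hand keypoints is sketched below; the 21-point hand layout with fingertip indices (4, 8, 12, 16, 20) and the finger-counting rule are assumptions, and any classifier over the keypoints could fill this module's role:

```python
import math

FINGERTIPS = (4, 8, 12, 16, 20)  # assumed 21-point hand layout

def recognize_gesture(hand_keypoints):
    """Toy rule-based classifier: count fingers whose tip lies farther
    from the wrist (index 0) than the joint two indices below the tip.
    The layout and the fist/open-palm rule are illustrative assumptions."""
    wx, wy = hand_keypoints[0][:2]
    def dist(p):
        return math.hypot(p[0] - wx, p[1] - wy)
    extended = sum(dist(hand_keypoints[tip]) > dist(hand_keypoints[tip - 2])
                   for tip in FINGERTIPS)
    if extended == 0:
        return "fist"
    if extended == 5:
        return "open palm"
    return f"{extended} fingers extended"
```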
Example six:
Based on the foregoing embodiments, this embodiment provides a keypoint detection system comprising an image acquisition device, a processor, and a storage device. The image acquisition device acquires the image to be detected; the storage device stores a computer program which, when executed by the processor, performs any of the keypoint detection methods provided in the second embodiment.
Those skilled in the art will appreciate that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing method embodiments and is not repeated here.
Further, this embodiment provides an electronic device including a memory and a processor, the memory storing a computer program executable on the processor; when executing the program, the processor implements the steps of any keypoint detection method of the second embodiment or of the gesture recognition method of the third embodiment.
Further, this embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processing device, performs the steps of any keypoint detection method of the second embodiment or of the gesture recognition method of the third embodiment.
The computer program product of the keypoint detection method, gesture recognition method, device, and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code whose instructions may be used to execute the methods described in the foregoing method embodiments; for specific implementations, refer to those embodiments.
If the functions are implemented as software functional units and sold or used as standalone products, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored on a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods of the embodiments. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are specific implementations intended to illustrate, not limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to these embodiments, those skilled in the art will understand that the described technical solutions may still be modified, or some of their technical features replaced by equivalents, without such modifications or substitutions departing from the spirit and scope of the embodiments; all of them are intended to fall within the scope of protection, which shall be subject to the appended claims.

Claims (16)

1. A method of keypoint detection, the method comprising:
acquiring an image to be detected, wherein the image to be detected comprises a target object;
performing target detection on the image to be detected to obtain a position frame of each target object;
merging position frames based on the position frame of each target object to obtain a target merging frame, wherein the image corresponding to the target merging frame comprises at least two target objects in contact;
performing keypoint detection on the image to be detected based on the target merging frame and a detection model to obtain a plurality of heatmaps of each target object, wherein different heatmaps are used to characterize keypoints located at different positions on the target object, and the detection model is trained on a plurality of training images annotated with the keypoint positions of target objects, the training images comprising at least two target objects in contact; and
determining keypoints of each of the target objects in contact based on the heatmaps.
2. The method according to claim 1, wherein merging position frames based on the position frame of each target object to obtain a target merging frame comprises:
repeatedly performing a preset merging operation on the position frames based on the position frame of each target object until the overlap between any two position frames does not exceed a preset overlap threshold, to obtain the target merging frame.
3. The method according to claim 1, wherein merging position frames based on the position frame of each target object to obtain a target merging frame comprises:
repeatedly performing a preset merging operation on the position frames based on the position frame of each target object until the overlap between any two position frames does not exceed a preset overlap threshold, to obtain candidate merging frames; and
screening the candidate merging frames according to the number of target objects contained in each candidate merging frame, to obtain the target merging frames.
4. The method according to claim 2 or 3, wherein the merging operation comprises:
determining a position frame pair to be merged from a plurality of position frames, wherein the position frame pair comprises two position frames;
calculating the overlap between the position frames in the position frame pair; and
if the calculated overlap exceeds the preset overlap threshold, merging the position frames in the position frame pair into a new position frame, wherein the boundary of the new position frame is determined from the boundaries of the two position frames in the pair.
5. The method of claim 4, wherein determining the position frame pair to be merged from the plurality of position frames comprises:
obtaining a confidence for each position frame;
sorting the position frames by confidence to obtain a position frame ranking; and
determining the position frame pair to be merged from the plurality of position frames according to the ranking.
6. The method of claim 5, wherein obtaining the confidence for each position frame comprises:
obtaining the confidences of the two position frames in the position frame pair, and deriving the confidence of the new position frame from the confidences of the two position frames.
7. The method according to claim 1, wherein performing keypoint detection on the image to be detected based on the target merging frame to obtain a plurality of heatmaps of each target object comprises:
cropping a local image from the image to be detected based on the target merging frame, wherein the local image comprises the target objects in contact; and
resizing the local image, and performing keypoint detection on the resized local image to obtain the plurality of heatmaps of each target object.
8. The method of claim 7, wherein performing keypoint detection on the resized local image to obtain the plurality of heatmaps of each target object comprises:
performing keypoint detection on the resized local image through the trained detection model to obtain the plurality of heatmaps of each target object.
9. The method of claim 8, further comprising:
inputting a plurality of training images annotated with the keypoint positions of target objects into a detection model to be trained, wherein the training images comprise at least two target objects in contact and the overlap between any two target objects reaches the preset overlap threshold;
detecting the training images through the detection model to be trained, and outputting a heatmap of each target object in the training images;
obtaining the keypoint positions of each target object in the training images based on the heatmaps of the target objects; and
optimizing the parameters of the detection model to be trained based on the keypoint positions it produces and the annotated keypoint positions, and determining that training is complete when the match between the produced and annotated keypoint positions reaches a preset matching degree, to obtain the trained detection model.
10. The method of claim 1, wherein determining keypoints of each of the target objects in contact based on the heatmaps comprises:
obtaining the brightness value of each pixel in the heatmap, wherein the brightness values characterize the confidence of the corresponding keypoints in the heatmap;
filtering the heatmap according to a preset keypoint brightness threshold and the obtained maximum brightness value; and
determining the keypoints of the target objects in contact from the filtered heatmap.
11. A method of gesture recognition, the method comprising:
performing keypoint detection on a hand image to be detected using the keypoint detection method of any one of claims 1 to 10, to obtain the keypoints of each hand; and
recognizing a gesture category according to the keypoints of each hand.
12. A keypoint detection apparatus, characterized in that it comprises:
an image acquisition module, configured to acquire an image to be detected, wherein the image to be detected comprises a target object;
a target detection module, configured to perform target detection on the image to be detected to obtain a position frame of each target object;
a position frame merging module, configured to merge position frames based on the position frame of each target object to obtain a target merging frame, wherein the image corresponding to the target merging frame comprises at least two target objects in contact;
a keypoint detection module, configured to perform keypoint detection on the image to be detected based on the target merging frame and the detection model to obtain a plurality of heatmaps of each target object, wherein different heatmaps are used to characterize keypoints located at different positions on the target object, and the detection model is trained on a plurality of training images annotated with the keypoint positions of target objects, the training images comprising at least two target objects in contact; and
a keypoint determination module, configured to determine keypoints of each of the target objects in contact based on the heatmaps.
13. A gesture recognition apparatus, the apparatus comprising:
a hand keypoint detection module, configured to perform keypoint detection on a hand image to be detected using the keypoint detection method of any one of claims 1 to 10, to obtain the keypoints of each hand; and
a gesture recognition module, configured to recognize a gesture category according to the keypoints of each hand.
14. A keypoint detection system, comprising an image acquisition device, a processor, and a storage device, wherein:
the image acquisition device is configured to acquire an image to be detected; and
the storage device stores a computer program which, when executed by the processor, performs the keypoint detection method of any one of claims 1 to 10 or the gesture recognition method of claim 11.
15. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the keypoint detection method of any one of claims 1 to 10 or of the gesture recognition method of claim 11.
16. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the keypoint detection method of any one of claims 1 to 10 or of the gesture recognition method of claim 11.
CN201910830741.5A 2019-09-02 2019-09-02 Key point detection method, gesture recognition method, device and system Active CN110532984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910830741.5A CN110532984B (en) 2019-09-02 2019-09-02 Key point detection method, gesture recognition method, device and system

Publications (2)

Publication Number Publication Date
CN110532984A (en) 2019-12-03
CN110532984B (en) 2022-10-11

Family

ID=68666665

Country Status (1)

Country Link
CN (1) CN110532984B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015219892A (en) * 2014-05-21 2015-12-07 大日本印刷株式会社 Visual line analysis system and visual line analysis device
CN106778585A (en) * 2016-12-08 2017-05-31 腾讯科技(上海)有限公司 A kind of face key point-tracking method and device
CN108875482A (en) * 2017-09-14 2018-11-23 北京旷视科技有限公司 Object detecting method and device, neural network training method and device
CN108268869A (en) * 2018-02-13 2018-07-10 北京旷视科技有限公司 Object detection method, apparatus and system
CN108985259A (en) * 2018-08-03 2018-12-11 百度在线网络技术(北京)有限公司 Human motion recognition method and device
CN109509222A (en) * 2018-10-26 2019-03-22 北京陌上花科技有限公司 The detection method and device of straight line type objects
CN109801335A (en) * 2019-01-08 2019-05-24 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN110047095A (en) * 2019-03-06 2019-07-23 平安科技(深圳)有限公司 Tracking, device and terminal device based on target detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Identifying Spatial Relations in Images using Convolutional Neural Networks; Mandar Haldekar et al.; 2017 International Joint Conference on Neural Networks (IJCNN); 2017-07-03; pp. 3593-3600 *
Distracted driving behavior recognition based on human-body keypoints; Xia Hansheng et al.; Computer Technology and Development; 2019-07-31; Vol. 29, No. 7; pp. 1-5 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant