CN113297973A - Key point detection method, device, equipment and computer readable medium - Google Patents

Key point detection method, device, equipment and computer readable medium

Info

Publication number
CN113297973A
Authority
CN
China
Prior art keywords
key point
image
thermodynamic diagram
coordinate
keypoint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110570018.5A
Other languages
Chinese (zh)
Inventor
蔚栋
安山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110570018.5A priority Critical patent/CN113297973A/en
Publication of CN113297973A publication Critical patent/CN113297973A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure disclose a key point detection method and apparatus, an electronic device, and a computer readable medium. One embodiment of the method comprises: performing feature extraction on an image to be detected to obtain image features; inputting the image features into a pre-trained thermodynamic diagram (heatmap) regression network to obtain a key point thermodynamic diagram; inputting an output result of an intermediate layer of the thermodynamic diagram regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and generating key point position information corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set. This embodiment simultaneously satisfies the requirements of accuracy and of the association relationship between key points.

Description

Key point detection method, device, equipment and computer readable medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method, a device, equipment and a computer readable medium for detecting key points.
Background
Keypoint detection is widely used in a variety of computer vision tasks. Related keypoint detection techniques include coordinate regression and thermodynamic diagram (heatmap) regression.
However, when the above-mentioned methods are adopted for key point detection, the following technical problems often exist:
When the coordinate regression method is adopted, the obtained key point coordinates are not accurate enough. When the thermodynamic diagram regression method is adopted, the association relationship among the key points is weak. That is, the related detection methods cannot simultaneously satisfy the requirements of accuracy and of the association relationship between key points.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Some embodiments of the present disclosure propose a key point detection method, apparatus, electronic device and computer readable medium to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method of keypoint detection, the method comprising: carrying out feature extraction on an image to be detected to obtain image features; inputting the image characteristics into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram; inputting an output result of a middle layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set; and generating the position information of the key point corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
In a second aspect, some embodiments of the present disclosure provide a keypoint detection apparatus, the apparatus comprising: the extraction unit is configured to perform feature extraction on the image to be detected to obtain image features; the thermodynamic diagram generating unit is configured to input the image features into a previously trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram; the coordinate generation unit is configured to input an output result of the intermediate layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set; and the position information generating unit is configured to generate the position information of the key points corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following advantages: the key point detection method of some embodiments of the present disclosure combines a thermodynamic diagram regression network with a coordinate regression network, thereby combining the advantages of both approaches. Specifically, the thermodynamic diagram regression network predicts accurate key point coordinates, while the coordinate regression network preserves the strong association between key points, so the requirements of accuracy and of the association relationship between key points can be met at the same time.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of the keypoint detection method of some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a keypoint detection method according to the present disclosure;
FIG. 3 illustrates hand keypoint detection results obtained with a thermodynamic diagram regression network;
FIG. 4 illustrates hand keypoint detection results obtained with a coordinate regression network;
FIG. 5 is a flow diagram of further embodiments of a keypoint detection method according to the present disclosure;
FIG. 6 is a schematic block diagram of some embodiments of a keypoint detection apparatus according to the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of the keypoint detection method of some embodiments of the present disclosure.
The execution subject of the keypoint detection method may be any computing device. The computing device may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or software module. This is not specifically limited herein.
In the application scenario of fig. 1, the computing device may first input the image to be detected 101 into a feature extraction network 102 for feature extraction. The resulting image features can then be input into a pre-trained thermodynamic diagram regression network 103 to obtain a key point thermodynamic diagram 104. In addition, the computing device may input the output of an intermediate layer (the third layer in fig. 1 is taken as an example) of the thermodynamic diagram regression network 103 into a pre-trained coordinate regression network 105, resulting in a set of keypoint coordinates 106. The computing device may then generate keypoint location information 107 corresponding to the image to be detected based on the keypoint thermodynamic diagram 104 and the set of keypoint coordinates 106. For illustrative purposes, the keypoint location information 107 may be visually displayed on the image to be detected, as shown at 108.
With continued reference to fig. 2, a flow 200 of some embodiments of a keypoint detection method according to the present disclosure is shown. The key point detection method comprises the following steps:
step 201, performing feature extraction on an image to be detected to obtain image features.
In some embodiments, the execution subject of the keypoint detection method may perform feature extraction on the image to be detected using various feature extraction algorithms to obtain image features. For example, the image to be detected may be input into a convolutional neural network to obtain image features. As another example, image features may be extracted by algorithms such as a color histogram or a color correlogram. The image to be detected may be any image. For example, in a gesture recognition scenario, the image to be detected may be an image currently captured by a camera, or a pre-processed version of such a captured image. As another example, it may be an image in a gallery specified by the user.
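As a minimal illustrative sketch only (the disclosure does not prescribe a concrete backbone), feature extraction with a small convolutional network might look as follows; the `FeatureExtractor` name, layer sizes, and input resolution are assumptions for illustration.

```python
# Illustrative sketch only: a minimal convolutional backbone for step 201.
# The architecture and dimensions are assumptions, not the patented design.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, in_channels: int = 3, feat_channels: int = 256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (N, 3, H, W) -> features: (N, feat_channels, H/8, W/8)
        return self.layers(image)

# Usage: features for a 256x256 image to be detected.
features = FeatureExtractor()(torch.randn(1, 3, 256, 256))
```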
In some optional implementations of some embodiments, before performing feature extraction on the image to be detected to obtain image features, the method may further include: performing target part detection on an original image to be detected to obtain an image area displaying the target part; and scaling the image area to a target size to obtain the image to be detected. In practice, the original image to be detected may contain much more content than the target part. For example, in a hand key point detection scenario, the original image to be detected may show legs, the body, the head, and so on in addition to the hand. This other content can interfere with the detection of key points on the target part. Therefore, the target part can be detected first to obtain the image area in which it is displayed. On this basis, to facilitate unified processing, the image area can be scaled to a target size, yielding the image to be detected.
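A minimal sketch of this optional preprocessing, assuming an external target-part detector whose bounding box is already given; `crop_and_scale`, the bounding-box format, and the target size are illustrative assumptions, and OpenCV is used only for the resize.

```python
# Illustrative sketch of the optional preprocessing: crop the detected target
# region from the original image and scale it to a fixed target size.
# The detector itself is out of scope here; `bbox` stands in for its output.
import cv2
import numpy as np

def crop_and_scale(original: np.ndarray, bbox, target_size=(256, 256)):
    """bbox = (x, y, w, h) of the target part (e.g., a hand) in pixel coords."""
    x, y, w, h = bbox
    region = original[y:y + h, x:x + w]
    image_to_detect = cv2.resize(region, target_size, interpolation=cv2.INTER_LINEAR)
    # Return the crop offset and scale factors so key points can later be
    # mapped back to the original image (see the mapping step further below).
    scale = (w / target_size[0], h / target_size[1])
    return image_to_detect, (x, y), scale
```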
In some optional implementation manners of some embodiments, generating, based on the keypoint thermodynamic diagram and the keypoint coordinate set, keypoint position information corresponding to the image to be detected includes: and generating the position information of the key point corresponding to the original image to be detected based on the key point thermodynamic diagram and the key point coordinate set. In these optional implementation manners, since the image to be detected is obtained on the basis of the original image to be detected, the position information of the key point corresponding to the original image to be detected can be generated as required.
In some optional implementation manners of some embodiments, mapping the key point thermodynamic diagram and the key point coordinate set to the image to be detected respectively to obtain a target image including the thermodynamic diagram mapping key point set and the coordinate mapping key point set, including: and respectively mapping the key point thermodynamic diagram and the key point coordinate set to an original image to be detected to obtain a target image containing the thermodynamic diagram mapping key point set and the coordinate mapping key point set.
Step 202, inputting the image characteristics into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram.
In some embodiments, the executing subject may input the image features into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram. The principle of the thermodynamic diagram regression network is that a thermodynamic diagram (heatmap) is generated by applying a two-dimensional Gaussian function around the Ground Truth coordinate position of a key point, and the position coordinate with the highest activation value in the diagram is finally taken as the key point coordinate. As an example, the thermodynamic diagram regression network may include multiple (e.g., 3) deconvolution layers and an output layer. In addition, the network may further include structures such as residual blocks as needed.
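A minimal sketch of the heatmap principle described above: render a two-dimensional Gaussian around the Ground Truth coordinate as the training target, and decode a predicted heatmap by taking its highest-activation position. The Gaussian standard deviation is an assumed hyperparameter.

```python
# Sketch of the heatmap principle: render a 2D Gaussian around the Ground-Truth
# coordinate for training targets, and decode a prediction by taking the
# position with the highest activation. Sigma is an assumed hyperparameter.
import numpy as np

def render_gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    # Highest-activation position as the key point coordinate (x, y).
    idx = np.argmax(heatmap)
    y, x = np.unravel_index(idx, heatmap.shape)
    return float(x), float(y)
```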
In practice, the prediction accuracy of the thermodynamic diagram regression network is high, but it does not capture the association relationship between key points well. Fig. 3 illustrates hand key point detection results obtained with the thermodynamic diagram regression network. In practice, to facilitate subsequent processing such as gesture detection with the detected key points, the key points are generally numbered. In this example, the 3 key points on the index finger are numbered 1-3 in order, and the 3 key points on the middle finger are numbered 4-6 in order, as shown at 301 in Fig. 3. However, as shown at 302 in Fig. 3, although the positions of the individual key points are relatively accurate overall, the positions of the key points numbered 2 and 4 have shifted, and if the key points are connected in numerical order, the connection lines between them are visibly distorted. The connection lines between key points represent the association relationship between them, and subsequent gesture detection relies on this association relationship. For example, determining whether the user is making a "V" (victory) gesture requires judging the relative positional relationship between the connection lines of the key points of the middle finger and the index finger.
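To illustrate why the association relationship matters downstream, the following sketch checks a "V" (victory) gesture from the directions of the index- and middle-finger connection lines; the angle range is an assumed heuristic, not part of the disclosure.

```python
# Sketch only: why the connection relationship matters downstream. A "V"
# (victory) gesture check from the direction of the index- and middle-finger
# key point connection lines; the angle range is an assumed heuristic.
import math

def is_v_gesture(index_pts, middle_pts, min_deg=15.0, max_deg=60.0):
    """index_pts / middle_pts: key points ordered from finger base to tip."""
    def direction(pts):
        (x0, y0), (x1, y1) = pts[0], pts[-1]
        return (x1 - x0, y1 - y0)

    (ax, ay), (bx, by) = direction(index_pts), direction(middle_pts)
    cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by) + 1e-8)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return min_deg <= angle <= max_deg
```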
And 203, inputting the output result of the intermediate layer of the thermodynamic diagram regression network into a pre-trained coordinate regression network to obtain a key point coordinate set.
In some embodiments, the executing entity may input the output result of an intermediate layer of the thermodynamic diagram regression network into a pre-trained coordinate regression network to obtain a set of key point coordinates. The output result of any intermediate layer can be selected and input into the coordinate regression network. As an example, the coordinate regression network may include a plurality of convolution and pooling layers and a reshape layer. In practice, the initial coordinate regression network and the thermodynamic diagram regression network may be trained in advance with a training sample set using some machine learning method. For example, they may be trained by back propagation and stochastic gradient descent, and the coordinate regression network is obtained when a training stop condition is satisfied. The initial coordinate regression network and the thermodynamic diagram regression network may be trained separately or jointly.
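A minimal sketch of a coordinate regression head fed by an intermediate-layer output, assuming convolution, pooling, and a reshape into per-keypoint coordinates; the layer sizes, the final fully connected layer, and the number of key points (21, as in a typical hand model) are assumptions for illustration.

```python
# Sketch of a coordinate regression head (conv + pooling + reshape) that takes
# the output of an intermediate layer of the heatmap regression network.
# Layer sizes and the number of key points are assumptions for illustration.
import torch
import torch.nn as nn

class CoordinateRegressionHead(nn.Module):
    def __init__(self, in_channels: int, num_keypoints: int = 21):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),       # pool to (N, 64, 1, 1)
        )
        self.fc = nn.Linear(64, num_keypoints * 2)
        self.num_keypoints = num_keypoints

    def forward(self, intermediate: torch.Tensor) -> torch.Tensor:
        x = self.body(intermediate)
        x = torch.flatten(x, 1)            # the "reshape" step
        coords = self.fc(x)
        return coords.view(-1, self.num_keypoints, 2)  # (N, K, 2) coordinates
```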
In practice, the prediction accuracy of the coordinate regression network is lower, but it captures the association relationship between key points well. Fig. 4 illustrates hand key point detection results obtained with the coordinate regression network. In this example, similar to Fig. 3, the 3 key points on the index finger are numbered 1-3 in order and the 3 key points on the middle finger are numbered 4-6 in order, as shown at 401. As shown at 402, the connection lines between the key points do not change significantly, but the predicted position coordinates of some key points (for example, the key point numbered 4) deviate considerably.
And 204, generating key point position information corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
In some embodiments, the executing body may generate the key point position information corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set. As an example, the executing body may decode coordinates from the key point thermodynamic diagram. Specifically, the position coordinates with the highest activation value in the key point thermodynamic diagram may be obtained first and averaged with the corresponding key point coordinates in the key point coordinate set. The resulting average can then be used as the key point position information corresponding to the image to be detected.
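A minimal sketch of the averaging example above, assuming one heatmap per key point and coordinate predictions expressed in the same (heatmap) coordinate space.

```python
# Sketch of the averaging example above: combine the highest-activation
# coordinate decoded from each key point heatmap with the corresponding
# coordinate predicted by the coordinate regression network.
import numpy as np

def fuse_by_average(heatmaps, coord_predictions):
    """heatmaps: (K, H, W); coord_predictions: (K, 2) in heatmap coordinates."""
    fused = []
    for k, hm in enumerate(heatmaps):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        fused.append(((x + coord_predictions[k][0]) / 2.0,
                      (y + coord_predictions[k][1]) / 2.0))
    return np.asarray(fused)  # (K, 2) fused key point positions
```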
Some embodiments of the present disclosure provide methods that combine the advantages of both approaches by combining the thermodynamic diagram regression network and the coordinate regression network. Therefore, the requirements of accuracy and of the association relationship between key points can be met simultaneously. Moreover, compared with feeding the image features into two separate branch networks, performing coordinate regression on the output of an intermediate layer of the thermodynamic diagram regression network facilitates fusing the capabilities of the two networks, further helping to meet both requirements.
With further reference to fig. 5, a flow 500 of further embodiments of a keypoint detection method is illustrated. The process 500 of the keypoint detection method includes the following steps:
and 501, extracting the features of the image to be detected to obtain the image features.
In some embodiments, the execution subject on which the keypoint detection method runs (e.g., the computing device shown in FIG. 1) performs feature extraction on the image to be detected to obtain image features.
And 502, inputting the image characteristics into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram.
And 503, inputting the output result of the intermediate layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set.
In some embodiments, specific implementations of steps 501-503 and technical effects thereof may refer to those embodiments corresponding to fig. 2, and are not described herein again.
And step 504, respectively mapping the key point thermodynamic diagrams and the key point coordinate sets to the image to be detected to obtain a target image comprising the thermodynamic diagram mapping key point sets and the coordinate mapping key point sets.
In some embodiments, for the key point thermodynamic diagram, the executing subject of the key point detection method may take at least one position with the highest activation value in each diagram and, on this basis, obtain the thermodynamic diagram mapping key points in the image to be detected through a certain mapping. For each key point coordinate in the key point coordinate set, a coordinate mapping key point can likewise be obtained through a certain mapping. Depending on actual needs, the mapping may include a matrix transformation, multiplication by fixed coefficients, and so on. It can be understood that the target image is obtained by mapping the key point thermodynamic diagram and the key point coordinate set onto the image to be detected.
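A minimal sketch of the simple fixed-coefficient case of this mapping; the scale factors and crop offset are assumed to come from the preprocessing step, and the numbers in the usage example are arbitrary.

```python
# Sketch of the simple fixed-coefficient mapping: scale heatmap-space and
# network-output coordinates back to image (or original-image) pixel space.
# The scale factors and crop offset correspond to the preprocessing step.
def map_to_image(points, scale_x, scale_y, offset_x=0.0, offset_y=0.0):
    """points: iterable of (x, y) in heatmap/network space."""
    return [(x * scale_x + offset_x, y * scale_y + offset_y) for x, y in points]

# Example: a 64x64 heatmap predicted for a 256x256 crop taken at (120, 80) of
# the original image -> scale factor 4 on both axes plus the crop offset.
thermo_mapped = map_to_image([(10, 20), (30, 15)], 4.0, 4.0, 120.0, 80.0)
```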
And 505, selecting a key point from each key point group in the target image as a target key point to obtain a target key point set, wherein each key point group comprises a corresponding thermodynamic diagram mapping key point and a coordinate mapping key point.
In some embodiments, the thermodynamic diagram mapping key point set and the coordinate mapping key point set are both prediction results for the key points in the image to be detected. Thus, for the same location (e.g., the tip of the thumb), there will be one thermodynamic diagram mapping key point and one coordinate mapping key point, i.e., one key point group corresponding to that location. The two key points in a key point group correspond to the same location and to each other. For each key point group, one key point can be selected as the key point for that location, i.e., the target key point. Since the target image contains a plurality of locations, there are a plurality of key point groups, and a target key point set is thus obtained. As an example, one key point may be chosen at random as the target key point. In this way, the combination of the two networks is realized as a whole.
In some optional implementations of some embodiments, one key point is selected from each key point group as the target key point based on the distance between the two key points in that group. As an example, the distance between the two key points may be the Euclidean distance.
As an example, in response to determining that the distance is less than or equal to a preset threshold, the prediction results of the two networks are relatively close. That is, the key point position information is relatively accurate regardless of which network's prediction is selected. In this case, the coordinate mapping key point in the key point group is preferentially determined as the target key point; because the coordinate mapping key point also satisfies the association-relationship requirement, the dual requirements of accuracy and of the association relationship between key points are met at the same time. In response to determining that the distance is greater than the preset threshold, the thermodynamic diagram mapping key point in the key point group is determined as the target key point. In these implementations, selecting key points by means of a threshold further balances the requirements of accuracy and of the association relationship between key points.
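A minimal sketch of the distance-threshold selection rule described in this paragraph; the threshold value is an assumption.

```python
# Sketch of the threshold-based selection described in this paragraph:
# when the two predictions for the same location are close, prefer the
# coordinate-mapped key point (it preserves the association relationship);
# otherwise fall back to the heatmap-mapped key point. Threshold is assumed.
import math

def select_target_keypoints(thermo_points, coord_points, threshold=5.0):
    targets = []
    for (tx, ty), (cx, cy) in zip(thermo_points, coord_points):
        distance = math.hypot(tx - cx, ty - cy)  # Euclidean distance
        targets.append((cx, cy) if distance <= threshold else (tx, ty))
    return targets
```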
Step 506, determining the position information of each target key point in the target key point set as the key point position information corresponding to the image to be detected.
In some optional implementations of some embodiments, the thermodynamic diagram regression network and the coordinate regression network are trained by: training the initial thermodynamic diagram regression network at a first learning rate until a convergence condition is met to obtain an intermediate thermodynamic diagram regression network; and performing combined training on the intermediate thermodynamic diagram regression network and the initial coordinate regression network at a second learning rate until a training end condition is met to obtain the thermodynamic diagram regression network and the coordinate regression network, wherein the second learning rate is less than the first learning rate.
In practice, the thermodynamic diagram regression network is more sensitive to the locations of key points, while the coordinate regression network is more sensitive to the association between key points. Therefore, the two converge in different directions during training. If the two networks are trained jointly from the start, they must converge in both directions simultaneously, which inevitably affects the overall convergence speed and the accuracy of the network's predictions.
In addition, the learning rate is an important hyperparameter in deep learning, which determines whether and when the objective function can converge to a local minimum. An appropriate learning rate enables the objective function to converge to a local minimum in an appropriate time.
Based on this, in these implementations, the initial thermodynamic diagram regression network may first be trained with a larger first learning rate, so that it converges quickly and learns the key point location information first. On this basis, the intermediate thermodynamic diagram regression network and the initial coordinate regression network are jointly trained with a smaller second learning rate, so that the association relationship between key points can be learned while a local optimum is approached, thereby meeting the requirements of accuracy and of the association relationship between key points.
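A minimal sketch of this two-stage training schedule, assuming the heatmap regression network returns both its heatmaps and the chosen intermediate-layer output, and that the data loader yields images with Ground Truth heatmaps and coordinates; learning rates, loss weighting, and stopping conditions are illustrative assumptions.

```python
# Sketch of the two-stage training schedule: first train the heatmap regression
# network alone at a larger learning rate, then jointly fine-tune it with the
# coordinate regression network at a smaller learning rate. Learning rates,
# loss weights, and stopping conditions are assumptions for illustration.
import torch
import torch.nn as nn

def train_two_stage(heatmap_net, coord_net, loader, device="cpu"):
    mse = nn.MSELoss()

    # Stage 1: heatmap network only, larger (first) learning rate.
    opt1 = torch.optim.SGD(heatmap_net.parameters(), lr=1e-2)
    for images, gt_heatmaps, _ in loader:
        opt1.zero_grad()
        # Assumption: heatmap_net returns (heatmaps, intermediate_features).
        pred_heatmaps, _ = heatmap_net(images.to(device))
        loss = mse(pred_heatmaps, gt_heatmaps.to(device))
        loss.backward()
        opt1.step()

    # Stage 2: joint training, smaller (second) learning rate.
    params = list(heatmap_net.parameters()) + list(coord_net.parameters())
    opt2 = torch.optim.SGD(params, lr=1e-3)
    for images, gt_heatmaps, gt_coords in loader:
        opt2.zero_grad()
        pred_heatmaps, intermediate = heatmap_net(images.to(device))
        pred_coords = coord_net(intermediate)
        loss = (mse(pred_heatmaps, gt_heatmaps.to(device))
                + mse(pred_coords, gt_coords.to(device)))
        loss.backward()
        opt2.step()
```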
Compared with the embodiments corresponding to Fig. 2, the flow 500 of the key point detection method in the embodiments corresponding to Fig. 5 obtains the target key points by selecting one key point from each key point group. Overall, some target key points come from the key point thermodynamic diagram and others come from the key point coordinate set, so the advantages of the two networks are integrated and the requirements of accuracy and of the association relationship between key points are met simultaneously.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a keypoint detection apparatus, which correspond to those of the method embodiments shown in fig. 2, and which may be applied in various electronic devices in particular.
As shown in fig. 6, the keypoint detection apparatus 600 of some embodiments comprises: an extraction unit 601, a thermodynamic diagram generation unit 602, a coordinate generation unit 603, and a position information generation unit 604. Wherein, the extraction unit 601 is configured to perform feature extraction on the image to be detected, resulting in image features. The thermodynamic diagram generation unit 602 is configured to input the image features into a previously trained thermodynamic diagram regression network, resulting in a keypoint thermodynamic diagram. The coordinate generation unit 603 is configured to input the output result of the intermediate layer of the thermodynamic regression network into a coordinate regression network trained in advance, resulting in a set of keypoint coordinates. The position information generating unit 604 is configured to generate the position information of the keypoint corresponding to the image to be detected based on the keypoint thermodynamic diagram and the set of keypoint coordinates.
In an optional implementation of some embodiments, the location information generating unit 604 is further configured to: respectively mapping the key point thermodynamic diagrams and the key point coordinate sets to an image to be detected to obtain a target image comprising the thermodynamic diagram mapping key point sets and the coordinate mapping key point sets; selecting a key point from each key point group in the target image as a target key point to obtain a target key point set, wherein each key point group comprises a corresponding thermodynamic diagram mapping key point and a coordinate mapping key point; and determining the position information of each target key point in the target key point set as the key point position information corresponding to the image to be detected.
In an optional implementation of some embodiments, the location information generating unit 604 is further configured to: and selecting one key point from the key point groups as a target key point based on the distance between two key points in each key point group.
In an optional implementation of some embodiments, the location information generating unit 604 is further configured to: determine the thermodynamic diagram mapping key point in the key point group as the target key point in response to determining that the distance is less than or equal to a preset threshold; and determine the coordinate mapping key point in the key point group as the target key point in response to determining that the distance is greater than the preset threshold.
In an alternative implementation of some embodiments, the thermodynamic regression network includes a plurality of deconvolution layers; and the coordinate generation unit 603 is configured to: and inputting the output result of the last deconvolution layer in the plurality of deconvolution layers into a coordinate regression network to obtain a key point coordinate set.
In an alternative implementation of some embodiments, the thermodynamic and coordinate regression networks are trained by: training the initial thermodynamic diagram regression network at a first learning rate until a convergence condition is met to obtain an intermediate thermodynamic diagram regression network; and performing combined training on the intermediate thermodynamic diagram regression network and the initial coordinate regression network at a second learning rate until a training end condition is met to obtain the thermodynamic diagram regression network and the coordinate regression network, wherein the second learning rate is less than the first learning rate.
In an optional implementation of some embodiments, the apparatus 600 further comprises: a detection unit and a scaling unit. The detection unit is configured to perform target portion detection on an original image to be detected, resulting in an image area displaying the target portion. The zooming unit is configured to zoom the image area to a target size, and an image to be detected is obtained. The location information generating unit 604 is further configured to: and generating the position information of the key point corresponding to the original image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
In an optional implementation of some embodiments, the location information generating unit 604 is further configured to: and respectively mapping the key point thermodynamic diagram and the key point coordinate set to an original image to be detected to obtain a target image containing the thermodynamic diagram mapping key point set and the coordinate mapping key point set.
It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
Referring now to fig. 7, shown is a schematic diagram of an electronic device 700 suitable for use in implementing some embodiments of the present disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 may include a processing means (e.g., central processing unit, graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from storage 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via communications means 709, or may be installed from storage 708, or may be installed from ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: carrying out feature extraction on an image to be detected to obtain image features; inputting the image characteristics into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram; inputting an output result of a middle layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set; and generating the position information of the key point corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes an extraction unit, a thermodynamic diagram generation unit, a coordinate generation unit, and a position information generation unit. The names of the units do not in some cases constitute a limitation on the units themselves, and for example, the extraction unit may also be described as a "unit for performing feature extraction on an image to be detected".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only of preferred embodiments of the present disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (11)

1. A keypoint detection method comprising:
carrying out feature extraction on an image to be detected to obtain image features;
inputting the image characteristics into a pre-trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram;
inputting an output result of the intermediate layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set;
and generating the position information of the key point corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
2. The method of claim 1, wherein the generating, based on the keypoint thermodynamic diagram and the set of keypoint coordinates, keypoint location information corresponding to the image to be detected comprises:
respectively mapping the key point thermodynamic diagram and the key point coordinate set to the image to be detected to obtain a target image containing a thermodynamic diagram mapping key point set and a coordinate mapping key point set;
selecting a key point from each key point group in the target image as a target key point to obtain a target key point set, wherein each key point group comprises a corresponding thermodynamic diagram mapping key point and a coordinate mapping key point;
and determining the position information of each target key point in the target key point set as the key point position information corresponding to the image to be detected.
3. The method of claim 2, wherein said selecting a keypoint from each keypoint group in the target image as a target keypoint comprises:
and selecting one key point from each key point group as a target key point based on the distance between two key points in each key point group.
4. The method of claim 3, wherein the selecting one keypoint from the keypoint groups as a target keypoint based on the distance between two keypoints in each keypoint group comprises:
determining thermodynamic diagram mapping key points in the key point group as target key points in response to determining that the distance is smaller than or equal to a preset threshold;
in response to determining that the distance is greater than the preset threshold, determining a coordinate mapping keypoint of the keypoint group as a target keypoint.
5. The method of claim 1, wherein the thermodynamic regression network comprises a plurality of deconvolution layers; and
the inputting the output result of the intermediate layer of the thermodynamic diagram regression network into a coordinate regression network to obtain a key point coordinate set comprises:
and inputting the output result of the last deconvolution layer in the plurality of deconvolution layers into a coordinate regression network to obtain a key point coordinate set.
6. The method of claim 1, wherein the thermodynamic regression network and the coordinate regression network are trained by:
training the initial thermodynamic diagram regression network at a first learning rate until a convergence condition is met to obtain an intermediate thermodynamic diagram regression network;
and performing combined training on the intermediate thermodynamic diagram regression network and the initial coordinate regression network at a second learning rate until a training end condition is met to obtain the thermodynamic diagram regression network and the coordinate regression network, wherein the second learning rate is less than the first learning rate.
7. The method according to claim 2, wherein before the feature extraction is performed on the image to be detected to obtain image features, the method further comprises:
carrying out target part detection on an original image to be detected to obtain an image area for displaying a target part;
zooming the image area to a target size to obtain the image to be detected; and
generating the position information of the key point corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set, wherein the position information comprises:
and generating the position information of the key point corresponding to the original image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
8. The method of claim 7, wherein the mapping the key point thermodynamic diagram and the key point coordinate set to the image to be detected respectively to obtain a target image comprising a thermodynamic diagram mapping key point set and a coordinate mapping key point set comprises:
and mapping the key point thermodynamic diagram and the key point coordinate set to the original image to be detected respectively to obtain a target image containing the thermodynamic diagram mapping key point set and the coordinate mapping key point set.
9. A keypoint detection device comprising:
the extraction unit is configured to perform feature extraction on the image to be detected to obtain image features;
the thermodynamic diagram generating unit is configured to input the image features into a previously trained thermodynamic diagram regression network to obtain a key point thermodynamic diagram;
the coordinate generating unit is configured to input an output result of the intermediate layer of the thermodynamic diagram regression network into a coordinate regression network trained in advance to obtain a key point coordinate set;
and the position information generating unit is configured to generate the position information of the key points corresponding to the image to be detected based on the key point thermodynamic diagram and the key point coordinate set.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-8.
CN202110570018.5A 2021-05-25 2021-05-25 Key point detection method, device, equipment and computer readable medium Pending CN113297973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110570018.5A CN113297973A (en) 2021-05-25 2021-05-25 Key point detection method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110570018.5A CN113297973A (en) 2021-05-25 2021-05-25 Key point detection method, device, equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN113297973A true CN113297973A (en) 2021-08-24

Family

ID=77324643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110570018.5A Pending CN113297973A (en) 2021-05-25 2021-05-25 Key point detection method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN113297973A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332977A (en) * 2021-10-14 2022-04-12 北京百度网讯科技有限公司 Key point detection method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN110532981B (en) Human body key point extraction method and device, readable storage medium and equipment
CN111369427B (en) Image processing method, image processing device, readable medium and electronic equipment
CN109829432B (en) Method and apparatus for generating information
JP7096888B2 (en) Network modules, allocation methods and devices, electronic devices and storage media
CN110516678B (en) Image processing method and device
CN113255619B (en) Lane line recognition and positioning method, electronic device, and computer-readable medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN111601129B (en) Control method, control device, terminal and storage medium
CN112183388B (en) Image processing method, device, equipment and medium
CN113297973A (en) Key point detection method, device, equipment and computer readable medium
CN112966592A (en) Hand key point detection method, device, equipment and medium
CN112200183A (en) Image processing method, device, equipment and computer readable medium
CN115880719A (en) Gesture depth information generation method, device, equipment and computer readable medium
CN109410121B (en) Human image beard generation method and device
CN110633595B (en) Target detection method and device by utilizing bilinear interpolation
CN115690845A (en) Motion trail prediction method and device
CN113703704B (en) Interface display method, head-mounted display device, and computer-readable medium
JP7269979B2 (en) Method and apparatus, electronic device, computer readable storage medium and computer program for detecting pedestrians
CN111968030B (en) Information generation method, apparatus, electronic device and computer readable medium
CN111292365B (en) Method, apparatus, electronic device and computer readable medium for generating depth map
CN114637400A (en) Visual content updating method, head-mounted display device assembly and computer readable medium
CN110263743B (en) Method and device for recognizing images
CN113778078A (en) Positioning information generation method and device, electronic equipment and computer readable medium
CN113378773B (en) Gesture recognition method, gesture recognition device, gesture recognition apparatus, gesture recognition storage medium, and gesture recognition program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination