CN110334807B

CN110334807B - Deep learning network training method, device, equipment and storage medium

Info

Publication number: CN110334807B
Application number: CN201910471718.1A
Authority: CN
Inventors: 刘思阳
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2021-09-28
Anticipated expiration: 2039-05-31
Also published as: CN110334807A

Abstract

The invention discloses a training method, a device, equipment and a storage medium of a deep learning network. The method comprises the following steps: acquiring a true value image and a predicted image corresponding to the true value image output by a deep learning network; calculating the pixel mean error between the true value image and the predicted image; determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image; and determining the comprehensive loss of the image according to the basic loss of the image, wherein the comprehensive loss of the image is used for training the deep learning network. The method calculates the pixel mean error between a true value image and a predicted image; and the basic loss of the image is determined by utilizing the preset exponential loss function, the punishment on the all-zero image is enhanced, the derivative of the loss function is not flat, the local optimum is not easy to fall into, and the problem of saddle points is avoided.

Description

Deep learning network training method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of computers, in particular to a training method, a training device, training equipment and a storage medium for a deep learning network.

Background

With the continuous progress of science and technology, deep learning networks are gradually emerging. In the field of image processing, the deep learning network, as an operation model, can extract features in an image, for example: the deep learning network can be used for identifying key points of a human body, segmenting human figures and the like.

At present, before the deep learning network is used, the deep learning network needs to be trained, and in the training process, the deep learning network can predict accurate image information by continuously adjusting the deep learning network. However, when the deep learning network is trained, due to the complexity of the deep learning network, a situation that an image falls into a local optimum (saddle point) is easy to occur in the training process, that is, the loss gradient of the image processed by the deep learning network is not obvious, the penalty of a loss function on an all-zero-pixel image is small, and thus even if the deep learning network outputs the all-zero-pixel image, the obtained image loss is still small. If the training is rigor in local optimum, aiming at the image with unobvious loss gradient, the deep learning network always outputs the image with all zero pixels, and the global optimum can not be achieved, namely the accurate image information can not be output.

For example: as shown in FIG. 1, which is a true-value heat diagram of a human body key point, the pixel value of a white point in the center of FIG. 1 is 1, the pixel values of gray points around the white point are between 0 and 1, and the pixel values of the rest black points are 0. The situation that the heat map is trapped in a saddle point is as follows: the predicted heat map output by the deep learning network is a completely black heat map with all 0 pixel values, because the loss gradient of the true value heat map is not obvious, the loss between the true value heat map and the predicted heat map output by the deep learning network is small, the penalty of the existing loss function on the completely zero heat map is small, and the correct predicted heat map cannot be obtained, so that the deep learning network is easy to fall into the completely zero local optimum during training.

Disclosure of Invention

The invention mainly aims to provide a training method, a device, equipment and a storage medium of a deep learning network, so as to solve the problem that the existing deep learning network is easy to sink in all-zero local optimum during training.

Aiming at the technical problems, the invention solves the technical problems by the following technical scheme:

the invention provides a training method of a deep learning network, which comprises the following steps: acquiring a true value image and a predicted image corresponding to the true value image output by a deep learning network; calculating the pixel mean error between the true value image and the predicted image; determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image; and determining the comprehensive loss of the image according to the basic loss of the image, wherein the comprehensive loss of the image is used for training the deep learning network.

Wherein, the determining the image comprehensive loss according to the image basic loss comprises: determining the image basic loss as the image comprehensive loss; or, determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss.

Wherein the truth-value image is a truth-value heat map; the predicted image is a prediction heat image; the obtaining of the true value image and the prediction image corresponding to the true value image output by the deep learning network include: acquiring a plurality of truth value heat maps corresponding to the same image and a prediction heat map corresponding to each truth value heat map output by a deep learning network; or acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true value heat map output by the deep learning network.

If a plurality of true heat maps corresponding to the same image and a prediction heat map corresponding to each true heat map output by the deep learning network are obtained, calculating a pixel mean error between the true image and the prediction image, including: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; and calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

If a plurality of true heat maps corresponding to each image in a plurality of images and a prediction heat map corresponding to each true heat map output by a deep learning network are obtained, calculating a pixel mean error between the true image and the prediction image, including: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; respectively calculating the mean value of pixel mean errors between each true value heat map corresponding to each image and the corresponding prediction heat map; and carrying out average value calculation on the average value of the pixel average value errors corresponding to each image to obtain the average pixel average value error between each true value heat map and the corresponding prediction heat map.

The method for determining the image basic loss by using the preset exponential loss function comprises the following steps:

the HM _ EMVD _ loss is the basic loss of the image, the DIFF is the pixel mean error, e is a natural constant, and α is a preset parameter.

The invention provides a training device of a deep learning network, which comprises an acquisition module, a training module and a control module, wherein the acquisition module is used for acquiring a true value image and a predicted image which is output by the deep learning network and corresponds to the true value image; the calculating module is used for calculating the pixel mean error between the true value image and the predicted image; the first determining module is used for determining the basic loss of the image by utilizing a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image; and the second determination module is used for determining the comprehensive loss of the image according to the basic loss of the image, and the comprehensive loss of the image is used for performing training on the deep learning network.

Wherein the first determining module is further configured to: determining the image basic loss as the image comprehensive loss; or, determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss.

Wherein the truth-value image is a truth-value heat map; the predicted image is a prediction heat image; the obtaining module is further configured to: acquiring a plurality of truth value heat maps corresponding to the same image and a prediction heat map corresponding to each truth value heat map output by a deep learning network; or acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true value heat map output by the deep learning network.

Wherein, if the obtaining module obtains a plurality of truth-value heat maps corresponding to the same image and a prediction heat map corresponding to each truth-value heat map output by the deep learning network, the calculating module is further configured to: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; and calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

Wherein, if the obtaining module obtains a plurality of true heat maps corresponding to each image in a plurality of images and a predicted heat map corresponding to each true heat map output by the deep learning network, the calculating module is further configured to: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; respectively calculating the mean value of pixel mean errors between each true value heat map corresponding to each image and the corresponding prediction heat map; and carrying out average value calculation on the average value of the pixel average value errors corresponding to each image to obtain the average pixel average value error between each true value heat map and the corresponding prediction heat map.

The first determining module is specifically configured to perform the following calculation:

The invention provides training equipment of a deep learning network, which comprises a processor and a memory, wherein the processor is used for processing a training program; the processor is used for executing the training program of the deep learning network stored in the memory so as to realize the training method of the deep learning network.

The present invention provides a storage medium storing one or more programs executable by one or more processors to implement the above-described method of training a deep learning network.

The invention has the following beneficial effects:

the method calculates the pixel mean error between a true value image and a predicted image; and the basic loss of the image is determined by utilizing a preset exponential loss function, the mode enables the difference between the true value image and the predicted image to be more obvious, the loss of the all-zero image output by the deep learning network is larger, namely the punishment on the all-zero image is enhanced, the local optimum is not easy to fall into, and the problem of saddle points is avoided.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a truth heat map of prior art human body key points;

FIG. 2 is a flow diagram of a method of training a deep learning network according to an embodiment of the invention;

FIG. 3 is a flowchart of the steps for determining a substantial loss of an image according to one embodiment of the present invention;

FIG. 4 is a flowchart of the steps for determining image integration loss according to one embodiment of the present invention;

FIG. 5 is a block diagram illustrating an architecture for determining image integration loss according to an embodiment of the present invention;

FIG. 6 is a flowchart of the steps for determining a substantial loss of an image according to another embodiment of the present invention;

FIG. 7 is a flowchart of the steps for determining image integration loss according to another embodiment of the present invention;

FIG. 8 is a block diagram of a training apparatus for a deep learning network according to an embodiment of the present invention;

FIG. 9 is a block diagram of a training device of a deep learning network according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.

The embodiment provides a training method forgotten by deep learning. Fig. 2 is a flowchart illustrating a training method of a deep learning network according to an embodiment of the invention.

Step S210, obtaining a true value image and a predicted image corresponding to the true value image output by the deep learning network.

The predicted image is an analysis result of the deep learning network prediction.

The true value image is the correct analysis result.

And one predicted image corresponds to a true value image, the accuracy of the predicted image is measured by using the true value image, and whether the deep learning network needs to be trained continuously is determined.

Furthermore, a sample image is collected in advance, sample analysis is performed on the sample image manually, and an analysis result is labeled to obtain a true value image. And inputting the sample image into a deep learning network, training the deep learning network to perform sample analysis, and outputting an analysis result which is a predicted image. For example: manually analyzing the human body key points of the sample image to obtain an analysis result of a human body key point image which can be extracted from the sample image, and performing pixel annotation on the human body key points in the human body key point image to obtain a true value image of the human body key points; inputting a sample image into a deep learning network, training the deep learning network to output a human key point image, wherein the human key point image output by the deep learning network is a predicted image of a human key point; the accuracy of the predicted image is generally lower than that of a true image, and the deep learning network is trained to improve the accuracy of the predicted image.

In this embodiment, the true value image may be a true value heat map, and the predicted image may be a predicted heat map. The heat map may be a heat map of key points of the human body.

Furthermore, a plurality of true value heat maps corresponding to the same image and a prediction heat map output by the deep learning network and corresponding to each true value heat map can be obtained; or acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map output by the deep learning network and corresponding to each true value heat map.

Step S220, calculating a pixel mean error between the true image and the predicted image.

The pixel mean error is: the error of the pixel mean of the true image and the pixel mean of the corresponding predicted image. The pixel mean means that all pixel points in the image are summed and averaged. The error of the pixel mean refers to a square of a difference value between the pixel mean of the true image and the pixel mean of the corresponding predicted image.

The calculation procedure of the pixel mean error will be described in the following embodiments.

Step S230, determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image.

The exponential loss function may be set as desired. For example: the exponential loss function, may be:

the HM _ EMVD _ loss is the basic loss of the image, the DIFF is the pixel mean error, e is a natural constant, and α is a preset parameter. α can be set and adjusted according to specific requirements, for example: different settings are carried out according to different multitask learning types.

The true value image and the predicted image corresponding to each other are taken as a set of true value image and predicted image, and if a plurality of sets of true value image and predicted image are acquired, it is necessary to calculate the pixel mean error of each set of true value image and predicted image and calculate the average value of a plurality of pixel mean errors, substitute the average pixel mean error into the exponential power of the exponential loss function, and further, take the average pixel mean error as DIFF.

In this embodiment, the image basis loss may be named EMVD (exponential mean difference loss) loss. The calculation function of the EMVD loss may be named EMVD loss function.

And step S240, determining the comprehensive loss of the image according to the basic loss of the image, so as to train the deep learning network according to the comprehensive loss of the image.

The image integration loss is the total loss of the prediction image.

In this embodiment, the image integration loss may be determined using an EMVD loss function, or may be determined using an EMVD loss function and a preset loss function. That is, the image fundamental loss can be directly determined as the image synthetic loss; or, determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss. Further, the number of the preset loss functions may be one or more. The predetermined loss function is, for example, an L1 loss function, an L2 loss function.

In this embodiment, the present invention calculates the pixel mean error between the true image and the predicted image; and determining the basic loss of the image according to the pixel mean error, wherein the mode makes the difference between the true value image and the predicted image more obvious, and makes the loss of the full-zero image output by the deep learning network larger, namely, the punishment to the full-zero image is enhanced, the local optimum is not easy to fall into, and the problem of saddle point is avoided.

The method can be applied to a deep learning network of multi-task learning, the punishment on the all-zero image is enhanced, the EMVD loss function is continuously derivable, the derivative of the loss function is not flat, and the local optimization is not easy to fall into.

A more specific way of determining the image integration loss is given below.

The embodiment is described with respect to a case of a single sample (a single image), that is, a plurality of true-value heat maps corresponding to the same image and a predicted heat map corresponding to each true-value heat map output by the deep learning network are obtained, and the image comprehensive loss is determined by using the plurality of true-value heat maps and the plurality of predicted heat maps, so as to perform training on the deep learning network according to the image comprehensive loss.

FIG. 3 is a flowchart of the steps for determining the substantial loss of an image according to an embodiment of the present invention.

Step S310, a plurality of true heat maps corresponding to the same image and a predicted heat map corresponding to each true heat map output by the deep learning network are obtained.

Acquiring k (k is more than or equal to 1) truth value heat maps corresponding to the images and a predicted heat map corresponding to each truth value heat map to obtain k truth value heat maps and k predicted heat maps corresponding to the same image, namely k groups of truth value heat maps and predicted heat maps.

Step S320, performing summation and averaging on all pixel points in each true value heat map to obtain a pixel mean value of each true value heat map, and forming a matrix including the pixel mean values of all true value heat maps.

And calculating a matrix formed by pixel means of all the true-value heat maps. Specifically, the calculation formula may be:

SUM_true＝reduce_mean(h_true,axis＝(0,1))；

wherein the SUM_trueA matrix composed of the pixel means of each true heat map. h is_trueThe real-value heat map group is formed by superposing a plurality of real-value heat maps, axis (0,1) represents the pixel summation of all pixel points in each plane formed by a width axis (0 axis) and a height axis (1 axis) in the real-value heat map group, and reduce _ mean is the average valueA function. reduce mean is used to calculate the mean.

SUM_trueOf dimension 1 xk, i.e. 1 row by k columns of the matrix, SUM_trueEach element in the group is a pixel mean value of a true-value heat map, and the pixel mean value of each true-value heat map is obtained by summing and averaging all pixel points in the true-value heat map.

h_trueThe dimension of (d) is w × h × k, w (0 axis) represents the width of the true-value heat map, h (1 axis) represents the height of the true-value heat map, and k (2 axis) represents the number of the true-value heat maps. The plane formed by the 0 axis and the 1 axis is the plane of the true heat map, that is, h_trueK w × h are added together.

Step S330, summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map, and obtain a matrix comprising the pixel mean values of all prediction heat maps.

And calculating a matrix formed by pixel mean values of all the predicted heat maps. Specifically, the calculation formula may be:

SUM_pred＝reduce_mean(h_pred,axis＝(0,1))；

wherein the SUM_predA matrix of pixel means for each predicted heat map. h is_predThe method comprises the steps of representing a prediction heat map group formed by superposing a plurality of prediction heat maps, wherein axis (0,1) represents the sum of pixels of all pixel points in each plane formed by a width axis (0 axis) and a height axis (1 axis) in the prediction heat map group, and reduce _ mean is an average value function.

SUM_predOf dimension 1 xk, i.e. 1 row by k columns of the matrix, SUM_predEach element in the prediction heat map is a pixel mean value of a prediction heat map, and the pixel mean value of each prediction heat map is obtained by summing and averaging all pixel points in the prediction heat map.

h_predThe dimension of (d) is w × h × k, w (0 axis) represents the width of the predicted heat map, h (1 axis) represents the height of the predicted heat map, and k represents the number of predicted heat maps. The plane composed of the 0 axis and the 1 axis is the plane of the predicted heat map, that is, h_predK pieces of w × h are superimposed together. The corresponding true heat map and predicted heat map have equal w and equal h.

Step S340, calculating a pixel mean error between the corresponding true-value heat map and the predicted heat map according to the pixel mean of each true-value heat map and the pixel mean of each predicted heat map.

In the obtained true heat maps and the predicted heat maps, pixel mean errors between the corresponding true heat maps and the predicted heat maps are calculated according to the pixel mean of each true heat map and the pixel mean of each predicted heat map, and a matrix formed by the pixel mean errors is formed.

Specifically, the calculation formula may be:

DIFFS＝(SUM_true-SUM_pred)²；

the DIFFS is a matrix formed by pixel mean errors between the corresponding true heat map and the predicted heat map.

DIFFS has dimension of 1 × k, and each element in DIFFS is SUM_trueThe pixel mean and SUM of a true heat map_predThe difference of the pixel mean of the corresponding predicted heat map is squared, i.e. each element in DIFFS is SUM_trueThe pixel mean and SUM of a true heat map_predThe pixel mean error of the corresponding predicted heat map.

Step S350, calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

Calculating the average value of all pixel average value errors to obtain the pixel average value error corresponding to the image, namely: an average pixel mean error between each true heat map and the corresponding predicted heat map.

Specifically, the calculation formula may be:

DIFF＝reduce_mean(DIFFS)；

wherein DIFF is the average of all pixel mean errors, i.e. the averaged pixel mean error between each true heat map and the corresponding predicted heat map, and reduce _ mean (DIFFs) is the average of the sums of all elements (all pixel mean errors) in DIFFs.

Step S360, determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function includes: pixel mean error between the true heat map (image-wise) and the predicted heat map.

Specifically, the calculation formula may be:

the HM _ EMVD _ loss represents the basic loss of the image, e is a natural constant, and alpha is a preset parameter. α can be set and adjusted according to specific requirements, for example: different settings are carried out according to different multitask learning types.

In this embodiment, the process of determining the basic loss of the image may be referred to as an execution process of the EMVD function.

In the present embodiment, the image fundamental loss can be directly determined as the image integrated loss. Of course, the image supplementary loss may be determined first, and the image comprehensive loss may be determined according to the image basic loss and the image supplementary loss.

FIG. 4 is a flowchart illustrating the steps of determining the total loss of an image according to an embodiment of the present invention. FIG. 5 is a block diagram illustrating an architecture for determining image integration loss according to an embodiment of the present invention.

In step S410, the image compensation loss is calculated by using a predetermined loss function with respect to the true heat map and the predicted heat map corresponding thereto.

The predetermined loss function is, for example: l1 loss function, L2 loss function.

Taking the L2 loss function as an example, the following calculation steps need to be performed:

step S1, determine l₁＝(h_true-h_pred)²。

Wherein l₁Is a matrix of squares of pixel differences of the true heat map and the corresponding predicted heat map, l₁Has dimensions w × h × k. w isThe width axis (0 axis), h is the height axis (1 axis), and the plane formed by the 0 axis and the 1 axis is the plane formed by the squares of the pixel differences of the true heat map and the corresponding predicted heat map. The number of the truth heat map and the prediction heat map is equal to k, and l is₁K w × h are added together.

Specifically, a true value heat map and a predicted heat map corresponding to each other are obtained, and a difference and a square of corresponding pixel points of the true value heat map and the predicted heat map are obtained.

Step S2, determine l₂＝reduce_mean(l₁,axis＝(0,1))。

Wherein l₂Represents a pair of₁The sum of the pixels of all the pixels in each plane composed of the middle width axis (0 axis) and the height axis (1 axis) is averaged, and axis (0,1) represents the pair l₁And summing the pixels of all pixel points in each plane formed by the middle 0 axis and the 1 axis, wherein the reduce _ mean is an average value function.

l₂Has a dimension of 1 xk, l₂Each element in (a) may represent an error between a true heat map and a corresponding predicted heat map.

In step S3, HM _ L2_ loss _ reduce _ mean (L) is calculated₂)。

Where HM _ L2_ loss is the L2 penalty, which represents the mean of the error between each true heat map and the corresponding predicted heat map.

Step S420, calculating the basic image loss by using an EMVD loss function according to the true heat map and the corresponding predicted heat map.

The step of calculating the basic image loss by using the EMVD loss function has already been described above, and is not described herein again.

In step S430, a weighted sum of the image fundamental loss and the image supplemental loss is calculated, and the weighted sum is determined as an image integrated loss.

Specifically, the calculation formula may be:

HM_loss＝W₁×HM_L2_loss+W₂×HM_EMVD_loss；

where HM _ loss represents the total loss of an image, and HM _ L2_ loss is L2 loss (complementary loss of an image)) HM _ EMVD _ loss is the basic loss of the image, W₁Is a first weight, W₂Is the second weight. The first weight and the second weight can be set according to experience and can also be continuously adjusted in the process of training the deep learning network.

After the image comprehensive loss is obtained, the deep learning network can be adjusted according to the image comprehensive loss so as to enable the deep learning network to output a more accurate predicted heat map.

According to the method, the image comprehensive loss is determined by adopting the image basic loss and the image supplementary loss, the difference between the true value image and the predicted image is further increased by using two loss functions, the punishment of a deep learning network on the all-zero heat map is enhanced, the deep learning network trapped under the local optimal condition has larger loss of outputting the all-zero heat map, the influence is increased in the back propagation process, the deep learning network is favorable for jumping out of the local optimal condition, and therefore the global optimal training effect is achieved.

Another more specific way of determining the image integration loss is given below.

The present embodiment is described with respect to a batch sample (multiple images), that is, a plurality of true-value heat maps corresponding to each image in the multiple images and a predicted heat map corresponding to each true-value heat map output by the deep learning network are obtained, and the multiple true-value heat maps and the multiple predicted heat maps are used to determine an image integration loss, so as to perform training on the deep learning network according to the image integration loss.

FIG. 6 is a flowchart of the steps for determining the substantial loss of an image according to another embodiment of the present invention.

Step S610 is performed to obtain a plurality of true heat maps corresponding to each of the plurality of images and a predicted heat map corresponding to each of the true heat maps output by the deep learning network.

Acquiring k (k is more than or equal to 1) true value heat maps corresponding to each image of the batch (the batch is more than or equal to 1) images and a prediction heat map corresponding to each true value heat map output by the deep learning network. Wherein, batch is 1, and is executed according to the above single sample condition, and since the single sample condition has been described, it is not described herein again.

For example: in the process of training the deep learning network, k true value images corresponding to the image A and a predicted image corresponding to each true value image are obtained, and k true value images corresponding to the image B and a predicted image corresponding to each true value image are obtained.

Step S620, performing summation and averaging on all pixel points in each true value heat map to obtain a pixel mean value of each true value heat map, and obtaining a matrix including the pixel mean values of the multiple true value heat maps corresponding to each image in the multiple images.

And calculating a matrix formed by pixel means of all the true-value heat maps.

Specifically, the calculation formula may be:

batch_SUM_true＝reduce_mean(h_true,axis＝(1,2))；

wherein, batch _ SUM_trueA matrix composed of the pixel means of each true heat map. h is_trueThe truth-value heat maps corresponding to each image in the batch images are superposed to form a matrix, axis is (1,2) represents the sum of pixels of all pixel points in each plane formed by a width axis (1 axis) and a height axis (2 axis) in the matrix, and reduce _ mean is an average function.

batch_SUM_trueThe dimension of (1) is batch × k, i.e., a matrix of k columns and rows of batch, where batch represents the number of images and k represents the number of true-value heat maps corresponding to each image. batch _ SUM_trueEach element in the group is a pixel mean value of a true-value heat map, and the pixel mean value of each true-value heat map is obtained by summing and averaging all pixel points in the true-value heat map.

h_trueThe dimension of (1) represents the width of the true-value heat map, h (2) represents the height of the true-value heat map, and k (3) represents the number of the true-value heat maps corresponding to each image. The plane formed by the 0 axis and the 1 axis is the plane of the true heat map, that is, h_trueIs a batch column truth heat map group, and each column truth heat map group is formed by superposing k pieces of w multiplied by h.

Step S630, summing and averaging all the pixel points in each predicted heat map to obtain a pixel mean value of each predicted heat map, and obtain a matrix including the pixel mean values of the plurality of predicted heat maps corresponding to each of the plurality of images.

batch_SUM_pred＝reduce_mean(h_pred,axis＝(1,2))；

wherein, batch _ SUM_predA matrix composed of the pixel means of all the predicted heat maps. h is_predThe prediction heat degree graph corresponding to each image in the batch images is superposed to form a matrix, axis is (1,2) the sum of pixels of all pixel points in each plane formed by a width axis (1 axis) and a height axis (2 axis) in the matrix is represented, and reduce _ mean is an average value function.

batch_SUM_predHas a dimension of batch × k, i.e., a matrix of batch rows and k columns, batch _ SUM_predEach element in the prediction heat map is a pixel mean value of a prediction heat map, and the pixel mean value of each prediction heat map is obtained by summing and averaging all pixel points in the prediction heat map.

h_predThe dimension of (1) represents the width of the true-value heat map, h (2) represents the height of the true-value heat map, and k (3) represents the number of the true-value heat maps corresponding to each image. The plane composed of the 0 axis and the 1 axis is the plane of the predicted heat map, that is, h_predIs a batch column prediction heat map group, and k w × h real-value heat map groups of each column are superposed together.

In step S640, a pixel mean error between the corresponding true-value heat map and the predicted heat map is calculated according to the pixel mean of each true-value heat map and the pixel mean of each predicted heat map.

Specifically, the calculation formula may be:

batch_DIFFS＝(batch_SUM_true-batch_SUM_pred)²；

wherein, the batch _ DIFFS is a matrix formed by pixel mean errors between the corresponding true heat map and the predicted heat map.

The dimension of batch _ DIFFS is batch × k. Each element in the batch _ DIFFS is batch _ SUM_trueThe pixel mean and batch _ SUM of a true heat map in (1)_predThe error between the pixel means of the corresponding predicted heat map.

Step S650, respectively calculating the mean value of the pixel mean error between each true-value heat map and the corresponding predicted heat map corresponding to each image.

For example: acquiring a plurality of true value images corresponding to the image A and a predicted image corresponding to each true value image, and acquiring a plurality of true value images corresponding to the image B and a predicted image corresponding to each true value image; and calculating the mean value a of the pixel mean errors between each true value heat map corresponding to the image A and the corresponding prediction heat map, and calculating the mean value B of the pixel mean errors between each true value heat map corresponding to the image B and the corresponding prediction heat map.

Specifically, the calculation formula may be:

batch_DIFF＝reduce_mean(batch_DIFFS,axis＝1)；

the batch _ DIFF is an average value of all pixel mean errors corresponding to each image, that is, an average pixel mean error between each true-value heat map and the corresponding predicted heat map corresponding to each image, and reduce _ mean (batch _ DIFFs, axis ═ 1) is obtained by summing and averaging all elements (all pixel mean errors) in the batch _ DIFFs.

Further, the dimension of batch _ DIFF is batch × 1, and each element in batch _ DIFF is the mean of pixel mean errors between the k true-value heat maps and the corresponding predicted heat map of one image.

Step S660, performing average calculation on the average of the pixel average errors corresponding to each image to obtain an average pixel average error between each true-value heat map and the corresponding prediction heat map.

For example: in the above example, the mean value a of the pixel mean errors between the respective true-value heat maps corresponding to image a and the respective predicted heat maps, and the mean value B of the pixel mean errors between the respective true-value heat maps corresponding to image B and the respective predicted heat maps are calculated, and the mean value of a and B, i.e., (a + B) ÷ 2, is calculated in this example.

Specifically, the calculation formula may be:

DIFF＝reduce_mean(batch_DIFFS)/batch；

DIFF is the average of all pixel mean errors, i.e., the averaged pixel mean error between each (batch times k) true heat map and the corresponding predicted heat map, and reduce mean (batch DIFFs) is the average of the sum of all elements (all pixel mean errors) in batch DIFFs.

Step S670, determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function includes: pixel mean error between the true heat map and the predicted heat map (for each image).

Specifically, the calculation formula may be:

FIG. 7 is a flowchart of the steps for determining the image integration loss according to another embodiment of the present invention.

In step S710, the image compensation loss is calculated by using a predetermined loss function with respect to the true heat map and the predicted heat map corresponding thereto.

step S1, determine l₁＝(h_true-h_pred)²。

Wherein l₁Is a matrix of squares of pixel differences of the true heat map and the corresponding predicted heat map, l₁Has dimensions of batch × w × h × k. batch is the number of images (0 axis), w is the width axis (1 axis), h is the height axis (2 axis), and k is the number of true heat maps (predicted heat maps) corresponding to each image. The plane formed by the 0 axis and the 1 axis is the plane formed by the squares of the pixel differences of the true heat map and the corresponding predicted heat map. The number of the truth heat map and the prediction heat map is equal to k, and l is₁For batch planar groups, each planar group comprises k w × h planes stacked together.

Step S2, determine l₂＝reduce_mean(l₁,axis＝(1,2))。

Wherein l₂Represents a pair of₁The sum of the pixels of all the pixels in each plane composed of the middle width axis (1 axis) and the height axis (2 axis) is averaged, and axis ═ 1,2 represents the pair l₁And summing the pixels of all pixel points in each plane formed by the middle 1 axis and the 2 axis, wherein the reduce _ mean is an average value function.

l₂Has a dimension of batch × k, l₂Each element in (a) may represent an error between a true heat map and a corresponding predicted heat map.

In the step 3, the step of,HM _ L2_ losses ═ reduce _ mean (L) is calculated₂,axis＝1)。

Where HM _ L2_ losses is the L2 loss for all images, and represents the mean of the errors between the respective true and predicted heat maps for the batch images.

In step S4, HM _ L2_ loss ═ reduce _ mean (HM _ L2_ losses)/batch is calculated.

Where HM _ L2_ loss is the L2 penalty for each image.

Step S720, calculating the basic image loss by using an EMVD loss function according to the true heat map and the corresponding predicted heat map.

In step S730, a weighted sum of the image fundamental loss and the image supplemental loss is calculated, and the weighted sum is determined as an image integrated loss.

Specifically, the calculation formula may be:

HM_loss＝W₁×HM_L2_loss+W₂×HM_EMVD_loss；

wherein, HM _ loss represents the image integration loss, HM _ L2_ loss is L2 loss (image supplement loss), HM _ EMVD _ loss is the image basic loss, and W is₁Is a first weight, W₂Is the second weight. The first weight and the second weight can be set according to experience and can also be continuously adjusted in the process of training the deep learning network.

The embodiment provides a training device of a deep learning network. Fig. 8 is a block diagram of a training apparatus of a deep learning network according to an embodiment of the invention.

In this embodiment, the training apparatus for a deep learning network includes: an obtaining module 810, a calculating module 820, a first determining module 830 and a second determining module 840.

The obtaining module 810 is configured to obtain a true value image and a predicted image corresponding to the true value image output by the deep learning network.

A calculating module 820, configured to calculate a pixel mean error between the true image and the predicted image.

A first determining module 830, configured to determine a basic image loss by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image.

A second determining module 840, configured to determine an image synthetic loss according to the image fundamental loss, where the image synthetic loss is used to perform training on the deep learning network.

Optionally, the first determining module 830 is further configured to: determining the image basic loss as the image comprehensive loss; or, determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss.

Optionally, the truth-value image is a truth-value heat map; the predicted image is a prediction heat image; the obtaining module 810 is further configured to: acquiring a plurality of truth value heat maps corresponding to the same image and a prediction heat map corresponding to each truth value heat map output by a deep learning network; or acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true value heat map output by the deep learning network.

Optionally, if the obtaining module 810 obtains a plurality of true heat maps corresponding to the same image and a predicted heat map corresponding to each true heat map output by the deep learning network, the calculating module 820 is further configured to: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; and calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

Optionally, if the obtaining module 810 obtains a plurality of true-value heat maps corresponding to each image in a plurality of images and a predicted heat map corresponding to each true-value heat map output by the deep learning network, the calculating module 820 is further configured to: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; respectively calculating the mean value of pixel mean errors between each true value heat map corresponding to each image and the corresponding prediction heat map; and carrying out average value calculation on the average value of the pixel average value errors corresponding to each image to obtain the average pixel average value error between each true value heat map and the corresponding prediction heat map.

Optionally, the first determining module 830 is specifically configured to perform the following calculation:

the HM _ EMVD _ loss is the basic loss of the image, the DIFF is the average pixel mean error, e is a natural constant, and α is a preset parameter.

The functions of the apparatus of the present invention have been described in the method embodiments shown in fig. 2 to fig. 7, so that reference may be made to the related descriptions in the foregoing embodiments for details in the description of the present embodiment, which are not repeated herein.

The embodiment provides a training device of a deep learning network. Fig. 9 is a block diagram of a training apparatus of a deep learning network according to a fifth embodiment of the present invention.

In this embodiment, the training device of the deep learning network includes, but is not limited to: a processor 910, a memory 920.

The processor 910 is configured to execute a training program of the deep learning network stored in the memory 920 to implement the deep learning network training method described above.

Specifically, the processor 910 is configured to execute a training program of the deep learning network stored in the memory 920 to implement the following steps: acquiring a true value image and a predicted image corresponding to the true value image output by a deep learning network; calculating the pixel mean error between the true value image and the predicted image; determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image; and determining the comprehensive loss of the image according to the basic loss of the image, wherein the comprehensive loss of the image is used for training the deep learning network.

Optionally, the determining the image comprehensive loss according to the image basic loss includes: determining the image basic loss as the image comprehensive loss; or, determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss.

Optionally, the truth-value image is a truth-value heat map; the predicted image is a prediction heat image; the obtaining of the true value image and the prediction image corresponding to the true value image output by the deep learning network include: acquiring a plurality of truth value heat maps corresponding to the same image and a prediction heat map corresponding to each truth value heat map output by a deep learning network; or acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true value heat map output by the deep learning network.

Optionally, if multiple true-value heat maps corresponding to the same image and a prediction heat map corresponding to each true-value heat map output by the deep learning network are obtained, calculating a pixel mean error between the true-value image and the prediction image, including: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; and calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

Optionally, if a plurality of true heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true heat map output by the deep learning network are obtained, calculating a pixel mean error between the true image and the predicted image includes: summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map; summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map; calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map; respectively calculating the mean value of pixel mean errors between each true value heat map corresponding to each image and the corresponding prediction heat map; and carrying out average value calculation on the average value of the pixel average value errors corresponding to each image to obtain the average pixel average value error between each true value heat map and the corresponding prediction heat map.

Optionally, determining the image fundamental loss by using a preset exponential loss function, including:

The present embodiment provides a storage medium. The storage medium herein stores one or more programs. Among others, the storage medium may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.

When executed by one or more processors, the one or more programs in the storage medium implement the method for training a deep learning network as described above.

Specifically, the processor is configured to execute a training program of the deep learning network stored in the memory to implement the following steps: acquiring a true value image and a predicted image corresponding to the true value image output by a deep learning network; calculating the pixel mean error between the true value image and the predicted image; determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image; and determining the comprehensive loss of the image according to the basic loss of the image, wherein the comprehensive loss of the image is used for training the deep learning network.

The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A training method of a deep learning network is characterized by comprising the following steps:

acquiring a true value image and a predicted image corresponding to the true value image output by a deep learning network; wherein the truth-value image is a truth-value heat map; the predicted image is a prediction heat image;

calculating the pixel mean error between the true value image and the predicted image; wherein the pixel mean error is: the error of the pixel mean value of the true value image and the pixel mean value of the predicted image; the error of the pixel mean value is the square of the difference value of the pixel mean value of the true value image and the pixel mean value of the corresponding predicted image;

determining the basic loss of the image by using a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image;

determining the comprehensive loss of the image according to the basic loss of the image, wherein the comprehensive loss of the image is used for training the deep learning network;

wherein, the determining the image basic loss by using the preset exponential loss function comprises:

HM_EMVD_loss＝e^(DIFF/α)-1；

2. The method of claim 1, wherein determining an image integration loss from the image basis loss comprises:

determining the image basic loss as the image comprehensive loss; or,

determining the image supplement loss by using a preset loss function; and calculating a weighted sum of the image fundamental loss and the image supplementary loss, and determining the weighted sum as the image comprehensive loss.

3. The method according to claim 1 or 2,

the obtaining of the true value image and the prediction image corresponding to the true value image output by the deep learning network include:

acquiring a plurality of truth value heat maps corresponding to the same image and a prediction heat map corresponding to each truth value heat map output by a deep learning network; or,

and acquiring a plurality of true value heat maps corresponding to each image in the plurality of images and a prediction heat map corresponding to each true value heat map output by the deep learning network.

4. The method according to claim 3, wherein if a plurality of true heat maps corresponding to the same image and a prediction heat map corresponding to each true heat map output by the deep learning network are obtained, the calculating a pixel mean error between the true image and the predicted image comprises:

summing and averaging all pixel points in each truth value heat map to obtain a pixel mean value of each truth value heat map;

summing and averaging all pixel points in each prediction heat map to obtain a pixel mean value of each prediction heat map;

calculating pixel mean errors between the corresponding true-value heat map and the corresponding prediction heat map according to the pixel mean of each true-value heat map and the pixel mean of each prediction heat map;

and calculating the mean value of all pixel mean value errors to obtain the mean pixel mean value error between each true value heat map and the corresponding prediction heat map.

5. The method according to claim 3, wherein if a plurality of true heat maps corresponding to each image in a plurality of images and a prediction heat map corresponding to each true heat map output by a deep learning network are obtained, the calculating a pixel mean error between the true image and the prediction image comprises:

respectively calculating the mean value of pixel mean errors between each true value heat map corresponding to each image and the corresponding prediction heat map;

and carrying out average value calculation on the average value of the pixel average value errors corresponding to each image to obtain the average pixel average value error between each true value heat map and the corresponding prediction heat map.

6. A training device of a deep learning network is characterized in that,

the acquisition module is used for acquiring a true value image and a predicted image corresponding to the true value image output by the deep learning network; wherein the truth-value image is a truth-value heat map; the predicted image is a prediction heat image;

the calculating module is used for calculating the pixel mean error between the true value image and the predicted image; wherein the pixel mean error is: the error of the pixel mean value of the true value image and the pixel mean value of the predicted image; the error of the pixel mean value is the square of the difference value of the pixel mean value of the true value image and the pixel mean value of the corresponding predicted image;

the first determining module is used for determining the basic loss of the image by utilizing a preset exponential loss function; the exponential power of the exponential-loss function comprises: a pixel mean error between the true image and the predicted image;

the second determination module is used for determining the comprehensive loss of the image according to the basic loss of the image, and the comprehensive loss of the image is used for training the deep learning network;

HM_EMVD_loss＝e^(DIFF/α)-1；

7. The apparatus of claim 6, wherein the first determining module is further configured to:

determining the image basic loss as the image comprehensive loss; or,

8. The apparatus according to claim 6 or 7,

the obtaining module is further configured to:

9. The apparatus of claim 8, wherein if the obtaining module obtains a plurality of true-value heat maps corresponding to a same image and a predicted heat map corresponding to each true-value heat map output by the deep learning network, the calculating module is further configured to:

10. The apparatus of claim 8, wherein if the obtaining module obtains a plurality of true heat maps corresponding to each of a plurality of images and a predicted heat map corresponding to each of the true heat maps output by the deep learning network, the calculating module is further configured to:

11. The training device of the deep learning network is characterized by comprising a processor and a memory; the processor is used for executing a training program of the deep learning network stored in the memory so as to realize the training method of the deep learning network of any one of claims 1-5.

12. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the method of training a deep learning network of any one of claims 1 to 5.