CN111191782A - Convolutional network training method and device - Google Patents


Info

Publication number
CN111191782A
Authority
CN
China
Prior art keywords
value
loss
class
batch
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811351684.4A
Other languages
Chinese (zh)
Inventor
侯国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd
Priority to CN201811351684.4A
Publication of CN111191782A
Current legal status: Withdrawn


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a convolutional network training method and apparatus. For each mini-batch of training pictures, the method extracts the feature vector of each picture with a deep convolutional network, calculates a corresponding loss value from the feature vectors, and determines a corresponding cohesion loss value based on a cohesion principle; it then corrects the loss value with the cohesion loss value and adjusts the parameters of the deep convolutional network by back propagation according to the corrected result, completing training on that mini-batch. The method and apparatus effectively solve the problem of training performance dropping sharply when the data scale is large.

Description

Convolutional network training method and device
Technical Field
The present invention relates to mobile communication technologies, and in particular, to a convolutional network training method and apparatus.
Background
In recent years, convolutional deep learning has gradually come to play an important role in a wide range of projects. Broadly speaking, apart from reinforcement learning and multi-network models represented by generative adversarial networks (GANs), simple end-to-end deep convolutional networks mainly solve three kinds of vision problems: classification problems, regression problems, and feature-based distance-similarity problems.
The distance-similarity capability of a deep convolutional network is used to measure the similarity between images, for example in face recognition and image retrieval, and can even solve classification problems indirectly through a threshold.
Existing network training methods generally extract features with convolutions, design a bottleneck layer, construct positive and negative sample pairs, and apply different distance losses so that positive samples are pulled together and negative samples are pushed apart, giving the whole deep convolutional network a certain similarity-recognition capability.
In the process of implementing the invention, the inventor found the following: the algorithms in current network training schemes mainly aim at achieving distance-based discrimination in their design, but do not try to pull samples of the same class as close together as possible. As a result, although existing schemes predict well at a certain data scale, once the data grows to the point where the feature vectors are densely distributed in the embedding space, the distinguishability between vectors drops quickly due to spatial saturation, and performance degrades sharply.
For example, assume the sample output is a two-dimensional vector (x1, x2). A block diagram of a normally trained distance-estimation implementation obtained with an existing training method is shown in fig. 1, and the mapping of the corresponding vectors into two-dimensional space is illustrated in fig. 2. As can be seen from these figures, although the trained deep convolutional network can correctly distinguish positive and negative samples and aggregates similar pictures reasonably well, when the number of pictures increases at inference time it cannot make the clusters (g, f), (a, b) and (c, d) contract as much as possible, even though some separation remains; and if e is incorrectly labeled into the class of (g, f), the performance of the scheme may be seriously affected, because the conventional training method would then aggregate e, g and f together.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a convolutional network training method and apparatus that can solve the problem of a sharp decline in training performance when the data scale is large.
To achieve this objective, the technical solution provided by the present invention is as follows:
a convolutional network training method, comprising:
for each mini-batch of training pictures, extracting the feature vector of each picture with a deep convolutional network, calculating a corresponding loss value from the feature vectors, and determining a corresponding cohesion loss value based on a cohesion principle;
and correcting the loss value according to the cohesion loss value, and adjusting the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
Preferably, said determining the corresponding cohesion loss value based on the cohesion principle comprises:
for each class c corresponding to the training samples of the current mini-batch, respectively updating the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch;
using the updated μ_c and σ_c, calculating a loss correction value σloss_c from the class-c feature vectors of the mini-batch; wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor;
using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and performing the take-the-larger-value operation on each σloss_c; wherein M is the total number of classes;
and obtaining said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, updating the mean μ_c of the current class-c feature vectors comprises:
if μ_c currently still has its initial value 0, updating μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, updating μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean of the class-c feature vectors.
Preferably, updating the standard deviation σ_c of the current class-c feature vectors comprises:
if σ_c currently still has its initial value 0, updating σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, updating σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation of the class-c feature vectors.
Preferably, said correcting the loss value according to the cohesion loss value comprises:
correcting the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
A convolutional network training apparatus, comprising:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding cohesive loss value based on a cohesive principle;
and the second unit is used for correcting the loss value according to the cohesive loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
Preferably, the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method of any of the above.
In summary, the convolutional network training method and apparatus provided by the present invention obtain a cohesion loss value based on a cohesion principle, use it to correct the conventional loss value obtained with existing methods, and adjust the parameters of the deep convolutional network by back propagation using the corrected loss value. In this way, the features output by the deep convolutional network for the same class are shrunk together as much as possible, so that each class is aggregated to the greatest possible extent; the larger the data volume, the larger the distinguishable region the features provide and the more effective the similarity matching, which effectively solves the problem of training performance dropping sharply when the data scale is large.
Drawings
FIG. 1 is a block diagram illustrating a normally trained distance estimation implementation when a sample is output as a two-dimensional vector (x1, x2) based on a conventional network training method;
FIG. 2 is a schematic diagram of the mapping of a two-dimensional vector corresponding to FIG. 1 to a two-dimensional space;
FIG. 3 is a schematic flow chart of a method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 3 is a schematic flow chart of a method according to an embodiment of the present invention, and as shown in fig. 3, the convolutional network training method implemented by the embodiment mainly includes:
step 301, for each batch of mini-batch training pictures (mini-batch), after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding cohesive loss value based on a cohesive principle.
It should be noted here that in the prior art, the corresponding loss value is calculated only from the feature vector, and is not corrected any more. The difference between the step and the existing method is that the cohesiveness loss value needs to be determined based on the cohesion principle, so that the loss value obtained by the conventional method is further corrected in the subsequent steps, the features output by the deep convolution network of the same category are shrunk as much as possible, the larger the data volume is, the larger the distinguishable degree area provided by the features is, the more effective similarity matching can be performed, and the problem that the training performance is sharply reduced when the data scale is larger can be effectively solved.
In this step, the corresponding loss value may be calculated according to the feature vector by using the existing method, so as to obtain the conventional loss value, which is not described herein again.
In this embodiment, in order to improve training efficiency and reduce the overhead of computational resources, the mini-batch is used as a basic processing unit for training, that is, a loss value and a cohesive loss value are calculated for each mini-batch.
Preferably, the corresponding cohesion loss value can be determined based on the cohesion principle by the following method:
Step x1: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch.
Preferably, the mean μ_c of the current class-c feature vectors can be updated as follows:
if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, where α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean.
As can be seen from this method, the updated μ_c is a running mean in which the contribution of past mini-batches is weighted by α.
Preferably, the standard deviation σ_c of the current class-c feature vectors can be updated as follows:
if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, where α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation.
As can be seen from this update rule, the updated σ_c is likewise a running standard deviation in which the contribution of past mini-batches is weighted by α.
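For illustration only, the running update described above can be sketched in Python as follows. The function and variable names (update_class_stats, mu, sigma, feats, alpha) are illustrative, and placing the weight α on the historical value rather than on the mini-batch statistic is an assumption, since the original update equations are reproduced only as images.

```python
import numpy as np

def update_class_stats(mu, sigma, feats, alpha=0.9):
    """Running per-class mean/std update as described above.

    mu, sigma: current statistics of class c (initialized to 0).
    feats:     array of shape (N, D) with the N class-c feature vectors
               of the current mini-batch.
    alpha:     preset weight coefficient, 0 <= alpha <= 1 (assumed to
               weight the historical value).
    """
    batch_mu = feats.mean(axis=0)      # mean of the mini-batch class-c vectors
    batch_sigma = feats.std(axis=0)    # standard deviation of the mini-batch class-c vectors

    # Initial value 0: take the mini-batch statistic directly,
    # otherwise blend it with the historical value.
    mu = batch_mu if np.all(mu == 0) else alpha * mu + (1.0 - alpha) * batch_mu
    sigma = batch_sigma if np.all(sigma == 0) else alpha * sigma + (1.0 - alpha) * batch_sigma
    return mu, sigma
```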
Step x2: using the updated μ_c and σ_c, calculate the loss correction value σloss_c of class c from the class-c feature vectors of the mini-batch.
Here, x_i^c is the i-th class-c feature vector in the mini-batch, and N is the number of class-c feature vectors participating in training in the mini-batch.
‖·‖₂ denotes the 2-norm of a vector; its physical meaning is the hypersphere on which the vector lies.
max() denotes performing the take-the-larger-value operation on every dimension of the vector respectively.
β is a preset scaling factor. Specifically, a person skilled in the art can set a suitable value according to actual needs by consulting a standard normal distribution table: the fraction of samples falling outside β standard deviations is approximately 31.7% when β = 1, approximately 4.5% when β = 2, and approximately 0.27% when β = 3.
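The exact formula for σloss_c appears only as an image in the original document, so the Python sketch below is merely one possible reading built from the ingredients described above (the 2-norm distances of the class-c samples to the running mean μ_c, the running standard deviation σ_c and the scaling factor β); the names class_loss_correction, feats, mu, sigma and beta are illustrative. Under this reading the value can be negative when the current mini-batch is tighter than β historical standard deviations, which is consistent with the maxout step of step x3 below.

```python
import numpy as np

def class_loss_correction(feats, mu, sigma, beta=2.0):
    """One possible reading of the per-class loss correction value sigma_loss_c:
    how much the average distance of the mini-batch's class-c samples to the
    running mean exceeds beta running standard deviations (may be negative)."""
    n = feats.shape[0]                                # N class-c vectors in this mini-batch
    dist = np.linalg.norm(feats - mu, ord=2, axis=1)  # ||x_i^c - mu_c||_2 per sample
    return dist.sum() / n - beta * np.linalg.norm(sigma, ord=2)
```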
Step x3: using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes, remove abnormal values and perform the maxout operation on each σloss_c, where M is the total number of classes.
In this step, because μ_c and σ_c both carry historical weight terms while σloss_c depends mostly on the processing of the present mini-batch, σloss_c may take negative values; the maxout operation σloss_c = max(σloss_c, 0) is therefore performed on it, so as to remove the influence of abnormal values and ensure the accuracy of training.
Step x4: from the σloss_c obtained after the maxout operation, obtain said cohesion loss value loss_σ over all M classes.
In the above method, considering the limits on computation speed, memory and video memory in engineering training, the mini-batch is generally used as the unit of updating, so the table of per-class means and variances is filled in over multiple rounds of mini-batches and, through the historical weighting, gradually shifts toward reasonable positions in the vector space during training. Step x2 is also designed for the case of imperfect labeling. For example, since labels can never be one hundred percent correct, a face of person A may be labeled as person B; such a wrong label, or a face image of very poor quality, maps to a face vector far from the class mean, and that distance is in fact reasonable, but forcing it into the loss and the training would amount to forcing photos of different people together because of a labeling error, which would seriously harm the overall performance of the model. The operation in step x3 is used to remove the parts that drift free outside the main region of a class, reducing the influence of such special cases as much as possible.
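Continuing the illustrative sketch above, steps x3 and x4 can be expressed as clipping the negative per-class corrections and aggregating them; the averaging over the M classes is an assumption, since the patent only states that loss_σ is obtained from the values produced by the take-the-larger-value operation.

```python
def cohesion_loss(per_class_corrections):
    """Steps x3/x4 sketch: maxout against 0 removes abnormal (negative)
    per-class corrections, then the M clipped values are aggregated
    into the cohesion loss value loss_sigma."""
    m = len(per_class_corrections)                         # M: total number of classes
    clipped = [max(v, 0.0) for v in per_class_corrections] # maxout operation per class
    return sum(clipped) / m
```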
Step 302: correct the loss value according to the cohesion loss value, and adjust the parameters of the deep convolutional network by back propagation according to the corrected result, completing the training on this mini-batch.
Preferably, the loss value can be corrected according to the cohesion loss value as follows:
correct the loss value loss according to loss' = loss + loss_σ to obtain the corrected loss value loss', where loss_σ is said cohesion loss value.
The back propagation adjustment in this step can be implemented with the prior art; the specific method is known to those skilled in the art and is not described again here.
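For completeness, one mini-batch iteration combining the conventional loss with the cohesion loss can be sketched in a PyTorch-style form as follows; network, base_criterion, compute_cohesion_loss and optimizer are illustrative stand-ins rather than components named by the patent.

```python
import torch

def train_step(network, base_criterion, compute_cohesion_loss, optimizer, images, labels):
    """One mini-batch: extract features, compute the conventional loss,
    add the cohesion loss loss_sigma, and back-propagate loss'."""
    features = network(images)                            # feature vectors of the mini-batch pictures
    loss = base_criterion(features, labels)               # conventional loss value (existing method)
    loss_sigma = compute_cohesion_loss(features, labels)  # cohesion loss value from steps x1-x4
    corrected = loss + loss_sigma                         # loss' = loss + loss_sigma

    optimizer.zero_grad()
    corrected.backward()                                  # back propagation adjusts the network parameters
    optimizer.step()
    return corrected.item()
```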
Fig. 4 is a schematic structural diagram of a convolutional network training apparatus corresponding to the above method embodiment. As shown in fig. 4, the apparatus includes:
a first unit, configured to, for each mini-batch of training pictures, extract the feature vector of each picture with a deep convolutional network, calculate a corresponding loss value from the feature vectors, and determine a corresponding cohesion loss value based on a cohesion principle;
and a second unit, configured to correct the loss value according to the cohesion loss value, and to adjust the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
Preferably, the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
The invention also provides an embodiment of a convolutional network training device, which comprises:
a memory; and a processor coupled to the memory, the processor configured to perform any of the method embodiments described above based on instructions stored in the memory.
Accordingly, the present invention further provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out any of the above-mentioned method embodiments.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A convolutional network training method, comprising:
for each mini-batch of training pictures, extracting the feature vector of each picture with a deep convolutional network, calculating a corresponding loss value from the feature vectors, and determining a corresponding cohesion loss value based on a cohesion principle;
and correcting the loss value according to the cohesion loss value, and adjusting the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
2. The method of claim 1, wherein determining the corresponding cohesion loss value based on the cohesion principle comprises:
for each class c corresponding to the training samples of the current mini-batch, respectively updating the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch;
using the updated μ_c and σ_c, calculating a loss correction value σloss_c from the class-c feature vectors of the mini-batch; wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor;
using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and performing the take-the-larger-value operation on each σloss_c; wherein M is the total number of classes;
and obtaining said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
3. The method of claim 2, wherein updating the mean μ_c of the current class-c feature vectors comprises:
if μ_c currently still has its initial value 0, updating μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, updating μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean of the class-c feature vectors.
4. The method of claim 2, wherein updating the standard deviation σ_c of the current class-c feature vectors comprises:
if σ_c currently still has its initial value 0, updating σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, updating σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation of the class-c feature vectors.
5. The method of claim 1, wherein said correcting the loss value according to the cohesion loss value comprises:
correcting the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
6. A convolutional network training apparatus, comprising:
a first unit, configured to, for each mini-batch of training pictures, extract the feature vector of each picture with a deep convolutional network, calculate a corresponding loss value from the feature vectors, and determine a corresponding cohesion loss value based on a cohesion principle;
and a second unit, configured to correct the loss value according to the cohesion loss value, and to adjust the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
7. The apparatus of claim 6, wherein the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
8. The apparatus of claim 7, wherein, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
9. The apparatus of claim 7, wherein, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
10. The apparatus of claim 6, wherein the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
11. A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-5 based on instructions stored in the memory.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-5.
CN201811351684.4A 2018-11-14 2018-11-14 Convolutional network training method and device Withdrawn CN111191782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811351684.4A CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811351684.4A CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Publications (1)

Publication Number Publication Date
CN111191782A true CN111191782A (en) 2020-05-22

Family

ID=70708905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811351684.4A Withdrawn CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Country Status (1)

Country Link
CN (1) CN111191782A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958257A (en) * 2017-10-11 2018-04-24 华南理工大学 A kind of Chinese traditional medicinal materials recognition method based on deep neural network
CN108985135A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of human-face detector training method, device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985135A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of human-face detector training method, device and electronic equipment
CN107958257A (en) * 2017-10-11 2018-04-24 华南理工大学 A kind of Chinese traditional medicinal materials recognition method based on deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yandong Wen et al.: "A Discriminative Feature Learning Approach for Deep Face Recognition"

Similar Documents

Publication Publication Date Title
TWI794157B (en) Automatic multi-threshold feature filtering method and device
CN110147744B (en) Face image quality assessment method, device and terminal
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
WO2019024808A1 (en) Training method and apparatus for semantic segmentation model, electronic device and storage medium
WO2023040510A1 (en) Image anomaly detection model training method and apparatus, and image anomaly detection method and apparatus
CN108846855B (en) Target tracking method and device
CN110765882B (en) Video tag determination method, device, server and storage medium
US20230014448A1 (en) Methods for handling occlusion in augmented reality applications using memory and device tracking and related apparatus
CN108491872B (en) Object re-recognition method and apparatus, electronic device, program, and storage medium
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN106651812B (en) A kind of multichannel PSF scaling methods of simple lens imaging
CN110647916A (en) Pornographic picture identification method and device based on convolutional neural network
CN110661727A (en) Data transmission optimization method and device, computer equipment and storage medium
CN111383250A (en) Moving target detection method and device based on improved Gaussian mixture model
CN111160229A (en) Video target detection method and device based on SSD (solid State disk) network
CN111583146A (en) Face image deblurring method based on improved multi-scale circulation network
CN107564013B (en) Scene segmentation correction method and system fusing local information
CN111860054A (en) Convolutional network training method and device
CN110060290B (en) Binocular parallax calculation method based on 3D convolutional neural network
CN111191782A (en) Convolutional network training method and device
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN116152612A (en) Long-tail image recognition method and related device
CN113326832B (en) Model training method, image processing method, electronic device, and storage medium
CN112200730B (en) Image filtering processing method, device, equipment and storage medium
US9679363B1 (en) System and method for reducing image noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200522