CN111191782A - Convolutional network training method and device - Google Patents


Info

Publication number
CN111191782A
Authority
CN
China
Prior art keywords
value
loss
class
batch
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811351684.4A
Other languages
Chinese (zh)
Inventor
侯国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd
Priority to CN201811351684.4A
Publication of CN111191782A
Current legal status: Withdrawn


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a convolutional network training method and apparatus. For each mini-batch of training pictures, the method extracts the feature vector of each picture with a deep convolutional network, calculates a corresponding loss value from the feature vectors, and determines a corresponding cohesion loss value based on a cohesion principle; it then corrects the loss value with the cohesion loss value and adjusts the parameters of the deep convolutional network by back propagation according to the corrected result, completing training on that mini-batch. The method and apparatus effectively solve the problem of training performance dropping sharply when the data scale is large.

Description

Convolutional network training method and device
Technical Field
The present invention relates to mobile communication technologies, and in particular, to a convolutional network training method and apparatus.
Background
In recent years, convolutional deep learning has gradually come to play an important role in a wide range of projects. Broadly speaking, apart from reinforcement learning and multi-network models represented by generative adversarial networks (GANs), simple end-to-end deep convolutional networks mainly solve three kinds of vision problems: classification problems, regression problems, and feature-based distance-similarity problems.
The distance-similarity capability of a deep convolutional network is used to measure the similarity between images, for example in face recognition and image retrieval, and can even solve classification problems indirectly through a threshold.
Existing network training methods generally extract features with convolutions, design a bottleneck layer, construct positive and negative sample pairs, and apply different distance losses so that positive samples are pulled together and negative samples are pushed apart, giving the whole deep convolutional network a certain similarity-recognition capability.
In the process of implementing the invention, the inventor found the following: the algorithms in current network training schemes mainly aim at achieving distance-based discrimination in their design, but do not try to pull samples of the same class as close together as possible. As a result, although existing schemes predict well at a certain data scale, once the data grows to the point where the feature vectors are densely distributed in the embedding space, the distinguishability between vectors drops quickly due to spatial saturation, and performance degrades sharply.
For example, assume the sample output is a two-dimensional vector (x1, x2). A block diagram of a normally trained distance-estimation implementation obtained with an existing training method is shown in fig. 1, and the mapping of the corresponding vectors into two-dimensional space is illustrated in fig. 2. As can be seen from these figures, although the trained deep convolutional network can correctly distinguish positive and negative samples and aggregates similar pictures reasonably well, when the number of pictures increases at inference time it cannot make the clusters (g, f), (a, b) and (c, d) contract as much as possible, even though some separation remains; and if e is incorrectly labeled into the class of (g, f), the performance of the scheme may be seriously affected, because the conventional training method would then aggregate e, g and f together.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a convolutional network training method and apparatus that can solve the problem of a sharp decline in training performance when the data scale is large.
To achieve this objective, the technical solution provided by the present invention is as follows:
a convolutional network training method, comprising:
for each mini-batch of training pictures, extracting the feature vector of each picture with a deep convolutional network, calculating a corresponding loss value from the feature vectors, and determining a corresponding cohesion loss value based on a cohesion principle;
and correcting the loss value according to the cohesion loss value, and adjusting the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
Preferably, said determining the corresponding cohesion loss value based on the cohesion principle comprises:
for each class c corresponding to the training samples of the current mini-batch, respectively updating the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch;
using the updated μ_c and σ_c, calculating a loss correction value σloss_c from the class-c feature vectors of the mini-batch; wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor;
using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and performing the take-the-larger-value operation on each σloss_c; wherein M is the total number of classes;
and obtaining said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, updating the mean μ_c of the current class-c feature vectors comprises:
if μ_c currently still has its initial value 0, updating μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, updating μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean of the class-c feature vectors.
Preferably, updating the standard deviation σ_c of the current class-c feature vectors comprises:
if σ_c currently still has its initial value 0, updating σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, updating σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation of the class-c feature vectors.
Preferably, said correcting the loss value according to the cohesion loss value comprises:
correcting the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
A convolutional network training apparatus, comprising:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding cohesive loss value based on a cohesive principle;
and the second unit is used for correcting the loss value according to the cohesive loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
Preferably, the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method of any of the above.
In summary, the convolutional network training method and apparatus provided by the present invention obtain a cohesion loss value based on a cohesion principle, use it to correct the conventional loss value obtained with existing methods, and adjust the parameters of the deep convolutional network by back propagation using the corrected loss value. In this way, the features output by the deep convolutional network for the same class are shrunk together as much as possible, so that each class is aggregated to the greatest possible extent; the larger the data volume, the larger the distinguishable region the features provide and the more effective the similarity matching, which effectively solves the problem of training performance dropping sharply when the data scale is large.
Drawings
FIG. 1 is a block diagram illustrating a normally trained distance estimation implementation when a sample is output as a two-dimensional vector (x1, x2) based on a conventional network training method;
FIG. 2 is a schematic diagram of the mapping of a two-dimensional vector corresponding to FIG. 1 to a two-dimensional space;
FIG. 3 is a schematic flow chart of a method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 3 is a schematic flow chart of a method according to an embodiment of the present invention, and as shown in fig. 3, the convolutional network training method implemented by the embodiment mainly includes:
step 301, for each batch of mini-batch training pictures (mini-batch), after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding cohesive loss value based on a cohesive principle.
It should be noted here that in the prior art, the corresponding loss value is calculated only from the feature vector, and is not corrected any more. The difference between the step and the existing method is that the cohesiveness loss value needs to be determined based on the cohesion principle, so that the loss value obtained by the conventional method is further corrected in the subsequent steps, the features output by the deep convolution network of the same category are shrunk as much as possible, the larger the data volume is, the larger the distinguishable degree area provided by the features is, the more effective similarity matching can be performed, and the problem that the training performance is sharply reduced when the data scale is larger can be effectively solved.
In this step, the corresponding loss value may be calculated according to the feature vector by using the existing method, so as to obtain the conventional loss value, which is not described herein again.
In this embodiment, in order to improve training efficiency and reduce the overhead of computational resources, the mini-batch is used as a basic processing unit for training, that is, a loss value and a cohesive loss value are calculated for each mini-batch.
Preferably, the corresponding cohesion loss value can be determined based on the cohesion principle by the following method:
Step x1: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch.
Preferably, the mean μ_c of the current class-c feature vectors can be updated as follows:
if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, where α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean.
As can be seen from this method, the updated μ_c is a running mean in which the contribution of past mini-batches is weighted by α.
Preferably, the standard deviation σ_c of the current class-c feature vectors can be updated as follows:
if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, where α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation.
As can be seen from this update rule, the updated σ_c is likewise a running standard deviation in which the contribution of past mini-batches is weighted by α.
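For illustration only, the running update described above can be sketched in Python as follows. The function and variable names (update_class_stats, mu, sigma, feats, alpha) are illustrative, and placing the weight α on the historical value rather than on the mini-batch statistic is an assumption, since the original update equations are reproduced only as images.

```python
import numpy as np

def update_class_stats(mu, sigma, feats, alpha=0.9):
    """Running per-class mean/std update as described above.

    mu, sigma: current statistics of class c (initialized to 0).
    feats:     array of shape (N, D) with the N class-c feature vectors
               of the current mini-batch.
    alpha:     preset weight coefficient, 0 <= alpha <= 1 (assumed to
               weight the historical value).
    """
    batch_mu = feats.mean(axis=0)      # mean of the mini-batch class-c vectors
    batch_sigma = feats.std(axis=0)    # standard deviation of the mini-batch class-c vectors

    # Initial value 0: take the mini-batch statistic directly,
    # otherwise blend it with the historical value.
    mu = batch_mu if np.all(mu == 0) else alpha * mu + (1.0 - alpha) * batch_mu
    sigma = batch_sigma if np.all(sigma == 0) else alpha * sigma + (1.0 - alpha) * batch_sigma
    return mu, sigma
```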
Step x2: using the updated μ_c and σ_c, calculate the loss correction value σloss_c of class c from the class-c feature vectors of the mini-batch.
Here, x_i^c is the i-th class-c feature vector in the mini-batch, and N is the number of class-c feature vectors participating in training in the mini-batch.
‖·‖₂ denotes the 2-norm of a vector; its physical meaning is the hypersphere on which the vector lies.
max() denotes performing the take-the-larger-value operation on every dimension of the vector respectively.
β is a preset scaling factor. Specifically, a person skilled in the art can set a suitable value according to actual needs by consulting a standard normal distribution table: the fraction of samples falling outside β standard deviations is approximately 31.7% when β = 1, approximately 4.5% when β = 2, and approximately 0.27% when β = 3.
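The exact formula for σloss_c appears only as an image in the original document, so the Python sketch below is merely one possible reading built from the ingredients described above (the 2-norm distances of the class-c samples to the running mean μ_c, the running standard deviation σ_c and the scaling factor β); the names class_loss_correction, feats, mu, sigma and beta are illustrative. Under this reading the value can be negative when the current mini-batch is tighter than β historical standard deviations, which is consistent with the maxout step of step x3 below.

```python
import numpy as np

def class_loss_correction(feats, mu, sigma, beta=2.0):
    """One possible reading of the per-class loss correction value sigma_loss_c:
    how much the average distance of the mini-batch's class-c samples to the
    running mean exceeds beta running standard deviations (may be negative)."""
    n = feats.shape[0]                                # N class-c vectors in this mini-batch
    dist = np.linalg.norm(feats - mu, ord=2, axis=1)  # ||x_i^c - mu_c||_2 per sample
    return dist.sum() / n - beta * np.linalg.norm(sigma, ord=2)
```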
Step x3: using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes, remove abnormal values and perform the maxout operation on each σloss_c, where M is the total number of classes.
In this step, because μ_c and σ_c both carry historical weight terms while σloss_c depends mostly on the processing of the present mini-batch, σloss_c may take negative values; the maxout operation σloss_c = max(σloss_c, 0) is therefore performed on it, so as to remove the influence of abnormal values and ensure the accuracy of training.
Step x4: from the σloss_c obtained after the maxout operation, obtain said cohesion loss value loss_σ over all M classes.
In the above method, considering the limits on computation speed, memory and video memory in engineering training, the mini-batch is generally used as the unit of updating, so the table of per-class means and variances is filled in over multiple rounds of mini-batches and, through the historical weighting, gradually shifts toward reasonable positions in the vector space during training. Step x2 is also designed for the case of imperfect labeling. For example, since labels can never be one hundred percent correct, a face of person A may be labeled as person B; such a wrong label, or a face image of very poor quality, maps to a face vector far from the class mean, and that distance is in fact reasonable, but forcing it into the loss and the training would amount to forcing photos of different people together because of a labeling error, which would seriously harm the overall performance of the model. The operation in step x3 is used to remove the parts that drift free outside the main region of a class, reducing the influence of such special cases as much as possible.
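Continuing the illustrative sketch above, steps x3 and x4 can be expressed as clipping the negative per-class corrections and aggregating them; the averaging over the M classes is an assumption, since the patent only states that loss_σ is obtained from the values produced by the take-the-larger-value operation.

```python
def cohesion_loss(per_class_corrections):
    """Steps x3/x4 sketch: maxout against 0 removes abnormal (negative)
    per-class corrections, then the M clipped values are aggregated
    into the cohesion loss value loss_sigma."""
    m = len(per_class_corrections)                         # M: total number of classes
    clipped = [max(v, 0.0) for v in per_class_corrections] # maxout operation per class
    return sum(clipped) / m
```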
Step 302: correct the loss value according to the cohesion loss value, and adjust the parameters of the deep convolutional network by back propagation according to the corrected result, completing the training on this mini-batch.
Preferably, the loss value can be corrected according to the cohesion loss value as follows:
correct the loss value loss according to loss' = loss + loss_σ to obtain the corrected loss value loss', where loss_σ is said cohesion loss value.
The back propagation adjustment in this step can be implemented with the prior art; the specific method is known to those skilled in the art and is not described again here.
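For completeness, one mini-batch iteration combining the conventional loss with the cohesion loss can be sketched in a PyTorch-style form as follows; network, base_criterion, compute_cohesion_loss and optimizer are illustrative stand-ins rather than components named by the patent.

```python
import torch

def train_step(network, base_criterion, compute_cohesion_loss, optimizer, images, labels):
    """One mini-batch: extract features, compute the conventional loss,
    add the cohesion loss loss_sigma, and back-propagate loss'."""
    features = network(images)                            # feature vectors of the mini-batch pictures
    loss = base_criterion(features, labels)               # conventional loss value (existing method)
    loss_sigma = compute_cohesion_loss(features, labels)  # cohesion loss value from steps x1-x4
    corrected = loss + loss_sigma                         # loss' = loss + loss_sigma

    optimizer.zero_grad()
    corrected.backward()                                  # back propagation adjusts the network parameters
    optimizer.step()
    return corrected.item()
```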
Fig. 4 is a schematic structural diagram of a convolutional network training apparatus corresponding to the above method embodiment. As shown in fig. 4, the apparatus includes:
a first unit, configured to, for each mini-batch of training pictures, extract the feature vector of each picture with a deep convolutional network, calculate a corresponding loss value from the feature vectors, and determine a corresponding cohesion loss value based on a cohesion principle;
and a second unit, configured to correct the loss value according to the cohesion loss value, and to adjust the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
Preferably, the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
Preferably, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
Preferably, the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
The invention also provides an embodiment of a convolutional network training device, which comprises:
a memory; and a processor coupled to the memory, the processor configured to perform any of the method embodiments described above based on instructions stored in the memory.
Accordingly, the present invention further provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out any of the above-mentioned method embodiments.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A convolutional network training method, comprising:
for each mini-batch of training pictures, extracting the feature vector of each picture with a deep convolutional network, calculating a corresponding loss value from the feature vectors, and determining a corresponding cohesion loss value based on a cohesion principle;
and correcting the loss value according to the cohesion loss value, and adjusting the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
2. The method of claim 1, wherein determining the corresponding cohesion loss value based on the cohesion principle comprises:
for each class c corresponding to the training samples of the current mini-batch, respectively updating the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch;
using the updated μ_c and σ_c, calculating a loss correction value σloss_c from the class-c feature vectors of the mini-batch; wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor;
using the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and performing the take-the-larger-value operation on each σloss_c; wherein M is the total number of classes;
and obtaining said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
3. The method of claim 2, wherein updating the mean μ_c of the current class-c feature vectors comprises:
if μ_c currently still has its initial value 0, updating μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch;
if μ_c is currently not the initial value 0, updating μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the mean of the class-c feature vectors before the update, and the left side is the updated mean of the class-c feature vectors.
4. The method of claim 2, wherein updating the standard deviation σ_c of the current class-c feature vectors comprises:
if σ_c currently still has its initial value 0, updating σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch;
if σ_c is currently not the initial value 0, updating σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1, the right side of the update equation uses the standard deviation of the class-c feature vectors before the update, and the left side is the updated standard deviation of the class-c feature vectors.
5. The method of claim 1, wherein said correcting the loss value according to the cohesion loss value comprises:
correcting the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
6. A convolutional network training apparatus, comprising:
a first unit, configured to, for each mini-batch of training pictures, extract the feature vector of each picture with a deep convolutional network, calculate a corresponding loss value from the feature vectors, and determine a corresponding cohesion loss value based on a cohesion principle;
and a second unit, configured to correct the loss value according to the cohesion loss value, and to adjust the parameters of the deep convolutional network by back propagation according to the corrected result, so as to complete training on the mini-batch.
7. The apparatus of claim 6, wherein the first unit is configured to: for each class c corresponding to the training samples of the current mini-batch, respectively update the mean μ_c of the current class-c feature vectors and the standard deviation σ_c of the current class-c feature vectors according to the class-c feature vectors in the current mini-batch; using the updated μ_c and σ_c, calculate a loss correction value σloss_c from the class-c feature vectors of the mini-batch, wherein x_i^c is the i-th class-c feature vector in the mini-batch, N is the number of class-c feature vectors participating in training in the mini-batch, ‖·‖₂ denotes the 2-norm of a vector, max() denotes performing the take-the-larger-value operation on every dimension of a vector, and β is a preset scaling factor; use the loss correction values σloss_c of all current classes and the means of the feature vectors of all classes to remove abnormal values, and perform the take-the-larger-value operation on each σloss_c, wherein M is the total number of classes; and obtain said cohesion loss value loss_σ from the σloss_c obtained by the take-the-larger-value operation over all M classes.
8. The apparatus of claim 7, wherein, when updating the mean μ_c of the current class-c feature vectors, the first unit is configured to: if μ_c currently still has its initial value 0, update μ_c to the mean of the class-c feature vectors participating in training in the current mini-batch; if μ_c is currently not the initial value 0, update μ_c as a weighted combination of its value before the update and the mean of the class-c feature vectors participating in training in the current mini-batch, wherein α is a preset weight coefficient, 0 ≤ α ≤ 1.
9. The apparatus of claim 7, wherein, when updating the standard deviation σ_c of the current class-c feature vectors, the first unit is configured to: if σ_c currently still has its initial value 0, update σ_c to the standard deviation of the class-c feature vectors participating in training in the current mini-batch; if σ_c is currently not the initial value 0, update σ_c as a weighted combination of its value before the update and the standard deviation of the class-c feature vectors participating in training in the current mini-batch, wherein α is the preset weight coefficient, 0 ≤ α ≤ 1.
10. The apparatus of claim 6, wherein the second unit is configured to correct the loss value loss according to loss' = loss + loss_σ to obtain a corrected loss value loss', wherein loss_σ is said cohesion loss value.
11. A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-5 based on instructions stored in the memory.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-5.
CN201811351684.4A 2018-11-14 2018-11-14 Convolutional network training method and device Withdrawn CN111191782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811351684.4A CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811351684.4A CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Publications (1)

Publication Number Publication Date
CN111191782A true CN111191782A (en) 2020-05-22

Family

ID=70708905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811351684.4A Withdrawn CN111191782A (en) 2018-11-14 2018-11-14 Convolutional network training method and device

Country Status (1)

Country Link
CN (1) CN111191782A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958257A (en) * 2017-10-11 2018-04-24 华南理工大学 A kind of Chinese traditional medicinal materials recognition method based on deep neural network
CN108985135A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of human-face detector training method, device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985135A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of human-face detector training method, device and electronic equipment
CN107958257A (en) * 2017-10-11 2018-04-24 华南理工大学 A kind of Chinese traditional medicinal materials recognition method based on deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yandong Wen et al.: "A Discriminative Feature Learning Approach for Deep Face Recognition"

Similar Documents

Publication Publication Date Title
TWI794157B (en) Automatic multi-threshold feature filtering method and device
CN110147744B (en) Face image quality assessment method, device and terminal
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
WO2019024808A1 (en) Training method and apparatus for semantic segmentation model, electronic device and storage medium
WO2023040510A1 (en) Image anomaly detection model training method and apparatus, and image anomaly detection method and apparatus
CN108846855B (en) Target tracking method and device
CN110765882B (en) Video tag determination method, device, server and storage medium
US20230014448A1 (en) Methods for handling occlusion in augmented reality applications using memory and device tracking and related apparatus
CN108491872B (en) Object re-recognition method and apparatus, electronic device, program, and storage medium
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN106651812B (en) A kind of multichannel PSF scaling methods of simple lens imaging
CN110647916A (en) Pornographic picture identification method and device based on convolutional neural network
CN110661727A (en) Data transmission optimization method and device, computer equipment and storage medium
CN111383250A (en) Moving target detection method and device based on improved Gaussian mixture model
CN111160229A (en) Video target detection method and device based on SSD (solid State disk) network
CN111583146A (en) Face image deblurring method based on improved multi-scale circulation network
CN107564013B (en) Scene segmentation correction method and system fusing local information
CN111860054A (en) Convolutional network training method and device
CN110060290B (en) Binocular parallax calculation method based on 3D convolutional neural network
CN111191782A (en) Convolutional network training method and device
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN116152612A (en) Long-tail image recognition method and related device
CN113326832B (en) Model training method, image processing method, electronic device, and storage medium
CN112200730B (en) Image filtering processing method, device, equipment and storage medium
US9679363B1 (en) System and method for reducing image noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200522