CN111626098A - Method, device, equipment and medium for updating parameter values of a model

Info

Publication number: CN111626098A (granted as CN111626098B)
Application number: CN202010275896.XA
Inventor: 姜慧明
Applicant/Assignee: Beijing Megvii Technology Co Ltd
Original language: Chinese (zh)
Legal status: Granted; Active

Classifications

    • G06V40/12 Fingerprints or palmprints
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • Y02T10/40 Engine management systems


Abstract

The embodiment of the invention provides a method, a device, equipment and a medium for updating parameter values of a model. The method comprises the following steps: obtaining a sample image carrying a label and inputting the sample image into a preset model to be trained, wherein the preset model comprises a plurality of sub-models and each sub-model is used for recognizing the sample image; obtaining the recognition results output after the plurality of sub-models each recognize the sample image; weighting the recognition results output by the sub-models according to the weights corresponding to the sub-models to obtain a processed recognition result; determining the loss difference between the processed recognition result and the recognition result output by each of the sub-models; determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label, and the recognition results output by the sub-models; and updating the parameter values of the sub-models according to the overall loss value.

Description

Method, device, equipment and medium for updating parameter values of a model
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method, an apparatus, a device, and a medium for updating a parameter value of a model.
Background
Neural networks can learn in a variety of ways. In competitive learning, the units of a network population compete with one another for the right to respond to an external stimulus pattern, and the connection weights of the winning units are then adjusted so that they become even more likely to win for that stimulus pattern.
For image recognition problems, competitive learning can typically be employed to build the model. Here, competitive learning includes inter-class competition during model parameter learning, performance competition among the output results of the sub-models when multiple models are trained together, and the like.
In the related art, a model to be trained may include n sub-models, among which competitive learning takes place. Generally, to improve the real-time performance of inference (the process of feeding an image not seen in training into a trained model for testing), only the best-performing of the n sub-models is kept when the model is finally deployed, and the other sub-models are discarded. Although this approach improves the real-time performance of inference, the actual performance of the selected optimal sub-model is often unsatisfactory, and the accuracy and efficiency of image recognition with the selected optimal sub-model fall short of expectations.
Disclosure of Invention
In view of the above problems, embodiments of the present invention propose a method, an apparatus, a device and a storage medium for updating parameter values of a model that overcome, or at least partially solve, these problems.
In order to solve the above problem, a first aspect of the present invention discloses a method for updating parameter values of a model, including:
obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained, wherein the preset model comprises a plurality of submodels, and each submodel is used for identifying the sample image;
obtaining a recognition result output after the sample image is recognized by each of the plurality of sub-models;
weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain processed recognition results;
determining loss differences between the processed recognition results and recognition results output by the sub models respectively;
determining an overall loss value of the preset model according to each loss difference, the processed identification result, the label and the identification result output by each of the plurality of sub-models;
and updating the parameter values of the sub-models respectively according to the overall loss value.
Optionally, after updating the parameter values of the plurality of submodels respectively according to the overall loss value, the method further includes:
determining the parameter average value of each of the plurality of sub-models in a plurality of rounds of training before the round of training;
and updating the updated parameter values of the sub-models again according to a preset coefficient, the updated parameter values of the sub-models and the average parameter values of the sub-models in multiple rounds of training before the round of training to obtain new parameter values of the sub-models after the round of training is finished.
Optionally, determining a loss difference between each of the processed recognition results and the recognition result output by each of the plurality of submodels includes:
determining cosine distances between the processed recognition results and recognition results output by the sub models respectively, and taking the cosine distances as the loss difference;
or determining relative entropies between the processed recognition results and recognition results output by the sub models respectively, and taking the relative entropies as the loss difference.
Optionally, determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label, and the recognition result output by each of the plurality of submodels, includes:
determining a first loss value corresponding to each of the plurality of submodels according to the label and the identification result output by each of the plurality of submodels;
determining a second loss value corresponding to the processed identification result according to the label and the processed identification result;
and determining the second loss value, each loss difference and the sum of the first loss values corresponding to the sub models as the overall loss value of the preset model.
Optionally, each sample image carries a plurality of attribute tags, and each sub-model is used for identifying a plurality of attributes of the sample image; determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of sub-models includes:
for each attribute, determining an overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed identification result corresponding to the attribute, the attribute label of the attribute and the identification result corresponding to the attribute output by each of the plurality of submodels;
and determining the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
Optionally, the preset model further includes a weight processing branch; the method further comprises the following steps:
obtaining a weight distribution proportion of the weight processing branch output, wherein the weight distribution proportion represents the ratio of weights corresponding to the identification results output by the sub models;
according to the weights corresponding to the submodels, weighting the recognition results output by the submodels respectively to obtain processed recognition results, and the method comprises the following steps:
according to the weight distribution proportion, carrying out weighted summation on the recognition results output by the sub-models respectively to obtain a processed recognition result;
updating the parameter values of the sub-models respectively according to the overall loss value, comprising:
and respectively updating the parameter values of the weight processing branches and the respective parameter values of the plurality of submodels according to the overall loss value.
Optionally, the weight processing branch comprises: a plurality of primary full-connection layers respectively connected to the convolution layers of the plurality of sub-models, and a secondary full-connection layer connected to the plurality of primary full-connection layers; wherein the weight distribution ratio is obtained according to the following steps:
obtaining a feature map output by each convolution layer of the plurality of sub-models, wherein the feature map is obtained by performing feature extraction on the sample image by each convolution layer of the plurality of sub-models;
respectively inputting the characteristic diagram output by the convolution layer of each sub-model into a primary full-connection layer connected to the convolution layer to obtain a result output by the primary full-connection layer;
and inputting the respective output results of the plurality of primary full-connection layers into the secondary full-connection layer to obtain the weight proportion output by the secondary full-connection layer.
Optionally, after updating the parameter values of the sub models respectively according to the overall loss value, the method further includes:
taking the test images in the test set as input, testing the preset model at the end of training to obtain test results corresponding to a plurality of sub models in the preset model at the end of training;
and screening the submodels with the test results meeting the preset test conditions from the preset model at the end of the training to obtain an image recognition model for image recognition.
In a second aspect of the embodiments of the present invention, there is provided a parameter value updating apparatus for a model, including:
the system comprises an input module, a training module and a training module, wherein the input module is used for obtaining a sample image carrying a label and inputting the sample image into a preset model to be trained, the preset model comprises a plurality of sub-models, and each sub-model is used for identifying the sample image;
an output result obtaining module, configured to obtain an identification result that is output after the sample image is identified by each of the plurality of submodels;
the weight processing module is used for weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain the processed recognition results;
a loss difference determining module, configured to determine loss differences between the processed recognition results and recognition results output by the multiple submodels, respectively;
the overall loss determining module is used for determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels;
and the parameter updating module is used for respectively updating the parameter values of the sub models according to the overall loss value.
In a third aspect of the embodiments of the present invention, an electronic device is further disclosed, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform a method for updating parameter values of one or more models as described in embodiments of the first aspect of the invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is further disclosed, which stores a computer program for causing a processor to execute the method for updating parameter values of a model according to the embodiments of the first aspect of the present invention.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, a sample image carrying a label is input into a preset model. The recognition results output by a plurality of sub-models are weighted according to the weights corresponding to the sub-models in the preset model to obtain the processed recognition result. The loss difference between the processed recognition result and the recognition result output by each sub-model is then determined, the overall loss value is determined according to the loss differences, the processed recognition result, the label and the recognition results output by the sub-models, and the parameter values of the sub-models are updated according to the overall loss value.
According to the embodiment of the invention, the recognition results output by the sub-models are weighted according to the weights corresponding to the sub-models to obtain the processed recognition result, so that the recognition results output by the sub-models are fused and a stronger association is established among the competitively learning sub-models. Because the loss difference between the processed recognition result and the recognition result output by each sub-model is determined, and the overall loss value is determined from the loss differences, the processed recognition result, the recognition results output by the sub-models and the like, the overall loss value simultaneously represents the loss of each sub-model and the loss of the fused recognition result. When the parameters of each sub-model are updated according to the overall loss value, the sub-models with weaker learning capability can therefore assist the parameter updates of the sub-models with better learning capability among the more strongly associated sub-models, so that the finally retained sub-model performs better, which improves the accuracy and efficiency of image recognition with the retained sub-model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a schematic structural diagram of a default model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for updating parameter values of a model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another preset model according to an embodiment of the present invention;
fig. 4 is a block diagram of a parameter value updating apparatus of a model according to an embodiment of the invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below to clearly and completely describe the technical solutions in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic structural diagram of a preset model according to an embodiment of the present invention is shown; the preset model may include a plurality of sub-models. Only two sub-models, 101 and 102, are shown in fig. 1. In practice, the model structures of the sub-models may be the same or different, and the sub-models may have the same or different initial parameters. In one possible embodiment, the shallow network structures of the sub-models (for example, in fig. 1, Conv2-Conv3 are shallow network structures, whereas Conv4-Conv5 are deep network structures) and the corresponding initial parameters may be the same or different. Here, Conv denotes a convolutional layer.
The plurality of sub-models share the same input and can perform the same image recognition task on it; that is, the sub-models can all perform face recognition on the same input A. In practice, the sub-models in the preset model can be applied to various image recognition tasks, such as face recognition, pedestrian attribute recognition in a video structuring task, vehicle attribute recognition, clothing fine-grained attribute recognition, fingerprint recognition and the like.
A method for updating parameter values of a model according to an embodiment of the present invention is described with reference to the preset model shown in fig. 1.
Referring to fig. 2, a flowchart illustrating steps of a method for updating parameter values of a model in an embodiment is shown, and as shown in fig. 2, the method may specifically include the following steps:
step S201: and obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained.
The preset model comprises a plurality of submodels, wherein each submodel is used for identifying the sample image.
In the embodiment of the present invention, the number of the sub-models may be set according to actual requirements, and may be, for example, 3 or 5. The label carried by each sample image can be set according to the recognition task of the preset model.
For example, if the recognition task of the preset model is face recognition, the carried tag may be the ID of the face in the sample image; the ID uniquely characterizes the real identity of the face and may be a number. For another example, if the recognition task of the preset model is pedestrian attribute recognition, the carried tag can represent whether the object in the sample image is a pedestrian. For another example, if the recognition task of the preset model is fingerprint recognition, the carried tag may be the fingerprint ID of the fingerprint in the sample image, which uniquely characterizes the real finger corresponding to the fingerprint.
As shown in fig. 1, a sample image carrying a label may be used as an input, a convolutional layer Conv1 of a preset model may perform convolution processing on the sample image to obtain a feature map after convolution processing, the feature map after convolution processing is respectively used as an input of a sub-model 101 and an input of a sub-model 102, and the sub-model 101 and the sub-model 102 respectively perform image recognition on the feature map after convolution processing, for example, both perform face recognition or both perform pedestrian attribute recognition.
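To make this data flow concrete, the structure just described can be sketched in code. The following is a minimal illustrative sketch in PyTorch, not the patent's actual implementation: the channel counts, kernel sizes, class count and helper names are all assumptions, and two sub-models are generalized to any number.

```python
import torch
import torch.nn as nn

class SubModel(nn.Module):
    """One competing sub-model: its own conv stack (Conv2-Conv5) plus an FC head."""
    def __init__(self, in_ch: int = 16, num_classes: int = 2):
        super().__init__()
        self.convs = nn.Sequential(                      # Conv2-Conv5 in fig. 1
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)             # FC1 / FC2 in fig. 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.convs(x).flatten(1)
        return self.fc(feat)                             # recognition result P_i (logits)

class PresetModel(nn.Module):
    """Shared Conv1 feeding several competing sub-models, as in fig. 1."""
    def __init__(self, num_submodels: int = 2, num_classes: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)      # shared Conv1
        self.submodels = nn.ModuleList(
            SubModel(16, num_classes) for _ in range(num_submodels))

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        shared = torch.relu(self.conv1(x))               # feature map fed to every sub-model
        return [m(shared) for m in self.submodels]       # [P1, P2, ...]

# a batch of sample images yields one recognition result per sub-model
model = PresetModel()
outputs = model(torch.randn(4, 3, 64, 64))               # P1, P2 for a batch of 4
```

Each element of `outputs` plays the role of a recognition result P1, P2, ... in fig. 1.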
Step S202: and obtaining the identification result output after the plurality of sub-models respectively identify the sample image.
In the embodiment of the invention, the plurality of sub-models can respectively perform image recognition on the feature map after convolution processing to obtain the recognition result output by each sub-model.
As shown in fig. 1, each of the submodel 101 and the submodel 102 may include a plurality of convolution kernels at different levels, the convolution kernels may perform convolution processing at different levels on the feature map after the convolution processing, then, the feature map obtained by the submodel 101 after the convolution processing at different levels is used as an input of the FC1, the FC1 outputs a recognition result P1, the feature map obtained by the submodel 102 after the convolution processing at different levels is used as an input of the FC2, and the FC2 outputs a recognition result P2.
In this embodiment, the characterization of the recognition result may differ for different recognition tasks. For example, for a fingerprint recognition task, the recognition result may be a matching probability, that is, a probability that the fingerprint in the sample image and a fingerprint in a prestored image in the base library are the same fingerprint. For another example, for an attribute recognition task such as pedestrian attribute recognition, the recognition result may be a 1 × 2 vector whose two values respectively represent the probability that the object is a pedestrian and the probability that it is not. A 1 × 3 vector is of course also possible, whose three values represent pedestrian, non-pedestrian, and unknown, respectively.
Step S203: and weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain the processed recognition results.
In one embodiment, the weight corresponding to each of the plurality of sub-models may be preset, and the weight corresponding to one sub-model may reflect the proportion of that sub-model's recognition result among the recognition results output by all sub-models. Each weight may be a positive number smaller than 1, and the sum of the weights corresponding to the sub-models may be less than or equal to 1.
The weighting processing of the recognition results output by the plurality of sub-models may be performed by weighted summation according to the weights corresponding to the sub-models, taking the weighted sum as the processed recognition result. The weighted summation of the recognition results output by the sub-models can be understood as a fusion of those recognition results, so the processed recognition result obtained after fusion can be regarded as the result of the whole preset model recognizing the sample image.
For example, as shown in fig. 1, taking the weight corresponding to the sub-model 101 as 0.4 and the weight corresponding to the sub-model 102 as 0.6, the recognition result P1 output by the sub-model 101 and the recognition result P2 output by the sub-model 102 are weighted and summed to obtain P3 = 0.4 × P1 + 0.6 × P2. P3 can be regarded as the recognition result of the preset model for image recognition.
Step S204: and determining loss differences between the processed recognition results and the recognition results output by the sub models respectively.
In this embodiment, since the processed recognition result is a result obtained by weighted summation of the recognition results output by each of the plurality of submodels, the difference between the recognition result output by each submodel and the processed recognition result may be further determined, and the difference is used as the loss difference.
For example, as shown in fig. 1, the difference between P1 and P3, and the difference between P2 and P3 may be determined, respectively, so that the loss difference L1 corresponding to the sub model 101 and the loss difference L2 corresponding to the sub model 102 may be obtained.
In one embodiment, the loss difference may be determined by step S2041 or step S2042 as follows:
step S2041: and determining cosine distances between the processed recognition results and recognition results output by the sub models respectively, and taking the cosine distances as the loss difference.
The cosine distance, also called cosine similarity, is a measure that uses the cosine of the angle between two vectors in a vector space to quantify the difference between two individuals. In some attribute recognition tasks, the recognition result may be a 1 × n vector, in which case the processed recognition result is also a 1 × n vector; the cosine distance between the recognition result output by each sub-model and the processed recognition result in the vector space can then be calculated and used as the loss difference. The value range of the cosine distance can be [0, 1].
Step S2042: and determining relative entropies between the processed recognition results and the recognition results output by the sub models respectively, and taking the relative entropies as the loss difference.
The relative entropy, also known as the Kullback-Leibler divergence or information divergence, is an asymmetric measure of the difference between two probability distributions; it is equivalent to the difference between the information entropies (Shannon entropies) of the two probability distributions.
In this embodiment, in some recognition tasks of face recognition or fingerprint recognition, the recognition result may be a matching probability, and the processed recognition result may also be a matching probability, and then a difference value of information entropy between the recognition result output by each sub-model and the processed recognition result may be calculated, and the difference value may be used as a loss difference.
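For illustration, both variants of the loss difference can be written compactly. This is a hedged sketch assuming the recognition results are probability vectors (e.g. after softmax); the helper names are invented here, and the convention cosine distance = 1 − cosine similarity as well as the direction of the KL divergence are assumptions the patent does not fix.

```python
import torch
import torch.nn.functional as F

def cosine_loss_difference(p_i: torch.Tensor, p_fused: torch.Tensor) -> torch.Tensor:
    # Step S2041: cosine distance between one sub-model's result and the fused result.
    # For non-negative probability vectors the similarity already lies in [0, 1].
    return (1.0 - F.cosine_similarity(p_i, p_fused, dim=-1)).mean()

def kl_loss_difference(p_i: torch.Tensor, p_fused: torch.Tensor) -> torch.Tensor:
    # Step S2042: relative entropy KL(p_fused || p_i); the direction is an assumption.
    # F.kl_div expects log-probabilities first and probabilities second.
    return F.kl_div(p_i.clamp_min(1e-8).log(), p_fused, reduction="batchmean")
```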
Step S205: and determining the overall loss value of the preset model according to each loss difference, the processed identification result, the label and the identification result output by each of the plurality of sub-models.
Step S206: and updating the parameter values of the sub-models respectively according to the overall loss value.
In this embodiment, the overall loss value is obtained from the processed recognition result, each loss difference, the label, and the recognition results output by the sub-models. It can therefore represent the loss of the plurality of sub-models recognizing the sample image as a whole, that is, reflect the overall capability of the sub-models to recognize the sample, and a stronger association can be established among the competitively learning sub-models, fully integrating the performance of each sub-model. When the parameter values of each sub-model are updated, the sub-models with weaker learning capability can assist the learning of the sub-models with stronger learning capability, so that the direction of each sub-model's parameter updates approaches the global optimum, which improves the image recognition accuracy of each sub-model (especially the sub-models with better learning capability).
The overall loss value may include a loss value corresponding to the processed recognition result, a loss value corresponding to the recognition result of each of the plurality of submodels, and a loss difference corresponding to each submodel.
In one embodiment, the overall loss value of the predetermined model may be determined by:
step S2051: and determining a first loss value corresponding to each of the plurality of submodels according to the label and the identification result output by each of the plurality of submodels.
The sample image input to the preset model carries the label, and the label corresponds to different identification tasks, so that the real situation of the sample image under the identification tasks can be reflected. For example, if the recognition task is pedestrian attribute recognition, then the tag may characterize whether the person in the sample image is a real pedestrian.
In practice, a loss function from the related art may be adopted to determine the first loss value corresponding to each sub-model according to the label and the recognition result output by that sub-model. The first loss value may characterize the gap between the recognition result output by the sub-model and the ground truth characterized by the label.
Step S2052: and determining a second loss value corresponding to the processed identification result according to the label and the processed identification result.
Similarly, after the recognition results output by the multiple submodels are weighted, the obtained processed recognition results can represent the recognition results of the whole preset model on the sample image, and then the second loss value can be determined according to the label and the processed recognition results by adopting a loss function in the related technology, and can represent the difference between the recognition results output by the multiple submodels and the real situation represented by the label.
Step S2053: and determining the second loss value, each loss difference and the sum of the first loss values corresponding to the sub models as the overall loss value of the preset model.
In the embodiment of the invention, the overall loss value can be the sum of the second loss value, the loss difference corresponding to each submodel and the first loss value corresponding to each of the submodels, so that stronger association among the submodels is established through the determination of the overall loss value.
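Combining steps S2051 to S2053 with the loss differences of step S204, the overall loss value can be sketched as below. This assumes a single-label classification task with cross-entropy as the "loss function in the related art" and relative entropy as the loss difference; both choices are illustrative, not mandated by the patent.

```python
import torch
import torch.nn.functional as F

def overall_loss(sub_outputs: list, weights: list, labels: torch.Tensor) -> torch.Tensor:
    """sub_outputs: per-sub-model logits [batch, classes]; weights: per-sub-model scalars."""
    probs = [F.softmax(o, dim=-1) for o in sub_outputs]
    p_fused = sum(w * p for w, p in zip(weights, probs))                     # step S203

    first_losses = sum(F.cross_entropy(o, labels) for o in sub_outputs)      # step S2051
    second_loss = F.nll_loss(p_fused.clamp_min(1e-8).log(), labels)          # step S2052
    loss_diffs = sum(F.kl_div(p.clamp_min(1e-8).log(), p_fused,
                              reduction="batchmean") for p in probs)         # step S204

    return second_loss + loss_diffs + first_losses                           # step S2053
```

With the PresetModel sketch above, `overall_loss(model(images), [0.4, 0.6], labels)` reproduces the fig. 1 example; calling `backward()` on the result and taking an optimizer step then updates the parameter values of all sub-models at once (step S206).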
The above embodiment has explained the updating of the parameter values of the model taking a single input sample image as an example. In practice, there may be many sample images for training, and one or more sample images may be input to the preset model during each round of training. The loss value can then be determined for each sample image input during each round according to the method described in the above embodiment, and at the end of each round the parameter values of the sub-models can be updated according to the overall loss value at the end of that round.
In practice, in one specific implementation, after updating the parameter values of multiple submodels in each round, parameter value correction may be performed on each submodel to achieve faster and more accurate convergence. After updating the parameter values of each sub-model in each round of training, the method may further include the following steps:
step S207: and determining the parameter average value of each of the plurality of sub-models in a plurality of rounds of training before the round of training.
In this embodiment, when updating the parameter value of the sub-model in each round, the parameter value of each model may be updated once according to the overall loss value determined in the round, and after the update, the parameter value of each sub-model at the end of the round of training (hereinafter, the parameter value is referred to as a parameter value to be corrected) may be obtained.
Example #1: as shown in fig. 1, if the current round of updating is the nth round, then after the update the parameter value of the sub-model 101 is m1 and the parameter value of the sub-model 102 is m2.
In this embodiment, the updated parameter value of each sub-model at the end of each round of training can be recorded, so that the parameter value of each sub-model at the end of each round of training before the current round of training can be obtained, and the average value of the parameter value of each sub-model in multiple rounds of training before the current round of training can be determined.
Example #2: as shown in fig. 1, the parameter average mean1 of the sub-model 101 over the n-1 updates before the nth update, and the parameter average mean2 of the sub-model 102 over those same n-1 updates, may be determined.
Step S208: and updating the updated parameter values of the sub-models again according to a preset coefficient, the updated parameter values of the sub-models and the average parameter values of the sub-models in multiple rounds of training before the round of training to obtain new parameter values of the sub-models after the round of training is finished.
In this embodiment, the parameter value to be corrected of the current round of each sub-model may be corrected according to the preset coefficient and the parameter average value of each sub-model, so as to obtain a corrected parameter value, and the corrected parameter value is used as a new parameter value of the sub-model after the training of the current round is finished (hereinafter, the new parameter value is referred to as the corrected parameter value).
In practice, after the new parameter value of each sub-model is obtained, it is this new parameter value that is updated in the subsequent rounds of training.
Specifically, the new parameter value of each sub-model after the current round of training can be determined by the following formula:

y(m,n) = α × x(m,n) + (1 − α) × x_mean

where y(m,n) represents the corrected parameter value of the mth sub-model at the end of the nth round of training, α is the preset coefficient, x(m,n) is the parameter value to be corrected of the mth sub-model at the end of the nth round of training, and x_mean is the average of the parameter values of the mth sub-model over the n-1 rounds of updating before the nth round.
As shown in fig. 1, continuing example #1 and example #2, if the preset coefficient is set to 0.99, then after the correction the corrected parameter value of the sub-model 101 is m101 = 0.99 × m1 + (1 − 0.99) × mean1, and the corrected parameter value of the sub-model 102 is m102 = 0.99 × m2 + (1 − 0.99) × mean2.
With this embodiment, at each round of updating, the updated parameter value of each sub-model is corrected once according to the historical parameter records from the training process, so the direction of the parameter updates is more accurate and the performance of the sub-models is better.
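A sketch of this correction applied to a model's state_dict after the round's gradient step is given below. The helper name, keeping the history as a list of snapshots, and recording post-correction values are assumptions; the patent only fixes the formula y = α × x + (1 − α) × x_mean.

```python
import torch

def correct_parameters(model: torch.nn.Module, history: list, alpha: float = 0.99) -> None:
    """Blend each just-updated parameter x with its mean over previous rounds:
    y = alpha * x + (1 - alpha) * x_mean."""
    with torch.no_grad():
        state = model.state_dict()
        if history:  # x_mean over the n-1 previous rounds
            for name, x in state.items():
                if not torch.is_floating_point(x):   # skip integer buffers
                    continue
                x_mean = torch.stack([h[name] for h in history]).mean(dim=0)
                x.mul_(alpha).add_((1.0 - alpha) * x_mean)
        # snapshot this round's values for use as history in later rounds
        history.append({k: v.clone() for k, v in state.items()})
```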
In practical applications, the recognition task performed by the preset model may be an attribute recognition task, and each sub-model in the preset model may be used to recognize the attribute of the image. For example, whether a person in the image wears a hat or not is identified, and in this case, the tag carried by the sample image may be an attribute tag.
In some specific application scenarios, it may be desirable to identify multiple attributes of an image at the same time, for example, to identify both whether the person in the image is wearing a hat and whether the person is wearing a skirt. In this case, each sample image may carry a plurality of attribute tags, each of which characterizes one attribute of the sample image. Accordingly, each sub-model may be used to separately identify the plurality of attributes of the sample image.
For example, the sample image carries 2 attribute tags, where one attribute tag indicates whether the person in the sample image wears a hat: if the attribute tag is A1, the person wears a hat, and if it is A0, the person does not. The other attribute tag indicates whether the person in the sample image wears a skirt: if the attribute tag is B1, the person wears a skirt, and if it is B0, the person does not. The sub-model 101 identifies both whether the person in the sample image wears a hat and whether the person wears a skirt, and accordingly outputs an attribute recognition result for wearing a hat and an attribute recognition result for wearing a skirt.
In practice, in the case that the sample image carries a plurality of attribute tags, each sub-model outputs an identification result corresponding to each attribute. If 3 attribute tags are carried, each sub-model outputs 3 identification results, wherein each identification result corresponds to one attribute.
In this application scenario, since each sub-model outputs a plurality of recognition results with different attributes, when determining the overall loss value of the preset model, the method may include the following steps:
step S2061': and for each attribute, determining an overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed identification result corresponding to the attribute, the attribute label of the attribute and the identification result corresponding to the attribute output by each of the plurality of submodels.
Step S2062': and determining the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
In this embodiment, the overall loss value corresponding to each attribute may represent the accuracy of the preset model for identifying the attribute of the sample image.
In a specific implementation, the overall loss value corresponding to each attribute may be determined through processes analogous to steps S202 to S206. Specifically, the loss difference of each sub-model for each attribute may be determined. For each sub-model, the loss value corresponding to each attribute may be determined according to that attribute's tag and the corresponding recognition result. Similarly, the recognition results corresponding to each attribute output by the sub-models can be weighted and summed to obtain the processed recognition result corresponding to that attribute, and the loss corresponding to the attribute is determined according to the processed recognition result and the attribute tag.
For example, as shown in fig. 1, assume there are 2 attribute tags, attribute tag A and attribute tag B, where attribute tag A indicates whether a hat is worn and attribute tag B indicates whether a skirt is worn. The recognition results output by the sub-model 101 are Pa1 and Pb1, where Pa1 is the recognition result for whether a hat is worn and Pb1 is the recognition result for whether a skirt is worn. Similarly, the recognition results output by the sub-model 102 are Pa2 and Pb2. According to the weights corresponding to the sub-model 101 and the sub-model 102, Pa1 and Pa2 may be weighted and summed to obtain Pa3, and Pb1 and Pb2 may be weighted and summed to obtain Pb3.
Further, the loss difference La1 between Pa1 and Pa3, the loss difference La2 between Pa2 and Pa3, the loss difference Lb1 between Pb1 and Pb3, and the loss difference Lb2 between Pb2 and Pb3 can be obtained. According to Pa1 and attribute tag A, a loss value La1' can be obtained, and according to Pb1 and attribute tag B, a loss value Lb1' can be obtained, where the loss values La1' and Lb1' are the losses corresponding to the sub-model 101. In the same way, according to Pa2 and attribute tag A, a loss value La2' can be obtained, and according to Pb2 and attribute tag B, a loss value Lb2' can be obtained, where the loss values La2' and Lb2' are the losses corresponding to the sub-model 102.
Further, a loss value La3 is obtained according to attribute tag A and Pa3, and a loss value Lb3 is obtained according to attribute tag B and Pb3. The loss value corresponding to attribute tag A is then the sum of La1, La2, La1', La2' and La3, and the loss value corresponding to attribute tag B is the sum of Lb1, Lb2, Lb1', Lb2' and Lb3.
The overall loss value of the preset model is the sum of the loss value corresponding to the attribute label a and the loss value corresponding to the attribute label B.
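Under the same illustrative assumptions as the single-attribute sketch earlier (softmax probabilities, cross-entropy losses, KL loss differences), the multi-attribute overall loss is just a sum over attributes; each sub-model is assumed here to emit one logits tensor per attribute.

```python
import torch
import torch.nn.functional as F

def multi_attribute_overall_loss(sub_outputs: list, weights: list, labels: list) -> torch.Tensor:
    """sub_outputs[i][a]: logits of sub-model i for attribute a; labels[a]: targets for a."""
    total = torch.zeros(())
    for a in range(len(labels)):                      # step S2061' per attribute
        probs = [F.softmax(out[a], dim=-1) for out in sub_outputs]
        p_fused = sum(w * p for w, p in zip(weights, probs))
        first = sum(F.cross_entropy(out[a], labels[a]) for out in sub_outputs)
        second = F.nll_loss(p_fused.clamp_min(1e-8).log(), labels[a])
        diffs = sum(F.kl_div(p.clamp_min(1e-8).log(), p_fused,
                             reduction="batchmean") for p in probs)
        total = total + second + diffs + first
    return total                                      # step S2062': sum over attributes
```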
In practice, in order to improve the autonomous learning capability of the preset model and to avoid the poor generalization and unreasonable performance caused by manually setting the weights corresponding to the sub-models, in one embodiment the weights corresponding to the sub-models may also be learned during training.
Specifically, in a specific implementation, the preset model may further include a weight processing branch, where an input of the weight processing branch is a feature obtained by performing convolution processing on the sample image in each sub-model, and in the actual processing, the method may further include the following steps:
step S2020: and obtaining a weight distribution proportion of the weight processing branch output, wherein the weight distribution proportion represents the ratio of weights corresponding to the identification results output by the sub models respectively.
The weight distribution ratio may be obtained while obtaining the recognition results output by each of the plurality of submodels.
In one embodiment, the weight processing branch comprises a plurality of primary full-connection layers and a secondary full-connection layer, where the input of the secondary full-connection layer is connected simultaneously with the outputs of the plurality of primary full-connection layers, and the input terminal of each primary full-connection layer is connected with the output terminal of the convolution layers of a different sub-model.
Referring to fig. 3, a schematic structural diagram of the preset model shown in fig. 1 after adding a weight processing branch is shown. As shown in fig. 3, the weight processing branch may include: two primary full-connection layers, FC3 and FC4, and one secondary full-connection layer, FC5. The input end of the primary full-connection layer FC3 is connected with the output end of the convolution layers in the sub-model 101, the input end of the primary full-connection layer FC4 is connected with the output end of the convolution layers in the sub-model 102, and the input end of the secondary full-connection layer FC5 is connected with both the output end of FC3 and the output end of FC4.
Accordingly, how the weight processing branch outputs the weight distribution ratio will be described with reference to fig. 3. Specifically, the weight distribution ratio is obtained according to the following steps:
step S20201: and obtaining a feature map output by the convolution layer of each of the plurality of sub-models, wherein the feature map is obtained by performing feature extraction on the sample image by the convolution layer of each of the plurality of sub-models.
Step S20202: and respectively inputting the characteristic diagram output by the convolution layer of each sub-model into a primary full-connection layer connected to the convolution layer to obtain a result output by the primary full-connection layer.
In this embodiment, the characteristic diagram output by the convolution layer of each sub-model may be input to the first fully-connected layer connected to the output end of the convolution layer of the sub-model, and the result output by the first fully-connected layer may be obtained through the processing of the first fully-connected layer.
Step S20203: and inputting the respective output results of the plurality of primary full-connection layers into the secondary full-connection layer to obtain the weight proportion output by the secondary full-connection layer.
In this embodiment, the result output by each primary full-connection layer may be input to the secondary full-connection layer, and the secondary full-connection layer processes the results output by the primary full-connection layers to obtain the weight of each sub-model in the competitive learning, forming the weight proportion that it outputs. In this way, the preset model can automatically correlate the output results of all the primary full-connection layers and learn a weight proportion, where the sum of the weights in the weight proportion is less than or equal to 1.
For example, as shown in fig. 3, if the weight ratio is 0.4:0.6, it may indicate that the weight of the sub-model 101 is 0.4 and the weight of the sub-model 102 is 0.6.
Accordingly, the post-processing recognition result may be obtained by:
step S203': and according to the weight distribution proportion, carrying out weighted summation on the identification results output by the sub-models respectively to obtain the processed identification result.
In this embodiment, the weight corresponding to each sub-model may be obtained according to the weight distribution proportion output by the second-level full-connected layer, and then the recognition results output by the plurality of sub-models are weighted and summed to obtain the processed recognition result.
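A sketch of the weight processing branch of fig. 3 together with the fusion of step S203' follows. Pooling each feature map before its primary full-connection layer, the hidden width, and using softmax (which makes the weights sum to exactly 1, satisfying the sum ≤ 1 condition above) are assumptions of this sketch, not details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightBranch(nn.Module):
    """FC3/FC4 (one primary FC per sub-model's conv feature map) feeding FC5."""
    def __init__(self, feat_channels: list, hidden: int = 8):
        super().__init__()
        self.primary = nn.ModuleList(nn.Linear(c, hidden) for c in feat_channels)
        self.secondary = nn.Linear(hidden * len(feat_channels), len(feat_channels))

    def forward(self, feature_maps: list) -> torch.Tensor:
        # pool each sub-model's feature map to a vector, then its primary FC
        outs = [fc(F.adaptive_avg_pool2d(f, 1).flatten(1))
                for fc, f in zip(self.primary, feature_maps)]
        # the secondary FC turns the concatenated results into one weight per sub-model
        return F.softmax(self.secondary(torch.cat(outs, dim=1)), dim=-1)

def fuse(sub_results: list, w: torch.Tensor) -> torch.Tensor:
    """Step S203': weighted summation of the sub-models' recognition results."""
    return sum(w[:, i:i + 1] * r for i, r in enumerate(sub_results))
```

Because WeightBranch is an ordinary module, its parameters receive gradients from the overall loss value just like the sub-models' own parameters, which matches the joint training described next.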
In this embodiment, when updating the parameter values of each submodel, the parameter values of the weight processing branches may be updated according to the overall loss value, so that the weight processing branches may be trained together.
In this embodiment, after the preset model is trained with a plurality of sample images as training samples, the image recognition accuracy of each sub-model in the preset model may be evaluated, and the sub-model with the highest accuracy, that is, the best performance, is retained, thereby obtaining the image recognition model. In one embodiment, after performing multiple rounds of updating on the parameter values of each sub-model, the image recognition model finally used for image recognition may be obtained through a process including the following steps:
step S207: and taking the test images in the test set as input, testing the preset model at the end of training to obtain test results corresponding to a plurality of sub models in the preset model at the end of training.
The test image in the test set can be obtained according to the identification task based on training the preset model, for example, if the identification task is a fingerprint identification task, the test image is a fingerprint image for test, if the identification task is a pedestrian attribute identification task, the test image is a pedestrian image for test, and if the identification task is a clothing fine-grained attribute identification task, the test image is a person clothing image for test.
In practice, the preset model after training includes a plurality of trained submodels, and the trained submodels can respectively identify the test image, so as to obtain identification results respectively output by the plurality of submodels, where the identification results are test results.
The characterization of the test results can also differ according to the recognition task. For example, for the clothing fine-grained attribute recognition task, if the person in the test clothing image wears a hat, the test result output after a sub-model recognizes the image is a 1 × 2 vector used for judging whether the person is wearing a hat; for example, (0.8, 0.2) indicates that the probability of wearing a hat is 0.8.
Step S210: and screening, from the preset model at the end of training, the sub-models whose test results satisfy a preset test condition, to obtain an image recognition model for image recognition.
In this embodiment, the accuracy rate of the recognition of the test image by each of the plurality of submodels may be determined according to the test result, and then the submodel corresponding to the highest accuracy rate may be determined as the submodel satisfying the preset test condition according to the order from the highest accuracy rate to the lowest accuracy rate. Of course, in practice, the submodel corresponding to the accuracy reaching the preset accuracy may also be determined as the submodel meeting the preset test condition.
In specific implementation, the submodels with the test results meeting the preset test conditions can be reserved, and the rest submodels can be discarded, so that the image recognition model is obtained.
For example, as shown in fig. 1, for the fine-grained attribute recognition task of clothing, if the test result output by the sub-model 101 is (0.8, 0.2), it indicates that the probability of wearing a hat is 0.8, and if the test result output by the sub-model 102 is (0.9, 0.1), it indicates that the probability of wearing a hat is 0.9, and in practice, if a person in the test image wears a hat, the accuracy of the sub-model 102 is higher, so that the sub-model 102 can be retained, the sub-model 101 is discarded, and the obtained image recognition model may include only the sub-model 102.
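The screening of steps S209 and S210 amounts to keeping the most accurate sub-model. A sketch, reusing the PresetModel structure assumed earlier and assuming a labelled test loader and single-attribute classification:

```python
import torch

@torch.no_grad()
def select_best_submodel(model, test_loader):
    """Steps S209/S210: test every sub-model, keep the one with the highest accuracy."""
    correct = [0] * len(model.submodels)
    total = 0
    for images, labels in test_loader:
        shared = torch.relu(model.conv1(images))        # shared Conv1 features
        for i, sub in enumerate(model.submodels):
            preds = sub(shared).argmax(dim=-1)
            correct[i] += (preds == labels).sum().item()
        total += labels.numel()
    accuracies = [c / total for c in correct]
    best = max(range(len(accuracies)), key=accuracies.__getitem__)
    return model.submodels[best], accuracies            # the rest are discarded
```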
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Based on the same inventive concept, referring to fig. 4, a schematic diagram of a framework of a parameter value updating apparatus of a model according to an embodiment of the present invention is shown, and as shown in fig. 4, the framework may specifically include the following modules:
an input module 401, configured to obtain a sample image carrying a label, and input the sample image into a preset model to be trained, where the preset model includes multiple submodels, and each submodel is used to identify the sample image;
an output result obtaining module 402, configured to obtain an identification result that is output after the sample image is identified by each of the multiple sub-models;
a weight processing module 403, configured to perform weighting processing on the recognition results output by the multiple submodels according to weights corresponding to the multiple submodels, to obtain processed recognition results;
a loss difference determining module 404, configured to determine loss differences between the processed recognition results and recognition results output by the sub models respectively;
an overall loss determining module 405, configured to determine an overall loss value of the preset model according to each loss difference, the processed recognition result, the tag, and a recognition result output by each of the plurality of submodels;
and a parameter updating module 406, configured to update the parameter values of the multiple submodels respectively according to the overall loss value.
Optionally, the apparatus may further include a parameter modification module, where the parameter modification module specifically includes the following units:
a parameter average value determining unit, configured to determine an average value of each parameter of the plurality of sub-models in a plurality of rounds of training before the round of training;
the parameter correction unit may be configured to update the updated parameter values of the sub models again according to a preset coefficient, the updated parameter values of the sub models, and an average parameter value of the sub models in multiple rounds of training before the round of training, so as to obtain new parameter values of the sub models after the round of training is completed.
Optionally, the loss difference determining module 404 may be configured to determine the cosine distances between the processed recognition results and the recognition results output by the plurality of sub-models, and use the cosine distances as the loss differences; or
it may be configured to determine the relative entropies between the processed recognition results and the recognition results output by the plurality of sub-models, and use the relative entropies as the loss differences.
Optionally, the overall loss determining module 405 may include the following units:
a first determining unit, configured to determine, according to the tag and an identification result output by each of the plurality of submodels, a first loss value corresponding to each of the plurality of submodels;
a second determining unit, configured to determine, according to the tag and the processed identification result, a second loss value corresponding to the processed identification result;
and a third determining unit, configured to determine the sum of the second loss value, each loss difference, and the first loss values corresponding to the plurality of sub-models as the overall loss value of the preset model.
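Spelled out, the composition computed by the third determining unit can be written as follows; the symbols are our notation, not the patent's:

```latex
L_{\text{overall}} \;=\; L_{\text{fused}} \;+\; \sum_{i=1}^{N}\bigl(L_i + D_i\bigr)
```

where N is the number of sub-models, L_fused is the second loss value (processed recognition result against the label), L_i is the first loss value of sub-model i (its own result against the label), and D_i is the loss difference between the processed result and sub-model i's result.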
Optionally, each sample image carries a plurality of attribute tags, and each sub-model is used for identifying a plurality of attributes of the sample image; in this case, the overall loss determining module may include the following units:
a fourth determining unit, configured to determine, for each attribute, an overall loss value corresponding to the attribute, based on each loss difference corresponding to the attribute, a processed identification result corresponding to the attribute, an attribute tag of the attribute, and an identification result corresponding to the attribute output by each of the plurality of submodels;
the fifth determining unit may be configured to determine a sum of overall loss values corresponding to the multiple attributes as an overall loss value of the preset model.
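As a rough illustration of the fourth and fifth determining units, the sketch below forms an overall loss per attribute and sums the per-attribute losses; the tuple layout and all names are illustrative assumptions, not from the patent.

```python
# One plausible reading of the multi-attribute composition: compute, for each
# attribute tag, the same (fused loss + sub-model losses + loss differences)
# combination used in the single-attribute case, then sum over attributes.
def multi_attribute_overall_loss(attribute_terms):
    # attribute_terms: one (fused_loss, sub_losses, loss_diffs) tuple per
    # attribute carried by the sample image (e.g. hat / no hat, coat colour).
    total = 0.0
    for fused_loss, sub_losses, loss_diffs in attribute_terms:
        total = total + fused_loss + sum(sub_losses) + sum(loss_diffs)
    return total
```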
Optionally, the preset model further includes a weight processing branch; the apparatus may further include the following modules:
a weight distribution proportion obtaining module, configured to obtain a weight distribution proportion output by the weight processing branch, where the weight distribution proportion represents a ratio of weights corresponding to recognition results output by the multiple submodels;
the weight processing module 403 may be specifically configured to perform weighted summation on the recognition results output by each of the multiple submodels according to the weight distribution ratio, so as to obtain a processed recognition result;
the parameter updating module 406 may be specifically configured to update the parameter values of the weight processing branch and the parameter values of the multiple submodels respectively according to the overall loss value.
Optionally, the weight processing branch comprises: a plurality of first-level fully connected layers respectively connected to the convolution layers of the plurality of sub-models, and a second-level fully connected layer connected to the plurality of first-level fully connected layers; wherein:
each first-level fully connected layer is used for processing the feature map output by the convolution layer of the corresponding sub-model and outputting a result, the feature map being obtained by the convolution layer of the sub-model performing feature extraction on the sample image;
and the second-level fully connected layer is used for processing the results output by the plurality of first-level fully connected layers to obtain the weight distribution proportion.
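A minimal PyTorch sketch of this branch, under our own assumptions: the patent does not specify how a feature map is reduced before the fully connected layer, so global average pooling is used here, and the layer sizes and softmax normalization are likewise illustrative.

```python
import torch
import torch.nn as nn

class WeightBranch(nn.Module):
    def __init__(self, num_sub_models, feat_dim, hidden_dim=64):
        super().__init__()
        # One first-level FC layer per sub-model's convolution output.
        self.first_level = nn.ModuleList(
            [nn.Linear(feat_dim, hidden_dim) for _ in range(num_sub_models)]
        )
        # Second-level FC layer producing one weight per sub-model.
        self.second_level = nn.Linear(hidden_dim * num_sub_models, num_sub_models)

    def forward(self, feature_maps):
        # feature_maps: one (batch, channels, H, W) tensor per sub-model.
        pooled = [fm.mean(dim=(2, 3)) for fm in feature_maps]  # global average pool
        hidden = [fc(p) for fc, p in zip(self.first_level, pooled)]
        weights = self.second_level(torch.cat(hidden, dim=1))
        # Normalize so the weights form a distribution over the sub-models.
        return torch.softmax(weights, dim=1)
```

The softmax output can then serve directly as the weight distribution proportion used in the weighted summation of module 403.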
Optionally, the apparatus may further include the following modules:
a test module, configured to test the trained preset model by taking the test images in a test set as input, to obtain test results respectively corresponding to the plurality of sub-models in the trained preset model;
and a screening module, configured to screen, from the trained preset model, the sub-models whose test results meet a preset test condition, to obtain an image recognition model for image recognition.
An embodiment of the present invention further provides an electronic device, which may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the parameter value updating method of the model described above.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program for causing a processor to execute the method for updating the parameter value of the model according to the embodiments of the present invention.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, apparatus, device, and storage medium for updating the parameter values of a model provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A method for updating parameter values of a model, comprising:
obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained, wherein the preset model comprises a plurality of submodels, and each submodel is used for identifying the sample image;
obtaining a recognition result output after the sample image is recognized by each of the plurality of sub-models;
weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain processed recognition results;
determining loss differences between the processed recognition results and recognition results output by the sub models respectively;
determining an overall loss value of the preset model according to each loss difference, the processed identification result, the label and the identification result output by each of the plurality of sub-models;
and updating the parameter values of the sub-models respectively according to the overall loss value.
2. The method of claim 1, wherein after updating the parameter values of the plurality of submodels, respectively, according to the overall loss value, the method further comprises:
determining an average value of each parameter of the plurality of sub-models over multiple rounds of training before the current round of training;
and updating the updated parameter values of the sub-models again according to a preset coefficient, the updated parameter values of the sub-models, and the average parameter values of the sub-models over the multiple rounds of training before the current round, to obtain new parameter values of the sub-models after the current round of training is completed.
3. The method of claim 1, wherein determining a loss difference between the processed recognition results and the recognition results respectively output by the plurality of submodels comprises:
determining cosine distances between the processed recognition results and recognition results output by the sub models respectively, and taking the cosine distances as the loss difference;
or determining relative entropies between the processed recognition results and recognition results output by the sub models respectively, and taking the relative entropies as the loss difference.
4. The method of claim 1, wherein determining the overall loss value of the predetermined model according to the loss differences, the processed recognition result, the label, and the recognition result output by each of the sub-models comprises:
determining a first loss value corresponding to each of the plurality of submodels according to the label and the identification result output by each of the plurality of submodels;
determining a second loss value corresponding to the processed identification result according to the label and the processed identification result;
and determining the sum of the second loss value, each loss difference, and the first loss values corresponding to the plurality of sub-models as the overall loss value of the preset model.
5. The method of claim 1, wherein each sample image carries a plurality of attribute tags, and each sub-model identifies a plurality of attributes of the sample image; and determining the overall loss value of the preset model according to each loss difference, the processed recognition result, the label, and the recognition result output by each of the plurality of sub-models comprises:
for each attribute, determining an overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed identification result corresponding to the attribute, the attribute label of the attribute and the identification result corresponding to the attribute output by each of the plurality of submodels;
and determining the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
6. The method of claim 1, wherein the pre-set model further comprises a weight processing branch; the method further comprises the following steps:
obtaining a weight distribution proportion output by the weight processing branch, wherein the weight distribution proportion represents the ratio of the weights corresponding to the recognition results output by the plurality of sub-models;
wherein weighting the recognition results output by the plurality of sub-models according to the weights respectively corresponding to the plurality of sub-models to obtain the processed recognition result comprises:
performing weighted summation on the recognition results output by the plurality of sub-models according to the weight distribution proportion, to obtain the processed recognition result;
and wherein updating the parameter values of the plurality of sub-models respectively according to the overall loss value comprises:
updating the parameter values of the weight processing branch and the parameter values of the plurality of sub-models respectively according to the overall loss value.
7. The method of claim 6, wherein the weight processing branch comprises: a plurality of first-level fully connected layers respectively connected to the convolution layers of the plurality of sub-models, and a second-level fully connected layer connected to the plurality of first-level fully connected layers; and wherein the weight distribution proportion is obtained according to the following steps:
obtaining a feature map output by the convolution layer of each of the plurality of sub-models, wherein the feature map is obtained by the convolution layer performing feature extraction on the sample image;
inputting the feature map output by the convolution layer of each sub-model into the first-level fully connected layer connected to that convolution layer, respectively, to obtain a result output by the first-level fully connected layer;
and inputting the results output by the plurality of first-level fully connected layers into the second-level fully connected layer, to obtain the weight distribution proportion output by the second-level fully connected layer.
8. The method according to any of claims 1-7, wherein after updating the parameter values of the plurality of submodels separately according to the overall loss value, the method further comprises:
testing the trained preset model by taking the test images in a test set as input, to obtain test results respectively corresponding to the plurality of sub-models in the trained preset model;
and screening, from the trained preset model, the sub-models whose test results meet a preset test condition, to obtain an image recognition model for image recognition.
9. An apparatus for updating parameter values of a model, comprising:
the system comprises an input module, a training module and a training module, wherein the input module is used for obtaining a sample image carrying a label and inputting the sample image into a preset model to be trained, the preset model comprises a plurality of sub-models, and each sub-model is used for identifying the sample image;
an output result obtaining module, configured to obtain an identification result that is output after the sample image is identified by each of the plurality of submodels;
the weight processing module is used for weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain the processed recognition results;
a loss difference determining module, configured to determine loss differences between the processed recognition results and recognition results output by the multiple submodels, respectively;
the overall loss determining module is used for determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels;
and the parameter updating module is used for respectively updating the parameter values of the sub models according to the overall loss value.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for updating parameter values of a model according to any one of claims 1-8.
11. A computer-readable storage medium storing a computer program for causing a processor to execute a parameter value updating method of a model according to any one of claims 1 to 8.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935400A (en) * 2021-09-10 2022-01-14 东风商用车有限公司 Vehicle fault diagnosis method, device and system and storage medium
WO2024073935A1 (en) * 2022-10-08 2024-04-11 深圳先进技术研究院 Battery performance prediction method based on combinations of materials parameters in battery slurry preparation process

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491720A (en) * 2018-03-20 2018-09-04 腾讯科技(深圳)有限公司 A kind of application and identification method, system and relevant device
CN109886343A (en) * 2019-02-26 2019-06-14 深圳市商汤科技有限公司 Image classification method and device, equipment, storage medium
CN109934249A (en) * 2018-12-14 2019-06-25 网易(杭州)网络有限公司 Data processing method, device, medium and calculating equipment
CN110309922A (en) * 2019-06-18 2019-10-08 北京奇艺世纪科技有限公司 A kind of network model training method and device
CN110363138A (en) * 2019-07-12 2019-10-22 腾讯科技(深圳)有限公司 Model training method, image processing method, device, terminal and storage medium
CN110363302A (en) * 2019-06-13 2019-10-22 阿里巴巴集团控股有限公司 Training method, prediction technique and the device of disaggregated model
CN110399895A (en) * 2019-03-27 2019-11-01 上海灏领科技有限公司 The method and apparatus of image recognition
US10510002B1 (en) * 2019-02-14 2019-12-17 Capital One Services, Llc Stochastic gradient boosting for deep neural networks
CN110598210A (en) * 2019-08-29 2019-12-20 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition device, entity recognition equipment and medium
US20200057883A1 (en) * 2017-11-28 2020-02-20 Tencent Technology (Shenzhen) Company Limited Facial attribute recognition method, electronic device, and storage medium
CN110909784A (en) * 2019-11-15 2020-03-24 北京奇艺世纪科技有限公司 Training method and device of image recognition model and electronic equipment

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200057883A1 (en) * 2017-11-28 2020-02-20 Tencent Technology (Shenzhen) Company Limited Facial attribute recognition method, electronic device, and storage medium
CN108491720A (en) * 2018-03-20 2018-09-04 腾讯科技(深圳)有限公司 A kind of application and identification method, system and relevant device
CN109934249A (en) * 2018-12-14 2019-06-25 网易(杭州)网络有限公司 Data processing method, device, medium and calculating equipment
US10510002B1 (en) * 2019-02-14 2019-12-17 Capital One Services, Llc Stochastic gradient boosting for deep neural networks
CN109886343A (en) * 2019-02-26 2019-06-14 深圳市商汤科技有限公司 Image classification method and device, equipment, storage medium
CN110399895A (en) * 2019-03-27 2019-11-01 上海灏领科技有限公司 The method and apparatus of image recognition
CN110363302A (en) * 2019-06-13 2019-10-22 阿里巴巴集团控股有限公司 Training method, prediction technique and the device of disaggregated model
CN110309922A (en) * 2019-06-18 2019-10-08 北京奇艺世纪科技有限公司 A kind of network model training method and device
CN110363138A (en) * 2019-07-12 2019-10-22 腾讯科技(深圳)有限公司 Model training method, image processing method, device, terminal and storage medium
CN110598210A (en) * 2019-08-29 2019-12-20 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition device, entity recognition equipment and medium
CN110909784A (en) * 2019-11-15 2020-03-24 北京奇艺世纪科技有限公司 Training method and device of image recognition model and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935400A (en) * 2021-09-10 2022-01-14 东风商用车有限公司 Vehicle fault diagnosis method, device and system and storage medium
WO2024073935A1 (en) * 2022-10-08 2024-04-11 深圳先进技术研究院 Battery performance prediction method based on combinations of materials parameters in battery slurry preparation process
