CN111598182A - Method, apparatus, device and medium for training neural network and image recognition

Info

Publication number
CN111598182A
CN111598182A (application CN202010443246.1A; granted publication CN111598182B)
Authority
CN
China
Prior art keywords: determining, neural networks, sample data, loss function, image sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010443246.1A
Other languages
Chinese (zh)
Other versions
CN111598182B (en)
Inventor
Yu Zhipeng (于志鹏)
Wu Yichao (吴一超)
Liang Ding (梁鼎)
Guo Qiushan (郭秋杉)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010443246.1A
Publication of CN111598182A
Application granted
Publication of CN111598182B
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application provides a method, a device, equipment and a medium for training a neural network and recognizing an image. The method comprises the following steps: acquiring an image sample data set, wherein the image sample data set comprises at least one image sample data; inputting the image sample data set into a plurality of neural networks respectively, to obtain a first prediction result output by each neural network in the plurality of neural networks; determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first prediction results; and adjusting network parameters of the plurality of neural networks based on the loss function.

Description

Method, apparatus, device and medium for training neural network and image recognition
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a method, a device, equipment and a medium for training a neural network and recognizing an image.
Background
Knowledge distillation is a method in which a teacher model is trained in advance, a loss function is then constructed from the output of the teacher model and the real labels of the training data, and a student model is trained based on this loss function, so that the output of the student model approaches that of the teacher model.
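For reference, the following is a minimal sketch of this conventional distillation loss, assuming PyTorch classification models and softened-logit matching with a temperature; the function name, the temperature T and the mixing weight lam are illustrative assumptions rather than the patent's notation:

```python
# Hedged sketch of the conventional knowledge-distillation loss: a hard term
# against the real labels plus a soft term matching the pre-trained teacher.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.5):
    # Hard loss: student prediction vs. the real labels of the training data.
    hard = F.cross_entropy(student_logits, labels)
    # Soft loss: student matches the teacher's temperature-softened output.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return lam * hard + (1.0 - lam) * soft
```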
In this learning process, an additional teacher model must usually be trained, and the training effect of the student model depends heavily on the quality of the teacher model. In addition, the current knowledge distillation pipeline is long, requires multi-stage training and is very resource-consuming.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a medium for training a neural network and recognizing an image, so as to simplify the training process and improve the performance of the neural network.
In a first aspect, an embodiment of the present application provides a method for training a neural network, including: acquiring an image sample data set, wherein the image sample data set comprises at least one image sample data; respectively inputting the image sample data set into a plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks; determining a loss function of the plurality of neural networks in the current iteration training based on a plurality of first prediction results; adjusting network parameters of the plurality of neural networks based on the loss function.
Optionally, the determining a loss function of the neural networks in the current iteration training based on the first prediction results includes: respectively determining a first loss function corresponding to each neural network and determining a second loss function; determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first loss functions and the second loss functions.
Optionally, the determining the first loss function corresponding to each neural network and determining the second loss function respectively includes: determining a first loss function for each neural network based on each first prediction result and a label of the image sample data set; determining a second prediction result based on the plurality of first prediction results; determining the second loss function based on the second prediction result and a label of the image sample data set.
Optionally, the determining a second prediction result based on the plurality of first prediction results includes: and determining a first prediction result corresponding to a minimum loss function in the first loss functions of the plurality of neural networks as the second prediction result.
Optionally, the determining a second prediction result based on the plurality of first prediction results includes: determining a weight of each first prediction result; obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result; wherein the weight is a weight at which the second loss function takes a minimum value.
Optionally, the first prediction result comprises a plurality of classification results; the determining a second prediction result based on the plurality of first prediction results comprises: obtaining the minimum value of each classification result in the first prediction results of the plurality of neural networks; and obtaining the second prediction result according to the minimum value of each classification result.
Optionally, the determining a second prediction result based on the plurality of first prediction results includes: obtaining a verification image dataset comprising at least one verification image data; determining a performance parameter for each of the neural networks based on the validation image dataset; determining the weight of each neural network based on the performance parameters, wherein the values of the performance parameters and the weights are in positive correlation; and obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result.
Optionally, at least part of the image sample data input to at least two of the plurality of neural networks is different.
Optionally, before the inputting the image sample data sets into a plurality of neural networks respectively, the method further includes: respectively performing data enhancement processing on at least part of image sample data in the image sample data set to obtain a plurality of image sample data sets, and respectively taking the plurality of image sample data sets as the input of the plurality of neural networks; wherein the types and/or processing parameters of the data enhancement processing adopted to obtain the plurality of image sample data sets are different.
Optionally, the type of the data enhancement processing includes at least one of the following: scale transformation, translation, rotation, cropping, color warping.
Optionally, in a case that the type of the data enhancement processing includes scaling, the processing parameter includes a scaling parameter; and/or, in case the kind of the data enhancement processing comprises translation, the processing parameter comprises translation amount; and/or, in case the kind of the data enhancement processing comprises rotation, the processing parameter comprises rotation amount; and/or, in the case that the kind of the data enhancement processing includes clipping, the processing parameter includes a clipping value; and/or, in case the kind of the data enhancement processing comprises color warping, the processing parameter comprises a color warping parameter.
In a second aspect, an embodiment of the present application provides an image recognition method, including: acquiring an image to be identified; and identifying the image to be identified based on at least one of the plurality of neural networks obtained by training according to the method of the first aspect to obtain an identification result.
In a third aspect, an embodiment of the present application provides an apparatus for training a neural network, including: a first obtaining module, configured to obtain an image sample data set, where the image sample data set includes at least one image sample data; the input module is used for respectively inputting the image sample data set into a plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks; a determining module, configured to determine a loss function of the neural networks in the current iteration training based on a plurality of first prediction results; an adjusting module for adjusting network parameters of the plurality of neural networks based on the loss function.
Optionally, when the determining module determines the loss function of the neural networks in the current iteration training based on the first prediction results, the determining module specifically includes: respectively determining a first loss function corresponding to each neural network and determining a second loss function; determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first loss functions and the second loss functions.
Optionally, when the determining module determines the first loss function corresponding to each neural network and determines the second loss function, the determining module specifically includes: determining a first loss function for each neural network based on each first prediction result and a label of the image sample data set; determining a second prediction result based on the plurality of first prediction results; determining the second loss function based on the second prediction result and a label of the image sample data set.
Optionally, the determining module determines the second prediction result based on the plurality of first prediction results, including: and determining a first prediction result corresponding to a minimum loss function in the first loss functions of the plurality of neural networks as the second prediction result.
Optionally, the determining module determines the second prediction result based on the plurality of first prediction results, including: determining a weight of each first prediction result; obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result; wherein the weight is a weight at which the second loss function takes a minimum value.
Optionally, the first prediction result comprises a plurality of classification results; the determining module determines a second prediction result based on the plurality of first prediction results, including: obtaining the minimum value of each classification result in the first prediction results of the plurality of neural networks; and obtaining the second prediction result according to the minimum value of each classification result.
Optionally, the determining module determines the second prediction result based on the plurality of first prediction results, including: obtaining a verification image dataset comprising at least one verification image data; determining a performance parameter for each of the neural networks based on the validation image dataset; determining the weight of each neural network based on the performance parameters, wherein the values of the performance parameters and the weights are in positive correlation; and obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result.
Optionally, at least part of the image sample data input to at least two of the plurality of neural networks is different.
Optionally, the apparatus further comprises: the data enhancement processing module is used for respectively carrying out data enhancement processing on at least part of image sample data in the image sample data set to obtain a plurality of image sample data sets, and respectively taking the plurality of image sample data sets as the input of the plurality of neural networks; wherein the types and/or processing parameters of the data enhancement processing adopted to obtain the plurality of image sample data sets are different.
Optionally, the type of the data enhancement processing includes at least one of the following: scale transformation, translation, rotation, cropping, color warping.
Optionally, in a case that the type of the data enhancement processing includes scaling, the processing parameter includes a scaling parameter; and/or, in case the kind of the data enhancement processing comprises translation, the processing parameter comprises translation amount; and/or, in case the kind of the data enhancement processing comprises rotation, the processing parameter comprises rotation amount; and/or, in the case that the kind of the data enhancement processing includes clipping, the processing parameter includes a clipping value; and/or, in case the kind of the data enhancement processing comprises color warping, the processing parameter comprises a color warping parameter.
In a fourth aspect, an embodiment of the present application provides a face recognition apparatus, including: the second acquisition module is used for acquiring an image to be identified; the recognition module is configured to recognize the image to be recognized based on at least one of the plurality of neural networks trained by the method according to the first aspect, so as to obtain a recognition result.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the methods of the first and second aspects.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the methods of the first and second aspects.
According to the method, the device, the equipment and the medium for training the neural network and the image recognition, the acquired image sample data set is respectively input into the plurality of neural networks, the first prediction result output by each neural network in the plurality of neural networks is obtained, the loss function of the plurality of neural networks in the current iteration training is determined based on the plurality of first prediction results, and then the network parameters of the plurality of neural networks are adjusted based on the loss function, wherein the image sample data set comprises at least one image sample data. Because the first prediction results of the plurality of neural networks are adopted to replace the prediction results of the teacher model, the training of the teacher model can be omitted in the neural network training process of the embodiment, and the training process is simplified.
Drawings
FIG. 1 is a flow chart of a method for training a neural network according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of determining a second predicted result based on a first predicted result according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating classification results of various neural networks provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of determining a second prediction based on a first prediction according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for training a neural network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In knowledge distillation, the teacher model is generally a complex model with good reasoning performance, and the student model is a simplified model with low complexity. By taking objectives associated with the teacher model as part of the loss function of the student model, the student model can learn the features of the teacher model during training, enabling knowledge to be distilled from the teacher model to the student model.
However, in the prior art, an additional teacher model needs to be trained, and the student model acquires knowledge from the teacher model, so that the performance of the student model is often dependent on the teacher model.
In order to solve the above technical problems in the prior art, in the embodiments of the present application, the prediction results obtained by collaboratively training a plurality of student models are used to replace the prediction result of the teacher model, and the losses derived from the prediction results of the plurality of student models are then back-propagated to train each individual student model, thereby omitting the training of the teacher model.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method for training a neural network provided in an embodiment of the present application. Aiming at the above technical problems in the prior art, the embodiment of the application provides a method for training a neural network which, as shown in FIG. 1, comprises the following specific steps:
step 101, acquiring an image sample data set.
Wherein the image sample data set comprises at least one image sample data.
It should be noted that the technical solutions provided in the present application are also applicable to the processing of audio data and video data. In the present application, taking image data as an example, training of a plurality of neural networks is performed through an image sample data set, so that the plurality of neural networks obtained through common training are used for image recognition. For audio data and video data, an audio sample data set or a video sample data set may be used as a sample data set for neural network training in a process of training a plurality of neural networks to implement training of the plurality of neural networks, so that the plurality of neural networks obtained by training implement identification of the audio data or the video data.
Taking the image sample data as an example, the image sample data may be an image acquired in an automatic driving scene. For example, an environmental image around the vehicle captured by an image capturing device on the autonomous vehicle is acquired as a training sample image. The environmental image may include objects such as pedestrians, vehicles, lane lines, light poles, signboards, buildings, and the like.
In other scenes, the image sample data can also be a face image acquired when the smart phone is unlocked, wherein the unlocking comprises screen locking and unlocking, application unlocking and the like. For example, a camera of a smartphone acquires an image including a human face as image sample data, or a human face image acquired through a public data set as image sample data. Of course, the face image may also be an image obtained from some other scenes applying the face recognition function, which is not specifically limited in this embodiment.
In some other scenarios, the image sample data may also be a public transportation environment image. For example, a public transportation environment image captured by an image capturing device on a public transportation vehicle is acquired, wherein the public transportation environment image includes passengers.
In addition, the image sample data may also be images acquired by various cameras in the city including various sensitive objects, abnormal objects, and abnormal behaviors.
It should be noted that the image sample data in this embodiment may include images in a plurality of scenes, and the images in the several scenes described above are only exemplary and do not limit this embodiment.
And 102, respectively inputting the image sample data sets into a plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks.
In this step, each of the plurality of neural networks is trained on the image sample data set, and each neural network outputs a first prediction result. The forward processing of each student model may include operations such as convolution, full connection, nonlinear transformation and normalization. For the specific processing procedure, reference may be made to the description of the prior art, which is not repeated here.
Optionally, the plurality of neural networks have the same structure, for example, each of the plurality of neural networks includes a convolutional layer and a fully connected layer, and the connection relationship between the layers and the number of each layer are the same. Of course, the present embodiment may also adopt a neural network of a different structure. In practical applications, the structures of the plurality of neural networks are generally set to be the same.
And 103, determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first prediction results.
In the prior art, the loss function of each neural network in the current iteration training is determined according to the prediction result of the teacher model and the label of the image sample data set.
In this embodiment, the loss function of each neural network in the current iteration training is determined according to the plurality of first prediction results and the labels of the image sample data set. It can be understood that, in the embodiment, a plurality of first prediction results are used to replace the prediction results of the teacher model, so that the training process of the teacher model can be omitted in the training process.
And 104, adjusting network parameters of the plurality of neural networks based on the loss function.
If the iterative training of the current round does not reach the convergence state, adjusting the network parameters of each neural network based on the loss function, and continuing to perform the next iterative training; and if the iterative training of the current round reaches the convergence state, finishing the training. For a specific implementation process of adjusting network parameters of a neural network according to a loss function, reference may be made to description of the prior art, and this embodiment is not described herein again.
According to the embodiment of the application, the acquired image sample data set is respectively input into the plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks, a loss function of the plurality of neural networks in the current iteration training is determined based on the plurality of first prediction results, and then network parameters of the plurality of neural networks are adjusted based on the loss function, wherein the image sample data set comprises at least one image sample data. Because the first prediction results of the plurality of neural networks are adopted to replace the prediction results of the teacher model, the training of the teacher model can be omitted in the neural network training process of the embodiment, and the training process is simplified.
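The following is a minimal sketch of steps 101 to 104, assuming PyTorch, m classification networks trained jointly with one optimizer over all their parameters, and a simple average of the first prediction results as the stand-in teacher output (the embodiments below describe several alternative combination strategies); all names are illustrative assumptions:

```python
# Hedged sketch of one co-training iteration (steps 101-104).
import torch
import torch.nn.functional as F

def train_step(models, optimizer, images, labels, lam=1.0):
    # Step 102: each network outputs a first prediction for the same batch.
    logits = [model(images) for model in models]
    # First loss of each network: its own prediction against the real label.
    first_losses = [F.cross_entropy(z, labels) for z in logits]
    # Step 103: combine the first predictions into a second prediction that
    # replaces the teacher output (a plain average is assumed here).
    second = torch.stack(logits).mean(dim=0)
    second_loss = F.cross_entropy(second, labels)
    loss = second_loss + lam * sum(first_losses)
    # Step 104: adjust the network parameters of all networks together.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The single backward pass trains all student models in parallel, which is what allows the separate teacher-training stage to be dropped.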
Optionally, in this embodiment, the plurality of neural networks are trained cooperatively, that is, the plurality of neural networks are trained in parallel.
Wherein determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first prediction results comprises:
step a, respectively determining a first loss function corresponding to each neural network, and determining a second loss function.
Wherein each neural network corresponds to a first loss function and a second loss function respectively.
And b, determining the loss functions of the neural networks in the iterative training of the current round based on the first loss functions and the second loss functions.
Taking one neural network as an example, the loss function it obtains in the current round of iterative training is derived from the first loss functions of all the neural networks and the second loss function of that neural network, as shown in the following formula (1):

$$L = L_i^{(2)} + \lambda \sum_{j=1}^{m} L_j^{(1)} \tag{1}$$

In formula (1), $L$ represents the loss function obtained by each student model in the current round of iterative training (namely Loss in FIG. 2), $i$ represents the $i$-th neural network among the $m$ neural networks, $m$ represents the total number of neural networks, $m$ and $i$ are integers greater than 0 with $0 < i \le m$, $L_i^{(2)}$ represents the second loss function corresponding to the $i$-th neural network, $L_j^{(1)}$ represents the first loss function corresponding to the $j$-th neural network, $\lambda$ is a constant, and $\sum_{j=1}^{m} L_j^{(1)}$ represents the summation of the first loss functions over the $m$ neural networks.
Optionally, the determining a first loss function corresponding to each neural network and determining a second loss function respectively includes:
step a1, determining a first loss function for each neural network based on each first prediction result and the label of the image sample dataset.
In this embodiment, the same image sample data set is used by the plurality of neural networks, and thus the labels of the input data of each neural network are the same.
Each neural network outputs a first prediction result for the set of image sample data. A first loss function for each neural network may be determined based on the first prediction result for each neural network and the label of the image sample dataset, respectively.
As shown in FIG. 2, the plurality of neural networks includes $m$ neural networks, where $m$ is an integer greater than 0; the first prediction result of the $i$-th neural network among the $m$ neural networks is logits-$i$, the label of the image sample data set is $y$, and the first loss function of the $i$-th neural network is $L_i^{(1)}$; that is, $L_i^{(1)}$ can be determined from the first prediction result logits-$i$ and the label $y$.
Taking face recognition as an example, the image sample data is a face image, the label is a face feature included in the face image, and the prediction result may be a face recognition result.
In one example, the plurality of neural networks includes a first neural network, a second neural network and a third neural network. The first prediction results obtained by the first, second and third neural networks from the image sample data set are logits-1, logits-2 and logits-3 respectively, the label of the image sample data set is $y$, and the first loss functions of the three networks are $L_1^{(1)}$, $L_2^{(1)}$ and $L_3^{(1)}$ respectively; that is, $L_1^{(1)}$ can be determined from logits-1 and $y$, $L_2^{(1)}$ from logits-2 and $y$, and $L_3^{(1)}$ from logits-3 and $y$.
Step a2, determining a second prediction result based on the plurality of first prediction results.
The second prediction result in this step may be considered as the prediction result of the teacher model, and it should be noted that the teacher model is not involved in this embodiment, and the second prediction result is used as a substitute for the prediction result of the teacher model in the prior art.
Please continue to refer to FIG. 2, which illustrates determining the second prediction result based on the $m$ first prediction results; that is, the combined logits are determined based on logits-1, logits-2, …, logits-m.
Step a3, determining a second loss function based on the second prediction result and the label of the image sample data set.
For step a3, reference may be made to the detailed description of step a1, which is not described herein again.
In this embodiment, the execution sequence of steps a1 to a3 is not specifically limited: step a1 may be executed first and then steps a2 and a3, or steps a2 and a3 may be executed first and then step a1.
In this embodiment, a plurality of different embodiments may be adopted to determine the second prediction result, including at least the following embodiments:
in an alternative embodiment, the first prediction result corresponding to the smallest first loss function in the first loss functions of the plurality of neural networks may be determined as the second prediction result. Please continue to refer toFig. 2 is a diagram illustrating a first prediction result output by the neural network corresponding to the minimum first loss function as a second prediction result by comparing the first loss functions of the m neural networks. For example, if the first loss function of the 1 st neural network of the m neural networks is the minimum among the first loss functions of the m neural networks, the first prediction result is obtained
Figure BDA0002504912900000111
As a second prediction result.
Suppose the first prediction result of the $i$-th neural network among the $m$ neural networks is $Z_i$ and the second prediction result is $Z_t$; then $Z_t = h(Z_1, Z_2, \ldots, Z_m)$, where $h$ represents an operational relationship over $Z_1, Z_2, \ldots, Z_m$.
In the present embodiment, this can be expressed by the following formula:

$$k = \arg\min_{1 \le i \le m} L_{CE}(Z_i, y), \qquad Z_t = Z_k \tag{2}$$

In formula (2), $y$ represents the label of the image sample data set, $L_{CE}(Z_i, y)$ represents the first loss function of the $i$-th neural network, determined from the first prediction result $Z_i$ of the $i$-th neural network and the label $y$ of the image sample data set, $\arg\min_{1 \le i \le m} L_{CE}(Z_i, y)$ identifies the neural network corresponding to the minimum value of the $m$ first loss functions, denoted $k$, and $Z_t = Z_k$ indicates that the first prediction result of the $k$-th neural network is taken as the second prediction result, where $k$ and $m$ are integers greater than 0.
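A minimal sketch of formula (2) follows, assuming the first prediction results are PyTorch logits tensors and the first loss is cross-entropy; the function name is an assumption:

```python
# Hedged sketch of formula (2): pick the first prediction whose first loss
# against the label is smallest and use it as the second prediction Z_t.
import torch
import torch.nn.functional as F

def min_loss_second_prediction(Z, y):
    # Z: list of m tensors of shape (batch, num_classes); y: (batch,) labels.
    losses = torch.stack([F.cross_entropy(z, y) for z in Z])
    k = int(torch.argmin(losses))   # network with the minimum first loss
    return Z[k]                     # Z_t = Z_k
```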
In another alternative embodiment, determining the second predicted outcome based on the plurality of first predicted outcomes includes:
and b1, determining the weight of the plurality of first prediction results.
Wherein the weights of the plurality of first prediction results are weights when the second penalty function takes a minimum value.
Specifically, a weight $\alpha_i$ is set for each first prediction result $z_i$ such that the loss function calculated between the weighted sum of all the first prediction results and the actual label is minimized, and this weighted sum of all $z_i$ is taken as the second prediction result. The above process can be expressed as the following formula (3):

$$\min_{\alpha \in \mathbb{R}^m} L_{CE}(\alpha^{T} Z,\, y), \quad \text{s.t.}\ \sum_{i=1}^{m} \alpha_i = 1 \tag{3}$$

In formula (3), $\alpha \in \mathbb{R}^m$ indicates that $\alpha$ belongs to the real number field of $m$ dimensions, $Z$ represents the set of all first prediction results, $\alpha^T$ represents the transpose of the matrix of all weights, $y$ represents the label of the image sample data set, $\alpha_i$ represents the weight proportion corresponding to the $i$-th first prediction result, and the sum of the weights corresponding to all the first prediction results is 1; that is, the objective $L_{CE}(\alpha^T Z, y)$ is the loss calculated between the label $y$ and the weighted sum of all first prediction results $z_i$, where $i$ is an integer greater than 0.
And b2, obtaining a weighted sum of the plurality of first prediction results based on the weights and the corresponding first prediction results, and determining the weighted sum as a second prediction result.
For example, if the plurality of neural networks includes a first neural network, a second neural network and a third neural network, their first prediction results are $z_1$, $z_2$ and $z_3$ respectively; if the weights corresponding to $z_1$, $z_2$ and $z_3$ are $\alpha_1$, $\alpha_2$ and $\alpha_3$ respectively, the second prediction result is $z_1 \alpha_1 + z_2 \alpha_2 + z_3 \alpha_3$.
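The patent does not prescribe how the minimizing weights of formula (3) are found; the sketch below assumes one possible solver, a few gradient steps on a softmax parameterization of α, which keeps the weights summing to 1:

```python
# Hedged sketch of formula (3): find weights alpha (summing to 1) whose
# weighted combination of first predictions minimizes the second loss.
import torch
import torch.nn.functional as F

def weighted_second_prediction(Z, y, steps=100, lr=0.1):
    stacked = torch.stack([z.detach() for z in Z])       # (m, batch, classes)
    w = torch.zeros(len(Z), requires_grad=True)          # softmax(0) = 1/m each
    opt = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        alpha = torch.softmax(w, dim=0)                  # keeps sum(alpha) == 1
        second = torch.einsum("i,ibc->bc", alpha, stacked)
        loss = F.cross_entropy(second, y)                # the second loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    alpha = torch.softmax(w, dim=0).detach()
    return torch.einsum("i,ibc->bc", alpha, stacked)     # Z_t = sum_i alpha_i z_i
```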
In another alternative embodiment, the first prediction result for each of the plurality of neural networks comprises a plurality of classification results; determining a second prediction result based on the plurality of first prediction results, comprising:
and c1, acquiring the minimum value of each classification result in the first prediction results of the plurality of neural networks.
And c2, obtaining the second prediction result according to the minimum value of each classification result.
The second prediction result is a prediction result of multiple dimensions, and the prediction result of each dimension corresponds to the minimum value of each classification result.
For example, as shown in FIG. 3, the first prediction result of each of the $m$ neural networks includes $N$ classification results, and the $N$ classification results output by the $i$-th neural network are denoted $i1, i2, \ldots, iN$. Taking the 1st and 2nd neural networks as examples, the 1st neural network outputs classification results $11, 12, \ldots, 1N$, the 2nd outputs $21, 22, \ldots, 2N$, and so on up to the $m$-th neural network with $m1, m2, \ldots, mN$. Then, in this embodiment, the minimum is taken over the classification results $11, 21, \ldots, m1$, the minimum is taken over $12, 22, \ldots, m2$, and so on, finally obtaining $N$ minimum values which together constitute the second prediction result. This can be understood as the second prediction result being a prediction result in $N$ dimensions, with the $N$ minimum values serving as the prediction results in the respective dimensions, i.e. one minimum value per dimension.
Optionally, before step c1, normalization processing may be performed on each first prediction result, using the following formula:

$$P = \mathrm{softmax}(z) = \mathrm{softmax}(z - z_c) \tag{4}$$

In formula (4), $z_c$ is the prediction result corresponding to the dimension of the real category of the image sample data, where the real category is the category corresponding to the label. For example, if the 1st prediction model outputs 100 classification results and the real category is the 89th classification result, then the 89th classification result is the prediction result corresponding to the dimension of the real category; softmax() is a function that maps variables to real numbers from 0 to 1;

$$z_{t,j} = \min_{1 \le i \le m} \bar{z}_{j,i} \tag{5}$$

In formula (5), it can be understood that the $m$ neural networks and the $N$ classification results of each neural network form a matrix $\bar{z}$: $z_{t,j}$ represents the element of the $j$-th row of the second prediction result $z_t$, and $\bar{z}_{j,i}$ represents the element in row $j$ and column $i$ of the normalized matrix, where the $m$ neural networks correspond to the columns of the matrix and the $N$ classification results of each neural network correspond to its rows.
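A minimal sketch of formulas (4) and (5), assuming PyTorch logits tensors of shape (batch, N) and integer class labels; the tensor layout and names are assumptions:

```python
# Hedged sketch of formulas (4)-(5): normalize each first prediction with
# softmax(z - z_c), then take the per-class minimum across the m networks.
import torch

def per_class_min_second_prediction(Z, y):
    # Z: list of m tensors of shape (batch, N); y: (batch,) true-class indices.
    stacked = torch.stack(Z)                              # (m, batch, N)
    idx = y.view(1, -1, 1).expand(stacked.size(0), -1, 1)
    z_c = stacked.gather(2, idx)                          # true-class logit
    P = torch.softmax(stacked - z_c, dim=2)               # formula (4)
    z_t = P.min(dim=0).values                             # formula (5)
    return z_t                                            # (batch, N)
```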
In yet another alternative embodiment, determining the second predicted outcome based on the plurality of first predicted outcomes includes:
step d1, obtaining a verification image dataset.
Wherein the verification image dataset comprises at least one verification image data; the verification image dataset may be obtained as follows: acquiring an original sample data set; the original sample data set is divided into an image sample data set and a verification image data set, and certainly, the verification image data set may also be a data set obtained through other approaches, which is not specifically limited in this embodiment. Notably, the categories included in the validation dataset are at least partially identical to the categories included in the image sample dataset. For example, if the image sample data set includes 100 categories, the categories included in the verification data set need to be some or all of the 100 categories.
Step d2, determining performance parameters of each neural network based on the validation data set.
The performance parameter is used for representing the accuracy of the prediction result of the neural network on the verification data set, for example whether the accuracy on the verification data set is within a preset range, or whether the false recognition rate is within a fault-tolerant range.
Optionally, the performance parameter may be represented by the generalization capability of the neural network on the verification data set. Generalization capability refers to the adaptability of a machine learning algorithm to new sample data sets: the purpose of learning is to learn the rules underlying the image sample data set, and a well-trained network can also give appropriate output for data sets other than the image sample data set that follow the same rules. This capability is called generalization capability, and can be understood as the predictive ability of the neural network on unknown data.
Step d3, determining the weight of each neural network based on the performance parameters.
Wherein, the value of the performance parameter and the value of the weight are in positive correlation.
In an optional implementation manner, the same or different weights may be assigned to different neural networks according to the merits of their training effect. For example, if it is defined in advance that the merit of the training effect and the value of the assigned weight are in positive correlation, a neural network with a better training effect is assigned a higher weight and a neural network with a worse training effect a lower weight, and the weights of the plurality of neural networks are determined according to this preset rule.
In another alternative embodiment, the weights of the plurality of neural networks may also be determined according to the following formula (6):

$$w_k = \frac{\sum_{j=1}^{m} (C^{-1})_{kj}}{\sum_{i=1}^{m}\sum_{j=1}^{m} (C^{-1})_{ij}} \tag{6}$$

In formula (6), $w_k$ represents the weight ratio of the $k$-th neural network, $C^{-1}$ represents the inverse of the matrix $C_{ij}$, and $(C^{-1})_{kj}$ represents the element in row $k$ and column $j$ of that inverse matrix; $i$ and $j$ represent the $i$-th and $j$-th neural networks among the $m$ neural networks respectively, where $i$ may or may not equal $j$;

wherein:

$$C_{ij} = \frac{1}{N}\sum_{k=1}^{N} \big(f_i(x_k) - t_k\big)\big(f_j(x_k) - t_k\big) \tag{7}$$

In formula (7), $f_i(x_k)$ represents the first prediction result output by the $i$-th neural network, $f_j(x_k)$ represents the first prediction result output by the $j$-th neural network, $x_k$ represents the $k$-th image in the verification image set, $t$ represents the label of the verification image set, and $N$ represents the total number of verification images in the verification image set.
At initialization, the $w_k$ are all $1/m$; then, in each round of training, $C_{ij}$ is calculated from the first prediction results $f_i(x_k)$ output by the neural networks and $w_k$ is updated. The weight of each student model is thus iterated continuously, so that the output of the finally combined model is optimal as a whole.
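A minimal sketch of formulas (6) and (7), assuming scalar per-image predictions on the verification set gathered into an (m, N) tensor and an invertible correlation matrix C; for classifiers, per-class scores or error indicators could be substituted:

```python
# Hedged sketch of formulas (6)-(7): weight each network by the row sums of
# the inverse error-correlation matrix computed on the verification set.
import torch

def ensemble_weights(preds, t):
    # preds: (m, N) predictions f_i(x_k); t: (N,) verification labels t_k.
    E = preds - t.unsqueeze(0)              # errors f_i(x_k) - t_k
    C = E @ E.t() / preds.size(1)           # formula (7)
    C_inv = torch.linalg.inv(C)             # assumes C is invertible
    return C_inv.sum(dim=1) / C_inv.sum()   # formula (6); weights sum to 1
```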
And d4, obtaining a weighted sum of the first prediction results based on the weights and the corresponding first prediction results, and determining the weighted sum as the second prediction result.
For step d4, refer to the detailed description of step b2, and will not be described herein.
In the above embodiment, at least part of the image sample data input to at least two of the plurality of neural networks is different. That is, the image sample data sets input into each neural network may be the same image sample data set, may be different image sample data sets, or may be partially the same image sample data set. For example, if the image sample data set includes 100 image sample data and the number of the neural networks is 5, the image sample data input to the 5 neural networks is the 100 image sample data; or, the 100 image sample data are processed for 4 times to obtain 4 other 100 image sample data, and the 5 image sample data sets respectively correspond to the input data of the 5 neural networks; or, partial image sample data in 100 image sample data may be processed, for example, 21 st to 40 th, 41 st to 60 th, 61 st to 80 th, and 81 st to 100 th image sample data in 100 image sample data are processed respectively, or the same data in 100 image sample data are processed differently, so as to obtain 4 other 100 image sample data, where the 5 image sample data sets correspond to input data of 5 neural networks respectively.
Optionally, the processing of the image sample data in the above embodiment includes: and carrying out data enhancement processing on the image sample data. Before inputting the image sample data sets into the plurality of neural networks, respectively, the method of this embodiment further includes: respectively performing data enhancement processing on at least part of image sample data in the image sample data set to obtain a plurality of image sample data sets, and respectively taking the plurality of image sample data sets as the input of a plurality of neural networks; wherein the types and/or processing parameters of the data enhancement processing used to obtain the plurality of image sample data sets are different. The type of data enhancement processing includes at least one of: scale transformation, translation, rotation, cropping, color warping.
Wherein, in the case that the type of data enhancement processing includes scaling, the processing parameters include scaling parameters, such as the enlargement, reduction scale, and the like of the image; and/or, in case the kind of data enhancement processing includes translation, the processing parameter includes translation amount, such as step size of image translation, etc.; and/or, in case the kind of data enhancement processing includes rotation, the processing parameter includes rotation amount, such as one or a combination of more of rotation angle, rotation direction, etc. of the image; and/or, in case the kind of data enhancement processing includes cropping, the processing parameter includes a cropping value, such as one or a combination of more of a cropping position, a cropping size, etc. of the image; and/or, in case the kind of data enhancement processing comprises color warping, the processing parameters comprise color warping parameters comprising at least one of brightness, saturation, grey scale and reverse color.
As shown in FIG. 4, $h(x, \theta_1), h(x, \theta_2), \ldots, h(x, \theta_m)$ respectively represent performing data enhancement processing on image sample data $x$, where $\theta_1, \theta_2, \ldots, \theta_m$ respectively represent the type and/or processing parameters of the data enhancement processing used by the 1st, 2nd, …, $m$-th neural networks.
Taking the example that the image sample data is an image and the image is translated, for the same image sample data, different translation amounts can be adopted to process the same image sample data, and translation and other operations except translation, such as scale transformation, rotation, cropping and color distortion, can also be respectively adopted to process the same image sample data, so that different image sample data are obtained to be input into different neural networks for training. Thereby expanding the learning space of different neural networks and enabling the multiple neural networks to be complemented with each other.
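A minimal sketch of such per-network data enhancement follows, assuming torchvision; the specific transforms and parameters are illustrative assumptions (in practice the outputs would also need matching sizes before batching):

```python
# Hedged sketch: the same image sample processed with a different enhancement
# type and/or parameter for each of the m neural networks.
from torchvision import transforms

augmentations = [
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
    transforms.RandomAffine(degrees=15),                       # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),       # scale + crop
    transforms.ColorJitter(brightness=0.4, saturation=0.4),    # color warping
]

def make_inputs(image):
    # One differently-enhanced copy of the same image per neural network.
    return [aug(image) for aug in augmentations]
```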
On the basis of the above embodiments, the embodiments of the present application further provide an image recognition method. The image recognition method comprises the following steps: acquiring an image to be recognized; and recognizing the image to be recognized based on at least one of the plurality of neural networks obtained by training in the above method embodiment, to obtain a recognition result. In the case that the image to be recognized is an environment image around the vehicle acquired by image acquisition equipment in an automatic driving scene, performing image recognition on the image to be recognized includes recognizing targets such as pedestrians, vehicles, lane lines, lamp posts, signboards and buildings in the image to obtain a target recognition result, so as to provide driving decision information for automatic driving. In the case that the image to be recognized is a face image, performing image recognition on the image to be recognized includes recognizing a face in the image to obtain a face recognition result, which can be applied to screen locking and unlocking of the smart phone and application unlocking of the smart phone. In the case that the image to be recognized is a public traffic environment image, performing image recognition on the image to be recognized includes recognizing abnormal behaviors of passengers, such as fighting and falling. In the case that the image to be recognized is an image shot by a camera in the city, performing image recognition on the image to be recognized includes danger detection (such as gun-holding detection) and garbage classification, to obtain a danger detection result and a garbage classification result. Compared with the existing training method, the performance of the neural networks (student models) obtained by training in the embodiments is improved, so that the recognition rate and the recognition accuracy can be improved in the image recognition process.
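A minimal sketch of this recognition step, assuming the co-trained PyTorch networks from the sketches above and a preprocessed (C, H, W) image tensor; averaging the outputs is one option, and any single trained network can also serve alone:

```python
# Hedged sketch of the image recognition method using the trained networks.
import torch

@torch.no_grad()
def recognize(models, image):
    logits = torch.stack([net(image.unsqueeze(0)) for net in models]).mean(dim=0)
    return int(logits.argmax(dim=1))        # recognition result (class index)
```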
Fig. 5 is a schematic structural diagram of an apparatus for training a neural network according to an embodiment of the present disclosure. The apparatus for training a neural network provided in the embodiment of the present application may perform the processing procedure provided in the embodiment of the method for training a neural network, as shown in fig. 5, the apparatus 50 for training a neural network includes: a first obtaining module 51, an input module 52, a determining module 53 and an adjusting module 54; the first obtaining module 51 is configured to obtain an image sample data set, where the image sample data set includes at least one image sample data; an input module 52, configured to input the image sample data set into a plurality of neural networks respectively, so as to obtain a first prediction result output by each of the plurality of neural networks; a determining module 53, configured to determine a loss function of the neural networks in the current iteration training based on a plurality of first prediction results; an adjusting module 54, configured to adjust network parameters of the plurality of neural networks based on the loss function.
Optionally, the determining module 53 determines, based on the first prediction results, a loss function of the neural networks in the current iteration training, including: respectively determining a first loss function corresponding to each neural network and determining a second loss function; determining a loss function of the plurality of neural networks in the current iteration training based on the plurality of first loss functions and the second loss functions.
Optionally, the determining module 53 determines a first loss function corresponding to each of the neural networks and determines a second loss function respectively, including: determining a first loss function for each neural network based on each first prediction result and a label of the image sample data set; determining a second prediction result based on the plurality of first prediction results; determining the second loss function based on the second prediction result and a label of the image sample data set.
Optionally, the determining module 53 determines the second prediction result based on the plurality of first prediction results, including: and determining a first prediction result corresponding to a minimum loss function in the first loss functions of the plurality of neural networks as the second prediction result.
Optionally, the determining module 53 determines the second prediction result based on the plurality of first prediction results, including: determining a weight of each first prediction result; obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result; wherein the weight is a weight at which the second loss function takes a minimum value.
Optionally, the first prediction result comprises a plurality of classification results; the determining module 53 determines a second prediction result based on the plurality of first prediction results, including: obtaining the minimum value of each classification result in the first prediction results of the plurality of neural networks; and obtaining the second prediction result according to the minimum value of each classification result.
Optionally, the determining module 53 determines the second prediction result based on the plurality of first prediction results, including: acquiring a verification image dataset, the verification image dataset comprising at least one verification image data; determining a performance parameter for each of the neural networks based on the validation image dataset; determining the weight of each neural network based on the performance parameters, wherein the values of the performance parameters and the weights are in positive correlation; and obtaining a weighted sum of the plurality of first predicted results based on the weights and the corresponding first predicted results, and determining the weighted sum as the second predicted result.
Optionally, at least part of the image sample data input to at least two of the plurality of neural networks is different.
Optionally, the apparatus further comprises: a data enhancement processing module 55, configured to perform data enhancement processing on at least part of the image sample data sets respectively to obtain a plurality of image sample data sets, and use the plurality of image sample data sets as inputs of the plurality of neural networks respectively; wherein the types and/or processing parameters of the data enhancement processing adopted to obtain the plurality of image sample data sets are different.
Optionally, the type of the data enhancement processing includes at least one of the following: scale transformation, translation, rotation, cropping, color warping.
Optionally, in a case that the type of the data enhancement processing includes scaling, the processing parameter includes a scaling parameter; and/or, in case the kind of the data enhancement processing comprises translation, the processing parameter comprises translation amount; and/or, in case the kind of the data enhancement processing comprises rotation, the processing parameter comprises rotation amount; and/or, in the case that the kind of the data enhancement processing includes clipping, the processing parameter includes a clipping value; and/or, in case the kind of the data enhancement processing comprises color warping, the processing parameter comprises a color warping parameter.
The training apparatus of the neural network shown in fig. 5 can be used to implement the technical solution of the embodiment of the training method of the neural network, and the implementation principle and the technical effect are similar, and are not described herein again.
Fig. 6 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application. The image recognition apparatus provided in the embodiment of the present application may execute the processing procedure provided in the embodiment of the image recognition method. As shown in fig. 6, the image recognition apparatus 60 includes a second obtaining module 61 and a recognition module 62; the second obtaining module 61 is configured to obtain an image to be recognized; the recognition module 62 is configured to recognize the image to be recognized based on at least one of the plurality of neural networks trained by the neural network training method described in the foregoing embodiments, so as to obtain a recognition result.
The image recognition apparatus in the embodiment shown in fig. 6 can be used to implement the technical solution of the above-mentioned image recognition method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
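A non-limiting sketch of recognition with one of the trained networks (the preprocessing pipeline, the 224x224 input size, and the function name are assumptions):

```python
import torch
from PIL import Image
from torchvision import transforms

@torch.no_grad()
def recognize(network, image_path):
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    network.eval()
    scores = network(image)             # classification scores
    return scores.argmax(dim=1).item()  # recognition result: class index
```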
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device provided in the embodiment of the present application may execute the processing flow provided in the embodiment of the neural network training method or the embodiment of the image recognition method. As shown in fig. 7, the electronic device 70 includes a memory 71, a processor 72, a computer program, and a communication interface 73, wherein the computer program is stored in the memory 71 and configured to be executed by the processor 72 to implement the processing flow of the above neural network training method embodiment and/or the image recognition method embodiment.
The electronic device of the embodiment shown in fig. 7 may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
In addition, the present application also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the neural network training method and/or the image recognition method described in the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (18)

1. A method of training a neural network, comprising:
acquiring an image sample data set, wherein the image sample data set comprises at least one image sample data;
respectively inputting the image sample data set into a plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks;
determining a loss function of the plurality of neural networks in the current iteration training based on a plurality of first prediction results;
adjusting network parameters of the plurality of neural networks based on the loss function.
2. The method of claim 1, wherein determining the loss function of the plurality of neural networks in the current iteration of training based on the plurality of first predictors comprises:
respectively determining a first loss function corresponding to each neural network and determining a second loss function;
determining the loss function of the plurality of neural networks in the current iteration training based on the plurality of first loss functions and the second loss function.
3. The method of claim 2, wherein the separately determining a first loss function corresponding to each of the neural networks and determining a second loss function comprises:
determining a first loss function for each neural network based on each first prediction result and a label of the image sample data set;
determining a second prediction result based on the plurality of first prediction results;
determining the second loss function based on the second prediction result and a label of the image sample data set.
4. The method of claim 3, wherein determining a second predictor based on the plurality of first predictors comprises:
determining the first prediction result corresponding to the minimum one of the first loss functions of the plurality of neural networks as the second prediction result.
5. The method of claim 3, wherein determining a second predictor based on the plurality of first predictors comprises:
determining a weight of each first prediction result;
obtaining a weighted sum of the plurality of first prediction results based on the weights and the corresponding first prediction results, and determining the weighted sum as the second prediction result;
wherein the weights are the weights at which the second loss function takes a minimum value.
6. The method of claim 3, wherein the first prediction result comprises a plurality of classification results;
the determining a second prediction result based on the plurality of first prediction results comprises:
obtaining, for each classification result, the minimum value of the classification result across the first prediction results of the plurality of neural networks;
obtaining the second prediction result according to the minimum values of the classification results.
7. The method of claim 3, wherein determining a second predictor based on the plurality of first predictors comprises:
obtaining a verification image dataset comprising at least one verification image data;
determining a performance parameter of each of the neural networks based on the verification image dataset;
determining a weight of each neural network based on the performance parameters, wherein the weight is positively correlated with the value of the performance parameter;
obtaining a weighted sum of the plurality of first prediction results based on the weights and the corresponding first prediction results, and determining the weighted sum as the second prediction result.
8. The method of any one of claims 1-7, wherein at least some of the image sample data input to at least two of the plurality of neural networks is different.
9. The method of claim 8, wherein, prior to the respectively inputting the image sample data set into the plurality of neural networks, the method further comprises:
respectively performing data enhancement processing on at least part of image sample data in the image sample data set to obtain a plurality of image sample data sets, and respectively taking the plurality of image sample data sets as the input of the plurality of neural networks;
wherein the types and/or processing parameters of the data enhancement processing adopted to obtain the plurality of image sample data sets are different.
10. The method of claim 9, wherein the type of data enhancement processing comprises at least one of: scale transformation, translation, rotation, cropping, color warping.
11. The method according to claim 9 or 10, wherein, in a case that the type of the data enhancement processing comprises scale transformation, the processing parameter comprises a scale transformation parameter;
and/or, in a case that the type of the data enhancement processing comprises translation, the processing parameter comprises a translation amount;
and/or, in a case that the type of the data enhancement processing comprises rotation, the processing parameter comprises a rotation amount;
and/or, in a case that the type of the data enhancement processing comprises cropping, the processing parameter comprises a cropping value;
and/or, in a case that the type of the data enhancement processing comprises color warping, the processing parameter comprises a color warping parameter.
12. An image recognition method, comprising:
acquiring an image to be identified;
recognizing the image to be identified based on at least one of the plurality of neural networks trained by the method according to any one of claims 1 to 11, to obtain a recognition result.
13. An apparatus for training a neural network, comprising:
a first obtaining module, configured to obtain an image sample data set, where the image sample data set includes at least one image sample data;
the input module is used for respectively inputting the image sample data set into a plurality of neural networks to obtain a first prediction result output by each neural network in the plurality of neural networks;
a determining module, configured to determine a loss function of the plurality of neural networks in the current iteration training based on a plurality of first prediction results;
an adjusting module for adjusting network parameters of the plurality of neural networks based on the loss function.
14. The apparatus of claim 13, wherein the determining module, when determining the loss function of the plurality of neural networks in the current iteration training based on the plurality of first prediction results, is specifically configured to perform:
respectively determining a first loss function corresponding to each neural network and determining a second loss function;
determining the loss function of the plurality of neural networks in the current iteration training based on the plurality of first loss functions and the second loss function.
15. The apparatus according to claim 14, wherein the determining module, when respectively determining the first loss function corresponding to each neural network and determining the second loss function, is specifically configured to perform:
determining a first loss function for each neural network based on each first prediction result and a label of the image sample data set;
determining a second prediction result based on the plurality of first prediction results;
determining the second loss function based on the second prediction result and a label of the image sample data set.
16. An image recognition apparatus, comprising:
the second acquisition module is used for acquiring an image to be identified;
an identification module, configured to identify the image to be identified based on at least one of the plurality of neural networks trained according to the method of any one of claims 1 to 11, so as to obtain an identification result.
17. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-12.
18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-12.
CN202010443246.1A 2020-05-22 2020-05-22 Method, device, equipment and medium for training neural network and image recognition Active CN111598182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443246.1A CN111598182B (en) 2020-05-22 2020-05-22 Method, device, equipment and medium for training neural network and image recognition

Publications (2)

Publication Number Publication Date
CN111598182A true CN111598182A (en) 2020-08-28
CN111598182B CN111598182B (en) 2023-12-01

Family

ID=72189218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010443246.1A Active CN111598182B (en) 2020-05-22 2020-05-22 Method, device, equipment and medium for training neural network and image recognition

Country Status (1)

Country Link
CN (1) CN111598182B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654174A (en) * 2014-11-11 2016-06-08 日本电气株式会社 System and method for prediction
CN108875934A (en) * 2018-05-28 2018-11-23 北京旷视科技有限公司 A kind of training method of neural network, device, system and storage medium
EP3582142A1 (en) * 2018-06-15 2019-12-18 Université de Liège Image classification using neural networks
CN109859469A (en) * 2019-02-15 2019-06-07 重庆邮电大学 A kind of vehicle flowrate prediction technique based on integrated LSTM neural network
CN110135582A (en) * 2019-05-09 2019-08-16 北京市商汤科技开发有限公司 Neural metwork training, image processing method and device, storage medium
CN110348563A (en) * 2019-05-30 2019-10-18 平安科技(深圳)有限公司 The semi-supervised training method of neural network, device, server and storage medium
CN110288049A (en) * 2019-07-02 2019-09-27 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model
CN110610231A (en) * 2019-08-26 2019-12-24 联想(北京)有限公司 Information processing method, electronic equipment and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783898B (en) * 2020-07-09 2021-09-14 腾讯科技(深圳)有限公司 Training method of image recognition model, image recognition method, device and equipment
CN111783898A (en) * 2020-07-09 2020-10-16 腾讯科技(深圳)有限公司 Training method of image recognition model, image recognition method, device and equipment
CN112183718A (en) * 2020-08-31 2021-01-05 华为技术有限公司 Deep learning training method and device for computing equipment
CN112183718B (en) * 2020-08-31 2023-10-10 华为技术有限公司 Deep learning training method and device for computing equipment
CN112162164A (en) * 2020-09-24 2021-01-01 安徽德尔电气集团有限公司 Cable life prediction system based on neural network
CN112329916A (en) * 2020-10-27 2021-02-05 上海眼控科技股份有限公司 Model training method and device, computer equipment and storage medium
CN112541579B (en) * 2020-12-23 2023-08-08 北京北明数科信息技术有限公司 Model training method, lean degree information identification method, device and storage medium
CN112541579A (en) * 2020-12-23 2021-03-23 北京北明数科信息技术有限公司 Model training method, poverty degree information identification method, device and storage medium
CN112861621A (en) * 2020-12-31 2021-05-28 桂林海威科技股份有限公司 Subway passenger behavior identification information system, method, medium and equipment
WO2023061116A1 (en) * 2021-10-12 2023-04-20 腾讯科技(深圳)有限公司 Training method and apparatus for image processing network, computer device, and storage medium
CN114821658A (en) * 2022-05-11 2022-07-29 平安科技(深圳)有限公司 Face recognition method, operation control device, electronic device, and storage medium
CN114821658B (en) * 2022-05-11 2024-05-14 平安科技(深圳)有限公司 Face recognition method, operation control device, electronic equipment and storage medium
WO2024083152A1 (en) * 2022-10-18 2024-04-25 安翰科技(武汉)股份有限公司 Pathological image recognition method, pathological image recognition model training method and system therefor, and storage medium

Also Published As

Publication number Publication date
CN111598182B (en) 2023-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant