CN113159085B - Classification model training and image-based classification method and related device - Google Patents


Info

Publication number
CN113159085B
CN113159085B (application number CN202011604336.0A)
Authority
CN
China
Prior art keywords
feature
model
target
training
target feature
Prior art date
Legal status
Active
Application number
CN202011604336.0A
Other languages
Chinese (zh)
Other versions
CN113159085A (en)
Inventor
钱扬
Current Assignee
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd
Priority to CN202011604336.0A
Publication of CN113159085A
Application granted
Publication of CN113159085B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and apparatus for training a classification model. The weight of each dimension of a first target feature is determined, and a second model is trained using the weights, the first target feature, and a second target feature, so that a classification model with high accuracy and low complexity can be obtained. Correspondingly, the image-based classification method uses the trained second model to obtain the classification result for a target in an image, reducing the resource consumption of classification while preserving the accuracy of the result.

Description

Classification model training and image-based classification method and related device
Technical Field
The application relates to the field of electronic information, in particular to a classification model training method, an image-based classification method and a related device.
Background
With the development of machine-learning technology, neural network models are used as classifiers in many fields. As the demand for classification accuracy grows, so does the complexity (e.g., the number of layers) of the neural network model, and a more complex model occupies more resources at run time.
In practice, especially in latency-sensitive application scenarios, limited hardware resources force the complexity of the neural network model to be capped; that is, a model of lower complexity must be adopted.
How to reduce the resource consumption of classification while ensuring the accuracy of the classification result has therefore become a problem to be solved.
Disclosure of Invention
The application provides a method and apparatus for training a classification model, aiming to solve the problem of obtaining a classification model with higher accuracy and lower complexity. Correspondingly, the application also provides an image-based classification method that reduces the resource consumption of classification while ensuring the accuracy of the classification result.
In order to achieve the above object, the present application provides the following technical solutions:
a method of training a classification model, comprising:
acquiring a first feature and a second feature, wherein the first feature is extracted from sample data by a trained first model, the second feature is extracted from the same sample data by a second model, and the complexity of the first model is higher than that of the second model;
re-partitioning the dimensions of the first feature to obtain a first target feature;
determining a weight for the feature of each dimension in the first target feature;
training the second model using the weights, the first target feature, and a second target feature, the second target feature being obtained by re-partitioning the dimensions of the second feature.
Optionally, re-partitioning the dimensions of the first feature to obtain the first target feature includes:
re-partitioning the dimensions of the first feature according to information amount, or according to the degree of influence on the output of the classification model, to obtain the first target feature;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the greater the influence on the output, the larger the weight.
Optionally, re-partitioning the dimensions of the first feature based on information amount to obtain the first target feature includes:
acquiring a principal component analysis transformation matrix of the first feature;
linearly transforming the first feature with the principal component analysis transformation matrix to obtain the first target feature, in which the information amount of the features decreases dimension by dimension.
Optionally, determining the weight of the feature of each dimension in the first target feature includes:
determining the weight of the feature of a target dimension from the variance of that feature and the total variance, where the target dimension is any dimension and the total variance is the sum of the variances of the features of all dimensions.
Optionally, re-partitioning the dimensions of the first feature based on the degree of influence on the output of the classification model to obtain the first target feature includes:
acquiring a linear discriminant analysis transformation matrix of the first feature;
linearly transforming the first feature with the linear discriminant analysis transformation matrix to obtain the first target feature.
Optionally, determining the weight of the feature of each dimension in the first target feature includes:
determining the weight of the feature of a target dimension from the feature quantity of that dimension in the linear discriminant analysis transformation matrix and the total feature quantity, where the target dimension is any dimension and the total feature quantity is the sum of the feature quantities of all dimensions in the matrix;
taking the weight of each dimension in the linear discriminant analysis transformation matrix as the weight of the corresponding dimension in the first target feature.
Optionally, the first model is a teacher model in the knowledge distillation model;
The second model is a student model in the knowledge distillation model.
Optionally, training the second model using the weights, the first target feature, and the second target feature includes:
determining a distance between the first target feature and the second target feature according to the weight, the first target feature and the second target feature;
training the second model using the distance as a first loss function.
Optionally, training the second model using the weights, the first target feature, and the second target feature includes:
determining a first similarity matrix and a second similarity matrix by using the distance, wherein the first similarity matrix is the similarity matrix of the first feature, and the second similarity matrix is the similarity matrix of the second feature;
Determining a second loss function according to the first similarity matrix and the second similarity matrix;
the second model is trained using a second loss function.
An image-based classification method, comprising:
inputting an image into a classification model to obtain the classification result, output by the classification model, for the target in the image, wherein the classification model is a second model trained by the above training method for a classification model.
A training apparatus for classification models, comprising:
The first acquisition module is used for respectively acquiring a first feature and a second feature, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model;
a second acquisition module, used for re-partitioning the dimensions of the first feature to obtain a first target feature;
a determining module, configured to determine the weight of the feature of each dimension in the first target feature;
a training module, used for training the second model using the weights, the first target feature, and a second target feature, the second target feature being obtained by re-partitioning the dimensions of the second feature.
An electronic device, comprising:
A memory and a processor;
the memory is used for storing a program, and the processor is used for running the program to implement the above training method for a classification model or the above image-based classification method.
A computer-readable storage medium storing a program which, when read and run by a computing device, implements the above training method for a classification model or the above image-based classification method.
According to the training method and apparatus for a classification model, a first feature extracted from sample data by the trained first model and a second feature extracted from the same sample data by the second model are acquired, and the dimensions of the first feature are re-partitioned to obtain a first target feature. The weight of each dimension of the first target feature is determined, and the second model is trained using the weights, the first target feature, and the second target feature. Because the complexity of the first model is higher than that of the second model, the trained first model is more accurate than the second model, so training the second model with the first target feature lets the second model learn toward that higher accuracy. Furthermore, using the per-dimension weights as an additional training basis enables finer-grained training and further improves the accuracy of the second model; a classification model with high accuracy and low complexity can therefore be obtained. Correspondingly, the image-based classification method uses the trained second model to obtain the classification result for the target in an image, reducing the resource consumption of classification while preserving the accuracy of the result.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a training method of a classification model according to an embodiment of the present application;
FIG. 2 is a flow chart of a training method of a classification model according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a training device for classification models according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a training method of a classification model according to an embodiment of the present application, including the following steps:
S101, respectively acquiring a first feature and a second feature.
The first feature is the feature extracted from the sample data by the trained first model, and the second feature is the feature extracted from the same sample data by the second model.
In this embodiment, the complexity of the first model is higher than that of the second model, so the trained first model can achieve higher classification accuracy than the trained second model. The trained first model is therefore used as one of the bases for training the second model, as detailed in the following steps.
S102, re-partition the dimensions of the first feature to obtain a first target feature.
Specifically, the dimensions of the first feature may be re-partitioned based on information amount to obtain the first target feature.
It should be emphasized that in a conventional model such as a neural network, the features (typically feature vectors) extracted from the input data do not indicate how much information each dimension carries about the sample data.
The first target feature, by contrast, is obtained by re-partitioning the dimensions based on information amount, so its dimensions are distinguished by the information they carry.
For example, suppose the first feature vector has 512 dimensions; typically, one column of the feature-vector matrix is one dimension, and the data in the same column belong to the same dimension.
After re-partitioning, the first target feature vector still has 512 dimensions, but they are arranged in order of decreasing information: the first column of the re-partitioned feature-vector matrix carries the most information, more than the second column, which in turn carries more than the third column, and so on.
The specific manner of repartitioning the dimensions in accordance with the amount of information will be described in the following embodiments.
Alternatively, the dimensions can be re-partitioned based on the degree of influence on the output of the classification model, so that the dimensions of the first target feature are distinguished by their influence. The specific partitioning will be described in the following embodiments.
S103, determine the weight of each dimension of the first target feature.
When partitioning is by information amount, the larger a dimension's information amount, the larger its weight; when partitioning is by influence on the output, the greater a dimension's influence on the output, the larger its weight.
The manner in which the weights are calculated follows existing techniques and is illustrated in the following embodiments.
S104, training a second model by using the weights, the first target features and the second target features.
The second target feature is obtained by re-partitioning the dimensions of the second feature in the same manner as the first feature.
It will be appreciated that training the second model with the weighted first and second target features lets the second model learn the first model's classification knowledge of the sample data, while the per-dimension weights add finer-grained information to that knowledge.
The training process is illustrated in the following examples.
In the flow shown in fig. 1, the second model is trained using the features extracted from the sample data by the trained first model, the features extracted from the sample data by the second model, and the per-dimension weights. The second model can thus learn the classification knowledge of the first model; because the first model is more complex than the second, this improves the second model's classification accuracy over training it directly on the sample data. In summary, the second model keeps its lightweight advantage while its accuracy improves.
Furthermore, because the weight of the features of each dimension is also used as a training basis, the second model can learn the classification knowledge of the first model in a finer granularity, so that the accuracy is further improved.
The above embodiments will be described in detail below using a knowledge distillation model as an example.
Fig. 2 is a training method of a classification model according to an embodiment of the present application, including the following steps:
s201, inputting sample data into a teacher model and a student model in the knowledge distillation model to obtain a first feature and a second feature respectively.
A knowledge distillation model comprises a teacher model and a student model. The teacher model is more complex than the student model, so its classification accuracy is higher, while the student model occupies fewer resources at run time. In practice, therefore, the student model is generally used for prediction, and the teacher model is used to train the student model so that the trained student model learns the teacher's classification knowledge and thereby achieves higher accuracy.
It is understood that the sample data may comprise multiple items. Inputting any item of sample data (such as an image) into the teacher network yields one feature extracted by the teacher network (for example, a feature-vector matrix), i.e., a first feature. Inputting the same item into the student network yields one feature extracted by the student network, i.e., a second feature.
The features extracted from the sample data by the first model and the second model may be intermediate data, not output data of the models.
For the training process of the first model, reference is made to the prior art.
S202, acquiring a principal component analysis transformation matrix of the first feature.
Principal component analysis is a common algorithm for extracting principal components in data, and the method for obtaining the principal component analysis transformation matrix can be seen in the prior art.
S203, performing linear transformation on the first features by using a principal component analysis transformation matrix to obtain first target features with decreasing information quantity of the features of each dimension.
Specifically, denoting the first feature as $f_t$ and the principal component analysis transformation matrix as $T$, the first target feature is $T f_t$.
By the nature of principal component analysis, the information amount of the features decreases dimension by dimension in the first target feature obtained in this way.
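As a sketch of S202 and S203 (assuming standard PCA; variable names and the toy data are illustrative, and features are treated here as row vectors, so the transform appears as f @ T rather than T f_t), the transformation matrix can be taken from the eigenvectors of the feature covariance sorted by decreasing eigenvalue, so the re-partitioned dimensions carry decreasing information:

```python
import numpy as np

def pca_transform(features):
    """Return (T, transformed features): T is a PCA transformation matrix
    whose columns are covariance eigenvectors sorted by decreasing
    eigenvalue, so the transformed (re-partitioned) dimensions have
    decreasing variance, i.e. decreasing information amount."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-partition: largest first
    T = eigvecs[:, order]
    return T, centered @ T

rng = np.random.default_rng(0)
f_t = rng.normal(size=(200, 8)) * np.arange(1, 9)  # toy "first feature" batch
T, target = pca_transform(f_t)
variances = target.var(axis=0)
assert np.all(np.diff(variances) <= 1e-9)  # information decreases per dimension
```

This matches the description in S203: in the transformed feature, the first column has the largest variance, the second the next largest, and so on.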
S204, determining the weight of the feature of the target dimension according to the variance and the total variance of the feature of the target dimension.
Wherein the target dimension is any dimension, and the total variance is the sum of the variances of the features of all dimensions, as in formula (1):

$w_i = \sigma_i / \sum_j \sigma_j$   (1)

where $w_i$ is the weight of the feature of the $i$-th dimension, $\sigma_i$ is the variance of the feature of the $i$-th dimension, and $\sum_j \sigma_j$ is the sum of the variances over all dimensions. It can be seen that the larger the variance, the larger the weight, since variance reflects the amount of information.
It should be emphasized that because S203 re-partitions the dimensions of the feature by information amount, S204 can use formula (1) to obtain, for each dimension, a weight that reflects its information amount, i.e., its importance.
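Continuing the sketch for S204 (toy numbers; the normalization follows formula (1)), the weight of each re-partitioned dimension is its variance over the total variance:

```python
import numpy as np

def dimension_weights(transformed):
    """Formula (1): w_i = var_i / sum_j var_j, so dimensions carrying
    more information (larger variance) receive larger weights."""
    var = transformed.var(axis=0)
    return var / var.sum()

# Column 0 has variance 8/3, column 1 has variance 2/3, so w = [0.8, 0.2].
w = dimension_weights(np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 2.0]]))
assert abs(w.sum() - 1.0) < 1e-12
assert w[0] > w[1]  # the higher-variance dimension weighs more
```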
S205, calculating a distance function by using the weights, the first target feature and the second target feature.
Specifically, denoting the second feature as $f_s$, the distance function is as in formula (2):

$\mathrm{Dist}(f_t, f_s) = \sum_i w_i \left( (T f_t)_i - (T f_s)_i \right)^2$   (2)

where $T f_t$ is the first target feature, $T f_s$ is the second target feature, and $w_i$ is the weight of the feature in the $i$-th dimension.
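A minimal numeric sketch of this distance (assuming formula (2) is a per-dimension weighted squared difference between the two target features; the exact form in the original is not fully reproduced on this page, so this reading is an assumption, and the names are illustrative):

```python
import numpy as np

def weighted_distance(first_target, second_target, w):
    """Dist(f_t, f_s) = sum_i w_i * ((T f_t)_i - (T f_s)_i)^2: the
    per-dimension squared gap between teacher and student features,
    scaled by each dimension's information-based weight."""
    diff = np.asarray(first_target) - np.asarray(second_target)
    return float(np.sum(np.asarray(w) * diff ** 2))

# Only the heavily weighted dimension differs, so the distance is 0.8.
d = weighted_distance([1.0, 0.0], [0.0, 0.0], [0.8, 0.2])
assert abs(d - 0.8) < 1e-12
```

Because the weights scale each dimension's contribution, a mismatch in an informative dimension is penalized more than the same mismatch in an uninformative one.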
S206, training a second model by using the distance function.
Specifically, the student model of a knowledge distillation model can be trained in two ways; in this step, the distance function is incorporated into both.
The first way is training based on absolute distance: the loss function for training the student model is $\mathrm{Loss}_{kd} = \mathrm{Dist}(f_t, f_s)$; in this step, the distance function of formula (2) is used as this loss function to train the second model.
The second way is: training based on relative distance.
A similarity matrix of the first feature is computed with the distance function of formula (2), yielding a first similarity matrix $A_t$; a similarity matrix of the second feature is computed likewise, yielding a second similarity matrix $A_s$; and the second model is trained with the loss function $\mathrm{Loss}_{kd} = \mathrm{Norm}(A_t, A_s)$, where Norm denotes a norm operation.
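The relative-distance mode can be sketched as follows (assuming A_t and A_s are pairwise weighted-distance matrices over a batch and Norm is the Frobenius norm of their difference; both choices are assumptions, since the page does not fix them):

```python
import numpy as np

def similarity_matrix(batch, w):
    """Pairwise weighted squared distances between all samples in a batch."""
    diff = batch[:, None, :] - batch[None, :, :]
    return np.sum(w * diff ** 2, axis=-1)

def relative_loss(teacher_batch, student_batch, w):
    """Loss_kd = Norm(A_t, A_s): penalizes how differently the two models
    arrange the batch relative to itself, not absolute feature values."""
    A_t = similarity_matrix(teacher_batch, w)
    A_s = similarity_matrix(student_batch, w)
    return float(np.linalg.norm(A_t - A_s))

w = np.array([0.5, 0.5])
t = np.array([[0.0, 0.0], [1.0, 1.0]])
assert relative_loss(t, t.copy(), w) == 0.0  # identical geometry: zero loss
assert relative_loss(t, 2.0 * t, w) > 0.0    # stretched geometry: positive loss
```

Note the contrast with the absolute mode: here the student is only required to reproduce the teacher's pairwise structure, not its exact feature values.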
The flow shown in fig. 2 uses the information quantity of the features in each dimension as a basis to determine the weight indicating the importance degree of the features in each dimension, and uses the weight as a training basis to realize finer-granularity knowledge distillation, thereby further improving the performance of the student network.
Moreover, because it composes with existing algorithms, the method can be conveniently plugged into existing knowledge distillation frameworks, and is therefore broadly applicable.
The specific implementation of the repartitioning of the dimensions based on the degree of influence on the output result of the classification model is similar to the repartitioning of the dimensions based on the amount of information, except that:
1. the matrix used for the linear transformation is a linear discriminant analysis transformation matrix.
Namely: a linear discriminant analysis transformation matrix of the first feature is obtained, and the first feature is linearly transformed with the linear discriminant analysis transformation matrix to obtain the first target feature.
Unlike principal component analysis (PCA), which is grounded in variance maximization, the idea of the linear discriminant analysis (LDA) algorithm is to project the data into a low-dimensional space in which samples of the same class are as compact as possible and samples of different classes are as dispersed as possible. The specific way of obtaining the LDA transformation matrix follows existing techniques.
2. The manner in which the weights for the various dimensions are calculated is different.
Specifically, the weight of each dimension in the LDA transformation matrix is determined first: the weight of the feature of a target dimension is determined from that dimension's feature quantity in the LDA transformation matrix and the total feature quantity.
The target dimension is any dimension of the LDA transformation matrix, and the total feature quantity is the sum of the feature quantities of all dimensions in the LDA transformation matrix.
The weight of each dimension in the LDA transformation matrix is then taken as the weight of the corresponding dimension in the first target feature. It will be appreciated that because a feature is concretely a matrix, corresponding dimensions of the two matrices are those in the same position and order: the first column of the LDA transformation matrix corresponds to the first column of the first target feature, the second column to the second column, and so on.
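For the LDA variant, a minimal two-class sketch (interpreting the "feature quantity" of a dimension as the corresponding eigenvalue of the LDA eigenproblem; that interpretation, the toy data, and the names are all assumptions):

```python
import numpy as np

def lda_eig(X, y):
    """Solve pinv(Sw) @ Sb for LDA directions; return the eigenvalues in
    decreasing order (used here as per-dimension 'feature quantities')."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    mean_all = X.mean(axis=0)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)      # within-class scatter
        dm = (mc - mean_all)[:, None]
        Sb += len(Xc) * (dm @ dm.T)        # between-class scatter
    eigvals = np.linalg.eigvals(np.linalg.pinv(Sw) @ Sb).real
    return np.sort(eigvals)[::-1]

def lda_weights(eigvals):
    """Weight of a dimension = its feature quantity / total feature quantity."""
    ev = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)
    return ev / ev.sum()

X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 0.1], [3.1, -0.1]])
y = np.array([0, 0, 1, 1])
w = lda_weights(lda_eig(X, y))
assert abs(w.sum() - 1.0) < 1e-9
assert w[0] >= w[1]  # the discriminative direction dominates
```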
The training method of the classification model and the classification model obtained by training can be applied to a scene of pedestrian re-recognition based on images:
The classification requirement is to identify, from the pedestrian images captured by multiple cameras, all images belonging to the same person.
Based on the above-mentioned classification requirements, the training process for the classification model is: the teacher model is trained using the sample images.
The sample images are input into a teacher model and a student model, respectively, to obtain a first feature and a second feature. The sample image used in this step may be different from or the same as the sample image used for training the teacher model.
The first target feature and the second target feature are obtained in the manner described above, the weight of each dimension of the first target feature is determined, and the student model is trained in the manner described above.
The image-based classification method using the trained student model includes the steps of:
1. Input the image into the trained student model to obtain the feature vector, output by the student model, of the target in the image.
2. Compute the distance between this feature vector and the feature vector of another image. If the distance is smaller than a set threshold, the targets in the two images are judged to be the same person; otherwise, they are judged to be different persons.
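The matching step above can be sketched as follows (the Euclidean distance and the threshold value are illustrative choices; the patent does not fix either):

```python
import numpy as np

def same_person(feat_a, feat_b, threshold=1.0):
    """Two images are judged to show the same person when the distance
    between their feature vectors falls below a set threshold."""
    dist = float(np.linalg.norm(np.asarray(feat_a) - np.asarray(feat_b)))
    return dist < threshold

assert same_person([0.1, 0.2], [0.15, 0.18])     # close features: same person
assert not same_person([0.1, 0.2], [2.0, -1.5])  # distant features: different
```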
Because the student model has learned the teacher model's classification knowledge, its classification accuracy is high; and because it is less complex than the teacher model, it occupies fewer computing resources in this application scenario. This achieves the aim of reducing the resource consumption of classification while ensuring the accuracy of the classification result.
Fig. 3 is a training device for a classification model according to an embodiment of the present application, including: the device comprises a first acquisition module, a second acquisition module, a determination module and a training module.
The first acquisition module is used for respectively acquiring a first feature and a second feature, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model.
The second acquisition module is configured to re-partition the dimensions of the first feature to obtain the first target feature.
The determining module is configured to determine the weight of each dimension of the first target feature.
The training module is configured to train the second model using the weights, the first target feature, and the second target feature, the second target feature being obtained by re-partitioning the dimensions of the second feature.
Optionally, in re-partitioning the dimensions of the first feature to obtain the first target feature, the second acquisition module is specifically configured to re-partition the dimensions based on information amount, or on the degree of influence on the output of the classification model;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the greater the influence on the output, the larger the weight.
Optionally, in re-partitioning the dimensions of the first feature based on information amount to obtain the first target feature:
The second acquisition module is specifically configured to acquire a principal component analysis transformation matrix of the first feature; and linearly transforming the first feature by using the principal component analysis transformation matrix to obtain the first target feature with decreasing information quantity of the feature of each dimension.
Optionally, the determining module is configured to determine weights of features of each dimension in the first target feature, including:
The determining module is specifically configured to determine a weight of a feature of a target dimension according to a variance and a total variance of the feature of the target dimension; the target dimension is any dimension, and the total variance is the sum of variances of the features of the dimensions.
Optionally, in re-partitioning the dimensions based on the degree of influence of the first feature on the output of the classification model to obtain the first target feature:
The second acquisition module is specifically configured to acquire a linear discriminant analysis transformation matrix of the first feature; and performing linear transformation on the first characteristic by using the linear discriminant analysis transformation matrix to obtain the first target characteristic.
Optionally, the determining module is configured to determine the weights of the features of the respective dimensions in the first target feature, which includes:
The determining module is specifically configured to determine the weight of the feature of a target dimension according to the eigenvalue of the target dimension in the linear discriminant analysis transformation matrix and the total eigenvalue, where the target dimension is any one of the dimensions, and the total eigenvalue is the sum of the eigenvalues of all dimensions in the linear discriminant analysis transformation matrix; and to take the weight of each dimension in the linear discriminant analysis transformation matrix as the weight of the corresponding dimension in the first target feature.
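As with the PCA branch, the LDA-based re-division can be sketched in NumPy. This assumes that the per-dimension quantity used for weighting is the eigenvalue associated with that dimension of the LDA transformation matrix, and that class labels for the sample data are available; all names are illustrative, not from the patent.

```python
import numpy as np

def lda_repartition(features, labels, eps=1e-6):
    """Re-divide dimensions via LDA: the transformation matrix comes from the
    eigendecomposition of Sw^{-1} Sb, and each dimension's weight is its
    eigenvalue divided by the eigenvalue sum."""
    d = features.shape[1]
    overall_mean = features.mean(axis=0)
    sw = np.zeros((d, d))                       # within-class scatter
    sb = np.zeros((d, d))                       # between-class scatter
    for c in np.unique(labels):
        xc = features[labels == c]
        mc = xc.mean(axis=0)
        sw += (xc - mc).T @ (xc - mc)
        diff = (mc - overall_mean)[:, None]
        sb += len(xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(sw + eps * np.eye(d)) @ sb)
    order = np.argsort(eigvals.real)[::-1]      # descending discriminative power
    eigvals, transform = eigvals.real[order], eigvecs.real[:, order]
    weights = np.clip(eigvals, 0, None)         # drop tiny negative noise
    weights = weights / weights.sum()           # eigenvalue / total eigenvalue
    return features @ transform, weights

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
target, w = lda_repartition(x, y)
assert target.shape == (100, 8) and abs(w.sum() - 1.0) < 1e-9
```

With two well-separated classes, nearly all of the weight concentrates on the single discriminative direction, which matches the intent that dimensions with greater influence on the output receive larger weights.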
Optionally, the first model is a teacher model in a knowledge distillation framework, and the second model is a student model in the knowledge distillation framework.
Optionally, the training module is configured to train the second model using the weight, the first target feature and the second target feature, and includes:
The training module is specifically configured to determine a distance between the first target feature and the second target feature according to the weights, the first target feature, and the second target feature, and to train the second model using the distance as a first loss function.
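A minimal sketch of such a first loss function, assuming a per-dimension weighted squared Euclidean distance between the teacher and student target features (the patent does not fix the exact form of the distance):

```python
import numpy as np

def weighted_distance_loss(first_target, second_target, weights):
    """Weighted squared distance between teacher and student target features,
    averaged over the batch. Dimensions carrying more information (larger
    weight) contribute more to the loss."""
    diff = first_target - second_target          # (batch, d)
    return float((weights * diff ** 2).sum(axis=1).mean())

# Toy check: only the first dimension differs, so the loss equals its weight.
w = np.array([0.5, 0.3, 0.2])
t = np.array([[1.0, 0.0, 0.0]])                  # teacher target feature
s = np.zeros((1, 3))                             # student target feature
assert weighted_distance_loss(t, s, w) == 0.5
```

Minimizing this loss pulls the student's features toward the teacher's, most strongly along the dimensions the re-division identified as important.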
Optionally, the training module is configured to train the second model using the weights, the first target feature, and the second target feature, which includes:
The training module is specifically configured to determine a first similarity matrix and a second similarity matrix using the distance, where the first similarity matrix is the similarity matrix of the first feature and the second similarity matrix is the similarity matrix of the second feature; to determine a second loss function according to the first similarity matrix and the second similarity matrix; and to train the second model using the second loss function.

The device of this embodiment can improve the accuracy of the model while reducing its complexity, thereby reducing the resources consumed by the classification operation while ensuring the accuracy of the classification result.
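A hedged sketch of the second loss: pairwise similarity matrices are built for the teacher batch and the student batch from the weighted distances, and their discrepancy is penalized. The row-softmax normalization and the mean-squared penalty are assumptions of this sketch; the patent only specifies that the second loss function is determined from the two similarity matrices.

```python
import numpy as np

def similarity_matrix(features, weights):
    """(batch, batch) similarity matrix from weighted pairwise distances:
    smaller distance -> larger similarity, rows normalized with a softmax."""
    diff = features[:, None, :] - features[None, :, :]   # (b, b, d)
    dist = (weights * diff ** 2).sum(axis=-1)            # weighted squared distances
    e = np.exp(-dist)
    return e / e.sum(axis=1, keepdims=True)

def second_loss(first_target, second_target, weights):
    """Mean-squared discrepancy between teacher and student similarity matrices."""
    s1 = similarity_matrix(first_target, weights)        # teacher
    s2 = similarity_matrix(second_target, weights)       # student
    return float(((s1 - s2) ** 2).mean())

rng = np.random.default_rng(2)
w = np.ones(4) / 4
t = rng.normal(size=(8, 4))
# Identical teacher and student batches give identical matrices, hence zero loss.
assert second_loss(t, t.copy(), w) == 0.0
```

Unlike the first loss, this term matches the relational structure of the batches (which samples look alike) rather than the features themselves, so it remains meaningful even when teacher and student feature scales differ.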
The embodiments of the present application further disclose an electronic device, including a memory and a processor. The memory is configured to store a program, and the processor is configured to run the program to implement the above training method of a classification model or the above image-based classification method.
The embodiments of the present application further disclose a computer-readable storage medium on which a program is stored; when the program is read and run by a computing device, the above training method of a classification model or the above image-based classification method is implemented.
If implemented in the form of software functional units and sold or used as a standalone product, the functions of the methods of the embodiments of the present application may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the technical solution of the present application that contributes to the prior art may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of training a classification model, comprising:
acquiring a first feature and a second feature respectively, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than the complexity of the second model;
re-dividing dimensions of the first feature to obtain a first target feature;
determining weights of features of each dimension in the first target feature; and
training the second model using the weights, the first target feature, and a second target feature, the second target feature being obtained by re-dividing dimensions of the second feature.
2. The method of claim 1, wherein the re-dividing dimensions of the first feature to obtain a first target feature comprises:
re-dividing the dimensions of the first feature according to an information amount or a degree of influence on an output result of the classification model, to obtain the first target feature;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the greater the degree of influence on the output result, the greater the weight.
3. The method of claim 2, wherein re-dividing the dimensions of the first feature based on the information amount to obtain a first target feature comprises:
acquiring a principal component analysis transformation matrix of the first feature;
linearly transforming the first feature using the principal component analysis transformation matrix to obtain the first target feature in which the information amount of the feature of each dimension decreases in order.
4. The method of claim 2, wherein re-partitioning dimensions of the first feature based on a degree of influence on an output result of the classification model to obtain a first target feature comprises:
acquiring a linear discriminant analysis transformation matrix of the first feature;
performing a linear transformation on the first feature using the linear discriminant analysis transformation matrix to obtain the first target feature.
5. The method of claim 1, wherein the training the second model using the weights, the first target feature, and the second target feature comprises:
determining a distance between the first target feature and the second target feature according to the weights, the first target feature, and the second target feature;
training the second model using the distance as a first loss function.
6. The method of claim 1, wherein the training the second model using the weights, the first target feature, and the second target feature comprises:
determining a distance between the first target feature and the second target feature according to the weights, the first target feature, and the second target feature;
determining a first similarity matrix and a second similarity matrix by using the distance, wherein the first similarity matrix is the similarity matrix of the first feature, and the second similarity matrix is the similarity matrix of the second feature;
Determining a second loss function according to the first similarity matrix and the second similarity matrix;
training the second model using the second loss function.
7. An image-based classification method, comprising:
inputting an image into a classification model to obtain a classification result of a target in the image output by the classification model, wherein the classification model serves as the second model and is trained by the training method of a classification model according to any one of claims 1-6.
8. A training device for a classification model, comprising:
a first acquisition module, configured to acquire a first feature and a second feature respectively, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than the complexity of the second model;
a second acquisition module, configured to re-divide dimensions of the first feature to obtain a first target feature;
a determining module, configured to determine weights of features of each dimension in the first target feature; and
a training module, configured to train the second model using the weights, the first target feature, and a second target feature, wherein the second target feature is obtained by re-dividing dimensions of the second feature.
9. An electronic device, comprising:
A memory and a processor;
the memory is configured to store a program, and the processor is configured to run the program to implement the method of any one of claims 1-6 or claim 7.
10. A computer-readable storage medium having a program stored thereon, wherein the method of any one of claims 1-6 or claim 7 is implemented when the program is read and run by a computing device.
CN202011604336.0A 2020-12-30 2020-12-30 Classification model training and image-based classification method and related device Active CN113159085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011604336.0A CN113159085B (en) 2020-12-30 2020-12-30 Classification model training and image-based classification method and related device


Publications (2)

Publication Number Publication Date
CN113159085A CN113159085A (en) 2021-07-23
CN113159085B true CN113159085B (en) 2024-05-28

Family

ID=76878216


Country Status (1)

Country Link
CN (1) CN113159085B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960264A (en) * 2017-05-19 2018-12-07 华为技术有限公司 The training method and device of disaggregated model
CN111062563A (en) * 2019-11-08 2020-04-24 支付宝(杭州)信息技术有限公司 Risk prediction model training method, risk prediction method and related device
CN111368911A (en) * 2020-03-03 2020-07-03 腾讯科技(深圳)有限公司 Image classification method and device and computer readable storage medium
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111666925A (en) * 2020-07-02 2020-09-15 北京爱笔科技有限公司 Training method and device for face recognition model
CN111667022A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 User data processing method and device, computer equipment and storage medium
CN111950638A (en) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and device based on model distillation and electronic equipment
WO2020232874A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Modeling method and apparatus based on transfer learning, and computer device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699717B (en) * 2013-12-10 2019-01-18 ***股份有限公司 Data digging method
US10962404B2 (en) * 2019-04-02 2021-03-30 Bodygram, Inc. Systems and methods for weight measurement from user photos using deep learning networks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant