CN116611482B - Model training method, device, electronic equipment and medium - Google Patents

Model training method, device, electronic equipment and medium

Info

Publication number
CN116611482B
CN116611482B
Authority
CN
China
Prior art keywords
model
quantization
floating point
quantized
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310871661.0A
Other languages
Chinese (zh)
Other versions
CN116611482A (en)
Inventor
刘安华
Current Assignee
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd
Priority to CN202310871661.0A
Publication of CN116611482A
Application granted
Publication of CN116611482B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a model training method, a device, an electronic device and a medium, relating to the technical field of vehicles. The method comprises: obtaining a first floating point model; performing post-quantization processing on the first floating point model to obtain a first quantization model; and performing quantization-aware training based on the first floating point model and the first quantization model, with the first floating point model serving as the teacher model for knowledge distillation and the first quantization model as the student model, to obtain a quantization-aware trained model. With this method, the accuracy of the trained quantization model can be improved, so that the trained model is suitable for deployment on a vehicle.

Description

Model training method, device, electronic equipment and medium
Technical Field
The disclosure relates to the technical field of vehicles, and in particular relates to a model training method, a model training device, electronic equipment and a storage medium.
Background
With the development of deep learning, neural networks are widely used in various fields, for example in the vehicle field to assist in driving. While the vehicle is being driven, the model helps it make judgments and decisions quickly. However, most current models are complex, resulting in slow computation and large memory requirements, making them unsuitable for deployment on vehicles.
In the related art, the problems of slow computation and large memory requirements can be addressed through model quantization, improving the model's suitability for deployment on a vehicle. Model quantization converts floating point computation into low-bit fixed point computation, which effectively reduces the computational intensity, parameter size and memory consumption of the model; however, quantization often causes a substantial loss of precision, which poses additional challenges for models deployed in vehicles.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a model training method, apparatus, electronic device, and medium.
According to a first aspect of embodiments of the present disclosure, there is provided a model training method, the model training method including:
acquiring a first floating point model, wherein the first floating point model is obtained by training a neural network model to convergence using a first sample data set;
performing post-quantization processing on the first floating point model to obtain a first quantization model, wherein the first quantization model comprises a first quantization parameter;
and performing quantization-aware training based on the first floating point model and the first quantization model to obtain a quantization-aware trained model, wherein during the quantization-aware training, the first floating point model serves as the teacher model for knowledge distillation, the first quantization model serves as the student model for knowledge distillation, and the first quantization parameter is kept unchanged; the quantization-aware trained model is used for processing any one of the following data acquired by a vehicle: image data, audio data, point cloud data, and text data.
In some embodiments, performing the quantization-aware training based on the first floating point model and the first quantization model to obtain a quantization-aware trained model includes:
constructing a knowledge distillation loss function, wherein the distillation loss function characterizes the distribution difference between intermediate result data of the first floating point model and the corresponding intermediate result data of the first quantization model;
determining a preset loss function based on the knowledge distillation loss function and a loss function corresponding to the first quantization model, wherein the loss function corresponding to the first quantization model characterizes the difference between the output of the first quantization model and the ground-truth label of the sample;
and replacing the loss function of the first quantization model with the preset loss function, and performing quantization-aware training on the resulting quantization model to obtain a quantization-aware trained model.
In some embodiments, the intermediate result data includes the computation result of at least one layer preceding the output layer of the corresponding model.
In some embodiments, the method further comprises:
acquiring a second quantization model comprising the first quantization parameter;
splicing the second quantization model and the first floating point model to obtain a spliced model;
training the spliced model using a second sample data set to obtain a trained spliced model;
separating a second floating point model from the trained spliced model;
and taking the separated second floating point model as the first floating point model, and returning to the step of performing post-quantization processing on the first floating point model to obtain a first quantization model, until a preset training stop condition is met.
In some embodiments, the spliced model includes a feature extraction network and a detection head. The detection head of the spliced model has the same network structure as the detection head of the second quantization model. The feature extraction network of the spliced model includes the first feature extraction network of the first floating point model, the second feature extraction network of the second quantization model, and a mean value calculation node; the features extracted from the input data by the second feature extraction network and those extracted by the first feature extraction network are processed by the mean value calculation node and then used as the input of the detection head of the spliced model.
In some embodiments, performing post-quantization processing on the first floating point model to obtain a first quantization model includes:
performing inference on the data to be predicted in a third sample data set using the first floating point model, and obtaining the parameters to be quantized of a preset layer in the first floating point model during inference on each data item, wherein the parameters to be quantized include an activation value and a weight value;
determining an initial quantization parameter based on the parameters to be quantized of the preset layer in the first floating point model;
constructing a search space based on the initial quantization parameter;
determining the first quantization parameter from the search space based on a model performance evaluation index corresponding to a candidate quantization parameter sampled from the search space, wherein the model performance evaluation index corresponding to the candidate quantization parameter is that of a candidate quantization model obtained by assigning the candidate quantization parameter to the first floating point model;
and assigning the first quantization parameter to the first floating point model to obtain the first quantization model.
In some embodiments, determining the first quantization parameter from the search space based on the model performance evaluation index corresponding to the candidate quantization parameter sampled from the search space includes:
predicting a probability distribution of the model performance evaluation index based on the model performance evaluation indexes corresponding to the candidate quantization parameters sampled from the search space;
determining, from the search space based on the probability distribution, the next candidate quantization parameter expected to yield the optimal model performance evaluation index;
and returning to the step of predicting the probability distribution of the model performance evaluation index until a preset search stop condition is satisfied, and determining the most recently determined candidate quantization parameter as the first quantization parameter.
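The search described above can be illustrated with a greatly simplified Python sketch. Here the model performance evaluation index is replaced by a reconstruction-error proxy and the candidates are scored exhaustively; the actual method instead predicts a probability distribution over the index and samples the next candidate from it. All function names are illustrative, not part of the disclosure.

```python
def quantize_dequantize(values, scale, num_bits=8):
    """Round-trip a list of floats through a symmetric signed integer grid."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def search_first_quant_param(values, candidate_scales):
    """Score each candidate quantization parameter with a performance proxy
    (here, the reconstruction error of the quantize/dequantize round trip)
    and keep the best-scoring candidate."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return min(candidate_scales,
               key=lambda s: mse(values, quantize_dequantize(values, s)))
```

A finer scale wins whenever it still covers the value range, which is why a search (rather than a fixed rule) is useful when the range and bit width interact.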
According to a second aspect of embodiments of the present disclosure, there is provided a model training apparatus including:
the first acquisition module is configured to acquire a first floating point model, wherein the first floating point model is obtained by training a neural network model to convergence using a first sample data set;
the post-quantization module is configured to perform post-quantization processing on the first floating point model to obtain a first quantization model, wherein the first quantization model comprises a first quantization parameter;
the quantization-aware training module is configured to perform quantization-aware training based on the first floating point model and the first quantization model to obtain a quantization-aware trained model, wherein during the quantization-aware training, the first floating point model serves as the teacher model for knowledge distillation, the first quantization model serves as the student model for knowledge distillation, and the first quantization parameter is kept unchanged; the quantization-aware trained model is used for processing any one of the following data collected by a vehicle: image data, audio data, point cloud data, and text data.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method provided by the first aspect of the present disclosure.
According to the model training method, device, electronic equipment and medium of the present disclosure, a first floating point model is obtained; post-quantization processing is performed on the first floating point model to obtain a first quantization model; and with the first floating point model as the teacher model for knowledge distillation and the first quantization model as the student model, quantization-aware training is performed based on the two models to obtain a quantization-aware trained model. Because the higher-precision floating point model is added as the teacher model to perform knowledge distillation on the lower-precision first quantization model during its quantization-aware training, it helps the first quantization model converge as closely as possible to the precision level of the first floating point model, thereby improving the precision of the finally trained model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of a model training method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a block diagram of a model training apparatus according to an exemplary embodiment;
FIG. 3 is a schematic diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that all acquisition of signals, information or data in the present application is performed in compliance with the applicable data protection laws and policies of the relevant country, and with authorization from the owner of the corresponding device.
Model quantization can be understood as converting a floating point model into a fixed-point model by some method. For example, the weights (weight) or activation values (activation) of the original model are float32, and the model is quantized into a fixed-point model whose weights or activation values are int8, uint8, or the like. Since the expressive capacity of int8, uint8, etc. is limited compared with float32, part of the values are lost, reducing the accuracy of the quantized model.
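As an illustration of the float32-to-int8 conversion described above, the following minimal Python sketch maps a list of float values onto a signed 8-bit grid with an affine scale and zero point, then maps them back; the gap between the original and reconstructed values is the precision loss mentioned here. The function names are illustrative, not part of the disclosure.

```python
def quantize_affine(values, num_bits=8):
    """Affine (asymmetric) quantization: map floats onto the signed integer
    grid [qmin, qmax] using a scale and zero point derived from the value range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for constant inputs
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Map the integers back to floats; the round trip loses precision."""
    return [(q - zero_point) * scale for q in quantized]
```

Any value falling between two grid points is rounded, so the reconstruction error per value is bounded by roughly one scale step.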
In various fields, and particularly in the vehicle field, extremely high requirements are placed on model accuracy, since the accuracy of a model affects the accuracy of the vehicle's decision strategy; improving the accuracy of models applied to vehicles is therefore of great significance. The vehicle field may include automatic driving scenarios, assisted driving scenarios, and the like.
To solve the above problems, embodiments of the present disclosure provide a model training method. FIG. 1 is a flow chart of a model training method according to an exemplary embodiment; as shown in FIG. 1, the method may include the following steps:
step S110, a first floating point model is obtained, wherein the first floating point model is obtained by training a neural network model to be trained to be converged by using a first sample data set.
Wherein the first sample data set comprises data of any one of the following data types: image data, audio data, point cloud data, and text data.
That is, the model training method of the embodiments of the present disclosure may be applied to training models that process image data in the vehicle field, such as lane line detection models or traffic sign recognition models, as well as models that process audio data, point cloud data, or text data in the vehicle field; in short, it may be applied to training models that take various data types in the vehicle field as input.
In some embodiments, to further improve the accuracy of the model applied to the vehicle, the data in the first sample data set may be obtained by labeling data collected by the vehicle. For example, lane line annotation may be performed on road images acquired by a camera of the vehicle to obtain road images with lane line labels, and a plurality of such labeled road images form the first sample data set.
In the embodiment of the disclosure, the neural network model to be trained can be trained through the first sample data set, and after training to convergence, the first floating point model can be obtained.
In some embodiments, the process of training the neural network model to be trained using the first sample data set may use conventional supervised training.
The model in the embodiments of the present disclosure may be a single-task model or a multi-task model.
Step S120, post-quantization processing is performed on the first floating point model, so as to obtain a first quantization model, wherein the first quantization model comprises first quantization parameters.
Post quantization may also be referred to as Post-Training Quantization (PTQ): a quantization method that obtains the quantization parameters of a network without retraining it, i.e., without updating its weights.
In the embodiment of the disclosure, after post-quantization processing is performed on the first floating point model, a first quantization model may be obtained, where a preset layer of the first quantization model may include a corresponding first quantization parameter. The preset layers may be selected according to actual quantization needs, for example, each layer including weights and activation values is selected as a preset layer.
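The idea of post-quantization over preset layers can be sketched as follows: run the floating point model on calibration data, record the range of each preset layer's activations, and derive quantization parameters from that range without updating any weights. This is a minimal illustration assuming min/max calibration; the observer class and its names are hypothetical.

```python
class RangeObserver:
    """Minimal PTQ calibration observer: records the min/max of a preset
    layer's activations across calibration batches, then derives an affine
    scale and zero point. No weights are ever updated."""
    def __init__(self):
        self.lo, self.hi = float("inf"), float("-inf")

    def observe(self, activations):
        # Called once per calibration batch with the layer's activations.
        self.lo = min(self.lo, min(activations))
        self.hi = max(self.hi, max(activations))

    def quant_params(self, num_bits=8):
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = (self.hi - self.lo) / (qmax - qmin) or 1.0
        zero_point = round(qmin - self.lo / scale)
        return scale, zero_point
```

One such observer would be attached to each preset layer; the resulting (scale, zero point) pairs play the role of the first quantization parameter here.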
Step S130, quantization-aware training is performed based on the first floating point model and the first quantization model to obtain a quantization-aware trained model, wherein during the quantization-aware training, the first floating point model serves as the teacher model for knowledge distillation, the first quantization model serves as the student model for knowledge distillation, and the first quantization parameter is kept unchanged.
The quantization-aware trained model is used for processing any one of the following data acquired by the vehicle: image data, audio data, point cloud data, and text data.
The model deployed on the vehicle processes data of the same type as the sample data.
For example, when the data in the sample data set is point cloud data collected by the vehicle's lidar, the resulting quantization-aware trained model may take point cloud data collected by the lidar as input and make predictions on it.
In the embodiment of the disclosure, after the first quantization model is obtained, quantization-aware training (Quantization Aware Training, QAT) may be further performed on it. Quantization-aware training inserts pseudo quantization (fake quantize) modules into the model to simulate the rounding and clamping operations performed by the quantized model during inference, so that the model's adaptability to quantization effects is improved during training and the quantized model attains higher precision.
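The rounding and clamping simulated by a pseudo quantization module can be sketched as a single scalar operation; in an actual QAT framework it is applied tensor-wise and gradients are passed through it via a straight-through estimator. A minimal sketch, assuming a signed 8-bit format:

```python
def fake_quantize(x, scale, zero_point=0, num_bits=8):
    """Simulated quantization ("fake quant") used in QAT: round and clamp
    exactly as int8 inference would, then map back to float so training
    continues in floating point. (In a real framework, backpropagation
    treats this op as identity — the straight-through estimator.)"""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale) + zero_point))  # round and clamp
    return (q - zero_point) * scale  # back to float on the quantization grid
```

Values snap to multiples of the scale, and out-of-range values saturate at the clamp limits, which is exactly the effect the model learns to tolerate during QAT.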
In the embodiment of the disclosure, during the quantization-aware training of the first quantization model, the first floating point model may be added as the teacher model and the first quantization model used as the student model; the quantization-aware training is thus combined with knowledge distillation, and after this combined training a quantization-aware trained model is obtained.
In some embodiments, the learning rate of the quantization-aware training combined with knowledge distillation is smaller than the learning rate used when training the neural network model with the first sample data set.
With the model training method provided by the embodiments of the present disclosure, a first floating point model is obtained; post-quantization processing is performed on it to obtain a first quantization model; and with the first floating point model as the teacher model for knowledge distillation and the first quantization model as the student model, quantization-aware training is performed based on the two models to obtain a quantization-aware trained model. Because the higher-precision floating point model is added as the teacher model to perform knowledge distillation on the lower-precision first quantization model during its quantization-aware training, it helps the first quantization model converge as closely as possible to the precision level of the first floating point model, thereby improving the precision of the finally trained model.
In some embodiments, in step S130, performing quantization-aware training based on the first floating point model and the first quantization model to obtain a quantization-aware trained model may include the following steps:
constructing a knowledge distillation loss function, wherein the distillation loss function characterizes the distribution difference between intermediate result data of the first floating point model and the corresponding intermediate result data of the first quantization model;
determining a preset loss function based on the knowledge distillation loss function and a loss function corresponding to the first quantization model, wherein the loss function corresponding to the first quantization model characterizes the difference between the output of the first quantization model and the ground-truth label of the sample;
and replacing the loss function of the first quantization model with the preset loss function, and performing quantization-aware training on the resulting quantization model to obtain a quantization-aware trained model.
In the embodiment of the disclosure, to implement quantization-aware training combined with knowledge distillation, a knowledge distillation loss function may be constructed from the distribution difference between the intermediate result data of the first floating point model and the corresponding intermediate result data of the first quantization model. Since quantization-aware training of the first quantization model is also required, the knowledge distillation loss is superimposed on the loss function corresponding to the first quantization model to obtain a preset loss function. The loss function of the first quantization model is then replaced with the preset loss function, and quantization-aware training is performed on the resulting quantization model using the sample data set to obtain a quantization-aware trained model.
In some embodiments, the distribution difference between the intermediate result data of the first floating point model and the corresponding intermediate result data of the first quantization model may be calculated by, but is not limited to, the Kullback-Leibler (KL) divergence.
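A minimal sketch of the preset loss described above: the KL divergence between the teacher (float) and student (quantized) intermediate distributions is added to the quantized model's own task loss. The softmax step and the weighting factor `alpha` are assumptions for illustration; the disclosure does not fix how the two terms are combined.

```python
import math

def softmax(logits):
    """Turn raw intermediate outputs into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """D_KL(P || Q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def preset_loss(student_task_loss, teacher_logits, student_logits, alpha=1.0):
    """Preset loss = quantized model's task loss + distillation term measuring
    the distribution difference between teacher and student intermediates.
    The weighting alpha is a hypothetical choice, not from the disclosure."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return student_task_loss + alpha * kl_divergence(p, q)
```

When the student's intermediate distribution matches the teacher's, the distillation term vanishes and only the task loss remains.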
In some implementations, the intermediate result data includes a calculation of at least one layer preceding the output layer of the corresponding model. That is, the intermediate result data corresponding to the first quantization model may be a calculation result of at least one layer before the output layer of the first quantization model. The intermediate result data corresponding to the first floating point model may be a calculation result of at least one layer before the output layer of the first floating point model.
For example, the calculation result of the penultimate layer of the detection head of the first quantization model may be used as intermediate result data corresponding to the first quantization model, and the calculation result of the penultimate layer of the detection head of the first floating point model may be used as intermediate result data corresponding to the first floating point model. Thus, by calculating the KL divergence between the calculation result of the penultimate layer of the detection head of the first quantization model and the calculation result of the penultimate layer of the detection head of the first floating point model, the value of the distillation loss function can be determined.
In addition, the value of the loss function corresponding to the first quantization model can be calculated according to the difference between the output result corresponding to the first quantization model and the sample real label.
Accordingly, back propagation is performed on the replaced quantization model based on the value of the distillation loss function and the value of the loss function corresponding to the first quantization model, completing one update of the model parameters of the replaced quantization model.
As noted above, the model may be a multi-task model. In that case, the total loss of the replaced multi-task quantization model includes the training loss corresponding to each task branch and the knowledge distillation loss corresponding to each task branch. That is, the total loss includes, for each task branch, a loss value computed from the difference between the output of that branch's detection head and the sample's ground-truth label, and a distillation loss value computed from the difference between the computation result of at least one layer before that branch's output layer and the computation result of the corresponding layer in the first floating point model.
In some embodiments, the at least one layer preceding the output layer of the model may be any one layer, any two layers, or every layer preceding the output layer. When multiple layers are included, the difference is calculated between the outputs of each such layer in the first quantization model and of the corresponding layer in the first floating point model, and the sum of the per-layer differences is taken as the value of the distillation loss function.
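The multi-layer case above can be sketched by summing a per-layer difference over the selected layers; mean squared error stands in here for the distribution difference purely for brevity, whereas the disclosure uses a measure such as KL divergence.

```python
def multilayer_distillation_loss(teacher_outputs, student_outputs):
    """Sum a per-layer difference between the float (teacher) and quantized
    (student) models over the selected pre-output layers. Each element of the
    input lists is one layer's output vector."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return sum(mse(t, s) for t, s in zip(teacher_outputs, student_outputs))
```

Layers whose outputs already agree contribute nothing, so the loss concentrates the training signal on the layers most distorted by quantization.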
Furthermore, in some implementations, the method of the disclosed embodiments may further include the steps of:
acquiring a second quantization model comprising the first quantization parameter;
splicing the second quantization model with the first floating point model to obtain a spliced model;
training the spliced model by using the second sample data set to obtain a trained spliced model;
separating a second floating point model from the trained spliced model;
and taking the separated second floating point model as a first floating point model, and returning to execute the step of performing post-quantization processing on the first floating point model to obtain a first quantization model until a preset training stop condition is met.
In the embodiments of the present disclosure, a model including the first quantization parameter is referred to as a second quantization model. Optionally, the first quantization model may be selected as the second quantization model; a quantization-aware trained model may also be selected; or a model with the same structure as the first quantization model, whose weights, biases and other parameters are re-initialized, may be selected, with the first quantization parameter added to it to obtain the second quantization model.
In the embodiment of the present disclosure, the second quantization model may be spliced with the first floating point model to obtain a spliced model, and the spliced model is then trained using the second sample data set. After the trained spliced model is obtained, the second floating point model may be separated from it; the second floating point model is then used as the first floating point model, and steps S120 and S130 are executed again until a preset training stop condition is satisfied, so as to obtain a quantization-aware trained model once more.
In some embodiments, the preset training stop condition may be that the trained model converges or that a preset number of training iterations is reached.
In the embodiment of the disclosure, a quantization model is added to train together in the training process of the first floating point model, so that the first floating point model can be controlled to converge in a quantization friendly direction in the training process to a certain extent, the subsequent quantization perception training process can converge more easily, and the precision of the model after the quantization perception training can be improved to a certain extent.
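The loop described above can be sketched as follows. This is a minimal, hypothetical skeleton: every function here (`post_quantize`, `splice`, `train_spliced`, `separate_float`) is a trivial stub standing in for the corresponding step of the disclosure, and models are represented as plain dictionaries.

```python
# Hypothetical skeleton of the looped steps above; all functions are
# trivial stubs and models are plain dictionaries.

def post_quantize(float_model):
    # Post-quantization (stub): attach quantization parameters.
    return {"params": list(float_model["params"]), "quantized": True}

def splice(quant_model, float_model):
    # Splice the two models into one (stub).
    return {"quant": quant_model, "float": {"params": list(float_model["params"])}}

def train_spliced(spliced_model, dataset):
    # Train the spliced model on the second sample data set (stub).
    spliced_model["float"]["trained"] = True
    return spliced_model

def separate_float(spliced_model):
    # Separate the second floating point model from the spliced model.
    return spliced_model["float"]

def train_until_stop(float_model, dataset=None, max_rounds=3):
    """Post-quantize, splice, train, separate, and repeat."""
    for _ in range(max_rounds):  # preset stop condition: iteration budget
        quant_model = post_quantize(float_model)
        spliced = train_spliced(splice(quant_model, float_model), dataset)
        float_model = separate_float(spliced)
    return float_model

result = train_until_stop({"params": [0.1, 0.2]})
```

Here the preset training stop condition is modeled as a fixed iteration budget; a convergence check on the trained model would serve equally well.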
It will be appreciated that a model may generally include a feature extraction network and a detection head, where the feature extraction network may include a backbone network (backbone) and a neck network (neck) for collecting feature maps from different stages.
In the embodiment of the disclosure, the second quantization model and the first floating point model may be spliced in a plurality of splicing manners.
In some embodiments, the second quantization model and the first floating point model may share a feature extraction network while their detection heads are spliced separately. In this case, the spliced model may include a shared feature extraction network, which may come from either the second quantization model or the first floating point model, and may further include a detection head taken from the second quantization model and a detection head taken from the first floating point model, both connected to the output of the shared feature extraction network.
In this case, after the trained spliced model is obtained, the shared feature extraction network and the detection head from the first floating point model may be separated from the trained spliced model to form the second floating point model.
In other embodiments, the second quantization model and the first floating point model may share a detection head while their feature extraction networks are spliced separately. In this case, the spliced model may include a shared detection head, which may come from either the second quantization model or the first floating point model, and may further include a feature extraction network taken from the second quantization model and a feature extraction network taken from the first floating point model. Since the shared detection head would then have two inputs, directly feeding the results of both feature extraction networks into it may make the features inaccurate. Therefore, to ensure feature accuracy, the output of the feature extraction network taken from the second quantization model and the output of the feature extraction network taken from the first floating point model may be connected to a mean value calculation node, and the mean value is then passed to the shared detection head.
That is, in some embodiments, the spliced model includes a feature extraction network and a detection head. The detection head of the spliced model has the same network structure as the detection head of the second quantization model. The feature extraction network of the spliced model includes a first feature extraction network of the first floating point model, a second feature extraction network of the second quantization model, and a mean value calculation node; the features extracted from the input data by the second feature extraction network and the features extracted from the input data by the first feature extraction network are processed by the mean value calculation node and then serve as the input of the detection head of the spliced model.
In this case, after the trained spliced model is obtained, the feature extraction network from the first floating point model and the shared detection head may be separated from the trained spliced model to form the second floating point model.
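As a toy illustration of the second splicing manner (shared detection head fed by a mean value calculation node), the sketch below uses plain Python lists in place of real feature maps; both branch functions and the head are invented stand-ins, not networks from the disclosure.

```python
# Toy splice: shared detection head fed by the element-wise mean of two
# feature extraction branches. Lists stand in for feature maps.

def float_branch(x):
    # stand-in for the first floating point model's feature extraction network
    return [v * 1.0 for v in x]

def quant_branch(x):
    # stand-in for the second quantization model's feature extraction
    # network; quantization is simulated by rounding to one decimal place
    return [round(v, 1) for v in x]

def mean_node(feats_a, feats_b):
    # mean value calculation node: element-wise mean of the two feature maps
    return [(a + b) / 2.0 for a, b in zip(feats_a, feats_b)]

def shared_head(feats):
    # stand-in shared detection head: reduce features to a single score
    return sum(feats) / len(feats)

x = [0.12, 0.57, 0.98]
fused = mean_node(float_branch(x), quant_branch(x))
score = shared_head(fused)
```

The mean node gives the shared head a single input of the expected shape, matching the motivation stated above.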
In some embodiments, in step S120, post-quantization processing is performed on the first floating-point model to obtain a first quantization model, which may include the following steps:
performing inference on the data to be predicted in the third sample data set by using the first floating point model, and acquiring the parameters to be quantized of a preset layer in the first floating point model when inference is performed on each piece of data to be predicted, wherein the parameters to be quantized comprise an activation value and a weight value;
determining initial quantization parameters based on the parameters to be quantized of the preset layer in the first floating point model;
constructing a search space based on the initial quantization parameter;
determining a first quantization parameter from the search space based on model performance evaluation indexes corresponding to candidate quantization parameters sampled from the search space, wherein the model performance evaluation indexes corresponding to the candidate quantization parameters are model performance evaluation indexes corresponding to candidate quantization models obtained by assigning the candidate quantization parameters to a first floating point model;
and assigning the first quantization parameter to the first floating point model to obtain a first quantization model.
In the embodiment of the disclosure, a quantization node may be inserted at the preset layer, so as to collect the parameters to be quantized of each preset layer into which a quantization node is inserted.
Alternatively, the initial quantization parameter may be determined from the parameters to be quantized of the preset layer in the first floating point model by a minimum-maximum (minmax) method, a cross entropy method, a mean squared error minimization (MSE) method, or a percentile method.
Alternatively, the initial quantization parameter may be a combination of a scale and a zero point (the fixed-point value to which floating point 0 is mapped when floating point numbers are mapped to fixed-point numbers) for each preset layer. Alternatively, the initial quantization parameter may be an amax value for each preset layer, where amax denotes an intermediate statistic, computed from the parameters to be quantized of that layer, from which the scale and zero point are calculated, for example, the minimum and maximum values counted by the minmax method.
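For intuition, the sketch below shows one common way (assumed here, not mandated by the disclosure) to derive an asymmetric int8 scale and zero point from minmax statistics.

```python
# Assumed asymmetric int8 scheme: derive (scale, zero point) from the
# min/max statistics of one layer. A common convention, shown for
# intuition only; not the disclosure's exact formula.

def minmax_quant_params(values, qmin=-128, qmax=127):
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)    # range must include float 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)  # fixed-point value of float 0
    return scale, zero_point

def quantize(v, scale, zero_point, qmin=-128, qmax=127):
    q = round(v / scale) + zero_point
    return max(qmin, min(qmax, q))         # clamp to the int8 range

activations = [-0.4, 0.0, 0.8, 1.2]        # toy minmax statistics
scale, zp = minmax_quant_params(activations)
```

With these toy statistics, floating point 0 maps exactly to the zero point, which is the property the parenthetical definition above describes.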
In some embodiments, the initial quantization parameter includes an initial quantization parameter corresponding to the activation value and an initial quantization parameter corresponding to the weight value.
In some embodiments, a first preset multiple of the initial quantization parameter may be selected as a left boundary and a second preset multiple of the initial quantization parameter as a right boundary, thereby constructing a search space bounded by the left boundary and the right boundary.
In some embodiments, the first preset multiple may be selected to be 0.5 and the second preset multiple to be 1.5.
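With those multiples, one layer's search space can be enumerated as follows. The 0.5/1.5 boundary multiples come from the text; the 0.1 step multiple is an illustrative assumption.

```python
# Enumerate one layer's search space from its initial quantization
# parameter. Boundary multiples 0.5 and 1.5 follow the text; the 0.1
# step multiple is assumed for illustration.

def build_search_space(initial, left_mult=0.5, right_mult=1.5, step_mult=0.1):
    left = left_mult * initial
    right = right_mult * initial
    step = step_mult * initial
    count = int(round((right - left) / step)) + 1
    return [left + i * step for i in range(count)]

space = build_search_space(initial=2.0)  # candidates from 1.0 to 3.0
```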
In the embodiment of the disclosure, after the search space is constructed, candidate quantization parameters may be sampled from the search space, that is, combinations of scale and zero point, or amax values.
Then, the candidate quantization parameter may be assigned to the first floating point model to obtain a candidate quantization model including the candidate quantization parameter, then, a model performance evaluation index of the candidate quantization model may be obtained by testing on a test set, and then, the first quantization parameter may be determined from the search space based on the model performance evaluation index corresponding to the sampled candidate quantization parameter.
In the embodiment of the disclosure, based on the initial quantization parameter, a search space is constructed, and the first quantization parameter is determined from the constructed search space, so that the selectable range of the first quantization parameter can be expanded, and the precision of the model after the quantization perception training obtained later is further improved.
In some embodiments, the search space may be searched sequentially according to its step size, so as to obtain each candidate quantization parameter.
In other embodiments, candidate quantization parameters may also be sampled from the search space in a probability distribution manner. In this case, determining the first quantization parameter from the search space based on the model performance evaluation index corresponding to the candidate quantization parameter sampled from the search space may include the steps of:
predicting probability distribution of model performance evaluation indexes based on model performance evaluation indexes corresponding to each candidate quantization parameter sampled from a search space;
determining the next candidate quantization parameter which corresponds to the optimal model performance evaluation index from the search space based on the probability distribution of the model performance evaluation index;
and returning to execute the step of predicting the probability distribution of the model performance evaluation index based on the model performance evaluation index corresponding to each candidate quantization parameter sampled from the search space until the preset search stop condition is met, and determining the most recently determined candidate quantization parameter as the first quantization parameter.
Alternatively, the first of the sampled candidate quantization parameters may be obtained by random sampling, or the initial quantization parameter may be used as the first candidate quantization parameter.
In the embodiment of the disclosure, the probability distribution of the performance evaluation index may be predicted from the model performance evaluation indexes corresponding to the candidate quantization parameters sampled so far. For example, a Gaussian distribution curve may be fitted to these indexes, with the abscissa being the quantization parameter in the search space and the ordinate being the model performance evaluation index; the abscissa of the point with the highest probability on the fitted curve may then be used as the next candidate quantization parameter.
By repeating the above process, the next candidate quantization parameter can be found continuously until the preset search stop condition is satisfied. The preset search stop condition may be that the model performance evaluation index converges or that the maximum number of search iterations is reached.
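The probability-distribution-guided search can be sketched as follows. The metric-weighted mean below is a crude stand-in for fitting a Gaussian curve and taking its peak, and `evaluate()` is a toy metric; both are assumptions for illustration only.

```python
# Illustrative search loop: fit a distribution to the (parameter,
# metric) samples seen so far and take its peak as the next candidate.

def evaluate(param, optimum=2.2):
    # toy model performance evaluation index, peaking near `optimum`
    return 1.0 / (1.0 + (param - optimum) ** 2)

def next_candidate(params, metrics):
    # peak of the fitted curve, approximated by a metric-weighted mean
    return sum(p * m for p, m in zip(params, metrics)) / sum(metrics)

def search(seeds=(1.0, 1.5, 2.0, 2.5, 3.0), max_steps=20):
    params = [float(s) for s in seeds]      # initial grid/random samples
    metrics = [evaluate(p) for p in params]
    for _ in range(max_steps):              # preset stop: iteration budget
        cand = next_candidate(params, metrics)
        params.append(cand)
        metrics.append(evaluate(cand))
    return params[-1]                       # most recently determined candidate

best = search()
```

Even this crude surrogate concentrates the sampling near the metric's peak, which is the time-saving effect the text attributes to the approach.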
By adopting the mode, the time for sampling in the search space can be reduced, the time for obtaining the first quantization parameter is reduced, and the model training speed is improved.
The sample data included in the first sample data set, the second sample data set, and the third sample data set may be completely different, partially identical, or completely identical.
Next, a model training method of the embodiment of the present disclosure will be described with a specific example of a lane line detection model applied to a vehicle for detecting a lane line position from a road image.
An initial neural network model is trained using a first sample data set composed of a plurality of road images annotated with lane line positions, and a first floating point model is obtained after the initial neural network model converges. The initial neural network model is a multitask model.
The first floating point model can also be used to detect lane lines in road images, but its detection speed is relatively slow and its detection latency is high, so it is not suitable for direct deployment in a vehicle.
Quantization nodes are inserted at preset layers of the first floating point model, and the first floating point model is used to perform inference on the data to be predicted in a third sample data set composed of a plurality of road images annotated with lane line positions.
The parameters to be quantized of the preset layers in the first floating point model, including activation values and weight values, are acquired while the first floating point model performs inference on each piece of data to be predicted.
For the collected activation values and weights of each preset layer, amax is counted per layer, and the scale and zero point of each preset layer are then calculated. In this example, the minmax method is used to calculate the scale and zero point of each preset layer, where amax denotes the minimum and maximum values of the weights and activation values of each preset layer.
A search space is constructed based on amax. The search space of the i-th preset layer is [0.5×amax_i, 1.5×amax_i] with a step size of 0.1×amax_i, and the probability distribution is assumed to be a Gaussian mixture.
Based on the model performance evaluation index corresponding to each of the candidate quantization parameters sampled from the search space, the probability distribution of the model performance evaluation index is predicted. The model performance evaluation index corresponding to the candidate quantization parameter is a model performance evaluation index corresponding to a candidate quantization model obtained by assigning the candidate quantization parameter to the first floating point model.
Based on the probability distribution of the model performance evaluation index, the next candidate quantization parameter, namely the one predicted to correspond to the optimal model performance evaluation index, is determined from the search space.
And returning to execute the step of predicting the probability distribution of the model performance evaluation index based on the model performance evaluation index corresponding to each candidate quantization parameter sampled from the search space until the preset search stopping condition is met, determining the most recently determined candidate quantization parameter as a first quantization parameter, and assigning the first quantization parameter to the first floating point model to obtain a first quantization model.
The first quantization parameters of the first quantization model are fixed, the first floating point model is used as the teacher model, and the first quantization model is used as the student model. KL divergence is added to supervise the output of the penultimate layer of each task branch, yielding a knowledge distillation loss value for each task. A first loss value for each task is obtained based on the difference between the output result of that task and the real label of the sample. The loss value of the quantization perception training process is then obtained based on the knowledge distillation loss values and the first loss values of all tasks, back propagation is performed on the first floating point model based on this loss value to obtain an iterated model, and the model after quantization perception training is obtained after a number of iterations.
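A hedged sketch of how such a composite loss might be assembled. Toy discrete distributions stand in for penultimate-layer outputs, and the unweighted sum is an assumption, as the disclosure does not specify a weighting.

```python
# Sketch of the composite loss: per task, a KL divergence between
# teacher and student penultimate-layer distributions plus that task's
# own supervised loss. All inputs here are toy values.
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete probability distributions
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def total_loss(task_outputs):
    """task_outputs: iterable of (teacher_dist, student_dist, task_loss)."""
    loss = 0.0
    for teacher, student, task_loss in task_outputs:
        loss += kl_divergence(teacher, student) + task_loss
    return loss

tasks = [
    ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], 0.25),  # task 1
    ([0.5, 0.5],      [0.5, 0.5],      0.40),  # task 2: matched distributions
]
loss = total_loss(tasks)
```

Note that a task whose student distribution already matches the teacher's contributes no distillation term, so only the mismatched tasks are pulled toward the teacher.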
In some cases, after the above process, part of the head of the obtained model after quantization perception training still differs from the floating point model. At this time, a second quantization model including the first quantization parameter may be acquired and spliced with the first floating point model to obtain a spliced model. The spliced model is trained using a second sample data set composed of a plurality of road images annotated with lane line positions to obtain a trained spliced model, a second floating point model is separated from it, and the separated second floating point model is used as the first floating point model. The steps of performing post-quantization processing on the first floating point model to obtain the first quantization model, and performing quantization perception training based on the first floating point model and the first quantization model to obtain the model after quantization perception training, are repeated until a preset training stop condition is met. A final lane line detection model is thereby obtained. It can be understood that the detection accuracy of this lane line detection model is close to that of the first floating point model, its calculation speed is faster than that of the first floating point model, and its required memory is smaller, so the lane line detection model is suitable for deployment on a vehicle.
Fig. 2 is a block diagram of a model training apparatus 200 according to an exemplary embodiment, referring to fig. 2, the model training apparatus 200 is applied to an electronic device, and the model training apparatus 200 includes:
a first obtaining module 210 configured to obtain a first floating point model, where the first floating point model is obtained by training a neural network model to be trained to converge by using a first sample data set;
a post-quantization module 220 configured to perform post-quantization processing on the first floating-point model to obtain a first quantization model, where the first quantization model includes a first quantization parameter;
the quantized perception training module 230 is configured to perform quantized perception training based on the first floating point model and the first quantized model, so as to obtain a model after quantized perception training, wherein in the quantized perception training process, the first floating point model is used as a teacher model for knowledge distillation, the first quantized model is used as a student model for knowledge distillation, the first quantized parameters are kept unchanged, and the model after quantized perception training is used for processing any one of the following data collected by a vehicle: image data, audio data, point cloud data, and text data.
Optionally, the quantized perceptual training module 230 includes:
a construction sub-module configured to construct a knowledge distillation loss function, the distillation loss function characterizing a distribution difference of intermediate result data of the first floating point model and intermediate result data corresponding to the first quantization model;
a determining submodule configured to determine a preset loss function based on the knowledge distillation loss function and a loss function corresponding to the first quantization model, wherein the loss function corresponding to the first quantization model characterizes a difference between an output result of the first quantization model and a sample real label;
and the replacing sub-module is configured to replace the loss function of the first quantized model by using the preset loss function, and perform quantized perception training on the replaced quantized model to obtain a quantized perception trained model.
Optionally, the intermediate result data includes a calculation result of at least one layer preceding the output layer of the corresponding model.
Optionally, the model training apparatus 200 further includes:
a second acquisition module configured to acquire a second quantization model including the first quantization parameter;
the splicing module is configured to splice the second quantization model and the first floating point model to obtain a spliced splicing model;
The splicing model training module is configured to train the splicing model by using a second sample data set to obtain a trained splicing model;
a separation module configured to separate a second floating point model from the trained stitching model;
and the loop execution module is configured to take the separated second floating point model as the first floating point model, and return to execute the step of performing post-quantization processing on the first floating point model to obtain a first quantization model until a preset training stop condition is met.
Optionally, the spliced model includes a feature extraction network and a detection head. The network structure of the detection head of the spliced model is the same as that of the detection head of the second quantization model. The feature extraction network of the spliced model includes a first feature extraction network of the first floating point model, a second feature extraction network of the second quantization model, and a mean value calculation node; the features extracted from the input data by the second feature extraction network and the features extracted from the input data by the first feature extraction network are processed by the mean value calculation node and then serve as the input of the detection head of the spliced model.
Optionally, the post-quantization module 220 includes:
the reasoning sub-module is configured to use the first floating point model to infer data to be predicted in a third sample data set, and acquire parameters to be quantized of a preset layer in the first floating point model when each data to be predicted is respectively inferred, wherein the parameters to be quantized comprise an activation value and a weight value;
an initial quantization parameter determination submodule configured to determine an initial quantization parameter based on a parameter to be quantized of a preset layer in the first floating point model;
a search space construction sub-module configured to construct a search space based on the initial quantization parameter;
a first quantization parameter determining submodule, configured to determine, from the search space, a first quantization parameter based on a model performance evaluation index corresponding to a candidate quantization parameter sampled from the search space, the model performance evaluation index corresponding to the candidate quantization parameter being a model performance evaluation index corresponding to a candidate quantization model obtained by assigning the candidate quantization parameter to the first floating point model;
and the assignment submodule is configured to assign the first quantization parameter to the first floating point model to obtain the first quantization model.
Optionally, the first quantization parameter determination submodule includes:
a probability distribution predicting unit configured to predict a probability distribution of model performance evaluation indexes based on model performance evaluation indexes respectively corresponding to the candidate quantization parameters sampled from the search space;
a candidate quantization parameter determining unit configured to determine a next candidate quantization parameter optimal for the model performance evaluation index from the search space based on a probability distribution of the model performance evaluation index;
and a loop execution unit configured to return to executing a step of predicting a probability distribution of model performance evaluation indexes based on model performance evaluation indexes corresponding to the respective candidate quantization parameters sampled from the search space until a preset search stop condition is satisfied, and to determine a most recently determined candidate quantization parameter as the first quantization parameter.
With respect to the model training apparatus 200 in the above embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail herein.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the model training method provided by the present disclosure.
FIG. 3 is a block diagram of an electronic device 300 for a model training method, according to an example embodiment. For example, the electronic device 300 may be an on-board computer or an on-board controller.
Referring to fig. 3, an electronic device 300 may include one or more of the following components: a processing component 302, a memory 304, a power supply component 306, a multimedia component 308, an audio component 310, an input/output interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls overall operation of the electronic device 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interactions between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the electronic device 300. Examples of such data include instructions for any application or method operating on the electronic device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 306 provides power to the various components of the electronic device 300. The power supply components 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 300.
The multimedia component 308 includes a screen between the electronic device 300 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front-facing camera and/or a rear-facing camera. When the electronic device 300 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 300 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 further comprises a speaker for outputting audio signals.
Input/output interface 312 provides an interface between processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 314 includes one or more sensors for providing status assessment of various aspects of the electronic device 300. For example, the sensor assembly 314 may detect an on/off state of the electronic device 300, a relative positioning of components, such as a display and keypad of the electronic device 300, a change in position of the electronic device 300 or a component of the electronic device 300, the presence or absence of a user's contact with the electronic device 300, an orientation or acceleration/deceleration of the electronic device 300, and a change in temperature of the electronic device 300. The sensor assembly 314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate communication between the electronic device 300 and other devices, either wired or wireless. The electronic device 300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the model training methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 304, including instructions executable by the processor 320 of the electronic device 300 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described model training method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. A method of model training, the method comprising:
acquiring a first floating point model, wherein the first floating point model is obtained by training a neural network model to be trained to convergence using a first sample data set;
performing post-quantization processing on the first floating point model to obtain a first quantization model, wherein the first quantization model comprises a first quantization parameter;
constructing a knowledge distillation loss function, wherein the knowledge distillation loss function characterizes a distribution difference between intermediate result data of the first floating point model and corresponding intermediate result data of the first quantization model;
determining a preset loss function based on the knowledge distillation loss function and a loss function corresponding to the first quantization model, wherein the loss function corresponding to the first quantization model characterizes a difference between an output result of the first quantization model and a real sample label; and
replacing the loss function of the first quantization model with the preset loss function, and performing quantization-aware training on the replaced quantization model to obtain a quantization-aware-trained model, wherein during the quantization-aware training, the first floating point model serves as a teacher model for knowledge distillation, the first quantization model serves as a student model for knowledge distillation, and the first quantization parameter is kept unchanged, and wherein the quantization-aware-trained model is used for processing any one of the following data collected by a vehicle: image data, audio data, point cloud data, and text data.
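The combined objective of claim 1 can be sketched in a few lines of Python. The patent does not fix a particular loss form or quantization scheme, so the squared-error losses, the `kd_weight` blend factor, and the symmetric `fake_quant` helper below are illustrative assumptions, not the claimed implementation:

```python
# Illustrative sketch (names and loss forms are assumptions, not from the patent).

def fake_quant(x, scale, bits=8):
    """Fake-quantize x with a fixed (frozen) scale: quantize, clamp, dequantize.
    Keeping `scale` constant mirrors the claim's frozen first quantization parameter."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

def preset_loss(student_out, teacher_feat, student_feat, label, kd_weight=0.5):
    """Preset loss = task loss (student output vs. real sample label)
    + weighted knowledge-distillation loss on intermediate result data."""
    task_loss = (student_out - label) ** 2        # output vs. ground truth
    kd_loss = (student_feat - teacher_feat) ** 2  # teacher/student feature gap
    return task_loss + kd_weight * kd_loss
```

During quantization-aware training only the student's weights would be updated by this loss; the teacher (floating point model) and the quantization scale stay fixed.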
2. The method of claim 1, wherein the intermediate result data comprises a calculation result of at least one layer preceding an output layer of the corresponding model.
3. The method according to claim 1, wherein the method further comprises:
acquiring a second quantization model comprising the first quantization parameter;
splicing the second quantization model and the first floating point model to obtain a spliced model;
training the spliced model by using a second sample data set to obtain a trained spliced model;
separating a second floating point model from the trained spliced model;
and taking the separated second floating point model as the first floating point model, and returning to the step of performing post-quantization processing on the first floating point model to obtain the first quantization model, until a preset training stop condition is met.
4. The method according to claim 3, wherein the spliced model comprises a feature extraction network and a detection head, the detection head of the spliced model has the same network structure as the detection head of the second quantization model, the feature extraction network of the spliced model comprises a first feature extraction network of the first floating point model, a second feature extraction network of the second quantization model, and a mean value calculation node, and the features extracted from input data by the second feature extraction network and the features extracted from the input data by the first feature extraction network are processed by the mean value calculation node and then used as the input of the detection head of the spliced model.
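The forward pass of the spliced model in claims 3 and 4 can be sketched as follows. The assumption here is that each feature extraction network returns a list of feature values and that the mean value calculation node averages them element-wise before the shared detection head; all function names are illustrative:

```python
# Illustrative sketch of the spliced model of claims 3-4 (names are assumptions).

def spliced_forward(x, float_backbone, quant_backbone, head):
    """Run both feature extractors, fuse their features via the mean value
    calculation node, and feed the fused features to the shared detection head."""
    f_float = float_backbone(x)  # first feature extraction network (floating point)
    f_quant = quant_backbone(x)  # second feature extraction network (quantized)
    fused = [(a + b) / 2 for a, b in zip(f_float, f_quant)]  # mean value node
    return head(fused)
```

Training this spliced model end to end, then separating out the floating point branch, yields the updated second floating point model described in claim 3.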
5. The method according to any one of claims 1-4, wherein performing post-quantization processing on the first floating point model to obtain the first quantization model comprises:
performing inference on data to be predicted in a third sample data set using the first floating point model, and obtaining parameters to be quantized of a preset layer in the first floating point model during inference on each piece of data to be predicted, wherein the parameters to be quantized comprise an activation value and a weight value;
determining an initial quantization parameter based on the parameters to be quantized of the preset layer in the first floating point model;
constructing a search space based on the initial quantization parameter;
determining the first quantization parameter from the search space based on a model performance evaluation index corresponding to a candidate quantization parameter sampled from the search space, wherein the model performance evaluation index corresponding to the candidate quantization parameter is the model performance evaluation index of a candidate quantization model obtained by assigning the candidate quantization parameter to the first floating point model; and
assigning the first quantization parameter to the first floating point model to obtain the first quantization model.
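The calibration steps of claim 5 can be illustrated with a simple max-abs scheme. The patent does not specify how the initial quantization parameter is derived from the observed activations and weights, so `initial_scale` and the multiplicative `factors` grid below are assumptions chosen only to make the search-space construction concrete:

```python
# Illustrative calibration sketch for claim 5 (scheme and names are assumptions).

def initial_scale(values, bits=8):
    """Max-abs calibration: derive an initial quantization scale from the
    activation/weight values observed at a preset layer during inference."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax

def build_search_space(scale, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Candidate scales around the initial one form the search space
    from which the first quantization parameter is later selected."""
    return [scale * f for f in factors]
```

Each candidate scale would then be assigned to the floating point model, and the resulting candidate quantization model evaluated to produce the model performance evaluation index used in claim 6.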
6. The method of claim 5, wherein determining the first quantization parameter from the search space based on the model performance evaluation index corresponding to the candidate quantization parameter sampled from the search space comprises:
predicting a probability distribution of the model performance evaluation index based on the model performance evaluation indexes corresponding to the candidate quantization parameters already sampled from the search space;
determining, from the search space based on the probability distribution of the model performance evaluation index, a next candidate quantization parameter expected to correspond to the optimal model performance evaluation index; and
returning to the step of predicting the probability distribution of the model performance evaluation index based on the model performance evaluation indexes corresponding to the candidate quantization parameters sampled from the search space, until a preset search stop condition is met, and determining the most recently determined candidate quantization parameter as the first quantization parameter.
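The loop in claim 6 resembles surrogate-based (e.g. Bayesian) optimization: fit a probabilistic model over the scores observed so far, pick the most promising next candidate, and repeat. The sketch below shows only the loop structure, substituting plain random sampling plus a best-so-far record for the probability-distribution step; every name and the sampling strategy are illustrative assumptions:

```python
# Illustrative search-loop sketch for claim 6 (random sampling stands in for
# the claimed probability-distribution prediction; names are assumptions).
import random

def search_quant_param(search_space, evaluate, n_iters=12, seed=0):
    """Repeatedly sample a candidate quantization parameter, score it with the
    model performance evaluation index, and return the best-scoring candidate.
    A faithful implementation would instead fit a surrogate (e.g. a Gaussian
    process) over `history` to choose each next candidate."""
    rng = random.Random(seed)
    history = []  # (score, candidate) pairs observed so far
    for _ in range(n_iters):
        cand = rng.choice(search_space)
        history.append((evaluate(cand), cand))
    return max(history)[1]  # candidate with the best evaluation index
```

The preset search stop condition of the claim is modeled here simply as a fixed iteration budget.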
7. A model training apparatus, the apparatus comprising:
the first acquisition module is configured to acquire a first floating point model, wherein the first floating point model is obtained by training a neural network model to be trained by using a first sample data set until convergence;
the post-quantization module is configured to perform post-quantization processing on the first floating point model to obtain a first quantization model, wherein the first quantization model comprises first quantization parameters;
the quantization-aware training module is configured to perform quantization-aware training based on the first floating point model and the first quantization model to obtain a quantization-aware-trained model, wherein during the quantization-aware training, the first floating point model serves as a teacher model for knowledge distillation, the first quantization model serves as a student model for knowledge distillation, and the first quantization parameter is kept unchanged, and the quantization-aware-trained model is used for processing any one of the following data collected by a vehicle: image data, audio data, point cloud data, and text data;
wherein the quantization-aware training module comprises:
a construction sub-module configured to construct a knowledge distillation loss function, the knowledge distillation loss function characterizing a distribution difference between intermediate result data of the first floating point model and corresponding intermediate result data of the first quantization model;
a determining sub-module configured to determine a preset loss function based on the knowledge distillation loss function and a loss function corresponding to the first quantization model, wherein the loss function corresponding to the first quantization model characterizes a difference between an output result of the first quantization model and a real sample label; and
a replacing sub-module configured to replace the loss function of the first quantization model with the preset loss function, and to perform quantization-aware training on the replaced quantization model to obtain the quantization-aware-trained model.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-6.
CN202310871661.0A 2023-07-14 2023-07-14 Model training method, device, electronic equipment and medium Active CN116611482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310871661.0A CN116611482B (en) 2023-07-14 2023-07-14 Model training method, device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN116611482A CN116611482A (en) 2023-08-18
CN116611482B true CN116611482B (en) 2023-10-17

Family

ID=87680391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310871661.0A Active CN116611482B (en) 2023-07-14 2023-07-14 Model training method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN116611482B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117268345B (en) * 2023-11-20 2024-03-29 启元实验室 High-real-time monocular depth estimation measurement method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345832A (en) * 2022-07-19 2022-11-15 奕行智能科技(广州)有限公司 Image detection model training method and device and readable storage medium
CN115879535A (en) * 2023-02-10 2023-03-31 北京百度网讯科技有限公司 Training method, device, equipment and medium for automatic driving perception model
CN116402122A (en) * 2023-03-06 2023-07-07 哲库科技(上海)有限公司 Neural network training method and device, readable storage medium and chip

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210224658A1 (en) * 2019-12-12 2021-07-22 Texas Instruments Incorporated Parametric Power-Of-2 Clipping Activations for Quantization for Convolutional Neural Networks


Also Published As

Publication number Publication date
CN116611482A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN110210535B (en) Neural network training method and device and image processing method and device
CN110837761B (en) Multi-model knowledge distillation method and device, electronic equipment and storage medium
US20200356821A1 (en) Method, terminal, and computer storage medium for image classification
CN108256555B (en) Image content identification method and device and terminal
CN111931844B (en) Image processing method and device, electronic equipment and storage medium
CN113326768B (en) Training method, image feature extraction method, image recognition method and device
CN110399841B (en) Video classification method and device and electronic equipment
CN112149740A (en) Target re-identification method and device, storage medium and equipment
EP3901827A1 (en) Image processing method and apparatus based on super network, intelligent device and computer storage medium
CN113128520B (en) Image feature extraction method, target re-identification method, device and storage medium
CN109858614B (en) Neural network training method and device, electronic equipment and storage medium
JP2022522551A (en) Image processing methods and devices, electronic devices and storage media
CN115100472B (en) Training method and device for display object recognition model and electronic equipment
CN110543850A (en) Target detection method and device and neural network training method and device
CN116611482B (en) Model training method, device, electronic equipment and medium
CN115641518B (en) View perception network model for unmanned aerial vehicle and target detection method
CN111814538B (en) Method and device for identifying category of target object, electronic equipment and storage medium
CN114338083A (en) Controller local area network bus abnormality detection method and device and electronic equipment
CN117115306B (en) Image generation method and device, electronic equipment and storage medium
CN112115894A (en) Training method and device for hand key point detection model and electronic equipment
CN113269307B (en) Neural network training method and target re-identification method
CN113052874B (en) Target tracking method and device, electronic equipment and storage medium
CN113313115B (en) License plate attribute identification method and device, electronic equipment and storage medium
CN112259122B (en) Audio type identification method, device and storage medium
CN116740158B (en) Image depth determining method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant