WO2022111617A1 - Model training method and apparatus - Google Patents

Model training method and apparatus

Info

Publication number
WO2022111617A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
weight
network model
quantization
Application number
PCT/CN2021/133383
Other languages
French (fr)
Chinese (zh)
Inventor
金晶
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022111617A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • the present application relates to the field of computers, and in particular, to a model training method and device.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • Neural network quantization is a model compression technique that converts floating point storage (operation) into integer storage (operation).
  • For example, the model parameters of a model are originally represented in float32 (32-bit floating point); after quantization, the model parameters are represented in int8 (8-bit fixed point). Through the quantization operation, the operation speed of the model is improved at the expense of a small loss of precision.
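As a rough illustration of the float32-to-int8 round trip described above (not code from the patent; the symmetric per-tensor scaling scheme is one common choice), a quantize/dequantize pair might look like:

```python
import numpy as np

def quantize_int8(x, scale):
    """Quantize float values to int8 with a symmetric scale."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q.astype(np.int8)

def dequantize(q, scale):
    """Map int8 values back to float; the round trip loses a little precision."""
    return q.astype(np.float32) * scale

w = np.array([-0.9, -0.2, 0.0, 0.4, 1.1], dtype=np.float32)
scale = np.max(np.abs(w)) / 127.0   # quantization factor derived from the value range
w_q = quantize_int8(w, scale)
w_dq = dequantize(w_q, scale)       # close to w, within half a quantization step
```

The reconstruction error per element is bounded by half the scale, which is the "small loss of precision" traded for integer storage and arithmetic.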
  • QAT: quantization-aware training.
  • The main process is: 1. insert quantization operators into the model before training; 2. during training, collect statistics of the min and max values of each layer's weights and activations, which are used to calculate the quantization factor.
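A minimal sketch of step 2, assuming a simple symmetric min/max scheme (the class name `MinMaxObserver` and its update rule are illustrative, not taken from the patent):

```python
class MinMaxObserver:
    """Track the running min/max of a tensor stream and derive an int8 scale."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def update(self, values):
        # Statistics step: widen the observed range with each batch.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale(self, num_bits=8):
        # Symmetric quantization factor: the largest observed magnitude
        # maps to the edge of the signed integer range.
        qmax = 2 ** (num_bits - 1) - 1
        return max(abs(self.min_val), abs(self.max_val)) / qmax

obs = MinMaxObserver()
obs.update([-1.5, 0.2, 0.9])
obs.update([0.1, 2.0])
s = obs.scale()   # 2.0 / 127
```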
  • a pseudo-quantization node SimQuant (also referred to as a quantization operator in this embodiment) needs to be inserted into the weight input and activation output of the original model.
  • For structures combining a convolutional neural network (CNN) layer and batch normalization (BN), another CNN is needed to realize BN folding, that is, the fusion of the BN coefficients with the CNN weights.
  • SimQuant will count the min and max values in the corresponding data stream (Tensor) for subsequent calculation of the scale quantization factor.
  • When QAT folds CNN and BN, it needs to construct another CNN to perform the convolution operation on the data of the current batch.
  • BN uses the result of the convolution operation to update the BN coefficient, and then uses the updated BN coefficient.
  • the quantization operator can quantize and inverse-quantize the constructed weights, while the CNN can perform convolution operations on the current batch of data based on the weights obtained after inverse quantization.
  • two CNNs will perform convolution operations on the same batch of data during the training process, which increases the amount of CNN operations during the training process, thereby reducing the training speed.
  • the present application provides a model training method, the method comprising:
  • the first neural network model includes a convolutional BN layer and a first quantization operator
  • the convolutional BN layer is used to perform convolution processing on the input Nth batch of data according to the first weight, normalize the convolution processing result according to the BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient;
  • the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight;
  • the convolutional BN layer is also used to perform convolution processing on the input N+1th batch of data according to the second weight;
  • model training is performed on the first neural network model to obtain the trained first neural network model.
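The folded training step can be sketched as follows, with the convolution modeled as a plain matrix multiply for brevity; the function names and the use of γ/σ as the folding factor are illustrative assumptions rather than the patent's exact formulation:

```python
import numpy as np

def fake_quant(w, num_bits=8):
    """Quantize then dequantize a weight tensor (the pseudo-quantization node)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def folded_conv_bn_step(x, w, gamma, eps=1e-5):
    """One training step of the folded conv+BN layer (conv as matmul)."""
    y = x @ w                                # convolution on the Nth batch
    sigma = np.sqrt(y.var(axis=0) + eps)     # BN coefficient updated from this batch
    w_folded = w * (gamma / sigma)           # updated first weight
    w2 = fake_quant(w_folded)                # second weight, used for batch N+1
    return w2, sigma

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4))
w = rng.standard_normal((4, 3))
w2, sigma = folded_conv_bn_step(x, w, gamma=np.ones(3))
```

Note the point the text emphasizes: the same convolution output `y` serves both the BN statistics update and the derivation of the next batch's weight, so no second convolution over the batch is required.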
  • the convolution processing result refers to the result obtained after performing convolution processing on the Nth batch of batch data.
  • the convolutional BN layer can be used as an independent layer.
  • the parts corresponding to the first convolutional layer and the first batch normalized BN layer can still be distinguished in the convolutional BN layer.
  • the part of the convolutional BN layer corresponding to the first convolutional layer is still called the first convolutional layer
  • the part of the convolutional BN layer corresponding to the first batch normalized BN layer is called the first BN layer;
  • the first convolution layer is used to perform convolution processing on the input Nth batch of data according to the first weight, so as to obtain the first output (that is, the convolution processing result mentioned above).
  • the first BN layer is used to normalize the first output according to the BN coefficient and to update the BN coefficient based on the normalization result; the convolutional BN layer is used to update the first weight according to the updated BN coefficient;
  • the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain the second weight
  • the first convolution layer is also used to perform convolution processing on the input data of the N+1th batch according to the second weight; model training is performed on the first neural network model to obtain the trained first neural network model.
  • the first neural network model can be obtained by performing BN folding processing on a pre-trained model and adding a quantization operator (also called a pseudo-quantization node SimQuant), and the first output can be used as the input of the first BN layer.
  • the first BN layer can normalize the first output and update the BN coefficients based on the results of the normalization processing, wherein, in the training process, the BN layer performs the BN operation based on the mean and standard deviation of the features output by the convolutional layer in the feedforward process.
  • the first BN layer is connected to the first convolutional layer and is used to perform the BN operation on the first output according to the mean and standard deviation of the first output; the training device can then update the BN coefficient based on the operation result, where the BN coefficient can include, but is not limited to, at least one of the mean μ, the variance σ, the scale parameter γ and the offset parameter β, or an operation result between any of them.
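A plain batch-normalization sketch showing how the coefficients above arise from the first output (variable names mirror μ, σ, γ, β; this is standard BN, not code lifted from the patent):

```python
import numpy as np

def bn_forward(y, gamma, beta, eps=1e-5):
    """Normalize a convolution output and return the batch statistics
    (mean mu and standard deviation sigma) used to update the BN coefficients."""
    mu = y.mean(axis=0)
    sigma = np.sqrt(y.var(axis=0) + eps)
    y_hat = (y - mu) / sigma            # normalization of the first output
    return gamma * y_hat + beta, mu, sigma

y = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy "first output" of the conv layer
out, mu, sigma = bn_forward(y, gamma=1.0, beta=0.0)
```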
  • in the prior art, the first convolutional layer determines the weight used for the current batch through the BN coefficients updated after the BN layer processes the data of the current batch; therefore, in addition to the first convolutional layer, a separate convolutional layer has to be set up for data processing so that the BN layer can update the BN coefficient based on the data of the current batch.
  • in this embodiment, the BN coefficients updated after processing the previous batch are used to determine the weights used in the current batch, so there is no need to set up a separate convolutional layer, which reduces the amount of data operations. Since the training process requires a large number of iterations and the available computing resources of the training equipment are limited, saving one convolution operation per convolution layer per training step greatly reduces the computing resource consumption of the training equipment, thereby improving the training speed.
  • the method further includes:
  • the second neural network model can be a pre-trained model; the second neural network model includes a first convolutional layer and a first BN layer; the first convolutional layer can be used to perform convolution processing on the input data according to the target weight to obtain the first output, where the target weight is the weight included in the convolution kernel in the first convolutional layer; the first BN layer is used to normalize the first output according to the BN coefficient.
  • the structures of a convolutional layer stacked with a BN layer that need to be BN-folded (also described in this embodiment as CNN+BN structures) can be identified according to the operator types in the computational flow graph of the second neural network model; each identified CNN+BN structure is combined into a block (that is, the convolutional BN layer in the above embodiment), and the combined convolutional BN layer replaces the original CNN+BN structure.
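The identification-and-replacement step can be illustrated with a toy flat op list standing in for the computational flow graph (op names are assumptions, and real graph matching would work on nodes and edges rather than a list):

```python
def fold_conv_bn(ops):
    """Fuse adjacent (conv, bn) pairs in a flat op list into one conv_bn block."""
    folded, i = [], 0
    while i < len(ops):
        if ops[i] == "conv" and i + 1 < len(ops) and ops[i + 1] == "bn":
            folded.append("conv_bn")   # replace the CNN+BN structure with one block
            i += 2
        else:
            folded.append(ops[i])      # leave every other operator untouched
            i += 1
    return folded
```

For example, `fold_conv_bn(["conv", "bn", "relu", "conv", "bn"])` yields `["conv_bn", "relu", "conv_bn"]`.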
  • the convolutional BN layer is obtained by folding the convolutional layer and the BN layer; the first weight is obtained according to the product of the BN coefficient and the target weight, the updated first weight is obtained by multiplying the updated BN coefficient and the target weight, and the target weight is the weight included in the convolutional layer.
  • the second neural network model may include the above convolution layer and BN layer (the convolution layer may also be referred to as the first convolution layer in the second neural network model described above, and the BN layer may also be referred to as the first BN layer in the second neural network model described above); the first convolutional layer in the second neural network model is used to perform convolution processing on the input data according to the target weight.
  • the first convolution layer in the neural network model is used to perform convolution processing on the input data according to the target weight
  • the input data is the data input to the first convolution layer, that is, the input of the middle layer in the neural network, and is not the input of the neural network
  • the first weight is obtained according to the product of the BN coefficient and the target weight
  • the updated first weight is obtained by the product of the updated BN coefficient and the target weight
  • the target weight is the weight included in the convolution kernel in the convolution layer.
  • the target weight may be the weight included in the first convolution layer in the second neural network model.
  • the first convolution layer may perform a convolution operation on the input data based on a convolution kernel including target weights, and the convolution kernel may include target weights and biases.
  • the method further includes: performing a product operation on the BN coefficient and the target weight to obtain a first target tensor, where the first target tensor includes M elements; the N target elements with the largest absolute value among the M elements included in the first target tensor are replaced with the largest element among the M-N elements other than the N target elements among the M elements, so as to obtain the first weight.
  • the weights in the second neural network model (pre-trained model) and the coefficients of BN may be used to initialize the first weights in the first neural network model.
  • the BN coefficient and the target weight can be multiplied according to the pre-trained model to obtain the first target tensor; for example, the first target tensor can be γ/σ*W. Each element in the first target tensor is then sorted by magnitude and, in a symmetrical manner, the values of the main part are intercepted (for example, 95% to 99.5% are retained) and the remaining elements are replaced with the largest value in the main part, so as to realize the initialization of the first weight.
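A hedged sketch of the symmetric interception above, implemented here as clipping to the largest retained magnitude (the function name and the `keep` parameter are illustrative; `keep` plays the role of the 95%-99.5% fraction):

```python
import numpy as np

def clip_to_percentile(t, keep=0.99):
    """Replace the largest-magnitude tail of a tensor with the largest
    magnitude of the retained main part (symmetric interception)."""
    flat = np.abs(t).ravel()
    k = max(1, int(round(keep * flat.size)))
    threshold = np.sort(flat)[k - 1]          # largest magnitude kept
    return np.clip(t, -threshold, threshold)

w = np.array([-10.0, -0.5, 0.1, 0.4, 0.6, 9.0])
w_clipped = clip_to_percentile(w, keep=2 / 3)   # intercepts the two outliers
```

Clipping the outliers keeps the quantization scale proportional to the main body of the distribution instead of a few extreme values.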
  • the first weight is in the form of a tensor.
  • the number of elements with large absolute values in the first target tensor is small, but in subsequent quantization and inverse quantization such large absolute values degrade the accuracy of the operation; for example, they enlarge the quantization factor and unnecessarily smooth the other elements. The embodiment of the present application therefore improves the processing accuracy of the neural network model by performing element interception on the first target tensor.
  • the first convolution layer in the first neural network model is configured to perform convolution processing on the input data of the N+1th batch according to the second weight to obtain a convolution processing result, and to divide the convolution processing result by the updated BN coefficient to obtain a second output.
  • a second quantization operator may also be added to the output position of the activation layer.
  • the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is used to process input data (wherein the input data is the data input to the target activation layer, that is, the input of an intermediate layer in the neural network rather than the input of the neural network) to obtain a third output; the first neural network model also includes the target activation layer and a second quantization operator; the target activation layer in the first neural network model is used to process the input data to obtain a fourth output, and the second quantization operator is used to perform quantization processing and inverse quantization processing on the fourth output according to the second quantization factor.
  • the third output is a second target tensor
  • the second target tensor includes X elements
  • the method further includes: obtaining the Y target elements with the largest absolute value among the X elements; and replacing the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements among the X elements, so as to obtain the second quantization factor.
  • the percentage may be, but not limited to, 95% to 99.5%, and 95% to 99.5% of the elements may be elements in the main part of the element distribution, that is, elements with an absolute value close to 0.
  • the trained first neural network model includes the trained first quantization factor and the trained BN coefficient, and the method further includes:
  • the trained first neural network model is quantized to obtain a third neural network model, where the third neural network model includes the quantized weights;
  • the first convolution layer is used to perform convolution processing on the input data according to the quantized weight, and the quantized weight is obtained based on the first quantization factor and the trained BN coefficient.
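The final export step might be sketched as follows (the γ/σ folding form and all names are assumptions; note the saturation of out-of-range values to the int8 limits):

```python
import numpy as np

def export_quantized_weight(w_target, gamma, sigma, scale):
    """Produce a deployed int8 weight from the trained quantization factor
    and the trained BN coefficient."""
    w_folded = w_target * (gamma / sigma)   # fold trained BN coefficient into weight
    q = np.round(w_folded / scale)
    return np.clip(q, -128, 127).astype(np.int8)   # saturate to the int8 range

w_int8 = export_quantized_weight(np.array([0.5, -1.0]),
                                 gamma=2.0, sigma=1.0, scale=0.01)
```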
  • the present application provides a model training device, the device comprising:
  • the obtaining module is used to obtain a first neural network model, wherein the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is used to perform convolution processing on the input Nth batch of data according to the first weight, normalize the convolution processing result according to the BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient;
  • a model training module configured to perform model training on the first neural network model to obtain a trained first neural network model.
  • the obtaining module is configured to obtain a second neural network model, where the second neural network model includes a first convolution layer and a first BN layer; the first convolution layer and the first BN layer are subjected to BN folding processing to obtain the first neural network model, and the first neural network model includes the convolutional BN layer obtained by folding the first convolution layer and the first BN layer;
  • the convolutional BN layer is obtained by folding the convolutional layer and the BN layer; the first weight is obtained according to the product of the BN coefficient and the target weight, the updated first weight is obtained by multiplying the updated BN coefficient and the target weight, and the target weight is the weight included in the convolutional layer.
  • the apparatus further includes:
  • a product operation module configured to perform a product operation on the BN coefficient and the target weight to obtain a first target tensor, and the first target tensor includes M elements;
  • an element replacement module, configured to replace the N target elements with the largest absolute value among the M elements included in the first target tensor with the largest element among the M-N elements other than the N target elements among the M elements, so as to obtain the first weight.
  • the first convolution layer in the first neural network model is configured to perform convolution processing on the input data of the N+1th batch according to the second weight to obtain a convolution processing result, and to divide the convolution processing result by the updated BN coefficient to obtain a second output.
  • the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is used to process the input data to obtain a third output; the first neural network model further includes the target activation layer and a second quantization operator, and the target activation layer in the first neural network model is used to process the input data to obtain a fourth output; the second quantization operator is used to perform quantization processing and inverse quantization processing on the fourth output according to the second quantization factor.
  • the third output is a second target tensor, and the second target tensor includes X elements; the obtaining module is configured to obtain the Y target elements with the largest absolute value among the X elements;
  • the element replacement module is configured to replace the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements among the X elements, so as to obtain the second quantization factor.
  • the first quantization operator is configured to perform quantization processing and inverse quantization processing on the updated first weight according to a first quantization factor;
  • the trained first neural network model includes the trained first quantization factor and the trained BN coefficient;
  • the device further includes:
  • an embodiment of the present application provides a model training apparatus, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, so as to execute the above-mentioned first aspect and any of its optional methods.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when it runs on a computer, the computer is caused to execute the above-mentioned first aspect and any optional method thereof.
  • an embodiment of the present application provides a computer program, including code, for implementing the first aspect and any optional method thereof when the code is executed.
  • the present application provides a system-on-a-chip
  • the system-on-a-chip includes a processor for supporting an execution device or a training device to implement the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
  • the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • An embodiment of the present application provides a model training method, the method includes: acquiring a first neural network model, wherein the first neural network model includes a convolutional BN layer and a first quantization operator, and the convolutional BN The layer is used to perform convolution processing on the input data of the Nth batch of batches according to the first weight, normalize the convolution processing results according to the BN coefficients, and update the BN coefficients based on the normalization processing results.
  • the updated BN coefficient is used to update the first weight, and the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight; the convolutional BN layer is also used to perform convolution processing on the input N+1th batch of data according to the second weight; and model training is performed on the first neural network model to obtain the trained first neural network model.
  • the updated BN coefficient determines the weight of the current batch, so there is no need to set up a separate convolutional layer.
  • the size of the model can be reduced, and on the other hand, the amount of data operations in the convolutional layer in the neural network is also reduced.
  • since the training process requires a large number of iterations and the available computing resources of the training equipment are limited, saving one convolution operation per convolution layer per training step greatly reduces the computing resource consumption of the training equipment, thereby improving the training speed.
  • FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence;
  • FIG. 2 is a schematic diagram of the folding of CNN and BN by QAT;
  • FIG. 3 is a schematic diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an embodiment of a model training method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a BN folding provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a convolutional BN layer provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an element interception provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a model training apparatus 1000 provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an execution device provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • the above-mentioned artificial intelligence framework is explained below in two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, data has gone through the process of "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value brought by artificial intelligence to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementation) to the industrial ecological process of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform. Communication with the outside world is achieved through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC and FPGA); the basic platform includes distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall solution of artificial intelligence, and the productization of intelligent information decision-making to achieve landing applications. Its application areas mainly include: intelligent terminals, intelligent transportation, Smart healthcare, autonomous driving, safe city, etc.
  • the embodiments of the present application may be applied to scenarios such as image classification, object detection, semantic segmentation, room layout, image completion, or automatic coding.
  • Application Scenario 1 ADAS/ADS Visual Perception System
  • multi-type 2D object detection needs to be performed in real time, including: dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motorcycle, Bicycle), traffic signs (TrafficSign, GuideSign, Billboard), traffic lights (TrafficLight_Red, TrafficLight_Yellow, TrafficLight_Green, TrafficLight_Black) and road signs (RoadSign).
  • the neural network model trained by using the technical solutions provided in the embodiments of the present application can complete all the above-mentioned functions or realize a part of the functions of the ADAS/ADS visual perception system.
  • the neural network model (for example, the trained first neural network model, the second neural network model, and the third neural network model) obtained through the training of the technical solutions provided in the embodiments of the present application can detect the mask and the key points of the human body, and can zoom in or out on the corresponding parts of the human body, such as the waist and hips, so as to output beautified images.
  • the category of the object in the image to be classified can be acquired based on the neural network, and then the image to be classified can be classified according to the category of the object in the image to be classified.
  • photos can be quickly classified according to the content in the photos, which can be divided into photos containing animals, photos containing people, and photos containing plants.
  • the neural network models (for example, the trained first neural network model, the second neural network model, and the third neural network model) trained by the provided technical solution can quickly classify images.
  • the embodiments of the present application can perform neural network training, and the obtained trained neural network can perform task processing in the above several scenarios.
  • Neural network quantization is a model compression technique that converts floating point storage (operation) into integer storage (operation).
  • for example, the model parameters of a model are originally represented by float32 (32-bit floating point), and after quantization the model parameters are represented by int8 (8-bit fixed point); through the quantization operation of the model, the operation speed of the model is improved at the expense of a small loss of precision.
  • R is the input floating-point data
  • Q is the fixed-point data after quantization of the floating-point data
  • Z represents the zero point value (Zero Point)
  • S represents the scale. It can be seen that, after S and Z are determined, conversion between the two types of data can be performed. There are many ways to determine S and Z, for example:
  • Rmax represents the maximum value of the input floating-point data
  • Rmin represents the minimum value of the input floating-point data
  • Qmax represents the maximum value of the fixed-point data
  • Qmin represents the minimum value of the fixed-point data
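As an illustrative sketch of one common way to determine S and Z from the values above (this asymmetric min-max scheme is an assumption for illustration; the embodiments allow other determination methods), the conversion can be written as:

```python
def compute_scale_zero_point(r_min, r_max, q_min, q_max):
    """Determine scale S and zero point Z from the floating-point range
    [Rmin, Rmax] and the fixed-point range [Qmin, Qmax] (e.g. int8: -128..127)."""
    s = (r_max - r_min) / (q_max - q_min)
    z = round(q_max - r_max / s)
    return s, z

def quantize(r, s, z, q_min, q_max):
    """Convert floating-point data R to fixed-point data Q."""
    q = round(r / s + z)
    return max(q_min, min(q_max, q))  # clamp into the fixed-point range

def dequantize(q, s, z):
    """Convert fixed-point data Q back to floating-point data R."""
    return s * (q - z)
```

With S and Z determined, conversion in both directions is possible; the quantize-dequantize round trip loses at most about one scale step of precision.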
  • the conversion between fixed-point data with different numbers of bits may refer to the above-mentioned conversion method between floating-point data and fixed-point data, or may be another conversion method in the prior art, which is not described here again.
  • conversion between floating-point data and 4-bit or 8-bit fixed-point data can be performed with reference to the above-mentioned conversion method, and conversion between floating-point data and 2-bit (or 1-bit) fixed-point data can be performed by the following formula:
  • 2 bits can represent three numbers: -1, 0, and 1.
  • T is the threshold.
  • when the floating-point data is greater than T, the converted 2-bit fixed-point value is 1; when the floating-point data is less than -T, its value is converted to -1; when the floating-point data is any other value, its value is converted to 0.
  • the conversion method of 1-bit is similar to that of 2-bit, but its fixed-point values are only -1 and 1, and the T value is 0.
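The 2-bit and 1-bit conversions just described can be sketched as follows (T is the threshold from the formula above):

```python
def quantize_2bit(r, t):
    """2-bit (ternary) conversion: values greater than T become 1,
    values less than -T become -1, and all other values become 0."""
    if r > t:
        return 1
    if r < -t:
        return -1
    return 0

def quantize_1bit(r):
    """1-bit (binary) conversion: T is 0 and the only fixed-point
    values are -1 and 1."""
    return 1 if r >= 0 else -1
```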
  • QAT: quantization-aware training.
  • the main process is: 1. insert quantization operators before model training; 2. during training, collect statistics of the min and max values of each model layer (weights and activations), which are used to calculate the quantization factor.
  • a pseudo-quantization node SimQuant (also referred to as a quantization operator in this embodiment) needs to be inserted into the weight input and activation output of the original model.
  • for structures consisting of a convolutional neural network (CNN) layer and a batch normalization (BN) layer, another CNN is needed to realize BN folding, so as to fuse the BN coefficients with the CNN weights.
  • SimQuant will count the min and max values in the corresponding data stream (Tensor) for subsequent calculation of the scale quantization factor.
  • BN uses the result of the convolution operation to update the BN coefficient, and the updated BN coefficient is then used in subsequent computation.
  • the quantization operator can quantize and inverse-quantize the constructed weights, while the CNN performs convolution operations on the current batch of data based on the weights obtained after inverse quantization.
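The quantize-then-inverse-quantize behavior of the pseudo-quantization node can be sketched as follows (a minimal per-tensor symmetric scheme, assumed here for illustration; the actual quantization factor calculation may differ):

```python
def sim_quant(weights, num_bits=8):
    """Pseudo-quantization (SimQuant): quantize the weights to num_bits
    fixed point and immediately inverse-quantize them, so the convolution
    operates on floating-point weights that carry the rounding error."""
    q_max = 2 ** (num_bits - 1) - 1
    scale = max(abs(w) for w in weights) / q_max or 1.0  # guard all-zero tensors
    return [round(w / scale) * scale for w in weights]
```

The convolution then runs on `sim_quant(weights)` rather than on the raw weights, so training "sees" the quantization error while remaining differentiable in floating point.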
  • two CNNs will perform convolution operations on the same batch of data during the training process, which increases the amount of CNN operations during the training process, thereby reducing the training speed.
  • the computation amount of the CNN can be reduced.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes xs (that is, input data) and an intercept of 1 as input, and the output of the operation unit can be: f(∑s Ws·xs + b), where:
  • s = 1, 2, …, n, and n is a natural number greater than 1
  • Ws is the weight of xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
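The operation unit described above can be sketched as follows (a minimal sketch using sigmoid as the activation f):

```python
import math

def neural_unit(xs, ws, b):
    """Single neural unit: weighted sum of the inputs xs with weights Ws,
    plus bias b, passed through a sigmoid activation f."""
    f = lambda a: 1.0 / (1.0 + math.exp(-a))
    return f(sum(w * x for w, x in zip(ws, xs)) + b)
```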
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure.
  • a convolutional neural network consists of a feature extractor consisting of convolutional and subsampling layers.
  • the feature extractor can be viewed as a filter, and the convolution process can be viewed as convolution with an input image or a convolutional feature map using a trainable filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • a neuron can only be connected to some of its neighbors.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle.
  • Neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of the image are the same as those of the other parts, which means that image information learned in one part can also be used in another part; so for all positions on the image, the same learned image information can be used.
  • multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the convolutional neural network can use the error back propagation (BP) algorithm to correct the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward passing the input signal until the output generates an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation motion dominated by the error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrix.
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure and a deep learning architecture, in which learning is performed at multiple levels.
  • a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in images fed into it.
  • a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
  • the structures composed of the convolutional layer/pooling layer 120 and the neural network layer 130 may be the first convolutional layer and the second convolutional layer described in this application, and the input layer 110 is connected to the convolutional layer/pooling layer 120
  • the convolutional layer/pooling layer 120 is connected to the neural network layer 130, the output of the neural network layer 130 can be input to the activation layer, and the activation layer can perform nonlinear processing on the output of the neural network layer 130.
  • the convolutional/pooling layer 120 may include layers 121-126 as examples.
  • in one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually pre-defined. In the process of convolving an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same dimension are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • the dimensions of the multiple weight matrices are the same, so the dimensions of the feature maps extracted by these weight matrices are also the same; the multiple extracted feature maps of the same dimension are then combined to form the output of the convolution operation.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
  • the initial convolutional layer (for example, layer 121) often extracts relatively general, low-level features; as the depth of the convolutional neural network 100 increases, the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features.
  • the layers 121-126 exemplified by 120 in Figure 3 can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
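The sliding of a weight matrix over an input described above can be sketched with a minimal single-channel convolution (illustrative only; a real convolutional layer also spans the full input depth and applies many kernels):

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image stride pixels at a time, computing
    the weighted sum at each position (single channel, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out
```

Increasing the stride moves the weight matrix two (or more) pixels at a time, shrinking the output feature map.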
  • After being processed by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 3) and an output layer 140, and the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of specific task types; for example, the task types may include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 100 (propagation from 110 to 140 in Fig. 3) is completed, back propagation (propagation from 140 to 110 in Fig. 3) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
  • the convolutional neural network 100 shown in FIG. 3 is only used as an example of a convolutional neural network.
  • the convolutional neural network can also exist in the form of other network models, for example, a network in which multiple convolutional layers/pooling layers are in parallel, as shown in FIG. 4, and the separately extracted features are all input to the neural network layer 130 for processing.
  • BN: through normalization over mini-batches, the differences in parameter optimization caused by different levels of input are eliminated, the possibility of overfitting in a certain layer of the model is reduced, and training can proceed more smoothly.
  • BN coefficients can have: mean ⁇ , variance ⁇ , scale parameter ⁇ and offset parameter ⁇ .
  • the main purpose is to fuse the computation of BN and CNN to reduce the amount of computation.
  • This method is mainly used in QAT, so that training-time quantization can simulate the inference-time BN fusion process; in this way, BN and CNN can be fused during model conversion (the relevant coefficients are combined into one coefficient according to the calculation rules), and model inference is accelerated.
  • Convolutional BN represents the fusion operator of convolution and BN. This operator realizes both the function of the CNN and the function of BN; since the BN coefficients are visible to the CNN, it is easy to realize BN folding, in which the CNN convolution weights and the relevant BN coefficients are fused.
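The fusion of BN coefficients into the convolution weights can be sketched as follows (one common folding rule, assumed here for illustration: W_fold = γ/σ·W and b_fold = β + γ/σ·(b − μ), with σ = sqrt(variance + ε)):

```python
import math

def fold_bn_into_conv(w, b, mean, var, gamma, beta, eps=1e-5):
    """Fold the BN coefficients (mean, variance, scale gamma, offset beta)
    into the convolution weight vector w and bias b."""
    sigma = math.sqrt(var + eps)
    w_fold = [gamma / sigma * wi for wi in w]
    b_fold = beta + gamma / sigma * (b - mean)
    return w_fold, b_fold
```

The folded layer computes the same output as convolution followed by BN, with one fused set of coefficients, which is why folding reduces the inference computation.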
  • FIG. 5 is a schematic diagram of a system architecture 100 provided by an embodiment of the present application.
  • the execution device 110 is configured with an input/output (I/O) interface 112, which is used for data interaction with external devices.
  • a user may enter data into I/O interface 112 through client device 140 .
  • the execution device 110 may call the data storage system 150
  • the data, codes, etc. in the corresponding processing can also be stored in the data storage system 150 .
  • the I/O interface 112 returns the processing results to the client device 140 for provision to the user.
  • the client device 140 can be, for example, a control unit in an automatic driving system or a functional algorithm module in a mobile phone terminal, for example, the functional algorithm module can be used to implement related tasks.
  • the training device 120 can generate corresponding target models/rules based on different training data for different goals or tasks, and the corresponding target models/rules can be used to achieve the above-mentioned goals or complete the above-mentioned tasks, thereby providing the user with the desired result.
  • the user can manually specify the input data, which can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be display, sound, action, or another specific manner.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data as shown in the figure, and store them in the database 130 .
  • the I/O interface 112 directly stores the input data input into the I/O interface 112 and the output result of the I/O interface 112, as shown in the figure, as new sample data in the database 130.
  • FIG. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • model training method provided by the embodiment of the present application is described by taking the model training stage as an example.
  • FIG. 6 is a schematic diagram of an embodiment of a model training method provided by an embodiment of the present application.
  • a model training method provided by an embodiment of the present application includes:
  • the first neural network model includes a convolutional BN layer and a first quantization operator. The convolutional BN layer is used to perform convolution processing on the input data of the Nth batch according to the first weight, normalize the result of the convolution processing according to the BN coefficient, and update the BN coefficient based on the normalization result; the first weight is updated based on the updated BN coefficient. The first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight, and the convolutional BN layer is further used to perform convolution processing on the input data of the (N+1)th batch according to the second weight.
  • the first neural network model may be obtained by performing BN folding processing on a pre-trained pre-trained model and adding a quantization operator (also referred to as a pseudo-quantization node SimQuant).
  • the training device may obtain a second neural network model, where the second neural network model is a pre-trained model including a first convolutional layer and a first BN layer. The first convolutional layer and the first BN layer in the second neural network model are folded to obtain the first neural network model, where the first neural network model includes a convolutional BN layer obtained after folding the first convolutional layer and the first BN layer, and the convolutional BN layer includes the first convolutional layer and the first BN layer.
  • the second neural network model is a pre-trained model that has been trained so that it has high data processing accuracy for specific tasks.
  • a quantization operator can be inserted into the second neural network model, and the convolutional layer and the BN layer can be folded.
  • FIG. 7 is a schematic diagram of a BN folding provided by an embodiment of the present application.
  • the second neural network model includes the first convolution layer and the first BN layer
  • the training device may perform folding processing on the first convolutional layer and the first BN layer in the second neural network model to obtain the first neural network model, where the first neural network model includes A convolutional BN layer obtained by folding the first convolutional layer and the first BN layer.
  • the first convolution layer in the second neural network model is configured to perform convolution processing on the input data according to target weights.
  • during quantization training, the target weight needs to be multiplied by the BN coefficient, the multiplication result is quantized and inverse-quantized by the quantization operator, and the inverse quantization result is then used as the weight of the first convolutional layer.
  • the first neural network model includes a convolutional BN layer and a first quantization operator
  • the convolutional BN layer may include a first convolutional layer and a first batch normalized BN layer
  • the first convolutional layer is used to perform convolution processing on the input data according to the first weight, to obtain a first output; the first BN layer is used to normalize the first output according to the BN coefficient, and to update the BN coefficient based on the normalization result; the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight according to the first quantization factor, to obtain a second weight, where the updated first weight is obtained according to the updated BN coefficient; and the first convolutional layer is then configured to perform convolution processing on the input data according to the second weight.
  • the BN coefficient can be updated through the data of the previous batch, and the weight of this convolutional layer can be updated based on the updated BN coefficient of the data of the previous batch.
  • the first neural network model includes a first convolutional layer, a first batch-normalization (BN) layer, and a first quantization operator. The first convolutional layer is used to perform convolution processing on the input data of the Nth batch according to the first weight, to obtain a first output; the first BN layer is used to normalize the first output according to the BN coefficient, and to update the BN coefficient based on the normalization result; the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight according to the first quantization factor, to obtain a second weight, where the updated first weight is obtained based on the updated BN coefficient; and the first convolutional layer is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight, to obtain a second output.
  • the first weight is obtained according to the product of the BN coefficient and the target weight, and the updated first weight is obtained by multiplying the updated BN coefficient and the target weight.
  • FIG. 8 is a schematic structural diagram of a convolutional BN layer provided by an embodiment of the present application.
  • conv represents the convolution layer
  • bn represents the BN layer
  • div represents division
  • mul represents Multiplication
  • the first convolution layer conv can perform convolution processing on the data of the previous batch (that is, the data of the Nth batch) to obtain the first output.
  • the first output is the result of dividing the convolution processing result obtained by the first convolution layer conv performing convolution processing on the data of the previous batch by the BN coefficient.
  • the first output can be used as the input of the first BN layer, and the first BN layer can normalize the first output, and update the BN coefficients based on the results of the normalization processing.
  • the BN layer performs the BN operation based on the mean and standard deviation of the output features of the convolutional layer in the feed-forward process.
  • the first BN layer is connected to the first convolutional layer, and the first BN layer is used to perform the BN operation on the first output according to the mean and standard deviation of the first output of the first convolutional layer. After that, the training device can update the BN coefficient based on the operation result, where the BN coefficient may include, but is not limited to, at least one of the mean μ, the variance σ, the scale parameter γ, and the offset parameter β, or an operation result between any of them.
  • the convolutional BN layer is obtained by folding the convolutional layer and the BN layer, the first weight is obtained according to the product of the BN coefficient and the target weight, and the updated The first weight is obtained by multiplying the updated BN coefficient and the target weight, and the target weight is the weight included in the convolutional layer.
  • the updated BN coefficient (the scale parameter ⁇ new and the variance ⁇ new as shown in Figure 8) can be obtained, and then the scale parameter ⁇ new and the variance ⁇ new can be divided to obtain ⁇ / ⁇ .
  • the updated BN coefficient (e.g. γ/σ) can be multiplied by the target weight W, and the product (γ/σ*W) can be input to the first quantization operator, which performs quantization processing and inverse quantization processing on the updated first weight according to the first quantization factor, to obtain the second weight; the first convolutional layer can then perform convolution processing on the next batch of data (that is, the data of the (N+1)th batch) according to the second weight, to obtain the second output. It should be understood that the number of quantization bits of the quantization operator also needs to be set.
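The per-batch weight path shown in FIG. 8 can be sketched as follows (illustrative only: convolution is reduced to a dot product per sample, the BN-coefficient update itself is omitted, and `fake_quant` stands in for the first quantization operator):

```python
def conv_bn_step(batch, target_w, gamma, sigma, fake_quant):
    """One step of the folded convolutional BN layer: the updated first
    weight gamma/sigma * W (BN coefficients come from the previous batch)
    is quantized and inverse-quantized, and the convolution on the current
    batch uses the resulting second weight."""
    first_weight = [gamma / sigma * w for w in target_w]   # mul node in FIG. 8
    second_weight = [fake_quant(w) for w in first_weight]  # first quantization operator
    return [sum(w * x for w, x in zip(second_weight, xs)) for xs in batch]
```

Because the weight for the current batch depends only on BN coefficients already updated on the previous batch, a single convolution per step suffices.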
  • the structures of convolutional layers and BN layers that need BN folding in the model can be identified according to the operator types in the computational flow graph of the second neural network model (this structure can also be described as a CNN+BN structure in this embodiment); the identified CNN+BN structures are combined into a block (that is, the convolutional BN layer in the above embodiment), and the original CNN+BN structure is replaced with the combined convolutional BN layer.
  • in the prior art, the first convolutional layer determines the weight used for the current batch through the BN coefficients updated after the BN layer processes the data of the current batch; therefore, in addition to the first convolutional layer, a separate convolutional layer is set for data processing so that the BN layer can update the BN coefficient based on the data of the current batch.
  • in this embodiment, the BN coefficients updated after processing the data of the previous batch are used to determine the weights used for the current batch, so there is no need to set up a separate convolutional layer, which reduces the amount of data operations. Since the training process requires a large number of iterations and the available computing resources of the training equipment are limited, reducing the convolutional layers in the neural network by one convolution operation per training step can greatly reduce the computing resource consumption of the training equipment, thereby improving the training speed.
  • the trained forward graph and inference graph are the same, which reduces the complexity of model saving and translation.
  • the first convolutional layer in the first neural network model is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight, to obtain a convolution processing result, and to divide the convolution processing result by the updated BN coefficient to obtain a second output.
  • the first weight in the first neural network model may be initialized.
  • the BN coefficient may be multiplied by the target weight to obtain a first target tensor, where the first target tensor includes M elements; the N target elements with the largest absolute values among the M elements included in the first target tensor are replaced with the largest element among the M-N elements other than the N target elements, to obtain the first weight.
  • the weights in the second neural network model (pre-trained model) and the coefficients of BN may be used to initialize the first weights in the first neural network model.
  • specifically, the BN coefficient and the target weight of the pre-trained model can be multiplied to obtain the first target tensor (for example, the first target tensor can be γ/σ*W); each element in the first target tensor is then sorted by magnitude, and, in a symmetrical manner, the values of the main part are retained (for example, 95% to 99.5% are retained) while the remaining elements are replaced with the largest value in the main part, so as to realize the initialization of the first weight.
  • FIG. 9 is a schematic diagram of an element interception provided by an embodiment of the present application
  • after the elements in the first target tensor are sorted according to their distribution, the result shown in FIG. 9 can be obtained, in which a certain percentage of the elements can be intercepted; the percentage can be, but is not limited to, 95% to 99.5%, and these 95% to 99.5% of elements are the elements in the main part of the element distribution, that is, the elements whose absolute values are close to 0.
  • the number of elements with larger absolute values in the first target tensor is small.
  • if these outlier elements are retained, the accuracy of subsequent operations will be affected; for example, the quantization factor will be affected.
  • otherwise, unnecessary smoothing is performed on the other elements of the first target tensor; the embodiment of the present application therefore improves the processing accuracy of the neural network model by performing element interception on the first target tensor.
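The element interception described above can be sketched as follows (an illustrative interpretation in which the outlier elements are clamped to the largest magnitude remaining in the retained main part):

```python
def intercept_elements(tensor, keep_ratio=0.99):
    """Sort the elements by absolute value, keep the main part (e.g. 95%
    to 99.5% of the elements, those closest to 0), and replace the
    remaining outliers with the largest magnitude in the main part."""
    ranked = sorted(tensor, key=abs)
    keep = max(1, int(len(ranked) * keep_ratio))
    cap = abs(ranked[keep - 1])  # largest magnitude among retained elements
    return [max(-cap, min(cap, v)) for v in tensor]
```

Clamping the few outliers keeps the min/max range (and hence the quantization factor) representative of the bulk of the tensor.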
  • the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is used to process the input data to obtain a third output.
  • the first neural network model further includes the target activation layer and a second quantization operator; the target activation layer in the first neural network model is used to process the input data to obtain a fourth output, and the second quantization operator is used to perform quantization processing and inverse quantization processing on the fourth output according to the second quantization factor.
  • a second quantization operator may also be added to the output position of the activation layer.
  • the network model further includes a target activation layer, the target activation layer in the second neural network model is used to process the input data to obtain a third output
  • the first neural network model further includes the target activation layer and a second quantization operator, the target activation layer in the first neural network model is used for processing the input data to obtain a fourth output
  • the second quantization operator is used for pairing according to the second quantization factor
  • the fourth output is subjected to quantization processing and inverse quantization processing.
  • in one embodiment, the third output is a second target tensor, and the second target tensor includes X elements. The method further includes: obtaining the Y target elements with the largest absolute value among the X elements, and replacing the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements, so as to obtain the second quantization factor.
  • the percentage can be, but is not limited to, 95% to 99.5%, and the retained 95% to 99.5% of the elements are the elements in the main part of the element distribution, that is, the elements whose absolute value is close to 0.
  • model training may be performed on the first neural network model to obtain the trained first neural network model.
  • specifically, quantization training can be performed on the model according to the set number of epochs. During the training process, it is determined whether the current epoch performs the freeze-BN operation; the quantized model is obtained by training at the current epoch, and the current quantized model is verified by inference at the current epoch.
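The epoch-driven control described above can be sketched as follows (a hypothetical skeleton; the `QatModel` class and its methods are illustrative stand-ins, not an API from the source):

```python
class QatModel:
    """Minimal stand-in for a model under quantization-aware training."""
    def __init__(self):
        self.bn_frozen = False
        self.log = []           # records (epoch, BN frozen?) per epoch

    def freeze_bn(self):
        self.bn_frozen = True   # stop updating BN running statistics

    def train_one_epoch(self, epoch):
        self.log.append((epoch, self.bn_frozen))

def train_quantized(model, epochs, freeze_bn_epoch):
    """Run QAT for a set number of epochs; from freeze_bn_epoch on,
    perform the freeze-BN operation before training, then train and
    (in a real setup) verify the quantized model by inference."""
    for epoch in range(epochs):
        if epoch >= freeze_bn_epoch:
            model.freeze_bn()
        model.train_one_epoch(epoch)

m = QatModel()
train_quantized(m, epochs=4, freeze_bn_epoch=2)
```

Freezing the BN statistics in the final epochs lets the folded weights stabilize before the quantized model is exported.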
  • the trained first neural network model includes a trained first quantization factor and a trained BN coefficient. The training device can also quantize the first neural network model by using the trained first quantization factor and the trained BN coefficient to obtain a third neural network model, where the third neural network model includes the quantized first convolutional layer, and the quantized first convolutional layer is used to perform convolution processing on the input data according to the quantized weight obtained from the first quantization factor and the trained BN coefficient.
  • An embodiment of the present application provides a model training method, the method including: acquiring a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is used to perform convolution processing on the input data of the Nth batch according to a first weight, normalize the convolution processing result according to the BN coefficient, and update the BN coefficient based on the normalization result; the updated BN coefficient is used to update the first weight, and the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight; the convolutional BN layer is also used to perform convolution processing on the input (N+1)th batch of data according to the second weight; and performing model training on the first neural network model to obtain the trained first neural network model.
  • in this way, the BN coefficient updated after processing the data of the current batch determines the weight used for the next batch, so there is no need to set up a separate convolutional layer. On the one hand, the size of the model can be reduced; on the other hand, the amount of data operations in the convolutional layers of the neural network is also reduced. Since the training process requires a large number of iterations while the available computing resources of the training equipment are limited, saving one convolution operation per iteration during training can greatly reduce the computing resource consumption of the training equipment, thereby improving the training speed.
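The data flow of the folded convolutional BN layer can be sketched as follows (a simplified NumPy illustration; for brevity the convolution is reduced to elementwise multiplication and the BN statistics update is replaced by assumed values):

```python
import numpy as np

def fake_quant(w, scale, num_bits=8):
    """Quantization followed by inverse quantization (simulated quantization)."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

# Illustrative values; a real layer uses tensors and a true convolution.
W = np.array([0.5, -1.0])   # target weight of the convolutional layer
gamma, sigma = 1.0, 2.0     # current BN coefficient terms
scale = 0.01                # first quantization factor

# Batch N: convolve with the first weight (BN coefficients folded in).
first_weight = gamma / sigma * W
out_n = np.array([1.0, 2.0]) * first_weight   # stand-in for convolution

# Normalizing out_n updates the BN statistics (values assumed here).
gamma, sigma = 1.1, 1.9

# The updated BN coefficient updates the first weight; fake-quantizing it
# yields the second weight, which is used for batch N+1.
second_weight = fake_quant(gamma / sigma * W, scale)
out_n1 = np.array([0.5, 1.5]) * second_weight
```

Because the weight for batch N+1 is derived from the BN coefficients already updated on batch N, no second convolution over the same batch is needed, which is the source of the claimed savings.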
  • the model training method in the embodiment of the present application will now be described with reference to a specific example.
  • ConvBnV1 only inserts weight quantization nodes, while ConvBnV2 inserts both weight quantization nodes and activation quantization nodes. The structure [CNN+BN+activation operator] is directly replaced by ConvBnV1, while the structure [CNN+BN] (that is, output directly after BN, without an activation operator) is directly replaced by ConvBnV2.
  • take as an example that the number of activation quantization bits is set to 8, the weight quantization bits of the first layer are set to 8, and the weight quantization bits of the remaining layers are set to 4.
  • the activation quantization can be inserted after the activation operator ReLU6.
  • the model weight quantization range is -127 to 127, the ReLU6 quantization range is 0 to 255, the quantization range without activation after BN is -127 to 127, and the quantization range after the add of the residual structure is -127 to 127.
  • during conversion, the quantized model can be loaded into the converter first, and the weight of each layer of the model is quantized and saved as a UINT type, where bits is the number of quantization bits, for example, 8 bits. The scale quantization factor of each layer and the quantized weight are saved into the inference model.
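The converter step can be sketched as follows (a hypothetical illustration; the source does not specify the storage layout, so here a symmetric per-layer scale is computed and the signed integers are shifted into an unsigned 8-bit type):

```python
import numpy as np

def export_layer(w, bits=8):
    """Quantize one layer's weight for the inference model: compute the
    per-layer scale factor, map the weights to signed integers in
    [-(2**(bits-1)-1), 2**(bits-1)-1], and store them shifted into an
    unsigned type together with the scale."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits
    scale = np.abs(w).max() / qmax                 # per-layer scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int64)
    return (q + qmax).astype(np.uint8), scale      # saved with the model

w = np.array([0.5, -1.27, 0.0])
q_uint, scale = export_layer(w, bits=8)
```

At inference time, the float weight is recovered (up to quantization error) as `(q_uint.astype(np.int64) - 127) * scale`.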
  • FIG. 10 is a schematic diagram of a model training apparatus 1000 provided by an embodiment of the present application.
  • the model training apparatus 1000 provided by the present application includes:
  • the obtaining module 1001 is used to obtain a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is used to perform convolution processing on the input data of the Nth batch according to a first weight, normalize the convolution processing result according to the BN coefficient, update the BN coefficient based on the normalization result, and update the first weight with the updated BN coefficient; the first quantization operator is used to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight; and the convolutional BN layer is also used to perform convolution processing on the input (N+1)th batch of data according to the second weight;
  • the model training module 1002 is configured to perform model training on the first neural network model to obtain the trained first neural network model.
  • in the prior art, the first convolutional layer determines the weight used by the current batch through the BN coefficients updated after the BN layer processes the data of the current batch; therefore, in addition to the first convolutional layer, a separate convolutional layer has to be set for data processing so that the BN layer can update the BN coefficient based on the data of the current batch.
  • in this embodiment, the BN coefficients updated after processing determine the weight used for the following batch, so there is no need to set up a separate convolutional layer, which reduces both the size of the model and the amount of data operations in the convolutional layers. Since the training process requires a large number of iterations while the available computing resources of the training equipment are limited, saving one convolution operation per iteration can greatly reduce the computing resource consumption of the training equipment, thereby improving the training speed.
  • the obtaining module 1001 is configured to obtain a second neural network model, where the second neural network model is a pre-trained model, and the second neural network model includes a first convolutional layer and a first BN layer; and to perform BN folding processing on the first convolutional layer and the first BN layer to obtain the first neural network model, where the first neural network model includes the convolutional BN layer obtained after folding the first convolutional layer and the first BN layer.
  • the convolutional BN layer is obtained by folding the convolutional layer and the BN layer, the first weight is obtained according to the product of the BN coefficient and the target weight, and the update The latter first weight is obtained by multiplying the updated BN coefficient and the target weight, and the target weight is the weight included in the convolutional layer.
  • the apparatus further includes:
  • a product operation module, configured to perform a product operation on the BN coefficient and the target weight to obtain a first target tensor, where the first target tensor includes M elements;
  • an element replacement module, configured to replace the N target elements with the largest absolute value among the M elements included in the first target tensor with the largest element among the M-N elements other than the N target elements, so as to obtain the first weight.
  • the weights in the second neural network model (pre-trained model) and the coefficients of BN may be used to initialize the first weights in the first neural network model.
  • the BN coefficient and the target weight of the pre-trained model can be multiplied to obtain the first target tensor; for example, the first target tensor can be γ/σ*W. Each element in the first target tensor is then sorted by magnitude and, in a symmetrical manner, the values of the main part are intercepted (for example, 95% to 99.5% of the elements), and the remaining elements are replaced with the largest value in the main part, so as to initialize the first weight.
  • the first weight is in the form of a tensor.
  • the number of elements with large absolute values in the first target tensor is small, but in the process of subsequent quantization and inverse quantization, these large absolute values affect the accuracy of the operation; for example, they affect the quantization factor, so that unnecessary smoothing is performed on the other elements. By performing element interception on the first target tensor, the embodiment of the present application improves the processing accuracy of the neural network model.
  • the first convolutional layer in the first neural network model is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight to obtain a convolution processing result, and to divide the convolution processing result by the updated BN coefficient to obtain the second output.
  • the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is used to process the input data to obtain a third output, so
  • the first neural network model further includes the target activation layer and a second quantization operator, and the target activation layer in the first neural network model is used to process the input data to obtain a fourth output, the The second quantization operator is used to perform quantization processing and inverse quantization processing on the fourth output according to the second quantization factor.
  • the third output is a second target tensor, and the second target tensor includes X elements; the obtaining module is configured to obtain the Y target elements with the largest absolute value among the X elements;
  • the element replacement module is configured to replace the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements, so as to obtain the second quantization factor.
  • the percentage may be, but is not limited to, 95% to 99.5%, and the retained 95% to 99.5% of the elements are the elements in the main part of the element distribution, that is, the elements whose absolute value is close to 0.
  • the trained first neural network model includes the trained first quantization factor and the trained BN coefficient
  • the device further includes:
  • a quantization module, configured to quantize the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain a third neural network model; the third neural network model includes the quantized first convolutional layer, the quantized first convolutional layer is used to perform convolution processing on the input data according to the quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient.
  • For the relevant description of the product operation module, refer to the description in the above embodiment of how to perform the product operation on the BN coefficient and the target weight to obtain the first target tensor including M elements, which will not be repeated here.
  • For the relevant description of the element replacement module, refer to the description in the above embodiment of how to replace the N target elements with the largest absolute value among the M elements of the first target tensor with the largest element among the remaining M-N elements to obtain the first weight, which will not be repeated here.
  • For the relevant description of the quantization module, refer to the description in the above embodiment of how to quantize the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain the third neural network model, where the third neural network model includes the quantized first convolutional layer used to perform convolution processing on the input data according to the quantized weight obtained from the first quantization factor and the trained BN coefficient, which will not be repeated here.
  • FIG. 11 is a schematic structural diagram of the execution device provided by an embodiment of the present application. The execution device may be, for example, a smart wearable device, a server, or the like, which is not limited here.
  • the data processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1200 to implement the data processing function in the embodiment corresponding to FIG. 10 .
  • the execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (wherein the number of processors 1203 in the execution device 1200 may be one or more, and one processor is taken as an example in FIG. 11 ) , wherein the processor 1203 may include an application processor 12031 and a communication processor 12032.
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or otherwise.
  • Memory 1204 may include read-only memory and random access memory, and provides instructions and data to processor 1203 .
  • a portion of memory 1204 may also include non-volatile random access memory (NVRAM).
  • the memory 1204 stores operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1203 controls the operation of the execution device.
  • various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1203 or implemented by the processor 1203 .
  • the processor 1203 may be an integrated circuit chip, which has signal processing capability.
  • each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1203 or an instruction in the form of software.
  • the above-mentioned processor 1203 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU) or another processor suitable for AI operations, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1203 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204, and completes the steps of the above method in combination with its hardware.
  • the receiver 1201 can be used to receive input numerical or character information, and to generate signal input related to performing device related settings and function control.
  • the transmitter 1202 can be used to output digital or character information through the first interface; the transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1202 can also include a display device such as a display screen.
  • the execution device may acquire the model trained by the model training method in the embodiment corresponding to FIG. 6 , and perform model inference.
  • FIG. 12 is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 1300 is implemented by one or more servers.
  • the training device 1300 can vary greatly depending on configuration or performance, and can include one or more central processing units (CPU) 1313 (e.g., one or more processors), memory 1332, and one or more storage media 1330 (e.g., one or more mass storage devices) storing application programs 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the training device. Furthermore, the central processing unit 1313 may be configured to communicate with the storage medium 1330 to execute a series of instruction operations in the storage medium 1330 on the training device 1300 .
  • the training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • the training device may execute the model training method in the embodiment corresponding to FIG. 6 .
  • the model training apparatus 1000 described in FIG. 10 may be a module in the training apparatus, and the processor in the training apparatus may execute the model training method executed by the model training apparatus 1000 .
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
  • Embodiments of the present application further provide a computer-readable storage medium, where a program for performing signal processing is stored in the computer-readable storage medium, and when it runs on a computer, the program causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
  • the execution device, training device, or terminal device provided in this embodiment of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuits, etc.
  • the processing unit can execute the computer executable instructions stored in the storage unit, so that the chip in the execution device executes the data processing method described in the above embodiments, or the chip in the training device executes the data processing method described in the above embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • FIG. 13 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip may be represented as a neural network processor NPU 1400; the NPU 1400 is mounted as a co-processor on the main CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1403, which is controlled by the controller 1404 to extract the matrix data in the memory and perform multiplication operations.
  • the NPU 1400 can implement the model training method provided in the embodiment described in FIG. 6 through the cooperation between various internal devices, or perform inference on the model obtained by training.
  • the operation circuit 1403 in the NPU 1400 can perform the steps of acquiring the first neural network model and performing model training on the first neural network model.
  • the arithmetic circuit 1403 in the NPU 1400 includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1403 is a two-dimensional systolic array.
  • the arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1403 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1402 and buffers it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A and matrix B from the input memory 1401 to perform matrix operation, and stores the partial result or final result of the matrix in the accumulator 1408 .
  • Unified memory 1406 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 1402 through the direct memory access controller (Direct Memory Access Controller, DMAC) 1405.
  • Input data is also moved to unified memory 1406 via the DMAC.
  • the BIU, that is, the bus interface unit 1410, is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 1409.
  • the bus interface unit 1410 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1409 to obtain instructions from the external memory, and also for the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1406 or the weight data to the weight memory 1402 or the input data to the input memory 1401.
  • the vector calculation unit 1407 includes a plurality of operation processing units, and further processes the output of the operation circuit 1403 if necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/fully connected layer network computation in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 1407 can store the processed output vectors to the unified memory 1406 .
  • the vector calculation unit 1407 may apply a linear function or a nonlinear function to the output of the operation circuit 1403, for example, performing linear interpolation on the feature plane extracted by the convolutional layer, or applying a nonlinear function to a vector of accumulated values to generate activation values.
  • the vector computation unit 1407 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as an activation input to the arithmetic circuit 1403, eg, for use in subsequent layers in a neural network.
  • the instruction fetch buffer 1409 connected to the controller 1404 is used to store the instructions used by the controller 1404;
  • the unified memory 1406, the input memory 1401, the weight memory 1402 and the instruction fetch memory 1409 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above program.
  • the device embodiments described above are only schematic, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, which can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a training device or a data center that integrates one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.


Abstract

Disclosed in the present application is a model training method, which is applicable to the field of artificial intelligence. The method comprises: obtaining a first neural network model, the first neural network model comprising a convolutional BN layer and a first quantization operator, the convolutional BN layer being configured to perform convolution processing on the inputted N-th batch of data according to a first weight, perform normalization processing on the convolution processing result according to a BN coefficient, and update the BN coefficient on the basis of the normalization processing result, the updated BN coefficient being configured to update the first weight, the first quantization operator being configured to perform quantization processing and inverse quantization processing on the updated first weight to obtain a second weight, and the convolutional BN layer being further configured to perform convolution processing on the inputted (N+1)-th batch of data according to the second weight. The present application can reduce the amount of data operation of a convolutional layer in a neural network.

Description

A Model Training Method and Apparatus
This application claims priority to Chinese Patent Application No. 202011377406.3, filed with the Chinese Patent Office on November 30, 2020 and entitled "A Model Training Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computers, and in particular, to a model training method and apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Neural network quantization is a model compression technique that converts floating-point storage (and operations) into integer storage (and operations). For example, if the parameters of a model were originally represented in float32 (32-bit floating point), after quantization they are represented in int8 (8-bit fixed point). The quantization operation increases the running speed of the model at the cost of a small loss of precision.
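As a hedged illustration of the float-to-int8 round trip described above (the function names and the symmetric max-based scale factor are assumptions for this sketch, not part of the application):

```python
import numpy as np

def quantize(x, scale):
    # Map float values onto the int8 grid by rounding x / scale.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 codes back to floats; the round trip loses a little precision.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 3.1], dtype=np.float32)
scale = np.abs(w).max() / 127.0     # one symmetric scale factor for the tensor
w_q = quantize(w, scale)            # int8 storage
w_dq = dequantize(w_q, scale)       # close to w, up to quantization error
```

Storing `w_q` plus the single float `scale` replaces the float32 tensor, which is the storage and speed benefit the paragraph above refers to.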
Quantization-aware training (QAT) uses training data to compensate for the precision loss caused by quantization. Its main flow is: 1. insert quantization operators into the model before training; 2. during training, collect the min and max values of each layer of the model (weights and activations), which are used to compute the quantization factors.
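A minimal sketch of such a pseudo-quantization node in the spirit of SimQuant: it accumulates the running min and max of the tensors flowing through it and returns the quantize-then-dequantize result, so the forward pass sees the quantization error during training. The class name and the symmetric scale are illustrative assumptions:

```python
import numpy as np

class SimQuant:
    """Pseudo-quantization node: min/max statistics + quantize/dequantize round trip."""
    def __init__(self, num_bits=8):
        self.qmax = 2 ** (num_bits - 1) - 1   # 127 for int8
        self.min_val = None
        self.max_val = None

    def __call__(self, x):
        # Update running min/max statistics over the observed tensors.
        lo, hi = float(x.min()), float(x.max())
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)
        # Symmetric scale quantization factor from the largest observed magnitude.
        scale = max(abs(self.min_val), abs(self.max_val)) / self.qmax
        q = np.clip(np.round(x / scale), -self.qmax - 1, self.qmax)
        return q * scale   # dequantize: downstream layers see quantized values
```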
During the model training phase, QAT requires pseudo-quantization nodes SimQuant (which may also be referred to as quantization operators in this embodiment) to be inserted on the weight inputs and activation outputs of the original model. In addition, for structures combining a convolutional neural network (convolutional neural network, CNN) and batch normalization (batch normalization, BN), another CNN is needed to implement BN folding, that is, the fusion of the BN coefficients with the CNN weights. During training, SimQuant collects the min and max values of the corresponding tensor for the subsequent computation of the scale quantization factor. As shown in FIG. 2, when QAT folds the CNN and the BN, another CNN needs to be constructed to perform the convolution operation on the data of the current batch; the BN uses the result of this convolution operation to update the BN coefficient, and the updated BN coefficient is then used to construct the weight. The quantization operator may quantize and dequantize the constructed weight, and the CNN may perform the convolution operation on the data of the current batch based on the weight obtained after dequantization. However, because another CNN is needed to implement BN folding, two CNNs perform convolution operations on the same batch of data during training, which increases the amount of CNN computation during training and thus reduces the training speed.
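The prior-art flow above can be sketched as follows, with the convolution simplified to a matrix product and only the standard-deviation part of the BN coefficient shown; all names are illustrative:

```python
import numpy as np

def prior_art_step(x, W, gamma):
    # Extra convolution on the CURRENT batch, used only to refresh BN statistics.
    y = x @ W
    sigma = y.std(axis=0) + 1e-5        # updated BN coefficient for this batch
    folded_W = W * (gamma / sigma)      # fold the BN coefficient into the weight
    # (a pseudo-quantization node would quantize/dequantize folded_W here)
    # Second convolution on the SAME batch with the folded weight:
    return x @ folded_W, sigma

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
out, sigma = prior_art_step(x, W, gamma=np.ones(2))
# Each batch passes through two convolutions; this duplication is the
# overhead addressed by the present application.
```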
Summary
In a first aspect, the present application provides a model training method, the method comprising:
obtaining a first neural network model, wherein the first neural network model comprises a convolutional BN layer and a first quantization operator; the convolutional BN layer is configured to perform convolution processing on an input N-th batch of data according to a first weight, normalize the convolution result according to a BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain a second weight; the convolutional BN layer is further configured to perform convolution processing on an input (N+1)-th batch of data according to the second weight; and performing model training on the first neural network model to obtain a trained first neural network model.
The convolution result refers to the result obtained by performing convolution processing on the N-th batch of data.
The convolutional BN layer may serve as an independent layer. In one implementation, the parts corresponding to the first convolutional layer and the first batch normalization BN layer can still be distinguished within the convolutional BN layer. For ease of description in this paragraph, the part of the convolutional BN layer corresponding to the first convolutional layer is still referred to as the first convolutional layer, and the part corresponding to the first batch normalization BN layer is referred to as the first BN layer. The first convolutional layer is configured to perform convolution processing on the input N-th batch of data according to the first weight to obtain a first output (that is, the convolution result mentioned above); the first BN layer is configured to normalize the first output according to the BN coefficient and update the BN coefficient based on the normalization result; the convolutional BN layer is configured to update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain the second weight; and the first convolutional layer is further configured to perform convolution processing on the input (N+1)-th batch of data according to the second weight. Model training is performed on the first neural network model to obtain the trained first neural network model.
The first neural network model may be obtained by performing BN folding on a pre-trained model and adding a quantization operator (which may also be referred to as a pseudo-quantization node SimQuant). The first output may serve as the input of the first BN layer; the first BN layer may normalize the first output and update the BN coefficient based on the normalization result. During training, the BN layer performs the BN operation based on the mean and standard deviation of the output features of the convolutional layer in the feed-forward pass. Exemplarily, the first BN layer is connected to the first convolutional layer and is configured to perform the BN operation on the first output according to the mean and standard deviation of the first output of the first convolutional layer; the training device may then update the BN coefficient based on the operation result. The BN coefficient may include, but is not limited to, at least one of the mean μ, the variance σ, the scale parameter γ, and the offset parameter β, or a result computed from any combination of them.
In the prior art, the first convolutional layer determines the weight used for the current batch from the BN coefficient that the BN layer updates after processing the data of the current batch; therefore, a separate convolutional layer must be set up, in addition to the first convolutional layer, to process data so that the BN layer can update the BN coefficient based on the current batch. In the embodiments of the present application, because the first convolutional layer determines the weight used for the current batch from the BN coefficient that the BN layer updated after processing the previous batch, no additional convolutional layer is needed. On one hand, this reduces the model size; on the other hand, it also reduces the amount of data computation of the convolutional layers in the neural network. Because training is a process that requires a large number of iterations and the computing resources available to the training device are limited, each convolutional layer in the neural network saves one convolution operation per training step in this embodiment; over a large number of training iterations, this greatly reduces the computing resource consumption of the training device and thus increases the training speed.
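By contrast, the scheme of this application can be sketched as a single convolution per batch: the weight for batch N+1 is folded from the BN coefficient that was updated on batch N (same simplified matmul-as-convolution style; the names are illustrative, not the patented implementation):

```python
import numpy as np

def claimed_step(x, W, gamma, prev_sigma):
    # Fold the BN coefficient updated on the PREVIOUS batch into the weight.
    folded_W = W * (gamma / prev_sigma)
    # (the first quantization operator would quantize/dequantize folded_W
    #  here to produce the second weight)
    y = x @ folded_W                     # the only convolution for this batch
    sigma = y.std(axis=0) + 1e-5         # BN coefficient updated for the NEXT batch
    return y, sigma

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
y, sigma = claimed_step(x, W, gamma=np.ones(2), prev_sigma=np.ones(2))
# One convolution per batch instead of two; sigma feeds the next batch's fold.
```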
In a possible implementation, the method further comprises:
obtaining a second neural network model, where the second neural network model may be a pre-trained model and comprises a first convolutional layer and a first BN layer; the first convolutional layer may be configured to perform convolution processing on input data according to a target weight to obtain a first output, the target weight being the weight included in the convolution kernel of the first convolutional layer; the first BN layer is configured to normalize the first output according to a BN coefficient and update the BN coefficient based on the normalization result; and performing BN folding on the first convolutional layer and the first BN layer to obtain the first neural network model, where the first neural network model comprises the convolutional BN layer obtained by folding the first convolutional layer and the first BN layer.
Specifically, in order to identify the convolutional layers and BN layers in the second neural network model that require BN folding, the structures consisting of a convolutional layer followed by a BN layer (also described as CNN+BN structures in this embodiment) can be identified according to the operator types in the computation graph of the second neural network model, and each identified CNN+BN structure is combined into one block (that is, the convolutional BN layer in the foregoing embodiment); the combined convolutional BN layer then replaces the original CNN+BN structure.
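A toy sketch of identifying CNN+BN structures from operator types, with the computation graph reduced to a flat operator list (this flat representation is an assumption for illustration; real graphs are traversed by edges):

```python
def find_conv_bn_blocks(ops):
    """Scan an operator list and group adjacent ('conv', 'bn') pairs into one block."""
    blocks, i = [], 0
    while i < len(ops):
        if ops[i] == 'conv' and i + 1 < len(ops) and ops[i + 1] == 'bn':
            blocks.append(('conv_bn', i))   # fold the pair into one conv-BN block
            i += 2
        else:
            blocks.append((ops[i], i))      # other operators pass through unchanged
            i += 1
    return blocks

blocks = find_conv_bn_blocks(['conv', 'bn', 'relu', 'conv', 'bn'])
```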
In a possible implementation, the convolutional BN layer is obtained by folding a convolutional layer and a BN layer; the first weight is obtained according to the product of the BN coefficient and a target weight; the updated first weight is obtained by multiplying the updated BN coefficient by the target weight; and the target weight is the weight included in the convolutional layer.
The second neural network model may include the convolutional layer and the BN layer described above (the convolutional layer may also be referred to as the first convolutional layer in the second neural network model described above, and the BN layer as the first BN layer in the second neural network model described above). The first convolutional layer in the second neural network model is configured to perform convolution processing on input data according to the target weight, where the input data is the data input to the first convolutional layer, that is, the input of an intermediate layer of the neural network rather than the input of the neural network itself. The first weight is obtained according to the product of the BN coefficient and the target weight, and the updated first weight is obtained by multiplying the updated BN coefficient by the target weight. The target weight is the weight included in the convolution kernel of the convolutional layer; specifically, the target weight may be the weight included in the first convolutional layer of the second neural network model. The first convolutional layer may perform the convolution operation on the input data based on a convolution kernel that includes the target weight, and the convolution kernel may include the target weight and a bias.
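The folding relation can be checked numerically: applying a per-output-channel BN scale γ/σ after a convolution equals convolving with the weight multiplied by γ/σ (convolution simplified to a matrix product; mean and bias terms omitted for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))       # a batch of inputs
W = rng.standard_normal((3, 2))       # target weight, one column per output channel
gamma = np.array([1.5, 0.8])          # BN scale parameter
sigma = np.array([2.0, 0.5])          # BN standard deviation

y_unfolded = (x @ W) * (gamma / sigma)    # convolution, then BN scaling
y_folded = x @ (W * (gamma / sigma))      # one convolution with the folded weight
assert np.allclose(y_unfolded, y_folded)  # the two paths agree
```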
In a possible implementation, the method further comprises: performing a product operation on the BN coefficient and the target weight to obtain a first target tensor, the first target tensor including M elements; and replacing the N target elements with the largest absolute values among the M elements of the first target tensor with the largest element among the remaining M-N elements, to obtain the first weight.
In this embodiment of the present application, the weights of the second neural network model (the pre-trained model) and the BN coefficients may be used to initialize the first weight of the first neural network model. Specifically, the BN coefficient may be multiplied by the target weight according to the pre-trained model to obtain the first target tensor; for example, the first target tensor may be γ/σ*W. The elements of the first target tensor are then sorted by magnitude, the main body of the values is kept in a symmetric manner (for example, 95% to 99.5% of the elements are kept), and the remaining elements are replaced with the largest value of the kept portion, thereby initializing the first weight. The first weight is in the form of a tensor.
The first target tensor contains a small number of elements with very large absolute values. In the subsequent quantization and dequantization, these large values would affect the precision of the operation, for example by unnecessarily smoothing the other elements through the quantization factor. By truncating the elements of the first target tensor, the embodiments of the present application improve the processing precision of the neural network model.
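One reasonable reading of this truncation, sketched with NumPy: elements beyond the kept fraction are clamped to the largest kept magnitude (the kept fraction and the sign-preserving clip are assumptions for the illustration):

```python
import numpy as np

def clip_outliers(t, keep=0.95):
    """Replace the (1-keep) fraction of largest-|.| elements by the kept maximum."""
    flat = np.abs(t).ravel()
    k = max(1, int(np.ceil(keep * flat.size)))
    kept_max = np.sort(flat)[k - 1]          # largest magnitude among kept elements
    return np.clip(t, -kept_max, kept_max)   # symmetric clamp preserves sign

t = np.array([0.1, -0.2, 0.3, -0.4, 100.0])
out = clip_outliers(t, keep=0.75)            # keep 75% for this tiny example
# The outlier 100.0 becomes 0.4, the largest kept magnitude; the quantization
# scale is then set by 0.4 rather than 100.0, preserving resolution.
```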
In a possible implementation, the first convolutional layer in the first neural network model is configured to perform convolution processing on the input (N+1)-th batch of data according to the second weight to obtain a convolution result, and to divide the convolution result by the updated BN coefficient to obtain a second output.
Similar to the foregoing embodiment, in this embodiment of the present application, in order to quantize the output of each activation layer in the second neural network, a second quantization operator may also be added at the output position of the activation layer.
In a possible implementation, the second neural network model further includes a target activation layer; the target activation layer in the second neural network model is configured to process input data (where the input data is the data input to the target activation layer, that is, the input of an intermediate layer of the neural network rather than the input of the neural network itself) to obtain a third output. The first neural network model further includes the target activation layer and a second quantization operator; the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization and dequantization on the fourth output according to a second quantization factor.
In a possible implementation, the third output is a second target tensor, the second target tensor including X elements, and the method further comprises: obtaining the Y target elements with the largest absolute values among the X elements; and replacing the Y target elements in the second target tensor with the largest element among the remaining X-Y elements, to obtain the second quantization factor.
Similar to the foregoing embodiment, in this embodiment of the present application, in the process of initializing the quantization factor at the output position of the activation layer, after the elements of the second target tensor are sorted from largest to smallest, a certain percentage of them may be kept. The percentage may be, but is not limited to, 95% to 99.5%, and the kept 95% to 99.5% of elements may be the elements in the main body of the element distribution, that is, the elements whose absolute values are close to 0.
In a possible implementation, the trained first neural network model includes a trained first quantization factor and a trained BN coefficient, and the method further comprises:
quantizing the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain a third neural network model, where the third neural network model includes the quantized first convolutional layer, the first convolutional layer is configured to perform convolution processing on input data according to a quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient.
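A simplified sketch of this conversion step: the trained BN coefficient is folded into the float weight, which is then stored as int8 together with its quantization factor (the scalar BN coefficient and the int8 range are illustrative assumptions):

```python
import numpy as np

def export_quantized_weight(W, gamma, sigma, scale):
    # Fold the trained BN coefficient into the float weight, then store as int8.
    folded = W * (gamma / sigma)
    q = np.clip(np.round(folded / scale), -128, 127).astype(np.int8)
    return q   # the deployed model keeps (q, scale) instead of float weights

q = export_quantized_weight(np.ones((2, 2)), gamma=2.0, sigma=1.0, scale=0.1)
```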
In a second aspect, the present application provides a model training apparatus, the apparatus comprising:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is configured to perform convolution processing on an input N-th batch of data according to a first weight, normalize the convolution result according to a BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain a second weight; and the convolutional BN layer is further configured to perform convolution processing on an input (N+1)-th batch of data according to the second weight; and
a model training module, configured to perform model training on the first neural network model to obtain a trained first neural network model.
In a possible implementation, the obtaining module is configured to obtain a second neural network model, the second neural network model including a first convolutional layer and a first BN layer, and to perform BN folding on the first convolutional layer and the first BN layer to obtain the first neural network model, where the first neural network model includes the convolutional BN layer obtained by folding the first convolutional layer and the first BN layer.
In a possible implementation, the convolutional BN layer is obtained by folding a convolutional layer and a BN layer; the first weight is obtained according to the product of the BN coefficient and a target weight; the updated first weight is obtained by multiplying the updated BN coefficient by the target weight; and the target weight is the weight included in the convolutional layer.
In a possible implementation, the apparatus further includes:
a product operation module, configured to perform a product operation on the BN coefficient and the target weight to obtain a first target tensor, the first target tensor including M elements; and
an element replacement module, configured to replace the N target elements with the largest absolute values among the M elements of the first target tensor with the largest element among the remaining M-N elements, to obtain the first weight.
In a possible implementation, the first convolutional layer in the first neural network model is configured to perform convolution processing on the input (N+1)-th batch of data according to the second weight to obtain a convolution result, and to divide the convolution result by the updated BN coefficient to obtain a second output.
In a possible implementation, the second neural network model further includes a target activation layer; the target activation layer in the second neural network model is configured to process input data to obtain a third output; the first neural network model further includes the target activation layer and a second quantization operator; the target activation layer in the first neural network model is configured to process input data to obtain a fourth output; and the second quantization operator is configured to perform quantization and dequantization on the fourth output according to a second quantization factor.
In a possible implementation, the third output is a second target tensor, the second target tensor including X elements, and the obtaining module is configured to obtain the Y target elements with the largest absolute values among the X elements;
the element replacement module is configured to replace the Y target elements in the second target tensor with the largest element among the remaining X-Y elements, to obtain the second quantization factor.
In a possible implementation, the first quantization operator is configured to perform quantization and dequantization on the updated first weight according to a first quantization factor; the trained first neural network model includes a trained first quantization factor and a trained BN coefficient; and the apparatus further includes:
In a third aspect, an embodiment of the present application provides a model training apparatus, which may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the method according to the first aspect and any optional implementation thereof.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the method according to the first aspect and any optional implementation thereof.
In a fifth aspect, an embodiment of the present application provides a computer program comprising code which, when executed, implements the method according to the first aspect and any optional implementation thereof.
In a sixth aspect, the present application provides a chip system, the chip system including a processor configured to support an execution device or a training device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete components.
An embodiment of the present application provides a model training method. The method comprises: obtaining a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is configured to perform convolution processing on an input N-th batch of data according to a first weight, normalize the convolution result according to a BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain a second weight; the convolutional BN layer is further configured to perform convolution processing on an input (N+1)-th batch of data according to the second weight; and performing model training on the first neural network model to obtain a trained first neural network model. In the above manner, because the first convolutional layer determines the weight used for the current batch from the BN coefficient that the BN layer updated after processing the previous batch, no additional convolutional layer is needed; on one hand, this reduces the model size, and on the other hand, it reduces the amount of data computation of the convolutional layers in the neural network. Because training is a process that requires a large number of iterations and the computing resources available to the training device are limited, each convolutional layer in the neural network saves one convolution operation per training step in this embodiment; over a large number of training iterations, this greatly reduces the computing resource consumption of the training device and thus increases the training speed.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the artificial intelligence main framework;
FIG. 2 is a schematic diagram of QAT folding a CNN and a BN;
FIG. 3 is a schematic diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of a model training method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of BN folding according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a convolutional BN layer according to an embodiment of the present application;
FIG. 9 is a schematic diagram of element truncation according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a model training apparatus 1000 according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a training device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings of the embodiments of the present invention. The terms used in the embodiments of the present invention are only intended to explain specific embodiments of the present invention and are not intended to limit the present invention.
The embodiments of the present application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of the present application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely a manner of distinguishing, in the description of the embodiments of the present application, objects that have the same attributes. In addition, the terms "comprising" and "having" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described. Referring to FIG. 1, FIG. 1 shows a schematic structural diagram of a main framework of artificial intelligence. The artificial intelligence main framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technical implementations of providing and processing information) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, implements communication with the external world, and provides support through a basic platform. The infrastructure communicates with the outside through sensors; the computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to intelligent chips in a distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet-of-Things data of conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning may perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and performing machine thinking and problem solving by using formalized information according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning on intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities may further be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields; they are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and implementing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
The embodiments of the present application may be applied to scenarios such as image classification, object detection, semantic segmentation, room layout, image completion, or auto-encoding.
The following briefly introduces the application scenarios of the present application by using two application scenarios, an ADAS/ADS visual perception system and mobile phone beautification, as examples.
Application scenario 1: ADAS/ADS visual perception system
In ADAS and ADS, multi-type 2D object detection needs to be performed in real time, including: dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motocycle, Bicycle), and traffic signs (TrafficSign, GuideSign, Billboard, TrafficLight_Red/TrafficLight_Yellow/TrafficLight_Green/TrafficLight_Black, RoadSign). In addition, to accurately obtain the region occupied by a dynamic obstacle in three-dimensional space, 3D estimation also needs to be performed on the dynamic obstacle to output a 3D box. To fuse with lidar data, a mask of the dynamic obstacle needs to be obtained, so as to filter out the laser point cloud that hits the dynamic obstacle; for accurate parking, the four key points of a parking space need to be detected at the same time; for composition-based positioning, the key points of static targets need to be detected. A neural network model trained by using the technical solutions provided in the embodiments of the present application can implement all of the foregoing functions of the ADAS/ADS visual perception system, or a part of them.
Application scenario 2: mobile phone beautification function
In a mobile phone, a neural network model trained by using the technical solutions provided in the embodiments of the present application (for example, the trained first neural network model, second neural network model, and third neural network model) can detect a mask and key points of a human body, and corresponding parts of the human body can be enlarged or reduced, for example, waist-slimming and hip-shaping operations, so as to output a beautified image.
Application scenario 3: image classification scenario
After an image to be classified is obtained, the category of an object in the image to be classified may be obtained based on a neural network, and the image to be classified may then be classified according to the category of the object in it. A photographer takes many photos every day, of animals, of people, and of plants. Using the method of the present application, photos can be quickly classified according to their content into photos containing animals, photos containing people, and photos containing plants.
When the number of images is large, manual classification is inefficient, and a person is prone to fatigue when handling the same task for a long time, in which case the classification results will have large errors. In contrast, a neural network model trained by using the technical solutions provided in the embodiments of the present application (for example, the trained first neural network model, second neural network model, and third neural network model) can classify images quickly.
The embodiments of the present application can perform neural network training, and the trained neural network obtained thereby can perform task processing in the foregoing scenarios.
Neural network quantization is a model compression technique that converts floating-point storage (and operations) into integer storage (and operations). For example, the model parameters of a model are originally represented by float32 (32-bit floating point), and after quantization the model parameters are represented by int8 (8-bit fixed point); the quantization operation on the model improves the operation speed of the model at the expense of a small loss of precision.
The essence of model quantization is a conversion/mapping between data of two data types. In one implementation of converting floating-point data (data whose data type is floating point) into fixed-point data (data whose data type is fixed point), the following formula may be used:

Q = R / S + Z
where R is the input floating-point data, Q is the fixed-point data obtained after quantization of the floating-point data R, Z represents the zero-point value (Zero Point), and S represents the scale. It can be seen that once S and Z are determined, the conversion between the two kinds of data can be performed. S and Z can be determined in many ways, for example:
S = (Rmax - Rmin) / (Qmax - Qmin);

Z = Qmax - Rmax / S;
where Rmax represents the maximum value of the input floating-point data, Rmin represents the minimum value of the input floating-point data, Qmax represents the maximum value of the fixed-point data, and Qmin represents the minimum value of the fixed-point data.
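The scale and zero-point formulas above can be sketched in a few lines of Python. This is an illustrative reading of the equations only; the int8 clamping range and the rounding of the zero point to an integer are assumptions not spelled out in the text, not the claimed implementation.

```python
import numpy as np

def quant_params(r_min, r_max, q_min=-128, q_max=127):
    # S = (Rmax - Rmin) / (Qmax - Qmin)
    s = (r_max - r_min) / (q_max - q_min)
    # Z = Qmax - Rmax / S (rounded to an integer here, an assumption)
    z = round(q_max - r_max / s)
    return s, z

def quantize(r, s, z, q_min=-128, q_max=127):
    # R -> Q: scale, shift by the zero point, round, and clamp
    return np.clip(np.round(r / s + z), q_min, q_max).astype(np.int8)

def dequantize(q, s, z):
    # Q -> R: the approximate inverse mapping
    return (q.astype(np.float32) - z) * s

r = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
s, z = quant_params(float(r.min()), float(r.max()))
r_hat = dequantize(quantize(r, s, z), s, z)  # close to r, within one step s
```

Round-tripping through the 8-bit representation loses at most about one quantization step of precision, which is the "small loss of precision" the text refers to.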
The conversion between fixed-point data with different numbers of bits (bit widths, 1 bit = 1 binary digit) may refer to the foregoing conversion manner between floating-point data and fixed-point data, or may be another conversion manner in the prior art; details are not described herein again.
In one implementation, 4-bit and 8-bit conversion may be performed with reference to the foregoing conversion manner, and one implementation of conversion between floating-point data and 2-bit (1-bit) data may be performed by using the following formula:
Q = 1, if R >= T; Q = -1, if R < -T; Q = 0, otherwise;

where 2 bits can represent three numbers: -1, 0, and 1. T is a threshold. When the floating-point data is greater than or equal to T, the 2-bit fixed-point data obtained by conversion is 1. When the floating-point data is less than -T, its value is converted to -1. When the floating-point data takes other values, its value is converted to 0. The 1-bit conversion manner is similar to the 2-bit one, but its fixed-point values are only -1 and 1, and the value of T is 0.
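The 2-bit and 1-bit threshold rules just described can be expressed directly; a minimal sketch (the example values of x and T are made up for illustration):

```python
import numpy as np

def quantize_2bit(r, t):
    # 2-bit mapping to {-1, 0, 1}: 1 if R >= T, -1 if R < -T, 0 otherwise
    q = np.zeros_like(r, dtype=np.int8)
    q[r >= t] = 1
    q[r < -t] = -1
    return q

def quantize_1bit(r):
    # 1-bit case: only -1 and 1, with T = 0
    return np.where(r >= 0, 1, -1).astype(np.int8)

x = np.array([-0.9, -0.1, 0.0, 0.2, 0.8], dtype=np.float32)
print(quantize_2bit(x, 0.5).tolist())  # [-1, 0, 0, 0, 1]
print(quantize_1bit(x).tolist())       # [-1, -1, 1, 1, 1]
```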
Quantization aware training (QAT) uses training data to train the model so as to compensate for the precision loss caused by quantization. Its main procedure is: 1. insert quantization operators into the model before training; 2. during training, collect the min and max of the values of each layer of the model (weights and activations), which are used to compute the quantization factors.
In the model training phase, QAT needs to insert pseudo-quantization nodes SimQuant (which may also be referred to as quantization operators in this embodiment) at the weight input and activation output of the original model. In addition, for a convolutional neural network (CNN) plus batch normalization (BN) structure, another CNN is needed to implement BN folding, that is, the fusion of the BN coefficients with the CNN weights. During training, SimQuant collects the min and max values in the corresponding data stream (tensor) for subsequent calculation of the scale quantization factor. As shown in FIG. 2, when QAT folds the CNN and the BN, another CNN needs to be constructed to perform a convolution operation on the data of the current batch; the BN updates the BN coefficients by using the result of that convolution operation, and the updated BN coefficients are then used to construct the weights. The quantization operator can quantize and dequantize the constructed weights, and the CNN can perform a convolution operation on the data of the current batch based on the weights obtained after dequantization. However, because another CNN is needed to implement BN folding, two CNNs perform convolution operations on the same batch of data during training, which increases the computation amount of the CNN in the training process and thus reduces the training speed.
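The role of a SimQuant node can be illustrated with a minimal sketch: it tracks the running min/max of the tensors flowing through it and, in the forward pass, quantizes and immediately dequantizes so that downstream layers (and the training loss) see the quantization error while staying in float. The int8 range and the derivation of S and Z from the tracked range follow the formulas given earlier and are assumptions about the described SimQuant, not its actual implementation.

```python
import numpy as np

class SimQuant:
    # Minimal sketch of a pseudo-quantization (fake-quant) node.
    def __init__(self, q_min=-128, q_max=127):
        self.r_min, self.r_max = float("inf"), float("-inf")
        self.q_min, self.q_max = q_min, q_max

    def __call__(self, x):
        # update the observed range, later used to derive the scale factor
        self.r_min = min(self.r_min, float(x.min()))
        self.r_max = max(self.r_max, float(x.max()))
        s = (self.r_max - self.r_min) / (self.q_max - self.q_min)
        z = round(self.q_max - self.r_max / s)
        # quantize then immediately dequantize ("fake" quantization)
        q = np.clip(np.round(x / s + z), self.q_min, self.q_max)
        return (q - z) * s
```

During training the node keeps updating `r_min`/`r_max`; at model conversion time those statistics are what the scale quantization factor is computed from.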
In the neural network provided in the embodiments of the present application, the computation amount of the CNN can be reduced when the CNN and the BN are folded.
Since the embodiments of the present application involve extensive application of neural networks, for ease of understanding, related terms and related concepts such as neural networks involved in the embodiments of the present application are first introduced below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes xs (that is, input data) and an intercept of 1 as inputs, and the output of the operation unit may be:

h = f(Σs Ws·xs + b)

where s = 1, 2, ..., n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network, to convert an input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting a plurality of the foregoing single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
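As a toy illustration of the unit just described, with the sigmoid chosen as the activation f as the text suggests (the input and weight values below are made up):

```python
import math

def neural_unit(xs, ws, b):
    # output = f(sum over s of Ws * xs + b), with f the sigmoid activation
    h = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-h))

y = neural_unit(xs=[0.5, -0.2], ws=[0.8, 0.3], b=0.1)  # a value in (0, 1)
```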
(2) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map. The convolutional layer is a neuron layer that performs convolution processing on an input signal in the convolutional neural network. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some neurons of adjacent layers. A convolutional layer usually includes several feature maps, and each feature map may be composed of some neural units arranged in a rectangle. Neural units of the same feature map share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the manner of extracting image information is independent of location. The principle implied herein is that the statistical information of one part of an image is the same as that of other parts. This means that image information learned in one part can also be used in another part. Therefore, for all positions on the image, the same learned image information can be used. In the same convolutional layer, a plurality of convolution kernels may be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of a random size, and the convolution kernel can obtain reasonable weights through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
(5) Back propagation algorithm
A convolutional neural network may use an error back propagation (BP) algorithm to correct the values of the parameters in an initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes increasingly small. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain optimal parameters of the super-resolution model, for example, a weight matrix.
A convolutional neural network (CNN) is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing multiple levels of learning at different abstraction levels by using machine learning algorithms. As a deep learning architecture, the CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in images fed into it.
As shown in FIG. 3, a convolutional neural network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120 (the pooling layer is optional), and a neural network layer 130.
The structure composed of the convolutional layer/pooling layer 120 and the neural network layer 130 may be the first convolutional layer and the second convolutional layer described in this application. The input layer 110 is connected to the convolutional layer/pooling layer 120, the convolutional layer/pooling layer 120 is connected to the neural network layer 130, the output of the neural network layer 130 may be input to an activation layer, and the activation layer may perform nonlinear processing on the output of the neural network layer 130.
Convolutional layer/pooling layer 120:
Convolutional layer:
As shown in FIG. 3, the convolutional layer/pooling layer 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or may be used as the input of another convolutional layer to continue the convolution operation.
Taking the convolutional layer 121 as an example, the convolutional layer 121 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from an input image matrix. The convolution operator may essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed on the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride) along the horizontal direction, so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same dimensions are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolutional image. Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The plurality of weight matrices have the same dimensions, the feature maps extracted by the weight matrices with the same dimensions also have the same dimensions, and the plurality of extracted feature maps with the same dimensions are then combined to form the output of the convolution operation.
The weight values in these weight matrices need to be obtained through extensive training in practical applications. Each weight matrix formed by the weight values obtained through training can extract information from the input image, thereby helping the convolutional neural network 100 make correct predictions.
When the convolutional neural network 100 has a plurality of convolutional layers, an initial convolutional layer (for example, 121) often extracts more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (for example, 126) become increasingly complex, for example, features such as high-level semantics; features with higher semantics are more suitable for the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. That is, among the layers 121 to 126 exemplified by 120 in FIG. 3, one convolutional layer may be followed by one pooling layer, or a plurality of convolutional layers may be followed by one or more pooling layers.
Neural network layer 130:
After processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is still insufficient to output the required output information. As described above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate an output for one or a group of required classes. Therefore, the neural network layer 130 may include a plurality of hidden layers (131, 132 to 13n shown in FIG. 3) and an output layer 140, and the parameters included in the plurality of hidden layers may be obtained through pre-training based on related training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the plurality of hidden layers in the neural network layer 130, that is, as the final layer of the entire convolutional neural network 100, there is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy and is specifically used for calculating the prediction error. Once the forward propagation of the entire convolutional neural network 100 (propagation from 110 to 140 in FIG. 3 is forward propagation) is completed, back propagation (propagation from 140 to 110 in FIG. 3 is back propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
It should be noted that the convolutional neural network 100 shown in FIG. 3 is merely an example of a convolutional neural network. In a specific application, the convolutional neural network may also exist in the form of another network model. For example, as shown in FIG. 4, a plurality of convolutional layers/pooling layers are in parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
(3) BN: Through normalization over a mini-batch, the differences among the inputs of different layers in parameter optimization are eliminated, the possibility of overfitting in a certain layer of the model is reduced, and training can proceed more smoothly. The BN coefficients may include: the mean μ, the variance σ, the scale parameter γ, and the offset parameter β.
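The four coefficients just listed correspond to the usual batch-normalization transform; a minimal sketch (the small epsilon for numerical stability is an assumption, since the text lists only μ, σ, γ, and β):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the mini-batch, then rescale and shift
    mu = x.mean(axis=0)    # mini-batch mean
    var = x.var(axis=0)    # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 3.0 + 5.0
y = batch_norm(x, gamma=2.0, beta=1.0)
# per-feature mean of y is approximately beta, and its std approximately gamma
```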
(4)BN折叠(BN-folding):主要目的是将BN与CNN的计算融合以减少计算量。该方法主要用于QAT中,使得训练量化可以模拟推理BN融合过程,以便在模型转化中将BN和CNN进行融合(据计算规则将相关的系数根合并成一个系数),加速模型推理效率。(4) BN-folding: The main purpose is to fuse the computation of BN and CNN to reduce the amount of computation. This method is mainly used in QAT, so that the training quantization can simulate the inference BN fusion process, so that BN and CNN can be fused in the model transformation (the related coefficient roots are combined into one coefficient according to the calculation rules), and the model inference efficiency is accelerated.
(5)卷积BN(ConvBn)表示卷积和BN的融合算子,该算子既实现CNN的功能,也实现BN的功能,由于BN的系数对CNN是可见的,因此易于实现BN折叠,使得CNN卷积权重和相关BN系数进行融合。(5) Convolutional BN (ConvBn) denotes a fusion operator of convolution and BN. This operator realizes both the function of the CNN and the function of BN. Since the BN coefficients are visible to the CNN, BN folding is easy to realize, so that the CNN convolution weights and the relevant BN coefficients are fused.
图5是本申请实施例提供的一种系统架构100的示意图,在图5中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据。FIG. 5 is a schematic diagram of a system architecture 100 provided by an embodiment of the present application. In FIG. 5, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through a client device 140.
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理(比如进行本申请中神经网络的功能实现)过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs computation or other related processing (for example, realizing the functions of the neural network in this application), the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store data, instructions, and the like obtained by the corresponding processing into the data storage system 150.
最后,I/O接口112将处理结果返回给客户设备140,从而提供给用户。Finally, the I/O interface 112 returns the processing results to the client device 140 for provision to the user.
可选地,客户设备140,例如可以是自动驾驶系统中的控制单元、手机终端中的功能算法模块,例如该功能算法模块可以用于实现相关的任务。Optionally, the client device 140 may be, for example, a control unit in an automatic driving system or a functional algorithm module in a mobile phone terminal, and the functional algorithm module may, for example, be used to implement related tasks.
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则,该相应的目标模型/规则即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。It is worth noting that the training device 120 can generate corresponding target models/rules based on different training data for different goals (or different tasks), and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
在图5中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。In the case shown in FIG. 5, the user can manually specify the input data, and this manual specification can be operated through the interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112; if the user's authorization is required for the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be display, sound, action, or another specific manner. The client device 140 can also serve as a data collection terminal, collecting the input data fed into the I/O interface 112 and the output result of the I/O interface 112 as shown in the figure as new sample data, and storing them in the database 130. Of course, the collection may also bypass the client device 140: the I/O interface 112 directly stores the input data fed into the I/O interface 112 and the output result of the I/O interface 112, as shown in the figure, into the database 130 as new sample data.
值得注意的是,图5仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图5中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。It is worth noting that FIG. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 5, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
首先以模型训练阶段为例对本申请实施例提供的模型训练方法进行说明。First, the model training method provided by the embodiment of the present application is described by taking the model training stage as an example.
参照图6,图6为本申请实施例提供的一种模型训练方法的实施例示意,如图6示出的那样,本申请实施例提供的一种模型训练方法包括:Referring to FIG. 6 , FIG. 6 is a schematic diagram of an embodiment of a model training method provided by an embodiment of the present application. As shown in FIG. 6 , a model training method provided by an embodiment of the present application includes:
601、获取第一神经网络模型,其中,所述第一神经网络模型包括卷积BN层和第一量化算子,所述卷积BN层用于根据第一权重对输入的第N批batch的数据进行卷积处理,根据BN系数对卷积处理结果进行归一化处理,并基于归一化处理结果更新所述BN系数,根据更新后的所述BN系数进行所述第一权重的更新,所述第一量化算子用于对更新后的所述第一权重进行量化处理以及反量化处理,以得到第二权重,所述卷积BN层还用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理。601. Obtain a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator. The convolutional BN layer is configured to perform convolution processing on the input data of the Nth batch according to a first weight, normalize the convolution result according to BN coefficients, update the BN coefficients based on the normalization result, and update the first weight according to the updated BN coefficients. The first quantization operator is configured to perform quantization processing and dequantization processing on the updated first weight to obtain a second weight, and the convolutional BN layer is further configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight.
其中,第一神经网络模型可以为对一个预训练pre-trained模型进行BN折叠处理以及添加量化算子(也可以称之为伪量化节点SimQuant)得到的。The first neural network model may be obtained by performing BN folding processing on a pre-trained model and adding quantization operators (which may also be referred to as pseudo-quantization nodes, SimQuant).
本申请实施例中,训练设备可以获取第二神经网络模型,所述第二神经网络模型为预训练pre-trained模型,所述第二神经网络模型包括所述第一卷积层以及所述第一BN层;对所述第二神经网络模型中的所述第一卷积层以及所述第一BN层进行折叠处理,以得到所述第一神经网络模型,所述第一神经网络模型包括对所述第一卷积层以及所述第一BN层进行折叠处理后得到的卷积BN层,所述卷积BN层包括所述第一卷积层以及所述第一BN层。In this embodiment of the present application, the training device may obtain a second neural network model, where the second neural network model is a pre-trained model and includes the first convolutional layer and the first BN layer; the first convolutional layer and the first BN layer in the second neural network model are folded to obtain the first neural network model, where the first neural network model includes a convolutional BN layer obtained by folding the first convolutional layer and the first BN layer, and the convolutional BN layer includes the first convolutional layer and the first BN layer.
本申请实施例中,第二神经网络模型为预训练pre-trained模型,第二神经网络模型被训练使得其针对于特定的任务具备较高的数据处理精度。为了对第二神经网络模型进行权重量化(更具体的,可以是对卷积层中的权重进行量化),可以在第二神经网络模型中插入量化算子,并将卷积层与BN层进行折叠处理。In this embodiment of the present application, the second neural network model is a pre-trained model that has been trained so that it has high data processing accuracy for a specific task. In order to quantize the weights of the second neural network model (more specifically, the weights in the convolutional layers), quantization operators can be inserted into the second neural network model, and the convolutional layers can be folded with the BN layers.
具体的,可以参照图7,图7为本申请实施例提供的一种BN折叠的示意,如图7所示,第二神经网络模型包括所述第一卷积层以及所述第一BN层,训练设备可以对所述第二神经网络模型中的所述第一卷积层以及所述第一BN层进行折叠处理,以得到所述第一神经网络模型,所述第一神经网络模型包括对所述第一卷积层以及所述第一BN层进行折叠处理后得到的卷积BN层。Specifically, referring to FIG. 7, which is a schematic diagram of BN folding provided by an embodiment of the present application, as shown in FIG. 7, the second neural network model includes the first convolutional layer and the first BN layer, and the training device may fold the first convolutional layer and the first BN layer in the second neural network model to obtain the first neural network model, where the first neural network model includes a convolutional BN layer obtained by folding the first convolutional layer and the first BN layer.
在一种可能的实现中,所述第二神经网络模型中的所述第一卷积层用于根据目标权重对输入数据进行卷积处理。为了对目标权重进行量化以及实现BN折叠,需要将目标权重与BN系数进行相乘,并通过量化算子对乘积结果进行量化和反量化处理,之后将反量化结果作为第一卷积层的权重。In a possible implementation, the first convolutional layer in the second neural network model is configured to perform convolution processing on the input data according to a target weight. In order to quantize the target weight and realize BN folding, the target weight needs to be multiplied by the BN coefficient, the product is quantized and dequantized by a quantization operator, and the dequantization result is then used as the weight of the first convolutional layer.
具体的,所述第一神经网络模型包括卷积BN层和第一量化算子,卷积BN层可以包括第一卷积层和第一批量归一化BN层,所述第一卷积层用于根据第一权重对输入的数据进行卷积处理,以得到第一输出,所述第一BN层用于根据BN系数对所述第一输出进行归一化处理,并基于归一化处理结果更新所述BN系数,所述第一量化算子用于根据第一量化因子对更新后的所述第一权重进行量化处理以及反量化处理,以得到第二权重,所述更新后的第一权重为根据更新后的BN系数得到的,所述第一卷积层用于根据所述第二权重对输入的数据进行卷积处理。Specifically, the first neural network model includes a convolutional BN layer and a first quantization operator, and the convolutional BN layer may include a first convolutional layer and a first batch normalization (BN) layer. The first convolutional layer is configured to perform convolution processing on input data according to a first weight to obtain a first output; the first BN layer is configured to normalize the first output according to BN coefficients and update the BN coefficients based on the normalization result; the first quantization operator is configured to perform quantization processing and dequantization processing on the updated first weight according to a first quantization factor to obtain a second weight, where the updated first weight is obtained according to the updated BN coefficients; and the first convolutional layer is configured to perform convolution processing on input data according to the second weight.
本申请实施例中,针对于多个批batch的数据,可以通过上一批batch的数据来更新BN系数,并基于上一批batch的数据更新的BN系数,来更新本次卷积层的权重,具体的,所述第一神经网络模型包括第一卷积层、第一批量归一化BN层以及第一量化算子,所述第一卷积层用于根据第一权重对输入的第N批batch的数据进行卷积处理,以得到第一输出,所述第一BN层用于根据BN系数对所述第一输出进行归一化处理,并基于归一化处理结果更新所述BN系数,所述第一量化算子用于根据第一量化因子对更新后的所述第一权重进行量化处理以及反量化处理,以得到第二权重,所述更新后的第一权重为根据更新后的BN系数得到的,所述第一卷积层用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理,以得到第二输出。其中,所述第一权重为根据所述BN系数与所述目标权重的乘积结果得到的,所述更新后的第一权重为将所述更新后的BN系数与所述目标权重进行乘积得到的。In this embodiment of the present application, for multiple batches of data, the BN coefficients can be updated using the data of the previous batch, and the weight of the convolutional layer for the current batch can be updated based on the BN coefficients updated from the previous batch. Specifically, the first neural network model includes a first convolutional layer, a first batch normalization (BN) layer, and a first quantization operator. The first convolutional layer is configured to perform convolution processing on the input data of the Nth batch according to a first weight to obtain a first output; the first BN layer is configured to normalize the first output according to BN coefficients and update the BN coefficients based on the normalization result; the first quantization operator is configured to perform quantization processing and dequantization processing on the updated first weight according to a first quantization factor to obtain a second weight, where the updated first weight is obtained according to the updated BN coefficients; and the first convolutional layer is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight to obtain a second output. The first weight is obtained according to the product of the BN coefficients and the target weight, and the updated first weight is obtained by multiplying the updated BN coefficients by the target weight.
具体的,可以参照图8,图8为本申请实施例提供的一种卷积BN层的结构示意,如图8所示,conv表示卷积层,bn表示BN层,div表示除法,mul表示乘法,第一卷积层conv可以对上一批batch的数据(也就是第N批batch的数据)进行卷积处理,以得到第一输出。应理解,第一输出是第一卷积层conv对上一批batch的数据进行卷积处理得到的卷积处理结果与BN系数相除的结果。Specifically, referring to FIG. 8, which is a schematic structural diagram of a convolutional BN layer provided by an embodiment of the present application, as shown in FIG. 8, conv denotes the convolutional layer, bn denotes the BN layer, div denotes division, and mul denotes multiplication. The first convolutional layer conv can perform convolution processing on the data of the previous batch (that is, the data of the Nth batch) to obtain the first output. It should be understood that the first output is the result of dividing the convolution result obtained by the first convolutional layer conv on the data of the previous batch by the BN coefficient.
第一输出可以作为第一BN层的输入,第一BN层可以对所述第一输出进行归一化处理,并基于归一化处理结果更新所述BN系数,其中,在训练过程中,BN层是基于前馈过程中卷积层的输出特征的均值和标准差来进行BN运算的,示例性的,所述第一BN层与所述第一卷积层连接,所述第一BN层用于根据所述第一卷积层的第一输出的均值和标准差对所述第一输出进行BN运算,之后,训练设备可以基于运算结果来更新BN系数,其中,BN系数可以包括但不限于均值μ、方差σ、尺度参数γ和偏移参数β中的至少一种,或者是其中任意多种之间的运算结果。The first output can serve as the input of the first BN layer, and the first BN layer can normalize the first output and update the BN coefficients based on the normalization result. During training, the BN layer performs the BN operation based on the mean and standard deviation of the output features of the convolutional layer in the feedforward process. Exemplarily, the first BN layer is connected to the first convolutional layer and is configured to perform the BN operation on the first output of the first convolutional layer according to the mean and standard deviation of the first output; the training device can then update the BN coefficients based on the operation result, where the BN coefficients can include but are not limited to at least one of the mean μ, the variance σ, the scale parameter γ, and the offset parameter β, or an operation result among any of them.
本申请实施例中,所述卷积BN层为对卷积层和BN层进行折叠得到的,所述第一权重为根据所述BN系数与目标权重的乘积结果得到的,所述更新后的第一权重为将所述更新后的BN系数与所述目标权重进行乘积得到的,所述目标权重为所述卷积层中包括的权重。更新BN系数之后可以得到更新后的BN系数(如图8所示的尺度参数γ_new与方差σ_new),之后可以将尺度参数γ_new与方差σ_new相除,得到γ/σ,为了进行BN折叠,可以将更新后的BN系数(例如γ/σ)与目标权重W相乘,并将乘积结果(γ/σ*W)输入至第一量化算子,第一量化算子用于根据第一量化因子对更新后的所述第一权重进行量化处理以及反量化处理,以得到第二权重,第一卷积层可以根据第二权重对下一批batch的数据(也就是第N+1批batch的数据)进行卷积处理,以得到第二输出。应理解,还需要设置量化算子中需要量化的bit数。In this embodiment of the present application, the convolutional BN layer is obtained by folding the convolutional layer and the BN layer, the first weight is obtained according to the product of the BN coefficients and the target weight, the updated first weight is obtained by multiplying the updated BN coefficients by the target weight, and the target weight is the weight included in the convolutional layer. After the BN coefficients are updated, the updated BN coefficients (the scale parameter γ_new and the variance σ_new shown in FIG. 8) can be obtained; the scale parameter γ_new can then be divided by the variance σ_new to obtain γ/σ. To perform BN folding, the updated BN coefficient (for example, γ/σ) can be multiplied by the target weight W, and the product (γ/σ*W) is input to the first quantization operator, which performs quantization processing and dequantization processing on the updated first weight according to the first quantization factor to obtain the second weight. The first convolutional layer can then perform convolution processing on the data of the next batch (that is, the data of the (N+1)th batch) according to the second weight to obtain the second output. It should be understood that the number of bits to be quantized in the quantization operator also needs to be set.
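A minimal sketch of such a weight quantization operator (quantize followed by dequantize) is shown below; the symmetric signed range, the rounding scheme, and all names are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def fake_quant(w, scale, bits=8):
    # Quantize to a symmetric signed integer grid, then immediately
    # dequantize; the returned array plays the role of the "second weight".
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits, 7 for 4 bits
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale
```

Values outside the representable range are clipped, so training with this operator exposes the model to the same rounding and saturation errors it will see at inference time.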
具体的,为了识别出第二神经网络模型中需要进行BN折叠的卷积层和BN层,可以根据第二神经网络模型的计算流图中的算子类型来判别模型中需要进行BN折叠的卷积层和BN层的结构(本实施例也可以描述为CNN+BN结构),并将识别出的CNN+BN结构组合成一个block(也就是上述实施例中的卷积BN层);之后可以将组合成的卷积BN层替换掉原本的CNN+BN结构。Specifically, in order to identify the convolutional layers and BN layers that need BN folding in the second neural network model, the structures of the convolutional layers and BN layers requiring BN folding (described in this embodiment as the CNN+BN structure) can be identified according to the operator types in the computational flow graph of the second neural network model, and each identified CNN+BN structure is combined into one block (that is, the convolutional BN layer in the above embodiment); the combined convolutional BN layer then replaces the original CNN+BN structure.
现有技术中,第一卷积层是通过BN层对当前批batch的数据进行处理后更新的BN系数来确定当前批batch所采用的权重的,因此需要除了第一卷积层之外,再单独设置一个卷积层进行数据处理来使得BN层可以基于当前批batch的数据来更新BN系数,而本申请实施例中,由于第一卷积层是通过BN层对上一批batch的数据进行处理后更新的BN系数来确定当前批batch所采用的权重的,因此无需再单独多设置一个卷积层,一方面,可减小模型大小,另一方面,也减少了神经网络中卷积层的数据运算量。由于训练过程是一个需要迭代大量次数的过程,训练设备的可使用计算资源是有限的,本实施例中在训练过程中将神经网络中的卷积层都相应减少了一次卷积运算,在大量次数的训练过程中,可以大量减少训练设备的运算资源消耗,进而提升了训练的速度。In the prior art, the first convolutional layer determines the weight used for the current batch from BN coefficients that are updated after the BN layer processes the data of the current batch; therefore, in addition to the first convolutional layer, a separate convolutional layer must be set up for data processing so that the BN layer can update the BN coefficients based on the data of the current batch. In this embodiment of the present application, because the first convolutional layer determines the weight used for the current batch from BN coefficients updated after the BN layer processes the data of the previous batch, there is no need to set up an additional convolutional layer. On the one hand, this reduces the model size; on the other hand, it also reduces the amount of data computation of the convolutional layers in the neural network. Since the training process requires a large number of iterations and the available computing resources of the training device are limited, in this embodiment each convolutional layer in the neural network saves one convolution operation per iteration during training; over a large number of training iterations, this greatly reduces the computing resource consumption of the training device and thus increases the training speed.
此外,训练的前向图和推理图相同,可降低模型保存和转化的复杂度。In addition, the forward graph used in training is the same as the inference graph, which reduces the complexity of model saving and conversion.
在一种可能的实现中,所述第一神经网络模型中的所述第一卷积层用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理,以得到卷积处理结果,并将所述卷积处理结果与所述更新后的BN系数进行相除,以得到第二输出。In a possible implementation, the first convolutional layer in the first neural network model is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight to obtain a convolution result, and to divide the convolution result by the updated BN coefficients to obtain a second output.
本申请实施例中,可以对第一神经网络模型中的第一权重进行初始化,具体的,可以将所述BN系数与所述目标权重进行乘积运算,以得到第一目标张量,所述第一目标张量包括M个元素;将所述第一目标张量包括的所述M个元素中绝对值最大的N个目标元素替换为所述M个元素中除所述N个目标元素之外的M-N个元素中的最大的元素,以得到所述第一权重。In this embodiment of the present application, the first weight in the first neural network model may be initialized. Specifically, the BN coefficients may be multiplied by the target weight to obtain a first target tensor, where the first target tensor includes M elements; the N target elements with the largest absolute values among the M elements included in the first target tensor are replaced with the largest element among the M-N elements other than the N target elements, so as to obtain the first weight.
本申请实施例中,可以利用第二神经网络模型(pre-trained模型)中的权重和BN的系数对第一神经网络模型中的第一权重进行初始化。具体的,可以根据pre-trained模型将所述BN系数与所述目标权重进行乘积运算,以得到第一目标张量,例如第一目标张量可以为γ/σ*W,之后对第一目标张量中各个元素按照大小进行排序,并按照对称方式,截取主干部分数值(如:截取95%~99.5%),并将剩余的元素替换为主干部分数值中最大的值,以此实现第一权重的初始化。In this embodiment of the present application, the weights in the second neural network model (the pre-trained model) and the BN coefficients may be used to initialize the first weight in the first neural network model. Specifically, the BN coefficients may be multiplied by the target weight according to the pre-trained model to obtain the first target tensor; for example, the first target tensor may be γ/σ*W. The elements of the first target tensor are then sorted by magnitude, the values of the main part are kept in a symmetric manner (for example, 95% to 99.5% are kept), and the remaining elements are replaced with the largest value in the main part, thereby initializing the first weight.
例如,可以如图9所示,图9为本申请实施例提供的一种元素截取的示意,在将第一目标张量中的元素进行由大到小的排列后,可以得到图9中所示的分布,其中,可以截取其中一定百分比的元素,百分比可以但不限于是95%~99.5%,且95%~99.5%的元素可以是元素分布中主干部分的元素,也就是绝对值靠近0的元素。For example, as shown in FIG. 9, which is a schematic diagram of element truncation provided by an embodiment of the present application, after the elements in the first target tensor are arranged from large to small, the distribution shown in FIG. 9 can be obtained, from which a certain percentage of the elements can be kept; the percentage can be, but is not limited to, 95% to 99.5%, and these 95% to 99.5% of the elements can be the elements in the main part of the element distribution, that is, the elements whose absolute values are close to 0.
本申请实施例中,第一目标张量中绝对值较大的元素数量较少,在进行后续量化和反量化的过程中,由于绝对值很大,会对运算的精度造成影响,例如会对量化因子的其他元素进行不必要的平滑,本申请实施例通过对第一目标张量进行元素的截取,提高了神经网络模型处理的精度。In this embodiment of the present application, the elements with large absolute values in the first target tensor are few in number; in the subsequent quantization and dequantization process, these large absolute values affect the accuracy of the operation, for example, by causing unnecessary smoothing of the other elements under the quantization factor. By truncating the elements of the first target tensor, this embodiment of the present application improves the processing accuracy of the neural network model.
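The truncation described above can be sketched as follows (a hypothetical helper with the keep ratio as a parameter; clamping to ±(largest kept absolute value) is one way to realize "replace outliers with the largest remaining element" symmetrically):

```python
import numpy as np

def truncate_tail(t, keep_ratio=0.99):
    # Keep the keep_ratio fraction of elements with the smallest absolute
    # values and clamp the remaining outliers to the largest kept magnitude.
    flat = np.sort(np.abs(t).ravel())
    k = max(1, int(np.ceil(keep_ratio * flat.size)))
    kept_max = flat[k - 1]
    return np.clip(t, -kept_max, kept_max)
```

With `keep_ratio` between 0.95 and 0.995 this matches the 95%-99.5% range mentioned in the text.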
在一种可能的实现中,所述第二神经网络模型还包括目标激活层,所述第二神经网络模型中的所述目标激活层用于对输入数据进行处理,以得到第三输出,所述第一神经网络模型还包括所述目标激活层以及第二量化算子,所述第一神经网络模型中的所述目标激活层用于对输入数据进行处理,以得到第四输出,所述第二量化算子用于根据第二量化因子对所述第四输出进行量化处理以及反量化处理。In a possible implementation, the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is configured to process input data to obtain a third output. The first neural network model further includes the target activation layer and a second quantization operator; the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization processing and dequantization processing on the fourth output according to a second quantization factor.
和上述实施例类似,本申请实施例中,为了对第二神经网络中各个激活层的输出进行量化,还可以在激活层的输出位置增加第二量化算子,具体的,所述第二神经网络模型还包括目标激活层,所述第二神经网络模型中的所述目标激活层用于对输入数据进行处理,以得到第三输出,所述第一神经网络模型还包括所述目标激活层以及第二量化算子,所述第一神经网络模型中的所述目标激活层用于对输入数据进行处理,以得到第四输出,所述第二量化算子用于根据第二量化因子对所述第四输出进行量化处理以及反量化处理。Similar to the above embodiment, in this embodiment of the present application, in order to quantize the outputs of the activation layers in the second neural network, a second quantization operator can also be added at the output position of an activation layer. Specifically, the second neural network model further includes a target activation layer, which is configured to process input data to obtain a third output; the first neural network model further includes the target activation layer and a second quantization operator, where the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization processing and dequantization processing on the fourth output according to a second quantization factor.
在一种可能的实现中,所述第三输出为第二目标张量,所述第二目标张量包括X个元素,所述方法还包括,获取所述X个元素中绝对值最大的Y个目标元素,将所述第二目标张量中的所述Y个目标元素替换为所述X个元素中除所述Y个目标元素之外的X-Y个元素中的最大的元素,以得到所述第二量化因子。In a possible implementation, the third output is a second target tensor that includes X elements, and the method further includes: obtaining the Y target elements with the largest absolute values among the X elements, and replacing the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements, so as to obtain the second quantization factor.
和上述实施例类似,本申请实施例中,在对位于激活层输出位置的量化因子进行初始化的过程中,将第二目标张量中的元素进行由大到小的排列后,可以截取其中一定百分比的元素,百分比可以但不限于是95%~99.5%,且95%~99.5%的元素可以是元素分布中主干部分的元素,也就是绝对值靠近0的元素。Similar to the above embodiment, in this embodiment of the present application, in the process of initializing the quantization factor located at the output position of the activation layer, after the elements in the second target tensor are arranged from large to small, a certain percentage of them can be kept; the percentage can be, but is not limited to, 95% to 99.5%, and these 95% to 99.5% of the elements can be the elements in the main part of the element distribution, that is, the elements whose absolute values are close to 0.
602、对所述第一神经网络模型进行模型训练,以获取训练后的第一神经网络模型。602. Perform model training on the first neural network model to obtain a trained first neural network model.
本申请实施例中,在获取第一神经网络模型之后,可以对所述第一神经网络模型进行模型训练,以获取训练后的第一神经网络模型。In this embodiment of the present application, after the first neural network model is obtained, model training may be performed on the first neural network model to obtain the trained first neural network model.
具体的,可以根据设置的epoch对模型进行量化训练,在训练过程中,若当前epoch进行freeze-bn操作,则在当前epoch训练得到量化模型,并在当前epoch推理验证当前量化模型;若当前epoch不进行freeze-bn操作,则同样在当前epoch训练得到量化模型,并在当前epoch推理验证当前量化模型。Specifically, the model can be quantization-trained according to the set number of epochs. During training, if the freeze-bn operation is performed in the current epoch, the quantized model is obtained by training in the current epoch and the current quantized model is verified by inference in the current epoch; if the freeze-bn operation is not performed in the current epoch, the quantized model is likewise obtained by training in the current epoch and the current quantized model is verified by inference in the current epoch.
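The epoch procedure just described can be sketched schematically (all names are assumptions; `freeze_bn_from` marks the first epoch in which the freeze-bn operation is assumed to apply):

```python
def qat_schedule(num_epochs, freeze_bn_from):
    # Every epoch trains the quantized model and then inference-verifies it;
    # from epoch freeze_bn_from onward, freeze-bn is applied first.
    steps = []
    for epoch in range(num_epochs):
        if epoch >= freeze_bn_from:
            steps.append((epoch, "freeze-bn"))
        steps.append((epoch, "train-quantized"))
        steps.append((epoch, "validate-quantized"))
    return steps
```

The returned list makes the per-epoch ordering explicit without committing to any particular training framework.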
在一种可能的实现中,所述训练后的第一神经网络模型包括训练后的第一量化因子以及训练后的BN系数,训练设备还可以根据所述训练后的第一量化因子以及所述训练后的BN系数,对所述第一神经网络模型进行量化,以得到第三神经网络模型,所述第三神经网络模型包括量化后的所述第一卷积层,所述第一卷积层用于根据量化后的权重对输入数据进行卷积处理,所述量化后的权重为根据所述第一量化因子以及所述训练后的BN系数得到的。In a possible implementation, the trained first neural network model includes a trained first quantization factor and trained BN coefficients. The training device can further quantize the first neural network model according to the trained first quantization factor and the trained BN coefficients to obtain a third neural network model, where the third neural network model includes the quantized first convolutional layer, and the first convolutional layer is configured to perform convolution processing on input data according to a quantized weight obtained according to the first quantization factor and the trained BN coefficients.
示例性的,以第一卷积层的输入X为UINT型,第一卷积层的权重W为INT型,要转成UINT型推理为例,可以载入第三神经网络模型到converter,并对模型每层根据
Figure PCTCN2021133383-appb-000004
进行权重量化,并保存为UINT型,其中bits为量化的bit数(如8bit),之后将各层的scale量化因子值和权重的量化结果等保存至推理模型。Exemplarily, taking the case where the input X of the first convolutional layer is of the UINT type and the weight W of the first convolutional layer is of the INT type, and the model is to be converted to UINT-type inference, the third neural network model can be loaded into the converter, and the weights of each layer of the model are quantized according to
Figure PCTCN2021133383-appb-000004
and saved as the UINT type, where bits is the number of quantization bits (e.g. 8 bits); afterwards, the scale quantization factor value of each layer, the quantized weights, and the like are saved to the inference model.
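Since the conversion formula itself appears only as an image (Figure PCTCN2021133383-appb-000004), the following is only one plausible reading, not the patent's exact formula: quantize the INT-type weight symmetrically with the layer's scale factor, then shift by 2^(bits-1) so the result fits an unsigned type. All names are assumptions:

```python
import numpy as np

def weight_to_uint(w, scale, bits=8):
    # Symmetric signed quantization followed by a zero-point shift of
    # 2**(bits-1); for bits=8 this maps the INT range -127..127 to 1..255.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return (q + 2 ** (bits - 1)).astype(np.uint32)
```

The per-layer `scale` here stands in for the trained scale quantization factor mentioned in the text.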
本申请实施例提供了一种模型训练方法,所述方法包括:获取第一神经网络模型,其中,所述第一神经网络模型包括卷积BN层和第一量化算子,所述卷积BN层用于根据第一权重对输入的第N批batch的数据进行卷积处理,根据BN系数对卷积处理结果进行归一化处理,并基于归一化处理结果更新所述BN系数,根据更新后的所述BN系数进行所述第一权重的更新,所述第一量化算子用于对更新后的所述第一权重进行量化处理以及反量化处理,以得到第二权重,所述卷积BN层还用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理,对所述第一神经网络模型进行模型训练,以获取训练后的第一神经网络模型。通过上述方式,由于第一卷积层是通过BN层对上一批batch的数据进行处理后更新的BN系数来确定当前批batch所采用的权重的,因此无需再单独多设置一个卷积层,一方面,可减小模型大小,另一方面,也减少了神经网络中卷积层的数据运算量。由于训练过程是一个需要迭代大量次数的过程,训练设备的可使用计算资源是有限的,本实施例中在训练过程中将神经网络中的卷积层都相应减少了一次卷积运算,在大量次数的训练过程中,可以大量减少训练设备的运算资源消耗,进而提升了训练的速度。An embodiment of the present application provides a model training method. The method includes: acquiring a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is configured to perform convolution processing on the input data of the Nth batch according to a first weight, normalize the convolution result according to BN coefficients, update the BN coefficients based on the normalization result, and update the first weight according to the updated BN coefficients; the first quantization operator is configured to perform quantization processing and dequantization processing on the updated first weight to obtain a second weight; the convolutional BN layer is further configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight; and model training is performed on the first neural network model to obtain the trained first neural network model. In this way, because the first convolutional layer determines the weight used for the current batch from BN coefficients updated after the BN layer processes the data of the previous batch, there is no need to set up an additional convolutional layer. On the one hand, the model size can be reduced; on the other hand, the amount of data computation of the convolutional layers in the neural network is also reduced. Since the training process requires a large number of iterations and the available computing resources of the training device are limited, in this embodiment each convolutional layer in the neural network saves one convolution operation per iteration during training; over a large number of training iterations, this greatly reduces the computing resource consumption of the training device and thus increases the training speed.
接下来结合一个具体的实例对本申请实施例中的模型训练方法进行描述。Next, the model training method in the embodiment of the present application will be described with reference to a specific example.
本实施例中可以定义两种卷积BN层ConvBn:1.ConvBnV1仅插入权重量化节点;2.ConvBnV2既插入权重量化节点,也插入了激活量化节点。对于【CNN+BN+激活算子】的结构直接替换为ConvBnV1,但对于【CNN+BN】的结构(即BN后直接输出,不接激活算子)直接替换为ConvBnV2。In this embodiment, two kinds of convolutional BN layers (ConvBn) can be defined: 1. ConvBnV1 inserts only weight quantization nodes; 2. ConvBnV2 inserts both weight quantization nodes and activation quantization nodes. The [CNN+BN+activation operator] structure is directly replaced with ConvBnV1, while the [CNN+BN] structure (that is, direct output after BN without an activation operator) is directly replaced with ConvBnV2.
在训练阶段，以激活量化bit数设为8，首层权重量化bit数设为8，其余权重量化bit数设为4为例进行说明。首先可以根据MobileNetV2结构在激活算子ReLU6后插入激活量化节点（量化范围：0~255，量化bit数设为8），在残差结构的Add算子后插入量化节点（量化范围：-127~127，量化bit数设为8），在全连接（FC）算子后插入量化节点（量化范围：-7~7，量化bit数设为4）；然后扫描MobileNetV2中的【CNN+BN+ReLU6】和【CNN+BN】结构，并分别替换为ConvBnV1和ConvBnV2，实现BN折叠；将ConvBnV2算子中激活的量化bit数设为8，ConvBnV1和ConvBnV2中的权重量化bit数根据情况设置（例如将模型首层的权重量化bit数设为8，其余层权重的量化bit数设为4）；之后载入pre-trained模型，将模型结构与对应的权重逐层一一对应，并利用pre-trained模型中的权重和BN系数对相应的scale量化因子值进行初始化，主干截断的比例设置为95%；利用pre-trained模型和任意筛选的256张训练集数据进行推理得到各层的激活X，利用各层的X初始化对应的scale量化因子，主干截断的比例为99.5%；根据epoch=20对模型进行量化训练，并在当前epoch推理验证当前量化模型。In the training phase, take as an example an activation quantization bit-width of 8, a first-layer weight quantization bit-width of 8, and a weight quantization bit-width of 4 for the remaining layers. First, according to the MobileNetV2 structure, an activation quantization node is inserted after the activation operator ReLU6 (quantization range: 0 to 255, bit-width 8), a quantization node is inserted after the Add operator of the residual structure (quantization range: -127 to 127, bit-width 8), and a quantization node is inserted after the fully connected (FC) operator (quantization range: -7 to 7, bit-width 4). Then the [CNN+BN+ReLU6] and [CNN+BN] structures in MobileNetV2 are scanned and replaced with ConvBnV1 and ConvBnV2 respectively to implement BN folding; the activation quantization bit-width in the ConvBnV2 operator is set to 8, and the weight quantization bit-widths in ConvBnV1 and ConvBnV2 are set as required (for example, the weight quantization bit-width of the first layer of the model is set to 8, and that of the remaining layers to 4). Afterwards, the pre-trained model is loaded, the model structure is mapped to the corresponding weights layer by layer, and the corresponding scale quantization factor values are initialized using the weights and BN coefficients in the pre-trained model, with the trunk truncation ratio set to 95%. The pre-trained model and 256 arbitrarily selected training-set samples are then used for inference to obtain the activation X of each layer, and the scale quantization factor of each layer is initialized from its X, with a trunk truncation ratio of 99.5%. Finally, the model is trained with quantization for epoch=20, and the current quantized model is verified by inference at the current epoch.
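The quantize-dequantize ("fake quantization") performed by the inserted quantization nodes can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name and the scale value are assumptions; only the quantization ranges (0~255, -127~127, -7~7) come from the example above.

```python
import numpy as np

def fake_quant(x, scale, qmin, qmax):
    """Quantize-dequantize: round onto the integer grid, clip to the
    quantization range, then map back to floating point."""
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

# Hypothetical scale for the 8-bit unsigned node after ReLU6 (range 0..255);
# the signed nodes after Add (-127..127) and FC (-7..7) work the same way.
x = np.array([0.0, 0.3, 5.9, 7.2])
y = fake_quant(x, scale=6.0 / 255, qmin=0, qmax=255)
```

Values above the representable maximum (here 7.2) are clipped to the top of the range, which is why the truncation ratio used to initialize the scale matters.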
在模型转化阶段，以全8bit的量化为例，模型权重量化范围为-127~127，ReLU6的量化范围为0~255，BN后不接激活的量化范围为-127~127，残差结构的Add后的量化范围为-127~127。要将这个模型转化为UINT推理，首先可以将量化后的模型载入converter，并对模型每层根据
Figure PCTCN2021133383-appb-000005
进行权重量化，并保存为UINT型，其中bits为量化的bit数（如8bit），之后将各层的scale量化因子值和量化后的权重等保存至推理模型。
In the model conversion stage, taking full 8-bit quantization as an example, the model weight quantization range is -127 to 127, the ReLU6 quantization range is 0 to 255, the quantization range after a BN not followed by an activation is -127 to 127, and the quantization range after the Add of the residual structure is -127 to 127. To convert this model for UINT inference, the quantized model can first be loaded into the converter, and for each layer of the model, according to
Figure PCTCN2021133383-appb-000005
weight quantization is performed and the weights are saved as the UINT type, where bits is the quantization bit-width (e.g. 8). Afterwards, the scale quantization factor values of each layer and the quantized weights are saved to the inference model.
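The per-layer weight quantization referenced by the formula above (rendered here only as a figure placeholder) can plausibly be sketched as a symmetric integer mapping consistent with the stated ranges (-127~127 for 8 bit, -7~7 for 4 bit). The function name, the scale value, and the exact rounding are assumptions for illustration, not the formula from the figure:

```python
import numpy as np

def quantize_weights(w, scale, bits=8):
    """Map floating-point weights onto a signed symmetric integer grid:
    -127..127 for bits=8, -7..7 for bits=4, stored as integers."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q.astype(np.int8 if bits <= 8 else np.int16)

w = np.array([-0.5, 0.0, 0.25, 0.9])
q8 = quantize_weights(w, scale=0.9 / 127, bits=8)
```

The integer tensor together with each layer's scale factor is what gets written into the inference model.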
参照图10，图10为本申请实施例提供的一种模型训练装置1000的示意图，如图10中示出的那样，本申请提供的模型训练装置1000包括：Referring to FIG. 10, FIG. 10 is a schematic diagram of a model training apparatus 1000 provided by an embodiment of the present application. As shown in FIG. 10, the model training apparatus 1000 provided by the present application includes:
获取模块1001，用于获取第一神经网络模型，其中，所述第一神经网络模型包括卷积BN层和第一量化算子，所述卷积BN层用于根据第一权重对输入的第N批batch的数据进行卷积处理，根据BN系数对卷积处理结果进行归一化处理，并基于归一化处理结果更新所述BN系数，根据更新后的所述BN系数进行所述第一权重的更新，所述第一量化算子用于对更新后的所述第一权重进行量化处理以及反量化处理，以得到第二权重，所述卷积BN层还用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理；The obtaining module 1001 is configured to obtain a first neural network model, where the first neural network model includes a convolutional BN layer and a first quantization operator; the convolutional BN layer is configured to perform convolution processing on the input data of the Nth batch according to a first weight, normalize the convolution result according to BN coefficients, update the BN coefficients based on the normalization result, and update the first weight according to the updated BN coefficients; the first quantization operator is configured to perform quantization and dequantization processing on the updated first weight to obtain a second weight; and the convolutional BN layer is further configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight;
模型训练模块1002,用于对所述第一神经网络模型进行模型训练,以获取训练后的第一神经网络模型。The model training module 1002 is configured to perform model training on the first neural network model to obtain the trained first neural network model.
现有技术中，第一卷积层是通过BN层对当前批batch的数据进行处理后更新的BN系数来确定当前批batch所采用的权重的，因此需要除了第一卷积层之外，再单独设置一个卷积层进行数据处理来使得BN层可以基于当前批batch的数据来更新BN系数，而本申请实施例中，由于第一卷积层是通过BN层对上一批batch的数据进行处理后更新的BN系数来确定当前批batch所采用的权重的，因此无需再单独多设置一个卷积层，一方面，可减小模型大小，另一方面，也减少了神经网络中卷积层的数据运算量。由于训练过程是一个需要迭代大量次数的过程，训练设备的可使用计算资源是有限的，本实施例中在训练过程中将神经网络中的卷积层都相应减少了一次卷积运算，在大量次数的训练过程中，可以大量减少训练设备的运算资源消耗，进而提升了训练的速度。In the prior art, the first convolutional layer determines the weight used for the current batch from the BN coefficients updated after the BN layer processes the data of the current batch; therefore, in addition to the first convolutional layer, a separate convolutional layer must be set up for data processing so that the BN layer can update the BN coefficients based on the data of the current batch. In the embodiments of the present application, since the first convolutional layer determines the weight used for the current batch from the BN coefficients updated after the BN layer processes the previous batch of data, no extra convolutional layer needs to be set up. On the one hand, this reduces the model size; on the other hand, it also reduces the amount of convolution computation in the neural network. Since training requires a large number of iterations and the computing resources available to the training device are limited, in this embodiment each convolutional layer in the neural network saves one convolution operation during training; over many training iterations, this greatly reduces the computing resource consumption of the training device and thereby increases the training speed.
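The mechanism above can be illustrated with a toy one-dimensional training step in which an elementwise product stands in for convolution. Everything here (function names, the momentum update, the scale value) is an illustrative assumption; only the overall flow follows the description: fold BN into the weight, convolve the Nth batch once, update the BN coefficient from batch statistics, refold, then quantize-dequantize to obtain the weight for batch N+1.

```python
import numpy as np

def folded_step(x, w, gamma, sigma, scale, momentum=0.9):
    """One step of the folded Conv+BN layer: batch N is convolved with the
    first weight; the BN coefficient is updated from batch statistics; the
    refolded weight is quantize-dequantized into the second weight, which
    is used for batch N+1. Only one convolution per layer is performed."""
    first_weight = (gamma / sigma) * w            # BN folded into the weight
    y = x * first_weight                          # "convolution" on batch N
    sigma = momentum * sigma + (1 - momentum) * (y.std() + 1e-5)  # BN update
    updated = (gamma / sigma) * w                 # first weight refreshed
    second_weight = np.clip(np.round(updated / scale), -127, 127) * scale
    return second_weight, sigma

w2, s = folded_step(np.array([1.0, 2.0, 3.0]), np.array([0.5]),
                    gamma=1.0, sigma=1.0, scale=0.01)
```

Because the second weight is derived from the previous batch's BN update, the layer never has to run a second, un-folded convolution just to refresh the statistics.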
在一种可能的实现中，所述获取模块1001，用于获取第二神经网络模型，所述第二神经网络模型为预训练pre-trained模型，所述第二神经网络模型包括第一卷积层以及第一BN层；对所述第一卷积层以及所述第一BN层进行BN折叠处理，以得到所述第一神经网络模型，所述第一神经网络模型包括对所述第一卷积层以及所述第一BN层进行折叠处理后得到的所述卷积BN层。In a possible implementation, the obtaining module 1001 is configured to obtain a second neural network model, where the second neural network model is a pre-trained model and includes a first convolutional layer and a first BN layer; and to perform BN folding processing on the first convolutional layer and the first BN layer to obtain the first neural network model, where the first neural network model includes the convolutional BN layer obtained by folding the first convolutional layer and the first BN layer.
在一种可能的实现中，所述卷积BN层为对卷积层和BN层进行折叠得到的，所述第一权重为根据所述BN系数与目标权重的乘积结果得到的，所述更新后的第一权重为将所述更新后的BN系数与所述目标权重进行乘积得到的，所述目标权重为所述卷积层中包括的权重。In a possible implementation, the convolutional BN layer is obtained by folding a convolutional layer and a BN layer, the first weight is obtained from the product of the BN coefficient and a target weight, the updated first weight is obtained by multiplying the updated BN coefficient by the target weight, and the target weight is the weight included in the convolutional layer.
在一种可能的实现中,所述装置还包括:In a possible implementation, the apparatus further includes:
乘积运算模块，用于将所述BN系数与所述目标权重进行乘积运算，以得到第一目标张量，所述第一目标张量包括M个元素；a product operation module, configured to multiply the BN coefficient by the target weight to obtain a first target tensor, where the first target tensor includes M elements;
元素替换模块，用于将所述第一目标张量包括的所述M个元素中绝对值最大的N个目标元素替换为所述M个元素中除所述N个目标元素之外的M-N个元素中的最大的元素，以得到所述第一权重。an element replacement module, configured to replace the N target elements with the largest absolute values among the M elements included in the first target tensor with the largest element among the M-N elements other than the N target elements, to obtain the first weight.
本申请实施例中，可以利用第二神经网络模型（pre-trained模型）中权重和BN的系数对第一神经网络模型中的第一权重进行初始化。具体的，可以根据pre-trained模型将所述BN系数与所述目标权重进行乘积运算，以得到第一目标张量，例如第一目标张量可以为γ/σ*W，之后对第一目标张量中各个元素按照大小进行排序，并按照对称方式，截取主干部分数值（如：截取95%~99.5%），并将剩余的元素替换为主干部分数值中最大的值，以此实现第一权重的初始化。其中，第一权重为张量的形式。In this embodiment of the present application, the weights and BN coefficients in the second neural network model (the pre-trained model) may be used to initialize the first weight in the first neural network model. Specifically, the BN coefficient may be multiplied by the target weight according to the pre-trained model to obtain the first target tensor, for example γ/σ*W; the elements of the first target tensor are then sorted by magnitude, the trunk portion of the values is truncated symmetrically (for example, keeping 95% to 99.5%), and the remaining elements are replaced with the largest value in the trunk portion, thereby initializing the first weight. The first weight is in the form of a tensor.
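A minimal sketch of this initialization, under the interpretation that replacing the outliers with the largest trunk value amounts to a symmetric clamp on magnitude (sign handling and the function name are assumptions; the keep ratio of 95% comes from the example):

```python
import numpy as np

def truncate_trunk(t, keep=0.95):
    """Keep the central `keep` fraction of elements by absolute value and
    clamp the outliers to the largest surviving magnitude, as when
    initializing the first weight from gamma/sigma * W."""
    flat = np.sort(np.abs(t).ravel())
    cutoff = flat[int(np.ceil(keep * flat.size)) - 1]  # largest trunk magnitude
    return np.clip(t, -cutoff, cutoff)

t = np.array([-10.0, -0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 8.0])
w_init = truncate_trunk(t, keep=0.8)  # -10.0 and 8.0 are clamped to +/-0.7
```

Clamping the rare large-magnitude elements keeps them from inflating the quantization scale for the whole tensor.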
第一目标张量中绝对值较大的元素数量较少，在进行后续量化和反量化的过程中，由于绝对值很大，会对运算的精度造成影响，例如会对量化因子的其他元素进行不必要的平滑，本申请实施例通过对第一目标张量进行元素的截取，提高了神经网络模型处理的精度。The first target tensor contains only a small number of elements with large absolute values; in subsequent quantization and dequantization, these large values affect the precision of the operation, for example by unnecessarily smoothing the other elements relative to the quantization factor. By truncating the elements of the first target tensor, the embodiments of the present application improve the processing precision of the neural network model.
在一种可能的实现中，所述第一神经网络模型中的所述第一卷积层用于根据所述第二权重对输入的第N+1批batch的数据进行卷积处理，以得到卷积处理结果，并将所述卷积处理结果与所述更新后的BN系数进行相除，以得到所述第二输出。In a possible implementation, the first convolutional layer in the first neural network model is configured to perform convolution processing on the input data of the (N+1)th batch according to the second weight to obtain a convolution result, and to divide the convolution result by the updated BN coefficient to obtain the second output.
在一种可能的实现中，所述第二神经网络模型还包括目标激活层，所述第二神经网络模型中的所述目标激活层用于对输入数据进行处理，以得到第三输出，所述第一神经网络模型还包括所述目标激活层以及第二量化算子，所述第一神经网络模型中的所述目标激活层用于对输入数据进行处理，以得到第四输出，所述第二量化算子用于根据第二量化因子对所述第四输出进行量化处理以及反量化处理。In a possible implementation, the second neural network model further includes a target activation layer, and the target activation layer in the second neural network model is configured to process input data to obtain a third output; the first neural network model further includes the target activation layer and a second quantization operator, the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization and dequantization processing on the fourth output according to a second quantization factor.
在一种可能的实现中，所述第三输出为第二目标张量，所述第二目标张量包括X个元素，所述获取模块，用于获取所述X个元素中绝对值最大的Y个目标元素；In a possible implementation, the third output is a second target tensor, the second target tensor includes X elements, and the obtaining module is configured to obtain the Y target elements with the largest absolute values among the X elements;
所述元素替换模块，用于将所述第二目标张量中的所述Y个目标元素替换为所述X个元素中除所述Y个目标元素之外的X-Y个元素中的最大的元素，以得到所述第二量化因子。the element replacement module is configured to replace the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements among the X elements, to obtain the second quantization factor.
和上述实施例类似，本申请实施例中，在对位于激活层输出位置的量化因子进行初始化的过程中，将第二目标张量中的元素进行由大到小的排列后，可以截取其中一定百分比的元素，百分比可以但不限于是95%~99.5%，且95%~99.5%的元素可以是元素分布中主干部分的元素，也就是绝对值靠近0的元素。Similar to the above embodiment, in this embodiment of the present application, when initializing the quantization factor located at the output of the activation layer, after the elements in the second target tensor are sorted from largest to smallest, a certain percentage of the elements can be retained; the percentage may be, but is not limited to, 95% to 99.5%, and these 95% to 99.5% of the elements may be the elements in the trunk of the distribution, that is, the elements whose absolute values are close to 0.
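The percentile-based initialization of an activation-side scale factor can be sketched as follows. The function name and the way the scale is derived from the clip value are assumptions; the keep ratio of 99.5% and the 8-bit unsigned range come from the example in this document.

```python
import numpy as np

def init_activation_scale(x, keep=0.995, bits=8, unsigned=True):
    """Initialize the scale quantization factor at an activation output:
    sort magnitudes, drop the (1-keep) tail of outliers, and derive the
    scale from the largest retained value."""
    flat = np.sort(np.abs(x).ravel())
    clip_val = flat[int(np.ceil(keep * flat.size)) - 1]
    levels = (2 ** bits - 1) if unsigned else (2 ** (bits - 1) - 1)
    return clip_val / levels

# 200 activation samples in [0, 6] plus one outlier at 50; the 99.5th
# percentile ignores the outlier, so the scale covers only the trunk.
x = np.concatenate([np.linspace(0.0, 6.0, 199), [50.0]])
scale = init_activation_scale(x, keep=0.995, bits=8)
```

With 256 training images per layer, as in the training-phase example, the same percentile rule gives a scale that is robust to rare activation spikes.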
在一种可能的实现中,所述训练后的第一神经网络模型包括训练后的第一量化因子以及训练后的BN系数,所述装置还包括:In a possible implementation, the trained first neural network model includes the trained first quantization factor and the trained BN coefficient, and the device further includes:
量化模块，用于根据所述训练后的第一量化因子以及所述训练后的BN系数，对所述第一神经网络模型进行量化，以得到第三神经网络模型，所述第三神经网络模型包括量化后的所述第一卷积层，所述量化后的第一卷积层用于根据量化后的权重对输入数据进行卷积处理，所述量化后的权重为根据所述第一量化因子以及所述训练后的BN系数得到的。a quantization module, configured to quantize the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain a third neural network model, where the third neural network model includes the quantized first convolutional layer, the quantized first convolutional layer is configured to perform convolution processing on input data according to a quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient.
乘积运算模块的相关描述可以参照上述实施例中，关于如何将所述BN系数与所述目标权重进行乘积运算，以得到第一目标张量，所述第一目标张量包括M个元素的描述，这里不再赘述。For a description of the product operation module, reference may be made to the description in the foregoing embodiment of how to multiply the BN coefficient by the target weight to obtain the first target tensor including M elements, and details are not repeated here.
元素替换模块的相关描述可以参照上述实施例中，关于如何将所述第一目标张量包括的所述M个元素中绝对值最大的N个目标元素替换为所述M个元素中除所述N个目标元素之外的M-N个元素中的最大的元素，以得到所述第一权重的描述，这里不再赘述。For a description of the element replacement module, reference may be made to the description in the foregoing embodiment of how to replace the N target elements with the largest absolute values among the M elements included in the first target tensor with the largest element among the M-N elements other than the N target elements, to obtain the first weight, and details are not repeated here.
量化模块可以参照上述实施例中，关于如何根据所述训练后的第一量化因子以及所述训练后的BN系数，对所述第一神经网络模型进行量化，以得到第三神经网络模型，所述第三神经网络模型包括量化后的所述第一卷积层，所述量化后的第一卷积层用于根据量化后的权重对输入数据进行卷积处理，所述量化后的权重为根据所述第一量化因子以及所述训练后的BN系数得到的，这里不再赘述。For the quantization module, reference may be made to the description in the foregoing embodiment of how to quantize the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain the third neural network model, where the third neural network model includes the quantized first convolutional layer, the quantized first convolutional layer is configured to perform convolution processing on input data according to the quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient; details are not repeated here.
接下来介绍本申请实施例提供的一种执行设备，请参阅图11，图11为本申请实施例提供的执行设备的一种结构示意图，执行设备1200具体可以表现为手机、平板、笔记本电脑、智能穿戴设备、服务器等，此处不做限定。其中，执行设备1200上可以部署有图10对应实施例中所描述的数据处理装置，用于实现图10对应实施例中数据处理的功能。具体的，执行设备1200包括：接收器1201、发射器1202、处理器1203和存储器1204（其中执行设备1200中的处理器1203的数量可以一个或多个，图11中以一个处理器为例），其中，处理器1203可以包括应用处理器12031和通信处理器12032。在本申请的一些实施例中，接收器1201、发射器1202、处理器1203和存储器1204可通过总线或其它方式连接。Next, an execution device provided by an embodiment of the present application is introduced. Referring to FIG. 11, FIG. 11 is a schematic structural diagram of the execution device provided by an embodiment of the present application. The execution device 1200 may specifically be embodied as a mobile phone, a tablet, a laptop computer, a smart wearable device, a server, or the like, which is not limited here. The data processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1200 to implement the data processing functions of the embodiment corresponding to FIG. 10. Specifically, the execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (there may be one or more processors 1203 in the execution device 1200, and one processor is taken as an example in FIG. 11), where the processor 1203 may include an application processor 12031 and a communication processor 12032. In some embodiments of the present application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or in other ways.
存储器1204可以包括只读存储器和随机存取存储器，并向处理器1203提供指令和数据。存储器1204的一部分还可以包括非易失性随机存取存储器（non-volatile random access memory，NVRAM）。存储器1204存储有处理器可执行的操作指令、可执行模块或者数据结构，或者它们的子集，或者它们的扩展集，其中，操作指令可包括各种操作指令，用于实现各种操作。The memory 1204 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1203. A part of the memory 1204 may further include a non-volatile random access memory (NVRAM). The memory 1204 stores processor-executable operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
处理器1203控制执行设备的操作。具体的应用中,执行设备的各个组件通过总线***耦合在一起,其中总线***除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线***。The processor 1203 controls the operation of the execution device. In a specific application, various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. However, for the sake of clarity, the various buses are referred to as bus systems in the figures.
上述本申请实施例揭示的方法可以应用于处理器1203中，或者由处理器1203实现。处理器1203可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1203可以是通用处理器、数字信号处理器（digital signal processor，DSP）、微处理器或微控制器、以及视觉处理器（vision processing unit，VPU）、张量处理器（tensor processing unit，TPU）等适用于AI运算的处理器，还可进一步包括专用集成电路（application specific integrated circuit，ASIC）、现场可编程门阵列（field-programmable gate array，FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1203可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1203 or implemented by the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1203 or by instructions in the form of software. The processor 1203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, or a processor suitable for AI operations such as a vision processing unit (VPU) or a tensor processing unit (TPU), and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1203 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art. The storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204, and completes the steps of the above method in combination with its hardware.
接收器1201可用于接收输入的数字或字符信息，以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1202可用于通过第一接口输出数字或字符信息；发射器1202还可用于通过第一接口向磁盘组发送指令，以修改磁盘组中的数据；发射器1202还可以包括显示屏等显示设备。The receiver 1201 may be configured to receive input digital or character information and to generate signal input related to settings and function control of the execution device. The transmitter 1202 may be configured to output digital or character information through a first interface; the transmitter 1202 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 1202 may further include a display device such as a display screen.
执行设备可以获取到通过图6对应实施例中的模型训练方法训练得到的模型,并进行模型推理。The execution device may acquire the model trained by the model training method in the embodiment corresponding to FIG. 6 , and perform model inference.
本申请实施例还提供了一种训练设备，请参阅图12，图12是本申请实施例提供的训练设备一种结构示意图，具体的，训练设备1300由一个或多个服务器实现，训练设备1300可因配置或性能不同而产生比较大的差异，可以包括一个或一个以上中央处理器（central processing units，CPU）1313（例如，一个或一个以上处理器）和存储器1332，一个或一个以上存储应用程序1342或数据1344的存储介质1330（例如一个或一个以上海量存储设备）。其中，存储器1332和存储介质1330可以是短暂存储或持久存储。存储在存储介质1330的程序可以包括一个或一个以上模块（图示没标出），每个模块可以包括对训练设备中的一系列指令操作。更进一步地，中央处理器1313可以设置为与存储介质1330通信，在训练设备1300上执行存储介质1330中的一系列指令操作。This embodiment of the present application further provides a training device. Referring to FIG. 12, FIG. 12 is a schematic structural diagram of the training device provided by an embodiment of the present application. Specifically, the training device 1300 is implemented by one or more servers. The training device 1300 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1313 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing application programs 1342 or data 1344. The memory 1332 and the storage medium 1330 may be transitory or persistent storage. The program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1313 may be configured to communicate with the storage medium 1330 to execute, on the training device 1300, the series of instruction operations in the storage medium 1330.
训练设备1300还可以包括一个或一个以上电源1326,一个或一个以上有线或无线网络接口1350,一个或一个以上输入输出接口1358;或,一个或一个以上操作***1341,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。The training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input and output interfaces 1358; or, one or more operating systems 1341, such as Windows Server™, Mac OS X™ , UnixTM, LinuxTM, FreeBSDTM and so on.
具体的,训练设备可以执行图6对应实施例中的模型训练方法。Specifically, the training device may execute the model training method in the embodiment corresponding to FIG. 6 .
图10中描述的模型训练装置1000可以为训练设备中的模块,训练设备中的处理器可以执行模型训练装置1000所执行的模型训练方法。The model training apparatus 1000 described in FIG. 10 may be a module in the training apparatus, and the processor in the training apparatus may execute the model training method executed by the model training apparatus 1000 .
本申请实施例中还提供一种计算机程序产品，当其在计算机上运行时，使得计算机执行如前述执行设备所执行的步骤，或者，使得计算机执行如前述训练设备所执行的步骤。Embodiments of the present application further provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
本申请实施例中还提供一种计算机可读存储介质，该计算机可读存储介质中存储有用于进行信号处理的程序，当其在计算机上运行时，使得计算机执行如前述执行设备所执行的步骤，或者，使得计算机执行如前述训练设备所执行的步骤。Embodiments of the present application further provide a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
本申请实施例提供的执行设备、训练设备或终端设备具体可以为芯片，芯片包括：处理单元和通信单元，所述处理单元例如可以是处理器，所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令，以使执行设备内的芯片执行上述实施例描述的数据处理方法，或者，以使训练设备内的芯片执行上述实施例描述的数据处理方法。可选地，所述存储单元为所述芯片内的存储单元，如寄存器、缓存等，所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元，如只读存储器（read-only memory，ROM）或可存储静态信息和指令的其他类型的静态存储设备，随机存取存储器（random access memory，RAM）等。The execution device, training device, or terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the data processing methods described in the foregoing embodiments, or so that the chip in the training device performs the data processing methods described in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may alternatively be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
具体的,请参阅图13,图13为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 1400,NPU 1400作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1403,通过控制器1404控制运算电路1403提取存储器中的矩阵数据并进行乘法运算。Specifically, please refer to FIG. 13. FIG. 13 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be represented as a neural network processor NPU 1400, and the NPU 1400 is mounted as a co-processor to the main CPU (Host CPU), tasks are allocated by the Host CPU. The core part of the NPU is the arithmetic circuit 1403, which is controlled by the controller 1404 to extract the matrix data in the memory and perform multiplication operations.
NPU 1400可以通过内部的各个器件之间的相互配合,来实现图6所描述的实施例中提供的模型训练方法,或者对训练得到的模型进行推理。The NPU 1400 can implement the model training method provided in the embodiment described in FIG. 6 through the cooperation between various internal devices, or perform reasoning on the model obtained by training.
其中,NPU 1400中的运算电路1403可以执行获取第一神经网络模型以及对所述第一神经网络模型进行模型训练的步骤。The operation circuit 1403 in the NPU 1400 can perform the steps of acquiring the first neural network model and performing model training on the first neural network model.
更具体的,在一些实现中,NPU 1400中的运算电路1403内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1403是二维脉动阵列。运算电路1403还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1403是通用的矩阵处理器。More specifically, in some implementations, the arithmetic circuit 1403 in the NPU 1400 includes multiple processing units (Process Engine, PE). In some implementations, arithmetic circuit 1403 is a two-dimensional systolic array. The arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 1403 is a general-purpose matrix processor.
举例来说，假设有输入矩阵A，权重矩阵B，输出矩阵C。运算电路从权重存储器1402中取矩阵B相应的数据，并缓存在运算电路中每一个PE上。运算电路从输入存储器1401中取矩阵A数据与矩阵B进行矩阵运算，得到的矩阵的部分结果或最终结果，保存在累加器（accumulator）1408中。For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches matrix A data from the input memory 1401, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in an accumulator 1408.
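The way partial results flow into the accumulator can be mimicked in software by a tiled matrix multiply. This is a software analogy only, not the NPU's actual systolic dataflow; the tile size and names are illustrative:

```python
import numpy as np

def matmul_accumulate(a, b, tile=2):
    """Tiled matrix multiply: each tile of A is multiplied against the
    matching tile of B, and the partial products are accumulated, much
    as partial results accumulate before the final matrix result."""
    acc = np.zeros((a.shape[0], b.shape[1]))
    for k in range(0, a.shape[1], tile):
        acc += a[:, k:k+tile] @ b[k:k+tile, :]  # partial result accumulated
    return acc

a = np.arange(6.0).reshape(2, 3)  # input matrix A
b = np.ones((3, 2))               # weight matrix B
c = matmul_accumulate(a, b)       # output matrix C
```

Each loop iteration stands in for one wave of partial sums leaving the PE array.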
统一存储器1406用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器（Direct Memory Access Controller，DMAC）1405被搬运到权重存储器1402中。输入数据也通过DMAC被搬运到统一存储器1406中。The unified memory 1406 is used to store input data and output data. The weight data is transferred to the weight memory 1402 directly through the direct memory access controller (DMAC) 1405. The input data is also transferred to the unified memory 1406 through the DMAC.
BIU为Bus Interface Unit，即总线接口单元1410，用于AXI总线与DMAC和取指存储器（Instruction Fetch Buffer，IFB）1409的交互。The BIU (Bus Interface Unit), i.e., the bus interface unit 1410, is used for interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1409.
总线接口单元1410(Bus Interface Unit,简称BIU),用于取指存储器1409从外部存储器获取指令,还用于存储单元访问控制器1405从外部存储器获取输入矩阵A或者权重矩阵B的原数据。The bus interface unit 1410 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1409 to obtain instructions from the external memory, and also for the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1406，或将权重数据搬运到权重存储器1402中，或将输入数据搬运到输入存储器1401中。The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1406, or to transfer weight data to the weight memory 1402, or to transfer input data to the input memory 1401.
向量计算单元1407包括多个运算处理单元,在需要的情况下,对运算电路1403的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。The vector calculation unit 1407 includes a plurality of operation processing units, and further processes the output of the operation circuit 1403 if necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/fully connected layer network computation in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
在一些实现中，向量计算单元1407能将经处理的输出的向量存储到统一存储器1406。例如，向量计算单元1407可以将线性函数或非线性函数应用到运算电路1403的输出，例如对卷积层提取的特征平面进行线性插值，再例如对累加值的向量进行处理，用以生成激活值。在一些实现中，向量计算单元1407生成归一化的值、像素级求和的值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路1403的激活输入，例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 1407 can store the processed output vectors into the unified memory 1406. For example, the vector calculation unit 1407 may apply a linear or nonlinear function to the output of the arithmetic circuit 1403, for example, performing linear interpolation on the feature planes extracted by the convolutional layers or, as another example, processing vectors of accumulated values to generate activation values. In some implementations, the vector calculation unit 1407 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vectors can be used as activation inputs to the arithmetic circuit 1403, for example, for use in subsequent layers of the neural network.
控制器1404连接的取指存储器(instruction fetch buffer)1409,用于存储控制器1404使用的指令;The instruction fetch buffer (instruction fetch buffer) 1409 connected to the controller 1404 is used to store the instructions used by the controller 1404;
统一存储器1406,输入存储器1401,权重存储器1402以及取指存储器1409均为On-Chip存储器。外部存储器私有于该NPU硬件架构。The unified memory 1406, the input memory 1401, the weight memory 1402 and the instruction fetch memory 1409 are all On-Chip memories. External memory is private to the NPU hardware architecture.
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述程序执行的集成电路。Wherein, the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above program.
另外需说明的是，以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外，本申请提供的装置实施例附图中，模块之间的连接关系表示它们之间具有通信连接，具体可以实现为一条或多条通信总线或信号线。In addition, it should be noted that the described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Furthermore, in the accompanying drawings of the apparatus embodiments provided in the present application, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
通过以上的实施方式的描述，所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现，当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下，凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现，而且，用来实现同一功能的具体硬件结构也可以是多种多样的，例如模拟电路、数字电路或专用电路等。但是，对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在可读取的存储介质中，如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等，包括若干指令用以使得一台计算机设备（可以是个人计算机，训练设备，或者网络设备等）执行本申请各个实施例所述的方法。From the description of the foregoing embodiments, a person skilled in the art can clearly understand that the present application may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. However, for the present application, a software program implementation is the better implementation in more cases. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of the present application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or some of the procedures or functions described in the embodiments of this application are produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, via infrared, radio, or microwaves). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.

Claims (19)

  1. A model training method, wherein the method comprises:
    obtaining a first neural network model, wherein the first neural network model comprises a convolution-BN layer and a first quantization operator; the convolution-BN layer is configured to perform convolution processing on input data of an Nth batch according to a first weight, normalize the convolution result according to a BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain a second weight; and the convolution-BN layer is further configured to perform convolution processing on input data of an (N+1)th batch according to the second weight; and
    performing model training on the first neural network model to obtain a trained first neural network model.
  2. The method according to claim 1, wherein the method further comprises:
    obtaining a second neural network model, the second neural network model comprising a first convolution layer and a first BN layer; and
    performing BN folding on the first convolution layer and the first BN layer to obtain the first neural network model, wherein the first neural network model comprises the convolution-BN layer obtained by folding the first convolution layer and the first BN layer.
  3. The method according to claim 1, wherein the convolution-BN layer is obtained by folding a convolution layer and a BN layer, the first weight is obtained from the product of the BN coefficient and a target weight, the updated first weight is obtained by multiplying the updated BN coefficient by the target weight, and the target weight is a weight comprised in the convolution layer.
  4. The method according to claim 3, wherein the method further comprises:
    multiplying the BN coefficient by the target weight to obtain a first target tensor, the first target tensor comprising M elements; and replacing the N target elements with the largest absolute values among the M elements of the first target tensor with the largest element among the M-N elements other than the N target elements, to obtain the first weight.
  5. The method according to claim 3 or 4, wherein the first convolution layer in the first neural network model is configured to perform convolution processing on input data of the (N+1)th batch according to the second weight to obtain a convolution result, and divide the convolution result by the updated BN coefficient to obtain a second output.
  6. The method according to any one of claims 1 to 5, wherein the second neural network model further comprises a target activation layer, the target activation layer in the second neural network model is configured to process input data to obtain a third output, the first neural network model further comprises the target activation layer and a second quantization operator, the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization and dequantization on the fourth output according to a second quantization factor.
  7. The method according to claim 6, wherein the third output is a second target tensor, the second target tensor comprises X elements, and the method further comprises:
    obtaining the Y target elements with the largest absolute values among the X elements; and
    replacing the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements, to obtain the second quantization factor.
  8. The method according to any one of claims 1 to 7, wherein the first quantization operator is configured to perform quantization and dequantization on the updated first weight according to a first quantization factor, the trained first neural network model comprises a trained first quantization factor and a trained BN coefficient, and the method further comprises:
    quantizing the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain a third neural network model, wherein the third neural network model comprises a quantized first convolution layer, the quantized first convolution layer is configured to perform convolution processing on input data according to a quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient.
  9. A model training apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain a first neural network model, wherein the first neural network model comprises a convolution-BN layer and a first quantization operator; the convolution-BN layer is configured to perform convolution processing on input data of an Nth batch according to a first weight, normalize the convolution result according to a BN coefficient, update the BN coefficient based on the normalization result, and update the first weight according to the updated BN coefficient; the first quantization operator is configured to perform quantization and dequantization on the updated first weight to obtain a second weight; and the convolution-BN layer is further configured to perform convolution processing on input data of an (N+1)th batch according to the second weight; and
    a model training module, configured to perform model training on the first neural network model to obtain a trained first neural network model.
  10. The apparatus according to claim 9, wherein the obtaining module is configured to: obtain a second neural network model, the second neural network model comprising a first convolution layer and a first BN layer; and perform BN folding on the first convolution layer and the first BN layer to obtain the first neural network model, wherein the first neural network model comprises the convolution-BN layer obtained by folding the first convolution layer and the first BN layer.
  11. The apparatus according to claim 9, wherein the convolution-BN layer is obtained by folding a convolution layer and a BN layer, the first weight is obtained from the product of the BN coefficient and a target weight, the updated first weight is obtained by multiplying the updated BN coefficient by the target weight, and the target weight is a weight comprised in the convolution layer.
  12. The apparatus according to claim 11, wherein the apparatus further comprises:
    a product operation module, configured to multiply the BN coefficient by the target weight to obtain a first target tensor, the first target tensor comprising M elements; and
    an element replacement module, configured to replace the N target elements with the largest absolute values among the M elements of the first target tensor with the largest element among the M-N elements other than the N target elements, to obtain the first weight.
  13. The apparatus according to claim 11 or 12, wherein the first convolution layer in the first neural network model is configured to perform convolution processing on input data of the (N+1)th batch according to the second weight to obtain a convolution result, and divide the convolution result by the updated BN coefficient to obtain a second output.
  14. The apparatus according to any one of claims 9 to 13, wherein the second neural network model further comprises a target activation layer, the target activation layer in the second neural network model is configured to process input data to obtain a third output, the first neural network model further comprises the target activation layer and a second quantization operator, the target activation layer in the first neural network model is configured to process input data to obtain a fourth output, and the second quantization operator is configured to perform quantization and dequantization on the fourth output according to a second quantization factor.
  15. The apparatus according to claim 14, wherein the third output is a second target tensor, the second target tensor comprises X elements, and the obtaining module is configured to obtain the Y target elements with the largest absolute values among the X elements; and
    the element replacement module is configured to replace the Y target elements in the second target tensor with the largest element among the X-Y elements other than the Y target elements, to obtain the second quantization factor.
  16. The apparatus according to any one of claims 9 to 15, wherein the first quantization operator is configured to perform quantization and dequantization on the updated first weight according to a first quantization factor, the trained first neural network model comprises a trained first quantization factor and a trained BN coefficient, and the apparatus further comprises:
    a quantization module, configured to quantize the first neural network model according to the trained first quantization factor and the trained BN coefficient to obtain a third neural network model, wherein the third neural network model comprises a quantized first convolution layer, the quantized first convolution layer is configured to perform convolution processing on input data according to a quantized weight, and the quantized weight is obtained according to the first quantization factor and the trained BN coefficient.
  17. A model training apparatus, wherein the apparatus comprises a memory and a processor; the memory stores code, and the processor is configured to obtain the code and execute the method according to any one of claims 1 to 8.
  18. A computer storage medium, wherein the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method according to any one of claims 1 to 8.
  19. A computer product comprising code, wherein, when the code is executed, the code is used to implement the method according to any one of claims 1 to 8.
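The first quantization operator of claims 1 and 8 performs quantization followed by dequantization ("fake quantization"), so that training is exposed to the rounding error of low-bit arithmetic while every value stays in floating point. The following is a minimal plain-Python sketch; the function name, the uniform symmetric scheme, and the `scale`/`num_bits` parameters are illustrative assumptions, not taken from the claims:

```python
def fake_quantize(weights, scale, num_bits=8):
    """Quantize then dequantize a list of weights with a uniform
    symmetric scheme. Values are scaled to integers, clamped to the
    signed num_bits range, then scaled back, so the output carries the
    quantization error but remains floating point."""
    qmin = -(2 ** (num_bits - 1))        # e.g. -128 for 8 bits
    qmax = 2 ** (num_bits - 1) - 1       # e.g.  127 for 8 bits
    out = []
    for w in weights:
        q = round(w / scale)             # quantization step
        q = max(qmin, min(qmax, q))      # clamp to the integer grid
        out.append(q * scale)            # dequantization step
    return out
```

For example, with `scale = 0.1` and 8 bits, a weight of 0.5 round-trips exactly, while a weight of 100.0 saturates at 127 × 0.1 = 12.7.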
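Claims 2 and 3 fold a convolution layer and a BN layer into a single convolution-BN layer whose first weight is the product of the BN coefficient and the convolution's target weight. The claims do not spell out the BN coefficient itself; the sketch below assumes the standard gamma / sqrt(running_var + eps) form, with per-output-channel lists standing in for weight tensors:

```python
import math

def fold_bn_into_conv(conv_weight, gamma, running_var, eps=1e-5):
    """Fold BN scaling into convolution weights. conv_weight is a list
    of per-output-channel weight lists; gamma and running_var are
    per-channel BN parameters. Each channel's weights are multiplied by
    that channel's BN coefficient to form the folded ("first") weight."""
    folded = []
    for ch, w_ch in enumerate(conv_weight):
        bn_coeff = gamma[ch] / math.sqrt(running_var[ch] + eps)
        folded.append([bn_coeff * w for w in w_ch])
    return folded
```

When the BN coefficient is updated after each batch (as in claim 1), re-running this product yields the updated first weight that the quantization operator then processes.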
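Claims 4, 7, and 12 bound the quantization range by replacing the elements with the largest absolute values by the largest of the remaining elements, so that a handful of extreme values does not stretch the quantization factor. A plain-Python sketch of that replacement; the function and parameter names are illustrative, and tensors are flattened to lists:

```python
def clip_outliers(tensor, n):
    """Replace the n elements with the largest absolute values by the
    largest element among the remaining ones, as in the element
    replacement of claims 4 and 7."""
    if n <= 0:
        return list(tensor)
    # indices of the n largest-absolute-value elements
    order = sorted(range(len(tensor)), key=lambda i: abs(tensor[i]),
                   reverse=True)
    outliers = set(order[:n])
    # largest element among the remaining (len - n) elements
    rest_max = max(tensor[i] for i in range(len(tensor))
                   if i not in outliers)
    return [rest_max if i in outliers else v
            for i, v in enumerate(tensor)]
```

A quantization factor derived from the clipped tensor (for example, from its new maximum absolute value) then covers the bulk of the distribution instead of its outliers.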
PCT/CN2021/133383 2020-11-30 2021-11-26 Model training method and apparatus WO2022111617A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011377406.3A CN114595799A (en) 2020-11-30 2020-11-30 Model training method and device
CN202011377406.3 2020-11-30

Publications (1)

Publication Number Publication Date
WO2022111617A1 true WO2022111617A1 (en) 2022-06-02

Family

ID=81753774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133383 WO2022111617A1 (en) 2020-11-30 2021-11-26 Model training method and apparatus

Country Status (2)

Country Link
CN (1) CN114595799A (en)
WO (1) WO2022111617A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409194A (en) * 2021-06-30 2021-09-17 上海汽车集团股份有限公司 Parking information acquisition method and device and parking method and device
CN115984802A (en) * 2023-03-08 2023-04-18 安徽蔚来智驾科技有限公司 Target detection method, computer-readable storage medium and driving equipment
CN116720563A (en) * 2022-09-19 2023-09-08 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496200B (en) * 2022-09-05 2023-09-22 中国科学院半导体研究所 Neural network quantization model training method, device and equipment
CN115879504B (en) * 2022-12-30 2023-08-29 珠海市欧冶半导体有限公司 Device and method for splitting and quantizing layerrnorm operator
CN117035123B (en) * 2023-10-09 2024-01-09 之江实验室 Node communication method, storage medium and device in parallel training

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059733A (en) * 2019-04-01 2019-07-26 苏州科达科技股份有限公司 The optimization and fast target detection method, device of convolutional neural networks
US20200134448A1 (en) * 2018-10-31 2020-04-30 Google Llc Quantizing neural networks with batch normalization
CN111144457A (en) * 2019-12-13 2020-05-12 北京达佳互联信息技术有限公司 Image processing method, device, equipment and storage medium
US20200151568A1 (en) * 2018-11-12 2020-05-14 Electronics And Telecommunications Research Institute Quantization method and device for weights of batch normalization layer
CN111753862A (en) * 2019-03-29 2020-10-09 北京地平线机器人技术研发有限公司 Method and device for training neural network model and image recognition method
CN111783974A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Model construction and image processing method and device, hardware platform and storage medium


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409194A (en) * 2021-06-30 2021-09-17 上海汽车集团股份有限公司 Parking information acquisition method and device and parking method and device
CN113409194B (en) * 2021-06-30 2024-03-22 上海汽车集团股份有限公司 Parking information acquisition method and device, and parking method and device
CN116720563A (en) * 2022-09-19 2023-09-08 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment
CN116720563B (en) * 2022-09-19 2024-03-29 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment
CN115984802A (en) * 2023-03-08 2023-04-18 安徽蔚来智驾科技有限公司 Target detection method, computer-readable storage medium and driving equipment

Also Published As

Publication number Publication date
CN114595799A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
WO2022111617A1 (en) Model training method and apparatus
WO2022083536A1 (en) Neural network construction method and apparatus
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
WO2021043112A1 (en) Image classification method and apparatus
WO2022001805A1 (en) Neural network distillation method and device
WO2022052601A1 (en) Neural network model training method, and image processing method and device
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
CN111368972B (en) Convolutional layer quantization method and device
CN111401516A (en) Neural network channel parameter searching method and related equipment
WO2021244249A1 (en) Classifier training method, system and device, and data processing method, system and device
CN110222718B (en) Image processing method and device
CN113705769A (en) Neural network training method and device
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
WO2022228425A1 (en) Model training method and apparatus
WO2022012668A1 (en) Training set processing method and apparatus
CN112580720A (en) Model training method and device
WO2021129668A1 (en) Neural network training method and device
CN113592060A (en) Neural network optimization method and device
WO2022179586A1 (en) Model training method, and device associated therewith
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
WO2023274052A1 (en) Image classification method and related device thereof
CN113191241A (en) Model training method and related equipment
CN114359289A (en) Image processing method and related device
CN113011568A (en) Model training method, data processing method and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21897117

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21897117

Country of ref document: EP

Kind code of ref document: A1