CN113222097A - Data processing method and related product - Google Patents

Data processing method and related product

Info

Publication number
CN113222097A
Authority
CN
China
Prior art keywords
quantization
processing
value
data
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010068857.2A
Other languages
Chinese (zh)
Inventor
沈煜
胡英俊
蒋科
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202010068857.2A
Publication of CN113222097A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a data processing method and related products in the field of artificial intelligence, and in particular in the technical field of neural networks. The method comprises the following steps: acquiring a first operation parameter required for executing first operation processing by using a neural network model, the first operation processing being obtained based on network model processing and first quantization processing executed by at least one network layer including an L-th layer of the neural network model; performing fixed-point processing on the first operation parameter to obtain a second operation parameter; and executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer. The data processing device can thereby use fixed-point arithmetic throughout the whole process of executing a prediction processing task with the neural network model, reducing the overhead of computing resources.

Description

Data processing method and related product
Technical Field
The present application relates to the field of neural network technology, and in particular, to a data processing method and related products.
Background
With the development of neural network technology, neural network quantization techniques have appeared. They mainly compress the weights, the activation values (i.e., the output feature map or the input feature map of each layer) and the like in each network layer (such as a convolutional layer or a fully connected layer) of the neural network, reducing the bit width of the weights, the bit width of the activation values and the like, thereby compressing the data volume of the neural network model and reducing the computing resource requirements of the neural network model during prediction.
Low bit quantization is one of the popular areas of current model optimization research. However, in the process of performing the prediction processing task by using the low-bit quantized neural network model, many floating-point operations are usually required, and the overhead of computing resources and storage resources is large. Therefore, new data processing methods need to be researched to reduce resource overhead.
Disclosure of Invention
The embodiment of the application discloses a data processing method and a related product.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes: acquiring a first operation parameter required for executing first operation processing by using a neural network model; the first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an Lth layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0; performing fixed-point processing on the first operation parameter to obtain a second operation parameter; and executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer. Optionally, the input data of the at least one network layer is a to-be-predicted image, or is obtained by processing the to-be-predicted image through at least one second network layer located before the at least one network layer in the neural network model.
The quantization processing performed by the L-th layer may be quantization processing performed by the L-th layer on the activation data it outputs. Optionally, the first operation processing is obtained by combining the operation processing executed by the L-th layer and the quantization processing executed by the L-th layer. Floating point numbers are not included in the second operation parameter. The L-th layer may be any network layer other than the last layer in the neural network model. It is to be understood that the data processing apparatus may perform the operation processing and the quantization processing performed by each layer in the neural network model in a similar manner. In some embodiments, the data processing apparatus can perform whole-flow fixed-point processing, without performing floating-point operations, in the whole flow of performing the prediction processing task by using the neural network model.
In the embodiment of the application, the data processing device can reduce floating-point operations in the process of executing the prediction processing task by using the neural network model, so that the overhead of computing resources and storage resources is low.
In an optional implementation manner, performing fixed-point processing on the first operation parameter to obtain a second operation parameter includes: expanding each floating point number included in the first operation parameter by K times to obtain an expanded value, wherein K is an integer greater than 1; and converting the expanded value into a fixed-point number to obtain the second operation parameter.
Optionally, the first operation processing is linear operation processing, a calculation formula corresponding to the first operation processing is a fraction, and a numerator and a denominator of the fraction both include floating point numbers. It can be understood that each floating point number in the calculation formula corresponding to the first operation processing is expanded by K times, and the numerical value calculated by the calculation formula is unchanged. Since each floating point number is expanded by K times before the floating point number in the first operation parameter is converted into the fixed point number, the precision of the first operation processing can be ensured.
In the implementation mode, the floating point number in the first operation parameter is expanded by K times, and then the floating point number expanded by K times is converted into a fixed point number, so that the precision of the first operation processing is ensured.
In an alternative implementation, the first operation processing is obtained by combining network model processing and the first quantization processing performed by at least two adjacent network layers including the L-th layer.
In an alternative implementation, the quantization parameter is a parameter determined based on a quantization maximum and a quantization minimum; the quantization maximum value and the quantization minimum value are obtained by clustering a plurality of numerical values included in the data to be quantized of the L-th layer.
In the implementation mode, the distribution situation of the data to be quantized can be more accurately expressed by using limited quantization steps, and the quantization precision is further improved.
In an optional implementation manner, before the obtaining of the first operation parameter required for performing the first operation processing by using the neural network model, the method further includes: clustering a plurality of numerical values included in the data to be quantized of the L-th layer to obtain a clustering result; obtaining the quantization maximum value and the quantization minimum value based on the clustering result; determining the quantization parameter based on the quantized maximum value and the quantized minimum value, wherein the quantization parameter is used to quantize the values in the activation data into integers represented by M bits.
In the implementation mode, a quantization maximum value and a quantization minimum value are obtained based on a clustering result, and then quantization parameters are determined; the quantization parameter can be determined quickly and accurately.
In an optional implementation manner, the obtaining a quantized maximum value and a quantized minimum value based on the clustering result includes: taking a maximum value of at least two clustering centers included in the clustering result as the quantization maximum value, and taking a minimum value of the at least two clustering centers as the quantization minimum value.
In the implementation mode, the quantization maximum value and the quantization minimum value can be accurately and quickly obtained according to at least two clustering centers included in the clustering result.
In an optional implementation manner, the determining the quantization parameter based on the quantization maximum and the quantization minimum includes: taking the quantization minimum value as a first term and the quantization maximum value as a last term to obtain a first arithmetic progression comprising N numerical values, wherein N is an integer greater than 1; using a common factor of the N values included in the first arithmetic progression as the quantization parameter.
In this implementation, a common factor of respective numerical values included in the first arithmetic progression is used as a quantization parameter so as to quantize the activation data of the L-th layer into an integer using the quantization parameter.
In an optional implementation, the method further includes: and determining the bit number M corresponding to the first quantization processing based on the distribution dispersion of the multiple numerical values included in the data to be quantized.
In the implementation mode, the bit number M corresponding to quantization processing executed by the L-th layer is determined based on the distribution dispersion of a plurality of numerical values included in the data to be quantized; the bit number occupied by the quantized data is reduced, and further the computing resources are saved.
In an optional implementation manner, before the obtaining of the first operation parameter required for performing the first operation processing by using the neural network model, the method further includes: determining a first reference value based on weight data of at least one network layer of the neural network model; determining a plurality of value intervals based on the first reference value, wherein each value interval in the value intervals corresponds to an integer which can be represented by F bits, and F is an integer which is greater than 1 and less than 8; and performing second quantization processing on the numerical values in the weight data based on the plurality of numerical value intervals to obtain an integer represented by F bits.
In an optional implementation manner, before determining the plurality of value intervals based on the first reference value, the method further includes: determining the number of bits F of the second quantization processing based on distribution dispersion of a plurality of values included in the weight data; the determining a plurality of value intervals based on the first reference value comprises: determining the plurality of value intervals based on the first reference value and the number of bits F.
In this implementation, the number of bits F for quantization processing is determined based on the distribution dispersion of the plurality of values included in the weight data; the bit number occupied by the quantized weight data is reduced, and further the computing resources are saved.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including: an acquisition unit configured to acquire a first operation parameter required to execute a first operation process using a neural network model; the first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an Lth layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0; the conversion unit is used for carrying out fixed-point processing on the first operation parameter to obtain a second operation parameter; and the processing unit is used for executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer. Optionally, the input data of the at least one network layer is a to-be-predicted image, or is obtained by processing the to-be-predicted image through at least one second network layer located before the at least one network layer in the neural network model.
In an optional implementation manner, the conversion unit is specifically configured to expand each floating point number included in the first operation parameter by K times to obtain an expanded value, where K is an integer greater than 1; and convert the expanded value into a fixed-point number to obtain the second operation parameter.
In an alternative implementation, the first operation processing is obtained by combining network model processing and the first quantization processing performed by at least two adjacent network layers including the L-th layer.
In an alternative implementation, the quantization parameter is a parameter determined based on a quantization maximum and a quantization minimum; the quantization maximum value and the quantization minimum value are obtained by clustering a plurality of numerical values included in data to be quantized output by the L-th layer of the neural network model.
In an optional implementation, the apparatus further comprises: the clustering unit is used for clustering a plurality of numerical values included in the data to be quantized of the L-th layer to obtain a clustering result; a first determining unit, configured to obtain the maximum quantization value and the minimum quantization value based on the clustering result; the first determining unit is further configured to determine the quantization parameter based on the quantization maximum value and the quantization minimum value, where the quantization parameter is used to quantize a value in the activation data to an integer represented by M bits.
In an optional implementation manner, the first determining unit is specifically configured to use a maximum value of at least two clustering centers included in the clustering result as the quantization maximum value, and use a minimum value of the at least two clustering centers as the quantization minimum value.
In an optional implementation manner, the first determining unit is specifically configured to obtain a first arithmetic progression including N numerical values by using the quantization minimum value as the first term and the quantization maximum value as the last term, where N is an integer greater than 1; and to use a common factor of the N values included in the first arithmetic progression as the quantization parameter.
In an optional implementation manner, the first determining unit is further configured to determine, based on distribution dispersion of a plurality of numerical values included in the data to be quantized, a number of bits M corresponding to the first quantization processing.
In an optional implementation, the apparatus further comprises: a second determination unit, further configured to determine a first reference value based on weight data of at least one network layer of the neural network model; determining a plurality of value intervals based on the first reference value, wherein each value interval in the value intervals corresponds to an integer which can be represented by F bits, and F is an integer which is greater than 1 and less than 8; and the quantization unit is used for carrying out second quantization processing on the numerical values in the weight data based on the numerical value intervals to obtain an integer represented by F bits.
In an optional implementation manner, the second determining unit is further configured to determine the number F of bits for the second quantization processing based on distribution dispersion of multiple values included in the weight data; the second determining unit is specifically configured to determine the plurality of numerical value intervals based on the first reference value and the number of bits F.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the first aspect and any one of the alternative implementations as described above when the program is executed.
In a fourth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored on a memory through the data interface to perform the method according to the first aspect and any optional implementation manner.
In a fifth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, where the computer program includes program instructions, and when the program instructions are executed by a processor, the processor is caused to execute the method of the first aspect and any optional implementation manner.
In a sixth aspect, the present application provides a computer program product, which includes program instructions, and when executed by a processor, causes the processor to execute the method of the first aspect and any optional implementation manner.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of another data processing method provided by an embodiment of the present application;
fig. 3 is a flowchart of another data processing method provided in the embodiment of the present application;
FIG. 4 is a flow chart of another data processing method provided in the embodiments of the present application;
FIG. 5 is a flow chart of another data processing method provided in the embodiments of the present application;
FIG. 6 is a flow chart of another data processing method provided in the embodiments of the present application;
FIG. 7A is a diagram illustrating quantization of activation data according to an embodiment of the present application;
FIG. 7B is a diagram illustrating an inverse quantization of activation data according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and "third," etc. in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a list of steps or elements. A method, system, article, or apparatus is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, system, article, or apparatus. "and/or" is used to indicate the selection of one or both between two objects to which it is connected. For example "A and/or B" means A, B or A + B.
In the process of executing the prediction processing task by using the neural network model after low-bit quantization, a large number of floating-point operations is generally required, and the overhead of computing resources and storage resources is large. The embodiment of the application provides a data processing method which can reduce or even eliminate floating-point operations while maintaining high prediction precision. The data processing method provided by the embodiment of the application can be applied to image text recognition scenes, video text recognition scenes, target detection scenes and the like. The following briefly introduces the application of the data processing method provided by the embodiment of the present application in an image text recognition scene, a video text recognition scene, and a target detection scene respectively.
Image text recognition 1: the terminal device collects an image including one or more characters, performs text recognition on the image, and displays the recognized characters. For example, a user takes an image of a sign using a mobile phone, and the mobile phone performs text recognition on the image and displays the text on the sign. For another example, the user uses a mobile phone to capture an image containing a segment of English text, and the mobile phone performs text recognition on the image and displays the Chinese text obtained by translating the English.
Image text recognition 2: the terminal equipment sends the acquired image to a server; the server performs text recognition on the image and sends a text recognition result obtained by recognition to the terminal equipment; and the terminal equipment receives and displays the text recognition result. For example, a monitoring device on a road acquires an image including a license plate number of a vehicle and sends the image to a server, and the server identifies the license plate number in the image. For another example, the user uses a mobile phone to take an image of a signboard and sends the image to the server; the server performs text recognition on the image to obtain a text recognition result, and sends the text recognition result to the terminal equipment; the terminal device displays the text recognition result.
Video text recognition 1: the terminal equipment collects a section of video and respectively performs text recognition on each frame of image in the video. For example, a user uses a mobile phone to shoot a video, wherein a plurality of frames of images in the video comprise at least one character; and the mobile phone respectively performs text recognition on each frame of image in the video to obtain and display a text recognition result.
Video text recognition 2: the method comprises the steps that a piece of video is collected by terminal equipment and sent to a server; and the server respectively performs text recognition on each frame of image in the video to obtain a text recognition result. For example, a monitoring device on a road acquires a section of video, wherein at least one frame of image in the section of video comprises a license plate number; the monitoring equipment sends the video to a server; the server performs text recognition on each frame of image in the video to obtain at least one license plate number.
A target detection scene: the data processing device extracts the region where the target object is located from the image to be processed. For example, the automatic driving device extracts the area where the pedestrian is located from the acquired image.
In the above scenario, the data processing apparatus may reduce floating point number operations during the execution of the prediction processing task, thereby saving computational resources and ensuring prediction accuracy.
Referring to fig. 1, fig. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
101. the data processing apparatus acquires a first operation parameter required to execute a first operation process using the neural network model.
The first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an L-th layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0. The network model processing may include at least one of convolution processing, pooling processing, batch normalization processing, and the like, and the present application is not limited thereto. The data processing device can be a mobile phone, a tablet personal computer, wearable equipment, a notebook computer, a desktop computer or other terminal equipment, and can also be a server. The first quantization processing may be quantization processing performed by the data processing apparatus on activation data obtained by the network model processing performed by the at least one network layer including the L-th layer of the neural network model. The activation value (i.e., activation data) of one network layer in the neural network model may refer to the output data (i.e., output feature map) of the network layer, and the quantized output data may be used as the input data (i.e., input feature map) of the next network layer. Optionally, the first operation processing is obtained by combining the network model processing executed by the L-th layer and the first quantization processing. Optionally, the first operation processing is obtained by combining network model processing executed by at least two adjacent network layers including the L-th layer and the first quantization processing. Optionally, the second operation parameter does not include a floating point number. The L-th layer may be any network layer except the last layer in the neural network model. It should be understood that the data processing apparatus may perform the network model processing and the quantization processing performed by each layer in the neural network model in a similar manner. In some embodiments, the data processing apparatus can perform whole-flow fixed-point processing, without performing floating-point operations, in the whole flow of performing the prediction processing task by using the neural network model.
The neural network model may be a deep neural network model, such as a convolutional neural network, a cyclic neural network, a Long Short-Term Memory (LSTM), and the like, which is not limited in this application. In some embodiments, the neural network model is derived by training. For example, the neural network model is a prediction model trained using training data. The weights in the neural network model may be unquantized weights or at least a portion of the weights in the neural network model may be quantified. For example, the weights in the neural network model are each quantized to integers represented by less than 8 bits. In some embodiments, the data processing apparatus may obtain a trained neural network model; quantifying the weight in the neural network model to obtain a quantified neural network model; the method flow in fig. 1 is performed based on the quantized neural network model.
102. And performing fixed-point processing on the first operation parameter to obtain a second operation parameter.
Optionally, an implementation manner of performing fixed-point processing on the first operation parameter to obtain the second operation parameter is as follows: expanding each floating point number included in the first operation parameter by K times to obtain an expanded value, wherein K is an integer greater than 1; and converting the expanded value into a fixed-point number to obtain the second operation parameter. Optionally, the step of converting the expanded value into a fixed-point number to obtain the second operation parameter may be: converting the expanded value into an integer by rounding. For example, x is a floating point number in the first operation parameter, and x is multiplied by K to obtain x × K; x × K is then converted into a fixed-point number to obtain round(x × K). The round function rounds a number to a specified number of digits, and its form is: round(number to be rounded, number of remaining digits). For example, round(3.1415926, 2) = 3.14 and round(3.1415926, 3) = 3.142. In some embodiments, the round function may round a floating point number to an integer. For example, round(3.1415926, 0) = 3 and round(7.76) = 8. It will be appreciated that the data processing apparatus may convert each floating-point number included in the first operation parameter into a fixed-point number in a similar manner.
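As a minimal sketch of this conversion (not the patented implementation; the function name, the parameter names and the choice K = 2^16 are assumptions made here for illustration), the K-fold expansion and rounding could look as follows in Python:

    # Illustrative sketch: expand each floating-point parameter by K and round it,
    # so that subsequent arithmetic can stay in fixed point.
    def to_fixed_point(params, K=2 ** 16):          # K is an assumed expansion factor
        """Convert floating-point operation parameters to integers (fixed-point values)."""
        return {name: round(value * K) for name, value in params.items()}

    first_params = {"x": 0.37, "y": 1.25, "Q": 0.1}   # assumed floating-point first operation parameters
    second_params = to_fixed_point(first_params)      # e.g. {'x': 24248, 'y': 81920, 'Q': 6554}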
103. And executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer.
The input data of the at least one network layer is a to-be-predicted image, or is obtained by processing the to-be-predicted image through a network layer located before the at least one network layer in the neural network model. For example, the first operation processing is: Qout = ceil[(x × A + y × B - Q/2)/Q]; the first operation parameter comprises x, y, Q, A and B, wherein x, y and Q are floating point numbers; the second operation parameter comprises round(x × K), round(y × K), round(Q/2 × K), round(Q × K), A and B; based on the second operation parameter, the first operation processing is executed as: Qout = ceil[(round(x × K) × A + round(y × K) × B - round(Q/2 × K))/round(Q × K)]. ceil is the name of a function used to return the smallest integer greater than or equal to a specified expression; ceil(x) is the smallest integer greater than or equal to x.
In the embodiment of the application, the data processing device can reduce floating point number operation in the process of executing the prediction processing task by using the neural network model, and the expenditure of computing resources and storage resources is low.
The foregoing embodiments do not describe the implementation of step 102 and step 103 in detail, and the application of step 102 and step 103 in some embodiments is described below.
Example one
The network model processing performed by the L-th layer (an Eltwise layer) in the neural network model is: out = x × A + y × B, wherein x and y are floating point numbers, and A and B are the two input feature maps of the Eltwise layer; the quantization processing performed by this L-th layer is: Qout = ceil[(Qin - Q/2)/Q], where Qin denotes the activation data output by the L-th layer, i.e., the quantization input; Qout denotes the quantized activation data, i.e., the quantization output, and Q denotes the quantization parameter; the first operation processing executed by the L-th layer is therefore: Qout = ceil[(x × A + y × B - Q/2)/Q]. For example, the data processing apparatus converts the first operation parameter required for the L-th layer to perform the first operation processing into fixed-point numbers to obtain the second operation parameter, and performs the first operation processing based on the second operation parameter according to the following formula: Qout = ceil[(round(x × K) × A + round(y × K) × B - round(Q/2 × K))/round(Q × K)].
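The following hedged sketch works example one through with small made-up values for x, y, Q, K and the two feature maps; only the structure of the computation follows the description above, while the numbers themselves are illustrative assumptions:

    import numpy as np

    x, y, Q, K = 0.3, 0.7, 0.05, 2 ** 16
    A = np.array([[1, 2], [3, 4]], dtype=np.int64)   # assumed input feature map 1
    B = np.array([[5, 6], [7, 8]], dtype=np.int64)   # assumed input feature map 2

    # Second operation parameters: K-expanded, rounded copies of x, y, Q/2 and Q.
    x_k, y_k = round(x * K), round(y * K)
    half_q_k, q_k = round(Q / 2 * K), round(Q * K)

    # Qout = ceil((x*A + y*B - Q/2)/Q) evaluated with integers only.
    num = x_k * A + y_k * B - half_q_k
    Qout = -(-num // q_k)                            # integer ceiling division, no floating point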
Example two
The network model processing (a convolution operation) performed by the L-th layer (a convolutional layer) in the neural network model is: Conv_out = W × A, where W is the weight matrix (weight) of the L-th layer, and A is the input feature map (i.e., the activation) of the L-th layer. Assume that the current quantization adopts 2W4A, that is, the bit width of each element in the quantized weight matrix is 2 bits, and the bit width of each element in the quantized input feature map is 4 bits. W = alpha × B and A = Qparam × M, where the elements of B take values in {-3, -1, 1, 3} and the elements of M take values in {0 - shift, 1 - shift, …, 15 - shift}; alpha (a floating point number) represents the common factor corresponding to the weight matrix, B represents the quantized weight matrix, Qparam (a floating point number) represents the common factor corresponding to the input feature map, and M represents the quantized input feature map. Since Conv_out = W × A = alpha × Qparam × (B × M) = scale × (B × M), the convolution operation can extract the common factor scale (i.e., alpha × Qparam), and the fixed-point convolution over B and M is performed during the hardware operation. The calculation formula used by the data processing device to quantize the activation data output by the L-th layer is: Qout = ceil[(Qin - Q/2)/Q], where Q denotes the quantization parameter of the L-th layer. It should be understood that the calculation formula of the first operation processing performed by this L-th layer is: Qout = ceil[(scale × (B × M) - Q/2)/Q]. For example, the data processing apparatus converts all floating point numbers in the first operation parameter required by the L-th layer into fixed-point numbers to obtain the second operation parameter, and the calculation formula corresponding to the first operation processing executed based on the second operation parameter is: Qout = ceil[(round(scale × K) × (B × M) - round(Q/2 × K))/round(Q × K)].
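Under the same caveat, a sketch of example two could look like this; alpha, Qparam, Q, K and the matrices B and M are all assumed values chosen only to show the shape of the fixed-point computation:

    import numpy as np

    alpha, Qparam, Q, K = 0.02, 0.1, 0.05, 2 ** 16
    B = np.array([[1, -3], [3, -1]], dtype=np.int64)   # 2-bit quantized weights, values in {-3, -1, 1, 3}
    M = np.array([[4, 0], [7, 15]], dtype=np.int64)    # 4-bit quantized input feature map

    scale = alpha * Qparam                              # common factor extracted from the convolution
    scale_k = round(scale * K)
    half_q_k, q_k = round(Q / 2 * K), round(Q * K)

    conv_fixed = B @ M                                  # pure fixed-point convolution result (B*M)
    num = scale_k * conv_fixed - half_q_k
    Qout = -(-num // q_k)                               # Qout = ceil((scale*(B*M) - Q/2)/Q) with integers only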
EXAMPLE III
The network model processing performed by the L-th layer (a convolutional layer) in the neural network model is: Conv_out = W × A, where W is the weight matrix (weight) of the L-th layer, A denotes the input feature map of the L-th layer, W = alpha × B, A = Qparam × M, alpha (a floating point number) denotes the common factor corresponding to the weight matrix, B denotes the quantized weight matrix, Qparam (a floating point number) denotes the common factor corresponding to the input feature map, and M denotes the quantized input feature map. Conv_out = W × A = alpha × Qparam × (B × M) = scale × (B × M). Assuming that a part of the network structure in the neural network model is convolutional layer (i.e., the L-th layer, conv) -> batch normalization layer (i.e., BN layer) -> activation layer (i.e., relu layer), the network model processing obtained by combining the operation processing executed by the convolutional layer, the (linear) operation processing executed by the batch normalization layer, and the (linear) operation processing executed by the activation layer is: BNout = scale1 × A + bias, where scale1 is the merged floating point number (including alpha, Qparam, etc.), and A is the output result of the convolutional layer. Next, BNout is quantized. The quantization function may be: Qout = ceil[(Qin - Q/2)/Q], where Qin is BNout; Q/2 can be merged into bias, and finally the first operation processing executed by conv -> BN -> relu can be simplified as: Qout = ceil[(scale1 × A + bias)/Q], where scale1, bias and Q are floating point numbers. The floating point numbers are eliminated next. For example, the data processing apparatus converts all floating point numbers in the first operation parameter required by the L-th layer into fixed-point numbers to obtain the second operation parameter, and the calculation formula corresponding to the first operation processing executed based on the second operation parameter is: Qout = ceil[(round(N × scale1) × A + round(N × bias))/round(N × Q)] = ceil[(scale_new × A + bias_new)/Q_new], where scale_new, bias_new and Q_new are fixed-point numbers.
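A corresponding sketch of example three, again with assumed values for scale1, bias, Q, N and the convolution output A, shows how the merged conv -> BN -> relu operation and the quantization reduce to one integer expression:

    import numpy as np

    scale1, bias, Q, N = 0.013, -0.4, 0.05, 2 ** 16
    A = np.array([[12, 0], [5, 31]], dtype=np.int64)    # assumed fixed-point convolution output

    scale_new = round(N * scale1)                        # fixed-point merged scale
    bias_new = round(N * bias)                           # fixed-point merged bias (Q/2 already folded in)
    q_new = round(N * Q)                                 # fixed-point quantization parameter

    num = scale_new * A + bias_new
    Qout = -(-num // q_new)                              # Qout = ceil((scale1*A + bias)/Q) with integers only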
When the data processing device executes the prediction processing task by using the neural network model, floating point operations such as batch normalization, scaling operation and the like are performed. In addition, the quantization process also introduces floating point numbers, such as quantization parameter Q. In consideration of the complexity, the area, the computing resources and other factors of the implementation of the neural network hardware accelerator, the method provided by the embodiment of the application can reduce or even eliminate all floating point parameters and floating point operations in the neural network model, and can improve the computing efficiency.
The following takes the convolutional layer in the neural network model as an example, and further describes how to implement the full-flow fixed-point processing when the neural network model is used to execute the prediction processing task.
Fig. 2 is a flowchart of another data processing method according to an embodiment of the present application. As shown in fig. 2, the method may include:
201. and the data processing device quantizes the input characteristic diagram to be processed of the L-th layer of the neural network model to obtain a parameter 2 and a matrix 2.
For example, A represents the input feature map (i.e., activation data) to be processed at the L-th layer, where A = Qparam × M, Qparam (a floating point number) represents the common factor corresponding to the input feature map (a matrix) and corresponds to parameter 2, and M represents the quantized input feature map, i.e., matrix 2. In practical applications, the data processing apparatus has completed the quantization of the weights of the neural network model before executing the method flow in fig. 2. In fig. 2, parameter 1 is the common factor corresponding to the weight matrix of the L-th layer, and matrix 1 is the quantized weight matrix of the L-th layer. For example, W is the weight matrix of the L-th layer before quantization, W = alpha × B, alpha (a floating point number) represents the common factor corresponding to the quantized weight matrix B, and B represents the quantized weight matrix.
202. And calculating the product of the parameter 1 and the parameter 2 to obtain the parameter 3.
203. And calculating the product of the matrix 1 and the matrix 2 to obtain a matrix 3.
Step 202 and step 203 may be understood as steps where the L-th layer (i.e., the convolutional layer) performs a convolution operation.
204. And executing target operation processing to obtain a processing result of the L-th layer based on the parameter 3 and the matrix 3.
Optionally, the calculation formula corresponding to the target operation processing is: Qout = ceil[(round(scale × K) × (B × M) - round(Q/2 × K))/round(Q × K)], where scale denotes parameter 3, (B × M) denotes matrix 3, and Q denotes the quantization parameter of the L-th layer. It should be appreciated that when the data processing apparatus performs the prediction processing task using the neural network model, each layer may perform a method flow similar to that in fig. 2, which can reduce the overhead of computing resources and storage resources.
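For concreteness, the per-layer flow of fig. 2 (steps 201-204) might be sketched as below; the way the activation map is split, the common factors and all numeric values are assumptions used only to trace the steps:

    import numpy as np

    def quantize_activation(a_float, qparam):
        """Step 201: split the activation map into a common factor and an integer matrix."""
        return qparam, np.round(a_float / qparam).astype(np.int64)

    param1 = 0.02                                            # common factor of the quantized weight matrix
    matrix1 = np.array([[1, -3], [3, -1]], dtype=np.int64)   # quantized weight matrix

    a_float = np.array([[0.4, 0.0], [0.7, 1.5]])
    param2, matrix2 = quantize_activation(a_float, qparam=0.1)   # step 201

    param3 = param1 * param2                                 # step 202: product of the common factors
    matrix3 = matrix1 @ matrix2                              # step 203: fixed-point matrix product

    Q, K = 0.05, 2 ** 16                                     # step 204: target operation in fixed point
    num = round(param3 * K) * matrix3 - round(Q / 2 * K)
    Qout = -(-num // round(Q * K))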
The implementation of the data processing method in the foregoing embodiment requires the use of quantization parameters. In practical applications, before implementing the method in the foregoing embodiments by using a neural network model, the data processing apparatus needs to determine quantization parameters required by each layer in the neural network model to perform quantization processing, and store the quantization parameters. The following takes the L-th layer of the neural network model as an example, and how to determine quantization parameters required by the network layer in the neural network model to perform quantization processing is described.
In some embodiments, the quantization parameter is a parameter determined based on a quantization maximum value and a quantization minimum value; the maximum quantization value and the minimum quantization value are obtained by clustering a plurality of numerical values included in data to be quantized output from the L-th layer of the neural network model.
Fig. 3 is a flowchart of another data processing method according to an embodiment of the present application. The data processing device executes the method flow in fig. 3 to obtain the quantization parameter required by the quantization processing executed at the L-th layer of the neural network model, and stores the quantization parameter.
As shown in fig. 3, the method may include:
301. the data processing device acquires data to be quantized output by the L-th layer of the neural network model.
The data to be quantized includes at least one value represented by M or more bits, L is an integer greater than 0, and M is an integer greater than 1. The above M may be 6, 8, 12, 16, etc. In practical applications, the data processing apparatus may execute the method flow of fig. 3 to determine and store quantization parameters required for each layer of the neural network model to perform quantization processing before executing the method flow of fig. 1 to implement the prediction processing task. Optionally, the data to be quantized is activation data output by an L-th layer of the neural network model when the data processing apparatus inputs a training sample to the neural network model for prediction processing.
302. And clustering a plurality of numerical values included in the data to be quantized to obtain a clustering result.
303. And taking the maximum value of at least two clustering centers included in the clustering result as a quantization maximum value, and taking the minimum value of the at least two clustering centers as a quantization minimum value.
304. And determining a first arithmetic progression comprising N values, wherein the quantization minimum value is the first term and the quantization maximum value is the last term of the first arithmetic progression.
N is 2 raised to the power M. In some embodiments, the data processing apparatus, prior to performing step 304, may perform the following operation: acquiring the number M of bits to be adopted for quantizing the values included in the data to be quantized. The number of bits M may be preset; for example, the data processing apparatus or another apparatus may store in advance the number of bits to be used for quantizing the values included in the data to be quantized. Accordingly, when the quantization parameter needs to be determined, the stored number of bits is retrieved from the memory. In other embodiments, the number M of bits may also be determined based on the data to be quantized, for example, the number M of bits to be used for quantizing the values included in the data to be quantized is determined based on the distribution dispersion of the plurality of values included in the data to be quantized. Alternatively, the number of bits M may be determined based on other distribution characteristics or parameters of the data to be quantized.
305. And taking the common factor of each item in the first arithmetic progression as the quantization parameter of the L-th layer of the neural network model, and storing the quantization parameter.
For example, the quantization maximum value (i.e., max) is 1.3, the quantization minimum value (i.e., min) is -0.2, and M is 4; a first arithmetic progression {-0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3} having -0.2 as the first term and 1.3 as the last term and comprising 16 values is determined; the common factor 0.1 of each term in the first arithmetic progression is extracted as the quantization parameter.
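A hedged sketch of steps 301 to 305 is given below; the use of scikit-learn's KMeans, the number of clusters and the synthetic activation values are assumptions for illustration. The sketch returns the common step of the progression as the quantization parameter, which in the worked example above coincides with the common factor 0.1:

    import numpy as np
    from sklearn.cluster import KMeans

    def compute_quantization_parameter(values, M=4, n_clusters=8):
        # Steps 302-303: cluster the values and take the extreme cluster centres.
        centres = KMeans(n_clusters=n_clusters, n_init=10).fit(
            values.reshape(-1, 1)).cluster_centers_.ravel()
        q_min, q_max = centres.min(), centres.max()
        # Step 304: arithmetic progression of 2**M terms from q_min to q_max.
        progression = np.linspace(q_min, q_max, 2 ** M)
        # Step 305: the common step of the progression serves as the quantization parameter.
        return progression[1] - progression[0]

    activations = np.random.uniform(-0.2, 1.3, size=1000)     # assumed activation data
    Q = compute_quantization_parameter(activations)            # of the same order as the worked value 0.1 above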
It is understood that steps 301 to 305 are a method flow for the data processing apparatus to determine the quantization parameter of the L-th layer in the neural network model. In some embodiments, the quantization parameters of each network layer in the neural network model are different, and the data processing apparatus may use a similar process flow to determine the quantization parameters of each layer in the neural network model and store the quantization parameters of each layer, so as to quantize the feature map (i.e., activation data) output by each layer by using the quantization parameters of each layer when performing the prediction processing task by using the neural network model.
The foregoing embodiments do not describe an implementation of determining the number M of bits to be used for quantizing the value included in the above data to be quantized, and are introduced below.
Exemplarily, the data processing apparatus determines the number of bits M corresponding to the quantization processing executed by the L-th layer based on the variance of the values included in the data to be quantized; the number of bits M corresponding to the quantization performed by the L-th layer is positively correlated with this variance. For example, the data processing apparatus may compare the variance with one or more thresholds to determine the bit number M: when the variance of the values included in the data to be quantized is smaller than a first threshold, it determines that the bit number M corresponding to the quantization processing performed by the L-th layer is 2; and when the variance is not less than the first threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 4. The first threshold may be 0.016, 0.2, 1.2, 10, 100, etc., and the present application is not limited thereto. For another example, a plurality of segment intervals may be set: when the variance of the values included in the data to be quantized is smaller than a second threshold and larger than a third threshold, the data processing apparatus determines that the number of bits M corresponding to the quantization processing performed by the L-th layer is 4; when the variance is not less than the second threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 6; and when the variance is not greater than the third threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 2. The third threshold may be 0.016, 0.2, 1.2, etc., and the second threshold may be 10, 25, etc., which is not limited in the present application.
Illustratively, the data processing apparatus determines, based on the range of the values included in the data to be quantized, the number of bits M corresponding to the quantization processing performed by the L-th layer; the number of bits M corresponding to the quantization processing performed by the L-th layer is positively correlated with this range. The range of the values included in the data to be quantized is the difference between the maximum value and the minimum value included in the data to be quantized. For example, when the range of the values included in the data to be quantized is smaller than a fourth threshold, the data processing apparatus determines that the number of bits M corresponding to the quantization processing executed by the L-th layer is 3; and when the range is not less than the fourth threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 4. The fourth threshold may be 1.2, 10, 100, etc., and the application is not limited thereto. For another example, when the range of the values included in the data to be quantized is smaller than a fifth threshold and larger than a sixth threshold, the data processing apparatus determines that the number of bits M corresponding to the quantization processing performed by the L-th layer is 4; when the range is not less than the fifth threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 6; and when the range is not greater than the sixth threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 2. The sixth threshold may be 0.016, 0.2, 1.2, etc., and the fifth threshold may be 10, 25, etc., which is not limited in the present application.
Exemplarily, the data processing apparatus determines the number of bits M corresponding to the quantization processing executed by the L-th layer based on the standard deviation of the values included in the data to be quantized; the number of bits M corresponding to the quantization performed by the L-th layer is positively correlated with this standard deviation. For example, when the standard deviation of the values included in the data to be quantized is smaller than a seventh threshold, the data processing apparatus determines that the number of bits M corresponding to the quantization processing executed by the L-th layer is 3; and when the standard deviation is not less than the seventh threshold, it determines that the bit number M corresponding to the quantization processing executed by the L-th layer is 4. The seventh threshold may be 0.04, 0.2, 1.2, 10, etc., and the present application is not limited thereto.
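The dispersion-based rules above can be summarised in a small selector; the concrete thresholds below are assumptions (the description only requires M to grow with the dispersion), and variance is used here as the dispersion measure:

    import numpy as np

    def choose_bit_number(values, third_threshold=0.2, second_threshold=10.0):
        var = np.var(values)
        if var <= third_threshold:
            return 2          # tightly clustered values: few quantization steps suffice
        if var < second_threshold:
            return 4
        return 6              # widely spread values: more quantization steps

    M = choose_bit_number(np.random.normal(0.0, 0.5, size=1000))   # variance around 0.25, so M = 4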
In this implementation, the data processing apparatus determines, according to a distribution of values included in the data to be quantized, a number of bits M corresponding to the quantization process executed by the L-th layer; the bits used for quantization can be reduced, thereby reducing the computing resources consumed for prediction processing by the neural network model.
In some embodiments, the weights in the neural network model are quantized weights. In practical applications, the data processing apparatus may also quantize the weights in the trained neural network model to compress the neural network model. The following describes a scheme for quantizing the weights in the trained neural network model by the data processing device.
Fig. 4 is a flowchart of another data processing method according to an embodiment of the present application. The data processing apparatus executing the method flow of fig. 4 may quantify the weights in the neural network model. As shown in fig. 4, the method may include:
401. based on the weight data of the neural network model, a first reference value is determined.
Alternatively, the determining the first reference value based on the weight data of the neural network model may be calculating an average value of values in the weight data, and using the average value as the first reference value. The weight data includes weights of at least one network layer of the neural network model. Illustratively, the weight data is a weight of a L-th layer of the neural network model. Illustratively, the weight data includes weights of at least two network layers of the neural network model. In some embodiments, the data processing apparatus may quantize the weights of one or more network layers at a time; retraining (i.e., fine-tuning) the neural network model while keeping the quantized weights unchanged; then, continuing to quantize the weights of the previously unquantized one or more network layers; and so on to quantify all weights in the neural network model. It will be appreciated that fig. 4 is an example of quantifying partial weights in a neural network model, in which the data processing apparatus may quantify network layers in a similar manner.
402. And determining R value intervals based on the first reference value.
Each value interval in the R value intervals corresponds to an integer which can be represented by F bits, where F is an integer greater than 1 and less than 8, and R is an integer greater than 1. Optionally, before performing step 402, the data processing apparatus may perform the following operations: determining the bit number F of the quantization processing based on the distribution dispersion of the plurality of values included in the weight data; the number of bits is positively correlated with the distribution dispersion. For example, if the distribution dispersion of the plurality of values included in the weight data is not greater than a first dispersion threshold, the number F of bits for quantization processing is determined to be 2; otherwise, the bit number F of the quantization processing is determined to be 4. Determining the R value intervals based on the first reference value may be: determining the R value intervals based on the first reference value and the bit number F, where R is 2 raised to the power F.
403. And quantizing the values in the weight data by adopting F bits based on the R value intervals.
For example, alpha1 is the first reference value, W is any weight in the weight data, and the 4 value intervals determined based on the first reference value are as follows:
when W >= alpha1, W ≈ alpha1 × 3/2;
when 0 < W < alpha1, W ≈ alpha1 × 1/2;
when W <= -alpha1, W ≈ alpha1 × (-3)/2;
when -alpha1 < W <= 0, W ≈ alpha1 × (-1)/2.
Because alpha1 is a common factor, the quantized values after extracting alpha1/2 are 3, 1, -1 and -3, which can therefore be expressed with 2 bits; during hardware storage, -3, -1, 1 and 3 can be mapped to 0, 1, 2 and 3 through a mapping relation. That is, weights in the weight data that are greater than or equal to alpha1 are each quantized to the product of alpha1/2 and 3; weights greater than 0 and less than alpha1 are each quantized to the product of alpha1/2 and 1; weights less than or equal to -alpha1 are each quantized to the product of alpha1/2 and -3; weights greater than -alpha1 and less than or equal to 0 are each quantized to the product of alpha1/2 and -1. The above example is a 2-bit quantization method; the embodiment of the present application also derives 3-bit, 4-bit, 5-bit and similar quantization methods. The main idea is to split the quantization steps by bisection. For 3-bit quantization, the quantization steps are further subdivided compared with 2-bit quantization: when the weight is greater than 0, alpha1 * 1/2 and alpha1 * 3/2 are replaced with alpha1 * 1/4, alpha1 * 3/4, alpha1 * 5/4 and alpha1 * 7/4, and when the weight is less than 0, -alpha1 * 1/2 and -alpha1 * 3/2 are replaced with -alpha1 * 1/4, -alpha1 * 3/4, -alpha1 * 5/4 and -alpha1 * 7/4. Similarly, 4-bit and higher-bit weight quantization also uses bisection to split the quantization steps more finely. That is, the data processing apparatus may quantize the weights using different numbers of bits according to the characteristics of the neural network model. In addition, the weights of different layers in the neural network model may be quantized with different numbers of bits. For example, the data processing apparatus quantizes the weights of a first network layer in the neural network model using 2 bits, and quantizes the weights of a second network layer in the neural network model using 4 bits. The data processing method in fig. 4 may be understood as a low-bit weight quantization method.
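A minimal Python sketch of the 2-bit weight quantization described above is given below. Taking alpha1 as the mean of the absolute weight values is an assumption made for illustration; the embodiment only requires a first reference value determined from the weight data (for example, an average value).

```python
import numpy as np

def quantize_weights_2bit(weights):
    w = np.asarray(weights, dtype=np.float32)
    alpha1 = float(np.abs(w).mean())   # assumed choice of the first reference value
    scale = alpha1 / 2.0               # common factor extracted from the quantized values

    # Map each weight to one of the four levels {3, 1, -1, -3} * alpha1/2
    # according to the 4 value intervals above.
    levels = np.where(w >= alpha1, 3,
             np.where(w > 0, 1,
             np.where(w <= -alpha1, -3, -1)))

    # Hardware storage: map {-3, -1, 1, 3} to the 2-bit codes {0, 1, 2, 3}.
    codes = ((levels + 3) // 2).astype(np.uint8)

    dequantized = levels * scale       # approximate weights used at inference time
    return codes, scale, dequantized
```

For 3-bit quantization, the same structure applies with eight levels {±1, ±3, ±5, ±7} * alpha1/4 obtained by bisecting each quantization step.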
The data processing method provided by the embodiment of the present application can adjust the bit width of the weights (that is, the number of bits used for the quantized weights) according to the characteristics of the neural network model, reducing the network performance loss caused by quantization.
Fig. 1 depicts a flow of a method for the data processing apparatus to perform a prediction processing task using the neural network model, fig. 3 depicts a flow of a method for the data processing apparatus to determine quantization parameters for each network layer in the neural network model, and fig. 4 depicts a process for quantizing the weights in the neural network model. In some embodiments, the data processing apparatus may first execute the method flows of fig. 3 and fig. 4 to quantize the weights in the neural network model and obtain the quantization parameters of each network layer, and then execute the method flow of fig. 1.
Fig. 5 is a flowchart of another data processing method according to an embodiment of the present application. As shown in fig. 5, the method may include:
501. The data processing apparatus quantizes a portion of the weights in the neural network model.
The weights in the neural network model may be understood as weight data in the neural network model.
502. The data processing apparatus retrains the neural network model while keeping the quantized weights unchanged.
Experience shows that layers closer to the input of most networks are more sensitive in terms of accuracy, so the retraining (fine-tuning) process can be performed in a back-to-front order. For example, if the neural network model has 50 layers, it can be fine-tuned over multiple rounds according to the training convergence. For example, the weights of 5 to 10 layers are selected in each round and quantized with low bits, while the remaining layers are kept in floating-point representation and fine-tuned, until all layers of the neural network model have been quantized with low bits (that is, all weights in the neural network model are quantized) and the retraining process ends.
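The round-by-round, back-to-front procedure described above can be sketched as follows. `model.layers`, `layer.freeze_weights`, `quantize_layer_weights` and `fine_tune` are hypothetical hooks standing in for a concrete training framework, which the embodiment does not prescribe.

```python
def progressive_low_bit_quantization(model, quantize_layer_weights, fine_tune,
                                     layers_per_round=5):
    # Quantize layers closest to the output first, because layers near the
    # input are usually more sensitive to accuracy loss.
    remaining = list(reversed(model.layers))
    while remaining:
        this_round, remaining = remaining[:layers_per_round], remaining[layers_per_round:]
        for layer in this_round:
            quantize_layer_weights(layer)   # low-bit quantization of this layer's weights
            layer.freeze_weights()          # keep the quantized weights unchanged
        fine_tune(model)                    # retrain the layers still in floating point
    return model
```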
503. It is judged whether the weight quantization is finished.
If yes, go to step 504; if not, go to step 501. Judging whether the weight quantization is finished may be judging whether the difference between the prediction accuracy of the neural network model and the original prediction accuracy is smaller than an error threshold and whether all weights in the neural network model have been quantized; if so, it is judged that the weight quantization is finished; otherwise, it is judged that the weight quantization is not finished. The original prediction accuracy refers to the prediction accuracy of the neural network model before the weights in the neural network model are quantized. The error threshold may be 5%, 3%, 1%, etc., and the present application is not limited thereto. The original prediction accuracy may be, for example, 85%, 90%, 95%, etc.
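The termination check of step 503 can be sketched as follows; `evaluate` and `all_weights_quantized` are hypothetical hooks, and the 1% error threshold is one of the example values mentioned above.

```python
def weight_quantization_finished(model, evaluate, original_accuracy,
                                 error_threshold=0.01):
    # Finished only when every weight has been quantized and the accuracy drop
    # relative to the unquantized model stays within the error threshold.
    accuracy = evaluate(model)
    return model.all_weights_quantized() and (original_accuracy - accuracy) < error_threshold
```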
504. The data processing apparatus determines quantization parameters for each network layer in the neural network model.
505. The data processing apparatus judges whether the difference between the prediction accuracy of the neural network model and the original prediction accuracy is smaller than the error threshold when, in the process of performing prediction processing with the neural network model, the quantization parameter of each network layer is used to quantize the activation data output by that network layer.
If yes, go to step 506; if not, go to step 504.
506. The data processing apparatus inputs the data to be processed into the neural network model for prediction processing to obtain a prediction result.
An implementation of step 506 may be found in fig. 1. The data to be processed may be image data, voice data, etc., and the prediction result may be an image processing result, a voice processing result, etc.
In the method flow of fig. 5, the data processing apparatus first quantizes the weights in the neural network model, then determines the quantization parameters of each network layer in the neural network model, and finally executes the data processing flow shown in fig. 1 by using the obtained neural network model. In other embodiments, the data processing apparatus may determine quantization parameters for each network layer in the neural network model prior to quantizing the weights in the neural network model.
Fig. 6 is a flowchart of another data processing method according to an embodiment of the present application. The data processing apparatus executing the method flow of fig. 6 may determine quantization parameters of each network layer in the neural network model, and then quantize the weights in the neural network model.
As shown in fig. 6, the method may include:
601. The data processing apparatus determines quantization parameters of a portion of the network layers in the neural network model.
602. The data processing device retrains the neural network model.
In the retraining process of the neural network model, the data processing apparatus quantizes the activation data output by the portion of the network layers using the quantization parameters of those network layers. For example, the data processing apparatus determines a quantization parameter for the L-th layer of the neural network model, and the activation data of the L-th layer is quantized with this quantization parameter during retraining of the neural network model. Fig. 7A is a schematic diagram of quantizing activation data according to an embodiment of the present application, that is, a schematic diagram of forward quantization of activation data. As shown in fig. 7A, activation data in different value intervals is quantized to different values. For example, values in the activation data that are greater than thr5 and not greater than thr6 are quantized to q6. Fig. 7B is a schematic diagram of inverse quantization of activation data according to an embodiment of the present application.
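The interval-based forward and inverse quantization of fig. 7A and fig. 7B can be sketched as follows. The threshold array and the level array are assumed inputs here; in the embodiment they follow from the quantization parameter of the layer.

```python
import numpy as np

def quantize_activations(x, thresholds):
    # Forward quantization (fig. 7A): map each activation value to the index of
    # the value interval it falls in; values in (thr_i, thr_{i+1}] share one index.
    # thresholds must be sorted in ascending order.
    return np.searchsorted(thresholds, x, side="left").astype(np.uint8)

def dequantize_activations(codes, levels):
    # Inverse quantization (fig. 7B): map stored interval indices back to the
    # representative values q1, q2, ...; levels has len(thresholds) + 1 entries.
    return np.asarray(levels)[codes]
```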
603. It is judged whether the determination of the quantization parameter is completed.
If yes, go to step 604; if not, go to step 601. Judging whether the determination of the quantization parameters is completed may be judging whether the difference between the prediction accuracy of the neural network model and the original prediction accuracy is smaller than the error threshold and whether the quantization parameters of each network layer in the neural network model have been determined; if so, it is determined that the quantization parameter determination is completed; otherwise, it is determined that the quantization parameter determination is not completed.
604. The data processing apparatus quantizes a portion of the weights in the neural network model.
605. The data processing apparatus retrains the neural network model while keeping the quantized weights unchanged.
Optionally, in the process of retraining the neural network model while keeping the quantized weights unchanged, the data processing apparatus quantizes the activation data output by each network layer using the quantization parameter of that network layer.
606. It is judged whether the weight quantization is finished.
If yes, go to step 607; if not, go to step 604. Judging whether the weight quantization is finished may be judging whether the difference between the prediction accuracy of the neural network model and the original prediction accuracy is smaller than the error threshold and whether all weights in the neural network model have been quantized; if so, it is judged that the weight quantization is finished; otherwise, it is judged that the weight quantization is not finished.
607. The data processing apparatus inputs the data to be processed into the neural network model for prediction processing to obtain a prediction result.
An implementation of step 607 can be seen in fig. 1. The data to be processed may be image data, voice data, etc., and the prediction result may be an image processing result, a voice processing result, etc.
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 8, the data processing apparatus includes:
an acquisition unit 801, configured to acquire a first operation parameter required to perform a first operation process using a neural network model; the first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an L-th layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0;
a conversion unit 802, configured to perform a fixed-point processing on the first operation parameter to obtain a second operation parameter;
a processing unit 803, configured to perform the first arithmetic processing on the input data of the at least one network layer based on the second arithmetic parameter, so as to obtain a processing result of the at least one network layer; the input data of the at least one network layer is a to-be-predicted image, or is obtained by processing the to-be-predicted image through at least one second network layer located before the at least one network layer in the neural network model.
In an optional implementation manner, the conversion unit 802 is specifically configured to expand each floating point number included in the first operation parameter by K times to obtain an expanded numerical value, where K is an integer greater than 1; and convert the expanded numerical value into a fixed point number to obtain the second operation parameter.
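A minimal sketch of the expand-by-K fixed-point conversion performed by the conversion unit 802 is shown below. K = 2 ** 16 is an assumed choice made for illustration; the embodiment only requires K to be an integer greater than 1.

```python
import numpy as np

def to_fixed_point(first_operation_parameter, k=2 ** 16):
    # Expand every floating-point value by K times and round to an integer,
    # giving the fixed-point second operation parameter.
    values = np.asarray(first_operation_parameter, dtype=np.float64)
    return np.round(values * k).astype(np.int64)

def from_fixed_point(second_operation_parameter, k=2 ** 16):
    # Recover an approximation of the original floating-point values.
    return np.asarray(second_operation_parameter, dtype=np.float64) / k
```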
In an alternative implementation, the first operation process is obtained by combining a network model process executed by at least two adjacent network layers including the L-th layer and the first quantization process.
In an alternative implementation, the quantization parameter is a parameter determined based on a quantization maximum value and a quantization minimum value; the maximum quantization value and the minimum quantization value are obtained by clustering a plurality of numerical values included in data to be quantized output from the L-th layer of the neural network model.
In an optional implementation manner, the apparatus further includes:
a clustering unit 804, configured to perform clustering processing on a plurality of numerical values included in the data to be quantized output by the L-th layer to obtain a clustering result;
a first determining unit 805 configured to obtain the maximum quantization value and the minimum quantization value based on the clustering result;
the first determining unit 805 is further configured to determine the quantization parameter based on the quantized maximum value and the quantized minimum value, wherein the quantization parameter is used to quantize a value in the activation data of the L-th layer into an integer represented by M bits.
In an alternative implementation manner, the first determining unit 805 is specifically configured to use a maximum value of at least two cluster centers included in the clustering result as the quantized maximum value, and use a minimum value of the at least two cluster centers as the quantized minimum value.
In an alternative implementation manner, the first determining unit 805 is specifically configured to obtain a first arithmetic progression including N numerical values by using the quantized minimum value as a first term and using the quantized maximum value as a last term, where N is an integer greater than 1; the quantization parameter is a common factor of the N numbers included in the first arithmetic progression.
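A sketch of deriving the quantization parameter from the quantized minimum and maximum follows. Taking N = 2 ** M terms and using the common difference of the progression as the common factor are assumptions made for illustration; the embodiment only requires N to be greater than 1 and the parameter to be a common factor of the N values.

```python
import numpy as np

def quantization_parameter(q_min, q_max, m_bits):
    n = 2 ** m_bits                                   # assumed number of terms
    progression = np.linspace(q_min, q_max, num=n)    # q_min as first term, q_max as last term
    # The common difference is a common factor of all terms when q_min is an
    # integer multiple of it (a simplifying assumption of this sketch).
    step = (q_max - q_min) / (n - 1)
    return progression, step
```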
In an optional implementation manner, the first determining unit 805 is further configured to determine, based on distribution dispersion of a plurality of numerical values included in the data to be quantized, a bit number M corresponding to the first quantization processing.
In an optional implementation manner, the apparatus further includes:
a second determining unit 806, configured to determine a first reference value based on the weight data of the neural network model; the weight data comprises weights of at least one network layer of the neural network model; determining a plurality of value intervals based on the first reference value; each value interval in the plurality of value intervals corresponds to an integer which can be represented by F bits, wherein F is an integer which is greater than 1 and less than 8;
a quantization unit 807, configured to quantize the values in the weight data using F bits based on the plurality of value intervals. The second determining unit 806 may be the same unit as the first determining unit 805 or may be a different unit.
In an optional implementation manner, the second determining unit 806 is further configured to determine the number of bits F of the quantization processing based on the distribution dispersion of a plurality of values included in the weight data; the second determining unit 806 is specifically configured to determine the plurality of value intervals based on the first reference value and the number of bits F; the number of the plurality of value intervals is 2 to the power of F.
It should be understood that the above division of the units of the data processing apparatus is only a division of logical functions; in an actual implementation the units may be wholly or partially integrated into one physical entity, or may be physically separated. For example, the above units may be separately arranged processing elements, may be integrated into the same chip, or may be stored in a storage element of the controller in the form of program code that a processing element of the processor calls and executes to realize the functions of the above units. In addition, the units may be integrated together or may be implemented independently. The processing element may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method or the units above may be implemented by hardware integrated logic circuits in a processor element or by instructions in the form of software. The processing element may be a general-purpose processor, such as a central processing unit (CPU), or may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 90 includes a processor 901, a memory 902, and a communication interface 903; the processor 901, the memory 902, and the communication interface 903 are connected to each other by a bus. The electronic device in fig. 9 may be the data processing apparatus in the foregoing embodiments.
The memory 902 includes, but is not limited to, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 902 is used for storing related instructions and data. The communication interface 903 is used for receiving and transmitting data.
The processor 901 may be one or more Central Processing Units (CPUs), and in the case that the processor 901 is one CPU, the CPU may be a single-core CPU or a multi-core CPU. The steps performed by the data processing apparatus in the above-described embodiment may be based on the structure of the electronic device shown in fig. 9. In particular, the processor 901 may implement the functions of the units in fig. 8.
The processor 901 of the electronic device 90 is configured to read the program codes stored in the memory 902 and execute the data processing method in the foregoing embodiment.
Fig. 10 is a schematic structural diagram of a server 1000 according to an embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 1022 (e.g., one or more processors), a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing an application 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transient or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), and each module may include a series of instruction operations for the server. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the server 1000, the series of instruction operations in the storage medium 1030. The server 1000 may be the data processing apparatus provided in the present application.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input-output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the data processing apparatus in the above-described embodiment may be based on the server configuration shown in fig. 10. Specifically, the central processing unit 1022 may implement the functions of each unit in fig. 8.
In an embodiment of the present application, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the data processing method provided by the foregoing embodiment.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

1. A data processing method, comprising:
acquiring a first operation parameter required for executing first operation processing by using a neural network model; the first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an Lth layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0;
performing fixed-point processing on the first operation parameter to obtain a second operation parameter;
and executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer.
2. The method of claim 1, wherein the performing fixed-point processing on the first operation parameter to obtain a second operation parameter comprises:
expanding each floating point number included in the first operation parameter by K times to obtain an expanded numerical value, wherein K is an integer greater than 1;
and converting the expansion numerical value into a fixed point number to obtain the second operation parameter.
3. The method according to claim 1 or 2, wherein the first arithmetic processing is obtained by combining network model processing performed by at least two adjacent network layers including the L-th layer and the first quantization processing.
4. The method according to any one of claims 1 to 3, wherein the quantization parameter is a parameter determined based on a quantization maximum and a quantization minimum; the quantization maximum value and the quantization minimum value are obtained by clustering a plurality of numerical values included in the data to be quantized of the L-th layer.
5. The method of claim 4, wherein prior to obtaining the first operational parameter required to perform the first operational process using the neural network model, the method further comprises:
clustering a plurality of numerical values included in the data to be quantized of the L-th layer to obtain a clustering result;
obtaining the quantization maximum value and the quantization minimum value based on the clustering result;
determining the quantization parameter based on the quantized maximum value and the quantized minimum value, wherein the quantization parameter is used to quantize the values in the activation data into integers represented by M bits.
6. The method of claim 5, wherein the deriving a quantized maximum and a quantized minimum based on the clustering result comprises:
taking a maximum value of at least two clustering centers included in the clustering result as the quantization maximum value, and taking a minimum value of the at least two clustering centers as the quantization minimum value.
7. The method of claim 5 or 6, wherein determining the quantization parameter based on the quantized maximum and the quantized minimum comprises:
taking the quantization minimum value as a first term and the quantization maximum value as a last term to obtain a first arithmetic progression comprising N numerical values, wherein N is an integer greater than 1;
using a common factor of the N values included in the first arithmetic progression as the quantization parameter.
8. The method according to any one of claims 5 to 7, further comprising:
and determining the bit number M corresponding to the first quantization processing based on the distribution dispersion of the multiple numerical values included in the data to be quantized.
9. The method according to any one of claims 1 to 8, wherein before the obtaining of the first operation parameter required for performing the first operation process using the neural network model, the method further comprises:
determining a first reference value based on weight data of at least one network layer of the neural network model;
determining a plurality of value intervals based on the first reference value, wherein each value interval in the value intervals corresponds to an integer which can be represented by F bits, and F is an integer which is greater than 1 and less than 8;
and performing second quantization processing on the numerical values in the weight data based on the plurality of numerical value intervals to obtain an integer represented by F bits.
10. The method of claim 9, wherein prior to determining a plurality of value intervals based on the first reference value, the method further comprises:
determining the number of bits F of the second quantization processing based on distribution dispersion of a plurality of values included in the weight data;
the determining a plurality of value intervals based on the first reference value comprises:
determining the plurality of value intervals based on the first reference value and the number of bits F.
11. A data processing apparatus, comprising:
an acquisition unit configured to acquire a first operation parameter required to execute a first operation process using a neural network model; the first operation processing is obtained by network model processing and first quantization processing executed based on at least one network layer including an Lth layer of the neural network model, the first operation parameter includes a quantization parameter of the first quantization processing, and L is an integer greater than 0;
the conversion unit is used for carrying out fixed-point processing on the first operation parameter to obtain a second operation parameter;
and the processing unit is used for executing the first operation processing on the input data of the at least one network layer based on the second operation parameter to obtain a processing result of the at least one network layer.
12. The apparatus of claim 11,
the conversion unit is specifically configured to expand each floating point number included in the first operation parameter by K times to obtain an expanded numerical value, where K is an integer greater than 1; and converting the expansion numerical value into a fixed point number to obtain the second operation parameter.
13. The apparatus according to claim 11 or 12, wherein the first arithmetic processing is obtained by combining network model processing and the first quantization processing performed by at least two adjacent network layers including the L-th layer.
14. The apparatus according to any one of claims 11 to 13, wherein the quantization parameter is a parameter determined based on a quantization maximum and a quantization minimum; the quantization maximum value and the quantization minimum value are obtained by clustering a plurality of numerical values included in data to be quantized output by the L-th layer of the neural network model.
15. The apparatus of claim 14, further comprising:
the clustering unit is used for clustering a plurality of numerical values included in the data to be quantized of the L-th layer to obtain a clustering result;
a first determining unit, configured to obtain the maximum quantization value and the minimum quantization value based on the clustering result;
the first determining unit is further configured to determine the quantization parameter based on the quantization maximum value and the quantization minimum value, where the quantization parameter is used to quantize a value in the activation data to an integer represented by M bits.
16. The apparatus of claim 15,
the first determining unit is specifically configured to use a maximum value of at least two clustering centers included in the clustering result as the quantized maximum value, and use a minimum value of the at least two clustering centers as the quantized minimum value.
17. The apparatus of claim 15 or 16,
the first determining unit is specifically configured to obtain a first arithmetic progression including N numerical values by using the quantized minimum value as a first term and the quantized maximum value as a last term, where N is an integer greater than 1; using a common factor of the N values included in the first arithmetic progression as the quantization parameter.
18. The apparatus of any one of claims 15 to 17,
the first determining unit is further configured to determine, based on distribution dispersion of a plurality of numerical values included in the data to be quantized, a number of bits M corresponding to the first quantization processing.
19. The apparatus of any one of claims 11 to 18, further comprising:
a second determination unit, further configured to determine a first reference value based on weight data of at least one network layer of the neural network model; determining a plurality of value intervals based on the first reference value, wherein each value interval in the value intervals corresponds to an integer which can be represented by F bits, and F is an integer which is greater than 1 and less than 8;
and the quantization unit is used for carrying out second quantization processing on the numerical values in the weight data based on the numerical value intervals to obtain an integer represented by F bits.
20. The apparatus of claim 19,
the second determining unit is further configured to determine the number of bits F of the second quantization processing based on distribution dispersion of a plurality of values included in the weight data;
the second determining unit is specifically configured to determine the plurality of numerical value intervals based on the first reference value and the number of bits F.
21. A computer-readable storage medium, in which a computer program is stored, which computer program comprises program instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 10.
22. An electronic device, comprising: a memory for storing a program; a processor for executing the program stored by the memory, the processor being configured to perform the method of any of claims 1 to 10 when the program is executed.
23. A chip, comprising: a processor and a data interface, wherein the processor reads, through the data interface, instructions stored on a memory to perform the method of any one of claims 1 to 10.
CN202010068857.2A 2020-01-21 2020-01-21 Data processing method and related product Pending CN113222097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010068857.2A CN113222097A (en) 2020-01-21 2020-01-21 Data processing method and related product

Publications (1)

Publication Number Publication Date
CN113222097A true CN113222097A (en) 2021-08-06

Family

ID=77085123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010068857.2A Pending CN113222097A (en) 2020-01-21 2020-01-21 Data processing method and related product

Country Status (1)

Country Link
CN (1) CN113222097A (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6560283B1 (en) * 1997-07-18 2003-05-06 British Broadcasting Corporation Re-encoding decoded signals
US20160328646A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
US20190050710A1 (en) * 2017-08-14 2019-02-14 Midea Group Co., Ltd. Adaptive bit-width reduction for neural networks
CN107909147A (en) * 2017-11-16 2018-04-13 深圳市华尊科技股份有限公司 A kind of data processing method and device
CN108229663A (en) * 2018-01-29 2018-06-29 百度在线网络技术(北京)有限公司 For generating the method and apparatus of convolutional neural networks
CN108875924A (en) * 2018-02-09 2018-11-23 北京旷视科技有限公司 Data processing method, device, system and storage medium neural network based
JP2019160319A (en) * 2018-03-09 2019-09-19 キヤノン株式会社 Method and device for optimizing and applying multi-layer neural network model, and storage medium
WO2019184823A1 (en) * 2018-03-26 2019-10-03 华为技术有限公司 Convolutional neural network model-based image processing method and device
CN110555450A (en) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Face recognition neural network adjusting method and device
CN109800865A (en) * 2019-01-24 2019-05-24 北京市商汤科技开发有限公司 Neural network generation and image processing method and device, platform, electronic equipment
CN110222821A (en) * 2019-05-30 2019-09-10 浙江大学 Convolutional neural networks low-bit width quantization method based on weight distribution
CN110443359A (en) * 2019-07-03 2019-11-12 中国石油大学(华东) Neural network compression algorithm based on adaptive combined beta pruning-quantization
CN110443165A (en) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 Neural network quantization method, image-recognizing method, device and computer equipment
CN110705696A (en) * 2019-10-11 2020-01-17 百度在线网络技术(北京)有限公司 Quantization and fixed-point fusion method and device for neural network
CN113222098A (en) * 2020-01-21 2021-08-06 上海商汤智能科技有限公司 Data processing method and related product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANWAR, S. et al.: "Fixed point Optimization of Deep Convolutional Neural Networks for Object Recognition", 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2015, pages 1131 - 1135, XP055284772, DOI: 10.1109/ICASSP.2015.7178146 *
QI Honggang; HUANG Qingming; DENG Lei; YIN Haibing; CHEN Ting: "H.264/AVC adaptive block transform mode based on transform coefficient reuse", Application Research of Computers, no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222098A (en) * 2020-01-21 2021-08-06 上海商汤智能科技有限公司 Data processing method and related product

Similar Documents

Publication Publication Date Title
CN112185352B (en) Voice recognition method and device and electronic equipment
CN110175641B (en) Image recognition method, device, equipment and storage medium
CN114723033B (en) Data processing method, data processing device, AI chip, electronic device and storage medium
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN114627146A (en) Image processing method, image processing device, electronic equipment and storage medium
JP2021108230A (en) Neural network processor and method for processing neural network
CN109982088B (en) Image processing method and device
CN113222098A (en) Data processing method and related product
CN112994960B (en) Method and device for detecting business data abnormity and computing equipment
CN113222097A (en) Data processing method and related product
CN111783843A (en) Feature selection method and device and computer system
CN112101543A (en) Neural network model determination method and device, electronic equipment and readable storage medium
CN113642710A (en) Network model quantification method, device, equipment and storage medium
US20200242467A1 (en) Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product
CN113298224A (en) Retraining method of neural network model and related product
CN115328661B (en) Computing power balance execution method and chip based on voice and image characteristics
CN112686365A (en) Method and device for operating neural network model and computer equipment
CN111598250A (en) Model evaluation method, model evaluation device, computer equipment and storage medium
CN113554149B (en) Neural network processing unit NPU, neural network processing method and device
CN113160987B (en) Health state prediction method, apparatus, computer device and storage medium
CN111614358B (en) Feature extraction method, system, equipment and storage medium based on multichannel quantization
US20220188609A1 (en) Resource aware neural network model dynamic updating
CN111444319B (en) Text matching method and device and electronic equipment
CN114282026A (en) Image data storage method and device, electronic equipment and storage medium
CN112766501A (en) Incremental training method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination