CN113743593B - Neural network quantization method, system, storage medium and terminal - Google Patents


Info

Publication number
CN113743593B
CN113743593B (application number CN202111136714.1A)
Authority
CN
China
Prior art keywords
node
network
original
quantization
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111136714.1A
Other languages
Chinese (zh)
Other versions
CN113743593A (en
Inventor
舒顺朋
Current Assignee
Shanghai Qigan Electronic Information Technology Co ltd
Original Assignee
Shanghai Qigan Electronic Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Qigan Electronic Information Technology Co ltd filed Critical Shanghai Qigan Electronic Information Technology Co ltd
Priority to CN202111136714.1A
Publication of CN113743593A
Application granted
Publication of CN113743593B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a neural network quantization method, which comprises: inputting a picture to be processed into an original network, and running the original network to obtain original floating point data of the picture to be processed; simplifying the original floating point data and extracting first distribution characteristics of the original floating point data; taking at least one of the pictures to be processed as a quantized picture, and importing the quantized picture into the original network for a first quantization to obtain initial fixed-point positions and second distribution characteristics of each layer; determining nodes according to the network structure of the original network, or according to the first distribution characteristics and the second distribution characteristics; performing node quantization on the network according to the nodes to obtain a node quantization result; and optimizing the original network according to the node quantization result to obtain a target neural network, thereby effectively improving the accuracy of the quantization result. The invention also provides a neural network quantization system, a storage medium and a terminal.

Description

Neural network quantization method, system, storage medium and terminal
Technical Field
The invention relates to the technical field of deep learning neural networks, and in particular to a deep learning neural network quantization method, system, storage medium and terminal.
Background
Deep Learning (DL) is a research direction in the field of Machine Learning (ML), introduced to bring machine learning closer to its original goal: Artificial Intelligence (AI). Deep learning has achieved great success in both academic and industrial applications, with results in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization technologies, and other related fields, and especially in computer vision. The great success of AlexNet in image classification in 2012 energized the whole artificial intelligence field, and in 2015 deep learning networks represented by the residual network (ResNet) achieved human-level performance in classification. In industrial applications, however, deploying deep learning networks faces real difficulty, owing to their huge parameter counts and complex operations and to the limits of hardware computing power and power consumption. In the field of edge computing in particular, low computing power and low power consumption are the norm, so deep learning networks must be made lighter in several respects. In a specific application, deployment difficulty can be reduced according to the task difficulty by limiting the network depth, the number of channels of the convolution layer coefficients, the convolution kernel size and other measures. Beyond such reductions at the network design level, measures such as network pruning, parameter compression and reducing the data bit width can further relieve the pressure on computing power and power consumption.
In attempts to run network operations at low bit widths, the most extreme current efforts use 1-bit or 2-bit data in the operations. Many papers have been published on this, but practical feedback shows that such attempts have so far succeeded only on specific networks and specific tasks, and remain under study for most networks and most application scenarios. The mature low-bit approach today is to replace the original 32-bit floating point numbers with 8-bit data when implementing a deep learning network, and a large number of deep learning hardware chips basically default to this rule as well. Replacing 32-bit floating point with 8-bit integers yields a theoretical fourfold gain: memory use drops to one quarter of the original, and the operation speed can be greatly improved. In applications that use 8-bit integers instead of floating point operations, a key technique is how to quantize the neural network.
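The fourfold memory gain claimed above can be checked directly. The sketch below uses a generic symmetric int8 scheme on a hypothetical layer of one million weights; it is illustrative only and is not the quantization method claimed by this patent:

```python
import numpy as np

# Hypothetical layer of one million float32 weights
weights_fp32 = np.random.randn(1_000_000).astype(np.float32)

# Generic symmetric linear int8 quantization (not the patent's own scheme)
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -128, 127).astype(np.int8)

ratio = weights_fp32.nbytes / weights_int8.nbytes
print(ratio)  # → 4.0, the theoretical memory gain mentioned above
```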
To realize network quantization, current quantization algorithms fall into two classes. The first directly quantizes a trained network to 8-bit data, with no network retraining or adjustment. The second combines quantization with retraining, in diverse specific forms. For example, quantization may be performed first and a second training pass then run with floating point numbers at quantization accuracy, typically by introducing Quantization-Aware Training (QAT). As another example, the coefficients may be classified after quantization and only a portion of them optimized, over multiple iterations. Whichever approach is taken, a first quantization step must be performed, and the quality of this first quantization determines the difficulty of the subsequent optimization of the neural network.
In order to achieve better quantization effect for the network, it is necessary to provide a novel neural network quantization method, system, storage medium and terminal to solve the above problems in the prior art.
Disclosure of Invention
The invention aims to provide a neural network quantization method, a system, a storage medium and a terminal, which effectively improve the operation speed of the quantized neural network and reduce the memory space.
In order to achieve the above object, a neural network quantization method of the present invention includes:
inputting a picture to be processed into an original network, and operating the original network to obtain original floating point data of each layer of the picture to be processed;
simplifying the original floating point data of each layer of the picture to be processed to obtain a plurality of original fixed points, and extracting the distribution characteristics of each layer of data of the original fixed points as first distribution characteristics;
at least one of the pictures to be processed is used as a quantized picture, and the quantized picture is imported into the original network for a first quantization to obtain a plurality of initial fixed-point positions and second distribution characteristics of the data of each layer of the quantized picture;
determining a node according to a network structure of the original network or determining the node according to the first distribution characteristic and the second distribution characteristic;
Performing node quantization on the original network according to the node to obtain a node quantization result;
and optimizing and adjusting the original network according to the node quantization result to obtain a target neural network.
The invention has the beneficial effects that: the original floating point data is simplified and its first distribution characteristics are extracted; at least one of the pictures to be processed is taken as a quantized picture and imported into the original network for a first quantization to obtain second distribution characteristics of each layer; nodes are determined according to the network structure of the original network or according to the first and second distribution characteristics; the quantized picture is then node-quantized according to the nodes, and the original network is optimized according to the node quantization result to obtain a target neural network. This effectively improves the accuracy of the quantization result and the running speed of the resulting target neural network, while reducing the memory space it occupies.
Further, the picture to be processed is obtained from the data set of the original network in a random selection mode.
Further, the data set includes at least one of a training set for training, a validation set for validation, and a test set for testing.
Further, the first quantization uses a linear quantization process.
Further, the determining a node according to the first distribution feature and the second distribution feature includes:
performing similarity judgment on the first distribution characteristics and the second distribution characteristics to obtain a first similarity judgment result;
and sorting all layers in the quantized picture according to the first similarity judgment result, and selecting layers whose first similarity judgment result is smaller than a preset threshold value as nodes. The beneficial effects are that: layers whose first similarity judgment result is smaller than the preset threshold value, that is, layers with smaller similarity and therefore larger change before and after quantization, are selected as nodes, which facilitates the subsequent node quantization and improves the accuracy of the quantization result.
Further, the performing similarity judgment on the first distribution feature and the second distribution feature to obtain a first similarity judgment result includes calculating the cosine similarity or the relative entropy (KL divergence) of the first distribution feature and the second distribution feature, and obtaining the first similarity judgment result according to the cosine similarity or the relative entropy.
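Both similarity measures named above are standard. As an illustrative sketch (the distribution features here are hypothetical toy histograms, not values from the patent):

```python
import numpy as np

def cosine_similarity(p, q):
    # Cosine of the angle between the two distribution-feature vectors
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def relative_entropy(p, q, eps=1e-12):
    # KL divergence D(p || q) between two normalized histograms
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy per-layer distribution features (hypothetical values)
h_float = np.array([0.10, 0.40, 0.30, 0.20])   # first distribution feature
h_quant = np.array([0.12, 0.38, 0.31, 0.19])   # second distribution feature

cos = cosine_similarity(h_float, h_quant)  # near 1.0: small quantization change
kl = relative_entropy(h_float, h_quant)    # near 0.0: small quantization change
```

A layer whose cosine similarity falls (or whose relative entropy grows) after the first quantization is exactly the kind of layer the method selects as a node.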
Further, the determining a node according to the network structure of the original network includes at least one of the following ways:
When the original network is judged to be a residual network, selecting an initial layer and a termination layer of a residual structure of the original network as the nodes;
when the original network is judged to be an Inception series network, selecting an initial layer and a termination layer of an Inception structure of the original network as the nodes;
when the original network is judged to contain a fusion part, selecting the upsampled fusion layer in the original network as the node;
when the original network is judged to be a SE block-type network, selecting a Scale layer of the original network as the node;
and when judging that the number of channels of the feature map of the current layer of the original network is changed by more than four times compared with the previous layer, selecting the current layer of the original network as the node.
Further, the performing the node quantization on the network according to the node to obtain a node quantization result includes:
linearly calculating a judgment fixed point bit between a current node and a next node according to original floating point data between the current node and the next node, wherein the judgment fixed point bit is an integer bit;
sequentially replacing initial fixed point positions between the current node and the next node by corresponding judgment fixed point positions between the current node and the next node, and acquiring third distribution characteristics of data distribution of the next node;
Performing similarity judgment on the third distribution characteristic and the first distribution characteristic of the next node to obtain a second similarity judgment result;
and performing judgment processing according to the second similarity judgment result and the first similarity judgment result, wherein the judgment processing comprises:
when the second similarity judgment result is larger than the first similarity judgment result, correspondingly replacing the initial fixed point position with the current judgment fixed point position;
when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed point bit between the current node and the next node to obtain a tuning bit, and replacing the tuning bit with the initial fixed point bit;
and executing the above process in a loop until node quantization is completed for all nodes of the original network. The beneficial effects are that: judgment fixed-point positions are calculated according to the selected nodes, and a similarity judgment is then made on them using the third distribution characteristic of the next node, to determine whether the judgment fixed-point positions between adjacent nodes are an improvement. If so, the initial fixed-point positions are replaced by the corresponding judgment fixed-point positions; if not, the judgment fixed-point positions are tuned to obtain more accurate positions. This completes the quantization of all nodes of the original network and improves the accuracy of the quantization result, and because the initial fixed-point positions are replaced by integer-bit judgment fixed-point positions, the resulting neural network runs faster and occupies less memory when performing visual processing on images.
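The per-segment loop above can be sketched as follows. The helpers `similarity_fn` and `tune_fn`, and the simple additive similarity in the demo, are hypothetical placeholders standing in for the distribution-feature comparison and the tuning step described in the claims:

```python
def node_quantization(nodes, initial_bits, judge_bits, similarity_fn, tune_fn):
    """Walk node pairs; keep the judged fixed-point bits for a segment only if
    they improve the similarity at the next node, else fall back to tuning."""
    final_bits = dict(initial_bits)
    for cur, nxt in zip(nodes, nodes[1:]):
        trial = dict(final_bits)
        for layer in range(cur, nxt):           # replace the segment's bits
            trial[layer] = judge_bits[layer]
        if similarity_fn(trial, nxt) > similarity_fn(final_bits, nxt):
            final_bits = trial                   # judged bits improve the match
        else:
            final_bits = tune_fn(trial, cur, nxt)  # integer-bit offset tuning
    return final_bits

# Toy demo: similarity is higher the closer the bits are to a "true" target
true_bits = {0: 5, 1: 6, 2: 4, 3: 7}
sim = lambda bits, nxt: -sum(abs(bits[l] - true_bits[l]) for l in bits)
tune = lambda bits, cur, nxt: bits  # no-op tuning, for the demo only
result = node_quantization([0, 2, 3], {l: 8 for l in true_bits},
                           dict(true_bits), sim, tune)
print(result)  # → {0: 5, 1: 6, 2: 4, 3: 8}
```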
Further, when the second similarity determination result is smaller than or equal to the first similarity determination result, performing tuning processing on the determination fixed-point bit between the current node and the next node to obtain a tuning bit, and replacing the tuning bit with the initial fixed-point bit includes:
performing integer bit offset processing on the judging fixed point bit of the current node to obtain a new fixed point bit;
replacing the initial fixed point position corresponding to the judging fixed point position with the new fixed point position, and obtaining a fourth distribution characteristic of data distribution of the next node;
performing similarity judgment according to the fourth distribution characteristic and the first distribution characteristic of the next node to obtain a third similarity judgment result;
when the third similarity judgment result is larger than the first similarity judgment result, taking the new fixed-point position between the current node and the next node as the tuning position, and correspondingly replacing the initial fixed-point position with the tuning position;
and executing the above process in a loop until tuning is completed for each judgment fixed-point position between the current node and the next node. The beneficial effects are that: by applying an integer-bit offset to the judgment fixed-point positions, a more suitable new fixed-point position is selected as the tuning position to replace the initial fixed-point position, thereby improving the accuracy of the quantization result.
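The integer-bit offset search can be sketched as below. The offset set, the `eval_similarity` helper and the toy optimum are assumptions for illustration; the patent only specifies that the offset is by whole integer bits and is accepted when similarity improves:

```python
def tune_fixed_point(judge_bit, eval_similarity, baseline, offsets=(-1, 1, -2, 2)):
    """Try small integer-bit offsets around the judged fixed-point position and
    keep the first one whose similarity beats the baseline."""
    for off in offsets:
        candidate = judge_bit + off
        if eval_similarity(candidate) > baseline:
            return candidate
    return judge_bit  # no offset improved the similarity; keep the judged bit

# Toy demo: the best integer bit is 6, and the judged bit of 4 undershoots it
eval_sim = lambda bit: -abs(bit - 6)
tuned = tune_fixed_point(4, eval_sim, baseline=eval_sim(4))
print(tuned)  # → 5, one bit closer to the optimum than the judged bit
```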
The optimizing and adjusting the original network according to the node quantization result to obtain a target neural network comprises the following steps:
and according to the positions of the initial fixed-point positions in the original network, correspondingly using the judgment fixed-point positions and the tuning positions that replaced them as the new fixed-point positions of the original network, so that the original network is optimized to obtain the target neural network.
In a second aspect, the present invention provides a deep learning network quantization system, comprising:
the input acquisition module is used for inputting the picture to be processed into an original network, and operating the original network to acquire the original floating point data of the picture to be processed;
the first extraction module is used for simplifying the original floating point data of each layer to obtain a plurality of original fixed points, and extracting the distribution characteristics of each layer of data of the original fixed points as first distribution characteristics;
the second extraction module is used for taking at least one of the pictures to be processed as a quantized picture, and importing the quantized picture into the original network for first quantization to obtain second distribution characteristics of each layer;
the node determining module is used for determining nodes according to the network structure of the original network or determining nodes according to the first distribution characteristics and the second distribution characteristics;
The node quantization module is used for carrying out node quantization on the quantized picture according to the nodes so as to obtain a node quantization result;
and the optimizing module is used for optimizing and adjusting the original network according to the node quantization result to obtain a target neural network.
The deep learning network quantization system has the beneficial effects that: the original floating point data is simplified and its first distribution characteristics are extracted; at least one of the pictures to be processed is used as a quantized picture and imported into the original network for a first quantization to obtain second distribution characteristics of each layer; nodes are determined according to the first and second distribution characteristics; the quantized picture is then node-quantized according to the nodes, and the original network is optimized according to the node quantization result to obtain a target neural network. This effectively improves the accuracy of the quantization result and the running speed of the resulting target neural network, while reducing the memory space it occupies.
In a third aspect, the invention also discloses a computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, performs the above-mentioned method.
In a fourth aspect, the present invention provides a terminal comprising a memory and a processor, the memory having stored thereon a computer program capable of running on the processor, the terminal executing the method as described above when the computer program is run.
Advantageous effects of the third aspect and the fourth aspect are specifically referred to the description of the advantageous effects of the first aspect and the second aspect, and are not repeated here.
Drawings
Fig. 1 is a flow chart of a deep learning network quantization method according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating the selection of decision point locations in a Resnet network according to a deep learning network quantization method of an embodiment of the present invention;
fig. 3 is a block diagram of a deep learning network quantization system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention. Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which this invention belongs. As used herein, the word "comprising" and the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items.
Aiming at the problems existing in the prior art, the embodiment of the invention provides a neural network quantification method, as shown in fig. 1, comprising the following steps:
s1, inputting a picture to be processed into an original network, and operating the original network to obtain original floating point data of each layer of the network under the picture to be processed.
In some embodiments, the pictures to be processed are acquired from the data set of the original network by random selection; choosing them randomly ensures the randomness of the selection and, as far as possible, the diversity and representativeness of the scenes.
In still other embodiments, the data set includes at least one of a training set for training, a validation set for validation, and a test set for testing, and by selecting pictures to be processed in the training set, the validation set, and the test set, pictures of different roles can be selected for quantization processing, respectively.
Further, the number of the pictures to be processed is n × batch size, where the batch size refers to the number of data samples grabbed in one training pass.
Preferably, the number of the pictures to be processed is 1 to 2 × batch size.
S2, simplifying the original floating point data of each layer of the network of the picture to be processed to obtain a plurality of original fixed points, and extracting the distribution characteristics of each layer of data of the original fixed points as first distribution characteristics.
In some embodiments, the original floating point data of each layer of the network for the picture to be processed is sampled according to the sampling theorem to obtain a plurality of discrete original fixed points. After the original fixed points of the picture to be processed are obtained, the first distribution characteristic of each original fixed point is correspondingly extracted, yielding a first distribution characteristic that accurately reflects the real situation of the picture to be processed. This characteristic later serves as the judgment standard for the quantization result, improving its accuracy.
In some possible embodiments, the network data of the picture to be processed includes a plurality of layers, each layer having corresponding floating point data and a first distribution characteristic of that floating point data, so that the first distribution characteristic of the floating point data of each layer is obtained and denoted H_f(i), where i is the sequence number of the layer, i.e. the position of the original fixed point.
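The extraction of the per-layer first distribution characteristic H_f(i) can be sketched as follows. The patent does not fix the exact statistic, so a normalized histogram of each layer's floating point activations is assumed here as one plausible realization:

```python
import numpy as np

def layer_histograms(layer_outputs, bins=64):
    """First distribution characteristic H_f(i): one normalized histogram per
    layer i of the network's floating point data (assumed realization)."""
    feats = {}
    for i, data in enumerate(layer_outputs):
        hist, _ = np.histogram(data, bins=bins)
        feats[i] = hist / hist.sum()
    return feats

# Toy stand-in for per-layer activations of three layers
rng = np.random.default_rng(0)
outputs = [rng.normal(0.0, s, 10_000) for s in (1.0, 2.0, 0.5)]
H_f = layer_histograms(outputs)
```

The second distribution characteristic H_q(i) of step S3 would be computed the same way from the data after the first quantization, so the two are directly comparable.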
S3, at least one of the pictures to be processed is used as a quantized picture, and the quantized picture is imported into the original network for a first quantization to obtain a plurality of initial fixed-point positions and second distribution characteristics of the data of each layer of the quantized picture.
At least one of the pictures to be processed is selected as the quantized picture, and the quantized picture is subjected to the first quantization to obtain the initial fixed-point positions after the first quantization and the second distribution characteristics of each layer of the quantized picture's network.
In some possible embodiments, after the first quantization of the network of quantized pictures, a series of initial fixed-point positions is obtained, denoted Q_i. The distribution characteristics of the floating point data of each layer of the quantized picture corresponding to the initial fixed-point positions are recorded as the second distribution characteristics, denoted H_q(i), where i is the sequence number of the layer, i.e. the position of the initial fixed-point position.
In some embodiments, the first quantization uses a linear quantization process; since linear quantization is prior art and the present solution does not involve an improvement of linear quantization itself, it is not described further here.
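For reference, a minimal example of such a linear quantization follows. It uses a generic symmetric int8 scheme; the patent does not specify which linear variant is used, so this is an assumption:

```python
import numpy as np

def linear_quantize(x, n_bits=8):
    """Generic symmetric linear quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1            # 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Map the integer codes back to floating point for comparison
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s = linear_quantize(x)
x_hat = dequantize(q, s)  # close to x; the residual is the quantization error
```

Comparing the distribution of `x_hat` against that of `x` is precisely the kind of first-versus-second distribution comparison the method performs per layer.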
And S4, determining nodes according to the network structure of the original network or determining nodes according to the first distribution characteristics and the second distribution characteristics.
In some embodiments, the determining a node from the first distribution feature and the second distribution feature comprises:
Performing similarity judgment on the first distribution characteristics and the second distribution characteristics to obtain a first similarity judgment result;
and sequencing all layers in the quantized picture according to the first similarity judgment result, and selecting a layer with the first similarity judgment result smaller than a preset threshold value as a node.
The first distribution characteristics reflect the real situation of the original floating point data of each layer of the quantized picture's network, while the second distribution characteristics reflect the distribution of the floating point data of each layer after the first quantization. A similarity judgment between the first and second distribution characteristics yields the first similarity judgment result; the first similarity judgment results of the layers are sorted, and layers whose result is smaller than a preset threshold value, i.e. layers with larger change before and after quantization, are selected as nodes. In this way, the layers that differ most from the real situation of the quantized picture during the first quantization are chosen as nodes for the subsequent segmented quantization, which effectively improves the accuracy of the subsequent quantization. The preset threshold value is such that a first similarity judgment result above it indicates large similarity, and a result below it indicates small similarity.
In some further embodiments, sorting the layers in the quantized picture according to the size of the first similarity judgment result and selecting layers whose result is smaller than a preset threshold value as nodes further includes: after sorting the first similarity judgment results in descending order, selecting as nodes the layers corresponding to results ranked after the proportion threshold.
Specifically, the first similarity judgment results of all layers are sorted in descending order, and the layers corresponding to results ranked in the last 25% are selected as nodes; that is, the layers where the second distribution feature is least similar to the first distribution feature, and where the change is therefore largest, are found and used as nodes for node quantization in the subsequent process. Each such layer has a corresponding fixed-point position, namely its initial fixed-point position.
In some embodiments, the number of initial fixed-point positions is the same as the number of original fixed points. In the similarity judgment between them, the initial fixed-point position at the same position as an original fixed point is selected for comparison; for example, if the 10th original fixed point is selected, the corresponding initial fixed-point position used in the similarity judgment is the 10th among all initial fixed-point positions. This guarantees a one-to-one correspondence between initial fixed-point positions and original fixed points, so that the difference between them can be judged and further quantization optimization carried out.
In still other embodiments, the ratio threshold is selected according to the situation; in some possible embodiments, it may be any one of 20%, 30%, or 50%.
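The ranking-based node selection described above can be sketched as follows. This is a hypothetical Python sketch: `similarities` stands for the per-layer first similarity judgment results, and `ratio` for the ratio threshold (e.g. 25%).

```python
def select_nodes_by_ratio(similarities, ratio=0.25):
    # Sort layer indices by similarity judgment result, largest first.
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    # Keep the layers ranked in the last `ratio` fraction: these have the
    # smallest similarity between second and first distribution features.
    k = max(1, int(len(order) * ratio))
    return sorted(order[-k:])
```

For instance, with four layers and the default 25% ratio, only the single least-similar layer is chosen as a node.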
In other embodiments, the determining a node according to the network structure of the original network further includes at least one of the following:
when the original network is judged to be a residual network (Resnet), selecting an initial layer and a termination layer of a residual structure of the original network as nodes, and selecting an Eltwise layer as a node in general;
when the original network is judged to be an Inception-series network, selecting an initial layer and a termination layer of an Inception structure of the original network as nodes, the concat layer generally being selected as a node;
when the original network is judged to contain a fusion part, that is, when different layers in the original network are fused or form a closed loop (for example, in an FPN-structure network or a darknet network), selecting the fusion layer after upsampling in the original network as a node;
when the original network is judged to be a SE block-type network, selecting a Scale layer of the original network as a node;
And when judging that the number of channels of the feature map of the current layer of the original network is changed by more than four times compared with the previous layer, selecting the current layer of the original network as a node.
In some embodiments: for a residual network (Resnet)-type network, an Eltwise layer is used as a node; for an Inception-series network, the concat layer is used as a node; for a network containing a fusion part, including an FPN-structure network or a darknet network, the post-upsampling fusion layer is used as a node; for SE-block and similar networks, a Scale layer is used as a node; and for a layer whose feature-map channel count changes by more than four times, that layer is selected as a node.
Further, when the node cannot be selected according to the original network, the node is determined by calculating according to the first distribution feature and the second distribution feature, which are specifically described above, and are not described herein.
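The structural rules above amount to pattern matching on layer types plus a channel-change test. A hypothetical sketch, where the layer-type labels (`"Eltwise"`, `"Concat"`, `"Fusion"`, `"Scale"`) and the architecture keys are illustrative names, not identifiers from the original:

```python
def structural_nodes(layers, arch):
    """layers: list of (layer_type, out_channels) tuples.
    arch: architecture family of the original network."""
    # Layer type chosen as node for each recognized architecture family.
    type_rule = {"resnet": "Eltwise", "inception": "Concat",
                 "fpn": "Fusion", "se_block": "Scale"}.get(arch)
    nodes, prev_c = [], None
    for i, (ltype, c) in enumerate(layers):
        if ltype == type_rule:
            nodes.append(i)
        # A channel count changing by four times or more versus the
        # previous layer also marks the current layer as a node.
        elif prev_c and (c >= 4 * prev_c or prev_c >= 4 * c):
            nodes.append(i)
        prev_c = c
    return nodes
```

When no rule fires for any layer, the similarity-based fallback of the preceding paragraph applies.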
And S5, carrying out node quantization on the quantized picture according to the node to obtain a node quantization result.
In some embodiments, the performing the node quantization on the network according to the node to obtain a node quantization result includes:
Linearly calculating a judgment fixed point bit between a current node and a next node according to original floating point data between the current node and the next node;
sequentially replacing initial fixed point positions between the current node and the next node by corresponding judgment fixed point positions between the current node and the next node, and acquiring third distribution characteristics of data distribution of the next node;
performing similarity judgment on the third distribution characteristic and the first distribution characteristic of the next node to obtain a second similarity judgment result;
and performing judgment processing according to the second similarity judgment result and the first similarity judgment result, wherein the judgment processing comprises:
when the second similarity judgment result is larger than the first similarity judgment result, correspondingly replacing the initial fixed point position with the current judgment fixed point position;
when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed point bit between the current node and the next node to obtain a tuning bit, and replacing the tuning bit with the initial fixed point bit;
and circularly executing the process until the node quantization is completed for all the nodes of the quantized picture.
In the above node-quantization process, the judgment fixed-point bits between the current node and the next node are first obtained by linear calculation from the original floating-point data between those nodes, and are then substituted, in order, for the corresponding initial fixed-point bits between the current node and the next node. Because the initial fixed-point bits of the current node have changed, the data distribution of the next node also changes; this new distribution is obtained as the third distribution feature. A similarity judgment between the third distribution feature and the first distribution feature of the next node yields the second similarity judgment result, and comparing the second and first similarity judgment results determines whether the quantization after replacing the initial fixed-point bits is better than the first quantization. When the second similarity judgment result is greater than the first, the similarity it represents is higher, meaning that the quantization result obtained after replacing the initial fixed-point bit with the judgment fixed-point bit is more accurate than the first quantization result, so the replacement is kept.
If the second similarity judgment result is smaller than or equal to the first similarity judgment result, the similarity it represents is not higher than that of the first quantization, indicating that replacing the initial fixed-point bit with the judgment fixed-point bit did not improve on the first quantization result. In that case, the judgment fixed-point bit between the current node and the next node is tuned to obtain a tuning bit, which then replaces the initial fixed-point bit so that the quantization within this node segment is completed again.
Since there are generally several judgment fixed-point bits between the current node and the next node, i.e., between adjacent nodes, the similarity judgment is performed on each judgment fixed-point bit in turn to decide whether the bit at each position is suitable. The above process is then repeated until quantization is completed for every node of the quantized picture, thereby completing the node-wise quantization.
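The accept-or-keep decision over one node segment's fixed-point bits can be sketched as follows. This is a simplified sketch: the tuning branch for rejected bits is omitted, and the assumed helper `sim_after(bits)` stands in for running the segment with a candidate bit assignment and returning the next node's similarity to its first distribution feature.

```python
def node_quantize_segment(judge_bits, initial_bits, sim_after, first_sim):
    bits = list(initial_bits)
    for i, jb in enumerate(judge_bits):
        # Try substituting the judgment fixed-point bit at position i.
        trial = bits[:i] + [jb] + bits[i + 1:]
        # Keep it only if the resulting similarity (the second similarity
        # judgment result) beats the first-quantization similarity.
        if sim_after(trial) > first_sim:
            bits[i] = jb
    return bits
```

Each position is examined in order, so an accepted bit at position i influences the trials at later positions, matching the sequential replacement described above.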
In some embodiments, the process of linearly calculating the judgment fixed-point bits between adjacent nodes from the floating-point data of the nodes is as follows: first, find the maximum value of the floating-point data at the node; then map this maximum linearly to 128, the upper bound used for int8; finally, derive the fixed-point bit A from the resulting scale coefficient Sl. The scale coefficient is Sl = max(data)/128, where max(data) is the maximum of the floating-point data, and A = ⌊log2(Sl)⌋, where ⌊·⌋ denotes rounding down to an integer (the floor).
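This calculation can be written down directly. One assumption in the sketch: the absolute maximum is used rather than the raw maximum, so that log2 stays defined when the data is all negative.

```python
import math

def judge_fixed_point_bit(data, qmax=128):
    # Sl = max(data) / 128 (absolute value taken here as a safeguard,
    # an assumption beyond the formula as stated).
    sl = max(abs(x) for x in data) / qmax
    # A = floor(log2(Sl)), the judgment fixed-point bit.
    return math.floor(math.log2(sl))
```

For data with maximum 512, Sl = 4 and A = 2; for maximum 100, Sl ≈ 0.78 and A = -1, i.e., the bit can be negative when the data fits below the int8 range.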
In some embodiments, when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed-point bit between the current node and the next node to obtain a tuning bit, and replacing the initial fixed-point bit with the tuning bit, includes:
performing integer bit offset processing on the judging fixed point bit of the current node to obtain a new fixed point bit;
replacing the initial fixed point position corresponding to the judging fixed point position with the new fixed point position, and obtaining a fourth distribution characteristic of data distribution of the next node;
performing similarity judgment according to the fourth distribution characteristic and the first distribution characteristic of the next node to obtain a third similarity judgment result;
judging that the third similarity judgment result is larger than the first similarity judgment result, taking a new fixed point position between the current node and the next node as the tuning point position, and correspondingly replacing the initial fixed point position with the tuning point position;
and executing the process circularly until each judgment fixed point bit between the current node and the next node completes tuning processing.
When the judgment fixed-point bit at the current position is found not to improve on the initial fixed-point bit at the corresponding position, the judgment fixed-point bit is adjusted. Specifically, an integer bit offset is applied to the judgment fixed-point bit at the current position to obtain a new fixed-point bit, which is substituted in turn for the initial fixed-point bit corresponding to the original judgment fixed-point bit, and the fourth distribution feature of the next node's data distribution is obtained. A similarity judgment between the fourth distribution feature of the next node and the corresponding first distribution feature yields the third similarity judgment result. If the third similarity judgment result is greater than the first, the similarity between the changed new fixed-point bit and the original fixed point exceeds the similarity between the initial fixed-point bit and the original fixed point: the offset new fixed-point bit improves on the initial fixed-point bit at the corresponding position, i.e., it is closer to the original fixed point of the quantized picture, and it is therefore taken as the tuning bit.
It should be noted that an integer bit offset is used in the offset processing of the judgment fixed-point bit. Since the judgment fixed-point bit is itself an integer bit, the new fixed-point bit obtained after the integer offset is also an integer bit; hence every fixed-point bit produced by the whole quantization process, whether a judgment bit or a new bit, is an integer bit. After the original network is subsequently optimized and adjusted, this effectively improves the running speed of the network.
Further, if the third similarity determination result is smaller than or equal to the first similarity determination result, that is, the similarity between the changed new fixed-point bit and the original fixed point does not exceed the similarity between the initial fixed-point bit and the original fixed point, the offset new fixed-point bit is judged to bring no improvement over the initial fixed-point bit at the corresponding position. The judgment bit at that position is therefore replaced by the initial fixed-point bit again, and the initial fixed-point bit is selected as the final fixed-point bit, completing the quantization process and yielding an accurate quantization result.
Because several fixed-point bits exist between the current node and the next node, after the first fixed-point bit has been tuned, the tuned bit is taken as its final fixed-point bit and optimization continues with the next bit, until every fixed-point bit between the current node and the next node that requires tuning has been processed.
In some embodiments, after the judgment fixed-point bit is obtained, it is tuned to achieve a better result. For example, if the currently selected judgment fixed-point bit has the value 4, the integer bit offset produces two new fixed-point bits by adding one and subtracting one, namely 5 and 3. A similarity judgment is then performed with each new fixed-point bit to obtain the third similarity judgment result. If the third similarity judgment result is greater than the first, the new fixed-point bit is closer to the original fixed point of the quantized picture than the initial fixed-point bit is, so the new fixed-point bit is taken as the target bit, completing the quantization at that point location. If the third similarity judgment result is smaller than or equal to the first, the offset new fixed-point bit brings no improvement over the initial fixed-point bit at the corresponding position; the judgment bit at that position is replaced by the initial fixed-point bit again, and the initial fixed-point bit is selected as the final fixed-point bit, completing the quantization process and yielding an accurate quantization result.
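The ±1 integer-offset tuning just described can be sketched as follows. The assumed helper `similarity_fn(bit)` stands in for substituting the candidate bit and computing the next node's similarity (the third similarity judgment result).

```python
def tune_fixed_point_bit(judge_bit, initial_bit, similarity_fn, first_sim):
    # Integer bit offset: try judge_bit - 1 and judge_bit + 1.
    for candidate in (judge_bit - 1, judge_bit + 1):
        # Keep the first offset candidate whose similarity beats the
        # first-quantization similarity (third result > first result).
        if similarity_fn(candidate) > first_sim:
            return candidate
    # No improvement: fall back to the initial fixed-point bit.
    return initial_bit
```

With a judgment bit of 4, the candidates tried are 3 and 5, matching the worked example above; if neither improves, the initial bit is restored as the final bit.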
In some embodiments, as shown in fig. 2, taking part of a Resnet as an example, after the nodes are determined, the layers between two adjacent nodes, such as the ReLU and Scale layers, are selected as determination point locations and numbered in order as determination point locations No. 1, No. 2, and No. 3.
In some embodiments, performing the similarity judgment on the first distribution feature and the second distribution feature to obtain the first similarity judgment result includes calculating the cosine similarity or the relative entropy of the two features and deriving the first similarity judgment result from it. When cosine similarity is used, the similarity judgment result equals the computed cosine similarity: the larger the cosine similarity, the larger the similarity judgment result and the more similar the two corresponding distribution features; conversely, the smaller the cosine similarity, the smaller the similarity judgment result and the less similar the two distribution features.
When the relative entropy (KLD) is used to calculate the similarity judgment result, the similarity judgment result equals the reciprocal of the relative entropy: the smaller the computed relative entropy, the larger the similarity judgment result and the more similar the two corresponding distribution features; conversely, the larger the relative entropy, the smaller the similarity judgment result and the less similar the two distribution features.
The similarity determination result includes the first similarity determination result, the second similarity determination result, and the third similarity determination result, where the first similarity determination result, the second similarity determination result, and the third similarity determination result are all calculated in the foregoing manner.
In a further embodiment, to specifically describe the calculation manner of the similarity determination result, taking the first similarity determination result as an example, the similarity determination processing is performed on the first distribution feature and the second distribution feature corresponding to the first similarity determination result.
When cosine similarity is used to calculate the first similarity judgment result, the result equals the computed cosine similarity. For a first distribution feature H and a second distribution feature H', the cosine similarity is

cos(H, H') = (H · H') / (|H| · |H'|),

that is, the ratio of the inner product of the first and second distribution features to the product of their magnitudes.
When the relative entropy is used to calculate the first similarity judgment result, the result equals the reciprocal of the relative entropy, i.e., 1/KLD(H). The relative entropy of the first distribution feature H and the second distribution feature H' is

KLD(H‖H') = Σᵢ H(i) · log( H(i) / H'(i) ),

where i is the sequence number of each layer, namely the positions of the original fixed point and the initial fixed point.
According to the quantization method, the quantization processing is carried out on the network to be quantized, the quantized pictures are quantized in a segmented mode according to the nodes, after the quantization process of each node is completed in sequence, the fixed point bits among the nodes are optimized again, and therefore the quantization process of the quantized pictures is completed in a layer-by-layer circular optimization mode, and the quantization precision and accuracy of the network are effectively improved.
Further, since the judgment fixed-point bit is an integer bit when obtained, and the offset applied to it is an integer bit offset, the fixed-point bits of the final neural network are all integer bits. In the process of performing machine-vision recognition on an image, this effectively improves the operation speed, reduces power consumption, and reduces the memory space occupied by the whole neural network.
It should be noted that, in the above process, the process of the similarity determination processing includes calculating cosine similarity or relative entropy, and the calculation process is similar to the foregoing process, which is not repeated here.
And S6, optimizing and adjusting the feature map of the original network according to the segmentation node quantization result to obtain a target neural network.
In some specific embodiments, according to the positions of the initial fixed points in the original network, the judgment fixed-point bits and tuning bits that replaced the initial fixed points are used correspondingly as the new fixed points of the original network, thereby optimizing the original network into the target neural network.
The original network is quantized through the above steps to obtain the node quantization result, i.e., the quantized fixed points: the judgment fixed-point bits after node quantization and the tuning bits from the tuning process replace the initial fixed points, so that the original network is optimized into the target neural network. The fixed points of the target neural network are thereby converted from floating-point data to integer bits, so the optimized network can operate on integer bits instead of floating-point numbers. This effectively reduces the memory footprint of the whole target neural network; in machine-vision computation, the target network effectively improves processing speed while further reducing operating power consumption.
Furthermore, when the target neural network is used for machine-vision image recognition, inputting an image into the target neural network for recognition effectively improves the operation speed of the network while reducing computational power consumption.
It should be noted that this quantization method targets the feature maps (Feature Map) activated in the neural network, whereas general linear quantization is applied to the network's own weight values, that is, the trained coefficients (weights).
In some specific embodiments, the original floating-point numbers of the neural-network feature map are 32-bit, and the present scheme quantizes the feature map to 8-bit integers, so the network can operate on 8-bit integers instead of 32-bit floating-point numbers. In theory this yields a fourfold operation gain, reduces the memory use of the quantized network to one quarter of the original, greatly improves operation speed, and effectively improves image-processing speed.
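The fourfold figure follows from the bit widths alone, as a quick check shows:

```python
def quantization_savings(num_params, float_bits=32, int_bits=8):
    # Memory footprint in bytes before and after quantization, plus the
    # reduction factor (32-bit float -> 8-bit integer gives 4x).
    before = num_params * float_bits // 8
    after = num_params * int_bits // 8
    return before, after, before // after
```

A feature map of 1000 values shrinks from 4000 bytes to 1000 bytes, a 4x reduction, which matches the theoretical gain claimed above.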
It should be noted that, in the actual quantization process, the floating point number size before quantization and the integer bit size after quantization of the neural network can be correspondingly adjusted according to the requirement, which is not described herein.
The present invention further provides a neural network quantization system, as shown in fig. 3, comprising:
the input acquisition module 1 is used for inputting a picture to be processed into an original network, and operating the original network to acquire original floating point data of each layer of the picture to be processed;
the first extraction module 2 is used for simplifying the original floating point data of each layer to obtain a plurality of original fixed points, and extracting the distribution characteristics of each layer of data of the original fixed points as first distribution characteristics;
a second extraction module 3, configured to take at least one of the pictures to be processed as a quantized picture, and import the quantized picture into the original network for first quantization to obtain a second distribution feature of each layer;
a node determining module 4, configured to determine a node according to a network structure of the original network or determine a node according to the first distribution characteristic and the second distribution characteristic;
the node quantization module 5 is used for performing node quantization on the quantized picture according to the node to obtain a node quantization result;
and the optimizing module 6 is used for optimizing and adjusting the feature map of the original network according to the segmentation node quantification result to obtain a target neural network.
It should be noted that, the structure and principle of the neural network quantization system are in one-to-one correspondence with the steps in the neural network quantization method, so that the description thereof is omitted herein.
It should be understood that the division of the modules of the above system is merely a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity or kept physically separate. These modules may all be implemented in software invoked by a processing element, all in hardware, or partly as software invoked by a processing element and partly in hardware. For example, a module may be a separately arranged processing element, may be integrated in a chip of the above system, or may be stored in the memory of the above system as program code whose function is invoked and executed by a processing element of the system. The implementation of the other modules is similar. In addition, all or some of these modules can be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability. In implementation, each step of the above method, or each module above, may be completed by an integrated logic circuit in hardware in a processor element or by instructions in software form.
For example, the modules above may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASIC), one or more Digital Signal Processors (DSP), or one or more Field Programmable Gate Arrays (FPGA), etc. For another example, when one of the modules above is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can invoke program code. For another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SOC).
The invention also discloses a storage medium having stored thereon a computer program which when run by a processor performs the steps described above.
The storage medium of the present invention has stored thereon a computer program which, when executed by a processor, implements the method described above. The storage medium includes: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, USB flash drives, memory cards, optical disks, or other media capable of storing program code.
The invention also discloses a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the method when running the computer program.
In a possible embodiment, the memory is used for storing a computer program; for example, the memory includes: various media capable of storing program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
The processor is connected with the memory and is used for executing the computer program stored in the memory so as to enable the terminal to execute the method.
Preferably, the processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
While embodiments of the present invention have been described in detail hereinabove, it will be apparent to those skilled in the art that various modifications and variations can be made to these embodiments. It is to be understood that such modifications and variations are within the scope and spirit of the present invention as set forth in the following claims. Moreover, the invention described herein is capable of other embodiments and of being practiced or of being carried out in various ways.

Claims (12)

1. A neural network quantization method, comprising:
inputting a picture to be processed into an original network, and operating the original network to obtain original floating point data of each layer of the network under the picture to be processed;
simplifying the original floating point data of each layer of the network of the picture to be processed to obtain a plurality of original fixed points, and extracting the distribution characteristics of each layer of data of the original fixed points as first distribution characteristics;
at least one of the pictures to be processed is used as a quantized picture, and the quantized picture is imported into the original network for first quantization to obtain a plurality of initial locating points and second distribution characteristics of data of each layer of the quantized picture;
determining a node according to a network structure of the original network or determining the node according to the first distribution characteristic and the second distribution characteristic;
Performing node quantization on the feature map of the original network according to the nodes to obtain a node quantization result;
linearly calculating judgment fixed point bits of the feature map between the current node and the next node according to original floating point data between the current node and the next node, wherein the judgment fixed point bits are integer digits;
sequentially replacing initial fixed point positions between the current node and the next node by corresponding judgment fixed point positions between the current node and the next node, and acquiring third distribution characteristics of data distribution of the next node;
performing similarity judgment on the third distribution characteristic and the first distribution characteristic of the next node to obtain a second similarity judgment result;
and carrying out judgment processing according to the second similarity judgment result and the first similarity judgment result, wherein the judgment processing comprises the following steps:
when the second similarity judgment result is larger than the first similarity judgment result, correspondingly replacing the initial fixed point position with the current judgment fixed point position;
when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed point bit between the current node and the next node to obtain a tuning bit, and replacing the tuning bit with the initial fixed point bit;
Circularly executing the process until the node quantization is completed for all nodes of the original network;
and optimizing and adjusting the feature map of the original network according to the segmentation node quantification result to obtain a target neural network.
2. The neural network quantization method of claim 1, wherein the pictures to be processed are acquired from the dataset of the original network in a randomly selected manner.
3. The neural network quantization method of claim 2, wherein the data set includes at least one of a training set for training, a validation set for validation, and a test set for testing.
4. The neural network quantization method of claim 1, wherein the first quantization uses a linear quantization process.
5. The neural network quantization method of claim 1, wherein the determining a node from the first distribution characteristic and the second distribution characteristic comprises:
performing similarity judgment on the first distribution characteristics and the second distribution characteristics to obtain a first similarity judgment result;
and sequencing all layers in the quantized pictures according to the first similarity judgment result, and selecting a layer with the first similarity judgment result smaller than a preset threshold value as the node.
6. The neural network quantization method according to claim 5, wherein performing similarity judgment on the first distribution characteristic and the second distribution characteristic to obtain the first similarity judgment result comprises: calculating the cosine similarity or the relative entropy of the first distribution characteristic and the second distribution characteristic, and obtaining the first similarity judgment result from the cosine similarity or the relative entropy.
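The two similarity measures named in claim 6 can be sketched as follows: cosine similarity and relative entropy (KL divergence) between two layer-wise distribution characteristics. The normalization and epsilon handling are assumptions for numerical stability, not specified by the patent:

```python
import numpy as np

def cosine_similarity(p, q):
    """Cosine similarity between two flattened distribution characteristics."""
    p, q = p.ravel(), q.ravel()
    denom = np.linalg.norm(p) * np.linalg.norm(q) + 1e-12
    return float(np.dot(p, q) / denom)

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy (KL divergence) D(p || q) between two
    non-negative histograms, normalized to probability distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Higher cosine similarity (or lower relative entropy) indicates that the quantized layer's data distribution stays close to the original floating-point distribution.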
7. The neural network quantization method of claim 1, wherein the determining nodes according to the network structure of the original network comprises at least one of:
when the original network is judged to be a residual network, selecting the initial layer and the termination layer of each residual structure of the original network as the nodes;
when the original network is judged to be an Inception-series network, selecting the initial layer and the termination layer of each Inception structure of the original network as the nodes;
when the original network is judged to contain a fusion part, selecting the fusion layer of the original network as the node;
when the original network is judged to be an SE-block network, selecting the Scale layer of the original network as the node;
and when the number of channels of the feature map of the current layer of the original network is judged to change by more than a factor of four relative to the previous layer, selecting the current layer of the original network as the node.
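The structural selection rules above can be sketched over a hypothetical list of layer descriptors. The `type` and `channels` keys are illustrative conventions introduced here, not part of the patent:

```python
def select_nodes(layers):
    """Pick node layers from a list of layer descriptors (dicts with
    hypothetical 'type' and 'channels' keys), following claim 7's rules:
    residual/Inception structure boundaries, fusion and Scale layers,
    and channel-count changes greater than a factor of four."""
    structural_types = {
        "residual_start", "residual_end",   # residual structure boundaries
        "inception_start", "inception_end", # Inception structure boundaries
        "fusion",                           # fusion layer
        "scale",                            # Scale layer of an SE block
    }
    nodes = []
    prev_channels = None
    for i, layer in enumerate(layers):
        if layer.get("type", "") in structural_types:
            nodes.append(i)
        elif prev_channels and layer.get("channels"):
            ratio = layer["channels"] / prev_channels
            if ratio > 4 or ratio < 0.25:   # >4x change either direction
                nodes.append(i)
        prev_channels = layer.get("channels", prev_channels)
    return nodes
```

In practice these rules would be evaluated against the parsed network graph rather than a flat list; the flat list keeps the sketch self-contained.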
8. The neural network quantization method according to claim 1, wherein, when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed-point positions between the current node and the next node to obtain tuning positions, and correspondingly replacing the initial fixed-point positions with the tuning positions, comprises:
performing integer bit-offset processing on the judgment fixed-point position of the current node to obtain a new fixed-point position;
replacing the initial fixed-point position corresponding to the judgment fixed-point position with the new fixed-point position, and obtaining a fourth distribution characteristic of the data distribution of the next node;
performing similarity judgment on the fourth distribution characteristic and the first distribution characteristic of the next node to obtain a third similarity judgment result;
when the third similarity judgment result is larger than the first similarity judgment result, taking the new fixed-point position between the current node and the next node as the tuning position, and correspondingly replacing the initial fixed-point position with the tuning position;
and circularly executing the above process until every judgment fixed-point position between the current node and the next node has completed tuning processing.
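The integer bit-offset tuning of claim 8 can be sketched as a small search around the judged fixed-point position. The offset schedule and the `similarity_of` callback (which would re-run the network with the candidate position and compute the similarity of the next node's distribution) are assumptions introduced for illustration:

```python
def tune_fixed_point(judged_position, first_sim, similarity_of,
                     offsets=(1, -1, 2, -2)):
    """Try integer bit offsets around the judged fixed-point position and
    return the first candidate whose similarity beats the baseline
    (first_sim); keep the judged position if none improves.
    similarity_of is a hypothetical callback, offsets an assumed schedule."""
    for d in offsets:
        candidate = judged_position + d
        if similarity_of(candidate) > first_sim:
            return candidate  # tuning position replaces the initial one
    return judged_position    # no offset improved similarity
```

Because fixed-point positions are integer bit positions, shifting by one offset doubles or halves the representable range, so a short offset schedule already covers the useful neighborhood.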
9. The neural network quantization method according to claim 8, wherein optimizing and adjusting the feature map of the original network according to the segmented node quantization result to obtain the target neural network comprises:
according to the locations of the initial fixed-point positions in the original network, correspondingly taking the judgment fixed-point positions and the tuning positions that replaced the initial fixed-point positions as the new feature-map fixed-point positions of the original network, thereby adjusting the feature map of the original network to obtain the target neural network.
10. A neural network quantization system, comprising:
an input acquisition module, used for inputting a picture to be processed into an original network and running the original network to obtain original floating-point data of the picture to be processed;
a first extraction module, used for simplifying the original floating-point data of each layer to obtain a plurality of original fixed points, and extracting the data distribution characteristic of each layer of original fixed points as a first distribution characteristic;
a second extraction module, used for taking at least one of the pictures to be processed as a quantized picture, and importing the quantized picture into the original network for a first quantization to obtain a plurality of initial fixed-point positions and a second distribution characteristic of the data of each layer of the quantized picture;
a node determining module, used for determining nodes according to the network structure of the original network, or determining nodes according to the first distribution characteristic and the second distribution characteristic;
a node quantization module, used for performing node quantization on the quantized picture according to the nodes to obtain a node quantization result; linearly calculating judgment fixed-point positions of the feature map between the current node and the next node from the original floating-point data between the current node and the next node, wherein the judgment fixed-point positions are integer bit positions; sequentially replacing the initial fixed-point positions between the current node and the next node with the corresponding judgment fixed-point positions, and obtaining a third distribution characteristic of the data distribution of the next node; performing similarity judgment on the third distribution characteristic and the first distribution characteristic of the next node to obtain a second similarity judgment result; and performing judgment processing based on the second similarity judgment result and the first similarity judgment result, the judgment processing comprising: when the second similarity judgment result is larger than the first similarity judgment result, correspondingly replacing the initial fixed-point position with the current judgment fixed-point position; when the second similarity judgment result is smaller than or equal to the first similarity judgment result, performing tuning processing on the judgment fixed-point positions between the current node and the next node to obtain tuning positions, and correspondingly replacing the initial fixed-point positions with the tuning positions; and circularly executing the above process until node quantization is completed for all nodes of the original network;
and an optimizing module, used for optimizing and adjusting the original network according to the segmented node quantization result to obtain a target neural network.
11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the method of any one of claims 1 to 9.
12. A terminal comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor performs the method of any one of claims 1 to 9 when executing the computer program.
CN202111136714.1A 2021-09-27 2021-09-27 Neural network quantization method, system, storage medium and terminal Active CN113743593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111136714.1A CN113743593B (en) 2021-09-27 2021-09-27 Neural network quantization method, system, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN113743593A CN113743593A (en) 2021-12-03
CN113743593B true CN113743593B (en) 2023-08-22

Family

ID=78741363



Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764756B (en) * 2022-06-15 2022-09-20 杭州雄迈集成电路技术股份有限公司 Quantitative pruning method and system for defogging model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103701121A (en) * 2013-12-31 2014-04-02 电子科技大学 Analyzing system of high grid frequency time-space evolution process
CN109766800A (en) * 2018-12-28 2019-05-17 华侨大学 A kind of construction method of mobile terminal flowers identification model
CN111191727A (en) * 2019-12-31 2020-05-22 北京建筑大学 Gas pressure regulator fault diagnosis method, system, terminal and computer storage medium based on PSO-KPCA-LVQ
CN111310890A (en) * 2020-01-19 2020-06-19 深圳云天励飞技术有限公司 Deep learning model optimization method and device and terminal equipment
CN112384950A (en) * 2019-06-12 2021-02-19 浙江大学 Point cloud encoding and decoding method and device
CN112686031A (en) * 2020-12-24 2021-04-20 北京有竹居网络技术有限公司 Text feature extraction model quantification method, device, equipment and storage medium
CN112766484A (en) * 2020-12-30 2021-05-07 上海熠知电子科技有限公司 Floating point neural network model quantization system and method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant