CN111831358B - Weight precision configuration method, device, equipment and storage medium - Google Patents

Weight precision configuration method, device, equipment and storage medium

Info

Publication number
CN111831358B
Authority
CN
China
Prior art keywords
layer
recognition rate
current
neural network
precision
Prior art date
Legal status
Active
Application number
CN202010663771.4A
Other languages
Chinese (zh)
Other versions
CN111831358A (en)
Inventor
祝夭龙
何伟
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202010663771.4A priority Critical patent/CN111831358B/en
Publication of CN111831358A publication Critical patent/CN111831358A/en
Priority to US18/015,065 priority patent/US11797850B2/en
Priority to PCT/CN2021/105172 priority patent/WO2022007879A1/en
Application granted granted Critical
Publication of CN111831358B publication Critical patent/CN111831358B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451 User profiles; Roaming
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiments of the invention disclose a weight precision configuration method, device, equipment and storage medium. The method comprises the following steps: determining a current recognition rate threshold from at least two candidate recognition rate thresholds smaller than a target recognition rate threshold; reducing the weight precision corresponding to each layer in a neural network based on the current recognition rate threshold; training the reduced neural network to adjust the weight parameter values of each layer, the training objective being to improve the recognition rate of the reduced network; and determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold. With this technical solution, under the condition that the recognition rate of the neural network is guaranteed, the resource utilization of the artificial intelligence chip carrying the network can be improved, chip performance can be improved, and chip power consumption can be reduced.

Description

Weight precision configuration method, device, equipment and storage medium
Technical Field
The embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a weight precision configuration method, device, equipment and storage medium.
Background
With the explosive development of big data, information networks, and intelligent mobile devices, massive amounts of unstructured information are generated, accompanied by rapidly growing demand for efficient processing of that information. Deep learning technology has developed rapidly in recent years and has achieved high accuracy in many fields such as image recognition, speech recognition, and natural language processing. However, most deep learning research today is still implemented on traditional von Neumann computers. Because processor and memory are separated, such computers are energy-hungry and inefficient when processing large-scale complex problems; and because they are built around numerical calculation, software for non-formalized problems is complex to program, and some such problems cannot be implemented at all.
With the development of brain science, it has become clear that, compared with the traditional von Neumann computer, the brain has characteristics such as ultra-low power consumption and high fault tolerance, and shows significant advantages in processing unstructured information and intelligent tasks. Building novel artificial intelligence systems and artificial intelligence chips that borrow the brain's computing mode has therefore become a new development direction, and brain-inspired artificial intelligence technology has emerged. The neural network in this technology is composed of a large number of neurons; through distributed storage and parallel cooperative processing of information, and by defining basic learning rules, it can simulate the adaptive learning process of the brain without explicit programming, which gives it advantages in processing certain non-formalized problems. Such artificial intelligence techniques can be implemented using large-scale integrated analog, digital, or mixed analog-digital circuits and software systems, that is, based on neuromorphic devices.
At present, a deep learning algorithm can operate at different data precisions. High precision yields better performance (such as accuracy or recognition rate), but once the algorithm is deployed on an artificial intelligence chip it incurs high storage and computation costs; low precision trades a certain degree of performance loss for significant savings in storage and computation, lowering chip power consumption and improving utility. In a conventional artificial intelligence chip, differing requirements on computational precision mean the processing chip must provide storage support for multiple data precisions, including integer (Int) and floating point (FP) formats such as 8-bit integer (Int8), 16-bit floating point (FP16), 32-bit floating point (FP32), and 64-bit floating point (FP64). However, the weight precisions of the layers of a neural network carried in a brain-inspired chip are all the same, so the weight precision configuration scheme in such an artificial intelligence chip is not flexible enough and needs improvement.
Disclosure of Invention
The embodiments of the present invention provide a weight precision configuration method, device, equipment and storage medium, which can optimize existing weight precision configuration schemes.
In a first aspect, an embodiment of the present invention provides a method for configuring weight precision, including:
determining a current recognition rate threshold from at least two candidate recognition rate thresholds, wherein the at least two candidate recognition rate thresholds are less than a target recognition rate threshold;
reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold;
training the neural network subjected to the reduction adjustment to adjust the weight parameter value of each layer, wherein the training target is to improve the recognition rate of the neural network subjected to the reduction adjustment;
and determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold.
In a second aspect, an embodiment of the present invention provides a weight precision configuration apparatus, including:
a recognition rate threshold determination module, configured to determine a current recognition rate threshold from at least two candidate recognition rate thresholds, where the at least two candidate recognition rate thresholds are smaller than a target recognition rate threshold;
the weight precision adjusting module is used for reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold;
the neural network training module is used for training the neural network subjected to the reduction adjustment so as to adjust the weight parameter values of each layer, wherein the training target is to improve the recognition rate of the neural network subjected to the reduction adjustment;
and the configuration result determining module is used for determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the weight precision configuration method according to the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the weight precision configuration method provided by the embodiment of the present invention.
The weight precision configuration scheme provided by the embodiments of the present invention determines a current recognition rate threshold from at least two candidate recognition rate thresholds smaller than a target recognition rate threshold, reduces the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold, and trains the reduced network to adjust the weight parameter values of each layer, the training objective being to improve the recognition rate of the reduced network; the final configuration result of the weight precision of each layer is then determined according to the relationship between the current recognition rate and the target recognition rate threshold. With this technical solution, a recognition rate threshold lower than the target recognition rate threshold serves as the reference for attempting to reduce the weight precision of each layer in the neural network; after the reduction, the network is trained to adjust the weight parameter values and thereby recover the recognition rate, compensating for the loss caused by the reduced weight precision; and the final configuration result is determined from the relationship between the trained recognition rate and the target recognition rate threshold. Under the condition that the recognition rate of the neural network is guaranteed, the resource utilization in the artificial intelligence chip carrying the network can thus be improved, chip performance improved, and chip power consumption reduced.
Drawings
Fig. 1 is a schematic flowchart of a weight precision configuration method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a prior-art precision configuration scheme for output data;
Fig. 3 is a schematic diagram of a precision configuration scheme for output data according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention;
Fig. 7 is a schematic flowchart of a method for reducing and adjusting weight precision according to an embodiment of the present invention;
Fig. 8 is a block diagram of a weight precision configuration apparatus according to an embodiment of the present invention;
Fig. 9 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further explained below through specific embodiments in conjunction with the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present invention rather than all structures.
It should be noted that the terms "first", "second", and the like in the embodiments of the present invention are only used for distinguishing different apparatuses, modules, units, or other objects, and are not used for limiting the order or interdependence of the functions performed by these apparatuses, modules, units, or other objects.
For a better understanding of embodiments of the present invention, the related art will be described below.
Brain-inspired artificial intelligence generally refers to drawing on the basic laws of information processing in the brain and making essential changes to existing computing systems and architectures at multiple levels, such as hardware implementation and software algorithms, so as to achieve great improvements in computing energy consumption, computing capability, computing efficiency, and other aspects; it is a cross-disciplinary field fusing brain science with computer science, information science, and artificial intelligence. The artificial intelligence chip generally refers to a non-von Neumann chip, such as a spiking neural network chip or a chip based on memristors, memcapacitors, or meminductors.
The artificial intelligence chip in the embodiments of the present invention may include a plurality of processing cores. Each processing core may include a processor and a memory area so that computation data can be processed locally, and each processing core may correspond to one layer of the neural network; that is, the neural network may be deployed or mapped onto the corresponding processing cores in units of layers. The neural network in the embodiments of the present invention may include an Artificial Neural Network (ANN), a Spiking Neural Network (SNN), or other types of neural networks. The specific type of the neural network is not limited; for example, it may be an acoustic model, a speech recognition model, or an image recognition model, and may be applied to data centers, security, intelligent medical care, autonomous driving, intelligent transportation, smart homes, and other related fields. The technical solution provided in the embodiments of the present invention does not improve the neural network algorithm itself; it is an improvement to the control mode or application mode of the hardware platform implementing the neural network, and belongs to neuromorphic circuits and neuromorphic engineering systems.
In the prior art, the weight precision of every layer of the neural network carried in the artificial intelligence chip is the same. If the weight precision of all layers is configured to a lower precision such as Int4, then to guarantee the recognition rate, parameter tuning becomes difficult, training time increases greatly, and a large precision loss is often incurred. If the weight precision of all layers is configured to FP32 or higher, the operation precision meets the requirement and the recognition rate is high, but the neural network model is generally large, so the resource utilization of the artificial intelligence chip is low, power consumption is high, and chip performance suffers.
The embodiments of the present invention abandon the prior-art constraint that every layer in the neural network has the same weight precision: different weight precisions can be configured for different layers, that is, mixed precision is adopted, so that storage capacity and computation energy consumption are well balanced against the recognition rate (or accuracy) of the neural network. The weight precision is configured based on this mixed-precision idea, and specific configuration schemes are provided below.
Fig. 1 is a flowchart of a method for configuring weight precision according to an embodiment of the present invention, where the method may be performed by a device for configuring weight precision, where the device may be implemented by software and/or hardware, and may be generally integrated in a computer device. As shown in fig. 1, the method includes:
Step 101, determining a current recognition rate threshold from at least two candidate recognition rate thresholds, wherein the at least two candidate recognition rate thresholds are smaller than a target recognition rate threshold.
In the embodiment of the present invention, a specific structure of the neural network is not limited, and for example, the number of neuron layers included in the neural network may be any number of layers greater than two.
The recognition rate of the neural network can be used to measure its performance; for example, a preset number of samples may be used to test the network to obtain the current recognition rate. The target recognition rate threshold may be set according to actual usage requirements such as the application scenario of the neural network, and can be understood as the lowest recognition rate that the current usage requirements can tolerate; the specific value is not limited and may be, for example, 0.95. The target recognition rate threshold is generally smaller than the initial recognition rate of the neural network, that is, when the weight precision corresponding to each layer has not been reduced, the recognition rate of the network is generally larger than the target recognition rate threshold.
The candidate recognition rate threshold may include at least two, wherein each candidate recognition rate threshold is less than the target recognition rate threshold. In the embodiment of the invention, the weight precision of each layer in the neural network is tried to be reduced and adjusted by taking the candidate recognition rate threshold as a reference, when the weight precision is reduced, a certain influence is generally generated on the recognition rate, and the aim of improving the recognition rate can be achieved by training the neural network and adjusting the weight parameter values of each layer, so that the recognition rate of the trained neural network can possibly reach the target recognition rate threshold. The specific number and the specific numerical value of the candidate recognition rate threshold are not limited. For example, the at least two candidate recognition rate thresholds may be sorted in order from large to small or from small to large, and when determining the current recognition rate threshold, the at least two candidate recognition rate thresholds may be determined in order according to the sorting.
Step 102, reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold.
In the embodiment of the present invention, the specific manner of the reduction adjustment is not limited. The precision of the initial weight can be gradually reduced layer by layer, and the reduction sequence, the reduction amplitude and the like are not limited; or the weight precision can be reduced to the lowest one layer by layer, and then gradually increased from the lowest one, and the increasing sequence, the increasing amplitude and the like are not limited. For example, the purpose of the reduction adjustment is to make the recognition rate of the neural network approach the current recognition rate threshold by reducing the precision of the weight corresponding to each layer in the neural network (for example, the difference between the current recognition rate of the neural network and the current recognition rate threshold is smaller than the preset difference threshold).
Step 103, training the neural network subjected to reduction adjustment to adjust the weight parameter value of each layer, wherein the training aim is to improve the recognition rate of the neural network subjected to reduction adjustment.
In the embodiment of the present invention, the process of training the neural network is not specifically limited, and in the training process, the layer for adjusting the weight parameter value may include all layers or a part of layers, and when the part of layers is included, the layer with reduced weight precision may be included.
Step 104, determining a final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold.
For example, the relationship between the current recognition rate and the target recognition rate threshold may be judged to determine whether the current recognition rate needs to be determined again and subsequent operations need to be performed, or a final configuration result of the weight precision of each layer may be directly obtained. When the sorting modes of the candidate recognition rate thresholds are different, the determination results may be different, and the determination results may be set according to actual situations. Optionally, after determining the final configuration result of the weight precision of each layer, the corresponding neural network may be used as the neural network for the final practical application, for example, as the neural network to be finally deployed on the artificial intelligence chip.
The weight precision configuration method provided by the embodiments of the present invention determines a current recognition rate threshold from at least two candidate recognition rate thresholds smaller than a target recognition rate threshold, reduces the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold, and trains the reduced network to adjust the weight parameter values of each layer, the training objective being to improve the recognition rate of the reduced network; the final configuration result of the weight precision of each layer is determined according to the relationship between the current recognition rate and the target recognition rate threshold. With this technical solution, a recognition rate threshold lower than the target recognition rate threshold serves as the reference for attempting to reduce the weight precision of each layer; after the reduction, the network is trained to adjust the weight parameter values and recover the recognition rate, compensating for the loss caused by the reduced weight precision; and the final configuration result is determined from the relationship between the trained recognition rate and the target recognition rate threshold. Under the condition that the recognition rate of the neural network is guaranteed, the resource utilization in the artificial intelligence chip carrying the network can be improved, chip performance improved, and chip power consumption reduced.
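To make a single configuration pass concrete, the following is a minimal sketch in Python; the patent defines no API, so `net`, `reduce_precisions`, `train`, and `evaluate` are hypothetical names assumed purely for illustration.

```python
def configure_once(net, current_threshold, target_threshold,
                   reduce_precisions, train, evaluate):
    """One pass of steps 101-104 for a chosen candidate threshold.

    Assumed helpers (not defined by the patent text):
      reduce_precisions(net, t): lowers per-layer weight precision until the
          recognition rate approaches t (step 102);
      train(net): adjusts weight parameter values to recover the recognition
          rate lost to the reduction (step 103);
      evaluate(net): returns the current recognition rate.
    """
    reduce_precisions(net, current_threshold)   # step 102
    train(net)                                  # step 103
    return evaluate(net) >= target_threshold    # step 104: target reached?
```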
In some embodiments, among the at least two candidate recognition rate thresholds, the candidate with the larger value is preferentially determined as the current recognition rate threshold. Determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold then comprises: judging whether the current recognition rate of the trained neural network can reach the target recognition rate threshold; if so, re-determining the current recognition rate threshold and continuing to reduce the weight precision corresponding to each layer in the neural network based on the new current recognition rate threshold, until the current recognition rate of the trained neural network cannot reach the target recognition rate threshold, and determining the weight precision corresponding to each layer after the previous reduction adjustment as the final configuration result.
The larger the current recognition rate threshold, the closer it is to the target recognition rate threshold, that is, the more likely the recognition rate can reach the target threshold through training; gradually lowering the threshold prevents repeated adjustment caused by severely over-reducing the weight precision. For example, with a target recognition rate threshold of 0.95 and three candidate recognition rate thresholds, the candidates may be set to 0.92, 0.90, and 0.89: the largest, 0.92, is determined as the current recognition rate threshold first; when the current threshold needs to be re-determined, 0.90 becomes the new current threshold, and if it must be re-determined again, 0.89 is used.
Exemplarily, if the current recognition rate of the trained neural network can reach the target recognition rate threshold, the weight precision of one or more layers in the neural network still has room to fall. The current recognition rate threshold can therefore be re-determined, the weight precision corresponding to each layer continues to be reduced based on the new current recognition rate threshold, and the reduced network is trained to adjust the weight parameter values of each layer, until the current recognition rate of the trained network cannot reach the target recognition rate threshold. At that point the weight precision of one or more layers has been reduced too far for the recognition rate to meet the current usage requirement, so the weight precision corresponding to each layer after the previous (last successful) reduction adjustment is determined as the final configuration result.
In some embodiments, among the at least two candidate recognition rate thresholds, the candidate with the smaller value is preferentially determined as the current recognition rate threshold. Determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold then comprises: judging whether the current recognition rate of the trained neural network can reach the target recognition rate threshold; if not, re-determining the current recognition rate threshold and, based on the new current recognition rate threshold, either reducing the weight precision corresponding to each layer again or increasing it, until the current recognition rate of the trained neural network can reach the target recognition rate threshold, and determining the weight precision corresponding to each layer after the latest reduction or increase adjustment as the final configuration result.
The smaller the current recognition rate threshold, the more the weight precision is reduced and the more the resource utilization can be improved; gradually raising the threshold increases the probability of quickly determining a weight precision configuration. Following the example above, the smallest value, 0.89, may be determined as the current recognition rate threshold first; when the current threshold needs to be re-determined, 0.90 becomes the new current threshold, and if it must be re-determined again, 0.92 is used.
For example, if the current recognition rate of the trained neural network cannot reach the target recognition rate threshold, the weight precision of one or more layers has been reduced too far and the network cannot meet the final recognition rate requirement; the current recognition rate threshold can therefore be re-determined, and the weight precision corresponding to each layer adjusted based on the new threshold. The adjustment here may start again by reducing from the initial weight precision of each layer, or may be an upward adjustment on the basis of a previous downward or upward adjustment. If the current recognition rate of the trained network can reach the target recognition rate threshold, the weight precision of each layer meets the current usage requirement, and the weight precision corresponding to each layer after the latest reduction or increase adjustment can be determined as the final configuration result without further reduction or increase. Both search orders are sketched below.
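The two candidate orderings can be sketched as outer search loops around a single pass such as `configure_once` above. This is a sketch under assumptions: layer objects are assumed to expose `name` and `weight_precision` attributes, and `step` stands for one reduce-train-evaluate pass returning whether the target was reached.

```python
def save_precisions(net):
    # snapshot of the current per-layer weight precision configuration
    return {layer.name: layer.weight_precision for layer in net.layers}

def restore_precisions(net, saved):
    for layer in net.layers:
        layer.weight_precision = saved[layer.name]

def search_descending(net, candidates, step):
    """Largest candidate first: keep lowering the reference threshold while
    the trained network still reaches the target; the configuration from the
    previous (last successful) reduction is the final result."""
    best = None
    for threshold in sorted(candidates, reverse=True):
        before = save_precisions(net)
        if step(net, threshold):              # reduce, train, compare to target
            best = save_precisions(net)       # target still reached; go lower
        else:
            restore_precisions(net, before)   # over-reduced; keep previous result
            break
    return best

def search_ascending(net, candidates, step):
    """Smallest candidate first: raise the reference threshold until the
    trained network first reaches the target recognition rate."""
    for threshold in sorted(candidates):
        if step(net, threshold):              # re-reduce (or raise) and train
            return save_precisions(net)
    return None
```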
In some embodiments, reducing the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold includes: determining a current target layer in the neural network, where all layers are ranked by their degree of influence on the recognition rate and a layer with low influence is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, and if so, locking the weight precision corresponding to the current target layer to the weight precision before this reduction; and re-determining the current target layer when the target layer switching condition is met. The advantage of this arrangement is that the weight precision of each layer in the neural network can be reduced reasonably.
For example, different layers in the neural network may influence the recognition rate of the network to different degrees, and many factors can contribute, such as the number of weight parameters, the weight parameter values (weight values), and the weight precision (the precision of the weight values). The degree of influence of each layer on the recognition rate can be evaluated in advance, and the layers sorted in a certain order (for example, from low influence to high). In this step, the layer with the lowest influence may be determined as the current target layer, and when the target layer needs to be switched, the layer with the second-lowest influence may be determined as the new current target layer.
For example, the initial weight precision of all layers in the neural network may be set according to actual requirements, may be the same or different, and may be generally set higher, such as FP32 or higher.
For example, when the weight precision corresponding to the current target layer is reduced, the reduction amplitude is not limited, and the amplitude of each reduction may be the same or different. The amplitude can be measured in precision levels, where a precision level characterizes data precision: the higher the precision, the higher the corresponding level, and the precision values corresponding to different levels can be set according to actual requirements. Illustratively, the precision may be decreased in the order FP32, FP16, Int8, Int4, one level at a time, such as from FP32 to FP16. The advantage of reducing one level at a time is that the weight precision can be determined more accurately, that is, the configuration is more precise: if two or more levels were reduced at once, then when the current recognition rate falls below the current recognition rate threshold, the locked weight precision would differ from the just-reduced precision by two or more levels, and some intermediate weight precision between the two might already satisfy the recognition rate requirement.
Illustratively, when the neural network is deployed on the artificial intelligence chip, the neural network is deployed or mapped onto the corresponding processing core in units of layers, and the current target layer is mapped into the corresponding processing core, so that the weight precision corresponding to the current target layer can be understood as the core precision of the processing core corresponding to the current target layer, that is, the scheme of the embodiment of the present invention can be understood as configuring the core precision of the processing core in the artificial intelligence chip.
For example, if reducing the weight precision of the current target layer makes the current recognition rate of the neural network smaller than the current recognition rate threshold, the reduction is not appropriate, so the weight precision corresponding to the current target layer can be locked to the weight precision before the reduction. For example, if the precision was FP16 before the reduction and Int8 after it, the weight precision corresponding to the current target layer is locked to FP16.
For example, the way of locking the precision of the weight may be to rewrite a bit number flag bit of the current target layer or rewrite a name of a calling operator corresponding to the current target layer.
For example, a weight precision reduction can be attempted for every layer in the neural network, with the target layer switching condition deciding whether to move on to reducing the weight precision of the next target layer. After the weight precisions corresponding to all layers are locked, the weight precision reduction of the neural network is considered complete; under the current reduction strategy, the weight precision of the network is reduced as far as possible while its recognition rate remains closest to the current recognition rate threshold. Optionally, the reduction may be attempted for only part of the layers; the number of such layers can be set according to actual requirements, and after reducing this part of the layers the recognition rate of the network may already be close enough to the current recognition rate threshold, which improves the efficiency of the reduction adjustment and hence the configuration efficiency of the weight precision.
In some embodiments, after judging whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, the method further comprises: if the current recognition rate is greater than or equal to the current recognition rate threshold, continuing to reduce the weight precision corresponding to the current target layer and continuing to judge whether the current recognition rate falls below the threshold. The target layer switching condition then includes: the current recognition rate of the neural network is smaller than the current recognition rate threshold. Preferentially determining the layer with low influence as the target layer comprises: among the layers whose weight precision is not yet locked, preferentially determining the layer with low influence as the target layer. The advantage of this arrangement is improved efficiency of the reduction adjustment: when the current recognition rate is greater than or equal to the current recognition rate threshold, the weight precision of the current target layer still has room to fall, so further reductions can be attempted and checked against the threshold; once the current recognition rate is smaller than the threshold, the weight precision of the current target layer cannot be reduced further, so the target layer is switched and a reduction is attempted on the next layer. A sketch of this procedure follows.
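A minimal sketch of this layer-by-layer reduction with locking, under the assumptions that precision moves along a four-step ladder, that layer objects expose `influence` and `weight_precision` attributes, and that `evaluate(net)` returns the current recognition rate (none of these names come from the patent):

```python
PRECISION_LADDER = ["Int4", "Int8", "FP16", "FP32"]   # low -> high precision levels

def lower_layer_precisions(net, current_threshold, evaluate):
    """Visit layers from least to most influential; drop one precision level
    at a time while the recognition rate stays at or above the current
    threshold; on the first drop below it, lock the pre-drop precision and
    switch to the next target layer."""
    for layer in sorted(net.layers, key=lambda l: l.influence):
        while PRECISION_LADDER.index(layer.weight_precision) > 0:
            previous = layer.weight_precision
            idx = PRECISION_LADDER.index(previous)
            layer.weight_precision = PRECISION_LADDER[idx - 1]   # one level down
            if evaluate(net) < current_threshold:
                layer.weight_precision = previous   # lock pre-drop precision
                break                               # target layer switching condition
```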
In some embodiments, multiple rounds of reduction are performed on the weight precisions of all layers in the neural network, and in each round the weight precision of each layer is reduced at most once. After judging whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, the method further includes: if the current recognition rate is greater than or equal to the threshold, temporarily storing the reduced weight precision. The target layer switching condition includes: the weight precision of the current target layer has been reduced once in the current round. The advantage of this is that the weight precision of the layers is reduced evenly. For example, suppose the neural network has 4 layers, L1, L2, L3, and L4, sorted by influence on the recognition rate from lowest to highest as L1, L3, L2, L4. In each round of reduction, L1 is determined as the target layer first and its weight precision is reduced; the target layer is then switched to L3, whose weight precision is reduced; and the weight precisions of L2 and L4 are reduced in turn. A sketch of this round-based variant follows.
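Under the same assumptions as the previous sketch, the round-based variant can be written as follows; in each round every still-unlocked layer is lowered at most once, so precision falls evenly across the layers:

```python
def lower_in_rounds(net, current_threshold, evaluate,
                    ladder=("Int4", "Int8", "FP16", "FP32")):
    """Multiple rounds of reduction; in each round every unlocked layer is
    reduced at most once, in influence order (least influential first)."""
    unlocked = sorted(net.layers, key=lambda l: l.influence)
    while unlocked:                          # one iteration = one round
        for layer in list(unlocked):
            idx = ladder.index(layer.weight_precision)
            if idx == 0:                     # already at the minimum precision
                unlocked.remove(layer)
                continue
            previous = layer.weight_precision
            layer.weight_precision = ladder[idx - 1]
            if evaluate(net) < current_threshold:
                layer.weight_precision = previous   # lock; stop adjusting this layer
                unlocked.remove(layer)
            # otherwise the reduced precision is kept (temporarily stored)
```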
In some embodiments, reducing the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold includes: determining a current target layer in the neural network, where all layers are ranked by their degree of influence on the recognition rate and a layer with high influence is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer to a preset minimum precision; increasing the weight precision corresponding to the current target layer, judging whether the current recognition rate of the neural network is greater than the current recognition rate threshold, and if so, locking the weight precision corresponding to the current target layer to the weight precision before this increase; and re-determining the current target layer when the target layer switching condition is met. The advantage of this arrangement is that the weight precision of each layer in the neural network can be adjusted reasonably.
For example, the initial weight precision of all layers in the neural network may be set according to actual requirements and may be the same or different. The preset minimum precision can be set according to actual requirements and may be determined by the hardware configuration of the artificial intelligence chip. The benefit of this arrangement is that, before a neural network needs to be deployed to the artificial intelligence chip, it may be supplied by a third party with application requirements, and the third party does not consider the specifics of the chip when designing the network, so the weight precision of each layer may be high; therefore, before configuring the weight precision, it can first be reduced to a preset minimum precision matched to the chip and then gradually raised.
For example, the influence degree of each layer in the neural network on the recognition rate may be evaluated separately in advance, and the layers may be sorted in a certain order (e.g., the influence degree is from high to low). In this step, the layer with the highest influence degree may be determined as the current target layer, and when the target layer needs to be switched, the layer with the second highest influence degree may be determined as the new current target layer.
For example, when the weight precision corresponding to the current target layer is increased, the increase amplitude is not limited, and the amplitude of each increase may be the same or different; it can be measured in precision levels. Illustratively, the precision may be increased in the order Int4, Int8, FP16, FP32, one level at a time, such as from Int4 to Int8. The advantage of raising one level at a time is that the weight precision can be determined more accurately, that is, the configuration is more precise: if two or more levels were raised at once, then when the current recognition rate exceeds the current recognition rate threshold, the locked weight precision would differ from the just-raised precision by two or more levels, and some intermediate weight precision between the two might correspond to a recognition rate that is still below the current threshold.
For example, if increasing the weight precision of the current target layer makes the current recognition rate of the neural network greater than the current recognition rate threshold, the increase is not appropriate and may significantly affect chip performance, so the weight precision corresponding to the current target layer can be locked to the weight precision before the increase. For example, if the precision was FP16 before the increase and FP32 after it, the weight precision corresponding to the current target layer is locked to FP16.
For example, the weight precision may be tentatively raised for each layer in the neural network, and whether to raise the weight precision of the next target layer is decided according to the target layer switching condition. After the weight precisions corresponding to all layers are locked, the weight precision adjustment of the neural network is considered complete; under the current adjustment strategy, the weight precision of the network is raised no further than necessary while keeping its recognition rate closest to the current recognition rate threshold. Optionally, only part of the layers may be adjusted by reducing to the minimum and then raising; the number of such layers can be set according to actual requirements, and after adjusting this part of the layers the recognition rate of the network may already be close enough to the current recognition rate threshold, improving adjustment efficiency and hence the configuration efficiency of the weight precision.
In some embodiments, after judging whether the current recognition rate of the neural network is greater than the current recognition rate threshold, the method further comprises: if the current recognition rate is less than or equal to the current recognition rate threshold, continuing to increase the weight precision corresponding to the current target layer and continuing to judge whether the current recognition rate exceeds the threshold. The target layer switching condition then includes: the current recognition rate of the neural network is greater than the current recognition rate threshold. Preferentially determining the layer with high influence as the target layer comprises: among the layers whose weight precision is not yet locked, preferentially determining the layer with high influence as the target layer. The advantage of this arrangement is improved efficiency of the adjustment: when the current recognition rate is less than or equal to the current recognition rate threshold, the weight precision of the current target layer still has room to rise, so further increases can be attempted and checked against the threshold; once the current recognition rate is greater than the threshold, the weight precision of the current target layer cannot be raised further, so the target layer is switched and an increase is attempted on the next layer.
In some embodiments, multiple rounds of raising operations are performed on the weight precisions of all layers in the neural network, and in each round the weight precision of each layer is raised at most once. After judging whether the current recognition rate of the neural network is greater than the current recognition rate threshold, the method further comprises: if the current recognition rate is less than or equal to the threshold, temporarily storing the raised weight precision. The target layer switching condition includes: the weight precision of the current target layer has been raised once in the current round. Reducing the weight precision of the current target layer to the preset minimum precision comprises: if the weight precision of the current target layer has not yet been adjusted, reducing it to the preset minimum precision. That is, when the current target layer is determined as the target layer for the first time, its weight precision has not been adjusted, so it is first reduced to the preset minimum and then raised; if it is not the first time, a raising operation has already been performed, and in the current round another raise is applied on top of the raised weight precision temporarily stored in the previous round. The advantage of this is that the weight precision of the layers is raised evenly. For example, suppose the neural network has 4 layers, L1, L2, L3, and L4, sorted by influence on the recognition rate from highest to lowest as L1, L3, L2, L4. In each round of raising, L1 is determined as the target layer first and its weight precision is raised; the target layer is then switched to L3, whose weight precision is raised; and the weight precisions of L2 and L4 are raised in turn. A sketch of this embodiment follows.
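A sketch of this lower-then-raise embodiment under the same assumptions as before; for brevity each layer is raised level by level in sequence, rather than interleaved round by round with at most one raise per round as described above:

```python
def raise_from_minimum(net, current_threshold, evaluate,
                       ladder=("Int4", "Int8", "FP16", "FP32"),
                       min_precision="Int4"):
    """Visit layers from most to least influential; when a layer first becomes
    the target, cut it to the preset minimum precision, then raise one level
    at a time until the recognition rate first exceeds the current threshold,
    locking the precision in force before that raise."""
    for layer in sorted(net.layers, key=lambda l: l.influence, reverse=True):
        layer.weight_precision = min_precision     # first selection: cut to minimum
        while ladder.index(layer.weight_precision) < len(ladder) - 1:
            previous = layer.weight_precision
            layer.weight_precision = ladder[ladder.index(previous) + 1]
            if evaluate(net) > current_threshold:
                layer.weight_precision = previous  # lock pre-raise precision
                break                              # switch target layer
```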
In some embodiments, re-determining the current target layer comprises: re-determining the current target layer until the weight precisions corresponding to all layers are locked. The advantage of this arrangement is that the weight precision adjustment is attempted for every layer in the neural network, so that the recognition rate of the network approaches the current recognition rate threshold as closely as possible.
In some embodiments, all layers in the neural network are ranked by their degree of influence on the recognition rate as follows: calculate the initial recognition rate of the neural network; for each layer, reduce the weight precision of that layer from a first precision to a second precision and calculate the drop of the recognition rate relative to the initial recognition rate; then sort all layers by the drop value, where a larger drop means a higher influence on the recognition rate. The advantage of this arrangement is that the influence of different layers on the recognition rate can be evaluated quickly and accurately. The first and second precisions can be set according to actual requirements; the first precision may be, for example, the initial precision of the neural network, and the number of precision levels between the two is not limited. For example, the first precision may be FP32 and the second precision FP16.
In some embodiments, if at least two layers have the same drop value, the at least two layers are sorted by their distance from the input layer of the neural network, where a smaller distance means a higher influence on the recognition rate. The advantage of this arrangement is that the layers can be ordered more reasonably. A sketch of the ranking procedure follows.
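The ranking procedure can be sketched as follows, again assuming hypothetical layer attributes (`weight_precision`, and `depth` for the distance from the input layer) and an `evaluate` helper returning the recognition rate; layers are assumed to start at the first precision, e.g. FP32:

```python
def rank_layers_by_influence(net, evaluate, second_precision="FP16"):
    """Lower each layer in turn from its initial (first) precision to
    `second_precision`, measure the drop in recognition rate relative to the
    initial rate, restore the layer, and sort: a larger drop means a higher
    influence; equal drops are broken by distance from the input layer
    (smaller distance = higher influence)."""
    initial_rate = evaluate(net)
    drops = []
    for layer in net.layers:
        original = layer.weight_precision            # the first precision
        layer.weight_precision = second_precision    # reduce: first -> second
        drops.append((initial_rate - evaluate(net), layer))
        layer.weight_precision = original            # restore before the next layer
    drops.sort(key=lambda item: (-item[0], item[1].depth))
    return [layer for _, layer in drops]             # most influential first
```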
In some embodiments, training the reduced neural network comprises training it on the artificial intelligence chip. The advantage is that the neural network in the embodiments of the present invention can be mapped onto the artificial intelligence chip for application and trained on that chip, that is, mapped onto the chip in advance of actual application, so that the training process better matches the actual application scenario and the network is trained more accurately and efficiently.
In some embodiments, training the reduced neural network comprises: obtaining the precision of the data to be output by a first layer in the reduced neural network, where the first layer is any layer other than the last layer; obtaining the weight precision of a second layer, the second layer being the layer following the first layer; and configuring the precision of the data to be output according to the weight precision of the second layer. The advantage of this arrangement is that the precision of the output data of one or more layers of the neural network deployed on the artificial intelligence chip can be configured flexibly, thereby optimizing the performance of the chip.
At present, a neural network usually groups a number of neurons as one layer, and each layer usually corresponds to one processing core in the artificial intelligence chip. The core computation of a neural network is matrix-vector multiplication. When data are input to one layer of the network, the computation precision is generally the product of the data precision and the weight precision (i.e., the precision of the weight values), and the precision of the computation result (i.e., the output data of the processing core corresponding to the current layer) is conventionally determined by the higher of the data precision and the weight precision. Fig. 2 is a schematic diagram of a prior-art precision configuration scheme for output data, in which the weight precision of every layer of the neural network carried in the artificial intelligence chip is the same; as shown in Fig. 2, for convenience of description only four layers, L1, L2, L3, and L4, are shown. The precision (data precision) of the input data of L1 is FP32 (32-bit floating point), the weight precision of L1 is FP32, and the precision obtained after the multiply-accumulate operation is FP32. In the embodiments of the present invention, the precision of the computation result is determined not by the higher of the data precision and the weight precision, but according to the weight precision of the next layer.
In the embodiment of the present invention, the first layer is not necessarily the first layer of the neural network; it may be any layer other than the last layer. If the processing core corresponding to the first layer is called the first processing core, it may be understood that the first processing core acquires the precision of the data to be output by the first layer, acquires the weight precision of the second layer, and configures the precision of the data to be output of the first layer according to that weight precision; any processing core except the one corresponding to the last layer can act as the first processing core. Illustratively, the data to be output is calculated by a processor in the first processing core from the input data of the first layer and the weight parameters of the first layer (such as its weight matrix); generally, the precision of the data to be output is greater than or equal to the higher of the input-data precision and the weight precision. If the input-data precision and the weight precision are both low (such as Int2, Int4, or Int8), the number of bits may be insufficient after the multiply-accumulate operation (for example, the requirements of the corresponding processing core's hardware configuration cannot be met) and the precision needs to be raised; the precision of the data to be output is then usually increased (for example, to Int8 or Int16), and the lower the input-data precision and weight precision, the more precision levels need to be added. Conversely, if the input-data precision and the weight precision are already relatively high (such as FP16, FP32, or FP64), the precision of the data to be output may not increase at all, or may increase only slightly (e.g., from FP16 to FP32), because the precision after the multiply-accumulate operation is already sufficient.
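As a rough illustration of why low-precision inputs force a wider result, the bit width needed to hold a layer's accumulated multiply products can be bounded as follows. This is a standard worst-case bound, not a formula from the patent.

```python
import math

def mac_result_bits(input_bits, weight_bits, n_terms):
    """Upper bound on the bit width needed to accumulate n_terms
    products without saturation (a rough worst-case estimate)."""
    product_bits = input_bits + weight_bits          # Int8 x Int8 -> 16 bits
    return product_bits + math.ceil(math.log2(n_terms))

# A 256-input neuron with Int8 data and Int8 weights needs about
# 16 + 8 = 24 bits, so keeping the result at Int8 would saturate; this is
# why the precision of the data to be output is usually raised.
print(mac_result_bits(8, 8, 256))  # 24
```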
In the embodiment of the present invention, the weight precisions of different layers may differ, and the specific manner of obtaining the weight precision of the second layer is not limited. For example, the weight precision of the second layer may be written into a storage area of the first processing core at the compiling stage of the chip, and read from that storage area after the data to be output of the first layer is acquired; alternatively, if the processing core corresponding to the second layer is called the second processing core, the storage area of the second processing core may hold the weight precision of the second layer, and the first processing core may obtain it from the second processing core by means of inter-core communication.
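A toy illustration of the two options above; `local_store` stands in for the first core's storage area written at compile time, and `ask_next_core` for an inter-core query over the network-on-chip. Both names are invented for this sketch.

```python
def next_layer_weight_precision(local_store, ask_next_core):
    if "next_weight_precision" in local_store:      # compiled-in copy
        return local_store["next_weight_precision"]
    return ask_next_core("weight_precision")        # inter-core communication

# Example: the compiler stored FP16 locally, so no message is needed.
print(next_layer_weight_precision({"next_weight_precision": "FP16"},
                                  ask_next_core=lambda key: "FP16"))  # FP16
```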
In the embodiment of the present invention, the precision of the data to be output by the first layer is configured with reference to the weight precision of the second layer; the specific reference manner and configuration manner are not limited. For example, the precision of the data to be output may be configured lower than the weight precision of the second layer, or higher than it, to obtain the precision of the output data, and the number of precision levels between the weight precision of the second layer and the precision of the output data may be a first preset precision-level difference. For example, if Int8 is the only level between the precisions Int4 and FP16, the level difference between Int4 and FP16 is 2, while the level difference between Int4 and Int8 is 1. Assuming the weight precision of the second layer is FP16 and the first preset level difference is 2, then if the precision of the data to be output is to be configured lower than the weight precision of the second layer, it is configured as Int4.
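As an illustration, one precision ladder consistent with the example just given (where exactly one level, Int8, separates Int4 from FP16) might look as follows; the actual set and ordering of levels, and where Int16 sits, are implementation choices left open by the text.

```python
# A hypothetical precision ladder; Int16 is omitted because the example
# above counts exactly one level between Int4 and FP16.
PRECISION_LADDER = ["Int2", "Int4", "Int8", "FP16", "FP32", "FP64"]

def level_difference(p, q):
    return abs(PRECISION_LADDER.index(p) - PRECISION_LADDER.index(q))

def lower_by_levels(p, n):
    return PRECISION_LADDER[PRECISION_LADDER.index(p) - n]

print(level_difference("Int4", "FP16"))  # 2 (Int8 lies between them)
print(lower_by_levels("FP16", 2))        # 'Int4', as in the example above
```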
In some embodiments, configuring the precision of the data to be output according to the weight precision of the second layer includes: when the weight precision of the second layer is lower than the precision of the data to be output, determining a target precision from the weight precision of the second layer, the target precision being lower than the precision of the data to be output; and configuring the precision of the data to be output to the target precision. Optionally, the target precision is equal to or higher than the weight precision of the second layer. This amounts to truncating the precision of the data to be output according to the weight precision of the second layer, so that the precision of the data to be output is reduced; this lowers the data transmission volume, and when the second layer performs its computation the computation volume is also reduced, cutting the energy consumed by data processing.
In some embodiments, determining the target precision from the weight precision of the second layer comprises: taking the weight precision of the second layer as the target precision. This is equivalent to truncating the precision of the data to be output to match the weight precision of the second layer, which further reduces the data transmission volume and the energy consumed by data processing, and frees up chip computing power. Optionally, the comparison between the weight precision of the second layer and the precision of the data to be output of the first layer may be skipped, and the weight precision of the second layer directly taken as the target precision.
In some embodiments, the method may include: judging whether the weight precision of the second layer is lower than the precision of the data to be output of the first layer; if so, taking the weight precision of the second layer as the target precision and configuring the precision of the data to be output of the first layer to the target precision to obtain the output data; otherwise, keeping the precision of the data to be output of the first layer unchanged, or configuring it to the weight precision of the second layer, to obtain the output data. Keeping the precision of the data to be output of the first layer unchanged reduces the transmission volume between the first layer and the second layer.
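A minimal sketch of this judgment, reusing the hypothetical PRECISION_LADDER of the earlier sketch as the ordered list of precision levels:

```python
def configure_output_precision(to_output, next_weight, ladder):
    """Decide the precision of the first layer's output data from the
    second layer's weight precision; `ladder` is an ordered list of
    precision levels such as PRECISION_LADDER above."""
    if ladder.index(next_weight) < ladder.index(to_output):
        return next_weight   # truncate: less traffic and less compute
    return to_output         # keep unchanged (or optionally raise to
                             # the second layer's weight precision)

print(configure_output_precision(
    "FP16", "Int8", ["Int2", "Int4", "Int8", "FP16", "FP32"]))  # Int8
```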
In some embodiments, after configuring the precision of the data to be output according to the weight precision of the second layer, the method further includes: outputting the configured output data to the processing core corresponding to the second layer. Arranged this way, the output data is sent by inter-core communication to the processing core corresponding to the second layer, which then performs the computations of the second layer.
In some embodiments, the artificial intelligence chip is implemented on a many-core architecture. The many-core architecture can have a multi-core recombination characteristic: the cores have no master-slave division, tasks can be flexibly configured by software, and different tasks can be placed in different cores at the same time, achieving multi-task parallel processing; a series of cores forms an array that carries out the computation of the neural network, so that various neural network algorithms can be supported efficiently and the performance of the chip is improved. Illustratively, the chip can use a 2D mesh network-on-chip for communication interconnection between cores, while communication between the chip and the outside can go through a high-speed serial port.
Fig. 3 is a schematic diagram of a precision configuration scheme for output data according to an embodiment of the present invention; as shown in Fig. 3, for convenience of description only four layers of the neural network, L1, L2, L3, and L4, are shown.
For L1, the precision of the input data is Int8 and the weight precision of L1 is Int8, so the precision obtained after the multiply-accumulate operation is Int8; however, the result may saturate during the multiply-accumulate operation, losing information. In the prior art, the precision of the result is determined by the higher of the data precision and the weight precision, and since the weight precision of L2 is FP16, the Int8 result, already truncated, would have to be padded back up before being output, so the precision cut away first is lost in the process. In the embodiment of the invention, the weight precision of L2 is obtained first, from which it is known that the precision of the data to be output of L1 is the same as the weight precision of L2; the truncation operation is therefore not performed, and the precision loss from data conversion can be reduced.
For L3, the precision of the input data is FP16 and the weight precision is FP16; in the prior art, the precision of the output data would also be FP16. In the embodiment of the present invention, the weight precision Int8 of L4 is obtained first, so it is known that the precision of the data to be output of L3 is higher than the weight precision of L4, and the precision of the data to be output can be configured as Int8. This lowers the precision of the output data and reduces the data transmission volume between layers L3 and L4, i.e., the traffic between the processing core holding L3 and the processing core holding L4, without affecting the computation precision of L4, thereby greatly improving chip performance.
Fig. 4 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention; as shown in Fig. 4, the method includes:
Step 401, determine a current recognition rate threshold from at least two candidate recognition rate thresholds.

Here, the candidate recognition rate threshold with the larger value is preferentially determined as the current recognition rate threshold.

Step 402, reduce the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold.

For the specific reduction-adjustment operation, reference may be made to the related content above, which is not repeated here.

Step 403, train the reduction-adjusted neural network to adjust the weight parameter values of each layer.

Here, the training objective is to increase the recognition rate of the reduction-adjusted neural network.

Optionally, the training of the neural network is performed on the artificial intelligence chip; for the training process, reference may be made to the related content above, which is not repeated here.

Step 404, judge whether the current recognition rate of the trained neural network reaches the target recognition rate threshold; if so, return to step 401; otherwise, perform step 405.

Step 405, determine the weight precision corresponding to each layer after the previous reduction adjustment as the final configuration result.
The weight precision configuration method provided by this embodiment of the invention tries, in descending order, recognition rate thresholds below the target recognition rate threshold as references for reducing the weight precision of each layer in the neural network. After each reduction, the network is trained to adjust its weight parameter values and thus raise the recognition rate, compensating for the recognition-rate loss caused by the reduced weight precision.
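A compact sketch of this flow, with every chip and network operation injected as a callable; `reduce_all`, `train`, `rate`, `snapshot`, and `restore` are invented names standing in for steps 402-405, not APIs from the patent.

```python
def configure_descending(thresholds, target, reduce_all, train, rate,
                         snapshot, restore):
    """Fig. 4 flow: try ever-smaller thresholds until the target rate
    can no longer be reached, then keep the previous configuration."""
    last_good = snapshot()
    for threshold in sorted(thresholds, reverse=True):   # step 401
        reduce_all(threshold)                            # step 402
        train()                                          # step 403
        if rate() < target:                              # step 404
            restore(last_good)                           # step 405: previous
            break                                        # adjustment wins
        last_good = snapshot()
    return last_good
```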
Fig. 5 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention; as shown in Fig. 5, the method includes:
Step 501, determine the smallest of the at least two candidate recognition rate thresholds as the current recognition rate threshold.

Step 502, reduce the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold.

For the specific reduction-adjustment operation, reference may be made to the related content above, which is not repeated here.

Step 503, train the reduction-adjusted neural network to adjust the weight parameter values of each layer.

Here, the training objective is to increase the recognition rate of the reduction-adjusted neural network.

Optionally, the training of the neural network is performed on the artificial intelligence chip; for the training process, reference may be made to the related content above, which is not repeated here.

Step 504, judge whether the current recognition rate of the trained neural network reaches the target recognition rate threshold; if so, perform step 505; otherwise, perform step 506.

Step 505, determine the weight precision corresponding to each layer after the current reduction adjustment as the final configuration result, and end the process.

Step 506, re-determine the current recognition rate threshold.

In this step, with the smallest candidate recognition rate threshold excluded, the current recognition rate threshold is chosen from the remaining candidates in ascending order.

Step 507, based on the current recognition rate threshold, either reduce the weight precision corresponding to each layer in the neural network again, or increase it.

Specifically, reducing the weight precision corresponding to each layer again based on the current recognition rate threshold may mean reducing it anew from the initial weight precision; increasing the weight precision corresponding to each layer based on the current recognition rate threshold may mean increasing it from the (temporarily stored) current weight precision of each layer.

Step 508, judge again whether the current recognition rate of the trained neural network reaches the target recognition rate threshold; if so, perform step 509 (determine the weight precision corresponding to each layer after the current reduction or increase adjustment as the final configuration result) and end the process; otherwise, return to step 506.
The weight precision configuration method provided by this embodiment of the invention tries, in ascending order, recognition rate thresholds below the target recognition rate threshold as references for reducing the weight precision of each layer in the neural network, and after each reduction trains the network to adjust its weight parameter values, raising the recognition rate to compensate for the loss caused by the reduced precision. If the target recognition rate threshold cannot be met, the current recognition rate threshold is raised and the weight precision is reduced again or increased until the target can be met, at which point the reduced or increased weight precision is locked. Under the condition that the recognition rate of the neural network is guaranteed, this improves resource utilization in the artificial intelligence chip carrying the network, improves chip performance, and reduces chip power consumption.
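The same injected-callable style gives a sketch of this flow; `raise_all` is invented here alongside the helpers of the previous sketch, and only the step-507 variant that raises precision from the temporarily stored values is shown (re-reducing from the initial precision would work analogously).

```python
def configure_ascending(thresholds, target, reduce_all, raise_all,
                        train, rate, snapshot):
    """Fig. 5 flow: start from the smallest candidate threshold and move
    to larger ones until the target recognition rate is reached."""
    ordered = sorted(thresholds)            # smallest candidate first
    reduce_all(ordered[0])                  # steps 501-502
    train()                                 # step 503
    for threshold in ordered[1:]:           # step 506: next larger threshold
        if rate() >= target:                # steps 504/508
            break                           # steps 505/509: lock this config
        raise_all(threshold)                # step 507
        train()
    return snapshot()                       # final configuration result
```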
Fig. 6 is a schematic flowchart of another weight precision configuration method according to an embodiment of the present invention, in which the neural network is an image recognition model. Assuming the image recognition model is a convolutional neural network model, the method may include:
Step 601, determine a current recognition rate threshold from at least two candidate recognition rate thresholds.

Here, the candidate recognition rate threshold with the larger value is preferentially determined as the current recognition rate threshold.

Step 602, based on the current recognition rate threshold, reduce the weight precision corresponding to each layer in the image recognition model.

Step 603, train the reduction-adjusted image recognition model to adjust the weight parameter values of each layer.

Here, the training target is to improve the recognition rate of the reduction-adjusted image recognition model.
Optionally, the training of the image recognition model is performed on the artificial intelligence chip; for the training process, reference may be made to the related content above. Illustratively, image training sample data is obtained by a first processing core, which computes the feature map data to be output by the convolutional layer from the image training sample data and the weight parameters of the convolutional layer; the weight precision of the pooling layer is obtained, the precision of the feature map data to be output is configured to that weight precision, and the resulting output feature map data of the convolutional layer is sent to a second processing core. The second processing core computes the feature vector data to be output by the pooling layer from the output feature map data of the convolutional layer and the weight parameters of the pooling layer; the weight precision of the fully-connected layer is obtained, the precision of the feature vector data to be output is configured to that weight precision, and the resulting output feature vector data of the pooling layer is sent to a third processing core. The third processing core computes and outputs the image recognition result from the output feature vector data of the pooling layer and the weight parameters of the fully-connected layer, and the weight parameter values of each layer are adjusted with the aim of improving the recognition rate of the image recognition model. The precision casting in this pipeline is sketched below.
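A minimal sketch of this three-core pipeline, assuming invented layer objects with `compute` methods and `weight_precision` fields; `cast` stands in for the on-chip configuration of the data to be output, and the NumPy dtypes are illustrative stand-ins for the chip's precision levels.

```python
import numpy as np

DTYPES = {"Int8": np.int8, "FP16": np.float16, "FP32": np.float32}

def cast(x, precision):
    return x.astype(DTYPES[precision])

def forward(image, conv, pool, fc):
    fmap = conv.compute(image)                 # first core: convolutional
    fmap = cast(fmap, pool.weight_precision)   # configure to next layer
    fvec = pool.compute(fmap)                  # second core: pooling
    fvec = cast(fvec, fc.weight_precision)     # configure to next layer
    return fc.compute(fvec)                    # third core: recognition result
```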
Step 604, judge whether the current recognition rate of the trained image recognition model reaches the target recognition rate threshold; if so, return to step 601; otherwise, perform step 605.

Step 605, determine the weight precision corresponding to each layer after the previous reduction adjustment as the final configuration result.
The weight precision configuration method provided by this embodiment of the invention tries recognition rate thresholds below the target recognition rate threshold as references for reducing the weight precision of each layer in the image recognition model; after each reduction, the model is trained to adjust its weight parameter values and raise the recognition rate, compensating for the recognition-rate loss caused by the reduced weight precision.
Building on the above embodiments, one scheme for reducing the weight precision corresponding to each layer in the image recognition model is given below. It should be understood that any one or more of the schemes described above can be used to perform the reduction based on the current recognition rate threshold; the following example is for illustration only.
Fig. 7 is a schematic flowchart of a method for reducing weight precision according to an embodiment of the present invention; the method includes:
Step 701, determine the current target layer in the image recognition model.

All layers in the image recognition model are ranked by their degree of influence on the recognition rate, and a layer with low influence is preferentially determined as the target layer.

Illustratively, before this step the method may further include: calculating the initial recognition rate of the image recognition model; for each layer in the image recognition model, reducing the weight precision of that layer from the first precision to the second precision and calculating the drop in the recognition rate of the image recognition model relative to the initial recognition rate; and sorting all layers by this drop to obtain a ranking, where a larger drop indicates a greater influence on the recognition rate. For example, the image recognition model may include a convolutional layer, a pooling layer, and a fully-connected layer; suppose the initial recognition rate is 0.98 and the initial weight precision of the convolutional, pooling, and fully-connected layers is FP32. After the weight precision of the convolutional layer is reduced to FP16, the recognition rate becomes 0.9, a drop of 0.08; after the weight precision of the pooling layer is reduced to FP16, the recognition rate becomes 0.94, a drop of 0.04; and after the weight precision of the fully-connected layer is reduced to FP16, the recognition rate becomes 0.96, a drop of 0.02. Ranked by drop from small to large, the result is: fully-connected layer, pooling layer, convolutional layer.

In this step, the current target layer is determined from the ranking. The first time this step is executed, the fully-connected layer is determined as the current target layer; after the weight precision of the fully-connected layer has been locked, the pooling layer is determined as the current target layer, and after the weight precision of the pooling layer has been locked, the convolutional layer is determined as the current target layer.
Step 702, reduce the weight precision corresponding to the current target layer.

For example, the weight precision corresponding to the current target layer may be reduced by one precision level; each reduction below may likewise be by one precision level.

Step 703, judge whether the current recognition rate of the image recognition model is smaller than the current recognition rate threshold; if so, perform step 704; otherwise, return to step 702.

Step 704, lock the weight precision corresponding to the current target layer at the weight precision before the current reduction.

Step 705, judge whether the weight precision corresponding to all layers has been locked; if so, end the process; otherwise, return to step 701.

Illustratively, the locked weight precision is recorded in a bit flag of the current target layer or in the name of the operator to be called.
In this embodiment of the invention, all layers in the image recognition model are ranked by their degree of influence on the recognition rate, and the weight precision of each current target layer is reduced in turn until the recognition rate of the image recognition model falls below the current recognition rate threshold, so that the reduction adjustment of the weight precision can be accomplished quickly. A sketch of this loop follows.
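As an illustration under the same assumptions as the earlier sketches, the loop of steps 701-705 might look like this; `lower_one_level`, `raise_one_level`, and `rate` are invented callables, and the precision-floor case is omitted for brevity.

```python
def lower_and_lock(ranking, threshold, lower_one_level, raise_one_level,
                   rate):
    """`ranking` lists layers from least to most influential (step 701),
    e.g. [fully_connected, pooling, convolutional] for the example above."""
    for layer in ranking:
        while True:
            lower_one_level(layer)             # step 702
            if rate() < threshold:             # step 703
                raise_one_level(layer)         # step 704: lock the value
                break                          # step 705: next unlocked layer
```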
Fig. 8 is a block diagram of a weight precision configuration apparatus according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, can generally be integrated in a computer device, and performs weight precision configuration by executing a weight precision configuration method. As shown in Fig. 8, the apparatus includes:
a recognition rate threshold determining module 801, configured to determine a current recognition rate threshold from at least two candidate recognition rate thresholds, where the at least two candidate recognition rate thresholds are smaller than a target recognition rate threshold;
a weight precision adjusting module 802, configured to perform reduction adjustment on the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold;
a neural network training module 803, configured to train the neural network subjected to the reduction adjustment to adjust the weight parameter values of each layer, where the training target is to increase the recognition rate of the neural network subjected to the reduction adjustment;
a configuration result determining module 804, configured to determine a final configuration result of the weight precision of each layer according to a relationship between the current recognition rate and the target recognition rate threshold.
The weight precision configuration apparatus provided by the embodiment of the invention can, while guaranteeing the recognition rate of the neural network, improve resource utilization in the artificial intelligence chip carrying the network, improve chip performance, and reduce chip power consumption.
Optionally, of the at least two candidate recognition rate thresholds, the one with the larger value is preferentially determined as the current recognition rate threshold. In this case, determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold includes: judging whether the current recognition rate of the trained neural network reaches the target recognition rate threshold; if so, re-determining the current recognition rate threshold and continuing to reduce the weight precision corresponding to each layer in the neural network based on it, until the current recognition rate of the trained neural network cannot reach the target recognition rate threshold; and determining the weight precision corresponding to each layer after the previous reduction adjustment as the final configuration result.
Optionally, of the at least two candidate recognition rate thresholds, the one with the smaller value is preferentially determined as the current recognition rate threshold. In this case, determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold includes: judging whether the current recognition rate of the trained neural network reaches the target recognition rate threshold; if not, re-determining the current recognition rate threshold and, based on it, either reducing the weight precision corresponding to each layer in the neural network again or increasing it, until the current recognition rate of the trained neural network reaches the target recognition rate threshold; and determining the weight precision corresponding to each layer after that reduction or increase adjustment as the final configuration result.
Optionally, the reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold includes: determining a current target layer in a neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with the low influence degree is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer, judging whether the current identification rate of the neural network is smaller than a current identification rate threshold value, and if so, locking the weight precision corresponding to the current target layer as the weight precision before the reduction; and under the condition that the target layer switching condition is met, re-determining the current target layer.
Optionally, after determining whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, the method further includes: if the current identification rate is larger than or equal to the current identification rate threshold, continuously reducing the weight precision corresponding to the current target layer, and continuously judging whether the current identification rate of the neural network is smaller than the current identification rate threshold; and, the target layer switching condition includes: the current recognition rate of the neural network is smaller than a current recognition rate threshold value; the preferentially determining the layer with the low influence degree as the target layer comprises the following steps: among the layers whose corresponding weight accuracies are not locked, the layer having a low degree of influence is preferentially determined as the target layer.
Optionally, multiple rounds of reduction operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round the weight precision corresponding to each layer is reduced at most once. After judging whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, the method further includes: if the current recognition rate is greater than or equal to the current recognition rate threshold, temporarily storing the reduced weight precision. And the target layer switching condition includes: the weight precision corresponding to the current target layer has been reduced once in the current round of reduction operations.
Optionally, the reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold includes: determining a current target layer in the neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with the high influence degree is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer to a preset minimum precision; increasing the weight precision corresponding to the current target layer, judging whether the current identification rate of the neural network is greater than the current identification rate threshold value, and if so, locking the weight precision corresponding to the current target layer as the weight precision before the current increase; and under the condition that the target layer switching condition is met, re-determining the current target layer.
Optionally, after determining whether the current recognition rate of the neural network is greater than the current recognition rate threshold, the method further includes: if the current identification rate is less than or equal to the current identification rate threshold value, continuously increasing the weight precision corresponding to the current target layer, and continuously judging whether the current identification rate of the neural network is greater than the current identification rate threshold value; and, the target layer switching condition includes: the current recognition rate of the neural network is greater than a current recognition rate threshold; the preferentially determining the layer with the high degree of influence as the target layer comprises the following steps: among the layers whose corresponding weight accuracies are not locked, the layer having a high degree of influence is preferentially determined as the target layer.
Optionally, multiple rounds of raising operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round the weight precision corresponding to each layer is raised at most once. After judging whether the current recognition rate of the neural network is greater than the current recognition rate threshold, the method further includes: if the current recognition rate is less than or equal to the current recognition rate threshold, temporarily storing the raised weight precision. And the target layer switching condition includes: the weight precision corresponding to the current target layer has been raised once in the current round of raising operations. Reducing the weight precision corresponding to the current target layer to the preset minimum precision includes: if the weight precision corresponding to the current target layer has not been adjusted, reducing it to the preset minimum precision.
Optionally, the determining the current target layer again includes: and re-determining the current target layer until the weight precision corresponding to all layers is locked.
Optionally, the training the neural network with the reduced adjustment includes training the neural network with the reduced adjustment on an artificial intelligence chip; in training the neural network with the reduced adjustment, the method comprises the following steps: acquiring the precision of data to be output of a first layer in a reduction-adjusted neural network, wherein the first layer comprises any one or more layers except the last layer in the reduction-adjusted neural network; acquiring the weight precision of a second layer, wherein the second layer is the next layer of the first layer; and configuring the precision of the data to be output according to the weight precision of the second layer.
The embodiment of the invention provides a computer device, in which the weight precision configuration apparatus provided by the embodiment of the invention can be integrated. Fig. 9 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 900 may include: a memory 901, a processor 902, and a computer program stored on the memory 901 and executable by the processor; the processor 902 implements the weight precision configuration method of the embodiment of the invention when executing the computer program. It should be noted that if the neural network is trained on an artificial intelligence chip, the computer device 900 may further include the artificial intelligence chip. Alternatively, if the computer device 900 is denoted a first computer device, the training may be performed in a second computer device that includes an artificial intelligence chip, and the second computer device may transmit the training result to the first computer device.
The computer device provided by the embodiment of the invention can, while guaranteeing the recognition rate, improve resource utilization in the artificial intelligence chip carrying the neural network, improve chip performance, and reduce chip power consumption.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions that, when executed by a computer processor, are operable to perform a method of weight precision configuration.
The weight precision configuration device, the equipment and the storage medium provided in the above embodiments can execute the weight precision configuration method provided in any embodiment of the present invention, and have corresponding functional modules and beneficial effects for executing the method. For technical details that are not described in detail in the above embodiments, reference may be made to a weight precision configuration method provided in any embodiment of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail by the above embodiments, the invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the invention, and the scope of the invention is determined by the scope of the appended claims.

Claims (14)

1. A weight precision configuration method is applied to an artificial intelligence chip, wherein the artificial intelligence chip comprises a plurality of processing cores, a neural network is deployed in the artificial intelligence chip, the processing cores correspond to layers of the neural network one by one, and the method comprises the following steps:
determining a current recognition rate threshold from at least two candidate recognition rate thresholds, wherein the at least two candidate recognition rate thresholds are less than a target recognition rate threshold;
reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold;
training the neural network subjected to the reduction adjustment to adjust the weight parameter value of each layer, wherein the training aim is to improve the recognition rate of the neural network subjected to the reduction adjustment;
determining a final configuration result of the weight precision of each layer according to the relation between the current recognition rate and the target recognition rate threshold; for the input data of each layer of the neural network, the processing core corresponding to the current layer determines the precision of the output data of the current layer according to the weight precision of the next layer.
2. The method according to claim 1, wherein a candidate recognition rate threshold with a larger value among the at least two candidate recognition rate thresholds is preferentially determined as the current recognition rate threshold;
the determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold value comprises:
judging whether the current recognition rate of the trained neural network can reach the target recognition rate threshold, if so, re-determining the current recognition rate threshold, continuing to reduce and adjust the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold until the current recognition rate of the trained neural network cannot reach the target recognition rate threshold, and determining the weight precision corresponding to each layer after the previous reduction and adjustment as a final configuration result.
3. The method according to claim 1, wherein a candidate recognition rate threshold with a smaller value of the at least two candidate recognition rate thresholds is preferentially determined as the current recognition rate threshold;
the determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold value comprises the following steps:
judging whether the current recognition rate of the trained neural network can reach the target recognition rate threshold, if not, re-determining the current recognition rate threshold, re-reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold or increasing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold until the current recognition rate of the trained neural network can reach the target recognition rate threshold, and determining the weight precision corresponding to each layer after the reduction adjustment or the increase adjustment as a final configuration result.
4. The method of claim 1, wherein the downward adjustment of the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold comprises:
determining a current target layer in a neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with low influence degree is preferentially determined as the target layer;
reducing the weight precision corresponding to the current target layer, judging whether the current identification rate of the neural network is smaller than a current identification rate threshold value, and if so, locking the weight precision corresponding to the current target layer as the weight precision before the reduction;
and under the condition that the target layer switching condition is met, re-determining the current target layer.
5. The method of claim 4, after determining whether the current recognition rate of the neural network is less than a current recognition rate threshold, further comprising:
if the current identification rate is larger than or equal to the current identification rate threshold, continuously reducing the weight precision corresponding to the current target layer, and continuously judging whether the current identification rate of the neural network is smaller than the current identification rate threshold;
and, the target layer switching condition includes: the current recognition rate of the neural network is less than a current recognition rate threshold; the preferentially determining the layer with the low influence degree as the target layer comprises the following steps: among the layers whose corresponding weight accuracies are not locked, the layer having a low degree of influence is preferentially determined as the target layer.
6. The method according to claim 4, wherein a plurality of rounds of reduction operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round of reduction operations, the weight precision corresponding to each layer is reduced at most once;
after determining whether the current recognition rate of the neural network is smaller than the current recognition rate threshold, the method further includes:
if the current recognition rate is greater than or equal to the current recognition rate threshold, temporarily storing the reduced weight precision;
and, the target layer switching condition includes: the weight precision corresponding to the current target layer is reduced once in the current round of reduction operation.
7. The method of claim 1, wherein the de-emphasis adjusting of the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold comprises:
determining a current target layer in the neural network, wherein all layers in the neural network are ranked according to the influence degree on the recognition rate, and the layer with the high influence degree is preferentially determined as the target layer;
reducing the weight precision corresponding to the current target layer to a preset minimum precision;
increasing the weight precision corresponding to the current target layer, judging whether the current identification rate of the neural network is greater than the current identification rate threshold value, and if so, locking the weight precision corresponding to the current target layer as the weight precision before the current increase;
and under the condition that the target layer switching condition is met, re-determining the current target layer.
8. The method of claim 7, after determining whether the current recognition rate of the neural network is greater than a current recognition rate threshold, further comprising:
if the current identification rate is less than or equal to the current identification rate threshold, continuously increasing the weight precision corresponding to the current target layer, and continuously judging whether the current identification rate of the neural network is greater than the current identification rate threshold;
and, the target layer switching condition includes: the current recognition rate of the neural network is greater than a current recognition rate threshold; the preferentially determining the layer with the high influence degree as the target layer comprises the following steps: among the layers whose corresponding weight accuracies are not locked, the layer having a high degree of influence is preferentially determined as the target layer.
9. The method according to claim 7, wherein multiple rounds of raising operations are performed for the weight accuracies corresponding to all layers in the neural network, and in each round of raising operations, the weight accuracy corresponding to each layer is raised at most once;
after judging whether the current recognition rate of the neural network is greater than the current recognition rate threshold value, the method further comprises the following steps:
if the current recognition rate is less than or equal to the current recognition rate threshold, temporarily storing the raised weight precision;
and, the target layer switching condition includes: the weight precision corresponding to the current target layer is raised once in the current round of raising operation; the reducing the weight precision corresponding to the current target layer to the preset minimum precision comprises: and if the weight precision corresponding to the current target layer is not adjusted, reducing the weight precision corresponding to the current target layer to a preset minimum precision.
10. The method according to any one of claims 4-9, wherein said re-determining the current target layer comprises:
and re-determining the current target layer until the weight precision corresponding to all the layers is locked.
11. The method of any one of claims 1-9, wherein training the de-tuned neural network comprises training the de-tuned neural network on an artificial intelligence chip;
in training the neural network with the reduced adjustment, the method comprises the following steps:
acquiring the precision of data to be output of a first layer in a reduction-adjusted neural network, wherein the first layer comprises any one or more layers except the last layer in the reduction-adjusted neural network;
acquiring the weight precision of a second layer, wherein the second layer is the next layer of the first layer;
and configuring the precision of the data to be output according to the weight precision of the second layer.
12. A weight precision configuration device is applied to an artificial intelligence chip, wherein the artificial intelligence chip comprises a plurality of processing cores, a neural network is deployed in the artificial intelligence chip, the processing cores are in one-to-one correspondence with layers of the neural network, and the device comprises:
a recognition rate threshold determination module, configured to determine a current recognition rate threshold from at least two candidate recognition rate thresholds, where the at least two candidate recognition rate thresholds are smaller than a target recognition rate threshold;
the weight precision adjusting module is used for reducing and adjusting the weight precision corresponding to each layer in the neural network based on the current recognition rate threshold;
the neural network training module is used for training the neural network subjected to the reduction adjustment so as to adjust the weight parameter values of each layer, wherein the training target is to improve the recognition rate of the neural network subjected to the reduction adjustment;
the configuration result determining module is used for determining the final configuration result of the weight precision of each layer according to the relationship between the current recognition rate and the target recognition rate threshold;
the apparatus is further configured to: for input data of each layer of the neural network, a processing core corresponding to the current layer determines the precision of output data of the current layer according to the weight precision of the next layer.
13. A computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-11 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-11.
CN202010663771.4A 2020-07-09 2020-07-10 Weight precision configuration method, device, equipment and storage medium Active CN111831358B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010663771.4A CN111831358B (en) 2020-07-10 2020-07-10 Weight precision configuration method, device, equipment and storage medium
US18/015,065 US11797850B2 (en) 2020-07-09 2021-07-08 Weight precision configuration method and apparatus, computer device and storage medium
PCT/CN2021/105172 WO2022007879A1 (en) 2020-07-09 2021-07-08 Weight precision configuration method and apparatus, computer device, and storage medium


Publications (2)

Publication Number Publication Date
CN111831358A CN111831358A (en) 2020-10-27
CN111831358B true CN111831358B (en) 2023-04-07

Family

ID=72899773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010663771.4A Active CN111831358B (en) 2020-07-09 2020-07-10 Weight precision configuration method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111831358B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022007879A1 (en) * 2020-07-09 2022-01-13 北京灵汐科技有限公司 Weight precision configuration method and apparatus, computer device, and storage medium
CN112836806B (en) * 2021-02-26 2023-12-22 上海阵量智能科技有限公司 Data format adjustment method, device, computer equipment and storage medium
CN115600657A (en) * 2021-07-09 2023-01-13 中科寒武纪科技股份有限公司(Cn) Processing device, equipment and method and related products thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229356B1 (en) * 2014-12-23 2019-03-12 Amazon Technologies, Inc. Error tolerant neural network model compression


Similar Documents

Publication Publication Date Title
CN111831358B (en) Weight precision configuration method, device, equipment and storage medium
CN111831355B (en) Weight precision configuration method, device, equipment and storage medium
KR102592721B1 (en) Convolutional neural network system having binary parameter and operation method thereof
CN111831359B (en) Weight precision configuration method, device, equipment and storage medium
CN110991362A (en) Pedestrian detection model based on attention mechanism
CN108805193B (en) Electric power missing data filling method based on hybrid strategy
CN111612125B (en) Novel HTM time pool method and system for online learning
CN111831356B (en) Weight precision configuration method, device, equipment and storage medium
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
CN112101525A (en) Method, device and system for designing neural network through NAS
CN111160523B (en) Dynamic quantization method, system and medium based on characteristic value region
CN108304926B (en) Pooling computing device and method suitable for neural network
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN108345934A (en) A kind of activation device and method for neural network processor
CN116644804A (en) Distributed training system, neural network model training method, device and medium
CN113688988A (en) Precision adjustment method and device, and storage medium
KR102191346B1 (en) Method for generating spiking neural network based on burst spikes and inference apparatus based on spiking neural network
CN109542513B (en) Convolutional neural network instruction data storage system and method
CN117391148A (en) Convolution calculation unit, AI operation array and related equipment
CN114217688B (en) NPU power consumption optimization system and method based on neural network structure
CN114781598A (en) Fault prediction method based on hierarchical neural network distributed training
CN113747500A (en) High-energy-efficiency low-delay task unloading method based on generation countermeasure network in mobile edge computing environment
US20230206048A1 (en) Crossbar-based neuromorphic computing apparatus capable of processing large input neurons and method using the same
WO2022007879A1 (en) Weight precision configuration method and apparatus, computer device, and storage medium
CN118228806A (en) Dynamic modularized continuous learning method based on super network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant