CN114626500A - Neural network computing method and related equipment - Google Patents

Neural network computing method and related equipment

Info

Publication number
CN114626500A
Authority
CN
China
Prior art keywords
neural network
convolutional layer
data
attention vector
noise
Prior art date
Legal status
Pending
Application number
CN202011432705.2A
Other languages
Chinese (zh)
Inventor
杨文斌
华幸成
曾重
程捷
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202011432705.2A
Publication of CN114626500A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this application disclose a neural network computing method and related devices. In the method, after input data of a target convolutional layer is obtained, convolution calculation is performed on the input data based on the weight of the target convolutional layer to obtain first output data, where the first output data includes N first feature maps and N is an integer greater than or equal to 1. The first output data is then calculated based on the attention vector corresponding to the target convolutional layer to obtain second output data, where the second output data includes N second feature maps that are robust to noise, the attention vector includes N elements, and each of the N elements represents the robustness of the corresponding first feature map to noise. These embodiments can improve the accuracy of neural network computation.

Description

Neural network computing method and related equipment
Technical Field
The invention relates to the technical field of neural networks, in particular to a neural network computing method and related equipment.
Background
In recent years, neural network (NN) computing has made breakthrough progress and achieved high accuracy in many fields such as image recognition, speech recognition, and natural language processing. The core computation of a neural network is matrix-vector multiplication, and its computational cost is very large; a neural network therefore requires massive computing resources, conventional general-purpose processors struggle to meet the computing requirements of deep learning, and the design of dedicated chips has become an important development direction. Meanwhile, the emergence of circuit devices such as memristors, capacitance comparators, and voltage comparators provides an efficient option for neural network chip design. A memristor can both store data and perform computation, and it offers high density, non-volatility, low power consumption, integrated storage and computation, and ease of 3D (three-dimensional) integration, providing an efficient solution for the design of a neural network system.
Although a memristor can efficiently perform matrix-vector multiplication, in practical applications the device suffers from read and write errors, i.e., device noise, due to characteristics of the device itself: the conductance value actually set on the memristor is not exactly equal to the required neural network weight, but falls in a distribution near the expected value. How to improve the accuracy of the neural network and reduce the influence of the noise of circuit devices (such as memristors) on the accuracy of neural network calculation is a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a neural network computing method and related equipment, so as to reduce the influence of noise of circuit devices in a neural network system on the accuracy of neural network computing.
In a first aspect, embodiments of the present application provide a computing method, which is performed by a neural network system including at least one convolutional layer, and the method may include:
acquiring input data of a target convolutional layer, wherein the target convolutional layer is any one of the at least one convolutional layer; performing convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, wherein the first output data comprises N first feature maps (feature maps), and N is an integer greater than or equal to 1; and calculating the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, wherein the second output data comprises N second feature maps robust to noise, the attention vector comprises N elements, and each element in the N elements is used for representing the robustness of the corresponding first feature map to the noise.
In the prior art, during neural network calculation, after a target convolutional layer receives input data, convolution calculation is performed on the input data based on the weight of the target convolutional layer to obtain output data of the target convolutional layer, and that output data is used directly as the input data of the next layer. However, among all the feature maps (also called feature data) in the output data of the target convolutional layer, robustness to noise differs from map to map: some feature maps are strongly robust to noise while others are poorly robust, so not all feature maps are suitable to be fed directly into the lower-layer calculation. The embodiment of the present application therefore provides an attention vector, and the output data obtained by convolution calculation based on the weights is corrected by this attention vector. Specifically, in obtaining the output data of the target convolutional layer, after convolution calculation is performed on the input data based on the weight of the target convolutional layer to obtain first output data, the first output data is further corrected through the attention vector corresponding to the target convolutional layer; for example, a feature map with strong robustness to noise is strengthened, and a feature map with poor robustness to noise is weakened or even removed.
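The correction step described above can be sketched in a few lines of NumPy, assuming (as the patent's wording suggests but does not mandate) that the attention vector acts as a per-feature-map scaling. The function name and shapes are illustrative, not the patent's implementation.

```python
import numpy as np

def apply_attention(first_output, attention):
    """Scale each of the N first feature maps by its attention element.

    first_output: array of shape (N, H, W) -- the N first feature maps
    attention:    array of shape (N,)      -- one element per feature map
    Returns the N corrected (second) feature maps.
    """
    # Broadcasting multiplies feature map i by attention[i]: maps judged
    # robust to noise (large element) are strengthened, maps judged
    # fragile (small or zero element) are weakened or removed.
    return first_output * attention[:, None, None]

# Toy example: two 2x2 feature maps; the second one is removed entirely.
maps = np.ones((2, 2, 2))
att = np.array([0.8, 0.0])
out = apply_attention(maps, att)
```

A zero attention element thus deletes an entire feature map before it can flow into the next layer, which is the pruning behavior the later implementations build on.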
In one possible implementation, the target convolutional layer includes M channels, and the method further includes: determining, from the M channels based on the attention vector, N channels for performing the convolution calculation, where the element values of the N elements of the attention vector corresponding to the N channels are greater than or equal to a preset threshold, and M is an integer greater than N.
In the embodiment of the application, for a target convolutional layer, before convolution calculation is performed on the input data based on the weight of the target convolutional layer to obtain the first output data, some of the channels included in the target convolutional layer may be closed in advance based on the attention vector. Convolution calculation is performed only on the N retained channels to obtain feature maps with relatively strong robustness to noise, and corresponding weighting coefficients are assigned to the different feature maps according to their differing robustness to noise, so that the feature maps are further corrected. On one hand, reducing the number of channels reduces the computation of neural network inference and compresses the network model while preserving the calculation accuracy of the neural network; on the other hand, the attention vector screens out the best feature data for the calculation of the next layer of the target convolutional layer, preventing feature maps with poor robustness to noise (such as circuit-device noise) from flowing into the next layer, thereby reducing the influence of noise on the neural network system and improving the accuracy of neural network inference.
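Selecting the N surviving channels from the M available ones is a simple thresholding of the attention vector. The sketch below assumes channels map one-to-one to attention elements; names are illustrative.

```python
import numpy as np

def select_channels(attention, threshold):
    """Return the indices of the N channels (out of M) whose attention
    element meets the preset threshold; the remaining channels are
    closed in advance and excluded from the convolution calculation."""
    return np.nonzero(attention >= threshold)[0]

# M = 4 channels; with threshold 0.5 only channels 0 and 2 are kept (N = 2).
att = np.array([0.9, 0.1, 0.5, 0.0])
kept = select_channels(att, threshold=0.5)
```

Because the closed channels never run, this is where the model-compression benefit described above comes from: the convolution is computed over N < M channels.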
In one possible implementation, the method further includes: adjusting initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system, wherein the adjustment process combines at least one of simulated noise and actual noise of circuit devices of the neural network system, and the trained network parameters include weights of the target convolutional layer and attention vectors corresponding to the target convolutional layer.
The neural network training method in the prior art generally trains the weights in the network parameters of the neural network system to improve the robustness to noise of each feature map obtained by convolution calculation with those weights. It can therefore be understood as longitudinal training: when each feature map is extracted, its robustness to noise is improved by training the weights. However, the feature maps extracted by the weights still differ in robustness to noise, and not all of them are suitable for entering the lower-layer calculation. The embodiment of the present application therefore proposes an attention vector, by which the output data obtained through convolution calculation based on the weights is corrected. Specifically, for any convolutional layer (e.g., the target convolutional layer) included in the neural network, after convolution calculation is performed on the input data based on the weight of the target convolutional layer to obtain first output data, the first output data is further corrected using the attention vector corresponding to the target convolutional layer, so that the influence of noise on the neural network system is reduced and the accuracy of neural network calculation is improved.
Meanwhile, in the training process, exposing the noise of the underlying circuit devices to the software training level and/or performing hardware training combined with the actual noise of the neural network system can counteract part of the accuracy loss caused by hardware-level noise. Training the neural network thus adapts it to the presence of noise, increases the robustness of the neural network weights to a certain extent, and enables the attention vector to correct the output feature maps more accurately.
In a possible implementation manner, the adjusting initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system includes: inputting the training data into the neural network system to obtain third output data; calculating based on the third output data and the target output data to obtain a loss value; and updating initial network parameters of the neural network according to the loss value.
In the embodiment of the application, in the process of training the network parameters of the neural network system, the network parameters are adjusted based on the training data until parameters that make the neural network system converge are obtained. In the subsequent process of obtaining the output data of the target convolutional layer, convolution calculation is first performed on the input data based on the weight of the target convolutional layer to obtain first output data, and the first output data is then corrected through the attention vector corresponding to the target convolutional layer, so that the influence of noise on the neural network system is reduced and the accuracy of neural network calculation is improved.
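The adjustment loop (forward pass to get the third output data, loss against the target output data, parameter update) can be sketched as follows. The convolution is replaced by a linear stand-in so the example stays self-contained; the gradients are derived for that stand-in and the names (`train_step`, `params`) are illustrative.

```python
import numpy as np

def train_step(params, x, target, lr=0.1):
    """One adjustment of the network parameters from training data:
    forward pass -> third output data, MSE loss against the target
    output data, then gradient update of weight and attention vector."""
    w, a = params["weight"], params["attention"]
    third_output = (x @ w) * a                        # conv stand-in + attention correction
    loss = np.mean((third_output - target) ** 2)      # loss value
    grad_out = 2.0 * (third_output - target) / target.size
    params["weight"] = w - lr * (x.T @ (grad_out * a))                 # update weights
    params["attention"] = a - lr * np.sum(grad_out * (x @ w), axis=0)  # update attention
    return loss

# Tiny demo: repeated steps shrink the loss as the parameters converge.
x_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[0.5, 0.0], [0.0, 0.5]])
p = {"weight": np.eye(2) * 0.1, "attention": np.ones(2)}
losses = [train_step(p, x_demo, tgt) for _ in range(100)]
```

The key point mirrored from the method: the attention vector is a trained network parameter updated by the same loss as the weights, not a hand-set mask.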
In a second aspect, an embodiment of the present application provides a neural network computing device, which is applied to a neural network system including at least one convolutional layer, and the device includes:
an obtaining unit, configured to obtain input data of a target convolutional layer, where the target convolutional layer is any one of the at least one convolutional layer; a calculating unit, configured to perform convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, where the first output data includes N first feature maps, and N is an integer greater than or equal to 1; the calculation unit is further configured to calculate the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, where the second output data includes N second feature maps robust to noise, the attention vector includes N elements, and each element in the N elements is used to represent robustness of the corresponding first feature map to noise.
In one possible implementation, the target convolutional layer includes M channels, and the apparatus further includes:
a determining unit configured to determine N channels for performing the convolution calculation from the M channels based on the attention vector, where element values of elements included in the attention vector are greater than or equal to a preset threshold value, and M is an integer greater than N.
In one possible implementation, the apparatus further includes:
a training unit, configured to adjust an initial network parameter of the neural network system based on training data to obtain a trained network parameter of the neural network system, where the adjustment process combines at least one of analog noise and actual noise of a circuit device of the neural network system, and the trained network parameter includes a weight of the target convolutional layer and an attention vector corresponding to the target convolutional layer.
In a possible implementation manner, when the training unit is configured to adjust an initial network parameter of the neural network system based on training data to obtain a trained network parameter of the neural network system, the training unit is specifically configured to:
inputting the training data into the neural network system to obtain third output data; calculating based on the third output data and the target output data to obtain a loss value; and updating initial network parameters of the neural network according to the loss value.
In a third aspect, an embodiment of the present application provides a neural network training method, and the weight and the attention vector in the neural network calculation method according to any one of the first aspect of the embodiments of the present invention may be obtained by training with the neural network training method described below. The method is performed by a neural network system comprising at least one convolutional layer, the method may comprise:
inputting training data of the neural network system into the neural network system to obtain third output data; the target convolutional layer is any one of the at least one convolutional layer, and the target convolutional layer is used for calculating input data of the target convolutional layer through a first attention vector and L first weights to obtain input data of the next layer of the target convolutional layer; the L first weights are used in convolution calculation to extract feature data from the input data of the target convolutional layer, the first attention vector is used for correcting the feature data, and L is an integer greater than or equal to 1;
comparing the third output data with target output data, and calculating a first loss value;
updating network parameters of the neural network system according to the first loss value, the network parameters including the first attention vector and the L first weights.
The training method of a neural network model in the prior art generally trains the weights in the network parameters of the neural network system to improve the robustness to noise of each piece of feature data extracted through the weights, and can therefore be understood as longitudinal training: when each feature is extracted, its robustness to noise is improved by training the weights. However, the robustness to noise of each feature extracted through the weights is different: some feature data are strongly robust to noise while others are poorly robust, so not all features are suitable for entering the lower-layer calculation. The embodiment of the present application therefore proposes an attention vector, by which each feature extracted via the weights is corrected. Specifically, in any convolutional layer (e.g., a target convolutional layer) included in a neural network, the input data of the target convolutional layer is calculated through a first attention vector and L first weights to obtain the output data of the target convolutional layer, and this output data is the input data of the layer next to the target convolutional layer.
In the process of obtaining the input data of the next layer of the target convolutional layer, in addition to extracting the feature data from the input data through the L first weights, the extracted feature data is corrected through the first attention vector: for example, feature data with strong robustness to circuit-device noise in the neural network system is strengthened, and feature maps with poor robustness to that noise are weakened or even removed. The robustness to noise of each piece of extracted feature data is considered comprehensively, and the best feature data is screened out and input to the next layer for its calculation, so that the influence of noise on the neural network system is reduced and the accuracy of the neural network is improved.
In a possible implementation manner, the updated L first weights are L second weights, and the updated first attention vector is a second attention vector; the method further comprises the following steps: inputting the training data into the neural network system to obtain fourth output data; the target convolutional layer calculates input data of the target convolutional layer through the second attention vector and L third weights to obtain input data of a next layer of the target convolutional layer; the L third weights are in one-to-one correspondence with the L second weights, and each third weight is obtained by writing the corresponding second weight into a circuit device in a neural network system and then combining actual noise of the circuit device in the neural network system; comparing the fourth output data with the target output data, and calculating a second loss value; updating the second attention vector in the network parameter according to the second loss value.
In the embodiment of the present application, after the network parameters of the neural network system are updated according to the first loss value, the network parameters are written into actual circuit devices. In particular, the L second weights may be written to analog circuit devices and the second attention vector may be written to a digital circuit device (e.g., a digital memory module). Due to the circuit-device process in the neural network system, when a weight is written into or read from an analog circuit device, the weight after writing or reading (the actual value) may deviate from the weight before writing or reading (the ideal value); that is, when the circuit devices operate, the L second weights written into them are converted into L third weights under the influence of circuit-device noise. Therefore, in the embodiment of the present application, after the network parameters are written into the actual circuit devices, the L second weights are fixed in combination with the actual noise of the circuit devices, and the second attention vector is retrained to obtain an updated second attention vector, so that the final second attention vector can correct the extracted feature data more accurately.
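The hardware-aware retraining step above can be sketched as: hold the written (noisy, "third") weights fixed and run gradient descent on the attention vector alone. A linear stand-in replaces the analog convolution, the write noise is a constant offset for illustration, and all names are assumptions rather than the patent's implementation.

```python
import numpy as np

def retrain_attention(x, target, noisy_w, a0, lr=0.1, steps=50):
    """Retrain only the second attention vector: the weights written to
    the analog devices carry actual device noise and are held fixed,
    and the attention vector is updated from the second loss value."""
    a = a0.copy()
    m = x @ noisy_w                       # fixed forward response of the noisy devices
    for _ in range(steps):
        out = m * a
        grad_a = np.sum(2.0 * (out - target) / target.size * m, axis=0)
        a -= lr * grad_a
    return a

# Demo: the target was produced by the clean weights; retraining the
# attention vector partly compensates for the write error.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
clean_w = np.array([[0.5, 0.2], [0.1, 0.4]])
noisy_w = clean_w + 0.05                  # stand-in for write noise on the devices
target = (x @ clean_w) * np.array([1.0, 0.5])
a0 = np.array([0.2, 0.2])
loss_before = np.mean(((x @ noisy_w) * a0 - target) ** 2)
a_new = retrain_attention(x, target, noisy_w, a0)
loss_after = np.mean(((x @ noisy_w) * a_new - target) ** 2)
```

Because only the digitally stored attention vector changes, this retraining needs no further writes to the noisy analog devices, which is the design motivation in the paragraph above.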
In one possible implementation, the method further includes: for the initial attention vector, setting elements smaller than or equal to a preset threshold to zero and keeping elements larger than the preset threshold unchanged, to obtain the first attention vector.
In the embodiment of the present application, each element in the attention vector is used to evaluate the robustness to noise of the corresponding feature data. The larger the element value, the stronger the robustness of the feature data to noise and the stronger its noise resistance; the smaller the element value, the weaker both are. Therefore, when training the neural network, the initial attention vector is processed according to a preset threshold: elements corresponding to feature maps with poor noise resistance (i.e., elements smaller than or equal to the preset threshold) are set to zero, and elements corresponding to feature maps with strong noise resistance (i.e., elements larger than the preset threshold) are retained, yielding a first attention vector in which some element values are zero and the others are non-zero. When this first attention vector corrects the feature data, it is multiplied together with the input data of the target convolutional layer and the L first weights, so that feature data with strong robustness to circuit-device noise in the neural network system is selected and feature data with poor robustness is removed outright. Feature data with poor robustness to noise is thus prevented from flowing into the next layer of the target convolutional layer, the influence of circuit-device noise on the neural network system is reduced, and the accuracy of the neural network is improved.
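The hard-thresholding of the initial attention vector described above is a one-liner in NumPy; the function name is illustrative.

```python
import numpy as np

def prune_attention(initial_attention, threshold):
    """Zero out elements <= threshold (feature maps with poor noise
    resistance); keep elements above the threshold unchanged.  The
    result is the first attention vector."""
    v = np.asarray(initial_attention, dtype=float).copy()
    v[v <= threshold] = 0.0
    return v

pruned = prune_attention([0.9, 0.2, 0.5], threshold=0.2)
```

Multiplying with this vector downstream then removes the pruned feature maps entirely, since their scaling factor is exactly zero.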
In one possible implementation, the method further includes: processing the initial attention vector based on an activation function; and setting elements smaller than or equal to the preset threshold value in the vector to be zero aiming at the processed initial attention vector, and keeping the elements larger than the preset threshold value in the vector unchanged to obtain the first attention vector.
In the embodiment of the present application, when training the neural network, the initial attention vector is processed based on an activation function to obtain a processed initial attention vector whose element values are distributed in the closed interval from 0 to 1, and the processed vector is then thresholded according to the preset threshold to obtain a first attention vector in which some element values are zero and the others are non-zero and lie in the closed interval from 0 to 1. On one hand, because the non-zero elements of the first attention vector lie in the closed interval from 0 to 1, processing the attention vector with an activation function improves the behavior of the neural network and makes it easier to converge. On the other hand, when the first attention vector corrects the feature data, it is multiplied together with the input data of the target convolutional layer and the L first weights, so that feature data with strong robustness to circuit-device noise in the neural network system is selected and feature data with poor robustness is removed outright, preventing it from flowing into the next layer of the target convolutional layer, reducing the influence of circuit-device noise on the neural network system, and improving the accuracy of the neural network.
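The patent only says "activation function"; the sketch below assumes a sigmoid, which is a common choice for mapping values into [0, 1], followed by the same thresholding as before. Names and the choice of sigmoid are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prune_after_activation(initial_attention, threshold):
    """Squash the initial attention vector into [0, 1] with an activation
    function (sigmoid here, as an assumption), then zero out elements
    <= threshold to obtain the first attention vector."""
    v = sigmoid(np.asarray(initial_attention, dtype=float))
    v[v <= threshold] = 0.0
    return v

# sigmoid(2.0) ~ 0.88 survives a 0.5 threshold; sigmoid(-2.0) ~ 0.12 is zeroed.
v = prune_after_activation(np.array([2.0, -2.0]), threshold=0.5)
```

Bounding the surviving elements in [0, 1] keeps the correction a pure attenuation/selection, which is the convergence benefit the paragraph above refers to.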
In a possible implementation manner, the setting to zero of elements in the vector that are smaller than or equal to the preset threshold, while keeping elements larger than the preset threshold unchanged, to obtain the first attention vector includes: sorting the elements in the vector in descending order of value; determining the value of the n-th element in the sorted order as the preset threshold, where n is determined based on a preset pruning rate, and the pruning rate represents the ratio of the number of elements in the vector that are set to be invalid to the total number of elements in the vector; and setting the elements smaller than or equal to the preset threshold in the vector to zero while keeping the elements larger than the preset threshold unchanged, to obtain the first attention vector.
In the embodiment of the present application, the value of each element in the attention vector is used to evaluate the robustness to noise of the corresponding feature data: the larger the element value, the stronger that robustness; the smaller the element value, the weaker it is. The preset threshold is determined from the values of the elements in the initial attention vector (or in the initial attention vector processed by the activation function), and that vector is then thresholded to obtain the first attention vector. When the first attention vector corrects the feature data, it is multiplied together with the input data of the target convolutional layer and the L first weights, so that feature data with strong robustness to circuit-device noise in the neural network system is selected and feature data with poor robustness is removed outright, preventing it from flowing into the next layer of the target convolutional layer, reducing the influence of circuit-device noise on the neural network system, and improving the accuracy of the neural network.
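Deriving the preset threshold from a pruning rate can be sketched as below. The patent is ambiguous about which sorted element becomes the threshold; this sketch assumes the threshold is the largest value among the pruned elements, so that zeroing elements <= threshold prunes exactly the requested fraction. It also assumes at least one element is pruned.

```python
import numpy as np

def threshold_from_pruning_rate(attention, pruning_rate):
    """Derive the preset threshold from the pruning rate (the ratio of
    elements set invalid to the total number of elements).  Assumes the
    pruning rate yields at least one pruned element."""
    n_prune = int(round(len(attention) * pruning_rate))
    sorted_desc = np.sort(attention)[::-1]          # descending order
    n_keep = len(attention) - n_prune
    return sorted_desc[n_keep]                      # largest pruned value

# Pruning rate 0.5 on 4 elements: prune the 2 smallest (0.3 and 0.1).
att = np.array([0.9, 0.1, 0.5, 0.3])
t = threshold_from_pruning_rate(att, pruning_rate=0.5)
pruned = att.copy()
pruned[pruned <= t] = 0.0
```

Tying the threshold to a rate rather than a fixed value keeps the compression ratio of the network predictable regardless of the scale of the attention elements.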
In one possible implementation, the method further includes: combining the L initial weights with simulated read-write errors of the circuit devices in the neural network system to obtain the L first weights, where the L initial weights are in one-to-one correspondence with the L first weights. Optionally, the simulated read-write error of a circuit device follows a normal distribution, and the standard deviation of the normal distribution is associated with the magnitude of the read-write error of the circuit devices in the neural network system.
In the embodiment of the application, in the process of training the L first weights, the simulated noise of the circuit devices can be incorporated during neural network training. Exposing the noise of the circuit devices in the underlying neural network system to the software training level counteracts part of the accuracy loss caused by hardware-level noise, so that the training adapts the neural network to the presence of noise rather than relying on an exact representation of the weights. Introducing noise into the network training process thus increases the robustness of the network weights to a certain extent.
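Simulating the device read-write error as normally distributed perturbations of the ideal weights, as described above, can be sketched as follows; the function name and the fixed RNG seed are illustrative.

```python
import numpy as np

def add_simulated_device_noise(weights, noise_std, rng=None):
    """Perturb ideal weights with zero-mean normally distributed noise
    modeling memristor read/write error; the standard deviation reflects
    the magnitude of the device's read-write error."""
    rng = np.random.default_rng(0) if rng is None else rng
    return weights + rng.normal(loc=0.0, scale=noise_std, size=weights.shape)

# The noisy weights scatter around the ideal values with spread ~ noise_std.
noisy = add_simulated_device_noise(np.zeros((100, 100)), noise_std=0.1)
```

During training, such a perturbation would be applied on each forward multiplication of input data and weights, so the gradients see the noise the hardware will later introduce.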
In one possible implementation, the updating the network parameter of the neural network system according to the first loss value includes: updating the initial attention vector and L initial weights according to the first loss value; updating the L first weights based on the updated L initial weights; updating the first attention vector based on the updated initial attention vector.
In the embodiment of the present application, when the L first weights and the first attention vector are updated, back propagation is performed based on the first loss value to update the initial attention vector and the L initial weights. For the updated initial attention vector (or the initial attention vector processed by the activation function), elements smaller than or equal to the preset threshold are set to zero and elements larger than the preset threshold are kept unchanged, which updates the first attention vector. For the updated L initial weights, when the multiplication of the input data and the weights is performed, noise is added to each element of the initial weights; optionally, the noise may be normally distributed, simulating the noise of circuit devices such as memristors in a neural network system, and the updated L first weights are then obtained.
In a fourth aspect, an embodiment of the present application provides a neural network training apparatus applied to a neural network system including at least one convolutional layer, where the apparatus may include:
an input unit, configured to input training data into the neural network system to obtain third output data; the target convolutional layer is any one of the at least one convolutional layer, and is configured to calculate input data of the target convolutional layer through a first attention vector and L first weights to obtain input data of the layer following the target convolutional layer; the L first weights are used in convolution calculation to extract feature data from the input data of the target convolutional layer, the first attention vector is used to correct the feature data, and L is an integer greater than or equal to 1;
a calculating unit, configured to compare the third output data with the target output data, and calculate a first loss value;
an updating unit, configured to update a network parameter of the neural network system according to the first loss value, where the network parameter includes the first attention vector and the L first weights.
In a possible implementation manner, the updated L first weights are L second weights, and the updated first attention vector is a second attention vector;
the input unit is further used for inputting the training data into the neural network system to obtain second output data; the target convolutional layer calculates input data of the target convolutional layer through the second attention vector and L third weights to obtain input data of a next layer of the target convolutional layer; the L third weights are in one-to-one correspondence with the L second weights, and each third weight is obtained by writing the corresponding second weight into a circuit device in a neural network system and then combining actual noise of the circuit device in the neural network system;
the calculating unit is further used for comparing the second output data with the target output data and calculating a second loss value;
the updating unit is further configured to update the second attention vector in the network parameter according to the second loss value.
In one possible implementation, the apparatus further includes:
a first processing unit, configured to set elements of the initial attention vector that are smaller than or equal to the preset threshold to zero and keep elements larger than the preset threshold unchanged, to obtain the first attention vector.
In one possible implementation, the apparatus further includes:
a first processing unit, configured to process the initial attention vector based on an activation function, and, for the processed initial attention vector, set elements smaller than or equal to the preset threshold to zero and keep elements larger than the preset threshold unchanged, to obtain the first attention vector.
In a possible implementation manner, the first processing unit is configured to set, to zero, an element in the vector that is smaller than or equal to the preset threshold, and keep an element in the vector that is larger than the preset threshold unchanged to obtain the first attention vector, and is specifically configured to:
sorting the elements in the vector according to numerical values from large to small;
determining the value of the N-th sorted element as the preset threshold, where N is determined based on a preset pruning rate, and the pruning rate is used to represent a ratio of the number of the elements in the vector that are set to be invalid to the total number of the elements in the vector;
and setting elements smaller than or equal to the preset threshold value in the vector to be zero, and keeping the elements larger than the preset threshold value in the vector unchanged to obtain the first attention vector.
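The thresholding steps above can be sketched as follows. This is a hypothetical helper, not code from this application; one consistent reading is taken in which N counts the elements to be invalidated, so that the pruning rate equals the fraction of zeroed elements:

```python
import numpy as np

def prune_attention(vec, pruning_rate):
    """Zero the elements of the attention vector at or below the preset
    threshold, where the threshold is the value of the N-th element in
    sorted order and N follows from the preset pruning rate (the ratio of
    invalidated elements to total elements)."""
    n_prune = int(round(len(vec) * pruning_rate))  # elements to invalidate
    if n_prune == 0:
        return vec.copy()
    threshold = np.sort(vec)[n_prune - 1]          # value of the N-th sorted element
    # elements <= threshold become zero; elements > threshold stay unchanged
    return np.where(vec > threshold, vec, 0.0)

v = np.array([0.9, 0.1, 0.5, 0.3])
print(prune_attention(v, 0.5))  # [0.9 0.  0.5 0. ]
```

With a pruning rate of 0.5 on four elements, the two smallest elements are zeroed while the kept elements pass through unchanged, as required.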
In one possible implementation, the apparatus further includes:
and the second processing unit is used for combining read-write errors of circuit devices in the simulated neural network system in L initial weights to obtain the L first weights, and the L initial weights are in one-to-one correspondence with the L first weights. Optionally, the read-write error of the circuit device in the simulated neural network system follows a normal distribution, and a standard deviation of the normal distribution is associated with a magnitude of the read-write error of the circuit device in the neural network system.
In a possible implementation manner, the updating unit is specifically configured to:
updating the initial attention vector and L initial weights according to the first loss value;
updating the L first weights based on the updated L initial weights;
updating the first attention vector based on the updated initial attention vector.
In a fifth aspect, the present application provides a neural network computing device, where the neural network computing device includes a processor, and the processor is configured to support the neural network computing device to implement corresponding functions in the neural network computing method provided in the first aspect. The neural network computing device may also include a memory, coupled to the processor, that stores program instructions and data necessary for the neural network computing device. The neural network computing device may also include a communication interface for the neural network computing device to communicate with other devices or communication networks.
In a sixth aspect, an embodiment of the present application provides a neural network training device, where the neural network training device includes a processor, and the processor is configured to support the neural network training device to implement corresponding functions in the neural network training method provided in the third aspect. The neural network training device may also include a memory, coupled to the processor, that stores program instructions and data necessary for the neural network training device. The neural network training device may also include a communication interface for the neural network training device to communicate with other devices or a communication network.
In a seventh aspect, an embodiment of the present application provides a neural network processor, where the neural network processor is configured to implement corresponding functions in the neural network computing method provided in the first aspect and/or the neural network training method provided in the third aspect.
In an eighth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor, and the processor is configured to support the electronic device to implement corresponding functions in the neural network computing method provided in the first aspect and/or the neural network training method provided in the third aspect. The electronic device may also include a memory, coupled to the processor, that stores program instructions and data necessary for the electronic device. The electronic device may also include a communication interface for the electronic device to communicate with other devices or communication networks.
In a ninth aspect, embodiments of the present application provide a computer storage medium for storing program code, where the program code includes instructions for performing any one of the methods of the first aspect or the third aspect.
In a tenth aspect, the present application provides a computer program product, which includes instructions that, when executed by a computer, enable the computer to perform any one of the methods as described in the first aspect or the third aspect.
In an eleventh aspect, the present application provides a chip system, where the chip system includes a processor, configured to support a neural network application apparatus to implement the functions related to the first aspect or support a neural network training apparatus to implement the functions related to the third aspect, for example, perform convolution calculation on the input data based on the weights of the target convolutional layer to obtain first output data, and perform calculation on the first output data based on the attention vector corresponding to the target convolutional layer to obtain second output data. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the data transmission device. The chip system may be constituted by a chip, or may include a chip and other discrete devices.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic diagram of a neural network provided in an embodiment of the present application.
FIG. 2 is a memristor crossbar array schematic diagram provided by an embodiment of the present application.
FIG. 3 is another memristor crossbar array schematic provided by embodiments of the present application.
Fig. 4 is a schematic diagram of a convolutional neural network according to an embodiment of the present application.
Fig. 5 provides another schematic diagram of a convolutional neural network according to an embodiment of the present application.
Fig. 6 is a system architecture diagram provided in an embodiment of the present application.
Fig. 7 is a hardware structure diagram of a neural network processor according to an embodiment of the present disclosure.
Fig. 8 is a schematic flowchart of a neural network training method according to an embodiment of the present application.
Fig. 9 is a schematic diagram of a neural network training method according to an embodiment of the present application.
Fig. 10 is a schematic flowchart of another neural network training method according to an embodiment of the present application.
Fig. 11 is a schematic diagram of a neural network training method according to an embodiment of the present application.
FIG. 12 is a diagram comparing the computational effects of a neural network in the present embodiment with those of the prior art.
Fig. 13 is a schematic flowchart of a neural network computing method according to an embodiment of the present application.
Fig. 14 is a mathematical expression diagram of a neural network computing method according to an embodiment of the present disclosure.
Fig. 15 is a schematic flowchart of a neural network training and calculating method according to an embodiment of the present disclosure.
Fig. 16 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Fig. 17 is a schematic structural diagram of a neural network training device according to an embodiment of the present application.
Fig. 18 is a schematic structural diagram of a neural network computing device according to an embodiment of the present application.
Fig. 19 is a schematic structural diagram of another neural network training device according to an embodiment of the present application.
Fig. 20 is a schematic structural diagram of another neural network computing device provided in an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may optionally include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between 2 or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
The following describes in detail a neural network computing method and apparatus provided in the embodiments of the present application with reference to the accompanying drawings.
In order to facilitate understanding of the embodiments of the present application, technical problems to be solved by the embodiments of the present application and corresponding application scenarios are specifically analyzed below.
A neural network is composed of neurons, with a large number of neurons connected to one another to form a network. The connections between neurons can be viewed as weighted directed edges; the output of each neuron is weighted by its connections and then passed to the neurons it connects to, and all inputs received by each neuron are summed and further processed to produce the neuron's output. A neural network model is usually built with multiple neurons forming one layer and the layers interconnected, forming the chain-like neural network shown in the schematic diagram of Fig. 1. Each circle in Fig. 1 represents a neuron, each arrow represents a connection between neurons, and each connection has a weight. Of course, the embodiments of the present application may be applied to a neural network of any shape; the chain-shaped neural network shown in Fig. 1 is only an illustration and does not limit the application scenarios of the neural network training method provided in the present application.
Illustratively, a layer L_n of the neural network containing n neurons is fully connected to a layer L_m containing m neurons (i.e., each neuron in layer L_n is connected to each neuron in layer L_m), and the output generated by layer L_n is weighted by the connections and then input to layer L_m. The output of layer L_n may be represented by a vector V_n of length n, and the connection weights may be represented as a matrix M_(n×m) with n rows and m columns, where each matrix element represents the weight of one connection; the weighted output produced by layer L_n, i.e. the vector that layer L_n inputs to layer L_m, is then the product of M_(n×m) and V_n. Such matrix-vector multiplication is the most central and common computation in neural networks.
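The weighted connection just described can be sketched numerically as follows; all names and values are illustrative, with the n-by-m weight matrix applied to the length-n output of layer L_n to produce the length-m input of layer L_m:

```python
import numpy as np

def fully_connected(v_n, m_nm):
    """v_n: length-n output vector of layer L_n.
    m_nm: n-by-m weight matrix; element [i, j] is the weight of the
    connection from neuron i of L_n to neuron j of L_m.
    Returns the length-m weighted vector received by layer L_m."""
    return v_n @ m_nm  # matrix-vector multiplication, the core NN operation

v = np.array([1.0, 2.0, 3.0])                       # n = 3
m = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # n x m with m = 2
out = fully_connected(v, m)
print(out)  # [4. 5.]
```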
A memristor, short for memory resistor, is a circuit device that represents the relationship between magnetic flux and electric charge. A memristor has the dimension of a resistance, but unlike a resistor, the resistance of a memristor is determined by the charge that has flowed through it. Therefore, as a nonlinear circuit element with resistance-memory behavior, a memristor can store data in its resistance value: the amount of charge that has flowed through the memristor can be known by measuring its resistance, thereby realizing the function of memorizing charge. Moreover, the data stored in a memristor is nonvolatile, i.e., the resistance of the memristor remains unchanged after power-off.
The large-scale matrix-vector multiply-accumulate operations of neural networks pose great challenges for application scenarios with high real-time requirements and power-consumption sensitivity, and the key problem is how to process data at high speed and low power consumption using computer hardware. When a central processing unit (CPU) performs matrix-vector multiplication, it is limited by serial processing and data-read bandwidth, so its operation efficiency is low; a graphics processing unit (GPU) has higher parallel computing efficiency for matrix-vector multiplication, but larger energy consumption. Existing general-purpose processors therefore have difficulty meeting the computational requirements of deep learning, and designing dedicated chips has become an important development direction. Because the memristor integrates storage and computation, analog matrix-vector multiplication performed by a memristor crossbar array has become an effective way to solve these problems. The memristor crossbar array structure is formed by metal wires laid out transversely and longitudinally, with memristors at the crossing points of the metal wires. The appearance of the memristor crossbar array provides an efficient solution for neural network chip design; it has the characteristics of high density, nonvolatility, low power consumption, and integration of storage and computation, and can simultaneously realize the storage of neural network weights and the computation on them, so as to efficiently complete the multiply-accumulate calculations in the neural network. Fig. 2 shows a schematic diagram of a memristor crossbar array; as shown in Fig. 2, after voltages of different amplitudes are applied to each row of the crossbar array, the memristors on the same column weight the corresponding voltages, convert them into currents, and sum the currents at the column output.
Specifically, matrix vector multiplication and addition operation is performed on input voltage at an input end and a conductance value of a memristor to obtain output current. Equivalently, the matrix vector multiplication operation is efficiently completed in a simulation mode, and at the moment, the memristor cross array is simultaneously responsible for the storage of the matrix and the calculation of the matrix vector multiplication.
Optionally, in order to convert the summed current into a voltage output, a load resistor is connected to the output port of each column, and the peripheral circuit only needs to read the voltage value across the load resistor. This characteristic of the memristor crossbar array has high application value and can greatly improve the computational efficiency of the neural network. Here the input voltage is equivalent to the input data of the neural network, the memristor conductance represents the weight of the neural network, and the output current equals the input voltage multiplied by the conductance value, which may represent the output of the neural network. Fig. 3 is a schematic diagram of another memristor crossbar array provided by an embodiment of the present application. The input vector consists of voltage values V_0 to V_n and may be represented as a vector V; the output, after calculation by the memristor crossbar array, consists of new voltage values V'_0 to V'_m and may be represented as a vector V'. Then V' = V·G·R_s, where R_s is a vector whose element R_s,j denotes the ground (load) resistance value of the j-th column, and the conductance values of the memristor crossbar array shown in Fig. 3 may be represented as a weight matrix G_((n+1)×(m+1)) whose elements are G_ij for i = 0, …, n and j = 0, …, m.

Matrix G_((n+1)×(m+1)) has (n+1) rows and (m+1) columns, and matrix element G_ij denotes the conductance value at row i, column j. Illustratively, V'_0 = (V_0·G_0,0 + V_1·G_1,0 + … + V_n·G_n,0)·R_s,0.
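The crossbar computation around Fig. 3 can be sketched numerically as follows; the function and values are illustrative assumptions, with each column current being the dot product of the input voltages and that column's conductances, then scaled by the column's load resistance:

```python
import numpy as np

def crossbar_mvm(v, g, r_s):
    """v: input voltages V_0..V_n (length n+1).
    g: conductance matrix G of shape (n+1, m+1); g[i, j] is the
       conductance of the memristor at row i, column j.
    r_s: load resistances of the m+1 columns, converting the summed
       column currents back into output voltages V'_0..V'_m."""
    column_currents = v @ g        # I_j = sum_i V_i * G_{i,j}
    return column_currents * r_s   # V'_j = I_j * R_{s,j}

v = np.array([1.0, 0.5])
g = np.array([[0.2, 0.4],
              [0.6, 0.8]])
r_s = np.array([1.0, 1.0])
print(crossbar_mvm(v, g, r_s))  # [0.5 0.8]
```

The crossbar thus stores the weight matrix (as conductances) and performs the matrix-vector multiplication at the same time.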
The neural network training method provided by the embodiment of the application can be applied to the memristor cross array shown in fig. 2 or fig. 3.
However, due to process limitations of the memristor, in practical applications the memristor device suffers from low precision and large disturbance: read and write errors during actual operation cause the memristor to exhibit a conductance value that is not exactly equal to the desired value but instead falls within a distribution around it, ultimately degrading inference precision. That is, each element G_ij of the weight G_((n+1)×(m+1)) is not exactly equal to the corresponding conductance value required in the memristor crossbar array. The noise exhibited by the memristor device negatively impacts the expressive power of a memristor-based neural network.
In the prior art, in order to eliminate the influence of noise on the performance of the actual neural network, simulated noise of the circuit devices in the neural network system can be incorporated during training, exposing the noise of the underlying devices to the software training level so as to counteract part of the accuracy loss caused by hardware-level noise. Illustratively, during neural network training, when the multiplication of the input and the weights is performed, noise is added to each element of the weights; optionally, this noise may be normally distributed noise used to simulate the noise of the memristor device. In this way, the training of the neural network can adapt to the presence of noise rather than relying on an exact weight representation. Although this method introduces noise into the network training process and increases the robustness of the network weights to a certain extent, it can only resist noise to a limited degree: it has little effect against the larger noise produced on an actual memristor device, and it does not reduce power consumption. Moreover, when the noise of the actual memristor device is relatively large, training may fail to converge under the introduced noise. Therefore, how to improve the accuracy of the neural network and reduce the influence of the noise of circuit devices (such as memristors) in the neural network system on that accuracy is a problem to be solved urgently.
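The prior-art noise-injection scheme described above can be sketched as follows; this is a minimal illustration with assumed names and a fixed seed, not the application's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_matmul(x, w, noise_std):
    """Multiply input x by weights w after perturbing each weight element
    with zero-mean normally distributed noise of standard deviation
    noise_std, modelling the read/write error of the memristor device."""
    w_noisy = w + rng.normal(0.0, noise_std, size=w.shape)
    return x @ w_noisy

x = np.ones((1, 4))
w = np.full((4, 2), 0.5)
clean = x @ w                     # the exact result, [[2. 2.]]
noisy = noisy_matmul(x, w, 0.01)  # close to, but not exactly, the exact result
print(clean, noisy)
```

Repeating the forward pass with freshly sampled noise at every training step is what lets the learned weights tolerate small perturbations.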
The embodiment of the application provides a neural network training and computing method in which an attention vector is introduced both during neural network training and in practical application. That is, in a target convolutional layer included in the neural network, the input data of the target convolutional layer is first calculated through the weights of the target convolutional layer to obtain first output data; the first output data is then calculated based on the attention vector corresponding to the target convolutional layer to obtain second output data, where the second output data is the input data of the layer following the target convolutional layer. In the process of obtaining the input data of that next layer, in addition to extracting the feature maps in the input data through the weights, the extracted feature maps are corrected through the attention vector: for example, feature maps with strong noise robustness are strengthened, and feature maps with poor noise robustness are weakened or even removed, thereby reducing the influence of noise on the neural network system and improving the accuracy of neural network calculation. The neural network training method may be performed by a neural network training apparatus, and the neural network computing method may be performed by a neural network computing apparatus. The neural network training apparatus or the neural network computing apparatus may be a chip or a chip system; it may also be a computer-readable storage medium; it may also be a computer program product; this is not limited in the embodiments of the present application.
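A minimal sketch of the attention-vector correction described above, assuming per-feature-map (per-channel) scaling; the shapes and names are illustrative assumptions, not taken from the application:

```python
import numpy as np

def attention_correct(feature_maps, attention):
    """feature_maps: array of shape (L, H, W) — the first output data
    produced by the L convolution weights of the target convolutional layer.
    attention: length-L attention vector; element l rescales (corrects)
    feature map l, amplifying noise-robust channels and zeroing pruned ones.
    Returns the second output data, i.e. the input to the next layer."""
    return attention[:, None, None] * feature_maps

maps = np.ones((3, 2, 2))        # L = 3 feature maps of size 2 x 2
att = np.array([1.5, 1.0, 0.0])  # third channel pruned to zero
out = attention_correct(maps, att)
print(out[0, 0, 0], out[2].sum())  # 1.5 0.0
```

A zero attention element removes its feature map entirely, which is how pruned channels stop contributing noise downstream.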
It should be noted that the neural network architecture shown in fig. 1 and the memristor crossbar array schematic diagrams shown in fig. 2 and 3 are only for example and are not used to limit the technical solution of the present application. Those skilled in the art should understand that in the specific implementation process, the neural network architecture may also be in other forms, and may also include other devices, and the number of memristors may also be configured according to specific needs.
In the following, some terms related to the neural network are explained to facilitate understanding by those skilled in the art.
(1) Deep Neural Networks (DNNs) are a broad concept that in some sense covers Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs). DNN refers to a neural network that contains multiple hidden layers. The neural network computing method and related devices provided in the embodiments of the present application can be applied to deep neural networks, and in particular to convolutional neural networks.
(2) A convolutional neural network (CNN) is a multi-layer neural network in which each layer is composed of multiple two-dimensional planes, each plane is composed of multiple independent neurons, and the neurons of each plane share weights, so that weight sharing can reduce the number of parameters in the neural network. Currently, in a convolutional neural network, the convolution operation performed by a processor usually converts the convolution of an input signal feature with a weight into a matrix multiplication between a signal matrix and a weight matrix.
(3) The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) A filter is a concatenation of multiple convolution kernels, each assigned to a particular channel of the input. When the number of channels is 1, the filter is a single convolution kernel; when the number of channels is greater than 1, the filter is a concatenation of multiple convolution kernels. For example, if a picture is stored as a tensor in RGB form, the input includes three channels, i.e., an R matrix, a G matrix, and a B matrix (red, green, and blue, corresponding to three images of the same size). The matrix of each channel is convolved with a corresponding convolution kernel, and the convolution kernels corresponding to all channels together form a filter. Each filter is used to extract different feature data. As another example, suppose a picture has four channels, ARGB (transparency plus red, green, and blue, corresponding to four images of the same size), the convolution kernel size is 100 × 100, and 16 convolution kernels w1 to w16 are used, where kernels w1 to w4 constitute a first filter, kernels w5 to w8 a second filter, kernels w9 to w12 a third filter, and kernels w13 to w16 a fourth filter; different filters are used to extract different feature data of the input image. Performing a convolution operation on the ARGB image with the first filter, i.e., convolving the four images on the four channels with w1 to w4 respectively, yields a first output image; the first pixel in the top left corner of this image is a weighted sum of the pixels in the 100 × 100 region in the top left corner of the four input images, and so on. Similarly, with the other filters, the output of this layer corresponds to four "images", each of which responds to a different feature of the original image.
(5) A convolutional neural network can use the back propagation (BP) algorithm to correct the parameters of the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation motion dominated by the error loss and aims to obtain the optimal parameters of the model, such as the weights and attention vectors in the embodiments of the present application.
(6) Convolution extracts feature data from the original input; in short, feature data is extracted region by region from the original input. Expressed as a mathematical relationship, convolution is the operation of a convolution kernel on the input matrix of the convolutional layer. Usually the input matrix is a matrix extracted from the image matrix according to the stride of the convolution kernel during convolution. The convolution kernel is a small window in which the weights are recorded. The convolution kernel slides over the image matrix according to the stride; each position of the kernel corresponds to a sub-matrix of the image matrix, the weights in the kernel are multiplied element-wise by the values of the sub-matrix and summed, and the result is assigned to the corresponding element of the current output feature map (output matrix). Convolution is not limited to the original input: the output of one convolution can itself be convolved again, which is not limited in the embodiments of the present application. For example, the first convolution extracts lower-level feature data, the second convolution extracts intermediate-level feature data, the third convolution extracts higher-level features, and so on. Features can be continuously extracted and compressed; the finally extracted higher-level features can be understood as a further concentration of the original features, making them more reliable, and the features of the last level are used to handle various tasks such as classification and regression.
(7) The convolution operation is one of the most important operators in a convolutional neural network. For example, X represents an input feature map (the input matrix of the convolutional layer), W represents the weights, b represents the bias, Y0 represents the result of multiplying the matrices X and W, and Y represents an output feature map (the output matrix of the convolutional layer). Optionally, an activation operation computes the activation value of each element of the output Y to obtain the final result.
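As a hedged illustration of this operator (a pure-Python sketch; the names `conv2d` and `relu` are ours, not from the text), Y can be computed by sliding the kernel W over X with the given stride, adding the bias b, and optionally applying an activation:

```python
# Sketch of the convolution operator described above: each kernel position
# yields a multiply-accumulate over a sub-matrix of X, plus bias b, with an
# optional activation. Function names are illustrative assumptions.

def relu(v):
    return v if v > 0 else 0.0

def conv2d(X, W, b=0.0, stride=1, activation=None):
    kh, kw = len(W), len(W[0])
    out_h = (len(X) - kh) // stride + 1
    out_w = (len(X[0]) - kw) // stride + 1
    Y = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = b  # an element of Y0: weighted sum over the sub-matrix
            for di in range(kh):
                for dj in range(kw):
                    acc += X[i * stride + di][j * stride + dj] * W[di][dj]
            row.append(activation(acc) if activation else acc)
        Y.append(row)
    return Y

X = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
W = [[1, 0],
     [0, 1]]                       # 2x2 kernel holding the weights
Y = conv2d(X, W, b=0.0, stride=1, activation=relu)   # 3x3 input -> 2x2 output
```

The 3x3 input shrinks to a 2x2 output because the 2x2 kernel has four valid positions at stride 1.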
(8) Robustness refers to the property that a system, device, or apparatus maintains certain other performance under perturbation of certain (structural, size) parameters. Parameter perturbation is mainly caused by slow drift of characteristics or parameters due to environmental factors (such as noise) acting on the system, device, or apparatus during operation.
As shown in fig. 4, fig. 4 is a schematic diagram of a convolutional neural network provided in an embodiment of the present application, and the Convolutional Neural Network (CNN)100 may include an input layer 110, a convolutional/pooling layer 120, fully-connected layers 131 to 13n, and an output layer 140, where the pooling layer is optional.
Convolutional layer/pooling layer 120:
as shown in FIG. 4, the convolutional layer/pooling layer 120 may include layers 121-126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
A convolutional layer:
taking convolutional layer 121 as an example, convolutional layer 121 may include a plurality of convolution operators, also called convolution kernels, which act as filters extracting specific information from the input matrix in image processing. A convolution operator is essentially a weight, which is usually predefined. During the convolution operation on an image, the weight is usually applied to the input matrix one pixel at a time in the horizontal direction (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the task of extracting specific feature data from the input matrix. When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layers (e.g., 121) tend to extract more general feature data, which may also be referred to as low-level features. As the depth of the convolutional neural network 100 increases, the later convolutional layers (e.g., 126) extract increasingly complex features, such as feature data with high-level semantics; features with richer semantics are more suitable for the problem to be solved.
A pooling layer:
since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. That is, in layers 121-126 illustrated by 120 in fig. 4, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image.
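A minimal sketch (assumed, not from the text) of that spatial-size reduction, using 2x2 max pooling with stride 2, which halves each spatial dimension of a feature map:

```python
# Illustrative max-pooling step: each non-overlapping 2x2 window of the input
# feature map is replaced by its maximum, shrinking a 4x4 map to 2x2.
# The helper name max_pool2d is an assumption, not from the patent text.

def max_pool2d(X, size=2, stride=2):
    out = []
    for i in range(0, len(X) - size + 1, stride):
        row = []
        for j in range(0, len(X[0]) - size + 1, stride):
            row.append(max(X[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [7, 2, 9, 8],
        [0, 1, 3, 2]]
pooled = max_pool2d(fmap)          # 4x4 feature map -> 2x2
```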
Fully-connected layers 131 to 13n and output layer 140:
after processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as previously described, the convolutional layer/pooling layer 120 only extracts feature data and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 100 uses the fully-connected layers 131 to 13n and the output layer 140 to generate one required class output or a set of required class outputs.
It should be noted that the convolutional neural network 100 shown in fig. 4 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, for example, as shown in fig. 5, fig. 5 is another schematic diagram of a convolutional neural network provided by the embodiment of the present application, where multiple convolutional layers/pooling layers are parallel, and the features extracted respectively are all input to the fully-connected layers 131 to 13n for processing.
It should be noted that the neural network architecture shown in fig. 4 or fig. 5 is only for example and is not used to limit the technical solution of the present application. Those skilled in the art will appreciate that in a specific implementation, the neural network architecture may also be in other forms, may also include other devices, and may also configure the number of memristors according to specific needs.
Referring to fig. 6, fig. 6 illustrates a system architecture according to an embodiment of the present application. As shown, the system may include a neural network computing device 110, a neural network training device 120, a database 130, a client device 140, a data storage system 150, a data acquisition device 160, and the like. The data acquisition device 160 is used for acquiring training data; in this application, the data acquisition device 160 may include, for example, a microphone or a camera. The training data (i.e., the input and output on the neural network training side) in the embodiments of the present application may include, for example: video sample data, image sample data, or voice sample data, together with a target (also called a label) that matches the video sample data, image sample data, or voice sample data. For example, if the image sample data is a face image of a user, the target corresponding to the image sample data is the identity information of the user. The sample data and its matching target or label may be collected by the data acquisition device 160, or may be downloaded from the cloud; fig. 6 shows only an exemplary architecture and is not limiting. Optionally, the training data in this embodiment may also include other types of sample data besides video sample data, image sample data, or voice sample data, which is not limited in the embodiments of the present application.
Further, the data collecting device 160 stores the training data into the database 130, and the neural network training device 120 trains to obtain the network parameters of the neural network system based on the training data maintained in the database 130 (the network parameters of the neural network system here may be the network parameters, such as weights and attention vectors, obtained through training by the neural network training method in the embodiment of the present application).
It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the neural network training device 120 does not necessarily have to perform training based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for model training.
The neural network computing device 110 or the neural network training device 120 in the embodiments of the present application may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, an intelligent wearable device, an intelligent robot, or a vehicle-mounted terminal, and may also be a server or a cloud. In fig. 6, the neural network computing device 110 is configured with an I/O interface for data interaction with external devices, and a user can input data to the I/O interface through the client device 140 (the client device in this application may also include a microphone, a camera, or other data acquisition devices), where the input data (i.e., the input data on the computing side) may include voice information, image information, or video information in the embodiments of the present application. It is to be understood that the input data here may be input by the user or provided by a related database, depending on the application scenario, which is not limited in the embodiments of the present application.
In the embodiment of the present application, the client device 140 may be on the same device as the neural network computing device 110, and the data collection device 160, the database 130 and the neural network training device 120 may also be on the same device as the neural network computing device 110 and the client device 140.
It should be noted that the neural network training device 120 may generate network parameters of a corresponding neural network system based on different training data for different targets or different tasks, and the corresponding neural network system may be used to achieve the targets or complete the tasks, so as to provide the user with a desired result.
It should be noted that fig. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, etc. shown in the diagram does not constitute any limitation, for example, in fig. 6, the data storage system 150 is an external memory with respect to the neural network computing device 110, and in other cases, the data storage system 150 may also be disposed in the neural network computing device 110. The specific structure of the neural network training device 120 in fig. 6 can refer to fig. 17 or fig. 19 in the embodiment of the present application, and the specific structure of the neural network computing device 110 in fig. 6 can refer to fig. 18 or fig. 20 in the embodiment of the present application. It should be noted that the structure disclosed in fig. 17 or fig. 19 is only one exemplary structure of the neural network training device 120 proposed in the embodiment of the present application, and the structure disclosed in fig. 18 or fig. 20 is also only one exemplary structure of the neural network computing device 110 proposed in the embodiment of the present application, and the embodiment of the present application is not limited thereto.
Referring to fig. 7, based on the above description of the related functions of the convolutional neural network in fig. 4 and fig. 5 and the description of the system architecture 100 in fig. 6, fig. 7 is a hardware structure diagram of a neural network processor according to an embodiment of the present application, where:
the neural network processor NPU 302 is mounted as a coprocessor on a CPU (e.g., Host CPU)301, and tasks are assigned by the Host CPU 301. For example, corresponding to the system architecture 100 described above, the CPU 301 may be located in the client device 140 in the present application for extracting voice information or image information to be recognized from voice data and video data; the NPU 302 may be located in the computing module 111, and is configured to perform feature extraction and feature matching on the voice information or the image information to be recognized extracted by the CPU 301, so as to send a matching result to the CPU 301 for further computing processing, which is not described in detail herein. It will be appreciated that the CPU and NPU may be located in different devices, which may be configured differently depending on the actual requirements of the product. For example, the NPU is located on a cloud server, and the CPU is located on a user device (e.g., a smartphone, a smart robot); alternatively, both the CPU and the NPU are located on a client device (e.g., a smartphone, a smart robot, etc.).
The core portion of the NPU 302 is an arithmetic circuit 3023, and the controller 3024 controls the arithmetic circuit 3023 to extract matrix data in a memory and perform multiplication.
In some implementations, the arithmetic circuit 3023 internally includes a plurality of processing elements (PEs). In some implementations, the arithmetic circuit 3023 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 3023 is a general-purpose matrix processor. For example, assume that there is an input matrix A and a weight matrix B. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 3022 and buffers it in each PE of the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 3021, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 3028.
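A hypothetical pure-Python model of what such a circuit computes (the stationary weight matrix and the accumulator are only mimicked in software; this makes no claim about the actual NPU microarchitecture):

```python
# Sketch of the matrix operation described above: matrix B is held fixed
# (as if buffered in the PEs) while rows of A stream through, and partial
# sums build up in an accumulator, one rank-1 partial product per step.

def matmul_accumulate(A, B):
    n, k = len(A), len(B)
    m = len(B[0])
    acc = [[0.0] * m for _ in range(n)]   # models the accumulator contents
    for t in range(k):                    # stream A through stationary B
        for i in range(n):
            for j in range(m):
                acc[i][j] += A[i][t] * B[t][j]   # partial result accumulates
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_accumulate(A, B)        # same result as an ordinary matrix product
```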
The unified memory 3026 is used to store input data and output data. The weight data is transferred directly to the weight memory 3022 through the direct memory access controller (DMAC) 3025. The input data is also carried into the unified memory 3026 by the DMAC.
A bus interface unit (BIU) 30210 is configured for the instruction fetch memory 3029 to obtain instructions from the external memory, and is further configured for the storage unit access controller 3025 to obtain the raw data of the input matrix A, the weight B, and the attention vector V from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 3026, or to transfer weight data to the weight memory 3022, or to transfer input data to the input memory 3021.
The vector calculation unit 3027 may include a plurality of operation processing units, and the vector calculation unit 3027 may be configured to take the attention vector stored in the vector memory 3030 and perform recalculation based on the matrix operation result obtained by the operation circuit 3023 and the attention vector.
An instruction fetch buffer 3029 connected to the controller 3024, and configured to store instructions used by the controller 3024;
the unified memory 3026, the input memory 3021, the weight memory 3022, the vector memory 3030, and the instruction fetch memory 3029 are all On-Chip memories. The external memory is private to the NPU hardware architecture.
It is understood that, regarding the extraction of feature data and the modification of feature data in the input in any embodiment of the neural network training method in the present application, and regarding the extraction of feature data and the modification of feature data in the input in any embodiment of the neural network computing method in the present application, the related functions are implemented by the related functional units in the neural network processor 302(NPU), and will not be described in detail herein.
The following describes embodiments of the neural network training method and the neural network computing method provided by the present application from a model training side and a computing side, and specifically analyzes and solves technical problems presented in the present application, in combination with the above application scenario, system architecture, structure of convolutional neural network, and structure of neural network processor.
Referring to fig. 8, fig. 8 is a schematic flowchart of a neural network training method according to an embodiment of the present disclosure, which may be applied to the neural network processor corresponding to fig. 7. The method may include the following steps S801 to S804, and optionally, may further include steps S805 to S809.
Step S801, inputting training data into a neural network system, where the neural network system includes at least one convolutional layer.
Specifically, the neural network system updates its network parameters by gradient back propagation, and the required data set is generally divided into training data and test data. Generally, input data X (i.e., the training data in the embodiments of the present application) is given to the neural network, and the output data Y_label that the network should produce is determined, i.e., the target output data in the embodiments of the present application, also called the label. A set of output data Y is obtained from the input data X; this Y is compared with the Y_label corresponding to X, the difference between the two (also called the loss value) is computed, and the network parameters are then adjusted by this difference (adjusting the parameters through gradient back propagation), so that the next output Y is closer to Y_label. After training, the trained network is further verified with the test data, i.e., data never seen by the neural network is used to evaluate its accuracy on the test data. The higher the test accuracy, the better the neural network has been trained.
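The compare/adjust loop described above can be sketched with a one-parameter toy model (all names are illustrative; the analytic gradient of the squared difference stands in for full back propagation):

```python
# Toy sketch of one training iteration: forward pass gives Y, the squared
# difference to Y_label is the loss value, and a gradient step adjusts the
# parameter so the next Y is closer to Y_label. w and lr are illustrative.

def forward(w, x):
    return w * x                       # toy "network": Y = w * X

def train_step(w, x, y_label, lr=0.1):
    y = forward(w, x)
    loss = (y - y_label) ** 2          # difference between Y and Y_label
    grad = 2 * (y - y_label) * x       # dloss/dw via the chain rule
    return w - lr * grad, loss         # gradient step on the parameter

w = 0.0
for _ in range(50):
    w, loss = train_step(w, x=1.0, y_label=3.0)
# after repeated steps, w * x approaches y_label and the loss shrinks
```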
Step S802, the target convolutional layer calculates input data of the target convolutional layer through the first attention vector and L first weights to obtain input data of a next layer of the target convolutional layer.
Wherein the at least one convolutional layer comprises the target convolutional layer. The L first weights are used for convolution calculation to extract feature data (or called a feature map) in the input data of the target convolution layer, the first attention vector is used for correcting the feature data, and L is an integer greater than or equal to 1.
Specifically, a convolutional layer may correspond to one or more filters, and one filter may be used in the convolution calculation to extract one feature map from the input data of the convolutional layer. The input data of the convolutional layer may correspond to a plurality of channels; for example, if the input data is an RGB image, the input data of the convolutional layer corresponds to 3 channels. One filter corresponds to a plurality of convolution kernels (i.e., weights), and each of the convolution kernels corresponds to one of the channels. Therefore, when a filter extracts a feature map, it actually extracts features at the same position through the convolution kernel of each channel and superimposes them. In the embodiments of the present application, still taking an RGB image as the input data, if there is only one filter, the target convolutional layer extracts one piece of feature data from its input data through 3 first weights. Further, after the L first weights are used in the convolution calculation to extract the feature data from the input data of the target convolutional layer, the first attention vector is used to correct the feature data. "Correction" here can mean: strengthening, through different coefficients, the feature data that are robust to noise and weakening the features that are not; or selecting the feature data that are robust to noise and removing the features that are not; or selecting the robust features, assigning different weighting coefficients to different feature data based on differences in their noise robustness, and removing the feature data with poor noise robustness.
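As an assumed illustration of this correction step (a sketch of one plausible reading, not the patent's implementation), each extracted feature map can be scaled channel-wise by the corresponding element of the attention vector, so a large coefficient strengthens a robust feature and a zero coefficient removes a fragile one:

```python
# Channel-wise correction sketch: feature_maps is a list of M 2-D feature
# maps, attention is a list of M coefficients; each map is multiplied by
# its coefficient. The helper name apply_attention is an assumption.

def apply_attention(feature_maps, attention):
    return [[[a * v for v in row] for row in fmap]
            for fmap, a in zip(feature_maps, attention)]

maps = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]   # M = 3 feature maps
attn = [1.0, 0.5, 0.0]   # last channel judged noise-fragile and zeroed out
corrected = apply_attention(maps, attn)
```

Only the surviving (non-zero) channels carry information into the next layer's calculation.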
Step S803, inputting the training data to the neural network to obtain third output data, comparing the third output data with the target output data, and calculating a first loss value.
Specifically, the process of calculating the third output data from the training data and the network parameters of the neural network system may be referred to as "forward propagation". Commonly used loss functions for calculating the loss value may include, but are not limited to: the 0-1 loss function, the square loss function, the absolute loss function, the logarithmic loss function, and the hinge loss function.
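As a small assumed sketch, two of the listed loss functions can be written in plain Python, averaged over an output batch (the function names and sample values are illustrative):

```python
# Square loss and absolute loss over a batch of outputs y versus labels
# y_label, averaged element-wise. Names and values are illustrative.

def square_loss(y, y_label):
    return sum((a - b) ** 2 for a, b in zip(y, y_label)) / len(y)

def absolute_loss(y, y_label):
    return sum(abs(a - b) for a, b in zip(y, y_label)) / len(y)

y       = [0.9, 0.2, 0.4]
y_label = [1.0, 0.0, 0.0]
sq = square_loss(y, y_label)       # (0.01 + 0.04 + 0.16) / 3 = 0.07
ab = absolute_loss(y, y_label)     # (0.1 + 0.2 + 0.4) / 3
```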
Step S804, updating network parameters of the neural network system according to the first loss value, where the network parameters include the first attention vector and the L first weights.
The network parameters of the neural network system are parameters to be trained in the neural network system, and at least include attention vectors and weights corresponding to convolutional layers. In this embodiment, the target convolutional layer corresponds to the first attention vector and the L first weights, but it does not mean that all convolutional layers in the neural network system correspond to an attention vector to correct feature data extracted by the weights, and the neural network provided in this embodiment includes at least one convolutional layer, in which feature data in input data of the convolutional layer is extracted based on a plurality of weights, and then the feature data is corrected by an attention vector. Specifically, in the process of updating the network parameters of the neural network system according to the first loss value, the network parameters of the neural network system can be updated by performing back propagation by using the first loss value as input data, that is, the training process of the neural network model is completed once. Then, the updated network parameter may be used as the network parameter of the next forward propagation, and step S801 to step S804 may be continuously performed until the first loss value is smaller than the preset threshold or the number of times of performing the neural network training reaches the preset number of iteration cycles. 
Optionally, in a case that the first loss value does not converge, performing back propagation according to the first loss value to update the first attention vector and the L first weights, and re-determining the first loss value based on the training data, the target output data, the updated first attention vector, and the updated L first weights until the first loss value converges; and determining that the current first attention vector and the L first weights are the second attention vector and the L second weights corresponding to the target convolutional layer when the first loss value converges. Through a plurality of training processes of the neural network model, expected network parameters can be obtained, and the obtained final network parameters can be deployed in a circuit device.
In the embodiments of the present application, in a target convolutional layer included in the neural network, the input data of the target convolutional layer is calculated through the first attention vector and the L first weights to obtain the output data of the target convolutional layer, which serves as the input data of the next layer of the target convolutional layer. When the neural network is trained, in addition to the weights, the first attention vector is trained to evaluate the noise robustness of each piece of feature data extracted by weight convolution. In the process of obtaining the input data of the next layer, besides extracting feature data from the input data through the L first weights, the extracted feature data are corrected through the first attention vector: feature data that are robust to the noise of circuit devices in the neural network system are strengthened, while feature data that are not robust are weakened or even removed, so that the best feature data are screened out for the calculation of the next layer. This prevents feature data with poor robustness to circuit-device noise from flowing into the next layer of the target convolutional layer, reduces the influence of circuit-device noise on the neural network system, and improves the accuracy of the neural network.
In a possible implementation manner, before the target convolutional layer calculates the input data of the target convolutional layer through the first attention vector and the L first weights, step S805 and step S806 may be further included.
Step S805, an initial attention vector corresponding to the target convolutional layer is obtained.
Step S806, setting, for the initial attention vector, an element in the vector that is smaller than or equal to the preset threshold to zero, and keeping the element in the vector that is larger than the preset threshold unchanged to obtain the first attention vector.
Specifically, the weights and attention vectors may be initialized before training the network parameters of the neural network system. For the initial attention vector, part of the elements are set to invalid values, i.e., a pruning operation is performed on the initial attention vector. In the embodiments of the present application, the elements to be pruned are set to zero: elements of the vector that are less than or equal to the preset threshold are set to zero, and elements greater than the preset threshold are kept unchanged, thereby obtaining the first attention vector. The preset threshold may be determined as follows: sort the elements of the initial attention vector in descending order of value; determine the value of the N-th element after sorting as the preset threshold, where N is determined based on a preset pruning rate, and the pruning rate represents the ratio of the number of elements set to invalid values in the vector to the total number of elements in the vector. For example, a pruning rate of 20% means that 20% of the elements in the initial attention vector are pruned; specifically, the values of those 20% of the elements are set to zero. In the embodiments of the present application, the feature data extracted from the input data of the target convolutional layer by the L first weights may include M feature maps, where M is an integer greater than or equal to 1 and M is less than or equal to L. The M feature maps correspond to M weight sets, and the union of the M weight sets is the L first weights; one weight set is used in the convolution calculation to extract one feature map from the input data of the target convolutional layer. The initial attention vector includes M elements, and the M weight sets, the M feature maps, and the M elements are in one-to-one correspondence.
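The zeroing rule just described can be sketched as follows (the helper name and sample values are illustrative assumptions; ties at the threshold follow the "less than or equal" convention stated above):

```python
# Pruning sketch: zero out the prune_rate fraction of smallest attention
# elements. The threshold is chosen from the descending-sorted vector so
# that elements <= threshold are set to zero and the rest are unchanged.

def prune_attention(vec, prune_rate):
    n_prune = int(len(vec) * prune_rate)      # number of elements to zero
    if n_prune == 0:
        return list(vec)
    threshold = sorted(vec, reverse=True)[len(vec) - n_prune]
    return [v if v > threshold else 0.0 for v in vec]

attn = [0.9, 0.1, 0.5, 0.3, 0.7]
pruned = prune_attention(attn, prune_rate=0.4)   # zero the 2 smallest elements
```

A zeroed element removes its corresponding feature map from the next layer's calculation, while the remaining channels pass through unchanged.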
By setting part of elements in the initial attention vector to be zero, the characteristic diagram corresponding to the element with the element value of zero is removed, the characteristic data with poor noise robustness is removed, the characteristic data with good noise robustness is kept to enter the next layer of the target convolutional layer for calculation, and therefore the optimal channel combination (wherein each channel corresponds to one characteristic diagram) is screened out and used for reasoning and calculation of the neural network system.
In a possible implementation manner, after obtaining the initial attention vector corresponding to the target convolutional layer, step S807 may be further included.
Step S807, the initial attention vector is processed based on an activation function, and then, for the processed initial attention vector, an element in the vector that is less than or equal to the preset threshold is set to zero, and an element in the vector that is greater than the preset threshold is kept unchanged, so as to obtain the first attention vector.
The commonly used activation functions may include, but are not limited to, the Sigmoid function, the tanh function, the ReLU function, the Leaky ReLU function (PReLU), the ELU (Exponential Linear Unit) function, and the MaxOut function. The Sigmoid function is a common nonlinear activation function, and its mathematical form is as follows:
f(x) = 1/(1 + e^(-x))
the Sigmoid function transforms a continuous real-valued input into output data between 0 and 1; in particular, for a very large negative number the output is close to 0, and for a very large positive number the output is close to 1. The non-zero elements of the first attention vector processed by the activation function take values in the closed interval from 0 to 1, which can improve the performance of the neural network and make the network converge better during training. The preset threshold in the embodiment of the present application may be determined as follows: sort the elements of the initial attention vector processed by the activation function in descending order of value; determine the value of the N-th element after sorting as the preset threshold, where N is determined based on a preset pruning rate, and the pruning rate represents the ratio of the number of elements set to invalid values in the vector to the total number of elements in the vector; then set the elements of the vector that are less than or equal to the preset threshold to zero and keep the elements greater than the preset threshold unchanged to obtain the first attention vector.
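A small sketch of this Sigmoid behavior (the input values are illustrative): very negative inputs map near 0, zero maps to exactly 0.5, and very positive inputs map near 1, which keeps the non-zero attention coefficients in (0, 1) before pruning:

```python
# Sigmoid activation applied element-wise to raw attention values; sample
# inputs are illustrative only.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

vals = [-6.0, 0.0, 6.0]
out = [sigmoid(v) for v in vals]   # squashed into the open interval (0, 1)
```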
In a possible implementation manner, before the target convolutional layer calculates the input data of the target convolutional layer through the first attention vector and the L first weights, step S808 and step S809 may be further included.
Step S808, L initial weights of the target convolutional layer are obtained.
As described above, the weights and attention vectors may be initialized before training the network parameters of the neural network system. The initialization of the weights in a neural network model is important for training: poor initialization parameters can cause gradient propagation problems and slow down training, while good initialization parameters can accelerate convergence and make it more likely that a better solution is found. If the weights are initially small, the signals end up small; if the weights are initially large, the signals end up large. Commonly used weight initialization methods may include, but are not limited to, the following: 1. constant initialization, i.e., initializing the elements of the weights to a user-defined constant; 2. Gaussian initialization, i.e., initializing the elements of the weights to small random numbers, such as a Gaussian distribution with mean 0 and variance 0.01; this method is only suitable for small networks, because for deep networks small weights cause small gradients in the back propagation calculation, which weakens the gradient "signal"; 3. positive_unitball initialization; 4. initializing the elements of the weights to obey a normal distribution; 5. uniform initialization, i.e., initializing the elements of the weights with a uniform distribution whose bounds are controlled by a maximum value and a minimum value; 6. Xavier initialization, i.e., the elements of the weights obey a uniform distribution with mean 0 and a variance associated with the input data; 7. MSRA initialization, i.e., the elements of the weights obey a Gaussian distribution with mean 0 and a variance associated with the input data; and so on.
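Two of the listed schemes can be sketched in plain Python (the shapes, the standard deviation, and the helper names are illustrative assumptions; the Xavier bound shown is the classic Glorot uniform bound, one common way to tie the variance to fan-in and fan-out):

```python
# Gaussian initialization (small random numbers, mean 0) and a Xavier-style
# uniform initialization whose bound depends on the layer's fan-in/fan-out.
import math
import random

def gaussian_init(fan_in, fan_out, std=0.01, rng=random):
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

def xavier_uniform_init(fan_in, fan_out, rng=random):
    bound = math.sqrt(6.0 / (fan_in + fan_out))   # classic Glorot bound
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]

W1 = gaussian_init(3, 4)           # 3x4 weight matrix, small random entries
W2 = xavier_uniform_init(3, 4)     # 3x4 weight matrix, bounded uniform entries
```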
Step S809, introducing simulated read/write errors of the circuit devices in the neural network system into the L initial weights, to obtain the L first weights.
Wherein the L initial weights are in one-to-one correspondence with the L first weights. Optionally, the read-write error of the circuit device in the simulated neural network system follows a normal distribution, and a standard deviation of the normal distribution is associated with a magnitude of the read-write error of the circuit device in the neural network system.
Specifically, a noise term is added to each element of the L initial weights; the noise is used to simulate the noise of a circuit device, such as a memristor, in the neural network system. Optionally, the noise of the circuit device in the neural network system can follow a normal distribution, and the standard deviation of the normal distribution is related to the magnitude of the read-write error of the circuit device in the neural network system. In other embodiments, the noise of the circuit device may also follow other distributions; the distribution of the noise is associated with the specific circuit device in the neural network system, and the embodiments of the present application are not limited in this respect. In the embodiment of the application, during the training of the L first weights, the simulated noise of the circuit devices of the neural network system is incorporated into neural network training; by exposing the noise of the underlying devices to the software training level, the accuracy loss caused by noise at the hardware level is partly counteracted, so that the training of the neural network adapts to the presence of noise rather than relying on an exact weight representation. Introducing noise into the network training process therefore increases the robustness of the weights to a certain extent.
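The noise-injection step can be sketched in NumPy as follows; the function name, the weight shape, and the noise standard deviation are illustrative assumptions, not values fixed by this application.

```python
import numpy as np

def add_device_noise(weights, noise_std, rng):
    """Simulate the read/write error of an analog circuit device (e.g. a
    memristor) by perturbing every weight element with zero-mean normal
    noise whose standard deviation tracks the magnitude of the error."""
    return weights + rng.normal(0.0, noise_std, size=weights.shape)

rng = np.random.default_rng(42)
initial_weights = rng.normal(0.0, 0.1, size=(8, 3, 3))  # the L initial weights
# one-to-one: each first weight is an initial weight plus simulated noise
first_weights = add_device_noise(initial_weights, noise_std=0.03, rng=rng)
```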
In one possible implementation, the updating the network parameter of the neural network system according to the first loss value includes: updating the initial attention vector and L initial weights according to the first loss value; updating the L first weights based on the updated L initial weights; updating the first attention vector based on the updated initial attention vector.
In the embodiment of the present application, when the L first weights and the first attention vector are updated, the initial attention vector and the L initial weights are updated by performing back propagation based on the first loss value. And setting the elements smaller than or equal to the preset threshold value in the vector to be zero aiming at the updated initial attention vector (or aiming at the initial attention vector processed based on the activation function), keeping the elements larger than the preset threshold value in the vector unchanged, and further updating the first attention vector. For the updated L initial weights, when performing multiplication operation on the input data and the weights, adding a noise to each element of the initial weights, optionally, the noise may be a normal distribution noise, and is used to simulate noise of a circuit device in a neural network system such as a memristor, so as to obtain the updated L first weights.
The trained neural network model is tested with test data, and it is found that during testing the first attention vector can screen the calculation results of the convolutional layer. Specifically, fig. 9 is a schematic diagram of a neural network training method provided in the embodiment of the present application. As shown in fig. 9, after the input data of the target convolutional layer is convolved with the L first weights corresponding to the target convolutional layer, features 1 to 6 are extracted. The attention vector corresponding to the target convolutional layer, after processing by the activation function, may be (0.2, 0.3, 0.4, 0.9, 0.6, 0.8). Since the value of each element in the vector represents the robustness to noise of the feature data corresponding to that element, it can be seen that features 1 to 3, whose robustness to noise is poor, bring no benefit to the lower-layer calculation of the target convolutional layer, and the values of the corresponding elements in the attention vector are set to zero; features 4 to 6, whose robustness to noise is strong, may be retained for the lower-layer calculation of the target convolutional layer. Further, each retained feature map is multiplied by the value of its corresponding element in the attention vector, so that different features are corrected by different weighting coefficients.
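The screening and correction just described can be sketched as follows, using the example attention vector (0.2, 0.3, 0.4, 0.9, 0.6, 0.8); the preset threshold of 0.5 and the feature-map size are assumptions for illustration.

```python
import numpy as np

# Attention vector from the example, one element per feature map.
attention = np.array([0.2, 0.3, 0.4, 0.9, 0.6, 0.8])
threshold = 0.5  # assumed value of the preset threshold

# Six feature maps of size 4x4 (illustrative values).
rng = np.random.default_rng(1)
features = rng.normal(size=(6, 4, 4))

# Zero the elements at or below the threshold, keep the rest unchanged...
screened = np.where(attention > threshold, attention, 0.0)
# ...then scale each retained feature map by its attention element.
corrected = screened[:, None, None] * features
```

With these values, features 1 to 3 are zeroed out and features 4 to 6 are passed on, each weighted by its own attention element.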
The neural network corresponding to the network parameters of the neural network system trained based on the above steps is denoted neural network 1. Neural network 1 was tested and verified on voice wake-up data, and Table 1 compares, at the same false wake-up rate, the wake-up rate of neural network 1 trained by the neural network training method of the embodiment of the present application with that of the neural network in the prior art. Here, Std3 indicates that, during testing, noise following a normal distribution with a mean of 0 and a standard deviation of 3 is added to the circuit devices (such as memristors) in the neural network system, and Std6 indicates that noise following a normal distribution with a mean of 0 and a standard deviation of 6 is added during testing; the noise simulated by Std6 is therefore larger than the noise simulated by Std3.
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
As can be seen from the table above, at Std3 the neural network trained by the neural network training method of the embodiment of the present invention shows an improvement of about one point over the prior-art neural network; because the modeled noise is relatively small, the attention vector may not clearly distinguish the feature data with strong robustness. When the noise increases, at Std6, the trained neural network shows an improvement of about four points over the prior-art neural network; that is, the larger the noise, the better the beneficial effect of the present invention.
The above steps S801 to S809 can be summarized as the software part of training the network parameters of the neural network model. After the second attention vector and the L second weights are trained by the software through the above steps, in one possible implementation, the second attention vector may be deployed to a digital circuit device (e.g., a digital storage module) in the neural network system, and the L second weights may be deployed to an analog circuit device (e.g., a memristor) in the neural network system; for example, the L second weights are set as the corresponding conductance values of the memristor crossbar array. After the second attention vector and the L second weights are trained by the software part in steps S801 to S809, the embodiments of the present application may further include steps S810 to S813. In steps S810 to S813, the second attention vector and the L second weights trained by the software part of steps S801 to S809 are deployed to the corresponding hardware, the training data is input to the neural network again, and the second attention vector in the network parameters of the neural network system is trained again in combination with the actual noise of the circuit devices in the neural network system.
As shown in fig. 10, fig. 10 is a schematic flowchart of another neural network training method provided in the embodiment of the present application, where the neural network training method further includes steps S810 to S813. Wherein the updated L first weights are L second weights, and the updated first attention vector is a second attention vector;
step S810: inputting the training data into the neural network system.
Step S811: and the target convolutional layer calculates the input data of the target convolutional layer through the second attention vector and L third weights to obtain the input data of the next layer of the target convolutional layer.
Specifically, the L third weights correspond one-to-one to the L second weights; each third weight is obtained by writing the corresponding second weight into a circuit device in the neural network system and combining it with the actual noise of that circuit device. After the network parameters trained by the software part are deployed to the actual circuit devices, the output data of the neural network system is obtained; during actual operation of the circuit devices, the L second weights written into the analog circuit devices and corresponding to the target convolutional layer are converted into L third weights under the influence of circuit-device noise in the neural network system.
Step S812: and comparing the fourth output data with the target output data, and calculating a second loss value.
Specifically, the neural network system performs forward propagation based on the network parameters of the neural network deployed on the actual hardware to obtain the fourth output data. The second loss value can be calculated by a loss function such as, but not limited to, a 0-1 loss function, a square loss function, an absolute loss function, a logarithmic loss function, or a hinge loss function.
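For illustration, minimal NumPy versions of several of the loss functions named above; which loss is used in a given deployment is a design choice, not fixed by this application.

```python
import numpy as np

def square_loss(y_pred, y_true):
    """Mean squared error between prediction and target."""
    return np.mean((y_pred - y_true) ** 2)

def absolute_loss(y_pred, y_true):
    """Mean absolute error between prediction and target."""
    return np.mean(np.abs(y_pred - y_true))

def hinge_loss(scores, labels):
    """Hinge loss; labels are in {-1, +1}."""
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

def log_loss(probs, labels):
    """Logarithmic (cross-entropy) loss; labels are in {0, 1}."""
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
```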
Step S813: updating the second attention vector in the network parameter according to the second loss value.
After the L second weights are deployed to the circuit devices in the neural network system, rewriting the circuit devices requires erasing them first; for example, after the L second weights are deployed as the conductance values of the memristor crossbar array, they do not change until the array is erased. Therefore, updating the parameters of the neural network again based on steps S810 to S813 amounts to fixing the weights in the network parameters of the neural network system and back-propagating the second loss value through the neural network to retrain only the second attention vector. In the process of updating the network parameters of the neural network system according to the second loss value, back-propagation is performed with the second loss value as input; updating the network parameters in this way completes one round of training of the neural network model. The updated attention vector may then be used as a network parameter for the next forward propagation, and steps S810 to S813 may be repeated until the second loss value is less than a preset threshold or the number of training iterations reaches a preset number of cycles. Optionally, in the case that the second loss value does not converge, back-propagation is performed according to the second loss value to update the second attention vector, and the second loss value is re-determined based on the training data, the target output data, and the updated second attention vector until it converges; when the second loss value converges, the current second attention vector is determined to be the final attention vector corresponding to the target convolutional layer.
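A toy NumPy sketch of this on-device retraining step, assuming for illustration a single linear layer: `W3` stands for the third weights, i.e. the noisy conductances fixed on the crossbar array, and only the attention vector `a` is updated by back-propagating the second loss value. The sizes, learning rate, and target are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))       # training batch
W3 = rng.normal(size=(8, 4))       # third weights: written to the device and
                                   # fixed, since the array is not erased
target = x @ (W3 * 0.9)            # hypothetical target output
a = np.ones(4)                     # second attention vector, trainable

lr = 0.02
for _ in range(200):
    z = x @ W3                     # analog multiply-add on the crossbar
    out = z * a                    # attention applied in digital logic
    err = out - target
    loss = np.mean(err ** 2)       # second loss value
    # gradient of the column-wise mean-squared error w.r.t. a only
    grad_a = 2 * np.mean(err * z, axis=0)
    a -= lr * grad_a               # W3 is never updated
```

Note that the fixed weights never change; only the attention vector absorbs the mismatch introduced by device noise.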
As shown in fig. 11, fig. 11 is a schematic diagram of a neural network training method provided in the embodiment of the present application. In this method, the network parameters of the neural network system are first trained in the software part; then the software-trained neural network model is deployed, in an on-chip training manner, onto the corresponding analog circuit devices and digital circuit devices in the neural network system; the weights deployed on the circuit devices in the neural network system are fixed, and the second attention vector is optimized to obtain a new attention vector.
The neural network corresponding to the network parameters of the neural network system trained based on the above steps S810 to S813 is denoted neural network 2. Neural network 2 was again tested and verified on the voice wake-up data, and Table 2 compares, at the same false wake-up rate, the wake-up rate of the neural network trained by the neural network training method described in the embodiment of the present application with that of the neural network in the prior art. As before, Std3 indicates that noise following a normal distribution with a mean of 0 and a standard deviation of 3 is added to the circuit devices (such as memristors) in the neural network system during testing, and Std6 indicates that noise following a normal distribution with a mean of 0 and a standard deviation of 6 is added during testing; the noise simulated by Std6 is larger than the noise simulated by Std3. The network parameters of neural network 1 are trained through the software training process of steps S801 to S809; the network parameters of neural network 2 are obtained by deploying the software-trained network parameters to the corresponding digital circuit devices and analog circuit devices in the neural network system and then fitting them on the devices according to the actual noise of the circuit devices, that is, they are obtained through steps S801 to S813.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
Referring to fig. 12, fig. 12 is a schematic diagram comparing the calculation effect of the neural network in the embodiment of the present application with the neural network in the prior art. As can be seen from Table 1, Table 2, and fig. 12, after the attention vector is retrained on the device, the effect of the embodiment of the present application is improved by about one point compared with the performance of neural network 1 without retraining, and by about five points compared with the original prior-art network; it is substantially close to the performance of the original prior-art network at Std3.
The above embodiment only describes training the hardware part after the second attention vector and the L second weights have been trained by the software part, that is, deploying the second attention vector and the L second weights to the corresponding hardware, inputting the training data to the neural network again, and retraining the second attention vector in the network parameters of the neural network system in combination with the actual noise of the circuit devices. It should be understood that the above training process is only an example and is not intended to limit the technical solution of the present application. For example, performing only the training of the software phase (e.g., some or all of steps S801 to S809), performing only the training of the hardware phase (e.g., some or all of steps S810 to S813), performing the training of the hardware phase first and then the training of the software phase, or performing the training of the software phase and the training of the hardware phase synchronously, all fall within the scope of the present invention.
After the neural network model is trained through the above embodiments, inference calculation can be performed based on the trained neural network model.
The neural network training method described in any one of the method embodiments corresponding to fig. 8 to 11 obtains the network parameters of the neural network system, and performs actual calculation of the neural network based on the obtained network parameters of the neural network system. Referring to fig. 13, fig. 13 is a schematic flowchart of a neural network computing method provided in the embodiment of the present application, where the neural network computing method describes a practical application of the neural network system trained in any one of the method embodiments corresponding to fig. 8 to fig. 11. The neural network computing method is performed by a neural network system including at least one convolutional layer, and the method may include method flow steps S1301 to S1303.
Step S1301: input data of the target convolutional layer is acquired.
Specifically, the neural network system includes at least one convolutional layer, and the target convolutional layer is any one of the at least one convolutional layer. That is, among all the convolutional layers included in the neural network system, each convolutional layer may correspond to its own attention vector; alternatively, instead of every convolutional layer having a corresponding attention vector, only some of the convolutional layers may each correspond to an attention vector. The input data of the neural network system may include video data, image data, voice data, or the like, and the embodiments of the present application are not limited in this respect.
Step S1302: and performing convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, wherein the first output data comprises N first feature maps, and N is an integer greater than or equal to 1.
Specifically, the weights of the target convolutional layer may be L first weights updated based on the neural network training method described in any one of the method embodiments corresponding to fig. 8 to 11.
Wherein the number of feature maps corresponds to the number of channels included in the convolutional layers and the number of filters between the convolutional layers. Taking the input data of the neural network system as image data as an example: in the input layer, if the input data is a gray-level image, there is only one feature map; if the input data is an RGB picture, there are typically 3 feature maps. There are several filters between convolutional layer 1 and convolutional layer 2. A filter is a concatenation of multiple convolution kernels, each assigned to a particular channel of the input. When the number of channels is 1, a filter is a single convolution kernel; when the number of channels is greater than 1, a filter refers to a concatenation of multiple convolution kernels. For example, if a picture is stored as one tensor in RGB form, the input includes three channels, i.e., an R matrix, a G matrix, and a B matrix (red, green, and blue, corresponding to three images of the same size). The matrix of each channel is convolved with a corresponding convolution kernel, and the convolution kernels corresponding to all channels together form a filter. Each filter is used to extract a different feature map (or feature data). As another example, a picture has four channels ARGB (transparency plus red, green, and blue, corresponding to four images of the same size); assuming that the convolution kernel size is 100 × 100 and 16 convolution kernels w1 to w16 are used in total, convolution kernels w1 to w4 constitute a first filter, convolution kernels w5 to w8 constitute a second filter, convolution kernels w9 to w12 constitute a third filter, and convolution kernels w13 to w16 constitute a fourth filter, with different filters used to extract different feature maps of the input image.
Performing a convolution operation on the ARGB image with the first filter means convolving the four images on the four channels with w1 to w4 respectively and summing the results to obtain a first image; the first pixel in the top-left corner of this image is a weighted sum of the pixels in the 100 × 100 region in the top-left corner of the four input images, and so on. Similarly, with the other filters, the output of this layer corresponds to 4 "images". Each output image responds to a different feature in the original image. A convolutional layer extracts multiple feature maps because it is desirable to describe a picture from multiple angles; specifically, multiple different filters are used to convolve the image, and the responses on the different filters are obtained as the feature data of the image.
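The filter/feature-map relationship described above can be sketched with a naive NumPy convolution; the input size (3 channels of 8 × 8), the filter count (4), and the 3 × 3 kernel size are illustrative stand-ins for the 100 × 100 example in the text.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D convolution (no padding, stride 1)."""
    h, w = kernel.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

rng = np.random.default_rng(0)
rgb = rng.normal(size=(3, 8, 8))         # R, G, B channel matrices
filters = rng.normal(size=(4, 3, 3, 3))  # 4 filters, each a concatenation of
                                         # 3 convolution kernels (one per channel)

# Each filter convolves every channel with its own kernel and sums the
# results, producing one feature map; 4 filters -> 4 feature maps.
feature_maps = np.array([
    sum(conv2d_valid(rgb[c], f[c]) for c in range(3))
    for f in filters
])
```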
Step S1303: and calculating the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, wherein the second output data comprises N second feature maps robust to noise, the attention vector comprises N elements, and each element in the N elements is used for representing the robustness of the corresponding first feature map to the noise.
Specifically, the attention vector is the updated first attention vector obtained according to the neural network training method corresponding to fig. 8 to 11. The calculation of the first output data based on the attention vector corresponding to the target convolutional layer to obtain the second output data may be based on a software implementation and also based on a hardware implementation described in fig. 20. The output data of the neural network system may be a label corresponding to input data such as video data, image data, or voice data. For example, if the image data of the input data is a face image of a user, the image data is input into the neural network model and then the identity information of the user can be output; for another example, if the voice data of the input data is the voice of a user, the identity information of the user can be obtained after the voice data is input into the neural network model.
In the prior art, in the process of performing a neural network calculation, after the target convolutional layer receives input data, a convolution calculation is performed on the input data based on the weights of the target convolutional layer to obtain the output data of the target convolutional layer, and the output data is directly used as the input data of the next layer of the target convolutional layer and enters the lower-layer calculation. However, among all the feature maps (or feature data) of the output data of the target convolutional layer, the robustness of each feature map to noise differs: part of the feature data is strongly robust to noise, and part is poorly robust, so not all feature maps are suitable for being directly fed into the lower-layer calculation. Therefore, the embodiment of the present application provides an attention vector, and the output data obtained by the weight-based convolution calculation is corrected by the attention vector. Specifically, in the embodiment of the present application, in the process of obtaining the output data of the target convolutional layer, after the convolution calculation is performed on the input data based on the weights of the target convolutional layer to obtain the first output data, the first output data is further corrected through the attention vector corresponding to the target convolutional layer; for example, a feature map with strong robustness to noise is strengthened, and a feature map with poor robustness to noise is weakened or even removed.
In a possible implementation manner, the target convolutional layer includes M channels, and before acquiring the input data of the target convolutional layer, step S1304 may be further included.
Step S1304: determining N channels for performing the convolution calculation from the M channels based on the attention vector, wherein element values of elements included in the attention vector are greater than or equal to a preset threshold, and M is an integer greater than N.
The target convolutional layer comprises M channels, each channel corresponds to a weight set, and each channel is used for extracting a feature map. Each channel corresponds to an element in the second attention vector trained based on any one of the method embodiments corresponding to fig. 8 to fig. 11, and in the embodiment of the present application, the attention vector is a composition of elements whose element values in the second attention vector trained are greater than or equal to a preset threshold.
Specifically, in the neural network calculation process, before the convolution calculation is performed based on the weights, the channels (or filters) corresponding to the elements whose element values in the attention vector are smaller than the preset threshold can be closed directly, and feature data with strong robustness to noise is extracted based only on the remaining channels, thereby reducing the amount of computation of inference. For example, as shown in fig. 14, fig. 14 is a mathematical expression diagram of a neural network computing method provided in the embodiment of the present application. Taking the input data as image data as an example, the three matrices in the left area are the input data of the original image in RGB format, where the image data of the three channels R, G, and B are represented by three matrices, and the size of the original RGB image is m × n × 3 (i.e., the width is m, the height is n, and the depth is 3). Filter 1 represents the first filter and Filter N represents the Nth filter; the filters have the same size, s × t × 3 (i.e., the height is s, the width is t, and the depth is 3), that is, each filter is composed of three convolution kernels (three weights), and each convolution kernel corresponds to one channel of the input data. Therefore, since the current convolution uses M (M = 3 × N) weights, that is, N filters (corresponding to N weight sets: the first weight set, the second weight set, ..., and the Nth weight set), the target convolutional layer can extract N feature maps, and the depth of the output data of the target convolutional layer is N; that is, the output data of the target convolutional layer has size k × j and depth N. The size of the output data is associated with the stride with which the filter slides.
The target convolutional layer corresponds to an attention vector p = (p1, p2, p3, ..., pN), where element p1 corresponds to the first weight set (and to the first filter), element p2 corresponds to the second weight set (and to the second filter), and so on. When a certain element in the final attention vector obtained by the neural network training method corresponding to fig. 8 to 11 is smaller than the preset threshold, the channel corresponding to that element is closed based on the trained attention vector, and the output data of the target convolutional layer is calculated based on the input data of the target convolutional layer, the attention vector (specifically, the elements of the attention vector that are larger than the preset threshold), and the remaining channels, thereby simplifying the structure of the neural network and reducing its computational cost.
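The channel-closing behavior can be sketched as follows; the attention values, threshold, and feature-map size are assumed for illustration, and the random feature map stands in for the convolution a filter would perform. Filters whose attention element falls below the threshold are simply never evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
attention = np.array([0.2, 0.7, 0.05, 0.9])  # trained attention vector (assumed values)
threshold = 0.1                              # assumed preset threshold
keep = attention > threshold                 # channels (filters) left switched on

computed = []
for n, p_n in enumerate(attention):
    if not keep[n]:
        continue                     # channel closed: its convolution is skipped entirely
    fmap = rng.normal(size=(6, 6))   # stand-in for the feature map filter n would extract
    computed.append(p_n * fmap)      # retained map scaled by its attention element

output = np.stack(computed)          # output depth shrinks from M=4 channels to N=3
```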
Specifically, the attention vector screens the features extracted by the convolution matrices; therefore, optionally, when extracting features, the number of channels can be increased so that, on the one hand, the screened features are more robust to noise and, on the other hand, the number of extracted features is not reduced, so that the extracted features can still accurately describe the input data.
According to the embodiment of the application, for the target convolutional layer, N channels used for executing the convolution calculation are determined from the M channels based on the attention vector, feature maps that are relatively robust to noise are extracted through the N channels, and different weighting coefficients are assigned to different feature maps according to their different robustness to noise, so that the feature maps are corrected. On the one hand, reducing the number of channels reduces the amount of computation of neural network inference and compresses the network model while preserving the calculation precision of the neural network; on the other hand, the optimal feature maps can be screened out through the attention vector for the calculation of the next layer of the target convolutional layer, which prevents feature maps that are poorly robust to noise, such as the noise of circuit devices, from flowing into the next layer of the target convolutional layer, reduces the influence of the noise of circuit devices and the like in the neural network system on the system, and improves the accuracy of neural network inference.
In a possible implementation manner, before performing the neural calculation, a training process of the neural network is further included, and specifically, the method further includes step S1305.
Step S1305: adjusting initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system, wherein the adjustment process combines at least one of simulated noise and actual noise of circuit devices of the neural network system, and the trained network parameters include weights of the target convolutional layer and attention vectors corresponding to the target convolutional layer.
The process of the neural network training may refer to the related description of the neural network training method corresponding to fig. 8 to 11, and is not repeated here.
In a possible implementation manner, the adjusting initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system includes:
inputting the training data into the neural network system to obtain third output data; calculating based on the third output data and the target output data to obtain a loss value; and updating initial network parameters of the neural network according to the loss value.
Referring to fig. 15, fig. 15 is a schematic flowchart of a neural network training and calculating method provided in an embodiment of the present application, where the neural network training and calculating method includes the neural network training method in any one of the embodiments corresponding to fig. 8 to 11 and the neural network calculating method in any one of the embodiments corresponding to fig. 13 and 14. The implementation process of the neural network training method corresponds to the neural network training stage, and the implementation process of the neural network computing method corresponds to the neural network computing stage. The neural network training phase may also be referred to as a software & hardware training phase, and the neural network computation phase may also be referred to as a hardware reasoning phase. The neural network training and calculating method corresponding to fig. 15 may be applied to the chip system corresponding to fig. 16, where fig. 16 is a schematic structural diagram of a chip system provided in the embodiment of the present application.
First, the software and hardware training phase includes the following steps:
1. Acquire training data, and train the L second weights and the second attention vector of the neural network model on the GPU based on the training data.
2. Deploy the L second weights obtained by training on the GPU to the memristor crossbar array of the chip system corresponding to fig. 16, and deploy the second attention vector to the digital storage module of the chip system.
3. With the L second weights fixed on the memristor crossbar array and the second attention vector deployed in the digital storage module, retrain the second attention vector based on the training data in combination with the actual noise of the memristor crossbar array, to obtain an updated second attention vector.
Second, the hardware reasoning phase includes the following steps:
4. Redeploy the retrained, updated second attention vector to the digital storage module of the chip system.
5. Control the switching on and off of each column of the memristor crossbar array according to the updated second attention vector.
6. Perform data inference based on the L second weights deployed on the memristor crossbar array (in practice, the weights corresponding to the closed, i.e. conducting, columns) and the updated second attention vector.
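Steps 1 to 6 above can be sketched end to end as follows; the multiplicative Gaussian model of the crossbar write noise, the single training sample, and the switching threshold are illustrative assumptions, not details fixed by this application:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Software & hardware training phase ---
# Step 1: weights and attention vector as trained on the GPU (random stand-ins).
W = rng.normal(size=(3, 6))            # L second weights, 6 crossbar columns
a = np.abs(rng.normal(size=6)) + 0.1   # second attention vector
x = rng.normal(size=3)                 # training data (single sample)
y_target = rng.normal(size=6)          # target output data

# Step 2: deploying W to the memristor crossbar introduces actual device noise.
W_chip = W * (1.0 + rng.normal(scale=0.05, size=W.shape))

# Step 3: with W_chip fixed, retrain only the attention vector against the noise.
def loss(a_vec):
    return np.mean(((x @ W_chip) * a_vec - y_target) ** 2)

loss_before = loss(a)
feats = x @ W_chip
for _ in range(200):
    grad_a = 2.0 * (feats * a - y_target) * feats / y_target.size
    a -= 0.05 * grad_a                 # updated second attention vector
loss_after = loss(a)

# --- Hardware reasoning phase ---
# Steps 4-5: redeploy a; switch off columns with negligible attention weight.
closed = a >= 0.01                     # closed (conducting) columns
# Step 6: inference uses only the closed columns of the crossbar.
y = (x @ W_chip[:, closed]) * a[closed]
```

The point of step 3 is that retraining the attention vector against the weights as they actually exist on the device compensates for the write noise, which shows up here as the retrained loss being lower than the loss right after deployment.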
The corresponding chip system architecture is shown in fig. 16. The chip system may include a switch control module, a memristor crossbar array, a digital storage module, and a multiplier; each unit is described in detail below.
The memristor crossbar array is used for storing the L second weights and performing multiply-add calculations based on the L second weights and the input data of the target convolutional layer.
The digital storage module is used for storing the second attention vector, where each element in the second attention vector indicates the robustness to noise of the corresponding feature data extracted by the second weights.
The switch control module is used for controlling the switching on and off of the columns of the memristor crossbar array according to the second attention vector stored in the digital storage module, thereby screening, through switch control, the columns of the memristor crossbar array that participate in the calculation.
The multiplier is used for data inference based on the L second weights deployed on the memristor crossbar array (in practice, the weights corresponding to the closed, i.e. conducting, columns) and the second attention vector.
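The effect of the switch control, where only the closed (conducting) columns contribute to the multiply-add, is numerically equivalent to zeroing the pruned attention elements; a minimal sketch under illustrative shapes and values:

```python
import numpy as np

rng = np.random.default_rng(2)

W = rng.normal(size=(3, 6))        # crossbar: 6 columns, one per feature/channel
a = np.array([0.9, 0.0, 0.4, 0.0, 1.2, 0.7])   # second attention vector
x = rng.normal(size=3)             # input data of the target convolutional layer
threshold = 1e-6                   # assumed switching threshold

# Switch control: a column is closed (participates in the calculation)
# iff its attention element exceeds the threshold.
closed = a > threshold

# Reduced multiply-add over the closed columns only ...
y_gated = (x @ W[:, closed]) * a[closed]

# ... matches the full multiply-add followed by the attention correction,
# restricted to the closed columns.
y_full = (x @ W) * a
```

This equivalence is why the switch control saves computation without changing the result on the surviving channels.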
In this embodiment, the description takes as an example only the L second weights corresponding to the target convolutional layer stored in the memristor crossbar array and the second attention vector corresponding to the target convolutional layer stored in the digital storage module; the memristor crossbar array may further store weights corresponding to other convolutional layers of the neural network, and the digital storage module may further store attention vectors corresponding to other convolutional layers of the neural network, which is not limited in this embodiment.
It should be noted that the attention vector provided in the embodiments of the present application may correct, in a convolutional layer, the features extracted by the convolution kernels, and may also correct, in a fully-connected layer, the features of the output data after the convolutional layer/pooling layer; for the correction process, reference may be made to the description of the attention vector in the above embodiments, and details are not repeated here.
The method of the embodiments of the present application is explained in detail above, and the related apparatus of the embodiments of the present application is provided below.
Referring to fig. 17, fig. 17 is a schematic structural diagram of a neural network training device according to an embodiment of the present disclosure, where the neural network training device 17 may include an input unit 1701, a calculation unit 1702, and an update unit 1703, where details of each unit are described below.
An input unit 1701, configured to input training data into the neural network system to obtain third output data, where the neural network system includes at least one convolutional layer; the target convolutional layer is any one of the at least one convolutional layer, and the target convolutional layer is used for calculating input data of the target convolutional layer through a first attention vector and L first weights, so as to obtain input data of a next layer of the target convolutional layer; the L first weights are used for convolution calculation to extract feature data in the input data of the target convolutional layer, the first attention vector is used for correcting the feature data, and L is an integer greater than or equal to 1;
a calculating unit 1702, configured to compare the third output data with target output data to calculate a first loss value;
an updating unit 1703, configured to update a network parameter of the neural network system according to the first loss value, where the network parameter includes the first attention vector and the L first weights.
In a possible implementation manner, the updated L first weights are L second weights, and the updated first attention vector is a second attention vector;
the input unit 1701 is further configured to input the training data into the neural network system to obtain fourth output data; the target convolutional layer calculates input data of the target convolutional layer through the second attention vector and L third weights to obtain input data of a next layer of the target convolutional layer; the L third weights are in one-to-one correspondence with the L second weights, and each third weight is obtained by writing the corresponding second weight into a circuit device in a neural network system and introducing actual noise of the circuit device in the neural network system;
the calculating unit 1702 is further configured to compare the fourth output data with the target output data, and calculate a second loss value;
the updating unit 1703 is further configured to update the second attention vector in the network parameter according to the second loss value.
In one possible implementation, the apparatus further includes: a first processing unit 1704, configured to, for an initial attention vector, set elements of the vector that are less than or equal to a preset threshold to zero, and keep elements of the vector that are greater than the preset threshold unchanged, to obtain the first attention vector.
In one possible implementation, the apparatus further includes: a first processing unit 1704, configured to process the initial attention vector based on an activation function; and, for the processed initial attention vector, set elements of the vector that are less than or equal to the preset threshold to zero, and keep elements of the vector that are greater than the preset threshold unchanged, to obtain the first attention vector.
In a possible implementation manner, when the first processing unit 1704 is configured to set elements of the vector that are less than or equal to the preset threshold to zero and keep elements of the vector that are greater than the preset threshold unchanged, to obtain the first attention vector, the first processing unit is specifically configured to:
sort the elements in the vector in descending order of value; determine the value of the N-th sorted element as the preset threshold, where N is determined based on a preset pruning rate, and the pruning rate represents the ratio of the number of elements in the vector that are set invalid to the total number of elements in the vector; and set elements of the vector that are less than or equal to the preset threshold to zero and keep elements greater than the preset threshold unchanged, to obtain the first attention vector.
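The threshold selection above can be sketched as follows; the rounding convention for N is an assumption, since it is not fixed here:

```python
import numpy as np

def prune_attention(vec, prune_rate):
    """Zero the smallest elements of an attention vector.

    The preset threshold is the value of the largest element that falls
    outside the kept set when elements are sorted in descending order;
    elements <= threshold are set to zero, larger elements are kept.
    """
    m = vec.size
    n_pruned = int(round(prune_rate * m))   # number of elements set invalid
    if n_pruned == 0:
        return vec.copy()
    desc = np.sort(vec)[::-1]               # descending order
    threshold = desc[m - n_pruned]          # largest of the pruned values
    return np.where(vec > threshold, vec, 0.0)

a = np.array([0.7, 0.1, 0.9, 0.3, 0.5, 0.2, 0.8, 0.4])
pruned = prune_attention(a, prune_rate=0.5)   # half the elements zeroed
```

With a pruning rate of 0.5 on eight distinct values, the four smallest elements are zeroed and the four largest survive unchanged.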
In one possible implementation, the apparatus further includes: a second processing unit 1705, configured to introduce a simulated read-write error of circuit devices in the neural network system into L initial weights to obtain the L first weights, where the L initial weights are in one-to-one correspondence with the L first weights. Optionally, the simulated read-write error follows a normal distribution, and the standard deviation of the normal distribution is associated with the magnitude of the read-write error of the circuit devices in the neural network system.
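The simulated read-write error can be sketched as zero-mean Gaussian noise; tying its standard deviation to the weight magnitude via a relative factor (i.e. a multiplicative noise model) is an assumption for illustration:

```python
import numpy as np

def inject_readwrite_noise(weights, rel_std, rng):
    """Simulate circuit-device read-write error on the initial weights.

    Each first weight is the corresponding initial weight perturbed by
    zero-mean normal noise; rel_std ties the standard deviation to the
    device error magnitude (assumed multiplicative model).
    """
    noise = rng.normal(loc=0.0, scale=rel_std, size=weights.shape)
    return weights * (1.0 + noise)

rng = np.random.default_rng(3)
w0 = np.full((64, 64), 0.5)                       # L initial weights
w1 = inject_readwrite_noise(w0, rel_std=0.02, rng=rng)  # L first weights
```

Training with these perturbed weights exposes the network to the noise statistics it will meet on the device, which is the purpose of the second processing unit.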
In a possible implementation manner, the updating unit 1703 is specifically configured to: updating the initial attention vector and L initial weights according to the first loss value; updating the L first weights based on the updated L initial weights; updating the first attention vector based on the updated initial attention vector.
It should be noted that, for the functions of each functional unit in the neural network training device 17 described in the embodiment of the present application, reference may be made to the description of each step in the above embodiment of the neural network training method, and details are not repeated herein.
Referring to fig. 18, fig. 18 is a schematic structural diagram of a neural network computing device 18 according to an embodiment of the present application, where the neural network computing device 18 is applied to a neural network system including at least one convolutional layer, and may include an obtaining unit 1801 and a computing unit 1802, where details of each unit are described below.
An obtaining unit 1801, configured to obtain input data of a target convolutional layer, where the target convolutional layer is any one of the at least one convolutional layer;
a calculating unit 1802, configured to perform convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, where the first output data includes N first feature maps, and N is an integer greater than or equal to 1;
the computing unit 1802 is further configured to compute the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, where the second output data includes N second feature maps robust to noise, the attention vector includes N elements, and each element of the N elements is used to represent the robustness of the corresponding first feature map to noise.
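The per-element correction, where each of the N elements scales its corresponding first feature map, can be sketched as follows (the shapes and attention values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

first_output = rng.normal(size=(5, 8, 8))          # N=5 first feature maps, 8x8 each
attention = np.array([1.0, 0.2, 0.8, 0.0, 0.6])    # one element per feature map

# Each attention element weights the corresponding first feature map,
# attenuating channels whose features are fragile under circuit noise.
second_output = first_output * attention[:, None, None]
```

A zero element suppresses its feature map entirely, while an element of 1.0 passes it through unchanged.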
In one possible implementation, the target convolutional layer includes M channels, and the apparatus further includes a determining unit 1803.
A determining unit 1803, configured to determine, from the M channels based on the attention vector, N channels for performing the convolution calculation, where the N channels correspond to elements of the attention vector whose element values are greater than or equal to a preset threshold, and M is an integer greater than N.
In one possible implementation, the apparatus further includes a training unit 1804.
A training unit 1804, configured to adjust initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system, where the adjustment process combines at least one of simulated noise and actual noise of circuit devices of the neural network system, and the trained network parameters include weights of the target convolutional layers and attention vectors corresponding to the target convolutional layers.
In a possible implementation manner, when the training unit 1804 is configured to adjust an initial network parameter of the neural network system based on training data to obtain a trained network parameter of the neural network system, the training unit is specifically configured to:
inputting the training data into the neural network system to obtain third output data; calculating a loss value based on the third output data and target output data; and updating the initial network parameters of the neural network system according to the loss value.
Wherein the weights are L updated first weights obtained according to the neural network training method corresponding to fig. 8 to 11, and the attention vector is determined based on the updated first attention vector obtained according to the neural network training method corresponding to fig. 8 to 11.
It should be noted that, for the functions of each functional unit in the neural network computing device 18 described in the embodiment of the present application, reference may be made to the description of each step in the above neural network computing method embodiment, and details are not repeated herein.
Fig. 19 is a schematic structural diagram of another neural network training device according to an embodiment of the present disclosure, and as shown in fig. 19, the neural network training device 19 includes at least one processor 191, at least one memory 192, and at least one communication interface 193. In addition, the device may also include common components such as an antenna, which will not be described in detail herein.
The processor 191 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control the execution of programs according to the above schemes.
The communication interface 193 is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), a core network, a wireless local area network (WLAN), etc.
The memory 192 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be coupled to the processor via a bus, or may be integrated with the processor.
The memory 192 is used for storing application program codes for executing the above schemes, and is controlled by the processor 191 to execute. The processor 191 is configured to execute application program code stored in the memory 192.
The code stored in the memory 192 may perform the neural network training method provided in fig. 8 to 11, for example: inputting training data into the neural network system to obtain third output data; the target convolutional layer is any one of the at least one convolutional layer, and the target convolutional layer is used for calculating input data of the target convolutional layer through a first attention vector and L first weights, so as to obtain input data of a next layer of the target convolutional layer; the L first weights are used for convolution calculation to extract feature data in the input data of the target convolutional layer, the first attention vector is used for correcting the feature data, and L is an integer greater than or equal to 1; comparing the third output data with target output data to calculate a first loss value; and updating network parameters of the neural network system according to the first loss value, where the network parameters include the first attention vector and the L first weights.
It should be noted that, for the functions of each functional unit in the neural network training device described in the embodiment of the present application, reference may be made to the description of each step in the embodiment of the neural network training method described above, and details are not repeated here.
Fig. 20 is a schematic structural diagram of another neural network computing device provided in an embodiment of the present application, and as shown in fig. 20, the neural network computing device 20 may include a first storage unit 2001, a second storage unit 2002, a third storage unit 2003, and a processing unit 2004, where details of each unit are described below.
A first storage unit 2001, configured to store input data of a neural network system, where the neural network system includes at least one convolutional layer, and a target convolutional layer is configured to calculate input data of the target convolutional layer through a second attention vector and L second weights, so as to obtain input data of a next layer of the target convolutional layer; wherein the L second weights are used for convolution calculation to extract feature data in the input data of the target convolutional layer, the second attention vector is used for correcting the feature data, the network parameters include the second attention vector and the L second weights, and the target convolutional layer is any one of the at least one convolutional layer.
A second storage unit 2002 for storing the L second weights.
A third storage unit 2003 for storing the second attention vector.
The processing unit 2004 is configured to obtain the input data and a network parameter, and calculate output data of the neural network calculation apparatus based on the input data and the network parameter, where the network parameter includes the second attention vector and the L second weights.
Wherein the L second weights are the L updated first weights obtained according to the neural network training method corresponding to fig. 8 to 11, and the second attention vector is the first updated attention vector obtained according to the neural network training method corresponding to fig. 8 to 11.
In one possible implementation, the apparatus further includes a switch controller 2005 and M switches (first switch 2006-1, second switch 2006-2, … …, mth switch 2006-M), the second storage unit includes M storage subunits (first storage subunit 2002-1, second storage subunit 2002-2, … …, mth storage subunit 2002-M), the M storage subunits are connected to the M switches one by one, the switch controller 2005 is connected to the M switches, and the switch controller 2005 is connected to the third storage unit 2003.
The L second weights comprise M weight sets, and the union of the M weight sets is the L second weights; wherein, a weight set is used for convolution calculation to extract a feature map of the input data of the target convolution layer, each of the M storage subunits is used for storing a weight set, the second attention vector comprises M elements, and the M elements, the M switches and the M weight sets are in one-to-one correspondence;
the switch controller 2005 is configured to, when the value of a target element in the second attention vector is greater than or equal to a preset threshold, control the switch corresponding to the target element to be turned on, so that the processing unit obtains the weight set corresponding to that target element, and to control the switch corresponding to the target element to be turned off when the value of the target element is less than the preset threshold, where the target element is any element of the second attention vector;
the processing unit 2004, further configured to: determining N weight sets based on the values of the M elements and the M weight sets, wherein the N weight sets are formed by weight sets corresponding to elements of which the element values in the second attention vector are greater than or equal to a preset threshold value in the M weight sets; determining a third attention vector according to the values of the M elements and the second attention vector, wherein the third attention vector is composed of elements of which the element values are greater than or equal to a preset threshold value in the second attention vector; wherein the calculating, by the target convolutional layer, input data of the target convolutional layer through the second attention vector and the L second weights to obtain input data of a next layer of the target convolutional layer comprises: the target convolutional layer is used for calculating input data of the target convolutional layer through the third attention vector and the N weight sets to obtain input data of a next layer of the target convolutional layer.
The processing unit 2004 may be a processor, such as a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the above schemes.
The first storage unit 2001, the second storage unit 2002, and the third storage unit 2003 may be memories, such as a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
Optionally, the neural network computing device in this embodiment of the present application may further include a communication interface, configured to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), a core network, a wireless local area network (WLAN), and the like.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the above division of the units is only one type of division of logical functions, and other divisions may be used in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solutions of the present application, or the part thereof contributing to the prior art, may be embodied in whole or in part in the form of a software product stored in a storage medium, including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, and may specifically be a processor in the computer device) to execute all or part of the steps of the above-mentioned methods of the embodiments of the present application. The storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), and the like.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (12)

1. A neural network computing method, performed by a neural network system including at least one convolutional layer, the method comprising:
acquiring input data of a target convolutional layer, wherein the target convolutional layer is any one of the at least one convolutional layer;
performing convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, wherein the first output data comprises N first feature maps, and N is an integer greater than or equal to 1;
and calculating the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, wherein the second output data comprises N second feature maps robust to noise, the attention vector comprises N elements, and each element in the N elements is used for representing the robustness of the corresponding first feature map to the noise.
2. The method of claim 1, wherein the target convolutional layer comprises M channels, the method further comprising:
determining N channels for performing the convolution calculation from the M channels based on the attention vector, wherein the N channels correspond to elements of the attention vector whose element values are greater than or equal to a preset threshold, and M is an integer greater than N.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
adjusting initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system, wherein the adjustment process combines at least one of simulated noise and actual noise of circuit devices of the neural network system, and the trained network parameters include weights of the target convolutional layer and attention vectors corresponding to the target convolutional layer.
4. The method of claim 3, wherein the adjusting initial network parameters of the neural network system based on the training data to obtain trained network parameters of the neural network system comprises:
inputting the training data into the neural network system to obtain third output data;
calculating based on the third output data and the target output data to obtain a loss value;
and updating the initial network parameters of the neural network system according to the loss value.
5. A neural network computing device, applied to a neural network system including at least one convolutional layer, the device comprising:
an obtaining unit, configured to obtain input data of a target convolutional layer, where the target convolutional layer is any one of the at least one convolutional layer;
a calculating unit, configured to perform convolution calculation on the input data based on the weight of the target convolution layer to obtain first output data, where the first output data includes N first feature maps, and N is an integer greater than or equal to 1;
the calculation unit is further configured to calculate the first output data based on an attention vector corresponding to the target convolutional layer to obtain second output data, where the second output data includes N second feature maps robust to noise, the attention vector includes N elements, and each element in the N elements is used to represent robustness of the corresponding first feature map to noise.
6. The apparatus of claim 5, wherein the target convolutional layer comprises M channels, the apparatus further comprising:
a determining unit configured to determine, from the M channels based on the attention vector, N channels for performing the convolution calculation, wherein the N channels correspond to elements of the attention vector whose element values are greater than or equal to a preset threshold, and M is an integer greater than N.
7. The apparatus of claim 5 or 6, further comprising:
a training unit, configured to adjust an initial network parameter of the neural network system based on training data to obtain a trained network parameter of the neural network system, where the adjustment process combines at least one of analog noise and actual noise of a circuit device of the neural network system, and the trained network parameter includes a weight of the target convolutional layer and an attention vector corresponding to the target convolutional layer.
8. The apparatus according to claim 7, wherein the training unit, when configured to adjust initial network parameters of the neural network system based on training data to obtain trained network parameters of the neural network system, is specifically configured to:
inputting the training data into the neural network system to obtain third output data; calculating based on the third output data and the target output data to obtain a loss value; and updating initial network parameters of the neural network according to the loss value.
9. A neural network computing device comprising a processor, a memory and a communication interface, wherein the memory is configured to store neural network computing program code and the processor is configured to invoke the neural network computing program code to perform the method of any one of claims 1 to 4.
10. A chip system, comprising at least one processor, at least one memory, and an interface circuit, the at least one memory, the interface circuit, and the at least one processor being interconnected by a line, the at least one memory having instructions stored therein; when the instructions are executed by the at least one processor, the neural network computing method of any one of claims 1 to 4 is performed.
11. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed by a processor, implements the neural network computing method of any one of claims 1 to 4.
12. A computer program product, characterized in that it comprises instructions which, when executed by a computer, cause the computer to carry out the neural network computing method of any one of the preceding claims 1 to 4.
CN202011432705.2A 2020-12-09 2020-12-09 Neural network computing method and related equipment Pending CN114626500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011432705.2A CN114626500A (en) 2020-12-09 2020-12-09 Neural network computing method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011432705.2A CN114626500A (en) 2020-12-09 2020-12-09 Neural network computing method and related equipment

Publications (1)

Publication Number Publication Date
CN114626500A true CN114626500A (en) 2022-06-14

Family

ID=81895062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011432705.2A Pending CN114626500A (en) 2020-12-09 2020-12-09 Neural network computing method and related equipment

Country Status (1)

Country Link
CN (1) CN114626500A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115113814A (en) * 2022-06-21 2022-09-27 腾讯科技(深圳)有限公司 Neural network model online method and related device
CN115113814B (en) * 2022-06-21 2024-02-02 腾讯科技(深圳)有限公司 Neural network model online method and related device
CN115169530A (en) * 2022-06-29 2022-10-11 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and readable storage medium
CN115169530B (en) * 2022-06-29 2023-09-26 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
Borovykh et al. Dilated convolutional neural networks for time series forecasting
CN111667399B (en) Training method of style migration model, video style migration method and device
CN112183713A (en) Neural network device and method for operating a neural network
KR20160034814A (en) Client device with neural network and system including the same
CN113705769A (en) Neural network training method and device
CN112288086A (en) Neural network training method and device and computer equipment
WO2007027452A1 (en) Training convolutional neural networks on graphics processing units
KR20180060257A (en) Metohd and apparatus for object recognition
CN111325318B (en) Neural network training method, neural network training device and electronic equipment
KR20170038622A (en) Device and method to segment object from image
CN107958285A (en) The mapping method and device of the neutral net of embedded system
Kang et al. Random forest with learned representations for semantic segmentation
CN112288011A (en) Image matching method based on self-attention deep neural network
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN114595799A (en) Model training method and device
CN114626500A (en) Neural network computing method and related equipment
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN113065632A (en) Method and apparatus for validating training of neural networks for image recognition
CN113627163A (en) Attention model, feature extraction method and related device
KR102236582B1 (en) Image processing apparatus and operating method for the same
Skočaj et al. Incremental and robust learning of subspace representations
JP2022008236A (en) Neuromorphic device, and method for implementing neural network
CN117237756A (en) Method for training target segmentation model, target segmentation method and related device
US20220284545A1 (en) Image processing device and operating method thereof
CN113313133A (en) Training method for generating countermeasure network and animation image generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination