WO2021035598A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2021035598A1
Authority
WO
WIPO (PCT)
Prior art keywords
bits, data, floating point, neural network
Application number
PCT/CN2019/103198
Other languages
French (fr)
Chinese (zh)
Inventor
周爱春
余俊峰
聂谷洪
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/103198 (WO2021035598A1)
Priority to CN201980033583.9A (CN112189216A)
Publication of WO2021035598A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • This application relates to the field of computer technology, and in particular to a data processing method and device.
  • With the development of artificial intelligence technology, neural networks (Neural Network, NN) are more and more widely used. A neural network is a complex network system formed by the interconnection of a large number of simple processing units (also called neurons).
  • In the prior art, a neural network may include multiple computing nodes, and each computing node may include multiple neurons. The parameters and activation data of the computing nodes in the neural network are stored in a storage device, and the computing device needs to perform a large amount of data interaction with the storage device while running the neural network.
  • Generally, in order to improve the computational efficiency of the neural network on the computing device, the parameters and activation data of the neural network are all fixed-point data represented by a certain number of bits, where that number of bits is the number of bits of the operand of the computing device, such as 8 bits. As a result, the neural network occupies a relatively large bandwidth between the computing device and the storage device.
  • The embodiments of the present application provide a data processing method and device to solve the problem that, in the prior art, a neural network occupies a relatively large bandwidth between a computing device and a storage device.
  • In a first aspect, an embodiment of the present application provides a data processing method applied to a computing device, including:
  • reading, from a storage device, target data of a computing node in a neural network, where the target data is fixed-point data represented by a first number of bits, the number of bits occupied by the target data in the storage device is the first number of bits, the first number of bits is smaller than a second number of bits, and the second number of bits is the number of bits of an operand of the computing device;
  • computing, according to the target data, the output data of the computing node.
  • In a second aspect, an embodiment of the present application provides a data processing device, including a processor and a memory, where the memory is used to store program code, and the processor calls the program code which, when executed, is used to perform the following operations:
  • reading, from a storage device, target data of a computing node in a neural network, where the target data is fixed-point data represented by a first number of bits, the number of bits occupied by the target data in the storage device is the first number of bits, the first number of bits is smaller than a second number of bits, and the second number of bits is the number of bits of an operand of the computing device;
  • computing, according to the target data, the output data of the computing node.
  • In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program includes at least one piece of code that can be executed by a computer to control the computer to execute the method described in any one of the above first aspects.
  • In a fourth aspect, an embodiment of the present application provides a computer program which, when executed by a computer, is used to implement the method described in any one of the above first aspects.
  • The embodiments of the present application provide a data processing method and device. Target data of a computing node in a neural network is read from a storage device, where the target data is fixed-point data and the number of bits it occupies in the storage device is less than the second number of bits of the operand of the computing device, and the output data of the computing node is computed according to the target data. Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device is ensured. Since the number of bits occupied by the target data in the storage device is the first number of bits, which is less than the second number of bits of the operand of the computing device, the amount of neural-network data exchanged between the computing device and the storage device can be reduced, so the bandwidth occupied by the neural network between the computing device and the storage device is reduced on the basis of ensuring the computational efficiency of the neural network on the computing device. In addition, by lowering the bandwidth requirement between the computing device and the storage device, costs can also be reduced.
  • FIG. 1 is a schematic diagram of an application scenario of a data processing method provided by an embodiment of the application;
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the application;
  • FIG. 3 is a schematic flowchart of a data processing method provided by another embodiment of the application;
  • FIG. 4A is a schematic diagram of the storage of target data before expansion provided by an embodiment of the application;
  • FIG. 4B is a schematic diagram of the storage of expanded target data provided by an embodiment of the application;
  • FIG. 5 is a schematic flowchart of a data processing method provided by yet another embodiment of the application;
  • FIG. 6A is a schematic diagram of the storage of output data before compression provided by an embodiment of the application;
  • FIG. 6B is a schematic diagram of the storage of compressed output data provided by an embodiment of the application;
  • FIG. 7 is a schematic diagram of the relationship between expansion/compression and the calculation process provided by an embodiment of the application;
  • FIG. 8 is a schematic diagram of the training process of a computing node provided by an embodiment of the application;
  • FIG. 9 is a schematic diagram of the prediction process of a computing node provided by an embodiment of the application;
  • FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of the application.
  • The data processing method provided in the embodiments of the present application can be applied to any scenario that requires data processing through a neural network, and the method may specifically be executed by a computing device.
  • A schematic diagram of the application scenario of the data processing method may be as shown in FIG. 1. Specifically, the computing device 11 is communicatively connected with the server 12; the computing device 11 can obtain the neural network from the server 12 and perform data processing using the obtained neural network.
  • The data processing may specifically be any type of processing that can be completed through a neural network.
  • The computing device 11 may specifically be a device capable of completing computing functions. Exemplarily, the computing device includes but is not limited to one or more of the following: a central processing unit (CPU), an advanced RISC machine (ARM) processor, a digital signal processor (DSP), and a graphics processing unit (GPU).
  • It should be noted that this application does not limit the specific manner in which the computing device 11 and the server 12 are communicatively connected; for example, a wireless communication connection may be realized based on a Bluetooth interface, or a wired communication connection may be realized based on an RS232 interface.
  • It should be noted that FIG. 1 takes the case where the computing device obtains the neural network from a server as an example. Alternatively, the computing device can obtain the neural network in other ways; exemplarily, the computing device can obtain the neural network from other devices, or can obtain the neural network through its own training.
  • In the data processing method provided by the embodiments of the present application, the computing device reads target data of a computing node in a neural network from a storage device, where the number of bits occupied by one target data in the storage device is less than the second number of bits of the operand of the computing device. Compared with the prior art, in which the target data of a computing node is fixed-point data represented by the second number of bits and the number of bits occupied by one target data in the storage device equals the second number of bits, this reduces the amount of neural-network data exchanged between the computing device and the storage device, and thus reduces the bandwidth occupied by the neural network between the computing device and the storage device.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the application. The execution subject of this embodiment may be a computing device, and specifically may be a processor of the computing device. As shown in FIG. 2, the method of this embodiment may include:
  • Step 201: Read the target data of the computing node in the neural network from the storage device.
  • In this step, the storage device may specifically be a device that can be accessed by the computing device and that implements a data storage function. Exemplarily, the storage device includes but is not limited to one or more of the following: memory, a solid-state drive, a mechanical hard disk, and a floppy disk.
  • A neural network can be composed of multiple computing nodes. Each computing node can fetch activation data from the storage device, perform calculations, and then store the calculation results in the storage device as the input data (i.e., the activation data) of the next-level computing node. Taking a convolutional neural network (Convolutional Neural Networks, CNN) as an example, one convolutional block (Conv Block) can correspond to one computing node. It should be noted that the computing nodes in this embodiment may be all of the computing nodes in the neural network, or some of them.
  • The target data is fixed-point data represented by a first number of bits, the number of bits occupied by one target data in the storage device is the first number of bits, the first number of bits is less than a second number of bits, and the second number of bits is the number of bits of the operand of the computing device.
  • Fixed-point data means that the position of the decimal point in a number is fixed; there are usually fixed-point integers and fixed-point decimals. After the position of the decimal point is chosen, all numbers in an operation are uniformly fixed-point integers or fixed-point decimals, and fractional alignment no longer needs to be considered during the operation, which improves computational efficiency.
  • Exemplarily, the second number of bits may be 16 or 8. Optionally, when the numbers of bits all follow the rule of being powers of two and the second number of bits is 16, the first number of bits may be 8 or 4; when the second number of bits is 8, the first number of bits may be 4.
  • Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device is ensured. Since the number of bits occupied by one target data in the storage device is the first number of bits, which is less than the number of bits of the operand of the computing device (i.e., the second number of bits), the amount of neural-network data exchanged between the computing device and the storage device is reduced compared with the prior art, in which the target data of a computing node is fixed-point data represented by the second number of bits and one target data occupies the second number of bits in the storage device.
  • It should be noted that the target data may specifically be one or more types of any data that needs to be read from the storage device during the calculation process of the neural network. Exemplarily, the target data includes weight parameters and/or activation data, where the weight parameters are the network parameters of the neural network.
  • Step 202: Calculate the output data of the computing node according to the target data.
  • In this step, the specific calculation performed according to the target data may include a preset calculation related to the type of neural network. Exemplarily, when the neural network is a convolutional neural network, the preset calculation may include a convolution calculation; specifically, a convolution operation may be performed on the activation data and the network parameters, as in the sketch below.
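  • For illustration only, the following is a minimal sketch in C of such a preset calculation: a 1-D integer convolution in which the fixed-point operands are held in 8-bit variables (after the bit expansion described in the embodiment of FIG. 3) and accumulated in 32 bits to avoid overflow. The function name, shapes, and data layout are assumptions of the sketch, not part of this application.

      #include <stdint.h>
      #include <stddef.h>

      /* 1-D convolution core: 4-bit fixed-point operands held in int8_t,
       * 32-bit accumulation; the accumulators in `out` are later requantized
       * to the first number of bits (see the embodiment of FIG. 5). */
      static void conv1d_int(const int8_t *act, size_t n,
                             const int8_t *weight, size_t k,
                             int32_t *out)
      {
          for (size_t i = 0; i + k <= n; ++i) {
              int32_t acc = 0;
              for (size_t j = 0; j < k; ++j)
                  acc += (int32_t)act[i + j] * (int32_t)weight[j];
              out[i] = acc;
          }
      }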
  • It should be noted that the output data of the computing node is the activation data of the next-level computing node of that computing node.
  • In this embodiment, the target data of a computing node in the neural network is read from the storage device, where the target data is fixed-point data and the number of bits it occupies in the storage device is less than the second number of bits of the operand of the computing device, and the output data of the computing node is computed according to the target data. Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device is ensured. Since the number of bits occupied by the target data in the storage device is the first number of bits, which is less than the second number of bits of the operand of the computing device, the amount of neural-network data exchanged between the computing device and the storage device can be reduced, so the bandwidth occupied by the neural network between the computing device and the storage device is reduced on the basis of ensuring the computational efficiency of the neural network on the computing device. In addition, by lowering the bandwidth requirement between the computing device and the storage device, costs can also be reduced.
  • FIG. 3 is a schematic flowchart of a data processing method provided by another embodiment of the application. On the basis of the embodiment shown in FIG. 2, this embodiment mainly describes an optional implementation of how to calculate the output data of the computing node based on the target data. As shown in FIG. 3, the method of this embodiment may include:
  • Step 301: Read the target data of the computing node in the neural network from the storage device.
  • Since the number of bits occupied by one target data in the storage device is the first number of bits, the number of bits occupied by one target data read from the storage device in step 301 is also the first number of bits.
  • Step 302: Expand the number of bits occupied by the target data from the first number of bits to the second number of bits to obtain the expanded target data.
  • Since the number of bits of the operand of the computing device is the second number of bits, the number of bits occupied by the target data can be expanded from the first number of bits to the second number of bits through step 302 to obtain the expanded target data.
  • Exemplarily, a combination of the AND instruction ("and"), the shift instruction ("shift"), the add instruction ("add"), and the like can be used to expand the number of bits occupied by the target data from the first number of bits to the second number of bits.
  • Optionally, step 302 may specifically include: adding a total of N padding bits to the head and/or tail of the first number of bits occupied by the target data to obtain the expanded target data, where N is the difference between the second number of bits and the first number of bits.
  • For example, when the first number of bits is equal to 4, the second number of bits is equal to 8, and the N padding bits are added to the head of the first number of bits occupied by the target data, the effective bits in the expanded target data are located in the lower 4 bits; when the N padding bits are added to the tail, the effective bits in the expanded target data are located in the upper 4 bits.
  • The padding bits can be understood as invalid bits; the bits other than the padding bits in the expanded target data can be understood as valid bits, and the valid bits are the consecutive first number of bits.
  • It should be noted that the padding bits can be flexibly designed according to requirements. Exemplarily, taking the first number of bits as 4 and the second number of bits as 8, and assuming that the expanded target data needs to be used as a signed 8-bit number and that the computing device uses two's complement to represent signed numbers, the effective bits can be placed in the lower 4 bits and, for a negative number, the upper 4 bits can be filled with 1s; that is, the padding bits can be 1. A sketch of such an unpack step is given below.
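  • For illustration only, the following is a minimal sketch in C of such an unpack step. It assumes two 4-bit values are packed per byte in the storage device (low nibble first), and it places the valid bits in the lower 4 bits with the padding bits replicating the sign bit, matching the two's-complement example above; the packing layout and function name are assumptions of the sketch.

      #include <stdint.h>
      #include <stddef.h>

      /* Expand 4-bit signed values (two per byte in storage) into 8-bit
       * operands, using only AND-, shift-, and add-class operations as
       * suggested in the text. */
      static void unpack4to8(const uint8_t *packed, int8_t *expanded, size_t n_values)
      {
          for (size_t i = 0; i < n_values; ++i) {
              uint8_t byte = packed[i / 2];
              /* select the 4 valid bits: low nibble for even i, high nibble for odd i */
              uint8_t nib = (i % 2 == 0) ? (byte & 0x0F) : ((byte >> 4) & 0x0F);
              /* sign-extend the 4-bit two's-complement value: for a negative
               * value this fills the upper 4 padding bits with 1s */
              int v = (int)(nib ^ 0x8) - 8;
              expanded[i] = (int8_t)v;
          }
      }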
  • The unpack process can be, for example, as shown in FIG. 4A and FIG. 4B. Assuming that before expansion each data occupies 4 bits, the storage address relationship of data A1 to F1 in the computing device can be as shown in FIG. 4A; after expansion, each data occupies a total of 8 bits, and the storage address relationship of the expanded data in the computing device can be as shown in FIG. 4B.
  • Step 303: Calculate the output data of the computing node according to the expanded target data.
  • In this step, the specific calculation performed on the expanded target data may include a preset calculation related to the type of neural network. Exemplarily, when the neural network is a convolutional neural network, the preset calculation may include a convolution calculation; specifically, a convolution operation may be performed on the activation data and the network parameters.
  • In this embodiment, by expanding the number of bits occupied by the target data from the first number of bits to the second number of bits and calculating the output data of the computing node according to the expanded target data, the computing device can compute the output data of the computing node based on target data whose number of bits occupied in the storage device is less than the second number of bits. This solves the problem that the inconsistency between the number of bits occupied by the target data in the storage device and the number of bits of the operand of the computing device would otherwise prevent the neural network from being deployed on the computing device.
  • FIG. 5 is a schematic flowchart of a data processing method provided by yet another embodiment of the application. On the basis of the embodiment shown in FIG. 2, this embodiment mainly describes another optional implementation of calculating the output data of the computing node based on the target data. As shown in FIG. 5, the method of this embodiment may include:
  • Step 501: Read the target data of the computing node in the neural network from the storage device.
  • It should be noted that step 501 is similar to steps 201 and 301, and will not be repeated here.
  • Step 502: Perform a preset calculation according to the target data to obtain a calculation result.
  • Optionally, step 502 may specifically include: performing the preset calculation according to the expanded target data to obtain the calculation result. Exemplarily, when the preset calculation is a convolution operation, the calculation result may specifically be a convolution calculation result.
  • Step 503: Quantize the calculation result using the first number of bits to obtain the output data of the computing node.
  • After step 503, the output data is fixed-point data represented by the first number of bits, while the number of bits it occupies in the computing device is the second number of bits. For example, when the first number of bits is 4, the output data of the computing node obtained by quantizing the calculation result with the first number of bits may be 0b1001.
  • It should be noted that step 503 may alternatively be replaced by quantizing the calculation result using the second number of bits to obtain the output data of the computing node. A sketch of the 4-bit requantization of step 503 is given below.
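  • For illustration only, the following is a minimal sketch in C of quantizing a calculation result with the first number of bits (4 here). The application does not spell out the quantization function for the calculation result, so the sketch assumes uniform quantization with a precomputed floating-point scale and saturation to the signed 4-bit range.

      #include <stdint.h>

      /* Requantize a 32-bit convolution accumulator to a signed 4-bit value
       * held in an 8-bit operand; `scale` is assumed to be precomputed from
       * the quantization parameters of the inputs and outputs. */
      static int8_t requantize_to_4bit(int32_t acc, float scale)
      {
          float scaled = (float)acc * scale;
          int32_t q = (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
          if (q >  7) q =  7;   /* saturate to the 4-bit range [-8, 7] */
          if (q < -8) q = -8;
          return (int8_t)q;     /* e.g. -7 corresponds to the 4-bit pattern 0b1001 */
      }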
  • Step 504: Compress the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain the compressed output data.
  • Optionally, step 504 may specifically include: selecting the consecutive first-number-of-bits valid bits from the second number of bits occupied by the output data as the compressed output data, where the invalid bits can be discarded.
  • The pack process corresponds to the unpack process. For example, assuming that before compression each data occupies 8 bits, the storage address relationship of data G to L in the computing device can be as shown in FIG. 6A; after compression, each data occupies a total of 4 bits, and the storage address relationship of the compressed data in the computing device can be as shown in FIG. 6B.
  • Exemplarily, a combination of the AND instruction ("and"), the shift instruction ("shift"), the add instruction ("add"), and the like can be used to compress the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain the compressed output data, as in the sketch below.
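  • For illustration only, the following is a minimal sketch in C of such a pack step, the inverse of the unpack sketch above and under the same assumed layout (two 4-bit values per byte, low nibble first); the invalid padding bits are simply discarded.

      #include <stdint.h>
      #include <stddef.h>

      /* Compress 8-bit output values whose valid bits are the low 4 bits
       * back to 4 bits, two per byte in storage. */
      static void pack8to4(const int8_t *expanded, uint8_t *packed, size_t n_values)
      {
          for (size_t i = 0; i < n_values; ++i) {
              uint8_t nib = (uint8_t)expanded[i] & 0x0F;   /* keep the 4 valid bits */
              if (i % 2 == 0)
                  packed[i / 2] = nib;                     /* start a byte: low nibble */
              else
                  packed[i / 2] |= (uint8_t)(nib << 4);    /* fill the high nibble */
          }
      }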
  • Step 505: Write the compressed output data to the storage device.
  • Through step 505, the compressed output data is stored in the storage device; that is, the number of bits occupied by one compressed output data in the storage device is the first number of bits.
  • In this embodiment, a preset calculation is performed according to the target data to obtain a calculation result, the calculation result is quantized using the first number of bits to obtain the output data of the computing node, the number of bits occupied by the output data is compressed from the second number of bits to the first number of bits to obtain the compressed output data, and the compressed output data is written to the storage device. This can reduce the amount of data involved when the computing device subsequently reads the output data of the computing node from the storage device.
  • It should be noted that the unpack process in step 302 above can be replaced by the data migration process in which the target data is moved from the storage device to the computing device; that is, the number of bits occupied by the target data can be expanded from the first number of bits to the second number of bits during the process of reading the target data from the storage device in step 201.
  • Similarly, the pack process in step 504 can be replaced by the data migration process in which the computing device writes the output data to the storage device; that is, steps 504 and 505 can be replaced by compressing the number of bits occupied by the output data obtained in step 503 from the second number of bits to the first number of bits while writing it to the storage device.
  • Exemplarily, direct memory access (Direct Memory Access, DMA) transmission can be used to implement the pack process and the unpack process in the course of the data migration.
  • It should be noted that the foregoing process of calculating the output data of a computing node based on the target data of the computing node may be regarded as the calculation process of the computing node. For example, the calculation process may include step 502 and step 503.
  • As shown in FIG. 7, the unpack process can be performed before calculation process k, so that the number of bits occupied by the input data i[k], which occupies 4 bits in the storage device, is expanded to 8 bits; after calculation process k, the pack process can be performed, so that the number of bits occupied by the 8-bit output data o[k] produced by calculation process k is compressed to 4 bits, and o[k] thus occupies 4 bits when stored in the storage device.
  • In some embodiments, the neural network can be trained according to a specific training strategy. Optionally, the training strategy may include a first training strategy. The first training strategy includes: storing floating-point weight parameters during the training process, and quantizing the floating-point weight parameters using the first number of bits before performing calculations according to the floating-point weight parameters of the computing node.
  • The foregoing process of calculating the output data of a computing node based on the activation data and weight parameters of the computing node can be regarded as the calculation process of the computing node. Since the data type used in the calculation process of a computing node during prediction is consistent with the data type used in the calculation process of that computing node during training, both being fixed-point data represented by the first number of bits, and since the accuracy of fixed-point data is greatly affected by its number of bits, storing floating-point weight parameters during the training process and using them to simulate the fixed-point weight parameters based on the first number of bits can improve the accuracy of parameter learning in the neural network, and thus the accuracy of the neural network, compared with storing fixed-point weight parameters represented by the first number of bits.
  • Optionally, quantizing the floating-point weight parameters using the first number of bits may specifically include the following steps A1 and A2.
  • Step A1: Transform the floating-point weight parameter to a first preset range to obtain the transformed floating-point weight parameter, where the minimum value of the first preset range is greater than or equal to 0.
  • Step A2: Quantize the transformed floating-point weight parameter using the first number of bits.
  • Step A1 can specifically transform the floating-point weight parameter from the real-number range to a non-negative range. Exemplarily, the first preset range is 0 to 1.
  • Step A2 can specifically implement fixed-point quantization, with the first number of bits, of the floating-point weight parameter transformed to the first preset range, so as to obtain the fixed-point weight parameter represented by the first number of bits. Further, the computing node can perform calculations according to the fixed-point weight parameter represented by the first number of bits.
  • Optionally, step A1 may specifically include the following steps A11 and A12.
  • Step A11: Perform a nonlinear transformation on the floating-point weight parameter to transform it to a second preset range, where the minimum value of the second preset range is less than 0.
  • Step A12: Perform scaling and shifting (scale & shift) on the nonlinearly transformed floating-point weight parameter, so as to transform it to the first preset range.
  • The specific manner of the nonlinear transformation in step A11 is not limited in this application; exemplarily, the nonlinear transformation may be realized through a tangent function or a sigmoid function. Exemplarily, the second preset range is -1 to 1.
  • The purpose of further scaling and shifting the nonlinearly transformed floating-point weight parameter in step A12 is to make the value range of the transformed floating-point weight parameter equal to the first preset range; the specific manner of the transformation is not limited in this application.
  • Optionally, in step A2, symmetric uniform quantization may be performed on the transformed floating-point weight parameter using the first number of bits. Although the combined effect of asymmetric and non-uniform quantization can be better than that of symmetric uniform quantization, symmetric uniform quantization is highly efficient to implement in hardware, so using symmetric uniform quantization in step A2 ensures hardware efficiency.
  • Optionally, the foregoing operations for quantizing the floating-point weight parameter using the first number of bits are differentiable, so that the network parameters of the computing node can be solved iteratively by the gradient descent method. A sketch of such a weight quantization is given below.
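  • For illustration only, the following is a minimal sketch in C of steps A1 and A2, assuming tanh as the nonlinear transformation (the text names the tangent and sigmoid functions as examples), (t + 1)/2 as the scale and shift from [-1, 1] to [0, 1], and k-bit uniform quantization with 2^k - 1 steps; these concrete functions are illustrative assumptions. The result stays in floating point during training; in practice, the round operation is commonly handled with a straight-through estimator so that the mapping remains usable with gradient descent.

      #include <math.h>

      /* Simulated (fake) quantization of one floating-point weight with k bits. */
      static float quantize_weight(float w, int k)
      {
          float t = tanhf(w);                     /* A11: nonlinear map into (-1, 1) */
          float s = 0.5f * t + 0.5f;              /* A12: scale and shift into [0, 1] */
          float levels = (float)((1 << k) - 1);   /* 2^k - 1 uniform steps */
          float q = roundf(s * levels) / levels;  /* A2: symmetric uniform quantization */
          return 2.0f * q - 1.0f;                 /* map back to [-1, 1]; a common
                                                     convention, not stated in the text */
      }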
  • Optionally, the output data is fixed-point data represented by the first number of bits, and the number of bits occupied by the output data is the second number of bits. Optionally, when the target data includes activation data, the training strategy includes a second training strategy. The second training strategy includes: storing floating-point activation data during the training process, and quantizing the floating-point activation data using the first number of bits before performing calculations based on the floating-point activation data of the computing node.
  • Since the data type used in the calculation process of a computing node during prediction is consistent with the data type used in the calculation process of that computing node during training, both being fixed-point data represented by the first number of bits, and since the accuracy of fixed-point data is greatly affected by its number of bits, storing floating-point activation data during the training process and using it to simulate the fixed-point activation data based on the first number of bits can improve the accuracy of parameter learning in the neural network, and thus the accuracy of the neural network, compared with storing fixed-point activation data represented by the first number of bits.
  • Optionally, when the value range of the floating-point activation data does not include negative numbers, the following step B1 may be used to quantize the floating-point activation data.
  • Step B1: Directly quantize the floating-point activation data using the first number of bits.
  • Optionally, when the value range of the floating-point activation data includes negative numbers, the following steps B2 and B3 may be used to quantize the floating-point activation data.
  • Step B2: Scale and shift the floating-point activation data to transform it to a third preset range, where the minimum value of the third preset range is greater than or equal to 0.
  • Step B3: Quantize the scaled and shifted floating-point activation data using the first number of bits.
  • The purpose of scaling and shifting the floating-point activation data in step B2 is to make the minimum value of the value range of the transformed floating-point activation data greater than or equal to 0; the specific manner of the scaling transformation is not limited in this application.
  • Step B1, or steps B2 and B3, can specifically implement fixed-point quantization of the floating-point activation data with the first number of bits, thereby obtaining fixed-point activation data represented by the first number of bits. Further, the computing node can perform calculations according to the fixed-point activation data represented by the first number of bits. A sketch of such an activation quantization is given below.
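  • For illustration only, the following is a minimal sketch in C of steps B1 to B3, assuming the value range [lo, hi] of the floating-point activation data is known in advance; the affine map and the clamping are illustrative assumptions.

      #include <math.h>

      /* Simulated (fake) quantization of one floating-point activation with k bits. */
      static float quantize_activation(float x, float lo, float hi, int k)
      {
          float levels = (float)((1 << k) - 1);
          float s;
          if (lo < 0.0f)
              s = (x - lo) / (hi - lo);   /* B2: range contains negatives; scale and
                                             shift so the minimum becomes 0 */
          else
              s = x / hi;                 /* B1: range already non-negative; normalize */
          if (s < 0.0f) s = 0.0f;         /* values outside [lo, hi] saturate */
          if (s > 1.0f) s = 1.0f;
          return roundf(s * levels) / levels;   /* B1/B3: k-bit uniform quantization */
      }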
  • Considering that the bias parameter among the network parameters accounts for a small proportion of the calculation, the bias parameter may be quantized with a higher-precision quantization method in the embodiments of the present application. Specifically, the above-mentioned training strategy may further include a third training strategy. The third training strategy includes: storing floating-point bias parameters during the training process, and quantizing the floating-point bias parameter using a third number of bits before performing calculations according to the floating-point bias parameter of the computing node. Exemplarily, the third number of bits may be greater than the first number of bits. Further, the third number of bits may be greater than the second number of bits; for example, the third number of bits is equal to 32, the second number of bits is equal to 8, and the first number of bits is equal to 4.
  • It should be noted that the first training strategy, the second training strategy, and the third training strategy can be combined; that is, the training strategy can include one or more of the first training strategy, the second training strategy, and the third training strategy.
  • Taking the neural network as a convolutional neural network and the first number of bits as 4 as an example, the training procedure of a computing node may be as shown in FIG. 8. Referring to FIG. 8, the quantization in the training process can be divided into two parts: weight quantization and input quantization.
  • Weight quantization can be composed of (1) nonlinear transformation, (2) scaling and shifting, and (3) symmetric uniform quantization: (1) nonlinear transformation: the floating-point (float) weight is nonlinearly transformed to the range (-1, 1), e.g. through a mapping such as tanh(w); (2) scaling and shifting: a conversion function, e.g. (w + 1)/2, converts the weight parameter range from [-1, 1] to [0, 1]; (3) symmetric uniform quantization: a quantization function, e.g. q(x) = round((2^4 - 1) * x) / (2^4 - 1), performs the symmetric uniform quantization.
  • Input quantization distinguishes whether the activation data contains negative numbers: if it does, the data can first be scaled and shifted so that its range becomes [0, a], and finally 4-bit uniform quantization is performed.
  • The quantized input and weights can then be subjected to a convolution operation, e.g. based on a conv2d function, and the result of the convolution operation can further pass through a batch normalization (Batch Normalization, BN) layer and the activation function ReLU to obtain the output data of the computing node.
  • After the neural network training is completed, the prediction process (inference procedure) of a computing node in the neural network may be as shown in FIG. 9. Referring to FIG. 9, the activation data and weight parameters used in the prediction process of the computing node are all fixed-point data represented by 4 bits, and the result of the convolution calculation can be quantized using 4 bits to obtain fixed-point data represented by 4 bits as the output data of the computing node.
  • It should be noted that, taking the second number of bits as 8 as an example, although the activation data and weight parameters used in the prediction process of the computing node and the output data of the computing node are all fixed-point data represented by 4 bits, the numbers of bits occupied in the computing device by the activation data, the weight parameters, and the output data are all the second number of bits, i.e., 8 bits.
  • It should be noted that the weight parameters are saved in the form of floating-point data during the training process but stored in the form of fixed-point data during the prediction process. Specifically, after the neural network training is completed, the weight parameters can be converted from floating-point data into fixed-point data represented by the first number of bits.
  • It should be noted that the computing nodes in the foregoing method embodiments may be some of the computing nodes in the neural network. On this basis, the method may further include: reading, from the storage device, node data of other computing nodes in the neural network, where the node data is fixed-point data represented by the second number of bits, and the number of bits occupied by one node data in the storage device is the second number of bits. Exemplarily, the node data may include activation data and network parameters.
  • The other computing nodes may be nodes that require relatively high accuracy. Compared with representing the node data of the other computing nodes with the first number of bits, representing it with fixed-point data of the second number of bits makes it possible to reduce the bandwidth occupation between the computing device and the storage device on the basis of ensuring the accuracy of the algorithm.
  • Exemplarily, the other computing nodes include, but are not limited to, one or more of the following: computing nodes of the first layer in the neural network, computing nodes of the last layer in the neural network, and computing nodes related to coordinates in the neural network. The coordinate-related computing nodes may specifically be computing nodes whose input or output includes coordinates, for example, a computing node used to extract a region of interest from an image.
  • Optionally, the above-mentioned training strategy may further include: quantizing the floating-point weight parameters using the second number of bits before performing calculations according to the floating-point weight parameters of the other computing nodes; and/or quantizing the floating-point activation data using the second number of bits before performing calculations according to the floating-point activation data of the other computing nodes.
  • FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of the application. As shown in FIG. 10, the computing device 100 may include a processor 101 and a memory 102, where the memory 102 is used to store program code, and the processor 101 calls the program code which, when executed, is used to perform the following operations:
  • reading, from a storage device, target data of a computing node in a neural network, where the target data is fixed-point data represented by a first number of bits, the number of bits occupied by the target data in the storage device is the first number of bits, the first number of bits is smaller than a second number of bits, and the second number of bits is the number of bits of an operand of the computing device;
  • computing, according to the target data, the output data of the computing node.
  • The computing device provided in this embodiment may be used to execute the technical solutions of the foregoing method embodiments; its implementation principles and technical effects are similar and will not be repeated here.
  • A person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A data processing method and device. The method comprises: reading, from a storage device, target data of a computing node in a neural network (201), where the target data is fixed-point data represented by a first number of bits, the number of bits occupied by the target data in the storage device is the first number of bits, the first number of bits is less than a second number of bits, and the second number of bits is the number of bits of an operand of the computing device; and computing output data of the computing node according to the target data (202). The method can reduce the bandwidth occupied by the neural network between the computing device and the storage device on the basis of ensuring the computational efficiency of the neural network on the computing device.

Description

数据处理方法及设备Data processing method and equipment 技术领域Technical field
本申请涉及计算机技术领域,尤其涉及一种数据处理方法及设备。This application relates to the field of computer technology, and in particular to a data processing method and equipment.
背景技术Background technique
随着人工智能技术的发展,神经网络(Neural Network,NN)的应用越来越广泛。其中,神经网络是由大量简单的处理单元(也可以称为神经元)相关连接而形成的复杂网络***。With the development of artificial intelligence technology, neural networks (Neural Network, NN) are more and more widely used. Among them, the neural network is a complex network system formed by the related connection of a large number of simple processing units (also called neurons).
现有技术中,神经网络可以包括多个计算节点,每个计算节点中可以包括多个神经元,神经网络中计算节点的参数以及激活数据均存储在存储设备中,计算设备在运行神经网络的过程中需要与存储设备进行大量的数据交互。通常,为了提高神经网络在计算设备上的计算效率,神经网络的参数以及激励数据均是通过一定比特数表示的定点型数据,该比特数即为计算设备的操作数的比特数,例如8比特。In the prior art, a neural network may include multiple computing nodes, and each computing node may include multiple neurons. The parameters and activation data of the computing nodes in the neural network are stored in a storage device, and the computing device is running the neural network. During the process, a large amount of data interaction with the storage device is required. Generally, in order to improve the computational efficiency of the neural network on the computing device, the parameters of the neural network and the excitation data are all fixed-point data represented by a certain number of bits. The number of bits is the number of bits of the operand of the computing device, such as 8 bits. .
但是,现有技术中,存在神经网络占用计算设备与存储设备之间较大带宽的问题。However, in the prior art, there is a problem that the neural network occupies a relatively large bandwidth between the computing device and the storage device.
发明内容Summary of the invention
本申请实施例提供一种数据处理方法及设备,用以解决现有技术中神经网络占用计算设备与存储设备之间较大带宽的问题。The embodiments of the present application provide a data processing method and device to solve the problem of a neural network occupying a relatively large bandwidth between a computing device and a storage device in the prior art.
第一方面,本申请实施例提供一种数据处理方法,应用于计算设备,包括:In the first aspect, an embodiment of the present application provides a data processing method applied to a computing device, including:
从存储设备读取神经网络中计算节点的目标数据,所述目标数据为采用第一比特数表示的定点型数据,且所述存储设备中所述目标数据占用的比特 数为所述第一比特数,所述第一比特数小于第二比特数,所述第二比特数为所述计算设备的操作数的比特数;Read the target data of the computing node in the neural network from the storage device, where the target data is fixed-point data represented by a first number of bits, and the number of bits occupied by the target data in the storage device is the first bit The first number of bits is smaller than the second number of bits, and the second number of bits is the number of bits of the operand of the computing device;
根据所述目标数据,计算得到所述计算节点的输出数据。According to the target data, the output data of the computing node is calculated.
第二方面,本申请实施例提供一种数据处理装置,包括:处理器和存储器;所述存储器,用于存储程序代码;所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:In a second aspect, an embodiment of the present application provides a data processing device, including: a processor and a memory; the memory is used to store program code; the processor calls the program code, and when the program code is executed, Used to perform the following operations:
从存储设备读取神经网络中计算节点的目标数据,所述目标数据为采用第一比特数表示的定点型数据,且所述存储设备中所述目标数据占用的比特数为所述第一比特数,所述第一比特数小于第二比特数,所述第二比特数为所述计算设备的操作数的比特数;Read the target data of the computing node in the neural network from the storage device, where the target data is fixed-point data represented by a first number of bits, and the number of bits occupied by the target data in the storage device is the first bit The first number of bits is smaller than the second number of bits, and the second number of bits is the number of bits of the operand of the computing device;
根据所述目标数据,计算得到所述计算节点的输出数据。According to the target data, the output data of the computing node is calculated.
第三方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包含至少一段代码,所述至少一段代码可由计算机执行,以控制所述计算机执行上述第一方面任一项所述的方法。In a third aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, the computer program includes at least one piece of code, the at least one piece of code can be executed by a computer to control all The computer executes the method described in any one of the above-mentioned first aspects.
第四方面,本申请实施例提供一种计算机程序,当所述计算机程序被计算机执行时,用于实现上述第一方面任一项所述的方法。In a fourth aspect, an embodiment of the present application provides a computer program, when the computer program is executed by a computer, it is used to implement the method described in any one of the above-mentioned first aspects.
本申请实施例提供一种数据处理方法及设备,通过从存储设备读取神经网络中计算节点的目标数据,该目标数据是定点型数据且存储设备中其占用的比特数小于计算设备的操作数的第二比特数,根据目标数据计算得到计算节点的输出数据,由于目标数据是定点型数据,因此能够确保神经网络在计算设备上的计算效率,由于存储设备中目标数据占用的比特数是第一比特数,而第一比特数小于所述计算设备的操作数的第二比特数,因此能够减小计算设备与存储设备之间交互的神经网络的数据量,从而能够在确保神经网络在计算设备上的计算效率的基础上,减小神经网络占用计算设备与存储设备之间的带宽。另外,通过降低对于计算设备与存储设备之间的带宽要求,也可以降低成本。The embodiments of the application provide a data processing method and device, which reads target data of a computing node in a neural network from a storage device. The target data is fixed-point data and the number of bits occupied by the storage device is less than the number of operations of the computing device. The second number of bits is calculated according to the target data to obtain the output data of the computing node. Since the target data is fixed-point data, it can ensure the computational efficiency of the neural network on the computing device. Because the number of bits occupied by the target data in the storage device is the first One bit number, and the first bit number is less than the second bit number of the operand of the computing device. Therefore, the data volume of the neural network interacting between the computing device and the storage device can be reduced, thereby ensuring that the neural network is computing Based on the computing efficiency of the device, the neural network occupies the bandwidth between the computing device and the storage device. In addition, by reducing the bandwidth requirements between computing devices and storage devices, costs can also be reduced.
附图说明Description of the drawings
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实 施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly describe the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description These are some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
图1为本申请实施例提供的数据处理方法的应用场景示意图;FIG. 1 is a schematic diagram of an application scenario of a data processing method provided by an embodiment of the application;
图2为本申请一实施例提供的数据处理方法的流程示意图;2 is a schematic flowchart of a data processing method provided by an embodiment of this application;
图3为本申请另一实施例提供的数据处理方法的流程示意图;FIG. 3 is a schematic flowchart of a data processing method provided by another embodiment of this application;
图4A为本申请实施例提供的扩充前的目标数据的存储示意图;4A is a schematic diagram of storing target data before expansion provided by an embodiment of the application;
图4B为本申请实施例提供的扩充后的目标数据的存储示意图;4B is a schematic diagram of storage of expanded target data provided by an embodiment of the application;
图5为本申请又一实施例提供的数据处理方法的流程示意图;FIG. 5 is a schematic flowchart of a data processing method provided by another embodiment of this application;
图6A为本申请实施例提供的压缩前的输出数据的存储示意图;6A is a schematic diagram of storage of output data before compression provided by an embodiment of the application;
图6B为本申请实施例提供的压缩后的输出数据的存储示意图;6B is a schematic diagram of storing compressed output data provided by an embodiment of the application;
图7为本申请实施例提供的扩展及压缩与计算过程的关系示意图;FIG. 7 is a schematic diagram of the relationship between the expansion and compression and the calculation process provided by an embodiment of the application;
图8为本申请实施例提供的计算节点的训练过程示意图;FIG. 8 is a schematic diagram of a training process of a computing node provided by an embodiment of the application;
图9为本申请实施例提供的计算节点的预测过程示意图;FIG. 9 is a schematic diagram of a prediction process of a computing node provided by an embodiment of the application;
图10为本申请一实施例提供的计算设备的结构示意图。FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of this application.
具体实施方式detailed description
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments It is a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.
本申请实施例提供的数据处理方法可以应用于任何需要通过神经网络进行数据处理的场景中,该数据处理方法具体可以由计算设备执行。本申请实施例提供的数据处理方法的应用场景示意图可以如图1所示,具体的,计算设备11与服务器12通讯连接,计算设备11可以从服务器12获得神经网络,计算设备11可以通过从服务器12获得的神经网络进行数据处理。其中,所述数据处理具体可以为能够通过神经网络完成的任意类型处理。The data processing method provided in the embodiments of the present application can be applied to any scenario that requires data processing through a neural network, and the data processing method can be specifically executed by a computing device. A schematic diagram of the application scenario of the data processing method provided by the embodiments of the present application may be as shown in FIG. 1. Specifically, the computing device 11 is in communication with the server 12, the computing device 11 can obtain the neural network from the server 12, and the computing device 11 can obtain the neural network from the server 12. 12 The obtained neural network performs data processing. Wherein, the data processing may specifically be any type of processing that can be completed through a neural network.
其中,计算设备11具体可以为能够完成计算功能的设备。示例性的,所述计算设备包括但不限于下述中的一种或多种:中央处理器(Central Pro- cessing Unit,CPU)、高级精简指令集处理器(Advanced RISC Machine,ARM)、数字信号处理器(Digital Signal Processor,DSP)、图形处理器(Graphics Processing Unit,GPU)。Among them, the computing device 11 may specifically be a device capable of completing computing functions. Exemplarily, the computing device includes but is not limited to one or more of the following: Central Processing Unit (CPU), Advanced RISC Machine (ARM), Digital Signal processor (Digital Signal Processor, DSP), Graphics Processor (Graphics Processing Unit, GPU).
需要说明的是,对于数据处理装置11与服务器12通讯连接的具体方式,本申请可以不做限定,例如可以基于蓝牙接口实现无线通讯连接,或者基于RS232接口实现有线通讯连接。It should be noted that the specific manner of the communication connection between the data processing device 11 and the server 12 is not limited in this application. For example, a wireless communication connection may be realized based on a Bluetooth interface, or a wired communication connection may be realized based on an RS232 interface.
需要说明的是,图1中以计算设备从服务器获得神经网络为例,可替换的,计算设备可以通过其他方式获得神经网络,示例性的,计算设备可以从其他设备获得神经网络,或者计算设备可以通过自身训练获得神经网络。It should be noted that, in Figure 1, the computing device obtains the neural network from the server as an example. Alternatively, the computing device can obtain the neural network in other ways. Illustratively, the computing device can obtain the neural network from other devices, or the computing device The neural network can be obtained through self-training.
本申请实施例提供的数据处理方法,通过计算设备从存储设备读取神经网络中计算节点的目标数据,该存储设备中一个目标数据占用的比特数小于计算设备的操作数的第二比特数,与现有技术中计算节点的目标数据是采用第二比特数表示的定点型数据,存储设备中一个目标数据占用的比特数等于第二比特数相比,减小了计算设备与存储设备之间交互的神经网络的数据量,从而减小了神经网络占用计算设备与存储设备之间的带宽。In the data processing method provided by the embodiment of the present application, the target data of a computing node in a neural network is read from a storage device through a computing device, and the number of bits occupied by a target data in the storage device is less than the second number of bits of the operand of the computing device, Compared with the prior art where the target data of a computing node is fixed-point data represented by a second number of bits, the number of bits occupied by a target data in the storage device is equal to the second number of bits, which reduces the distance between the computing device and the storage device. The data volume of the interactive neural network reduces the bandwidth occupied by the neural network between the computing device and the storage device.
下面结合附图,对本申请的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。Hereinafter, some embodiments of the present application will be described in detail with reference to the accompanying drawings. In the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The execution subject of this embodiment may be a computing device, and specifically a processor of the computing device. As shown in FIG. 2, the method of this embodiment may include the following steps.
Step 201: read the target data of a computing node in the neural network from a storage device.
In this step, the storage device may specifically be any device accessible to the computing device that implements a data storage function. Exemplarily, the storage device includes, but is not limited to, one or more of the following: memory, a solid-state drive, a mechanical hard disk, and a floppy disk.
A neural network may consist of multiple computing nodes. Each computing node may fetch its activation data from the storage device, perform its computation, and store the result back in the storage device as the input data, i.e. the activation data, of the next-level computing node. Taking a convolutional neural network (CNN) as an example, one convolution block (Conv Block) may correspond to one computing node. It should be noted that the computing nodes in this embodiment may be all of the computing nodes in the neural network, or only some of them.
The target data is fixed-point data represented by a first number of bits, one piece of target data occupies the first number of bits in the storage device, and the first number of bits is smaller than a second number of bits, the second number of bits being the bit width of the operands of the computing device. Fixed-point data is data in which the position of the decimal point is fixed; it usually takes the form of fixed-point integers or fixed-point fractions. Once the position of the decimal point has been chosen, all numbers in an operation are uniformly fixed-point integers or fixed-point fractions, and fractional handling no longer needs to be considered during computation, which improves computational efficiency.
Exemplarily, the second number of bits may be 16 or 8. Optionally, when all bit widths follow the power-of-two rule and the second number of bits is 16, the first number of bits may be 8 or 4; when all bit widths follow the power-of-two rule and the second number of bits is 8, the first number of bits may be 4.
Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device is ensured. Since one piece of target data occupies the first number of bits in the storage device, which is smaller than the bit width of the computing device's operands (i.e., the second number of bits), the amount of neural-network data exchanged between the computing device and the storage device is reduced compared with the prior art, in which the target data of a computing node is fixed-point data represented by the second number of bits and occupies the second number of bits in the storage device.
It should be noted that the target data may specifically be one or more of any types of data that the neural network needs to read from the storage device during computation. Exemplarily, the target data includes weight parameters and/or activation data, where the weight parameters are network parameters of the neural network.
Step 202: calculate the output data of the computing node according to the target data.
In this step, the specific way of computing from the target data may include a preset computation related to the type of the neural network. Exemplarily, when the neural network is a convolutional neural network, the preset computation may include a convolution; specifically, a convolution of the activation data with the network parameters may be performed.
It should be noted that the output data of this computing node is the activation data of its next-level computing node.
In this embodiment, the target data of a computing node in the neural network is read from the storage device, where the target data is fixed-point data that occupies fewer bits in the storage device than the second number of bits, the bit width of the computing device's operands, and the output data of the computing node is computed from the target data. Because the target data is fixed-point data, the computational efficiency of the neural network on the computing device is ensured; because the target data occupies the first number of bits in the storage device, which is smaller than the second number of bits, the amount of neural-network data exchanged between the computing device and the storage device is reduced. The bandwidth the neural network occupies between the computing device and the storage device is therefore reduced while the computational efficiency of the neural network on the computing device is preserved. In addition, lowering the bandwidth requirement between the computing device and the storage device can also lower cost.
FIG. 3 is a schematic flowchart of a data processing method provided by another embodiment of the present application. On the basis of the embodiment shown in FIG. 2, this embodiment mainly describes an optional implementation of calculating the output data of the computing node according to the target data. As shown in FIG. 3, the method of this embodiment may include the following steps.
Step 301: read the target data of a computing node in the neural network from the storage device.
In this step, one piece of target data occupies the first number of bits in the storage device, and one piece of target data read from the storage device in step 301 likewise occupies the first number of bits.
Step 302: expand the number of bits occupied by the target data from the first number of bits to the second number of bits to obtain the expanded target data.
In this step, the bit width of the operands supported by the computing device is the second number of bits, whereas the target data read from the storage device occupies the first number of bits, which is smaller than the second number of bits. Therefore, to enable the computing device to perform the neural-network computation on the target data, step 302 expands the number of bits occupied by the target data from the first number of bits to the second number of bits, yielding the expanded target data. Exemplarily, a combination of an AND instruction ("and"), a shift instruction ("shift"), an addition instruction ("add"), and the like may be used to perform this expansion.
The process of expanding the number of bits occupied by the target data from the first number of bits to the second number of bits can be understood as an unpack process. Exemplarily, step 302 may specifically include: adding a total of N padding bits at the head and/or tail of the first number of bits occupied by the target data to obtain the expanded target data, N being the difference between the second number of bits and the first number of bits.
For example, when the first number of bits is 4, the second number of bits is 8, and the N padding bits are added at the head of the 4 bits occupied by the target data, the valid bits of the expanded target data occupy the low 4 bits. Likewise, when the N padding bits are added at the tail of the 4 bits occupied by the target data, the valid bits of the expanded target data occupy the high 4 bits.
It should be noted that the padding bits can be understood as invalid bits, the bits of the expanded target data other than the padding bits can be understood as valid bits, and the valid bits form a contiguous run of the first number of bits. The padding bits can be chosen flexibly according to requirements. Exemplarily, with a first number of bits of 4 and a second number of bits of 8, if the expanded target data is to be used as a signed 8-bit number and the computing device represents signed numbers in two's complement, the valid bits may be placed in the low 4 bits and the high 4 bits filled with 1, i.e. the padding bits may be 1.
Taking a first number of bits of 4, a second number of bits of 8, and expanded valid bits in the low 4 bits as an example, the unpack process may be as shown in FIG. 4A and FIG. 4B. Specifically, each piece of data occupies 4 bits before expansion, and the storage-address relationship of data A1 to F1 in the computing device may be as shown in FIG. 4A; after the unpack process, each piece of data occupies 8 bits, and the storage-address relationship of the expanded data in the computing device may be as shown in FIG. 4B. A sketch of this unpack step follows.
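The following is a minimal sketch of such an unpack step, assuming 4-bit values packed two per byte, low nibble first, and expanded into unsigned 8-bit operands with the valid bits in the low 4 bits; the nibble order and unsigned interpretation are illustrative assumptions, not requirements of this embodiment.

```python
import numpy as np

def unpack_4bit_to_8bit(packed: np.ndarray) -> np.ndarray:
    """Expand each 4-bit value in `packed` (uint8, two values per byte)
    into its own 8-bit operand, with the valid bits in the low 4 bits."""
    low = packed & 0x0F           # "and": mask out the low nibble
    high = (packed >> 4) & 0x0F   # "shift" + "and": extract the high nibble
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = low               # interleave back into the original order
    out[1::2] = high
    return out

# One byte 0b1011_0100 holds the 4-bit values 0b0100 and 0b1011.
print(unpack_4bit_to_8bit(np.array([0xB4], dtype=np.uint8)))  # [ 4 11]
```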
Step 303: calculate the output data of the computing node according to the expanded target data.
In this step, the specific way of computing from the expanded target data may include a preset computation related to the type of the neural network. Exemplarily, when the neural network is a convolutional neural network, the preset computation may include a convolution; specifically, a convolution of the activation data with the network parameters may be performed.
In this embodiment, the number of bits occupied by the target data read from the storage device is expanded from the first number of bits to the second number of bits to obtain the expanded target data, and the output data of the computing node is computed from the expanded target data. The computing device can thus compute the output data of the computing node from target data that occupies fewer bits in the storage device than the second number of bits, which solves the problem that the neural network could not otherwise be deployed on the computing device because the number of bits occupied by the target data in the storage device differs from the bit width of the computing device's operands.
FIG. 5 is a schematic flowchart of a data processing method provided by yet another embodiment of the present application. On the basis of the embodiment shown in FIG. 2, this embodiment mainly describes an optional implementation of calculating the output data of the computing node according to the target data. As shown in FIG. 5, the method of this embodiment may include the following steps.
Step 501: read the target data of a computing node in the neural network from the storage device.
It should be noted that step 501 is similar to steps 201 and 301 and is not repeated here.
Step 502: perform a preset computation according to the target data to obtain a computation result.
Optionally, when step 302 is performed before step 502, step 502 may specifically include: performing the preset computation according to the expanded target data to obtain the computation result. Exemplarily, when the preset computation is a convolution, the computation result may specifically be a convolution result.
Step 503: quantize the computation result using the first number of bits to obtain the output data of the computing node.
In this step, the output data is fixed-point data represented by the first number of bits, and the number of bits it occupies is the second number of bits. Exemplarily, with a first number of bits of 4 and a second number of bits of 8, if the computation result is 0b10010110, the output data of the computing node obtained by quantizing the result with the first number of bits may be 0b1001.
Optionally, when reducing the amount of output data the computing device reads from the storage device is not a concern, step 503 may be replaced by quantizing the computation result using the second number of bits to obtain the output data of the computing node.
Step 504: compress the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain the compressed output data.
In this step, the process of compressing the number of bits occupied by the output data from the second number of bits to the first number of bits can be understood as a pack process. Exemplarily, step 504 may specifically include: selecting, from the second number of bits occupied by the output data, the valid contiguous run of the first number of bits as the compressed output data; the invalid bits may be discarded. It should be noted that the pack process mirrors the unpack process. For example, assuming each piece of data has 4 valid bits before packing, the storage-address relationship of data G to L in the computing device may be as shown in FIG. 6A; after the pack process, each piece of data occupies 4 bits, and the storage-address relationship of the packed data in the computing device may be as shown in FIG. 6B.
Exemplarily, a combination of an AND instruction ("and"), a shift instruction ("shift"), an addition instruction ("add"), and the like may be used to compress the number of bits occupied by the output data from the second number of bits to the first number of bits, yielding the compressed output data. A sketch of this pack step follows.
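The following is a minimal sketch of such a pack step, the inverse of the unpack sketch above: it keeps the valid low 4 bits of each 8-bit value and packs two values per byte, low nibble first. As before, the nibble order is an illustrative assumption.

```python
import numpy as np

def pack_8bit_to_4bit(data: np.ndarray) -> np.ndarray:
    """Compress 8-bit values whose valid bits are the low 4 bits into a
    packed array holding two 4-bit values per byte."""
    assert data.size % 2 == 0, "sketch assumes an even number of values"
    low = data[0::2] & 0x0F           # "and": keep the valid low nibble
    high = (data[1::2] & 0x0F) << 4   # "and" + "shift": move to the high nibble
    return (high | low).astype(np.uint8)  # merge the two nibbles into one byte

# Inverse of the unpack example: [4, 11] packs back into 0xB4 == 180.
print(pack_8bit_to_4bit(np.array([4, 11], dtype=np.uint8)))  # [180]
```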
Step 505: write the compressed output data to the storage device.
It should be noted that after step 505 is performed, the storage device holds the compressed output data; that is, one piece of compressed output data occupies the first number of bits in the storage device.
In this embodiment, a preset computation is performed on the target data to obtain a computation result, the result is quantized with the first number of bits to obtain the output data of the computing node, the number of bits occupied by the output data is compressed from the second number of bits to the first number of bits to obtain the compressed output data, and the compressed output data is written to the storage device. This reduces the amount of data the computing device reads from the storage device when fetching this computing node's output data.
It should be noted that the unpack process of step 302 above may instead be implemented during the data movement that transfers the target data from the storage device to the computing device; that is, the number of bits occupied by the target data may be expanded from the first number of bits to the second number of bits while the target data is being read from the storage device in step 201.
And/or, the pack process of step 504 may instead be implemented during the data movement that writes the output data from the computing device to the storage device; that is, steps 504 and 505 may be replaced by compressing the number of bits occupied by the output data from the second number of bits to the first number of bits while the output data obtained in step 503 is being written to the storage device.
Exemplarily, direct memory access (DMA) transfers may be used to perform the pack and unpack processes as part of the data movement.
It should be noted that the process described above of computing a computing node's output data from its target data can be regarded as that node's computation process; exemplarily, the computation process may include steps 502 and 503. Further, taking a first number of bits of 4 and a second number of bits of 8 as an example, as shown in FIG. 7, an unpack process may be performed before computation process k to expand the input data i[k], which occupies 4 bits in the storage device, to 8 bits; after computation process k, a pack process may be performed to compress the 8-bit output data o[k] produced by computation process k down to 4 bits, so that the o[k] stored in the storage device occupies 4 bits. A sketch of this per-node flow follows.
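The following short sketch illustrates the per-node flow of FIG. 7, reusing the unpack and pack helpers sketched above; `compute` is a hypothetical stand-in for the node's actual computation, which is assumed to produce 8-bit values whose valid bits fit in the low 4 bits.

```python
def run_node(packed_input, compute):
    """One node: 4-bit storage -> 8-bit compute -> 4-bit storage."""
    x = unpack_4bit_to_8bit(packed_input)  # expand i[k] from 4 to 8 bits
    y = compute(x)                         # 8-bit computation, 4-bit-quantized result
    return pack_8bit_to_4bit(y)            # compress o[k] back to 4 bits
```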
On the basis of the foregoing embodiments, the neural network may be obtained by training according to a specific training strategy. Exemplarily, when the target data includes weight parameters, the training strategy may include a first training strategy. The first training strategy includes: storing floating-point weight parameters during training, and quantizing the floating-point weight parameters with the first number of bits before computing according to the floating-point weight parameters of the computing node.
It should be noted that the process of computing a computing node's output data from its activation data and weight parameters can be regarded as that node's computation process. During inference, the data types on which a computing node's computation is based are the same as during training: both are fixed-point data represented by the first number of bits, and the precision of fixed-point data is strongly affected by its bit width. Therefore, storing floating-point weight parameters during training and using them to simulate fixed-point weight parameters represented by the first number of bits improves the precision of parameter learning in the neural network, and hence the precision of the neural network, compared with storing fixed-point weight parameters represented by the first number of bits during training.
Exemplarily, quantizing the floating-point weight parameters with the first number of bits may specifically include the following steps A1 and A2.
Step A1: transform the floating-point weight parameters to a first preset range to obtain transformed floating-point weight parameters, the minimum of the first preset range being greater than or equal to 0.
Step A2: quantize the transformed floating-point weight parameters using the first number of bits.
Step A1 may specifically convert the floating-point weight parameters from the range of real numbers to a non-negative range. To improve the computational efficiency of the neural network, the first preset range may be 0 to 1. Step A2 may specifically perform fixed-point quantization with the first number of bits on the floating-point weight parameters transformed to the first preset range, yielding fixed-point weight parameters represented by the first number of bits; the computing node can then compute according to these fixed-point weight parameters.
Exemplarily, step A1 may specifically include the following steps A11 and A12.
Step A11: apply a non-linear transform to the floating-point weight parameters to map them into a second preset range, the minimum of the second preset range being less than 0.
Step A12: apply a scale-and-shift transform to the non-linearly transformed floating-point weight parameters to map them into the first preset range.
The specific form of the non-linear transform in step A11 is not limited by this application; for example, it may be implemented with a tangent function or a sigmoid function. Exemplarily, the second preset range is -1 to 1. The purpose of further scaling and shifting the non-linearly transformed floating-point weight parameters in step A12 may include making the value range of the transformed floating-point weight parameters equal to the first preset range; the specific form of the scale and shift is likewise not limited by this application.
Exemplarily, in step A2 the transformed floating-point weight parameters may be quantized using symmetric uniform quantization with the first number of bits. Although combinations of asymmetric or non-uniform quantization can outperform symmetric uniform quantization, symmetric uniform quantization is, for hardware reasons, highly efficient to implement, so using it in step A2 preserves hardware efficiency. A sketch of steps A1 and A2 follows.
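The following is a hedged sketch of steps A1 and A2, assuming the common choices the description suggests: tanh as the non-linear transform (step A11), an affine map to [0, 1] (step A12), and k-bit symmetric uniform quantization (step A2). The final rescaling back to [-1, 1] is also an assumption; the embodiment itself fixes only the three steps, not these particular functions.

```python
import numpy as np

def quantize_weights(w: np.ndarray, k: int = 4) -> np.ndarray:
    t = np.tanh(w)                     # A11: non-linear transform to (-1, 1)
    m = max(np.max(np.abs(t)), 1e-8)   # guard against all-zero weights
    t = t / (2.0 * m) + 0.5            # A12: scale and shift to [0, 1]
    n = 2 ** k - 1                     # number of quantization steps
    q = np.round(t * n) / n            # A2: k-bit symmetric uniform quantization
    return 2.0 * q - 1.0               # assumed rescale back to [-1, 1]
```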
Optionally, the operations involved in quantizing the floating-point weight parameters with the first number of bits are differentiable. Because these operations are differentiable, the network parameters of the computing nodes can be solved iteratively by gradient descent.
When the target data includes activation data, the output data is fixed-point data represented by the first number of bits, and the number of bits occupied by the output data is the second number of bits.
Exemplarily, when the target data includes activation data, the training strategy includes a second training strategy. The second training strategy includes: storing floating-point activation data during training, and quantizing the floating-point activation data with the first number of bits before computing according to the floating-point activation data of the computing node.
During inference, the data types on which a computing node's computation is based are the same as during training: both are fixed-point data represented by the first number of bits, and the precision of fixed-point data is strongly affected by its bit width. Therefore, using floating-point activation data to simulate fixed-point activation data represented by the first number of bits improves the precision of parameter learning in the neural network, and hence the precision of the neural network, compared with storing fixed-point activation data represented by the first number of bits during training.
Exemplarily, the floating-point activation data may be quantized using the following step B1.
Step B1: when the value range of the floating-point activation data does not include negative numbers, quantize the floating-point activation data directly using the first number of bits.
Or, exemplarily, the floating-point activation data may be quantized using the following steps B2 and B3.
Step B2: when the value range of the floating-point activation data includes negative numbers, apply a scale-and-shift transform to the floating-point activation data to map it into a third preset range, the minimum of the third preset range being greater than or equal to 0.
Step B3: quantize the scaled-and-shifted floating-point activation data using the first number of bits.
The purpose of scaling and shifting the floating-point activation data in step B2 may include making the minimum of the value range of the transformed floating-point activation data greater than or equal to 0; the specific form of the scale and shift is not limited by this application.
Step B1, or steps B2 and B3, may specifically perform fixed-point quantization with the first number of bits on the floating-point activation data, yielding fixed-point activation data represented by the first number of bits; the computing node can then compute according to these fixed-point activations. A sketch follows.
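The following is a hedged sketch of steps B1-B3, assuming a clipping range [0, a] and a min-max affine map for the negative case; both are illustrative assumptions, since the embodiment does not fix the particular scale-and-shift.

```python
import numpy as np

def quantize_activations(x: np.ndarray, k: int = 4, a: float = 1.0) -> np.ndarray:
    if x.min() < 0:                                         # B2: range includes negatives
        x = (x - x.min()) / (x.max() - x.min() + 1e-8) * a  # scale and shift to [0, a]
    n = (2 ** k - 1) / a
    return np.round(np.clip(x, 0.0, a) * n) / n             # B1/B3: k-bit uniform quantization
```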
Since the bias parameters among the network parameters account for only a small share of the computation, the embodiments of the present application may quantize the bias parameters with a higher-precision quantization scheme. Exemplarily, the training strategy may further include a third training strategy. The third training strategy includes: storing floating-point bias parameters during training, and quantizing the floating-point bias parameters with a third number of bits before computing according to the floating-point bias parameters of the computing node, where the third number of bits may be greater than the first number of bits.
Exemplarily, the third number of bits may also be greater than the second number of bits; for example, the third number of bits is 32, the second number of bits is 8, and the first number of bits is 4.
It should be noted that the first, second, and third training strategies may be combined; that is, the training strategy may include one or more of the first training strategy, the second training strategy, and the third training strategy.
Taking as an example a training strategy that includes the first and second training strategies, a neural network that is a convolutional neural network, and a first number of bits of 4, the training procedure of a computing node may be as shown in FIG. 8.
As shown in FIG. 8, quantization during training can be divided into two parts: weight quantization and input quantization. Weight quantization may consist of three parts: (1) non-linear transform, (2) scale and shift, and (3) symmetric uniform quantization. (1) Non-linear transform: the floating-point weights are non-linearly mapped into the range (-1, 1); the transform itself appears only as an equation image (PCTCN2019103198-appb-000001) in the published application. (2) Scale and shift: a conversion function (equation image PCTCN2019103198-appb-000002) converts the weight range from [-1, 1] to [0, 1]. (3) Symmetric uniform quantization: a quantization function (equation image PCTCN2019103198-appb-000003) performs the symmetric uniform quantization. Input quantization, i.e. quantization of the activation data, distinguishes whether the data contains negative numbers; if it does, the data may first be scaled and shifted into the range [0, a], and 4-bit uniform quantization is then performed.
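The three equations are reproduced only as images in the published application. A plausible reconstruction, assuming the DoReFa-style recipe that the surrounding description matches (tanh transform, affine map to [0, 1], k-bit uniform rounding), would be:

```latex
t = \tanh(w)                                               % (1) non-linear transform to (-1, 1)
f(t) = \frac{t}{2\,\max\lvert\tanh(w)\rvert} + \frac{1}{2} % (2) scale and shift to [0, 1]
q_k(x) = \frac{\operatorname{round}\bigl((2^k - 1)\,x\bigr)}{2^k - 1} % (3) k-bit symmetric uniform quantization
```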
Further, after the floating-point inputs and floating-point weights of the computing node have been quantized, a convolution may be performed on the quantized inputs and weights based on the conv2d function, and the result of the convolution may then be passed through a batch normalization (BN) layer and a ReLU activation function to obtain the output data of the computing node, as sketched below.
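The following rough sketch of this quantized forward pass reuses the quantize_weights and quantize_activations helpers sketched above; the single-channel, valid-mode conv2d and the batch-norm placeholder are illustrative stand-ins for the framework's actual layers.

```python
import numpy as np

def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Minimal single-channel, valid-mode 2-D convolution (illustrative only)."""
    h = x.shape[0] - w.shape[0] + 1
    v = x.shape[1] - w.shape[1] + 1
    out = np.zeros((h, v))
    for i in range(h):
        for j in range(v):
            out[i, j] = np.sum(x[i:i + w.shape[0], j:j + w.shape[1]] * w)
    return out

def node_forward(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    xq = quantize_activations(x)           # 4-bit quantized input
    wq = quantize_weights(w)               # 4-bit quantized weights
    y = conv2d(xq, wq)                     # convolution on quantized data
    y = (y - y.mean()) / (y.std() + 1e-5)  # batch-normalization placeholder
    return np.maximum(y, 0.0)              # ReLU
```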
On the basis of the training procedure shown in FIG. 8, the inference procedure of a computing node in the neural network may be as shown in FIG. 9. As shown in FIG. 9, the activation data and the weight parameters on which the computing node's inference is based are both fixed-point data represented by 4 bits. In addition, as shown in FIG. 9, to reduce the amount of data the computing device reads from the storage device for this node's output, after the convolution in the node's computation process, the result of the convolution may be quantized with 4 bits, yielding 4-bit fixed-point data as the node's output data. It should be understood that when the second number of bits is 8, although the activation data and weight parameters used in the node's inference, as well as the node's output data, are all fixed-point data represented by 4 bits, the activation data, weight parameters, and output data each occupy the second number of bits, e.g. 8 bits.
It should be noted that, as can be seen from FIG. 8 and FIG. 9, when the target data includes weight parameters, the weight parameters are stored as floating-point data during training but as fixed-point data during inference. Specifically, after training of the neural network is complete, the weight parameters may be converted from floating-point data to fixed-point data represented by the first number of bits.
Optionally, the computing nodes in the above method embodiments may be only some of the computing nodes in the neural network. For the other computing nodes in the neural network, the node data may be fixed-point data represented by the second number of bits, with one piece of node data occupying the second number of bits in the storage device. Specifically, on the basis of the above method embodiments, the method may further include: reading from the storage device the node data of the other computing nodes in the neural network, the node data being fixed-point data represented by the second number of bits. The node data may include activation data and network parameters.
Optionally, the other computing nodes may be nodes with relatively high precision requirements. Exemplarily, the other computing nodes are computing nodes for which representing the node data with the first number of bits, compared with representing it with the second number of bits, degrades the precision of the neural network by more than a degree threshold. By keeping the node data of these precision-critical computing nodes in the neural network as fixed-point data represented by the second number of bits, the occupation of bandwidth between the computing device and the storage device can be reduced while the precision of the algorithm is ensured.
Further optionally, the other computing nodes include, but are not limited to, one or more of the following: the computing nodes of the first layer of the neural network, the computing nodes of the last layer of the neural network, and the coordinate-related computing nodes of the neural network. A coordinate-related computing node may specifically be a computing node whose input or output includes coordinates, for example a node used to extract a region of interest from an image.
It should be noted that the other computing nodes are also quantized based on the second number of bits during training. Exemplarily, the training strategy may further include: quantizing the floating-point weight parameters of the other computing nodes with the second number of bits before computing according to them; and/or quantizing the floating-point activation data of the other computing nodes with the second number of bits before computing according to it.
FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in FIG. 10, the computing device 100 may include a processor 101 and a memory 102.
The memory 102 is configured to store program code.
The processor 101 invokes the program code, and when the program code is executed, performs the following operations:
reading target data of a computing node in a neural network from a storage device, where the target data is fixed-point data represented by a first number of bits, the target data occupies the first number of bits in the storage device, the first number of bits is smaller than a second number of bits, and the second number of bits is the bit width of the operands of the computing device;
calculating the output data of the computing node according to the target data.
The computing device provided by this embodiment can be used to execute the technical solutions of the foregoing method embodiments; its implementation principles and technical effects are similar to those of the method embodiments and are not repeated here.
A person of ordinary skill in the art can understand that all or some of the steps of the above method embodiments can be implemented by hardware under the instruction of a program. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (58)

1. A data processing method applied to a computing device, comprising:
reading target data of a computing node in a neural network from a storage device, wherein the target data is fixed-point data represented by a first number of bits, the target data occupies the first number of bits in the storage device, the first number of bits is smaller than a second number of bits, and the second number of bits is the bit width of the operands of the computing device; and
calculating output data of the computing node according to the target data.
2. The method according to claim 1, wherein the calculating output data of the computing node according to the target data comprises:
expanding the number of bits occupied by the target data from the first number of bits to the second number of bits to obtain expanded target data; and
calculating the output data of the computing node according to the expanded target data.
3. The method according to claim 1, wherein the reading target data of a computing node in a neural network from a storage device comprises:
in the process of reading the target data of the computing node in the neural network from the storage device, expanding the number of bits occupied by the target data from the first number of bits to the second number of bits.
4. The method according to claim 2 or 3, wherein the expanding the number of bits occupied by the target data from the first number of bits to the second number of bits comprises:
adding a total of N padding bits at the head and/or tail of the first number of bits occupied by the target data, N being the difference between the second number of bits and the first number of bits.
5. The method according to claim 1, wherein the calculating output data of the computing node according to the target data comprises:
performing a preset computation according to the target data to obtain a computation result;
quantizing the computation result using the first number of bits to obtain the output data of the computing node, wherein the output data is fixed-point data represented by the first number of bits and occupies the second number of bits; and
writing the output data to the storage device.
6. The method according to claim 5, further comprising, before the writing the output data to the storage device:
compressing the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain compressed output data;
wherein the writing the output data to the storage device comprises: writing the compressed output data to the storage device.
7. The method according to claim 6, wherein the compressing the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain compressed output data comprises:
selecting, from the second number of bits occupied by the output data, the valid contiguous first number of bits as the compressed output data.
8. The method according to any one of claims 1-7, wherein the target data comprises weight parameters and/or activation data.
9. The method according to claim 8, wherein the neural network is obtained by training according to a training strategy, and when the target data comprises weight parameters, the training strategy comprises a first training strategy;
the first training strategy comprises: storing floating-point weight parameters during training, and quantizing the floating-point weight parameters using the first number of bits before performing computation according to the floating-point weight parameters of the computing node.
10. The method according to claim 9, wherein the quantizing the floating-point weight parameters using the first number of bits comprises:
transforming the floating-point weight parameters to a first preset range to obtain transformed floating-point weight parameters, the minimum of the first preset range being greater than or equal to 0; and
quantizing the transformed floating-point weight parameters using the first number of bits.
11. The method according to claim 10, wherein the transforming the floating-point weight parameters to a first preset range comprises:
applying a non-linear transform to the floating-point weight parameters to transform them to a second preset range, the minimum of the second preset range being less than 0; and
applying a scale-and-shift transform to the non-linearly transformed floating-point weight parameters to transform them to the first preset range.
12. The method according to claim 11, wherein the second preset range comprises -1 to 1.
13. The method according to claim 10, wherein the first preset range comprises 0 to 1.
14. The method according to any one of claims 10-13, wherein the operations involved in quantizing the floating-point weight parameters using the first number of bits are differentiable.
15. The method according to claim 8, wherein the neural network is obtained by training according to a training strategy, and when the target data comprises activation data, the training strategy comprises a second training strategy;
the second training strategy comprises: storing floating-point activation data during training, and quantizing the floating-point activation data using the first number of bits before performing computation according to the floating-point activation data of the computing node.
16. The method according to claim 15, wherein the quantizing the floating-point activation data using the first number of bits comprises:
when the value range of the floating-point activation data does not include negative numbers, quantizing the floating-point activation data directly using the first number of bits.
17. The method according to claim 16, wherein the quantizing the floating-point activation data using the first number of bits comprises:
when the value range of the floating-point activation data includes negative numbers, applying a scale-and-shift transform to the floating-point activation data to transform it to a third preset range, the minimum of the third preset range being greater than or equal to 0; and
quantizing the scaled-and-shifted floating-point activation data using the first number of bits.
18. The method according to any one of claims 8-17, wherein the training strategy further comprises a third training strategy;
the third training strategy comprises: storing floating-point bias parameters during training, and quantizing the floating-point bias parameters using a third number of bits before performing computation according to the floating-point bias parameters of the computing node, the third number of bits being greater than the first number of bits.
19. The method according to any one of claims 1-18, wherein the computing node is one of a subset of the computing nodes in the neural network.
20. The method according to claim 19, further comprising:
reading node data of other computing nodes in the neural network from the storage device, the node data being fixed-point data represented by the second number of bits.
21. The method according to claim 20, wherein the other computing nodes are computing nodes for which representing the node data with the first number of bits, compared with representing the node data with the second number of bits, degrades the precision of the neural network by more than a degree threshold.
22. The method according to claim 21, wherein the other computing nodes comprise one or more of the following:
the computing nodes of the first layer of the neural network, the computing nodes of the last layer of the neural network, and the coordinate-related computing nodes of the neural network.
23. The method according to claim 1, wherein the second number of bits is equal to 8.
24. The method according to claim 23, wherein the first number of bits is equal to 4.
25. The method according to claim 1, wherein the second number of bits is equal to 16.
26. The method according to claim 25, wherein the first number of bits is equal to 8.
27. The method according to claim 1, wherein the computing device comprises one or more of the following:
a central processing unit (CPU), an advanced RISC machine (ARM) processor, a digital signal processor (DSP), and a graphics processing unit (GPU).
28. The method according to claim 1, wherein the neural network comprises a convolutional neural network.
29. A computing device, comprising a processor and a memory, wherein
the memory is configured to store program code; and
the processor invokes the program code and, when the program code is executed, performs the following operations:
reading target data of a computing node in a neural network from a storage device, wherein the target data is fixed-point data represented by a first number of bits, the target data occupies the first number of bits in the storage device, the first number of bits is smaller than a second number of bits, and the second number of bits is the bit width of the operands of the computing device; and
calculating output data of the computing node according to the target data.
30. The device according to claim 29, wherein the processor being configured to calculate the output data of the computing node according to the target data specifically comprises:
expanding the number of bits occupied by the target data from the first number of bits to the second number of bits to obtain expanded target data; and
calculating the output data of the computing node according to the expanded target data.
31. The device according to claim 29, wherein the processor being configured to read the target data of the computing node in the neural network from the storage device specifically comprises:
in the process of reading the target data of the computing node in the neural network from the storage device, expanding the number of bits occupied by the target data from the first number of bits to the second number of bits.
32. The device according to claim 30 or 31, wherein the processor being configured to expand the number of bits occupied by the target data from the first number of bits to the second number of bits specifically comprises:
adding a total of N padding bits at the head and/or tail of the first number of bits occupied by the target data, N being the difference between the second number of bits and the first number of bits.
33. The device according to claim 29, wherein the processor being configured to calculate the output data of the computing node according to the target data specifically comprises:
performing a preset computation according to the target data to obtain a computation result;
quantizing the computation result using the first number of bits to obtain the output data of the computing node, wherein the output data is fixed-point data represented by the first number of bits and occupies the second number of bits; and
writing the output data to the storage device.
34. The device according to claim 33, wherein the processor is further configured to compress the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain compressed output data; and
the processor being configured to write the output data to the storage device specifically comprises: writing the compressed output data to the storage device.
35. The device according to claim 34, wherein the processor being configured to compress the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain the compressed output data specifically comprises:
selecting, from the second number of bits occupied by the output data, the valid contiguous first number of bits as the compressed output data.
  36. 根据权利要求29-35任一项所述的设备,其特征在于,所述目标数据包括权重参数和/或激活数据。The device according to any one of claims 29-35, wherein the target data includes weight parameters and/or activation data.
  37. 根据权利要求36所述的设备,其特征在于,所述神经网络是根据训练策略训练得到,在所述目标数据包括权重参数时,所述训练策略包括第一训练策略;The device according to claim 36, wherein the neural network is trained according to a training strategy, and when the target data includes a weight parameter, the training strategy includes a first training strategy;
    所述第一训练策略包括:在训练过程中存储浮点型权重参数,在根据所述计算节点的浮点型权重参数进行计算之前,采用所述第一比特数对所述浮点型权重参数进行量化。The first training strategy includes: storing floating-point weight parameters during the training process, and before performing calculations based on the floating-point weight parameters of the computing node, using the first bit number to calculate the floating-point weight parameters Quantify.
  38. The device according to claim 37, wherein quantizing the floating-point weight parameters using the first number of bits comprises:
    transforming the floating-point weight parameters to a first preset range to obtain transformed floating-point weight parameters, the minimum value of the first preset range being greater than or equal to 0;
    quantizing the transformed floating-point weight parameters using the first number of bits.
  39. The device according to claim 38, wherein transforming the floating-point weight parameters to the first preset range comprises:
    performing a nonlinear transformation on the floating-point weight parameters to transform them to a second preset range, the minimum value of the second preset range being less than 0;
    performing scaling and shift transformations on the nonlinearly transformed floating-point weight parameters to transform them to the first preset range.
  40. The device according to claim 39, wherein the second preset range comprises -1 to 1.
  41. The device according to claim 38, wherein the first preset range comprises 0 to 1.
  42. The device according to any one of claims 38-41, wherein the operations involved in quantizing the floating-point weight parameters using the first number of bits are differentiable.
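One way claims 38 to 42 could fit together is sketched below. The tanh-based normalization is a common choice of nonlinear map into (-1, 1) but is an assumption here, as is the straight-through treatment of rounding that keeps the chain differentiable during training.

    import numpy as np

    def quantize_weights(w: np.ndarray, first_bits: int = 4) -> np.ndarray:
        levels = (1 << first_bits) - 1
        # Nonlinear transform into (-1, 1), the second preset range.
        t = np.tanh(w)
        w_nl = t / (np.max(np.abs(t)) + 1e-12)
        # Scale and shift into [0, 1], the first preset range.
        w01 = (w_nl + 1.0) / 2.0
        # Quantize to 2**first_bits levels; in training, round() would be
        # paired with a straight-through estimator so gradients can flow.
        return np.round(w01 * levels) / levels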
  43. The device according to claim 36, wherein the neural network is trained according to a training strategy, and when the target data comprises activation data, the training strategy comprises a second training strategy;
    the second training strategy comprises: storing floating-point activation data during training, and quantizing the floating-point activation data using the first number of bits before performing calculations based on the floating-point activation data of the computing node.
  44. The device according to claim 43, wherein quantizing the floating-point activation data using the first number of bits comprises:
    when the value range of the floating-point activation data does not include negative numbers, directly quantizing the floating-point activation data using the first number of bits.
  45. The device according to claim 43, wherein quantizing the floating-point activation data using the first number of bits comprises:
    when the value range of the floating-point activation data includes negative numbers, performing scaling and shift transformations on the floating-point activation data to transform it to a third preset range, the minimum value of the third preset range being greater than or equal to 0;
    quantizing the scaled and shifted floating-point activation data using the first number of bits.
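A sketch covering both branches of claims 44 and 45; the clipping bounds a_min and a_max are illustrative assumptions, since the claims do not fix the activation range.

    import numpy as np

    def quantize_activations(a: np.ndarray, first_bits: int = 4,
                             a_min: float = -1.0,
                             a_max: float = 1.0) -> np.ndarray:
        levels = (1 << first_bits) - 1
        if a_min >= 0:
            # Value range has no negatives (e.g. after ReLU): quantize directly.
            a01 = np.clip(a, 0.0, a_max) / a_max
        else:
            # Range includes negatives: scale and shift into [0, 1] first.
            a01 = (np.clip(a, a_min, a_max) - a_min) / (a_max - a_min)
        return np.round(a01 * levels) / levels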
  46. The device according to any one of claims 37-45, wherein the training strategy further comprises a third training strategy;
    the third training strategy comprises: storing floating-point offset parameters during training, and quantizing the floating-point offset parameters using a third number of bits before performing calculations based on the floating-point offset parameters of the computing node, the third number of bits being greater than the first number of bits.
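A sketch of the wider quantization that claim 46 reserves for offset (bias) parameters; third_bits = 16 and the symmetric signed grid are assumptions chosen only to show a third bit number larger than the first.

    import numpy as np

    def quantize_offsets(b: np.ndarray, third_bits: int = 16,
                         b_max: float = 1.0) -> np.ndarray:
        levels = (1 << (third_bits - 1)) - 1  # signed, symmetric grid
        scale = b_max / levels
        return np.clip(np.round(b / scale), -levels, levels) * scale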
  47. The device according to any one of claims 29-46, wherein the computing nodes are a subset of the computing nodes in the neural network.
  48. The device according to claim 47, wherein the processor is further configured to:
    read node data of other computing nodes in the neural network from the storage device, the node data being fixed-point data represented by the second number of bits.
  49. The device according to claim 48, wherein the other computing nodes are computing nodes for which representing node data with the first number of bits, compared with representing node data with the second number of bits, reduces the accuracy of the neural network by more than a degree threshold.
  50. The device according to claim 49, wherein the other computing nodes comprise one or more of the following:
    computing nodes in the first layer of the neural network, computing nodes in the last layer of the neural network, and coordinate-related computing nodes in the neural network.
  51. The device according to claim 29, wherein the second number of bits is equal to 8.
  52. The device according to claim 51, wherein the first number of bits is equal to 4.
  53. The device according to claim 29, wherein the second number of bits is equal to 16.
  54. The device according to claim 53, wherein the first number of bits is equal to 8.
  55. The device according to claim 29, wherein the computing device comprises one or more of the following:
    a central processing unit (CPU), an advanced reduced instruction set (RISC) processor (ARM), a digital signal processor (DSP), and a graphics processing unit (GPU).
  56. The device according to claim 29, wherein the neural network comprises a convolutional neural network.
  57. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprising at least one piece of code executable by a computer to control the computer to perform the method according to any one of claims 1-28.
  58. A computer program which, when executed by a computer, implements the method according to any one of claims 1-28.
PCT/CN2019/103198 2019-08-29 2019-08-29 Data processing method and device WO2021035598A1 (en)

Priority Applications (2)

PCT/CN2019/103198 (WO2021035598A1), priority date 2019-08-29, filing date 2019-08-29: Data processing method and device
CN201980033583.9A (CN112189216A), priority date 2019-08-29, filing date 2019-08-29: Data processing method and device

Applications Claiming Priority (1)

PCT/CN2019/103198 (WO2021035598A1), priority date 2019-08-29, filing date 2019-08-29: Data processing method and device

Publications (1)

Publication Number Publication Date
WO2021035598A1 (en)

Family

ID=73918993

Family Applications (1)

PCT/CN2019/103198 (WO2021035598A1), priority date 2019-08-29, filing date 2019-08-29: Data processing method and device

Country Status (2)

Country Link
CN (1) CN112189216A (en)
WO (1) WO2021035598A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022151505A1 (en) * 2021-01-18 2022-07-21 深圳市大疆创新科技有限公司 Neural network quantization method and apparatus, and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502626A (en) * 2016-11-03 2017-03-15 北京百度网讯科技有限公司 Data processing method and device
CN108108809A (en) * 2018-03-05 2018-06-01 山东领能电子科技有限公司 A kind of hardware structure and its method of work that acceleration is made inferences for convolutional Neural metanetwork
CN108229670A (en) * 2018-01-05 2018-06-29 中国科学技术大学苏州研究院 Deep neural network based on FPGA accelerates platform
CN109074335A (en) * 2017-12-29 2018-12-21 深圳市大疆创新科技有限公司 Data processing method, equipment, dma controller and computer readable storage medium
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 A kind of data method for carrying, Related product and computer storage medium


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3128555A1 (en) * 2021-10-21 2023-04-28 Stmicroelectronics (Rousset) Sas COMPUTER SYSTEM FOR PROCESSING PIXEL DATA OF AN IMAGE

Also Published As

Publication number Publication date
CN112189216A (en) 2021-01-05


Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19943813; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19943813; Country of ref document: EP; Kind code of ref document: A1)