CN114730331A - Data processing apparatus and data processing method


Info

Publication number: CN114730331A
Application number: CN201980102503.0A
Authority: CN (China)
Prior art keywords: initial, convolution kernel, layer, parameter, parameters
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 董镇江, 杨帆, 李震桁
Current assignee: Huawei Technologies Co Ltd
Original assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of CN114730331A

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization


Abstract

The present application provides a data processing apparatus and a data processing method in the field of artificial intelligence. The data processing apparatus is configured to perform convolution processing on a matrix to be convolved according to a two-dimensional convolution kernel. The two-dimensional convolution kernel includes one first parameter and M second parameters, and the matrix to be convolved includes one first eigenvalue and M second eigenvalues, where the M second parameters correspond one-to-one to the M second eigenvalues and the first parameter corresponds to the first eigenvalue. The data processing apparatus includes: M multipliers and M-1 first adders, configured to perform a multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and a second adder, configured to add the multiply-accumulate result and the first eigenvalue to obtain the convolution result of the two-dimensional convolution kernel and the matrix to be convolved. The data processing apparatus and the data processing method help avoid wasting resources and improve resource utilization.

Description

Data processing apparatus and data processing method

Technical Field
The present application relates to data calculation in the field of artificial intelligence, and more particularly, to a data processing apparatus and a data processing method.
Background
Artificial intelligence (AI) is the theory, method, technique and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
A neural network (NN) is an important branch of artificial intelligence. It is a network structure that simulates the behavioral characteristics of animal neural networks to process information. The structure of a neural network is formed by a large number of interconnected nodes (also called neurons), and information is processed by learning and training on the input information based on a specific operation model. A neural network comprises an input layer, hidden layers and an output layer. The input layer is responsible for receiving input signals, the output layer is responsible for outputting the calculation results of the neural network, and the hidden layers are responsible for calculation processes such as learning and training and serve as the memory units of the network. The memory function of a hidden layer is represented by a weight matrix, and each neuron corresponds to a weight coefficient.
A convolutional neural network (CNN) is a multi-layer neural network in which each layer is composed of a plurality of two-dimensional planes, each plane is composed of a plurality of independent neurons, and each feature plane may be composed of a plurality of neural units arranged in a rectangle. The neural units of the same feature plane share weights, and the shared weights are convolution kernels.
In a convolutional neural network, the convolution operation performed by the processor is usually a convolution of the feature information in a feature plane with a weight, which is converted into a matrix multiplication between a signal matrix and a weight matrix. In the actual matrix multiplication, the signal matrix and the weight matrix are partitioned into blocks to obtain a plurality of fractal signal matrices and fractal weight matrices, and matrix multiplication and accumulation operations are then performed on these fractal signal matrices and fractal weight matrices.
Currently, when a processor performs convolution on an input layer, each processing unit (PE) used for convolution operations in the processor generally includes 2^n multiply-and-accumulate (MAC) units. Specifically, each PE used by the processor to perform convolution calculations includes 2^n multiplication units and 2^n - 1 addition units. However, the size of the convolution kernel used in the convolution operation is generally less than 2^n, so some of the multiplication units and addition units idle when the PE performs the convolution operation, thereby wasting processor resources.
For example, as shown in fig. 15, when a PE performing convolution operations includes 16 MACs and the convolution kernel size is 3 x 3, at least multiplication units 9 to 15 and addition units 8, 9, 10 and 12 in the PE will idle, causing a waste of processing resources.
Disclosure of Invention
The present application provides a data processing apparatus and a data processing method, which help avoid wasting resources and improve resource utilization.
In a first aspect, the present application provides a data processing apparatus, where the data processing apparatus is configured to perform convolution processing on a to-be-convolved matrix according to a two-dimensional convolution kernel, the two-dimensional convolution kernel includes a first parameter and M second parameters, the to-be-convolved matrix includes a first eigenvalue and M second eigenvalues, the M second parameters are in one-to-one correspondence with the M second eigenvalues, and the first parameter corresponds to the first eigenvalue. The data processing apparatus includes: m multipliers and M-1 first adders, configured to perform multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; and the second adder is used for performing addition operation on the multiply-accumulate result and the first characteristic value to obtain a convolution result of the two-dimensional convolution kernel and the matrix to be convolved. Wherein M is a positive integer greater than 1.
When performing convolution processing on a matrix to be convolved of size M+1 according to a convolution kernel of size M+1, the data processing apparatus computes the multiply-accumulate result of M parameters in the convolution kernel and M eigenvalues in the matrix to be convolved using M multipliers and M-1 adders, computes the sum of that result and the (M+1)-th eigenvalue of the matrix to be convolved using one additional adder, and finally takes this sum as the convolution result of the convolution kernel and the matrix to be convolved. This makes it possible, when the size of the convolution kernel to be computed is one greater than the number of multiply-accumulate units in the data processing apparatus, to avoid using a plurality of conventional data processing apparatuses in order to provide M+1 multipliers and M adders, thereby avoiding wasted resources and improving resource utilization.
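For illustration only, the computation described above can be sketched in software as follows (a minimal sketch, not part of the claimed apparatus; the function name convolve_patch and the concrete values are assumed, with M = 8 as in the 3 x 3 example given later):

```python
def convolve_patch(second_params, second_vals, first_val):
    """Convolution of an (M+1)-parameter kernel whose first parameter equals 1.

    M multipliers and M-1 first adders produce the multiply-accumulate result;
    a single second adder then adds the first eigenvalue."""
    assert len(second_params) == len(second_vals)
    mac_result = 0
    for w, x in zip(second_params, second_vals):   # M multipliers, M-1 first adders
        mac_result += w * x
    return mac_result + first_val                  # second adder

# Illustrative values for a flattened 3 x 3 kernel whose first parameter (value 1) is split off.
second_params = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.75, 1.0]   # M = 8 second parameters
second_vals   = [3, 1, 4, 1, 5, 9, 2, 6]                       # M = 8 second eigenvalues
first_val     = 7                                              # eigenvalue paired with the parameter 1
print(convolve_patch(second_params, second_vals, first_val))
```

With M = 8, the nine-parameter 3 x 3 kernel is handled by eight multipliers, seven first adders and one second adder, instead of two conventional eight-MAC processing elements.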
The matrix to be convolved may include features of an image input to the neural network, or may include speech features included in speech data input to the neural network, or may include text features included in text data input to the neural network, or the like.
In some possible implementations, the first parameter is equal to 1. That is, the first parameter corresponding to the first characteristic value input to the second adder is 1. Thus, the accuracy of the convolution result calculated by the data processing device can be ensured.
In some possible implementations, the apparatus further includes a processor configured to: determining the two-dimensional convolution kernel according to an initial convolution kernel, wherein the initial convolution kernel comprises a first initial parameter and M second initial parameters, the M second parameters are in one-to-one correspondence with the M second initial parameters, the first parameter corresponds to the first initial parameter, the second parameter is equal to the quotient of the second initial parameter corresponding to the second parameter and the first initial parameter, and the first initial parameter is not zero.
When the data processing apparatus in this implementation is used to calculate a convolution result between an initial convolution kernel and a matrix to be convolved, all parameters in the initial convolution kernel may be first divided by one of the nonzero parameters (i.e., a first initial parameter) by the processor, so that one parameter (i.e., a first parameter) in the obtained two-dimensional convolution kernel may be 1. Thus, when the convolution result of the two-dimensional convolution kernel and the matrix to be convolved is calculated by using the M multipliers and the M adders, a more accurate value can be obtained.
Optionally, the processor may be further configured to: determine a convolution result of the initial convolution kernel and the matrix to be convolved according to the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, wherein the convolution result of the initial convolution kernel and the matrix to be convolved is equal to the product of the convolution result of the two-dimensional convolution kernel and the matrix to be convolved and the first initial parameter.
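The normalization and rescaling described in the two preceding paragraphs can be sketched as follows (an illustrative sketch assuming the first initial parameter sits at a known position in the kernel; normalize_kernel is an assumed helper name, not terminology from the application):

```python
import numpy as np

def normalize_kernel(initial_kernel, idx=(0, 0)):
    """Divide every parameter of the initial kernel by the chosen nonzero first initial
    parameter, returning the two-dimensional kernel (one entry equals 1) and the scale K."""
    K = initial_kernel[idx]
    assert K != 0
    return initial_kernel / K, K

initial_kernel = np.array([[2.0, 4.0, 6.0],
                           [8.0, 2.0, 2.0],
                           [4.0, 6.0, 8.0]])
patch = np.arange(9, dtype=float).reshape(3, 3)   # matrix to be convolved

eq_kernel, K = normalize_kernel(initial_kernel)
eq_result = np.sum(eq_kernel * patch)             # what the M-multiplier apparatus produces
                                                  # (the entry equal to 1 contributes the plain first eigenvalue)
print(np.isclose(eq_result * K, np.sum(initial_kernel * patch)))   # True: rescaling recovers the initial-kernel result
```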
In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is greater than a first threshold, before determining the two-dimensional convolution kernel from the initial convolution kernel, the processor is further configured to: reduce the M second parameters or the M second initial parameters by a factor of m, wherein the quotient of any second initial parameter reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here m may be a positive integer.
In the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second initial parameters may first be reduced by a factor of m, so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold, and thus the second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be less than or equal to the maximum value representable by the processor, or less than or equal to the maximum value representable by the multipliers and adders in the data processing apparatus. Therefore, the calculated convolution result will not become inaccurate by overflowing the maximum representable range of the data processing apparatus.
Alternatively, in the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second parameters may be reduced by a factor of m, so that the reduced second parameters are not greater than the first threshold. The first threshold may be less than or equal to the maximum value representable by the processor, or less than or equal to the maximum value representable by the multipliers and adders in the data processing apparatus. Therefore, the calculated convolution result will not become inaccurate by overflowing the maximum representable range of the data processing apparatus.
In the case where the processor is further configured to reduce the M second parameters or the M second initial parameters by a factor of m, before the multiply-accumulate result and the first eigenvalue are added, the processor is further configured to: reduce the first eigenvalue by a factor of m; correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the reduced first eigenvalue. This is because, when the second initial parameters or the second parameters are reduced by a factor of m, the first parameter should also be reduced by a factor of m, and therefore the contribution of the first parameter acting on the first eigenvalue should be reduced by a factor of m. The data processing apparatus instead directly reduces the first eigenvalue by a factor of m to ensure the accuracy of the convolution result.
Alternatively, the processor may reduce the first eigenvalue by a shift operation. Specifically, the first eigenvalue may be shifted right (when m is a power of two).
In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is less than a second threshold, before the determining the two-dimensional convolution kernel according to the initial convolution kernel, the processor is further configured to: expanding the M second parameters or the M second initial parameters by n times, wherein the quotient of any second initial parameter expanded by n times and the first initial parameter is not less than the second threshold value. Wherein n may be a positive integer.
In the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is smaller than the second threshold, all the second initial parameters may first be expanded by a factor of n, so that the quotient of any expanded second initial parameter and the first initial parameter is not smaller than the second threshold, and thus the second parameter obtained by dividing the expanded second initial parameter by the first initial parameter is not smaller than the second threshold. The second threshold may be greater than or equal to the minimum value representable by the processor, or greater than or equal to the minimum value representable by the multipliers and adders in the data processing apparatus. This prevents the calculated convolution result from becoming inaccurate by falling below the minimum representable range of the data processing apparatus.
Alternatively, in the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is smaller than the second threshold, all the second parameters may be expanded by a factor of n, so that the expanded second parameters are not smaller than the second threshold. The second threshold may be greater than or equal to the minimum value representable by the processor, or greater than or equal to the minimum value representable by the multipliers and adders in the data processing apparatus. This prevents the calculated convolution result from becoming inaccurate by falling below the minimum representable range of the data processing apparatus.
In some possible implementations, before the adding the multiply-accumulate result and the first eigenvalue, the processor is further configured to: enlarging the first eigenvalue by a factor of n; correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the expanded first eigenvalue.
This is because, when the second initial parameters or the second parameters are expanded by a factor of n, the first parameter should also be expanded by a factor of n, and therefore the contribution of the first parameter acting on the first eigenvalue should be expanded by a factor of n. The data processing apparatus instead directly expands the first eigenvalue by a factor of n to ensure the accuracy of the convolution result.
Alternatively, the processor may expand the first eigenvalue by a shift operation. Specifically, the first eigenvalue may be shifted left (when n is a power of two).
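The two threshold guards above might look as follows in a software sketch (assumed helper names; the application describes these as processor operations, and restricting m and n to powers of two so that the eigenvalue compensation becomes a right or left shift is an assumption of this sketch):

```python
def scale_down_if_overflow(second_params, first_val, first_threshold, m=2):
    """Overflow guard: if any second parameter exceeds the first threshold, reduce all
    second parameters by a factor of m and reduce the first eigenvalue the same way
    (a right shift when m is a power of two)."""
    if max(abs(p) for p in second_params) > first_threshold:
        second_params = [p / m for p in second_params]
        first_val = first_val / m
    return second_params, first_val


def scale_up_if_underflow(second_params, first_val, second_threshold, n=2):
    """Underflow guard: if any nonzero second parameter falls below the second threshold,
    expand all second parameters by a factor of n and expand the first eigenvalue the
    same way (a left shift when n is a power of two)."""
    smallest = min((abs(p) for p in second_params if p != 0), default=second_threshold)
    if smallest < second_threshold:
        second_params = [p * n for p in second_params]
        first_val = first_val * n
    return second_params, first_val
```

Each guard applies its factor once; repeating it until the condition clears, or choosing m and n directly from the ratio, are straightforward variations.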
In some possible implementations, the processor includes one or more of the following in combination: a central processing unit, a graphics processor, or a neural network processor.
In some possible implementations, M is equal to 8, and the two-dimensional convolution kernel is a 3 x 3 matrix. Optionally, the M multipliers and the M-1 first adders may constitute a product accumulator.
In some possible implementations, M is equal to 24 and the two-dimensional convolution kernel is a 5 x 5 matrix. Optionally, the M multipliers and the M-1 first adders constitute 3 product accumulators, wherein each of the product accumulators includes M/3 multipliers and M/3-1 first adders.
In some possible implementations, the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, where N is an integer greater than 2.
In a second aspect, the present application provides a data processing method, including: obtaining an equivalent convolution kernel of an L-th layer according to an initial convolution kernel of the L-th layer of a neural network, wherein the parameters of the equivalent convolution kernel of the L-th layer are obtained based on the quotient of the corresponding parameters of the initial convolution kernel of the L-th layer and a first initial parameter in the initial convolution kernel of the L-th layer, the value of the first initial parameter is K, K is a nonzero number, and the equivalent convolution kernel of the L-th layer is used for performing convolution processing on the feature map of the L-th layer; acquiring an initial convolution kernel of the L+1 th layer in the L+1 th layer of the neural network, wherein the initial convolution kernel of the L+1 th layer has a mapping relation with the initial convolution kernel of the L-th layer; expanding each parameter in the initial convolution kernel of the L+1 th layer by a factor of K; and determining the equivalent convolution kernel of the L+1 th layer according to the expanded initial convolution kernel of the L+1 th layer. Here K may be a positive integer.
In this method, the parameters in the initial convolution kernel of the L-th layer of the neural network are processed so that, when convolution processing is performed with the resulting equivalent convolution kernel, one multiplication operation is saved. Thus, when the number of parameters in the initial convolution kernel is one more than the number of multipliers in the device performing the convolution calculation, there is no need to use an additional device whose remaining multipliers and adders would idle, which saves resources and improves resource utilization.
In addition, although the equivalent convolution kernel is reduced by a factor of K during the processing of the initial convolution kernel, the fact that the convolution result obtained with the equivalent convolution kernel is K times smaller than the convolution result obtained with the initial convolution kernel can be compensated by expanding the corresponding parameters in the L+1 th layer of the neural network.
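A hedged sketch of this cross-layer compensation, assuming plain weight arrays for the two layers (the helper name build_equivalent_kernels is illustrative, and the sketch assumes the factor K can be carried directly from the output of layer L into the weights of layer L+1, i.e., it ignores any intervening scale-breaking nonlinearity):

```python
import numpy as np

def build_equivalent_kernels(kernel_L, kernel_L1, first_idx=(0, 0)):
    """Derive the layer-L equivalent kernel and compensate layer L+1.

    kernel_L  : initial convolution kernel of the L-th layer
    kernel_L1 : initial convolution kernel of the (L+1)-th layer, which maps onto
                the output produced with kernel_L
    """
    K = kernel_L[first_idx]        # first initial parameter, value K, must be nonzero
    assert K != 0
    eq_kernel_L = kernel_L / K     # layer-L equivalent kernel; the entry at first_idx becomes 1
    eq_kernel_L1 = kernel_L1 * K   # every parameter of the layer-(L+1) initial kernel expanded K times
    return eq_kernel_L, eq_kernel_L1

# The output of layer L computed with eq_kernel_L is 1/K of the output computed with
# kernel_L; multiplying the layer-(L+1) parameters by K cancels that factor, so the
# overall result is unchanged under the stated assumption.
```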
The feature map input to the L-th layer may include features of an image input to the neural network, or may include voice features included in voice data input to the neural network, or may include text features included in text data input to the neural network, or the like.
In some possible implementations, the equivalent convolution kernel of the L-th layer includes M second parameters and one first parameter, where the M second parameters respectively correspond to M second feature values of the feature map, the first parameter corresponds to a first feature value of the feature map, and the first parameter is 1, where the performing convolution processing on the feature map of the L-th layer includes: performing multiply-accumulate operation on the M second parameters and the M second characteristic values to obtain a multiply-accumulate result; and performing addition operation on the multiply-accumulate result and the first characteristic value.
That is to say, the parameters in the equivalent convolution kernel may be obtained by dividing all the parameters in the initial convolution kernel by one nonzero first initial parameter, in which case the first parameter of the equivalent convolution kernel corresponding to the first initial parameter is 1. In this way, when the feature map is processed with the equivalent convolution kernel, the multiply-accumulate units are needed only for computing the multiply-accumulate of the M second parameters (all parameters other than the first parameter) with the corresponding M eigenvalues; after this multiply-accumulate result is obtained, it is added to the first eigenvalue corresponding to the first parameter to obtain the convolution result of the equivalent convolution kernel and the feature map.
In some possible implementations, obtaining the parameters of the L-th layer equivalent convolution kernel based on the quotient of the corresponding parameters of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel includes: when the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is greater than a first threshold, reducing the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of m, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here m may be a positive integer.
In the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is greater than the first threshold, all the second initial parameters may first be reduced by a factor of m, so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold, and thus the second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be less than or equal to the maximum value representable by the processor, or less than or equal to the maximum value representable by the multipliers and adders in the data processing apparatus. Therefore, the calculated convolution result will not become inaccurate by overflowing the maximum representable range of the data processing apparatus.
Optionally, the adding the multiply-accumulate result and the first feature value includes: reducing the first characteristic value by m times; and performing addition operation on the multiply-accumulate result and the reduced first characteristic value.
In some possible implementations, obtaining the parameters of the L-th layer equivalent convolution kernel based on the quotient of the corresponding parameters of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel includes: when the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is smaller than a second threshold, expanding the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of n, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer expanded by a factor of n and the first initial parameter is not smaller than the second threshold. Here n may be a positive integer.
In the data processing apparatus of this implementation, when the quotient of any second initial parameter in the initial convolution kernel and the first initial parameter is smaller than the second threshold, all the second initial parameters may first be expanded by a factor of n, so that the quotient of any expanded second initial parameter and the first initial parameter is not smaller than the second threshold, and thus the second parameter obtained by dividing the expanded second initial parameter by the first initial parameter is not smaller than the second threshold. The second threshold may be greater than or equal to the minimum value representable by the processor, or greater than or equal to the minimum value representable by the multipliers and adders in the data processing apparatus. This prevents the calculated convolution result from becoming inaccurate by falling below the minimum representable range of the data processing apparatus.
Optionally, the adding the multiply-accumulate result and the first feature value includes: enlarging the first eigenvalue by a factor of n; and performing addition operation on the multiply-accumulate result and the expanded first characteristic value.
In a third aspect, the present application provides a data processing apparatus comprising: the processing module is used for obtaining an equivalent convolution kernel of an L-th layer according to an initial convolution kernel of the L-th layer of the neural network, wherein parameters of the equivalent convolution kernel of the L-th layer are obtained based on a quotient of corresponding parameters of the initial convolution kernel of the L-th layer and first initial parameters in the initial convolution kernel of the L-th layer, values of the first initial parameters are K, K is a nonzero number, and the equivalent convolution kernel of the L-th layer is used for performing convolution processing on the feature map of the L-th layer; an obtaining module, configured to obtain an initial convolution kernel of an L +1 th layer in an L +1 th layer of the neural network, where the initial convolution kernel of the L +1 th layer has a mapping relationship with the initial convolution kernel of the L th layer; the expanding module is used for expanding each parameter in the initial convolution kernel of the L +1 th layer by K times; and the determining module is used for determining the equivalent convolution kernel of the L +1 th layer according to the expanded initial convolution kernel of the L +1 th layer. Wherein K may be a positive integer.
The feature map input to the L-th layer may include features of an image input to the neural network, or may include voice features included in voice data input to the neural network, or may include text features included in text data input to the neural network, or the like.
In some possible implementations, the equivalent convolution kernel of the L-th layer includes M second parameters and one first parameter, where the M second parameters respectively correspond to M second eigenvalues of the feature map, the first parameter corresponds to a first eigenvalue of the feature map, and the first parameter is 1; performing convolution processing on the feature map of the L-th layer includes: performing multiply-accumulate operation on the M second parameters and the M second characteristic values to obtain a multiply-accumulate result; and performing addition operation on the multiply-accumulate result and the first characteristic value.
In some possible implementations, obtaining the parameters of the L-th layer equivalent convolution kernel based on the quotient of the corresponding parameters of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel includes: when the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is greater than a first threshold, reducing the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of m, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer reduced by a factor of m and the first initial parameter is not greater than the first threshold. Here m may be a positive integer.
Optionally, the adding the multiply-accumulate result and the first feature value includes: reducing the first characteristic value by m times; and performing addition operation on the multiply-accumulate result and the reduced first characteristic value.
In some possible implementations, obtaining the parameters of the L-th layer equivalent convolution kernel based on the quotient of the corresponding parameters of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel includes: when the quotient of at least one parameter of the initial convolution kernel of the L-th layer and the first initial parameter is smaller than a second threshold, expanding the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of n, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer expanded by a factor of n and the first initial parameter is not smaller than the second threshold. Here n may be a positive integer.
Optionally, the adding the multiply-accumulate result and the first feature value includes: enlarging the first eigenvalue by a factor of n; and performing addition operation on the multiply-accumulate result and the expanded first characteristic value.
In a fourth aspect, the present application provides a data processing apparatus, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured to perform the method of the second aspect when the program stored in the memory is executed.
In a fifth aspect, the present application provides a computer readable medium storing instructions for execution by a device to implement the method of the second aspect.
In a sixth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the second aspect.
In a seventh aspect, the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored on a memory through the data interface to execute the method in the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to execute the method in the second aspect.
In an eighth aspect, the present application provides a computing device comprising a processor and a memory, wherein: the memory has stored therein computer instructions that are executed by the processor to implement the method of the second aspect.
In a ninth aspect, the present application provides a data processing apparatus, which includes a programmable device and a memory, wherein the memory is used for storing a configuration file required by the programmable device to operate, and the programmable device is used for reading the configuration file from the memory and executing the configuration file, so as to implement the method of the second aspect.
Optionally, as an implementation, the programmable device includes a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device (CPLD).
Drawings
Fig. 1 is a schematic structural diagram of a convolutional neural network provided in the present application.
Fig. 2 is a schematic structural diagram of another convolutional neural network provided in the present application.
Fig. 3 is a schematic diagram of a hardware structure of a chip provided in the present application.
Fig. 4 is a schematic structural diagram of a system architecture provided in the present application.
Fig. 5 is a schematic flow chart of a data processing method provided in the present application.
Fig. 6 is a schematic diagram of a method for obtaining an equivalent convolution kernel according to the present application.
Fig. 7 is a schematic diagram of a method for obtaining equivalent parameters according to the present application.
Fig. 8 is a schematic diagram of another method for obtaining equivalent parameters provided in the present application.
Fig. 9 is a schematic diagram of another method for obtaining equivalent parameters provided in the present application.
Fig. 10 is a schematic diagram of another method for obtaining an equivalent convolution kernel according to the present application.
Fig. 11 is a schematic diagram of another method for obtaining an equivalent convolution kernel according to the present application.
Fig. 12 is a schematic structural diagram of a data processing apparatus provided in the present application.
Fig. 13 is a schematic structural diagram of another data processing apparatus provided in the present application.
Fig. 14 is a schematic structural diagram of another data processing apparatus provided in the present application.
Fig. 15 is a schematic structural diagram of another data processing apparatus provided in the present application.
Fig. 16 is a schematic diagram of a method for reading data according to the present application.
Fig. 17 is a schematic diagram of a method for reading data according to the present application.
Fig. 18 is a schematic structural diagram of another data processing apparatus provided in the present application.
Fig. 19 is a schematic structural diagram of another data processing apparatus provided in the present application.
Fig. 20 is a schematic structural diagram of a data processing apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application relate to related applications of neural networks, and in order to better understand the solution of the embodiments of the present application, the following first introduces related terms and other related concepts of neural networks that may be related to the embodiments of the present application.
A convolutional neural network is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor may be viewed as a filter, and the convolution process may be viewed as convolving an input image or a convolved feature plane (feature map) with a trainable filter. The convolutional layer is the layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neighboring neurons. A convolutional layer usually contains several feature planes, and each feature plane may be composed of several neural units arranged in a rectangle. The neural units of the same feature plane share weights, and the shared weights are convolution kernels. Sharing weights may be understood as meaning that the way image information is extracted is independent of location; the underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part, and the same learned image information can be used for all locations on the image. In the same convolutional layer, a plurality of convolution kernels may be used to extract different image information; generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
The structure of CNN is described in detail below with reference to fig. 1. As described in the introduction of the basic concept above, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, where the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
The structure of the convolutional neural network in the embodiment of the present application may be as shown in fig. 1. In fig. 1, the convolutional neural network (CNN) 300 may include an input layer 310, a convolutional/pooling layer 320 (where the pooling layer is optional), and a neural network layer 330.
Taking image processing as an example (similar operation when the input data is text or voice), the input layer 310 may obtain an image to be processed, and deliver the obtained image to be processed to the convolutional layer/pooling layer 320 and the following neural network layer 330 for processing, so as to obtain a processing result of the image.
The following describes the internal layer structure in CNN 300 in fig. 1 in detail.
Convolutional layer/pooling layer 320:
Convolutional layer:
the convolutional layer/pooling layer 320 as shown in fig. 1 may comprise layers as in examples 321-326, for example: in one implementation, 321 layers are convolutional layers, 322 layers are pooling layers, 323 layers are convolutional layers, 324 layers are pooling layers, 325 layers are convolutional layers, 326 layers are pooling layers; in another implementation, 321, 322 are convolutional layers, 323 are pooling layers, 324, 325 are convolutional layers, and 326 are pooling layers. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The internal operation principle of one convolutional layer will be described below by taking convolutional layer 321 as an example and taking input data as an image as an example. The internal working principle of convolutional layers is similar when the input data is speech or text or other types of data.
Convolutional layer 321 may include a plurality of convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually processed over the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride) in the horizontal direction, so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends over the entire depth of the input image during the convolution operation. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows x columns), i.e., a plurality of matrices of the same type, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "plurality" described above. Different weight matrices may be used to extract different features from the image: for example, one weight matrix extracts image edge information, another weight matrix extracts a particular color of the image, and yet another weight matrix blurs unwanted noise in the image. The plurality of weight matrices have the same size (rows x columns), so the convolution feature maps extracted by them also have the same size, and the extracted convolution feature maps of the same size are combined to form the output of the convolution operation.
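As a concrete illustration of the sliding-window operation just described, the following simplified single-channel, stride-1 sketch (not code from the application) computes one output feature map from one weight matrix:

```python
import numpy as np

def conv2d_single_channel(image, weight, stride=1):
    """Slide a weight matrix over the image and take the weighted sum at each position."""
    kh, kw = weight.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * weight)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])   # one weight matrix extracting vertical-edge information
print(conv2d_single_channel(image, edge_kernel))   # 3 x 3 feature map
```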
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 300 can perform correct prediction.
When convolutional neural network 300 has multiple convolutional layers, the initial convolutional layers (e.g., 321) tend to extract more general features, which may also be referred to as low-level features; as the depth of convolutional neural network 300 increases, the features extracted by later convolutional layers (e.g., 326) become more complex, for example features with high-level semantics, and features with higher-level semantics are more suitable for the problem to be solved.
A pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 321-326 illustrated by 320 in fig. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, for example, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may comprise an average pooling operator and/or a maximum pooling operator for sampling the input image into an image of smaller size. The average pooling operator may compute the pixel values in the image over a certain range to produce an average as the result of the average pooling. The max pooling operator may take the pixel with the largest value within a particular range as the result of the max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
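A minimal sketch of the average and maximum pooling operators described above (illustrative only, with a non-overlapping 2 x 2 window assumed):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample a feature map with a non-overlapping pooling window."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i + size, j:j + size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))   # each output pixel is the maximum of its 2 x 2 sub-region
print(pool2d(fm, mode="avg"))   # each output pixel is the average of its 2 x 2 sub-region
```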
The neural network layer 330:
after processing by convolutional layer/pooling layer 320, convolutional neural network 300 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 320 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (required class information or other relevant information), convolutional neural network 300 needs to generate one or a set of the required number of classes of outputs using neural network layer 330. Accordingly, a plurality of hidden layers (331, 332 to 33n shown in fig. 3) and an output layer 340 may be included in the neural network layer 330, and parameters included in the plurality of hidden layers may be obtained by pre-training according to related training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
Following the hidden layers in the neural network layer 330, the last layer of the whole convolutional neural network 300 is the output layer 340. The output layer 340 has a loss function similar to the categorical cross entropy and is specifically used for calculating the prediction error. Once the forward propagation of the whole convolutional neural network 300 is completed (i.e., propagation from 310 to 340 in fig. 1 is the forward propagation), the backward propagation (i.e., propagation from 340 to 310 in fig. 1 is the backward propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 300 and the error between the result output by the convolutional neural network 300 through the output layer and the ideal result.
The structure of the neural network in the embodiment of the present application may also be as shown in fig. 2. In fig. 2, the convolutional neural network (CNN) 400 may include an input layer 410, a convolutional/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430. In contrast to fig. 1, the plurality of convolutional/pooling layers (421 to 426) in the convolutional/pooling layer 420 in fig. 2 are parallel, and the separately extracted features are all input to the neural network layer 430 for processing. The neural network layer 430 may include a plurality of hidden layers, hidden layer 1 through hidden layer n, which may be denoted 431 through 43n.
It should be noted that the convolutional neural networks shown in fig. 1 and fig. 2 are only examples of two possible convolutional neural networks in the embodiment of the present application, and in a specific application, the convolutional neural networks in the embodiment of the present application may also exist in the form of other network models.
The algorithms or operators for the various layers of the convolutional neural network shown in fig. 1 and 2 can be implemented in a chip as shown in fig. 3.
Fig. 3 is a hardware structure of a chip provided in an embodiment of the present application, where the chip includes a neural network processor 50. The chip may be provided in a client device 240 as shown in fig. 4 to implement a corresponding service. The chip may also be disposed in the training apparatus 220 as shown in fig. 4, to complete the training work of the training apparatus 220 and output the target model 201.
The neural network processor NPU 50 is mounted as a coprocessor on a host central processing unit (host CPU), which allocates tasks. The core part of the NPU is the arithmetic circuit 503; the controller 504 controls the arithmetic circuit 503 to fetch data from a memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) therein.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it in each PE of the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 501, performs a matrix operation with matrix B, and stores the partial results or final result of the obtained matrix C in the accumulator 508.
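A rough software analogue of this flow (an illustrative sketch; the tiling, the tile size and the function name are assumptions, not the NPU's actual behaviour):

```python
import numpy as np

def matmul_with_accumulator(A, B, tile=2):
    """Accumulate partial results of A @ B tile by tile, as an accumulator register would."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    accumulator = np.zeros((n, m))
    for start in range(0, k, tile):            # walk the shared dimension in tiles
        a_tile = A[:, start:start + tile]      # data taken from the input memory
        b_tile = B[start:start + tile, :]      # data taken from the weight memory
        accumulator += a_tile @ b_tile         # partial result kept in the accumulator
    return accumulator

A = np.random.rand(4, 8)
B = np.random.rand(8, 3)
print(np.allclose(matmul_with_accumulator(A, B), A @ B))   # True
```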
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculation of non-convolution/non-FC layers in a neural network, such as pooling (Pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 507 can store the processed output vector in the unified buffer 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 503, for example for use in subsequent layers of the neural network.
The unified memory 506 is used to store input data as well as output data.
The memory unit access controller (DMAC) 505 is used to transfer the input data in the external memory to the input memory 501 and/or the unified memory 506, to store the weight data from the external memory in the weight memory 502, and to store the data in the unified memory 506 in the external memory.
A Bus Interface Unit (BIU) 510, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through a bus.
An instruction fetch buffer 509 connected to the controller 504 for storing instructions used by the controller 504;
The controller 504 is configured to call the instructions cached in the instruction fetch memory 509 to control the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are On-Chip memories, and the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
The operations of the layers in the convolutional neural networks shown in fig. 1 and 2 may be performed by the operation circuit 503 or the vector calculation unit 507.
As shown in fig. 4, the present embodiment provides a system architecture 200. In fig. 4, a data acquisition device 260 is used to acquire training data. Taking the target model 201 for image processing as an example, the training data may include training images and corresponding classification results of the training images, where the results of the training images may be manually pre-labeled results. The target model 201 may also be referred to as target rules 201.
After the training data is collected, the data collection device 260 stores the training data in the database 230, and the training device 220 trains the target model 201 based on the training data maintained in the database 230.
The target model 201 is obtained by the training device 220 based on the training data as follows: the training device 220 processes the input original image and compares the output image with the original image until the difference between the output image and the original image output by the training device 220 is smaller than a certain threshold, thereby completing the training of the target model 201.
The target model 201 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 230 may not necessarily all come from the collection of the data collection device 260, and may also be received from other devices. It should be noted that, the training device 220 may not necessarily perform the training of the target model 201 based on the training data maintained by the database 230, and may also obtain the training data from the cloud or other places for performing the model training.
The target model 201 obtained by training with the training device 220 may be applied to different systems or devices, for example to the client device 240 shown in fig. 4, where the client device 240 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud.
The training device 220 may generate corresponding target models 201 for different targets or different tasks based on different training data, and the corresponding target models 201 may be used to achieve the targets or complete the tasks, thereby providing the user with the desired results.
The target model 201 is obtained by training according to the training device 220, and may be CNN, Deep Convolutional Neural Networks (DCNN), and the like.
It should be noted that fig. 4 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the position relationship between the devices, modules, and the like, the type of training data, and the type or function of the neural network shown in the diagram do not constitute any limitation. For example, in FIG. 4, the client device 240 and the training device 220 may be the same device.
After the training device 220 has trained the neural network, if a target task, such as image segmentation or image recognition, is executed directly with the trained neural network, and each PE in the operation circuit of the neural network processor can only calculate the convolution result of a convolution kernel of size M with a feature map of size M, while the number of parameters in at least one convolution kernel in the neural network is M+1, then at least two PEs are required to calculate the convolution result of a convolution kernel of size M+1 with a feature map of size M+1. But this leaves many multipliers and adders in the PEs idle, wasting resources.
For example, when a neural network includes a depth separable convolutional layer whose convolution kernel has a size of 3 × 3, and one PE can only compute 8 multiply-accumulate operations, two PEs are required to compute the convolution result of that convolution kernel. This leaves 6 multipliers and 4 adders in one PE idle, wasting resources.
Aiming at the problem of resource waste, the application provides a new data processing method and a data processing device.
FIG. 5 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application. The method includes at least S510 to S540. The method may be performed by the host CPU in fig. 3.
And S510, obtaining an equivalent convolution kernel of the L-th layer according to the initial convolution kernel of the L-th layer of the neural network, wherein the parameter of the equivalent convolution kernel of the L-th layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the L-th layer and a first initial parameter in the initial convolution kernel of the L-th layer, the value of the first initial parameter is K, and K is a nonzero number, and the equivalent convolution kernel of the L-th layer is used for performing convolution processing on the feature map of the L-th layer.
The neural network in the present embodiment may be any neural network including convolution processing. For example, the neural network in the embodiment of the present application may be a convolutional neural network shown in fig. 1 or fig. 2.
The L-th layer in this embodiment may be any layer including convolution processing, and may be, for example, a convolutional layer, or may be a depth separable convolutional layer.
The initial convolution kernel in this embodiment may be a convolution kernel obtained by training a neural network, or may be a convolution kernel obtained by training and optimizing a neural network.
In this embodiment, the parameter in the equivalent convolution kernel of the L-th layer is obtained based on a quotient of a corresponding parameter of the initial convolution kernel of the L-th layer and the first initial parameter in the initial convolution kernel of the L-th layer, and may be understood as: one nonzero parameter in the initial convolution kernel of the L-th layer is called a first initial parameter, all parameters in the initial convolution kernel are divided by the first initial parameter to obtain a quotient of each parameter and the first initial parameter, and an equivalent convolution kernel for performing convolution processing on the feature map of the L-th layer is determined according to the quotient of each parameter and the first initial parameter. That is to say, after obtaining the equivalent convolution kernel, when the L-th layer in the neural network is subsequently used to process the feature map, the initial convolution kernel is not used, but the equivalent convolution kernel is used.
The feature map input to the L-th layer may include features of an image input to the neural network, or may include voice features included in voice data input to the neural network, or may include text features included in text data input to the neural network, or the like.
S520, acquiring the initial convolution kernel of the L+1-th layer in the L+1-th layer of the neural network, where the initial convolution kernel of the L+1-th layer has a mapping relationship with the initial convolution kernel of the L-th layer.
That is, the initial parameters in the first layer after the L-th layer, which have a mapping relationship with the parameters in the aforementioned initial convolution kernel in the L-th layer, are obtained.
The initial parameter of the L +1 th layer may be a parameter of the L +1 th layer obtained by training the neural network, or may be a parameter of the L +1 th layer obtained by training and optimizing the neural network.
The initial parameters of the L+1-th layer that have a mapping relationship with the parameters of the aforementioned initial convolution kernel of the L-th layer can be understood as follows: after the feature map is processed at the L-th layer of the neural network by using the initial convolution kernel and the eigenvalue obtained by the convolution is output, that eigenvalue is input into the L+1-th layer; the parameters of the L+1-th layer that are originally applied to process this eigenvalue are the initial parameters of the L+1-th layer that have a mapping relationship with the parameters of the initial convolution kernel of the L-th layer.
The L+1-th layer may be a depth separable convolutional layer, a normal convolutional layer, a fully connected layer, or the like, where a normal convolutional layer may also be referred to as a regular convolutional layer or a standard convolutional layer. For different types of L+1-th layers, the mapping relationship between their initial parameters and the parameters of the initial convolution kernel of the L-th layer is different.
Examples of the mapping relationship between the initial parameters in the L +1 th layer of the neural network and the initial parameters in the initial convolution kernel of the L-th layer will be described later with reference to fig. 7, 8, and 9.
S530, expanding each parameter in the initial convolution kernel of the L +1 th layer by K times.
Since the parameters in the equivalent convolution kernel of the L-th layer are obtained based on the quotient of the corresponding parameters of the initial convolution kernel of the L-th layer and the first initial parameters in the initial convolution kernel of the L-th layer in S510, the parameters in the equivalent convolution kernel are usually reduced by K times compared with the initial convolution kernel of the L-th layer. In order to ensure the accuracy of the neural network on the result of the input data processing, the initial parameter of the L +1 th layer can be expanded by K times, so that the characteristic value output by the L-th layer can be compensated.
And S540, determining the equivalent convolution kernel of the L +1 th layer according to the expanded initial convolution kernel of the L +1 th layer.
In this embodiment, the equivalent convolution kernel of the L+1-th layer is used to process the feature values output by the L-th layer, that is, the feature values input to the L+1-th layer. In other words, when business processing, such as image classification, image segmentation, or image recognition, is performed in an application scenario using the neural network, the initial convolution kernel of the L+1-th layer is not used; the equivalent convolution kernel of the L+1-th layer is used instead.
In the data processing method of this embodiment, the initial convolution kernel of the L-th layer is processed so that, when the equivalent convolution kernel obtained through this processing is used to perform convolution processing on the feature map input to the L-th layer, the convolution operation between the equivalent convolution kernel and the feature map no longer needs a multiply-accumulate operation over every parameter. Instead, it can be decomposed into a multiply-accumulate operation between part of the convolution parameters and the corresponding feature values, followed by a simple addition of that multiply-accumulate result and the feature value in the feature map that does not need to be multiplied by a parameter. Therefore, when convolution processing is performed on the feature map at the L-th layer of the neural network, if the number of parameters in the convolution kernel of the L-th layer is 1 more than the number of multiply-accumulate operations that one operation circuit used for the convolution processing can compute at a time, the problem of resource waste caused by needing one more operation circuit, most of whose multipliers and adders would be idle, can be avoided.
In addition, when the equivalent convolution kernel is obtained according to the initial convolution kernel of the L layer, the difference between the equivalent convolution kernel and the initial convolution kernel caused by the adopted processing can be compensated back through the difference between the equivalent convolution kernel of the L +1 th layer and the initial convolution kernel, so that the accuracy of processing the input data by the neural network is ensured.
In this embodiment, as a possible implementation manner, obtaining the equivalent convolution kernel of the L-th layer according to the initial convolution kernel of the L-th layer may include the following steps: and determining one parameter in the initial convolution kernel of the L-th layer as a first initial parameter, and dividing all parameters in the initial convolution kernel by the first initial parameter under the condition that the first initial parameter is not zero, wherein the obtained convolution kernel formed by the quotient is the equivalent convolution kernel. If the first initial parameter is 0, the initial convolution kernel can be directly used as an equivalent convolution kernel.
In this implementation, the value obtained by dividing the first initial parameter by the first initial parameter in the initial convolution kernel of the L-th layer is 1, that is, the value of one parameter in the equivalent convolution kernel is 1, and this parameter with the value of 1 is referred to as the first parameter.
In this embodiment, assuming that the initial convolution kernel of the L-th layer includes one first initial parameter and M second initial parameters, the correspondingly obtained equivalent convolution kernel of the L-th layer includes M second parameters and one first parameter, the M second parameters may respectively correspond to M second eigenvalues of the feature map to be processed, and the first parameter may correspond to the first eigenvalue of the feature map to be processed.
In this embodiment, in some implementation manners, when obtaining the parameter of the L-th layer equivalent convolution kernel based on a quotient of a corresponding parameter of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel, the method may include the following steps: when the quotient of at least one parameter in the initial convolution kernel of the L-th layer and the first initial parameter in the initial convolution kernel is greater than the first threshold, all parameters in the initial convolution kernel of the L-th layer may be reduced by m times first, and then the parameter reduced by m times is divided by the first initial parameter, where the quotient of the parameter obtained by reducing any parameter in the initial convolution kernel of the L-th layer by m times and the first initial parameter should not be greater than the first threshold.
The first threshold value may be a maximum expressible value of a device used when performing convolution processing on the feature map based on the equivalent convolution kernel. For example, when the apparatus for performing convolution processing on the feature map based on the equivalent convolution kernel is as shown in the neural network processor 50 in fig. 3, the first threshold may be smaller than or equal to the maximum expressible values of the memory and the arithmetic unit therein. Therefore, data overflow can be avoided, and the accuracy of the neural network in service processing is improved.
In some possible implementations, when obtaining the parameter of the L-th layer equivalent convolution kernel based on a quotient of the parameter of the L-th layer initial convolution kernel and the first initial parameter in the L-th layer initial convolution kernel, the following steps may be included: when the quotient of at least one parameter in the initial convolution kernel of the L-th layer and the first initial parameter is smaller than the second threshold, all parameters of the initial convolution kernel of the L-th layer may be expanded by n times, and then the expanded parameters are divided by the first initial parameter, so as to obtain an equivalent convolution kernel, wherein the quotient of the parameter obtained by expanding any parameter of the initial convolution kernel of the L-th layer by n times and the first initial parameter should not be smaller than the second threshold.
The second threshold value may be a minimum expressible value of a device used when performing convolution processing on the feature map based on the equivalent convolution kernel. For example, when the apparatus for performing convolution processing on the feature map based on the equivalent convolution kernel is as shown in the neural network processor 50 in fig. 3, the second threshold may be greater than or equal to the minimum expressible value of the memory and the arithmetic unit therein. Therefore, numerical value overflow can be avoided, and the accuracy of the neural network in service processing is improved.
Taking as an example the case where the L-th layer is a depth separable convolutional layer and the initial convolution kernel of the L-th layer is 3 × 3, an implementation of obtaining the equivalent convolution kernel of the L-th layer according to the initial convolution kernel of the L-th layer of the neural network is described below.
In this implementation, any one parameter of the 3 × 3 convolution kernel in the L-th layer is taken as the first initial parameter (which may also be referred to as a common factor). When the first initial parameter is not 0, all 9 parameters of the convolution kernel (i.e. the 8 second initial parameters and the 1 first initial parameter) are divided by the absolute value of the first initial parameter; when the first initial parameter is 0, all the parameters are left unchanged. This ensures that the obtained equivalent convolution kernel contains 8 normal second parameters and a first parameter whose value is 1, 0 or -1. The convolution operation of the L-th layer thus changes from 9 MAC operations into 8 MAC operations (between the 8 second parameters and the 8 second eigenvalues) whose result is added to one constant (the first eigenvalue). Based on this implementation, an adder can be added to an existing PE capable of processing 8 MACs, and a 3 × 3 convolution kernel can then be processed in each clock, so that the new PE can work at 100% efficiency when performing convolution processing, greatly improving hardware performance.
As shown in fig. 6, when the initial convolution kernel includes nine initial parameters w0, w1, w2, w3, w4, w5, w6, w7 and w8, w4 is selected as the first initial parameter. In the case that w4 is greater than 0, all nine initial parameters are divided by w4 to obtain the equivalent convolution kernel, and w4 is called the common factor; in the case that w4 is equal to 0, the nine initial parameters are not additionally processed, i.e., the equivalent convolution kernel is the same as the initial convolution kernel, and the common factor is 0; in the case that w4 is less than 0, the nine initial parameters are divided by -w4 to obtain the equivalent convolution kernel, and -w4 is referred to as the common factor.
Optionally, when w4 is less than 0, the nine initial parameters may be divided by w4 instead of -w4.
Convolution processing is performed on the feature values input to the L-th layer by using the equivalent convolution kernel obtained according to the initial convolution kernel, and the convolution result is used as the input of the L+1-th layer.
An implementation manner of obtaining the equivalent parameters of the L +1 th layer according to the initial parameters of the L +1 th layer of the neural network when the initial convolution kernel of the L th layer is the convolution kernel shown in fig. 6 and the L +1 th layer is also the depth separable convolution layer is described below.
When the L +1 th layer is the depth separable convolution layer, if the common factor of the initial convolution kernel of the L th layer is 0, the initial convolution kernel of the L +1 th layer may not be additionally processed, that is, the initial convolution kernel of the L +1 th layer is the equivalent convolution kernel; otherwise, the initial convolution kernel of the L +1 th layer may be multiplied by a common factor of the initial convolution kernel of the L th layer, and similar operations as in fig. 6 are performed, where the initial convolution kernel of the L +1 th layer is a convolution kernel corresponding to the initial convolution kernel of the L th layer in the L +1 th layer. An example of the initial convolution kernel and the equivalent convolution kernel for the L +1 th layer is shown in fig. 7.
As can be seen from fig. 7, the second parameter in the equivalent convolution kernel of the L +1 th layer is equal to a quotient of a product obtained by multiplying the corresponding second initial parameter in the initial convolution kernel of the L +1 th layer by the first initial parameter in the L th layer and the first initial parameter in the initial convolution kernel of the L +1 th layer.
An implementation manner of obtaining the equivalent parameters of the L +1 th layer according to the initial parameters of the L +1 th layer of the neural network when the initial convolution kernel of the L th layer is a depth separable convolution kernel and the L +1 th layer is a normal convolution layer is described below.
As shown in fig. 8, the L-th layer has 16 input channels, numbered from 0 to 15, which correspond one to one with 16 initial convolution kernels whose common factors are w0 to w15. Therefore, when the equivalent parameters of the L+1-th layer are obtained according to the initial parameters of the L+1-th layer of the neural network, the convolution kernel of the L+1-th layer acting on each input channel can be multiplied by the corresponding common factor among w0 to w15, thereby obtaining the equivalent parameters of the L+1-th layer.
An implementation manner of obtaining the equivalent parameters of the L +1 th layer according to the initial parameters of the L +1 th layer of the neural network when the initial convolution kernel of the L th layer is the depth separable convolution kernel and the L +1 th layer is the fully connected layer is described below.
As shown in fig. 9, when the L+1-th layer is a fully connected layer, the feature map output by the L-th layer is first converted into a 1-dimensional vector, and the position of each eigenvalue in that vector is determined. Based on the fact that the positions in the 1-dimensional vector of the feature map and the parameters of the fully connected layer are in a one-to-one correspondence, it is determined which initial convolution kernel of the L-th layer each parameter corresponds to, and each parameter is then multiplied by the common factor of the corresponding initial convolution kernel, where the common factor is not zero.
In the data processing method of the present application, all or part of the steps in fig. 5 may be performed on all depth separable convolutional layers in the neural network, or all or part of the steps in fig. 5 may be performed only on part of the depth separable convolutional layers in the neural network.
In the following, the first threshold is taken as the maximum expressible value of the built-in hardware data format of the device implementing the convolution processing; that is, taking as an example the case where the quotient of a second initial parameter in the initial convolution kernel of the L-th layer and the first initial parameter exceeds the expressible range, with the maximum expressible value being 2 to the 16th power, how to process the initial convolution kernel of the L-th layer so as to obtain an equivalent convolution kernel that does not exceed the expressible range is described.
As shown in the left diagram of fig. 10, when the first initial parameter in the initial convolution kernel of the L-th layer is 0.001 and one of the second initial parameters is 64000, directly dividing 64000 by 0.001 gives a quotient larger than the maximum expressible value. In this case, as shown in the middle diagram of fig. 10, all the initial parameters may first be divided by 2 to the 10th power (that is, multiplied by 2 to the power of -10) and then divided by 0.001, to obtain the equivalent convolution kernel shown in the right diagram of fig. 10. Here, 2 to the power of -10 is the power of 2 that is closest to, and less than, the first initial parameter 0.001.
Next, the second threshold is taken as the minimum expressible value of the built-in hardware data format of the device implementing the convolution processing; that is, taking as an example the case where the quotient of a second initial parameter in the initial convolution kernel of the L-th layer and the first initial parameter falls below the expressible range, with the minimum expressible value being 2 to the power of -16, how to process the initial convolution kernel of the L-th layer so as to obtain an equivalent convolution kernel that does not fall outside the expressible range is described.
As shown in the left diagram of fig. 11, when the first initial parameter in the initial convolution kernel of the L-th layer is 64000 and one of the second initial parameters is 0.001, directly dividing 0.001 by 64000 gives a quotient smaller than the minimum expressible value. In this case, as shown in the middle diagram of fig. 11, all the initial parameters may first be multiplied by 2 to the 16th power and then divided by 64000, to obtain the equivalent convolution kernel shown in the right diagram of fig. 11. Here, 2 to the 16th power is the power of 2 that is closest to, and greater than, the first initial parameter 64000.
The data processing apparatus provided by the present application is described below. The data processing apparatus provided by the application can be used for performing convolution processing on a matrix to be convolved according to a two-dimensional convolution kernel, where the two-dimensional convolution kernel includes a first parameter and M second parameters, the matrix to be convolved includes a first eigenvalue and M second eigenvalues, the M second parameters correspond to the M second eigenvalues one to one, and the first parameter corresponds to the first eigenvalue.
Fig. 12 is a schematic configuration diagram of a data processing apparatus according to the present application. The data processing apparatus includes: M multipliers, M-1 first adders and a second adder, the M multipliers being Mul 0 to Mul M-1, the M-1 first adders being ADD 0 to ADD M-2, and the second adder being ADD M-1. M is an even number.
The M multipliers and the M-1 first adders are used for performing multiply-accumulate operation on the M second parameters and the M second characteristic values to obtain multiply-accumulate results. The second adder is configured to perform addition operation on the multiply-accumulate result and the first eigenvalue to obtain a convolution result of the two-dimensional convolution kernel and the matrix to be convolved.
When convolution processing is performed on a matrix to be convolved of size M+1 according to a convolution kernel of size M+1, the data processing apparatus calculates, through the M multipliers and the M-1 first adders, the multiply-accumulate result of M parameters of the convolution kernel and M eigenvalues of the matrix to be convolved, calculates, through the other adder, the sum of this multiply-accumulate result and the (M+1)-th eigenvalue of the matrix to be convolved, and finally takes this sum as the convolution result of the convolution kernel and the matrix to be convolved. In this way, when the size of the convolution kernel to be calculated is 1 more than the number of multiply-accumulate operations that the data processing apparatus can perform at a time, there is no need to use a plurality of conventional data processing apparatuses including M+1 multipliers and M adders to perform the calculation, thereby avoiding waste of resources and improving the utilization rate of resources.
For example, the data processing apparatus shown in fig. 12 may be used to: and performing convolution processing on the characteristic value matrix to be convolved input into the L-th layer based on the equivalent convolution kernel of the L-th layer in the S510.
The data processing apparatus shown in fig. 12 may be an integral part of the arithmetic circuit 503 in the neural network processor 50 in fig. 3.
In some possible implementations, the first parameter is equal to 1. That is, the first parameter corresponding to the first characteristic value input to the second adder is 1. Thus, the accuracy of the convolution result calculated by the data processing device can be ensured.
In some possible implementations, the apparatus further includes a processor configured to: determining the two-dimensional convolution kernel according to an initial convolution kernel, wherein the initial convolution kernel comprises a first initial parameter and M second initial parameters, the M second parameters are in one-to-one correspondence with the M second initial parameters, the first parameter corresponds to the first initial parameter, the second parameter is equal to the quotient of the second initial parameter corresponding to the second parameter and the first initial parameter, and the first initial parameter is not zero. Alternatively, the processor may be configured to perform S510.
When the data processing apparatus in this implementation is used to calculate a convolution result between an initial convolution kernel and a matrix to be convolved, all parameters in the initial convolution kernel may be first divided by one of the nonzero parameters (i.e., a first initial parameter) by the processor, so that one parameter (i.e., a first parameter) in the obtained two-dimensional convolution kernel may be 1. Thus, when the convolution result of the two-dimensional convolution kernel and the matrix to be convolved is calculated by using the M multipliers and the M adders, a more accurate value can be obtained.
Optionally, the processor may be further configured to: determine a convolution result of the initial convolution kernel and the matrix to be convolved according to the convolution result of the two-dimensional convolution kernel and the matrix to be convolved, where the convolution result of the initial convolution kernel and the matrix to be convolved is equal to the product of the convolution result of the two-dimensional convolution kernel and the matrix to be convolved and the first initial parameter.
In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is greater than a first threshold, before the determining of the two-dimensional convolution kernel from the initial convolution kernel, the processor is further configured to: reduce the M second parameters or the M second initial parameters by m times, where the quotient of any second initial parameter reduced by m times and the first initial parameter is not greater than the first threshold.
In the data processing apparatus in this implementation, when a quotient of any one second initial parameter in the initial convolution kernel and the first initial parameter is greater than a first threshold, all the second initial parameters may be reduced by m times first, so that the quotient of any reduced second initial parameter and the first initial parameter is not greater than the first threshold, and thus a second parameter obtained by dividing the reduced second initial parameter by the first initial parameter is not greater than the first threshold. The first threshold may be smaller than or equal to the maximum expressible value of the processor, or the first threshold may be smaller than or equal to the maximum expressible value of the multiplier and adder in the data processing apparatus. Therefore, the calculated convolution result can not generate the problem of inaccurate calculation result because of overflowing the maximum expressible value range of the data processing device.
The operation of reducing the first eigenvalue by m times may be performed by the processor or by a shifter. For example, when m is 2 to the s-th power, the first eigenvalue may be reduced by a factor of m by a shifter capable of shifting right by at least s bits, s being a non-negative integer. An exemplary configuration of an apparatus for performing, on a feature map, convolution processing based on an equivalent convolution kernel in such a scenario is shown in fig. 13.
When convolution processing is performed by the apparatus shown in fig. 13, the M second parameters of the equivalent convolution kernel and the M second eigenvalues of the feature map may be input to the M multipliers, and the first eigenvalue of the feature map may be input to the shifter for shifting. The M multipliers and the M-1 first adders perform multiply-accumulate processing on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; the eigenvalue obtained by right-shifting the first eigenvalue with the shifter, i.e., the first eigenvalue reduced by m times, is input to ADD M-1; and the sum output by ADD M-1 is the convolution result of the equivalent convolution kernel and the feature map.
This is because, when the second initial parameter or the second parameter is reduced by m times, the first parameter should be reduced by m times accordingly, and therefore the result of processing the first feature value by the first parameter should be reduced by m times. The data processing device directly reduces the first characteristic value by m times to ensure the accuracy of a convolution result.
For example, when convolution processing is performed on the feature map based on the equivalent convolution kernel shown in the right diagram of fig. 10, the first eigenvalue of the feature map may be input to the shifter and shifted right by 10 bits, and the obtained eigenvalue may then be added to the multiply-accumulate result to obtain the convolution result.
In some possible implementations, when the quotient of at least one of the second initial parameters and the first initial parameter is less than a second threshold, before the determining the two-dimensional convolution kernel according to the initial convolution kernel, the processor is further configured to: expanding the M second parameters or the M second initial parameters by n times, wherein the quotient of any second initial parameter expanded by n times and the first initial parameter is not less than the second threshold value.
In the data processing apparatus in this implementation, when the quotient of any one second initial parameter in the initial convolution kernel and the first initial parameter is smaller than the second threshold, all the second initial parameters may first be expanded by n times, so that the quotient of any expanded second initial parameter and the first initial parameter is not smaller than the second threshold, and thus the second parameter obtained by dividing the expanded second initial parameter by the first initial parameter is not smaller than the second threshold. The second threshold may be greater than or equal to the minimum expressible value of the processor, or the second threshold may be greater than or equal to the minimum expressible value of the multipliers and adders in the data processing apparatus. This prevents the calculated convolution result from being inaccurate due to underflow of the minimum expressible value range of the data processing apparatus.
The operation of enlarging the first eigenvalue by n times may be performed by the host CPU to which the neural network processor performing the convolution processing is attached, or by a shifter. For example, when n is 2 to the t-th power, the first eigenvalue may be enlarged by n times by a shifter capable of shifting left by at least t bits, t being a non-negative integer. An exemplary configuration of an apparatus for performing, on a feature map, convolution processing based on an equivalent convolution kernel in such a scenario is shown in fig. 13.
When convolution processing is performed by the apparatus shown in fig. 13, the M second parameters of the equivalent convolution kernel and the M second eigenvalues of the feature map may be input to the M multipliers, and the first eigenvalue of the feature map may be input to the shifter for shifting. The M multipliers and the M-1 first adders perform multiply-accumulate processing on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result; the eigenvalue obtained by left-shifting the first eigenvalue with the shifter, i.e., the first eigenvalue expanded by n times, is input to ADD M-1; and the sum output by ADD M-1 is the convolution result of the equivalent convolution kernel and the feature map.
This is because, when the second initial parameters or the second parameters are enlarged by n times, the first parameter should be enlarged by n times accordingly, and therefore the result of processing the first eigenvalue by the first parameter should also be enlarged by n times. The data processing apparatus directly enlarges the first eigenvalue by n times to ensure the accuracy of the convolution result.
For example, when convolution processing is performed on the feature map based on the equivalent convolution kernel shown in the right diagram of fig. 11, the first eigenvalue of the feature map may be input to the shifter and shifted left by 16 bits, and the obtained eigenvalue may then be added to the multiply-accumulate result to obtain the convolution result.
In some possible implementations, the processor includes one or more of the following in combination: a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a neural Network Processor (NPU).
In some possible implementations, M is equal to 8, and the two-dimensional convolution kernel is a3 x 3 matrix. Alternatively, the M multipliers and the M-1 first adders may constitute a product accumulator.
For example, as shown in fig. 14, the data processing apparatus may include 8 multipliers and 8 adders, where 8 multipliers and 7 adders are used for performing multiply-accumulate on 8 convolution kernel parameters and 8 eigenvalues, and the remaining adder is used for adding the multiply-accumulate result and another eigenvalue, so as to obtain the convolution result of the two-dimensional convolution kernel and the eigenvalue matrix to be convolved.
It will be appreciated that the data processing apparatus of the present application may be obtained by adding an adder to an existing PE capable of processing 8 multiply-accumulate operations. That is, the data processing apparatus of the present application can be obtained by multiplexing existing hardware logic and adding one adder.
Alternatively, the 8 multipliers and 7 adders in fig. 14 may be composed of two PEs that can process 4 multiply-accumulate.
Alternatively, as shown in fig. 15, an adder may be added to a PE capable of processing 16 multiply-accumulate operations to obtain two of the data processing apparatuses shown in fig. 14. Here, the output of ADD 6 is no longer fed to ADD 14, but to the newly added ADD 15.
In some possible implementations, M is equal to 24 and the two-dimensional convolution kernel is a 5 x 5 matrix. Optionally, the M multipliers and the M-1 first adders constitute 3 product accumulators, wherein each of the product accumulators includes M/3 multipliers and M/3-1 first adders.
For example, three adders may be added on the basis of 3 PEs each capable of processing 8 multiply-accumulate operations, to obtain a data processing apparatus of the present application that can perform convolution processing on a 5 × 5 feature matrix to be convolved based on a 5 × 5 two-dimensional convolution kernel. The 3 newly added adders are used for adding the multiply-accumulate results of the 3 PEs and the first eigenvalue of the feature matrix to be convolved.
In some implementations, the apparatus shown in fig. 14 may further include a shifter whose output port is connected to one input port of ADD 7; alternatively, the apparatus shown in fig. 15 may further include two shifters, where the output port of one shifter is connected to an input port of ADD 15 and the output port of the other shifter is connected to an input port of ADD 14. In this case, the first eigenvalue of the matrix to be convolved is first input to the shifter for shifting, and the shifted result and the multiply-accumulate result are input to the adder for addition, so as to obtain the convolution result. Such a device including a shifter is used when performing convolution processing based on an equivalent convolution kernel obtained by enlarging or reducing the initial convolution kernel. Of course, when convolution processing is performed based on an equivalent convolution kernel obtained without expansion or reduction of the initial convolution kernel, a device including a shifter may also be used.
In some possible implementations of this embodiment, the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, where N is an integer greater than 2.
Taking an example that a PE can perform convolution on a feature map including 9 feature values based on a convolution kernel including 9 parameters, the data reading method provided in the present application is exemplarily described below with reference to fig. 16 and 17. For example, the PE has the structure shown in fig. 14 or fig. 15. Further, for example, the PE is obtained by adding an adder, and even a shifter, based on the PE capable of processing 8 multiply-accumulate.
As shown in fig. 16, the left solid box represents the feature map, a square dashed box in the left solid box represents a convolution window, the convolution window is shifted to the right by a step size of 1, the size of the convolution window is determined by the size of the convolution kernel represented by the right solid box, the first row of feature values in the feature map is represented by A0 to An, the second row of feature values is represented by B0 to Bn, and so on; the middle solid box represents a PE; the right solid box represents the convolution kernel K, an example of which is the aforementioned equivalent convolution kernel, which has a size of 3 × 3. In fig. 16, 16 PEs, PE0 to PE15, are exemplarily shown, and for example, such 16 PEs are included in a neural network processor. In fig. 16, arrows indicate data flow.
Each PE may convolve the feature values in one convolution window based on one convolution kernel, so that 16 PEs may convolve the feature values in 16 convolution windows based on the same convolution kernel in one clock cycle.
As shown in fig. 17, at the initial clock, the direct-coupled buffer directly connected to the convolution processing unit may read 3 × (16+2) data from the lower-level buffer, where these data are A0 to A17, B0 to B17, and C0 to C17. The direct-coupled buffer may then transmit A0 to A2, B0 to B2, and C0 to C2 to PE0, transmit A1 to A3, B1 to B3, and C1 to C3 to PE1, and so on, until A15 to A17, B15 to B17, and C15 to C17 are transmitted to PE15.
At the next clock, the direct-coupled buffer only needs to read (16+2) data from the lower-level buffer, where the 18 data are D0 to D17. Then B0 to B2, C0 to C2, and D0 to D2 are transmitted to PE0, B1 to B3, C1 to C3, and D1 to D3 are transmitted to PE1, and so on, until B15 to B17, C15 to C17, and D15 to D17 are transmitted to PE15.
This is because the data that the direct-coupled cache needs to transmit to PE0 to PE15 in two adjacent clock cycles partially overlap, so in the next clock cycle the direct-coupled cache can reuse the overlapping data it read from the lower-level cache in the previous clock cycle, and only needs to read the data that must be updated. Therefore, transmission resources can be saved and transmission efficiency is improved.
It is to be understood that the transmission of parameters in the convolution kernel is not shown in fig. 17.
If a PE includes two of the devices configured as shown in fig. 14 or fig. 15, for example, a PE obtained by adding an adder, and even a shifter, to a PE capable of processing 16 multiply-accumulate operations, the method of reading data into the direct-coupled cache is similar to the method shown in fig. 17. The difference is that, in the initial clock cycle, the direct-coupled cache needs to read 16 more data of the convolution window in the horizontal direction of the feature map, or 16 more data of the next row below the convolution window of fig. 16 in the vertical direction of the feature map. That is, in the initial clock cycle, the direct-coupled cache needs to read 4 × (16+2) data, and only 2 × (16+2) data need to be updated in each subsequent clock cycle.
Fig. 18 is an exemplary block diagram of a data processing apparatus according to the present application. The apparatus 1800 includes a processing module 1810, an acquisition module 1820, a dilation module 1830, and a determination module 1840. The apparatus 1800 may implement the method illustrated in fig. 5, described above.
For example, the processing module 1810 may be used to perform S510, the obtaining module 1820 may be used to perform S520, the expanding module 1830 may be used to perform S530, and the determining module 1840 may be used to perform S540.
In some possible implementations, the apparatus 1800 may be the main CPU in fig. 3; in other possible implementations, the apparatus 1800 may be the training device 220 shown in fig. 4; in other possible implementations, the apparatus 1800 may be the client device 240 described in fig. 4.
The present application also provides an apparatus 1900 as shown in fig. 19, the apparatus 1900 comprising a processor 1902, a communication interface 1903 and a memory 1904. An example of the device 1900 is a chip. Another example of an apparatus 1900 is a computing device.
The processor 1902, the memory 1904, and the communication interface 1903 may communicate with one another via a bus. The memory 1904 stores executable code, and the processor 1902 reads the executable code in the memory 1904 to execute a corresponding method. The memory 1904 may also include other software modules required to run processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 1904 is used to implement the method shown in FIG. 5, and the processor 1902 reads the executable code in the memory 1904 to perform the method shown in FIG. 5.
The processor 1902 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or the like. The memory 1904 may include volatile memory, such as Random Access Memory (RAM). The memory 1904 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD).
The present application further provides a data processing apparatus 2000 as shown in fig. 20, which includes a programmable device 2001 and a memory 2002, where the memory 2002 is used for storing a configuration file required by the programmable device 2001 to operate, and the programmable device 2001 is used for reading the configuration file from the memory 2002 and executing the configuration file to implement a corresponding method.
The Programmable Device may include a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device (CPLD).
Taking an FPGA as an example, those skilled in the art will understand that its basic working principle is to change the content of the configuration RAM inside the FPGA by loading configuration data (for example, in the form of a configuration file), so as to change the configuration of the various logic resources inside the FPGA and thereby realize different circuit functions. The configuration data can be loaded multiple times, so the FPGA can complete different functions by loading different configuration data and has good flexibility. In practical applications, the function of the FPGA often needs to be updated; in this case, the new configuration data may be loaded into the FPGA configuration memory in advance, and the FPGA is then made to load the new configuration data to implement the function defined by it, which is called an upgrade process of the FPGA. Meanwhile, when the FPGA leaves the factory, it is provided with a configuration loading circuit for loading configuration data, and this configuration loading circuit can be used to ensure the most basic loading operation after a user-defined circuit function (i.e., a function defined by the configuration data) fails.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

  1. A data processing apparatus, wherein the data processing apparatus is configured to perform convolution processing on a matrix to be convolved according to a two-dimensional convolution kernel, the two-dimensional convolution kernel includes a first parameter and M second parameters, the matrix to be convolved includes a first eigenvalue and M second eigenvalues, the M second parameters are in one-to-one correspondence with the M second eigenvalues, and the first parameter corresponds to the first eigenvalue, the data processing apparatus comprising:
    m multipliers and M-1 first adders, configured to perform multiply-accumulate operation on the M second parameters and the M second eigenvalues to obtain a multiply-accumulate result;
    and the second adder is used for performing addition operation on the multiply-accumulate result and the first characteristic value to obtain a convolution result of the two-dimensional convolution kernel and the matrix to be convolved.
  2. The apparatus of claim 1, wherein the first parameter is equal to 1.
  3. The apparatus of claim 1 or 2, wherein the apparatus further comprises a processor to: determining the two-dimensional convolution kernel according to an initial convolution kernel, wherein the initial convolution kernel comprises a first initial parameter and M second initial parameters, the M second parameters are in one-to-one correspondence with the M second initial parameters, the first parameter corresponds to the first initial parameter, the second parameter is equal to the quotient of the second initial parameter corresponding to the second parameter and the first initial parameter, and the first initial parameter is not zero.
  4. The apparatus of claim 3, wherein when a quotient of at least one of the second initial parameters and the first initial parameters is greater than a first threshold, the processor is further configured to, prior to said determining the two-dimensional convolution kernel from the initial convolution kernel:
    and reducing the M second parameters or the M second initial parameters by m times, wherein the quotient of any second initial parameter reduced by m times and the first initial parameter is not more than the first threshold value.
  5. The apparatus of claim 4, wherein prior to said adding said multiply-accumulate result and said first eigenvalue, said processor is further configured to:
    reducing the first characteristic value by m times;
    correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the reduced first eigenvalue.
  6. The apparatus of claim 3, wherein when a quotient of at least one of the second initial parameters and the first initial parameters is less than a second threshold, the processor is further configured to, prior to said determining the two-dimensional convolution kernel from the initial convolution kernels:
    expanding the M second parameters or the M second initial parameters by n times, wherein the quotient of any second initial parameter expanded by n times and the first initial parameter is not less than the second threshold value.
  7. The apparatus of claim 6, wherein prior to said adding said multiply-accumulate result and said first eigenvalue, said processor is further configured to:
    enlarging the first eigenvalue by a factor of n;
    correspondingly, the second adder is specifically configured to add the multiply-accumulate result and the expanded first eigenvalue.
  8. The apparatus of any one of claims 3 to 7, wherein the processor comprises a combination of one or more of: a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a neural Network Processor (NPU).
  9. The apparatus of any one of claims 1 to 8, wherein M is equal to 8 and the two-dimensional convolution kernel is a3 x 3 matrix.
  10. The apparatus of claim 8 wherein said M multipliers and said M-1 first adders form a product accumulator.
  11. The apparatus of any one of claims 1 to 8, wherein M is equal to 24 and the two-dimensional convolution kernel is a 5 x 5 matrix.
  12. The apparatus of claim 11 wherein said M multipliers and said M-1 first adders form 3 product accumulators, wherein each of said product accumulators includes M/3 multipliers and M/3-1 first adders.
  13. The apparatus of any one of claims 1 to 12, wherein the two-dimensional convolution kernel is a two-dimensional matrix component of an N-dimensional convolution kernel, N being an integer greater than 2.
  14. A data processing method, comprising:
    obtaining an equivalent convolution kernel of an L-th layer according to an initial convolution kernel of the L-th layer of a neural network, wherein parameters of the equivalent convolution kernel of the L-th layer are obtained based on a quotient of corresponding parameters of the initial convolution kernel of the L-th layer and first initial parameters in the initial convolution kernel of the L-th layer, the values of the first initial parameters are K, K is a nonzero number, and the equivalent convolution kernel of the L-th layer is used for performing convolution processing on a feature map of the L-th layer;
    acquiring an initial convolution kernel of the L +1 th layer in the L +1 th layer of the neural network, wherein the initial convolution kernel of the L +1 th layer has a mapping relation with the initial convolution kernel of the L-th layer;
    expanding each parameter in the initial convolution kernel of the L +1 th layer by a factor of K;
    and determining the equivalent convolution kernel of the L +1 layer according to the initial convolution kernel of the L +1 layer after the expansion processing.
  15. The method according to claim 14, wherein the equivalent convolution kernel of the L-th layer includes M second parameters and a first parameter, the M second parameters respectively correspond to M second eigenvalues of the feature map, the first parameter corresponds to a first eigenvalue of the feature map, and the first parameter is 1;
    performing convolution processing on the feature map of the L-th layer includes:
    performing multiply-accumulate operation on the M second parameters and the M second characteristic values to obtain a multiply-accumulate result;
    and performing addition operation on the multiply-accumulate result and the first characteristic value.
  16. The method of claim 14, wherein the parameters of the L-th layer equivalent convolution kernel are obtained based on a quotient of corresponding parameters of the L-th layer initial convolution kernel and first initial parameters in the L-th layer initial convolution kernel, comprising:
    when the quotient of the parameter of at least one initial convolution kernel of the L-th layer and the first initial parameter is larger than a first threshold value, reducing the corresponding parameter of the initial convolution kernel of the L-th layer by m times, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer reduced by m times and the first initial parameter is not larger than the first threshold value.
  17. The method of claim 16, wherein said adding the multiply-accumulate result and the first eigenvalue comprises:
    reducing the first characteristic value by m times;
    and performing addition operation on the multiply-accumulate result and the reduced first characteristic value.
  18. The method of claim 14, wherein obtaining the parameters of the equivalent convolution kernel of the L-th layer based on the quotient of the corresponding parameters of the initial convolution kernel of the L-th layer and the first initial parameter in the initial convolution kernel of the L-th layer comprises:
    when the quotient of a parameter of at least one initial convolution kernel of the L-th layer and the first initial parameter is smaller than a second threshold, expanding the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of n, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer expanded by the factor of n and the first initial parameter is not smaller than the second threshold.
  19. The method of claim 18, wherein the performing an addition operation on the multiply-accumulate result and the first feature value comprises:
    expanding the first feature value by a factor of n; and
    performing an addition operation on the multiply-accumulate result and the expanded first feature value.
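Claims 18 and 19 mirror the previous case in the opposite direction. The sketch below assumes a hypothetical second threshold and a simple rule for choosing n; neither value is taken from the patent.

```python
SECOND_THRESHOLD = 2.0 ** -7   # illustrative smallest useful magnitude (assumption)

def equivalent_kernel_scaled_up(init_params, K):
    quotients = [p / K for p in init_params]
    nonzero = [abs(q) for q in quotients if q != 0.0]
    trough = min(nonzero) if nonzero else 1.0
    n = SECOND_THRESHOLD / trough if trough < SECOND_THRESHOLD else 1.0  # one way to pick n (assumption)
    expanded = [p * n for p in init_params]         # claim 18: expand by a factor of n
    return [p / K for p in expanded], n             # no nonzero quotient is below the threshold

def add_first_value_scaled_up(mac_result, first_value, n):
    return mac_result + first_value * n             # claim 19: first feature value expanded by n
```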
  20. A data processing apparatus, comprising:
    a processing module, configured to obtain an equivalent convolution kernel of an L-th layer according to an initial convolution kernel of the L-th layer of a neural network, wherein each parameter of the equivalent convolution kernel of the L-th layer is obtained based on the quotient of the corresponding parameter of the initial convolution kernel of the L-th layer and a first initial parameter in the initial convolution kernel of the L-th layer, the value of the first initial parameter is K, K being a nonzero number, and the equivalent convolution kernel of the L-th layer is used for performing convolution processing on a feature map of the L-th layer;
    an obtaining module, configured to obtain an initial convolution kernel of the (L+1)-th layer of the neural network, wherein the initial convolution kernel of the (L+1)-th layer has a mapping relationship with the initial convolution kernel of the L-th layer;
    an expanding module, configured to expand each parameter in the initial convolution kernel of the (L+1)-th layer by a factor of K; and
    a determining module, configured to determine the equivalent convolution kernel of the (L+1)-th layer according to the expanded initial convolution kernel of the (L+1)-th layer.
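One possible software organization of the four modules recited in claim 20 is sketched below; the class and method names are hypothetical, and the determining module is shown as re-normalizing the expanded kernel by its own first parameter, which is only one plausible reading.

```python
# Illustrative module split for claim 20; not the patent's implementation.

class EquivalentKernelBuilder:
    def processing_module(self, init_kernel_L):
        # divide the L-th layer kernel by its first initial parameter K (K != 0)
        K = init_kernel_L[0]
        return [p / K for p in init_kernel_L], K

    def obtaining_module(self, kernels_by_layer, layer_index):
        # fetch the initial kernel of the (L+1)-th layer that maps to the L-th layer kernel
        return kernels_by_layer[layer_index + 1]

    def expanding_module(self, init_kernel_L1, K):
        # expand each parameter of the (L+1)-th layer kernel by a factor of K
        return [p * K for p in init_kernel_L1]

    def determining_module(self, expanded_kernel_L1):
        # assumption: the (L+1)-th layer equivalent kernel is obtained by applying
        # the same normalization to the expanded kernel
        return self.processing_module(expanded_kernel_L1)
```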
  21. The apparatus according to claim 20, wherein the equivalent convolution kernel of the L-th layer includes M second parameters and a first parameter, the M second parameters respectively correspond to M second feature values of the feature map, the first parameter corresponds to a first feature value of the feature map, and the first parameter is 1;
    wherein the performing convolution processing on the feature map of the L-th layer comprises:
    performing a multiply-accumulate operation on the M second parameters and the M second feature values to obtain a multiply-accumulate result; and
    performing an addition operation on the multiply-accumulate result and the first feature value.
  22. The apparatus of claim 20, wherein obtaining the parameters of the equivalent convolution kernel of the L-th layer based on the quotient of the corresponding parameters of the initial convolution kernel of the L-th layer and the first initial parameter in the initial convolution kernel of the L-th layer comprises:
    when the quotient of a parameter of at least one initial convolution kernel of the L-th layer and the first initial parameter is larger than a first threshold, reducing the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of m, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer reduced by the factor of m and the first initial parameter is not larger than the first threshold.
  23. The apparatus according to claim 22, wherein the performing an addition operation on the multiply-accumulate result and the first feature value comprises:
    reducing the first feature value by a factor of m; and
    performing an addition operation on the multiply-accumulate result and the reduced first feature value.
  24. The apparatus of claim 20, wherein obtaining the parameters of the equivalent convolution kernel of the L-th layer based on the quotient of the corresponding parameters of the initial convolution kernel of the L-th layer and the first initial parameter in the initial convolution kernel of the L-th layer comprises:
    when the quotient of a parameter of at least one initial convolution kernel of the L-th layer and the first initial parameter is smaller than a second threshold, expanding the corresponding parameters of the initial convolution kernel of the L-th layer by a factor of n, wherein the quotient of any parameter of the initial convolution kernel of the L-th layer expanded by the factor of n and the first initial parameter is not smaller than the second threshold.
  25. The apparatus according to claim 24, wherein the performing an addition operation on the multiply-accumulate result and the first feature value comprises:
    expanding the first feature value by a factor of n; and
    performing an addition operation on the multiply-accumulate result and the expanded first feature value.
  26. A computer-readable storage medium comprising instructions which, when executed on a processor, cause the processor to perform the method of any of claims 14 to 19.
  27. A data processing apparatus comprising a programmable device and a memory, said memory being arranged to store a configuration file required for operation of said programmable device, said programmable device being arranged to read said configuration file from said memory and to perform the method of any of claims 14 to 19.
  28. The apparatus of claim 27, wherein the programmable device comprises a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device (CPLD).
CN201980102503.0A 2019-12-18 2019-12-18 Data processing apparatus and data processing method Pending CN114730331A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/126179 WO2021120036A1 (en) 2019-12-18 2019-12-18 Data processing apparatus and data processing method

Publications (1)

Publication Number Publication Date
CN114730331A (en)

Family

ID=76476963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980102503.0A Pending CN114730331A (en) 2019-12-18 2019-12-18 Data processing apparatus and data processing method

Country Status (2)

Country Link
CN (1) CN114730331A (en)
WO (1) WO2021120036A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201512278D0 (en) * 2015-07-14 2015-08-19 Apical Ltd Hybrid neural network
CN105654729B (en) * 2016-03-28 2018-01-02 南京邮电大学 A kind of short-term traffic flow forecast method based on convolutional neural networks
CN109313663B (en) * 2018-01-15 2023-03-31 深圳鲲云信息科技有限公司 Artificial intelligence calculation auxiliary processing device, method, storage medium and terminal
CN109886391B (en) * 2019-01-30 2023-04-28 东南大学 Neural network compression method based on space forward and backward diagonal convolution

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278360A (en) * 2022-07-18 2022-11-01 天翼云科技有限公司 Video data processing method and electronic equipment
CN115278360B (en) * 2022-07-18 2023-11-07 天翼云科技有限公司 Video data processing method and electronic equipment

Also Published As

Publication number Publication date
WO2021120036A1 (en) 2021-06-24

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
CN109190756B (en) Arithmetic device based on Winograd convolution and neural network processor comprising same
US20210224125A1 (en) Operation Accelerator, Processing Method, and Related Device
WO2021018163A1 (en) Neural network search method and apparatus
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
CN112418392A (en) Neural network construction method and device
CN112163601B (en) Image classification method, system, computer device and storage medium
CN112215332B (en) Searching method, image processing method and device for neural network structure
CN113326930B (en) Data processing method, neural network training method, related device and equipment
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN112529146B (en) Neural network model training method and device
CN111882031A (en) Neural network distillation method and device
EP4379607A1 (en) Neural network accelerator, and data processing method for neural network accelerator
CN111931901A (en) Neural network construction method and device
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN111797970B (en) Method and device for training neural network
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN115081588A (en) Neural network parameter quantification method and device
CN112789627A (en) Neural network processor, data processing method and related equipment
US20230143985A1 (en) Data feature extraction method and related apparatus
CN114698395A (en) Quantification method and device of neural network model, and data processing method and device
CN116888605A (en) Operation method, training method and device of neural network model
CN114298289A (en) Data processing method, data processing equipment and storage medium
CN114730331A (en) Data processing apparatus and data processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination