WO2023059215A1 - Apparatus and method for winograd convolution - Google Patents

Apparatus and method for Winograd convolution

Info

Publication number
WO2023059215A1
Authority
WO
WIPO (PCT)
Prior art keywords
winograd
tensor
balancing
floating point
filter
Prior art date
Application number
PCT/RU2021/000416
Other languages
French (fr)
Other versions
WO2023059215A8 (en)
Inventor
Vladimir Maximovich CHIKIN
Vladimir Mikhailovich KRYZHANOVSKIY
Alexandr Alexandrovich ZURUEV
Yury Alexandrovich PARFENOV
Original Assignee
Huawei Technologies Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd filed Critical Huawei Technologies Co., Ltd
Priority to PCT/RU2021/000416 priority Critical patent/WO2023059215A1/en
Publication of WO2023059215A1 publication Critical patent/WO2023059215A1/en
Publication of WO2023059215A8 publication Critical patent/WO2023059215A8/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0495 - Quantised networks; Sparse networks; Compressed networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present disclosure relates to an apparatus and a method for processing a matrix in the field of artificial intelligence.
  • the disclosure relates to an apparatus and a method for performing a convolution in an artificial neural network.
  • Artificial neural networks (ANNs) are often used for performing various tasks, such as image processing, speech recognition, robotics, and big data analysis.
  • An ANN usually involves massive data processing, such as matrix convolution.
  • a matrix convolution may be seen as a process of adding each element of an input matrix to its local neighbours, weighted by a kernel matrix (or filter). Therefore, the matrix convolution normally includes matrix addition and multiplication.
  • the matrix convolution is often used in a convolutional neural network (CNN).
  • in 1990, Don Coppersmith and Shmuel Winograd developed an algorithm for performing matrix multiplication, which is often referred to as the “Coppersmith-Winograd algorithm”, or simply, the “Winograd algorithm”.
  • the matrix convolution performed based on the Winograd algorithm may be referred to as “Winograd-based convolution”, or simply “Winograd convolution”.
  • Winograd convolution is widely used in ANNs to reduce the computational complexity, e.g., by reducing the number of multiplications.
  • two operands of a convolution, e.g. an input and a filter, are transformed into a so-called “Winograd domain”, and a transformed output is obtained in the Winograd domain.
  • this transformed output is transformed back to a normal domain (sometimes also referred to as a spatial domain).
  • Transformation matrices B, G and A depend on specific configurations of the Winograd convolution and have different values commonly known in the art for different configurations of the Winograd convolution. For efficient implementation and execution, it is often desired to perform Winograd convolution with integer data.
  • a floating point neural network is often quantized to an integer neural network.
  • the floating point neural network is a neural network comprising floating point parameters, such as inputs and filters.
  • the integer neural network is a neural network comprising only integer parameters.
  • a quantized 8-bit integer (INT8) neural network can achieve accuracy comparable to that of a 32-bit floating point (FP32) neural network.
  • by quantizing an FP32 model to an INT8 model, model sizes can be reduced by a factor of four compared to the FP32 model.
  • calculations can be accelerated for quantized integer neural networks on processors compared to their floating point counterparts.
  • the speedup can be further improved. Overall, quantization may bring improvements including model compression and latency reduction.
  • Quantization of a neural network with a Winograd algorithm may lead to a significant drop in the performance of the neural network in some scenarios.
  • an actual neural network can have thousands of parameters.
  • if an INT8 neural network is desired, then only $2^8 = 256$ integer values (from -128 to 127) can be used. This is often much less than the range of a floating point neural network.
  • if the floating point neural network has a range of 0.0001 to 0.1 (i.e., [0.0001, 0.1]), it may be scaled into 10000 × [0.0001, 0.1] = [1, 1000].
  • however, 256 is much less than 1000. Therefore, rounding may be performed and a rounding error may be introduced. This may lead to information loss and may jeopardize the accuracy of the quantized neural network.
  • further, since a Winograd convolution involves a preliminary calculation of parameters in order to obtain transformed inputs and transformed filters in the Winograd domain, a neural network with Winograd convolution is much more vulnerable to errors introduced by quantization.
  • Apparatus and methods according to this disclosure facilitate performing a Winograd convolution of a neural network in a robust and efficient manner. This and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the drawings.
  • a balancing tensor is used to balance channel ranges of inputs and weights at each layer of a neural network before quantization.
  • a direct algorithm for calculating the balancing tensor is disclosed, which exploits the distributions of inputs and weights at each layer of a neural network.
  • a first aspect of the present disclosure provides an apparatus for performing Winograd-based convolution of a floating point neural network.
  • the apparatus is configured to generate a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels.
  • the apparatus is configured to generate a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels.
  • a tensor may be a data structure used in neural networks in the field of artificial intelligence to carry a specific amount of information.
  • the tensor may be: a 0-dimensional (0-D) array, such as a single number; a 1-dimensional (1-D) array, such as a vector; a 2-dimensional (2-D) array, such as a matrix; a 3-dimensional (3-D) array, such as data representing an RGB image; or an array with a higher (larger than three) dimensional structure.
  • the apparatus may be configured to transform a tensor with a specific dimension into another dimension.
  • a vector may be transformed into a matrix, while a matrix may also be transformed into a vector. This may be useful for satisfying different requirements of inputs required by different configurations of the Winograd convolution.
  • a tensor (i.e., the input or the filter tensor) may comprise one or more channels.
  • a channel may be used to transmit information from a certain aspect. That is, a channel may have a certain capacity for transmitting information.
  • the number of the one or more channels is the depth of the tensor involved in the convolution.
  • all channels of a tensor may share a same size.
  • an N×M pixel RGB image may also be represented by a 2D (N×M) tensor with three channels: red, green and blue.
  • the Winograd convolution may be performed at one or more hidden layers of the floating point neural network.
  • the original input tensor may be or may be part of an input applied to a hidden layer.
  • the input applied to a hidden layer may be an output of a previous layer.
  • the apparatus may be configured to apply Winograd transformation on the original input tensor.
  • the original input tensor may be split into several tiles that are suitable for performing the Winograd convolution. Then, the apparatus may be configured to transform each tile into the Winograd input tensor.
  • the apparatus may be configured to apply Winograd transformation to the original filter tensor in order to obtain the Winograd filter tensor.
  • the filter tensor of the floating point neural network may be weights of neurons at each hidden layer of the floating point neural network. Therefore, the filter tensor may also be referred to as a weight tensor.
  • a floating point channel may be a channel comprising at least one element that is a floating point value.
  • the apparatus is configured to determine a balancing tensor based on the Winograd input tensor and the Winograd filter tensor.
  • the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels.
  • the balancing tensor may comprise one or more balancing coefficients.
  • the one or more balancing coefficients, the one or more first channels, and the one or more second channels may be in a one-to-one correspondence.
  • the apparatus may be configured to balance each first channel and each second channel based on a corresponding balancing coefficient.
  • the apparatus may be configured to divide the Winograd input tensor by the balancing tensor and multiply the Winograd filter tensor by the balancing tensor.
  • the apparatus may be configured to multiply the Winograd input tensor by the balancing tensor and divide the Winograd filter tensor by the balancing tensor. In this way, the balancing tensor may be canceled afterwards when Winograd multiplication of the Winograd input tensor and the Winograd filter tensor is performed. Therefore, no additional operation is introduced.
  • the apparatus is configured to determine a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor.
  • the first and the second scale factors are adapted to quantize the one or more first balanced floating point channels and the one or more second balanced floating point channels into one or more first integer channels and one or more second integer channels, respectively.
  • the quantization errors may be reduced to a minimum.
  • the apparatus is configured to perform the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • the apparatus may be configured to obtain a balanced and quantized Winograd input tensor and a balanced and quantized Winograd filter tensor based on the balancing tensor, the first scale factor, and the second scale factor. Then, the apparatus may be configured to perform Winograd multiplication based on the balanced and quantized Winograd input tensor and the balanced and quantized Winograd filter tensor.
  • channel ranges of the Winograd filter and input tensors can be balanced while the number of operations of a Winograd convolution based thereon is equivalent to that of the conventional Winograd convolution. Further, quantization errors can be reduced because of the balanced channel ranges. In this way, the precision of the Winograd convolution according to the present disclosure can be increased.
  • the balancing of the Winograd filter and input tensors can be compatible with various quantization and training techniques in the art, such as post-training quantization and quantization aware training.
  • the balancing of the Winograd filter and input tensors can be universal, because the balancing does not depend on any specific type of the Winograd convolution, such as bit width, quantization scheme, scale type and so on. Therefore, the balancing can be applied to a Winograd algorithm of any type.
  • the floating point neural network may be a trained neural network.
  • the apparatus may be configured to use the trained neural network for image processing such as image classification and image feature extraction.
  • the apparatus may be further configured to obtain an image or a feature map of the image as the original input.
  • the apparatus may be configured to determine the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor. Then, the apparatus may be configured to process the image or the feature map of the image by performing the Winograd convolution.
  • the apparatus may be configured to split the image or the feature map of the image into multiple tiles.
  • Each tile may still be considered as an original input, because values comprised therein are not altered. Therefore, each tile may still carry original information.
  • the feature map of the image may be obtained by the apparatus as an output of a hidden layer comprised in the floating point neural network.
  • the first floating point channel and the second floating point channel may be in a one-to-one correspondence.
  • the balancing tensor may comprise one or more balancing coefficients.
  • the apparatus may be configured to determine each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.
  • a quantization range of a channel may be understood as a range between the maximum value and the minimum value of the channel.
  • the apparatus may be configured to determine each balancing coefficient based on the following equation: $b_k = \sqrt{r_k^V / r_k^U}$ (1), wherein $b_k$ is a balancing coefficient for channel k, $r_k^V$ is a quantization range of channel k of the one or more first floating point channels, $r_k^U$ is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
  • alternatively, the apparatus may be configured to determine each balancing coefficient based on the following equation: $b_k = \sqrt{r_k^U / r_k^V}$ (2).
  • the apparatus can be configured to obtain each balancing coefficient according to equation (1) or (2) in a simple and direct manner.
  • the apparatus may be configured to obtain a set of sample inputs.
  • the set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
  • for performing the Winograd convolution, the apparatus may be configured to: divide the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor; and multiply the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor.
  • alternatively, the apparatus may be configured to: multiply the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor; and divide the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor.
  • the apparatus may be further configured to: combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor.
  • a second aspect of the present disclosure provides a computer-implemented method for performing Winograd convolution of a floating point neural network.
  • the method comprises the following steps:
- generating a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- generating a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels into one or more first integer channels and one or more second integer channels, respectively; and
- performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • the floating point neural network may be a trained neural network.
  • the trained neural network may be used for image processing, such as image classification and image feature extraction.
  • the method may further comprise: obtaining an image or a feature map of the image as the original input; determining the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor; and processing the image or the feature map of the image by performing the Winograd convolution.
  • the first floating point channel and the second floating point channel may be in a one-to-one correspondence, and the balancing tensor may comprise one or more balancing coefficients.
  • the determining of the balancing tensor may comprise: determining each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.
  • each balancing coefficient may be determined based on the following equation: $b_k = \sqrt{r_k^V / r_k^U}$ (1), wherein $b_k$ is a balancing coefficient for channel k, $r_k^V$ is a quantization range of channel k of the one or more first floating point channels, $r_k^U$ is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
  • alternatively, the determining of each balancing coefficient may be based on the following equation: $b_k = \sqrt{r_k^U / r_k^V}$ (2).
  • the method may further comprise obtaining a set of sample inputs.
  • the set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the method may further comprise determining each balancing coefficient for each channel based on the sample inputs, and applying each determined balancing coefficient to each corresponding channel for later input(s).
  • the performing of the Winograd-based convolution may comprise the following steps: dividing the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor; and multiplying the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor.
  • alternatively, the performing of the Winograd-based convolution may comprise the following steps: multiplying the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor; and dividing the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor.
  • in either case, the quantized Winograd input tensor comprises the one or more first integer channels, and the quantized Winograd filter tensor comprises the one or more second integer channels.
  • the method may further comprise: combining the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor.
  • a third aspect of the present disclosure provides a computer program product comprising a program code for performing the method according to the second aspect or any implementation form thereof, when executed on a computer.
  • a fourth aspect of the present disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to any one of the second aspect or any implementation form thereof.
  • a fifth aspect of the present disclosure provides a chipset comprising instructions which, when executed by the chipset, cause the chipset to carry out the method according to any one of the second aspect or any implementation form thereof.
  • FIG. 1 shows an example of a Winograd convolution performed by an apparatus
  • FIG. 2 shows an example of an apparatus for performing a Winograd convolution
  • FIG. 3 shows a method for performing a Winograd convolution
  • FIG. 4 shows an application scenario
  • FIG. 5A-5C show results based on different methods for performing convolution
  • FIG. 6 shows an illustrative example of a conventional Winograd convolution.
  • a framework for performing a Winograd convolution in a neural network is provided.
  • improvements are introduced based on the conventional Winograd convolution illustrated in FIG. 6.
  • a solution for reducing errors generated by quantization of the neural network is provided.
  • quantization may be referred to as a process of mapping a set of input values to a smaller set of values.
  • the conversion of floating point numbers to fixed point (e.g., integer) numbers may be a process of quantization.
  • embodiments disclosed herein may be applied to a neural network that is adapted for image processing, such as image classification, image enhancement, and image recognition.
  • image processing such as image classification, image enhancement, and image recognition.
  • information loss of the image processing based on the neural network according to embodiments disclosed herein may be reduced.
  • a neural network may be referred to as a neural network model or simply as a model.
  • a Winograd parameter shall be understood as a parameter in the Winograd domain.
  • FIG. 1 shows an example of a Winograd convolution performed by an apparatus for a floating point neural network.
  • the Winograd convolution comprises four phases: a transformation phase 101, a balancing phase 102, a quantization phase 103, and an output phase 104.
  • an original input tensor and a filter tensor of the neural network are transformed into a Winograd input tensor, denoted as V, and a Winograd filter tensor, denoted as U, respectively.
  • the input tensor is denoted as X in the present disclosure and is exemplarily shown as a 4x4 input tile in FIG. 1.
  • the filter tensor of the neural network (also referred to as an original filter tensor) is denoted as W in the present disclosure and is exemplarily shown as a 3x3 filter tile in FIG. 1.
  • the apparatus may be configured to apply the following equations: $V = B^T X B$ (3), and $U = G W G^T$ (4).
  • the Winograd input tensor comprises one or more first floating point channels
  • the Winograd filter tensor comprises one or more second floating point channels.
  • In FIG. 1, it is exemplarily shown that the original input tensor X and the filter tensor W both comprise three channels, and operations according to equations (3) and (4) may be performed for each channel comprised therein. Then, each channel may be carried over respectively into the Winograd domain during the transformation according to equations (3) and (4).
  • the neural network may be a trained neural network.
  • the trained neural network may be configured to perform image processing such as image classification and image feature extraction.
  • the original input tensor may be or may be part of an image.
  • the original input tensor may be or may be part of a feature map of an image.
  • the feature map may be obtained at a hidden layer of the neural network.
  • the feature map may be an output of a previous layer at a certain hidden layer of the neural network.
  • the original filter tensor may be dependent upon the hidden layer. That is, the neural network may comprise different original filter tensors at different hidden layers.
  • the apparatus may perform one or more Winograd convolutions at any layer comprised in the neural network. Therefore, the original input tensor and the original filter tensor may be associated with each layer comprised in the neural network.
  • the original input tensor and the original filter tensor are transformed into the Winograd domain, where transformation matrices B and G may be used to facilitate the transformation.
  • Any input tensor not transformed into the Winograd domain may be referred to as an original input. That is, elements of the original input tensor are not altered and may still reflect original true data. Details regarding transformation matrices of B and G may depend on a specific Winograd algorithm applied to the Winograd convolution.
  • the transformation matrices B and G for an F(2×2, 3×3) Winograd convolution with a 4x4 input tile and a 3x3 filter tile may be as follows: $B^T = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$, $G = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 0 & 0 & 1 \end{bmatrix}$. It is noted that an F(m×n, r×s) Winograd convolution may denote a 2D Winograd convolution used to compute an m×n feature map with r×s filters. Other types of Winograd convolutions, such as an F(4×4, 3×3) and an F(6×6, 3×3) Winograd convolution, may also be commonly used in the art, in particular for image processing.
  • F(2×2, 3×3), F(4×4, 3×3) and F(6×6, 3×3) Winograd convolutions may simply be referred to as F(2,3), F(4,3) and F(6,3) Winograd convolutions, especially for image processing where a 2D convolution may be considered as default.
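For illustration, the following Python sketch (a minimal, hedged example, not part of the publication) checks that the F(2,3) transforms with the standard matrices above reproduce a direct 2x2 convolution output; the values of A are likewise the ones commonly used in the art and are assumed here.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd matrices as commonly known in the art
# (assumed values; the publication only names B, G and A).
Bt = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))   # 4x4 input tile
W = rng.standard_normal((3, 3))   # 3x3 filter tile

V = Bt @ X @ Bt.T                 # equation (3): V = B^T X B
U = G @ W @ G.T                   # equation (4): U = G W G^T
Y = At @ (U * V) @ At.T           # inverse transform of the elementwise product

# Reference: direct "valid" 2x2 convolution (CNN-style correlation)
Y_ref = np.array([[np.sum(X[i:i + 3, j:j + 3] * W) for j in range(2)]
                  for i in range(2)])
assert np.allclose(Y, Y_ref)
```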
  • the Winograd filter tensor U may have a dimension (C,K,a,a), and the Winograd input tensor V may have a dimension (P,C,a,a), where C is the number of input channels, K is the number of output channels, P is the number of tiles of a complete input, and (a, a) determines the dimension of the Winograd domain.
  • the apparatus may be configured to split a complete original input tensor into a plurality of tiles.
  • Each tile may comprise a part of the original input tensor.
  • the size of each tile may be based on the size of the Winograd input tensor of the specific Winograd algorithm applied to the Winograd convolution.
  • each tile may be equivalent to the original input, because it also carries original data that is not transformed into the Winograd domain.
  • the 4x4 input tile in FIG. 1 may be an image segment that is split from a complete image with a dimension of 80x80 (pixels).
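As a sketch of this tiling step (illustrative names and shapes; border padding is omitted), a (C, H, W) input can be split into overlapping 4x4 tiles with stride 2, since each tile of an F(2,3) Winograd convolution yields a 2x2 output patch:

```python
import numpy as np

def split_into_tiles(x, tile=4, stride=2):
    """Split a (C, H, W) input into (P, C, tile, tile) overlapping tiles."""
    C, H, W = x.shape
    tiles = [x[:, i:i + tile, j:j + tile]
             for i in range(0, H - tile + 1, stride)
             for j in range(0, W - tile + 1, stride)]
    return np.stack(tiles)

image = np.random.rand(3, 80, 80)   # e.g., an 80x80 RGB image
tiles = split_into_tiles(image)
print(tiles.shape)                  # (1521, 3, 4, 4): P = 39 * 39 tiles
```

Transforming each tile then yields a Winograd input tensor V of dimension (P, C, a, a), matching the shapes stated above.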
  • the apparatus is configured to determine a balancing tensor, denoted as b in the present disclosure.
  • the balancing tensor b may have a dimension (C, a, a).
  • the balancing tensor b may be used to balance channel ranges of the Winograd input and filter tensors.
  • the precision of channel k in Winograd domain (i, j) of the Winograd filter tensor U may be denoted as $p_{ij}^k = r_{ij}^k / R_{ij}$, where $r_{ij}^k$ is the quantization range of channel k in Winograd domain (i, j) of U, and $R_{ij}$ is the quantization range of all channels in the Winograd domain of U.
  • a quantization range may be referred to as a range between a minimum value and a maximum value of a domain before quantization.
  • the precision of channel k in Winograd domain (i, j) of the Winograd input tensor V may be denoted as $s_{ij}^k = t_{ij}^k / T_{ij}$, where $t_{ij}^k$ is the quantization range of channel k in Winograd domain (i, j) of V, and $T_{ij}$ is the quantization range of all channels in the Winograd domain of V.
  • the apparatus may be configured to obtain parameters such as $s_{ij}^k$ by obtaining a set of sample data without training.
  • An optimal balancing tensor b may be used to achieve a maximized total precision of all channels. That is, the apparatus may be configured to determine the balancing tensor b such that the total precision of all channels, $\sum_{i,j,k} p_{ij}^k \cdot s_{ij}^k$, is maximized.
  • the apparatus may be configured to determine the balancing tensor b as: $b_{ij}^k = \sqrt{t_{ij}^k / r_{ij}^k}$ (8), wherein $b_{ij}^k$ denotes each balancing coefficient for channel k in Winograd domain (i, j) comprised in the balancing tensor b.
  • each balancing coefficient in a same Winograd domain may also be denoted as: $b^k = \sqrt{t^k / r^k}$ (9), where $t^k$ and $r^k$ are the quantization ranges of channel k of V and U, respectively, over the whole Winograd domain.
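Under the square-root-of-range-ratio reconstruction of equations (8), (10) and (11) used here, the balancing computation can be sketched as follows; the shapes follow the (C, K, a, a) and (P, C, a, a) conventions stated above, and the final assertion checks that each channel's input and filter ranges coincide after balancing:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.standard_normal((8, 16, 4, 4)) * 0.01   # (C, K, a, a) Winograd filters
V = rng.standard_normal((64, 8, 4, 4)) * 10.0   # (P, C, a, a) Winograd inputs

t = np.ptp(V, axis=0)    # (C, a, a): quantization range of each input channel
r = np.ptp(U, axis=1)    # (C, a, a): quantization range of each filter channel
b = np.sqrt(t / r)       # balancing tensor of dimension (C, a, a)

V_b = V / b              # equation (10)
U_b = U * b[:, None]     # equation (11), broadcast over the K axis

# After balancing, the per-channel ranges of input and filter are equal:
assert np.allclose(np.ptp(V_b, axis=0), np.ptp(U_b, axis=1))
```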
  • the apparatus may be configured to obtain a set of sample inputs.
  • the set of sample inputs may be a part (e.g., 5.0-20.0 %) of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs to be applied to the trained floating point neural network in the inference phase.
  • the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
  • a complete set of inputs to be applied to the trained floating point neural network may comprise 100 images, each image comprising three channels (RGB channels).
  • the apparatus may be configured to obtain 5 to 20 images from the 100 images as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 images. Then, the apparatus may be configured to apply the determined balancing tensor b for other images of the 100 images.
  • a complete set of inputs to be applied to the trained floating point neural network may comprise 100 tiles split from a complete image. Each tile is a part of the complete image and comprises three channels (RGB channels).
  • the apparatus may be configured to obtain 5 to 20 tiles from the 100 tiles as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 tiles. Then, the apparatus may be configured to apply the determined balancing tensor b for other tiles of the 100 tiles.
  • the apparatus may be configured to determine the balancing tensor b based on part or all of training samples given during the training phase of the floating point neural network.
  • the floating point neural network may be referred to as the trained floating point neural network.
  • the apparatus may be configured to determine the balancing tensor b in a learnable way. That is, the balancing tensor b may be seen as one of trainable parameters in a so-called “quantization aware training”. That is, the balancing tensor b, as well as other neural network parameters such as weights and/or biases, may be fine-tuned in the training phase of the neural network before the inference phase.
  • the apparatus may be configured to divide the Winograd input tensor by the balancing tensor b to obtain a balanced input tensor $V_b$, and multiply the Winograd filter tensor by the balancing tensor b to obtain a balanced filter tensor $U_b$.
  • the apparatus may be configured to apply the following equations: $V_b = V / b$ (10), and $U_b = U \cdot b$ (11).
  • channel ranges of the one or more first floating point channels and the one or more second floating point channels may be balanced. This may facilitate the quantization that is to be performed in a later stage, for example, where quantization errors may be reduced.
  • alternatively, the apparatus may be configured to multiply the Winograd input tensor by the balancing tensor b to obtain the balanced input tensor $V_b$, and divide the Winograd filter tensor by the balancing tensor b to obtain the balanced filter tensor $U_b$.
  • in this case, the apparatus may be configured to determine each balancing coefficient in a same Winograd domain as: $b_{ij}^k = \sqrt{r_{ij}^k / t_{ij}^k}$ (12).
  • the apparatus may be configured to apply the following equations based on equation (12) in the balancing phase 102: $V_b = V \cdot b$ (13), and $U_b = U / b$ (14).
  • the apparatus is configured to determine a first scale factor (denoted as “scaleV”) for the Winograd input tensor that is balanced and a second scale factor (denoted as “scaleU”) for the Winograd filter tensor that is balanced.
  • the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively. It is noted that determining a scale factor for a quantized Winograd convolution is commonly known in the field. Therefore, it is not described in detail herein.
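As one commonly used option (an assumed example; the disclosure does not prescribe a particular scheme), a symmetric per-tensor INT8 scale factor can be computed as follows:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale = 127 / max|x|."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.clip(np.round(x * scale), -128, 127).astype(np.int8)
    return q, scale

x = np.random.randn(256).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = q / scale                    # dequantized values
print(np.max(np.abs(x - x_hat)))     # rounding error, bounded by ~0.5 / scale
```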
  • the operation of $U_{quant} \cdot V_{quant}$ may also be referred to as a Winograd multiplication, which may be understood as a multiplication in the Winograd domain.
  • the transformation matrix A is used to transform the output in the Winograd domain back to the normal domain.
  • similar to the transformation matrices B and G, details regarding the transformation matrix A may depend on the specific Winograd algorithm applied to the Winograd convolution.
  • the apparatus may be configured to combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balancing scale tensor and a second balancing scale tensor. That is, the first scale factor and the balancing tensor are combined, and the second scale factor and the balancing tensor are also combined. In this way, the apparatus may only need to store these combined parameters, and a simplified balancing and quantization of the Winograd convolution may be achieved. In this case, the apparatus may be configured to perform the same number of operations in the inference phase as in the case of a conventional Winograd convolution.
  • by applying the balancing tensor to both the Winograd input tensor and the Winograd filter tensor of a Winograd convolution, information loss caused by the quantization may be minimized. This may, for example, increase the precision and accuracy of the result obtained by the neural network. For example, the quality of the image processing may be increased when a neural network using a Winograd convolution based on the embodiments of the present disclosure is applied.
  • the apparatus since the apparatus applies reverse operations of multiplication and division to the Winograd input tensor and the Winograd filter tensor based on the same balancing tensor, the balancing tensor itself may be canceled during the operation of the Winograd multiplication in the output phase 104. Therefore, the apparatus may not need any additional operations to reverse the balancing. Hence, efficiency may be introduced to the Winograd convolution based on the embodiments of the present disclosure.
  • FIG. 2 shows an example of an apparatus 200 for performing a Winograd convolution.
  • the apparatus 200 may comprise four units, which are shown exemplarily in FIG. 2 as a transformation unit 201, a balancing unit 202, a quantization unit 203, and an output unit 204.
  • the transformation unit 201 may be configured to obtain the input tensor and the filter tensor.
  • the transformation unit 201 may be further configured to transform the input tensor and the filter tensor from the normal domain into Winograd domain.
  • the balancing unit 202 may be configured to determine a balancing tensor.
  • the balancing unit 202 may further be configured to perform channel balancing on the Winograd input tensor and the Winograd filter tensor based on the determined balancing tensor.
  • the quantization unit 203 may be configured to quantize the (balanced) Winograd input tensor and the (balanced) Winograd filter tensor.
  • the output unit 204 may be configured to compute the final output. It is noted that the units 201-204 in FIG. 2 may correspond to phases 101-104 in FIG. 1, respectively.
  • the apparatus 200 may be or may be part of a neural processing unit (NPU) or an AI processor.
  • the apparatus 200 may be a matrix/vector/scalar computation unit comprised in an AI core.
  • the AI core, optionally along with a number of other identical AI cores, may be comprised in a chipset or system-on-chip (SoC).
  • the chipset or SoC may be configured to perform neural network related operations, such as training and/or inferencing.
  • the chipset or SoC may be configured to perform image processing, speech recognition, text recognition and the like based on artificial intelligence by using the Winograd convolution according to the present disclosure.
  • FIG. 3 shows a method 300 for performing a Winograd convolution.
  • the method 300 is performed by an apparatus for performing the Winograd convolution of a floating point neural network.
  • the method 300 comprises the following steps: step 301: generating a Winograd input tensor based on an original input tensor.
  • the Winograd input tensor comprises one or more first floating point channels;
  • step 302: generating a Winograd filter tensor based on a filter tensor of the floating point neural network.
  • the Winograd filter tensor comprises one or more second floating point channels;
  • step 303: determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
  • step 304: determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively;
  • step 305: performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • steps 301 and 302 may correspond to phase 101 and may be performed by the transformation unit 201.
  • step 303 may correspond to phase 102 and may be performed by the balancing unit 202.
  • Step 304 may correspond to phase 103 and may be performed by the quantization unit 203.
  • Step 305 may correspond to phase 104 and may be performed by the output unit 204.
  • the corresponding method implementations are not described in detail again at this point.
  • An application scenario of the present disclosure is that the method for performing a Winograd convolution may be applied to a neural network model that involves convolution operations. Applying the neural network model may often involve two phases: a training phase and an inference phase. For preparing the model in the training phase, the following steps may be executed, e.g., by the apparatus according to the present disclosure:
  • Step 401. Obtaining the neural network model that comprises conventional direct convolution operations.
  • Step 402. Replacing the conventional direct convolutions with the Winograd convolutions according to the present disclosure.
  • Step 403. Passing several data samples, which may be referred to as sample inputs, without training to the model.
  • Step 404. Collecting statistics about minimum and maximum values of the sample inputs in the Winograd domain (i.e., Winograd inputs).
  • Step 405. Calculating balancing coefficients based on the collected statistics according to the present disclosure.
  • Step 406. Applying the calculated balancing coefficients to filters in the Winograd domain (i.e., Winograd filters).
  • Step 407. Calculating a scaling factor of the balanced Winograd filters and quantizing the Winograd filters.
  • the apparatus may be configured to perform the following step 408a and the optional step 409a.
  • Step 408a. Calculating a scaling factor of the balanced Winograd sample inputs.
  • Step 409a. Fusing the scaling factor of the balanced Winograd sample inputs with the balancing coefficients.
  • the apparatus may be configured to perform the following step 408b.
  • Step 408b. Storing the balancing coefficients for further usage in the inference phase (a sketch of these training-phase steps follows below).
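The following condensed sketch covers steps 403-408b, under the assumptions made above (the square-root balancing formula and a symmetric INT8 scheme); all function and variable names are illustrative.

```python
import numpy as np

Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
               [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)

def winograd_transform_inputs(x_tiles, Bt):
    # (P, C, 4, 4) tiles -> (P, C, 4, 4) Winograd inputs, V = B^T X B per tile
    return np.einsum('ab,pcbd,ed->pcae', Bt, x_tiles, Bt)

def calibrate(sample_tiles, U, Bt):
    V = winograd_transform_inputs(sample_tiles, Bt)   # steps 403-404
    t = V.max(axis=0) - V.min(axis=0)                 # input ranges, (C, 4, 4)
    r = U.max(axis=1) - U.min(axis=1)                 # filter ranges, (C, 4, 4)
    b = np.sqrt(t / r)                                # step 405
    U_balanced = U * b[:, None]                       # step 406
    scale_U = 127.0 / np.abs(U_balanced).max()        # step 407 (symmetric)
    U_quant = np.round(U_balanced * scale_U).astype(np.int8)
    return b, scale_U, U_quant                        # step 408b: store b

U = np.random.randn(3, 8, 4, 4) * 0.05     # (C, K, 4, 4) Winograd filters
samples = np.random.rand(20, 3, 4, 4)      # a small sample subset (step 403)
b, scale_U, U_quant = calibrate(samples, U, Bt)
```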
  • In the inference phase, the neural network model has been trained and fine-tuned for actual application.
  • an apparatus using the trained neural network model may be configured to perform the following steps 501 and 502. It is noted that the apparatus using the trained neural network model may also be the apparatus according to the present disclosure.
  • Step 501. Obtaining balanced and quantized Winograd filters.
  • Step 502. Transforming actual inputs into Winograd inputs.
  • the apparatus may be configured to perform the following step 502a.
  • Step 502a. Balancing and quantizing the Winograd inputs by using the balancing coefficients and the scaling factor determined based on the Winograd sample inputs before the inference phase, or by applying a fused scaling factor and the balancing coefficients if available.
  • the apparatus may be configured to perform the following steps 502b, 503 and 504.
  • Step 502b. Balancing the Winograd input by using the balancing coefficients.
  • Step 503. Determining a current scale factor based on the balanced Winograd input.
  • Step 504. Quantizing the balanced Winograd input based on the current scale factor.
  • the apparatus may be configured to perform the following steps 505 and 506.
  • Step 505. Calculating Winograd multiplication based on the balanced and quantized Winograd filters and the balanced and quantized Winograd inputs.
  • Step 506. Transforming the Winograd product back to the normal (or spatial) domain as the final output (a sketch of these inference-phase steps follows below).
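The sketch below runs steps 501-506 for a single-channel F(2,3) tile, using the statically calibrated path of step 502a; the matrices are the standard values assumed earlier, and the symmetric INT8 scheme is again only an assumed example.

```python
import numpy as np

Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
               [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]])
At = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

W = np.random.randn(3, 3)
X = np.random.randn(4, 4)
U = G @ W @ G.T                     # Winograd filter
V = Bt @ X @ Bt.T                   # step 502: Winograd input
b = np.sqrt(np.ptp(V) / np.ptp(U))  # scalar balancing coefficient (C = 1)

# Step 501: balanced and quantized Winograd filter (prepared before inference)
scale_U = 127.0 / np.abs(U * b).max()
U_q = np.round(U * b * scale_U)
# Step 502a: balance and quantize the Winograd input
scale_V = 127.0 / np.abs(V / b).max()
V_q = np.round(V / b * scale_V)
# Step 505: Winograd multiplication on integer data, then rescaling
Y_wino = (U_q * V_q) / (scale_U * scale_V)
# Step 506: transform the Winograd product back to the spatial domain
Y = At @ Y_wino @ At.T

Y_ref = np.array([[np.sum(X[i:i + 3, j:j + 3] * W) for j in range(2)]
                  for i in range(2)])
print(np.abs(Y - Y_ref).max())      # small residual quantization error
```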
  • FIG. 4 shows an application scenario of the present disclosure.
  • In FIG. 4, an example of a quantized Efficient Sub-Pixel CNN (ESPCNN) for 3x image super-resolution is illustrated.
  • direct 2D convolutions of the four 32x32 convolutions and the 32x27 convolution may be performed based on the Winograd convolution according to the present disclosure.
  • a symmetric quantization scheme, an F(4,3) Winograd algorithm and a quantized INT8 model may be used for the Winograd convolution.
  • FIG. 5A-5C show results based on different methods for performing convolution.
  • FIG. 5A shows a result of image super-resolution by using the ESPCNN based on a conventional Winograd convolution that is without balancing.
  • FIG. 5B shows a result of image super-resolution by using the ESPCNN based on the Winograd convolution according to the present disclosure.
  • FIG. 5C shows a result of image super-resolution by using a full precision model. It can be seen that the perceptual quality of FIG. 5B obtained based on the Winograd convolution according to the present disclosure significantly exceeds the perceptual quality of FIG. 5A obtained based on the conventional Winograd convolution and is close to the perceptual quality of FIG. 5C obtained based on the full precision model.
  • the apparatus in the present disclosure may comprise processing circuitry configured to perform, conduct or initiate the various operations of the device described herein, respectively.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device to perform, conduct or initiate the operations or methods described herein, respectively.
  • the apparatus in the present disclosure may be a single electronic device capable of computing, or may comprise a set of connected electronic components or modules capable of computing with shared system memory. It is well known in the art that such computing capabilities may be incorporated into many different devices, and therefore the term “apparatus” may comprise a chip, chipset, artificial intelligence accelerator, neural processing unit, computer, mobile terminal, tablet, wearable device, game console, graphic processing unit, graphic card, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a method and an apparatus for performing a Winograd convolution of a neural network. In particular, before the input and filter tensors of the Winograd convolution in a Winograd domain are quantized into integer data, the apparatus is configured to determine a balancing tensor and balance channel ranges of the input and filter tensors based on the balancing tensor. After the channel ranges of the input and filter tensors are balanced, the apparatus is then configured to perform quantization. In this way, information loss caused by the quantization may be reduced. Hence, the precision and accuracy of the neural network may be enhanced.

Description

APPARATUS AND METHOD FOR WINOGRAD CONVOLUTION
TECHNICAL FIELD
The present disclosure relates to an apparatus and a method for processing a matrix in the field of artificial intelligence. For example, the disclosure relates to an apparatus and a method for performing a convolution in an artificial neural network.
BACKGROUND
Artificial neural networks (ANNs) are often used for performing various tasks, such as image processing, speech recognition, robotics, and big data analysis. An ANN usually involves massive data processing, such as matrix convolution.
A matrix convolution may be seen as a process of adding each element of an input matrix to its local neighbours, weighted by a kernel matrix (or filter). Therefore, the matrix convolution normally includes matrix addition and multiplication. The matrix convolution is often used in a convolutional neural network (CNN).
Several conventional algorithms have been developed to speed up the matrix convolution. In particular, Don Coppersmith and Shmuel Winograd in 1990 developed an algorithm for performing matrix multiplication, which is often referred to as “Coppersmith-Winograd algorithm”, or simply, “Winograd algorithm”. The matrix convolution performed based on the Winograd algorithm may be referred to as “Winograd-based convolution”, or simply “Winograd convolution”.
Winograd convolution is widely used in ANNs to reduce the computational complexity, e.g., by reducing the number of multiplications. As shown exemplarily in FIG. 6, two operands of a convolution, e.g. an input and a filter, are transformed into a so-called “Winograd domain” and a transformed output is obtained in the Winograd domain. Then, this transformed output is transformed back to a normal domain (sometimes also referred to as a spatial domain). Transformation matrices B, G and A depend on specific configurations of the Winograd convolution and have different values commonly known in the art for different configurations of the Winograd convolution. For efficient implementation and execution, it is often desired to perform Winograd convolution with integer data. Therefore, a floating point neural network is often quantized to an integer neural network. The floating point neural network is a neural network comprising floating point parameters, such as inputs and filters. The integer neural network is a neural network comprising only integer parameters. For inference on mobile and embedded devices, a quantized 8-bit integer (INT8) neural network can achieve accuracy comparable to that of a 32-bit floating point (FP32) neural network. Further, by quantizing an FP32 model to an INT8 model, model sizes can be reduced by a factor of four compared to the FP32 model. Moreover, calculations can be accelerated for quantized integer neural networks on processors compared to their floating point counterparts. Furthermore, on hardware where optimized fixed-point capabilities are available, the speedup can be further improved. Overall, quantization may bring improvements including model compression and latency reduction.
SUMMARY
Quantization of a neural network with a Winograd algorithm may lead to a significant drop in the performance of the neural network in some scenarios. For example, an actual neural network can have thousands of parameters. In general, it is not easy to obtain integer parameters by scaling, because hardware is restricted by the number of bits. For example, if an INT8 neural network is desired, then only $2^8 = 256$ integer values (from -128 to 127) can be used. This is often much less than the range of a floating point neural network. For example, if the floating point neural network has a range of 0.0001 to 0.1 (i.e., [0.0001, 0.1]), then it may be scaled or quantized into 10000 × [0.0001, 0.1] = [1, 1000]. However, 256 is much less than 1000. Therefore, rounding may be performed and a rounding error may be introduced. This may lead to information loss and may jeopardize the accuracy of the quantized neural network.
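A small numeric illustration of this rounding loss (assuming a symmetric INT8 scheme, which the disclosure does not prescribe): roughly a thousand distinct floating point values collapse onto at most 128 integer levels.

```python
import numpy as np

x = np.arange(0.0001, 0.1, 0.0001)   # ~1000 distinct floating point values
scale = 127.0 / x.max()              # symmetric INT8 scale factor
q = np.round(x * scale).astype(np.int8)
print(len(np.unique(x)), '->', len(np.unique(q)))   # ~1000 -> at most 128 levels
```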
Further, since a Winograd convolution involves a preliminary calculation of parameters in order to obtain transformed inputs and transformed filters in the Winograd domain, a neural network with Winograd convolution is much more vulnerable to errors introduced by quantization.
In view of the above, there is a need to address the aforementioned technical drawbacks in existing devices to improve Winograd convolution(s) of a neural network.
Apparatus and methods according to this disclosure facilitate performing a Winograd convolution of a neural network in a robust and efficient manner. This and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the drawings.
According to the present disclosure, a balancing tensor is used to balance channel ranges of inputs and weights at each layer of a neural network before quantization. Optionally, a direct algorithm for calculating the balancing tensor is disclosed, which exploits the distributions of inputs and weights at each layer of a neural network.
A first aspect of the present disclosure provides an apparatus for performing Winograd-based convolution of a floating point neural network. The apparatus is configured to generate a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels. Further, the apparatus is configured to generate a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels.
Optionally, a tensor may be a data structure used in neural networks in the field of artificial intelligence to carry a specific amount of information. For example, the tensor may be:
- a 0-dimensional (0-D) array, such as a single number;
- a 1-dimensional (1-D) array, such as a vector;
- a 2-dimensional (2-D) array, such as a matrix;
- a 3-dimensional (3-D) array, such as data representing an RGB image; or
- an array with a higher (larger than three) dimensional structure.
Optionally, the apparatus may be configured to transform a tensor with a specific dimension into another dimension. For example, a vector may be transformed into a matrix, while a matrix may also be transformed into a vector. This may be useful for satisfying different requirements of inputs required by different configurations of the Winograd convolution.
Optionally, a tensor (i.e., the input or the filter tensor) may comprise one or more channels. A channel may be used to transmit information from a certain aspect. That is, a channel may have a certain capacity for transmitting information. Optionally, the number of the one or more channels is the depth of the tensor involved in the convolution. Optionally, all channels of a tensor may share a same size. For example, an N×M pixel RGB image may also be represented by a 2D (N×M) tensor with three channels: red, green and blue.
The Winograd convolution may be performed at one or more hidden layers of the floating point neural network. Optionally, the original input tensor may be or may be part of an input applied to a hidden layer. The input applied to a hidden layer may be an output of a previous layer. For generating the Winograd input tensor, the apparatus may be configured to apply Winograd transformation on the original input tensor. Optionally, the original input tensor may be split into several tiles that are suitable for performing the Winograd convolution. Then, the apparatus may be configured to transform each tile into the Winograd input tensor. Similarly, the apparatus may be configured to apply Winograd transformation to the original filter tensor in order to obtain the Winograd filter tensor.
Optionally, the filter tensor of the floating point neural network may be weights of neurons at each hidden layer of the floating point neural network. Therefore, the filter tensor may also be referred to as a weight tensor.
Optionally, a floating point channel may be a channel comprising at least one element that is a floating point value.
Then, the apparatus is configured to determine a balancing tensor based on the Winograd input tensor and the Winograd filter tensor. The balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels.
Optionally, the balancing tensor may comprise one or more balancing coefficients. The one or more balancing coefficients, the one or more first channels, and the one or more second channels may be in a one-to-one correspondence. The apparatus may be configured to balance each first channel and each second channel based on a corresponding balancing coefficient.
Optionally, the apparatus may be configured to divide the Winograd input tensor by the balancing tensor and multiply the Winograd filter tensor by the balancing tensor. Alternatively, the apparatus may be configured to multiply the Winograd input tensor by the balancing tensor and divide the Winograd filter tensor by the balancing tensor. In this way, the balancing tensor may be canceled afterwards when Winograd multiplication of the Winograd input tensor and the Winograd filter tensor is performed. Therefore, no additional operation is introduced.
After the one or more first and second floating point channels are balanced, the apparatus is configured to determine a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor. The first and the second scale factors are adapted to quantize the one or more first balanced floating point channels and the one or more second balanced floating point channels into one or more first integer channels and one or more second integer channels, respectively.
In this way, by applying the balancing tensor before quantization, the quantization errors may be reduced to a minimum.
Then, the apparatus is configured to perform the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
Optionally, the apparatus may be configured to obtain a balanced and quantized Winograd input tensor and a balanced and quantized Winograd filter tensor based on the balancing tensor, the first scale factor, and the second scale factor. Then, the apparatus may be configured to perform Winograd multiplication based on the balanced and quantized Winograd input tensor and the balanced and quantized Winograd filter tensor.
By balancing the Winograd filter and input tensors based on the same balancing tensor, channel ranges of the Winograd filter and input tensors can be balanced while the number of operations of a Winograd convolution based thereon is equivalent to that of the conventional Winograd convolution. Further, quantization errors can be reduced because of the balanced channel ranges. In this way, the precision of the Winograd convolution according to the present disclosure can be increased.
The balancing of the Winograd filter and input tensors can be compatible with various quantization and training techniques in the art, such as post-training quantization and quantization aware training. The balancing of the Winograd filter and input tensors can also be universal, because the balancing does not depend on any specific type of the Winograd convolution, such as bit width, quantization scheme, scale type and so on. Therefore, the balancing can be applied to a Winograd algorithm of any type.
In an implementation form of the first aspect, the floating point neural network may be a trained neural network. The apparatus may be configured to use the trained neural network for image processing such as image classification and image feature extraction. The apparatus may be further configured to obtain an image or a feature map of the image as the original input. The apparatus may be configured to determine the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor. Then, the apparatus may be configured to process the image or the feature map of the image by performing the Winograd convolution.
Optionally, after obtaining the image or the feature map of the image, the apparatus may be configured to split the image or the feature map of the image into multiple tiles. Each tile may still be considered as an original input, because values comprised therein are not altered. Therefore, each tile may still carry original information.
Optionally, the feature map of the image may be obtained by the apparatus as an output of a hidden layer comprised in the floating point neural network.
In an implementation form of the first aspect, the first floating point channel and the second floating point channel may be in a one-to-one correspondence. The balancing tensor may comprise one or more balancing coefficients. For determining the balancing tensor, the apparatus may be configured to determine each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel. A quantization range of a channel may be understood as a range between the maximum value and the minimum value of the channel. In an implementation form of the first aspect, the apparatus may be configured to determine each balancing coefficient based on the following equation:
bk = √(tk / rk),   (1)

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
In an alternative implementation form of the first aspect, the apparatus may be configured to determine each balancing coefficient based on the following equation:

bk = √(rk / tk).   (2)
In this way, the apparatus can be configured to obtain each balancing coefficient according to equation (1) or (2) in a simple and direct manner.
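For illustration only, the following is a minimal NumPy sketch of equations (1) and (2) as reconstructed above (the square-root form equalizes the channel ranges); the function name and example ranges are hypothetical:

```python
import numpy as np

def balancing_coefficients(t, r, alternative=False):
    """Equation (1): bk = sqrt(tk / rk); equation (2) is its reciprocal."""
    b = np.sqrt(np.asarray(t, dtype=np.float64) / np.asarray(r, dtype=np.float64))
    return 1.0 / b if alternative else b

# Channels with very different ranges become balanced: dividing the input
# ranges by b and multiplying the filter ranges by b yields sqrt(tk * rk).
t = np.array([8.0, 0.5, 2.0])   # ranges of the first floating point channels
r = np.array([0.5, 8.0, 2.0])   # ranges of the second floating point channels
b = balancing_coefficients(t, r)
print(t / b)   # [2. 2. 2.]
print(r * b)   # [2. 2. 2.]
```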
Optionally, prior to an inference phase of the trained floating point neural network, or during a training phase of a floating point neural network for obtaining the trained floating point neural network, the apparatus may be configured to obtain a set of sample inputs. The set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase. Alternatively, the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase. Then, the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
In an implementation form of the first aspect, for performing the Winograd convolution, the apparatus may be configured to:
- divide the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- multiply the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiply the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiply the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- perform Winograd multiplication of the Winograd convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.
In an alternative implementation form of the first aspect, for performing the Winograd convolution, the apparatus may be configured to:
- multiply the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- divide the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiply the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiply the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- perform Winograd multiplication of the Winograd convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.
In an implementation form of the first aspect, the apparatus may be further configured to:
- combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor; and
- perform the Winograd convolution based further on the first balanced scale tensor and the second balanced scale tensor.
A second aspect of the present disclosure provides a computer-implemented method for performing Winograd convolution of a floating point neural network. The method comprises the following steps:
- generating a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- generating a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively; and
- performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
In an implementation form of the second aspect, the floating point neural network may be a trained neural network. The trained neural network may be used for image processing, such as image classification and image feature extraction. The method may further comprise:
- obtaining an image or a feature map of the image as the original input;
- determining the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor; and
- processing the image or the feature map of the image by performing the Winograd convolution.
In an implementation form of the second aspect, the first floating point channel and the second floating point channel may be in a one-to-one correspondence, and the balancing tensor may comprise one or more balancing coefficients. The determining of the balancing tensor may comprise:
- determining each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.

In an implementation form of the second aspect, the determining of each balancing coefficient may be based on the following equation:
bk = √(tk / rk),

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
In an alternative implementation form of the second aspect, the determining of each balancing coefficient may be based on the following equation:

bk = √(rk / tk).
Optionally, prior to an inference phase of the trained floating point neural network, or during a training phase of a floating point neural network for obtaining the trained floating point neural network, the method may further comprise obtaining a set of sample inputs. The set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase. Alternatively, the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase. Then, the method may further comprise determining each balancing coefficient for each channel based on the sample inputs, and applying each determined balancing coefficient to each corresponding channel for later input(s).
In an implementation form of the second aspect, the performing of the Winograd-based convolution may comprise the following steps:
- dividing the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- multiplying the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiplying the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiplying the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- performing Winograd multiplication of the Winograd convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.
In an alternative implementation form of the second aspect, the performing of the Winograd-based convolution may comprise the following steps:
- multiplying the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- dividing the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiplying the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiplying the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- performing Winograd multiplication of the Winograd convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.
In an implementation form of the second aspect, the method may further comprise the following steps:
- combining the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor; and
- performing the Winograd convolution based further on the first balanced scale tensor and the second balanced scale tensor.
A third aspect of the present disclosure provides a computer program product comprising a program code for performing the method according to the second aspect or any implementation form thereof, when executed on a computer.

A fourth aspect of the present disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to the second aspect or any implementation form thereof.
A fifth aspect of the present disclosure provides a chipset comprising instructions which, when executed by the chipset, cause the chipset to carry out the method according to the second aspect or any implementation form thereof.
It has to be noted that all apparatus, devices, elements, units, and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity, which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above-described aspects and implementation forms will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which
FIG. 1 shows an example of a Winograd convolution performed by an apparatus;
FIG. 2 shows an example of an apparatus for performing a Winograd convolution;
FIG. 3 shows a method for performing a Winograd convolution;
FIG. 4 shows an application scenario;
FIGs. 5A-5C show results based on different methods for performing convolution; and
FIG. 6 shows an illustrative example of a conventional Winograd convolution.
DETAILED DESCRIPTION OF THE EMBODIMENTS
According to the present disclosure, a framework for performing a Winograd convolution in a neural network is provided. Optionally, improvements are introduced based on the conventional Winograd convolution illustrated in FIG. 6. A solution for reducing errors generated by quantization of the neural network is provided.
In the present disclosure, quantization may be referred to as a process of mapping a set of input values to a smaller set of values. Optionally, the conversion of floating point numbers to fixed point (e.g., integer) numbers may be a process of quantization.
Optionally, embodiments disclosed herein may be applied to a neural network that is adapted for image processing, such as image classification, image enhancement, and image recognition. Advantageously, information loss of the image processing based on the neural network according to embodiments disclosed herein may be reduced.
It is noted that in the present disclosure, a neural network may be referred to as a neural network model or simply as a model. A Winograd parameter shall be understood as a parameter in the Winograd domain.
Reference is now made to the drawings, wherein similar elements may share the same features and may function likewise.
FIG. 1 shows an example of a Winograd convolution performed by an apparatus for a floating point neural network. In this example, the Winograd convolution comprises four phases: a transformation phase 101, a balancing phase 102, a quantization phase 103, and an output phase 104.
In the transformation phase 101, an original input tensor and a filter tensor of the neural network are transformed into a Winograd input tensor, denoted as V, and a Winograd filter tensor, denoted as U, respectively. It is noted that the input tensor is denoted as X in the present disclosure and is exemplarily shown as a 4×4 input tile in FIG. 1. The filter tensor of the neural network (also referred to as an original filter tensor) is denoted as W in the present disclosure and is exemplarily shown as a 3×3 filter tile in FIG. 1. In the transformation phase 101, the apparatus may be configured to apply the following equations:
V = Bᵀ · X · B,   (3)
U = G · W · Gᵀ.   (4)

Moreover, the Winograd input tensor comprises one or more first floating point channels, and the Winograd filter tensor comprises one or more second floating point channels. In FIG. 1, it is exemplarily shown that the original input tensor X and the filter tensor W both comprise three channels, and operations according to equations (3) and (4) may be performed for each channel comprised therein. Then, each channel may be carried over respectively into the Winograd domain during the transformation according to equations (3) and (4).
In some embodiments, the neural network may be a trained neural network. The trained neural network may be configured to perform image processing such as image classification and image feature extraction. In this case, the original input tensor may be or may be part of an image. Alternatively, the original input tensor may be or may be part of a feature map of an image. The feature map may be obtained at a hidden layer of the neural network. For example, the feature map may be an output of a previous layer at a certain hidden layer of the neural network. Optionally, the original filter tensor may be dependent upon the hidden layer. That is, the neural network may comprise different original filter tensors at different hidden layers. It is noted that in the present disclosure, it shall be understood that the apparatus may perform one or more Winograd convolutions at any layer comprised in the neural network. Therefore, the original input tensor and the original filter tensor may be associated with each layer comprised in the neural network.
For performing the Winograd convolution, the original input tensor and the original filter tensor are transformed into the Winograd domain, where transformation matrices B and G may be used to facilitate the transformation. Any input tensor not transformed into the Winograd domain may be referred to as an original input. That is, elements of the original input tensor are not altered and may still reflect original true data. Details regarding the transformation matrices B and G may depend on the specific Winograd algorithm applied to the Winograd convolution. For example, the transformation matrices B and G for an F(2×2, 3×3) Winograd convolution with a 4×4 input tile and a 3×3 filter tile may be as follows:
B = [[ 1,  0,  0,  0],
     [ 0,  1, -1,  1],
     [-1,  1,  1,  0],
     [ 0,  0,  0, -1]],

G = [[  1,    0,    0],
     [1/2,  1/2,  1/2],
     [1/2, -1/2,  1/2],
     [  0,    0,    1]].
It is noted that an F(m×n, r×s) Winograd convolution may denote a 2D Winograd convolution used to compute an m×n feature map with r×s filters. Other types of Winograd convolutions, such as the F(4×4, 3×3) and the F(6×6, 3×3) Winograd convolutions, may also be commonly used in the art, in particular for image processing. It is noted that sometimes, the F(2×2, 3×3), F(4×4, 3×3) and F(6×6, 3×3) Winograd convolutions may simply be referred to as F(2,3), F(4,3) and F(6,3) Winograd convolutions, especially for image processing where a 2D convolution may be considered as default.
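As a sketch, the transformations of equations (3) and (4) for a single channel of the F(2×2, 3×3) case may be written as follows, using the matrices B and G given above (variable names are illustrative):

```python
import numpy as np

B = np.array([[ 1, 0,  0,  0],
              [ 0, 1, -1,  1],
              [-1, 1,  1,  0],
              [ 0, 0,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)

X = np.random.randn(4, 4).astype(np.float32)  # one channel of a 4x4 input tile
W = np.random.randn(3, 3).astype(np.float32)  # one channel of a 3x3 filter tile

V = B.T @ X @ B    # equation (3): one channel of the Winograd input tensor
U = G @ W @ G.T    # equation (4): one channel of the Winograd filter tensor
```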
In the Winograd domain, the Winograd filter tensor U may have a dimension (C,K,a,a), and the Winograd input tensor V may have a dimension (P,C,a,a), where C is the number of input channels, K is the number of output channels, P is the number of tiles of a complete input, and (a, a) determines the dimension of the Winograd domain.
Optionally, the apparatus may be configured to split a complete original input tensor into a plurality of tiles. Each tile may comprise a part of the original input tensor. The size of each tile may be based on the size of the Winograd input tensor of the specific Winograd algorithm applied to the Winograd convolution. In the present disclosure, each tile may be equivalent to the original input, because it also carries original data that is not transformed into the Winograd domain. For example, the 4×4 input tile in FIG. 1 may be an image segment that is split from a complete image of a dimension of 80×80 (pixels). In this case, the apparatus may be configured to split the image into 20×20 = 400 tiles and perform the Winograd convolution according to the present disclosure for each tile. Then, the apparatus may be configured to aggregate results obtained based on all the split tiles as a final result corresponding to the complete original input tensor.
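A minimal sketch of such a tile split, assuming the non-overlapping 4×4 split of the 80×80 example above (the helper name is illustrative):

```python
import numpy as np

def split_into_tiles(image, tile=4):
    """Split an (H, W) input into non-overlapping tile x tile blocks."""
    H, W = image.shape
    return (image.reshape(H // tile, tile, W // tile, tile)
                 .swapaxes(1, 2)
                 .reshape(-1, tile, tile))

tiles = split_into_tiles(np.zeros((80, 80), dtype=np.float32))
print(tiles.shape)   # (400, 4, 4), i.e., 20x20 = 400 tiles
```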
Before the Winograd filter tensor U and the Winograd input tensor V are quantized, balancing is performed for both U and V in the Winograd domain. The apparatus is configured to determine a balancing tensor, denoted as b in the present disclosure. Optionally, the balancing tensor b may have a dimension (C, a, a). The balancing tensor b may be used to balance channel ranges of the Winograd input and filter tensors.
For this purpose, a notion of precision of a channel is proposed in the present disclosure. For example, the precision of channel k in Winograd domain (i, j) of the Winograd filter tensor U may be denoted as:

p_{i,j}^k = r_{i,j}^k / R_{i,j},   (5)

where r_{i,j}^k is the quantization range of channel k in Winograd domain (i, j) of U, and R_{i,j} is the quantization range of all channels in Winograd domain (i, j) of U. It is noted that i, j, k are all positive integers. A quantization range may be referred to as a range between a minimum value and a maximum value of a domain before quantization.
Similarly, the precision of channel k in Winograd domain (i, j) of the Winograd input tensor V may be denoted as:

s_{i,j}^k = t_{i,j}^k / T_{i,j},   (6)

where t_{i,j}^k is the quantization range of channel k in Winograd domain (i, j) of V, and T_{i,j} is the quantization range of all channels in Winograd domain (i, j) of V. In some embodiments, the apparatus may be configured to obtain parameters such as s_{i,j}^k by obtaining a set of sample data without training.
An optimal balancing tensor b may be used to achieve a maximized total precision of all channels. That is, the apparatus may be configured to determine the balancing tensor b such that the total precision of all channels,

Σ_{i,j,k} p_{i,j}^k · s_{i,j}^k,   (7)

is maximized.
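For illustration, the total precision objective of expression (7) may be evaluated as follows; this sketch assumes that the quantization range of all channels equals the maximum of the per-channel ranges:

```python
import numpy as np

def total_precision(t, r):
    """Sum over i, j, k of p * s for per-channel ranges t, r of shape (C, a, a)."""
    s = t / t.max(axis=0)   # precision of each channel of V, equation (6)
    p = r / r.max(axis=0)   # precision of each channel of U, equation (5)
    return (p * s).sum()    # objective (7), maximized by the optimal b
```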
In some embodiments, the apparatus may be configured to determine the balancing tensor b as:
b_{i,j}^k = √(t_{i,j}^k / r_{i,j}^k),   (8)

wherein b_{i,j}^k denotes each balancing coefficient for channel k in Winograd domain (i, j) comprised in the balancing tensor b. In some embodiments, each balancing coefficient in a same Winograd domain may also be denoted as:

bk = √(tk / rk).   (9)
Optionally, prior to an inference phase of the trained floating point neural network, or during a training phase of a floating point neural network for obtaining the trained floating point neural network, the apparatus may be configured to obtain a set of sample inputs. The set of sample inputs may be a part (e.g., 5-20%) of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase. Alternatively, the set of sample inputs may be similar to a set of inputs to be applied to the trained floating point neural network in the inference phase. Then, the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
For example, a complete set of inputs to be applied to the trained floating point neural network may comprise 100 images, each image comprising three channels (RGB channels). The apparatus may be configured to obtain 5 to 20 images from the 100 images as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 images. Then, the apparatus may be configured to apply the determined balancing tensor b for other images of the 100 images.
As another example, a complete set of inputs to be applied to the trained floating point neural network may comprise 100 tiles split from a complete image. Each tile is a part of the complete image and comprises three channels (RGB channels). The apparatus may be configured to obtain 5 to 20 tiles from the 100 tiles as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 tiles. Then, the apparatus may be configured to apply the determined balancing tensor b for other tiles of the 100 tiles.
As another example, the apparatus may be configured to determine the balancing tensor b based on part or all of training samples given during the training phase of the floating point neural network. After the training phase, the floating point neural network may be referred to as the trained floating point neural network.
In some embodiments, as an alternative to the determining of the balancing tensor b according to equation (8) or (9), the apparatus may be configured to determine the balancing tensor b in a learnable way. That is, the balancing tensor b may be seen as one of the trainable parameters in a so-called "quantization aware training". In other words, the balancing tensor b, as well as other neural network parameters such as weights and/or biases, may be fine-tuned in the training phase of the neural network before the inference phase. After obtaining the balancing tensor b, the apparatus may be configured to divide the Winograd input tensor by the balancing tensor b to obtain a balanced input tensor V_b, and multiply the Winograd filter tensor by the balancing tensor b to obtain a balanced filter tensor U_b. In the balancing phase 102, the apparatus may be configured to apply the following equations:
V_b = V / b,   (10)
U_b = U · b.   (11)
In this way, channel ranges of the one or more first floating point channels and the one or more second floating point channels may be balanced. This may facilitate the quantization that is to be performed in a later stage, for example, where quantization errors may be reduced.
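The following sketch illustrates equations (10) and (11) for multiple channels and verifies that the balancing cancels in the Winograd multiplication; the shapes follow the (P, C, a, a) and (C, K, a, a) convention above, and the random data is illustrative only:

```python
import numpy as np
rng = np.random.default_rng(0)

P, C, K, a = 16, 3, 2, 4
V = rng.normal(size=(P, C, a, a)) * np.array([0.1, 1.0, 10.0]).reshape(1, C, 1, 1)
U = rng.normal(size=(C, K, a, a))

t = np.abs(V).max(axis=0)        # per-channel input ranges, shape (C, a, a)
r = np.abs(U).max(axis=1)        # per-channel filter ranges, shape (C, a, a)
b = np.sqrt(t / r)               # equation (8): balancing tensor of dimension (C, a, a)

Vb = V / b                       # equation (10)
Ub = U * b[:, None]              # equation (11)

# Channel ranges are now equal for input and filter at every (i, j)...
assert np.allclose(np.abs(Vb).max(axis=0), np.abs(Ub).max(axis=1))
# ...and the balancing cancels in the Winograd multiplication.
Y_ref = np.einsum('ckij,pcij->pkij', U, V)
Y_bal = np.einsum('ckij,pcij->pkij', Ub, Vb)
assert np.allclose(Y_ref, Y_bal)
```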
Alternatively, in some embodiments, the apparatus may be configured to multiply the Winograd input tensor by the balancing tensor b to obtain the balanced input tensor V_b, and divide the Winograd filter tensor by the balancing tensor b to obtain the balanced filter tensor U_b. In this case, the apparatus may be configured to determine each balancing coefficient in a same Winograd domain as:
bk = √(rk / tk).   (12)
In this case, the apparatus may be configured to apply the following equations based on equation (12) in the balancing phase 102:

V_b = V · b,   (13)
U_b = U / b.   (14)
Then, the apparatus is configured to determine a first scale factor (denoted as scale_V) for the Winograd input tensor that is balanced and a second scale factor (denoted as scale_U) for the Winograd filter tensor that is balanced. The first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively. It is noted that determining a scale factor for a quantized Winograd convolution is commonly known in the field. Therefore, it is not described in detail herein. In the quantization phase 103, the apparatus may be configured to apply the first and second scale factors to the balanced Winograd input tensor and the balanced Winograd filter tensor as follows:

V_quant = scale_V · V_b,   (15)
U_quant = scale_U · U_b,   (16)

where V_quant is a quantized input tensor, and U_quant is a quantized filter tensor.
In the output phase 104, after obtaining the quantized Winograd input tensor and the quantized Winograd filter tensor, the apparatus may be configured to perform Winograd convolution as follows:
Y = Aᵀ · ((U_quant ⊙ V_quant) / (scale_U · scale_V)) · A + bias (optional),   (17)

where Y is the final output, ⊙ denotes the element-wise multiplication in the Winograd domain, and the transformation matrix A is used to transform the output in the Winograd domain back to the normal domain. The operation U_quant ⊙ V_quant may also be referred to as a Winograd multiplication, which may be understood as a multiplication in the Winograd domain. Similar to the transformation matrices B and G, details regarding the transformation matrix A may depend on the specific Winograd algorithm applied to the Winograd convolution. An example of the transformation matrix A, based on the F(2×2, 3×3) Winograd convolution with the 4×4 input tile and the 3×3 filter tile as previously mentioned, may be as follows:

A = [[1,  0],
     [1,  1],
     [1, -1],
     [0, -1]].
In some embodiments, optionally, the apparatus may be configured to combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor. That is, the first scale factor and the balancing tensor are combined, and the second scale factor and the balancing tensor are also combined. In this way, the apparatus may only need to store these combined parameters, and a simplified balancing and quantization of the Winograd convolution may be achieved. In this case, the apparatus may be configured to perform the same number of operations in the inference phase as in the case of a conventional Winograd convolution.
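A sketch of this combining, reusing the variables from the sketches above; the two stored tensors replace the separate balancing and scaling steps:

```python
scale_v_fused = scale_v / b              # first balanced scale tensor, (C, a, a)
scale_u_fused = scale_u * b              # second balanced scale tensor

# A single element-wise multiplication now balances and scales in one step:
V_quant2 = np.round(scale_v_fused * V)           # == round(scale_v * (V / b))
U_quant2 = np.round(scale_u_fused[:, None] * U)  # == round(scale_u * (U * b))
```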
According to the embodiments of the present disclosure, by determining and applying the balancing tensor for both the Winograd input tensor and the Winograd filter tensor of a Winograd convolution, information loss caused by the quantization may be minimized. This may, for example, increase the precision and accuracy of the result obtained by the neural network. For example, the quality of the image processing may be increased when a neural network using a Winograd convolution according to the embodiments of the present disclosure is applied. Moreover, since the apparatus applies reverse operations of multiplication and division to the Winograd input tensor and the Winograd filter tensor based on the same balancing tensor, the balancing tensor itself may be canceled during the operation of the Winograd multiplication in the output phase 104. Therefore, the apparatus may not need any additional operations to reverse the balancing. Hence, efficiency may be introduced to the Winograd convolution based on the embodiments of the present disclosure.
FIG. 2 shows an example of an apparatus 200 for performing a Winograd convolution.
In some embodiments, the apparatus 200 may comprise four units, which are shown exemplarily in FIG. 2 as a transformation unit 201, a balancing unit 202, a quantization unit 203, and an output unit 204. The transformation unit 201 may be configured to obtain the input tensor and the filter tensor. The transformation unit 201 may be further configured to transform the input tensor and the filter tensor from the normal domain into the Winograd domain. Then, the balancing unit 202 may be configured to determine a balancing tensor. The balancing unit 202 may further be configured to perform channel balancing on the Winograd input tensor and the Winograd filter tensor based on the determined balancing tensor. Then, the quantization unit 203 may be configured to quantize the (balanced) Winograd input tensor and the (balanced) Winograd filter tensor. Then, the output unit 204 may be configured to compute the final output. It is noted that the units 201-204 in FIG. 2 may correspond to the phases 101-104 in FIG. 1, respectively.
In some embodiments, optionally, the apparatus 200 may be or may be part of a neural processing unit (NPU) or an AI processor. For example, the apparatus 200 may be a matrix/vector/scalar computation unit comprised in an AI core. The AI core, optionally along with a number of other identical AI cores, may be comprised in a chipset or system-on-chip (SoC). The chipset or SoC may be configured to perform neural network related operations, such as training and/or inferencing. When the chipset or SoC is integrated into an electronic device such as a computer or a mobile phone, the chipset or SoC may be configured to perform image processing, speech recognition, text recognition and the like based on artificial intelligence by using the Winograd convolution according to the present disclosure.
FIG. 3 shows a method 300 for performing a Winograd convolution.
The method 300 is performed by an apparatus for performing the Winograd convolution of a floating point neural network. The method 300 comprises the following steps:
- step 301: generating a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- step 302: generating a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- step 303: determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- step 304: determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively; and
- step 305: performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
It is noted that the steps of the method 300 may share the same functions and details as described above with reference to FIGs. 1 and 2. In particular, steps 301 and 302 may correspond to phase 101 and may be performed by the transformation unit 201. Step 303 may correspond to phase 102 and may be performed by the balancing unit 202. Step 304 may correspond to phase 103 and may be performed by the quantization unit 203. Step 305 may correspond to phase 104 and may be performed by the output unit 204. The corresponding method implementations are therefore not described in detail again at this point.
An application scenario of the present disclosure is that the method for performing a Winograd convolution may be applied to a neural network model that involves convolution operations. Applying the neural network model may often involve two phases: a training phase and an inference phase. For preparing the model in the training phase, the following steps may be executed, e.g., by the apparatus according to the present disclosure:
Step 401. Obtaining the neural network model that comprises conventional direct convolution operations.
Step 402. Replacing the conventional direct convolutions with the Winograd convolutions according to the present disclosure.
Step 403. Passing several data samples, which may be referred to as sample inputs, to the model without training.
Step 404. Collecting statistics about minimum and maximum values of the sample inputs in the Winograd domain (i.e., Winograd inputs).
Step 405. Calculating balancing coefficients based on the collected statistics according to the present disclosure.
Step 406. Applying the calculated balancing coefficients to filters in the Winograd domain (i.e., Winograd filters).
Step 407. Calculating a scaling factor of the balanced Winograd filters and quantizing the Winograd filters.
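Steps 403 to 407 may be sketched as one calibration routine; this reuses the matrices B and G from the sketch above, and the function name and the choice of max-absolute-value statistics are illustrative assumptions:

```python
def calibrate_and_quantize_filters(sample_tiles, W, qmax=127):
    """sample_tiles: (P, C, 4, 4) sample inputs; W: (C, K, 3, 3) filters."""
    V = np.einsum('ri,pcrs,sj->pcij', B, sample_tiles, B)  # Winograd sample inputs
    U = np.einsum('gr,ckrs,hs->ckgh', G, W, G)             # Winograd filters
    t = np.abs(V).max(axis=0)           # step 404: statistics over the samples
    r = np.abs(U).max(axis=1)
    b = np.sqrt(t / r)                  # step 405: balancing coefficients
    Ub = U * b[:, None]                 # step 406: balanced Winograd filters
    scale_u = qmax / np.abs(Ub).max()   # step 407: filter scale factor
    return np.round(scale_u * Ub), b, scale_u
```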
In the following, two different ways of applying the balancing coefficients and the scale factors to actual inputs are described, i.e., a static approach and a dynamic approach.
For the static approach, the apparatus may be configured to perform the following step 408a and the optional step 409a.
Step 408a. Calculating a scaling factor of the balanced Winograd sample inputs.
Step 409a (optional). Fusing the scaling factor of the balanced Winograd sample inputs and the balancing coefficients.

Alternatively, for the dynamic approach, the apparatus may be configured to perform the following step 408b.
Step 408b. Storing the balancing coefficients for further usage in the inference phase.
In the inference phase, the neural network model has been trained and fine-tuned for actual application. In this case, an apparatus using the trained neural network model may be configured to perform the following steps 501 and 502. It is noted that the apparatus using the trained neural network model may also be the apparatus according to the present disclosure.
Step 501. Obtaining the balanced and quantized Winograd filters.

Step 502. Transforming actual inputs into Winograd inputs.
For the static approach, the apparatus may be configured to perform the following step 502a.
Step 502a. Balancing and quantizing the Winograd inputs by using the balancing coefficients and the scaling factor determined based on the Winograd sample inputs before the inference phase, or by applying the fused scaling factor and balancing coefficients if available.
Alternatively, for the dynamic approach, the apparatus may be configured to perform the following steps 502b, 503 and 504.
Step 502b. Balancing the Winograd input by using the balancing coefficients.

Step 503. Determining a current scale factor based on the balanced Winograd input.

Step 504. Quantizing the balanced Winograd input based on the current scale factor.
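A sketch of steps 502b to 504 of the dynamic approach, with the scale factor recomputed per input at inference time (names are illustrative):

```python
import numpy as np

def quantize_input_dynamic(V, b, qmax=127):
    Vb = V / b                              # step 502b: balance the Winograd input
    scale_v = qmax / np.abs(Vb).max()       # step 503: current scale factor
    return np.round(scale_v * Vb), scale_v  # step 504: quantized Winograd input
```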
Then, for both the static and dynamic approaches, the apparatus may be configured to perform the following steps 505 and 506.
Step 505. Calculating the Winograd multiplication based on the balanced and quantized Winograd filters and the balanced and quantized Winograd inputs.

Step 506. Transforming the Winograd product back to the normal (or spatial) domain as the final output.
FIG. 4 shows an application scenario of the present disclosure.
In FIG. 4, an example of a quantized Efficient Sub-Pixel CNN (ESPCNN) for 3x image super-resolution is illustrated. In the ESPCNN, the direct 2D convolutions of the four Convolution 32x32 layers and the Convolution 32x27 layer may be performed based on the Winograd convolution according to the present disclosure. For example, a symmetric quantization scheme, an F(4,3) Winograd algorithm, and a quantized INT8 model may be used for the Winograd convolution.
FIGs. 5A-5C show results based on different methods for performing convolution.
FIG. 5A shows a result of image super-resolution by using the ESPCNN based on a conventional Winograd convolution that is without balancing. FIG. 5B shows a result of image super-resolution by using the ESPCNN based on the Winograd convolution according to the present disclosure. FIG. 5C shows a result of image super-resolution by using a full precision model. It can be seen that the perceptual quality of FIG. 5B obtained based on the Winograd convolution according to the present disclosure significantly exceeds the perceptual quality of FIG. 5A obtained based on the conventional Winograd convolution and is close to the perceptual quality of FIG. 5C obtained based on the full precision model.
It is noted that the apparatus in the present disclosure may comprise processing circuitry configured to perform, conduct or initiate the various operations of the device described herein, respectively. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device to perform, conduct or initiate the operations or methods described herein, respectively. It is further noted that the apparatus in the present disclosure may be a single electronic device capable of computing, or may comprise a set of connected electronic components or modules capable of computing with shared system memory. It is well known in the art that such computing capabilities may be incorporated into many different devices, and therefore the term "apparatus" may comprise a chip, chipset, artificial intelligence accelerator, neural processing unit, computer, mobile terminal, tablet, wearable device, game console, graphic processing unit, graphic card, and the like.
The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed subject matter, from studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or another unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

CLAIMS

1. An apparatus (200) for performing Winograd-based convolution of a floating point neural network, wherein the apparatus (200) is configured to:
- generate a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- generate a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- determine a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- determine a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively; and
- perform the Winograd-based convolution based on the balancing tensor, the first scale factor, and the second scale factor.

2. The apparatus (200) according to claim 1, wherein the floating point neural network is a trained neural network for image processing, and the apparatus (200) is configured to:
- obtain an image or a feature map of the image as the original input;
- determine the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor; and
- process the image or the feature map of the image by performing the Winograd convolution.

3. The apparatus (200) according to claim 1 or 2, wherein the first floating point channel and the second floating point channel are in a one-to-one correspondence, and the balancing tensor comprises one or more balancing coefficients, wherein for determining the balancing tensor, the apparatus (200) is configured to:
- determine each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.
4. The apparatus (200) according to claim 3, wherein the apparatus (200) is configured to determine each balancing coefficient based on the following equation:

bk = √(tk / rk),

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
5. The apparatus (200) according to claim 3, wherein the apparatus (200) is configured to determine each balancing coefficient based on the following equation:

bk = √(rk / tk),

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
6. The apparatus (200) according to any one of claims 1 to 4, wherein for performing the Winograd-based convolution, the apparatus (200) is configured to:
- divide the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- multiply the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiply the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiply the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- perform Winograd multiplication of the Winograd-based convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.

7. The apparatus (200) according to any one of claims 1, 2, 3 and 5, wherein for performing the Winograd-based convolution, the apparatus (200) is configured to:
- multiply the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- divide the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiply the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiply the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- perform Winograd multiplication of the Winograd-based convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.

8. The apparatus (200) according to any one of claims 1 to 5, further configured to:
- combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor; and
- perform the Winograd-based convolution based on the first balanced scale tensor and the second balanced scale tensor.
9. A computer-implemented method (300) for performing Winograd-based convolution of a floating point neural network, wherein the method (300) is executed by an apparatus and comprises:
- generating (301) a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- generating (302) a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- determining (303) a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- determining (304) a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively; and
- performing (305) the Winograd-based convolution based on the balancing tensor, the first scale factor, and the second scale factor.

10. The method (300) according to claim 9, wherein the floating point neural network is a trained neural network for image processing, and the method further comprises:
- obtaining an image or a feature map of the image as the original input;
- determining the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor; and
- processing the image or the feature map of the image by performing the Winograd convolution.

11. The method (300) according to claim 9 or 10, wherein the first floating point channel and the second floating point channel are in a one-to-one correspondence, and the balancing tensor comprises one or more balancing coefficients, wherein the determining of the balancing tensor comprises:
- determining each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.

12. The method (300) according to claim 11, wherein the determining of each balancing coefficient is based on the following equation:

bk = √(tk / rk),

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
13. The method (300) according to claim 11, wherein the determining of each balancing coefficient is based on the following equation:

bk = √(rk / tk),

wherein bk is a balancing coefficient for channel k, tk is a quantization range of channel k of the one or more first floating point channels, rk is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.

14. The method (300) according to any one of claims 9 to 12, wherein the performing (305) of the Winograd-based convolution comprises:
- dividing the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- multiplying the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiplying the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiplying the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- performing Winograd multiplication of the Winograd-based convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.

15. The method (300) according to any one of claims 9, 10, 11 and 13, wherein the performing (305) of the Winograd-based convolution comprises:
- multiplying the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
- dividing the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
- multiplying the balanced Winograd input tensor by the first scale factor to obtain a quantized Winograd input tensor, wherein the quantized Winograd input tensor comprises the one or more first integer channels;
- multiplying the balanced Winograd filter tensor by the second scale factor to obtain a quantized Winograd filter tensor, wherein the quantized Winograd filter tensor comprises the one or more second integer channels; and
- performing Winograd multiplication of the Winograd-based convolution based on the quantized Winograd input tensor and the quantized Winograd filter tensor.

16. The method (300) according to any one of claims 9 to 13, further comprising:
- combining the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor; and
- performing the Winograd-based convolution based further on the first balanced scale tensor and the second balanced scale tensor.
17. A computer program comprising a program code for performing the method according to any one of claims 9 to 16, when executed on a computer.