WO2023059215A1 - Apparatus and method for Winograd convolution - Google Patents

Apparatus and method for Winograd convolution

Info

Publication number
WO2023059215A1
Authority
WO
WIPO (PCT)
Prior art keywords
winograd
tensor
balancing
floating point
filter
Prior art date
Application number
PCT/RU2021/000416
Other languages
English (en)
Other versions
WO2023059215A8 (fr)
Inventor
Vladimir Maximovich CHIKIN
Vladimir Mikhailovich KRYZHANOVSKIY
Alexandr Alexandrovich ZURUEV
Yury Alexandrovich PARFENOV
Original Assignee
Huawei Technologies Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd
Priority to PCT/RU2021/000416
Publication of WO2023059215A1
Publication of WO2023059215A8


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present disclosure relates to an apparatus and a method for processing a matrix in the field of artificial intelligence.
  • the disclosure relates to an apparatus and a method for performing a convolution in an artificial neural network.
  • Artificial neural networks (ANNs)
  • An ANN usually involves massive data processing, such as matrix convolution.
  • a matrix convolution may be seen as a process of adding each element of an input matrix to its local neighbours, weighted by a kernel matrix (or filter). Therefore, the matrix convolution normally includes matrix addition and multiplication.
  • the matrix convolution is often used in a convolution neural network (CNN).
  • The Winograd algorithm is an algorithm for performing matrix multiplication, which is often referred to as the “Coppersmith-Winograd algorithm”, or simply, the “Winograd algorithm”.
  • the matrix convolution performed based on the Winograd algorithm may be referred to as “Winograd-based convolution”, or simply “Winograd convolution”.
  • Winograd convolution is widely used in ANNs to reduce the computational complexity, e.g., by reducing the number of multiplications.
  • In a Winograd convolution, two operands of a convolution (e.g., an input and a filter) are first transformed into the Winograd domain.
  • a transformed output is obtained in the Winograd domain.
  • this transformed output is transformed back to a normal domain (sometimes also referred to as a spatial domain).
  • Transformation matrices B, G and A depend on specific configurations of the Winograd convolution and have different values commonly known in the art for different configurations of the Winograd convolution. For efficient implementation and execution, it is often desired to perform Winograd convolution with integer data.
  • a floating point neural network is often quantized to an integer neural network.
  • the floating point neural network is a neural network comprising floating point parameters, such as inputs and filters.
  • the integer neural network is a neural network comprising only integer parameters.
  • a quantized 8-bit integer (INT8) neural network can achieve accuracy comparable to a 32-bit floating point (FP32) neural network.
  • model sizes can be reduced by a factor of four compared to the FP32 model.
  • calculations can be accelerated for quantized integer neural networks on processors compared to their floating point counterparts.
  • the speedup can be further improved. Overall, quantization may bring improvements including model compression and latency reduction.
  • Quantization of a neural network with a Winograd algorithm may lead to a significant drop in the performance of the neural network in some scenarios.
  • an actual neural network can have thousands of parameters.
  • For example, with 8-bit quantization, 2^8 = 256 integer numbers (from -128 to 127) can be used. This is often much less than the range of a floating point neural network.
  • For example, the floating point neural network may need to represent a range of 0.0001 to 0.1 (i.e., [0.0001, 0.1]) with roughly 10000 distinct values, and 256 is much less than 10000. Therefore, rounding may be performed and a rounding error may be introduced. This may lead to information loss and may jeopardize the accuracy of the quantized neural network.
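  • As an illustration only (not part of the disclosed method), the following NumPy sketch quantizes floating point values to INT8 with a single scale factor and shows the rounding error; the value range and the symmetric quantization scheme are assumptions made for the example:

```python
import numpy as np

# Hypothetical floating point values in an assumed range [0.0001, 0.1]
x = np.random.default_rng(0).uniform(0.0001, 0.1, size=1000)

# Symmetric INT8 quantization: 256 integer levels from -128 to 127
scale = np.max(np.abs(x)) / 127.0
x_int8 = np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# Dequantize and measure the information loss introduced by rounding
x_restored = x_int8.astype(np.float32) * scale
print("max rounding error:", np.max(np.abs(x - x_restored)))  # on the order of scale / 2
```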
  • Since Winograd convolution involves a preliminary calculation of parameters in order to achieve transformed inputs and transformed filters in the Winograd domain, a neural network with Winograd convolution is much more vulnerable to errors introduced by quantization.
  • Apparatus and methods according to this disclosure facilitate performing a Winograd convolution of a neural network in a robust and efficient manner. This and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the drawings.
  • a balancing tensor is used to balance channel ranges of inputs and weights at each layer of a neural network before quantization.
  • a direct algorithm for calculating the balancing tensor is disclosed, which exploits the distributions of inputs and weights at each layer of a neural network.
  • a first aspect of the present disclosure provides an apparatus for performing Winograd-based convolution of a floating point neural network.
  • the apparatus is configured to generate a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels.
  • the apparatus is configured to generate a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels.
  • a tensor may be a data structure used in neural networks in the field of artificial intelligence to carry a specific amount of information.
  • the tensor may be: a 0-dimensional (0-D) array, such as a single number; a 1-dimensional (1-D) array, such as a vector; a 2-dimensional (2-D) array, such as a matrix; a 3-dimensional (3-D) array, such as data representing an RGB image; or an array of a higher dimension.
  • the apparatus may be configured to transform a tensor with a specific dimension into another dimension.
  • a vector may be transformed into a matrix, while a matrix may also be transformed into a vector. This may be useful for satisfying different requirements of inputs required by different configurations of the Winograd convolution.
  • a tensor (i.e., the input or the filter tensor) may comprise one or more channels.
  • a channel may be used to transmit information from a certain aspect. That is, a channel may have a certain capacity for transmitting information.
  • the number of the one or more channels is the depth of the tensor involved in the convolution.
  • all channels of a tensor may share a same size.
  • an NxM pixel RGB image may also be represented by a 2D (N X M) tensor with three channels: red, green and blue.
  • the Winograd convolution may be performed at one or more hidden layers of the floating point neural network.
  • the original input tensor may be or may be part of an input applied to a hidden layer.
  • the input applied to a hidden layer may be an output of a previous layer.
  • the apparatus may be configured to apply Winograd transformation on the original input tensor.
  • the original input tensor may be split into several tiles that are suitable for performing the Winograd convolution. Then, the apparatus may be configured to transform each tile into the Winograd input tensor.
  • the apparatus may be configured to apply Winograd transformation to the original filter tensor in order to obtain the Winograd filter tensor.
  • the filter tensor of the floating point neural network may be weights of neurons at each hidden layer of the floating point neural network. Therefore, the filter tensor may also be referred to as a weight tensor.
  • a floating point channel may be a channel comprising at least one element that is a floating point value.
  • the apparatus is configured to determine a balancing tensor based on the Winograd input tensor and the Winograd filter tensor.
  • the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels.
  • the balancing tensor may comprise one or more balancing coefficients.
  • the one or more balancing coefficients, the one or more first channels, and the one or more second channels may be in a one-to-one correspondence.
  • the apparatus may be configured to balance each first channel and each second channel based on a corresponding balancing coefficient.
  • the apparatus may be configured to divide the Winograd input tensor by the balancing tensor and multiply the Winograd filter tensor by the balancing tensor.
  • the apparatus may be configured to multiply the Winograd input tensor by the balancing tensor and divide the Winograd filter tensor by the balancing tensor. In this way, the balancing tensor may be canceled afterwards when Winograd multiplication of the Winograd input tensor and the Winograd filter tensor is performed. Therefore, no additional operation is introduced.
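  • A minimal sketch (illustrative only; sizes and names are assumptions) of why the balancing introduces no additional operation: multiplying the filter channels and dividing the input channels by the same balancing tensor leaves the Winograd multiplication unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
C, a = 3, 4                                 # assumed channel count and Winograd tile size
U = rng.normal(size=(C, a, a))              # Winograd filter tensor
V = rng.normal(size=(C, a, a))              # Winograd input tensor
b = rng.uniform(0.5, 2.0, size=(C, a, a))   # balancing tensor

# Balance: multiply the filter by b, divide the input by b
U_b, V_b = U * b, V / b

# The Winograd multiplication (element-wise product, summed over channels)
# is unchanged, so no extra operation is needed to undo the balancing.
assert np.allclose((U_b * V_b).sum(axis=0), (U * V).sum(axis=0))
```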
  • the apparatus is configured to determine a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor.
  • the first and the second scale factors are adapted to quantize the one or more first balanced floating point channels and the one or more second balanced floating point channels into one or more first integer channels and one or more second integer channels, respectively.
  • the quantization errors may be reduced to a minimum.
  • the apparatus is configured to perform the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • the apparatus may be configured to obtain a balanced and quantized Winograd input tensor and a balanced and quantized Winograd filter tensor based on the balancing tensor, the first scale factor, and the second scale factor. Then, the apparatus may be configured to perform Winograd multiplication based on the balanced and quantized Winograd input tensor and the balanced and quantized Winograd filter tensor.
  • channel ranges of the Winograd filter and input tensors can be balanced while the number of operations of a Winograd convolution based thereon is equivalent to that of the conventional Winograd convolution. Further, quantization errors can be reduced because of the balanced channel ranges. In this way, the precision of the Winograd convolution according to the present disclosure can be increased.
  • the balancing of the Winograd filter and input tensors can be compatible with various quantization and training techniques in the art, such as post-training quantization and quantization aware training.
  • the balancing of the Winograd filter and input tensors is universal, because it does not depend on any specific type of the Winograd convolution, such as bit width, quantization scheme, scale type and so on. Therefore, the balancing can be applied to a Winograd algorithm of any type.
  • the floating point neural network may be a trained neural network.
  • the apparatus may be configured to use the trained neural network for image processing such as image classification and image feature extraction.
  • the apparatus may be further configured to obtain an image or a feature map of the image as the original input.
  • the apparatus may be configured to determine the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor. Then, the apparatus may be configured to process the image or the feature map of the image by performing the Winograd convolution.
  • the apparatus may be configured to split the image or the feature map of the image into multiple tiles.
  • Each tile may still be considered as an original input, because values comprised therein are not altered. Therefore, each tile may still carry original information.
  • the feature map of the image may be obtained by the apparatus as an output of a hidden layer comprised in the floating point neural network.
  • the first floating point channel and the second floating point channel may be in a one-to-one correspondence.
  • the balancing tensor may comprise one or more balancing coefficients.
  • the apparatus may be configured to determine each balancing coefficient based on a quantization range of each first floating point channel and a quantization range of each corresponding second floating point channel.
  • a quantization range of a channel may be understood as a range between the maximum value and the minimum value of the channel.
  • the apparatus may be configured to determine each balancing coefficient based on the following equation: b_k = sqrt(r_k^V / r_k^U), (1) wherein b_k is a balancing coefficient for channel k, r_k^V is a quantization range of channel k of the one or more first floating point channels (i.e., of the Winograd input tensor), r_k^U is a quantization range of channel k of the one or more second floating point channels (i.e., of the Winograd filter tensor), and k is a positive integer.
  • the apparatus may be configured to determine each balancing coefficient based on the following equation:
  • the apparatus can be configured to obtain each balancing coefficient according to equation (1) or (2) in a simple and direct manner.
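  • A possible NumPy sketch of this calculation (illustrative; the helper name and the square-root form follow the reading of equation (1) above and are assumptions, not a definitive implementation):

```python
import numpy as np

def balancing_coefficients(V, U):
    """Per-channel balancing coefficients from quantization ranges.

    V, U: arrays of shape (C, a, a) holding the Winograd input and filter
    tensors. The quantization range of a channel is taken here as its
    peak-to-peak range (max - min).
    """
    r_v = V.max(axis=(1, 2)) - V.min(axis=(1, 2))   # ranges of the first channels
    r_u = U.max(axis=(1, 2)) - U.min(axis=(1, 2))   # ranges of the second channels
    return np.sqrt(r_v / r_u)                        # one coefficient b_k per channel k

rng = np.random.default_rng(2)
V = rng.normal(scale=5.0, size=(3, 4, 4))
U = rng.normal(scale=0.1, size=(3, 4, 4))
b = balancing_coefficients(V, U)
# After balancing, the channel ranges of V / b[:, None, None] and
# U * b[:, None, None] are equal, which eases the subsequent quantization.
```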
  • the apparatus may be configured to obtain a set of sample inputs.
  • the set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
  • the apparatus may be configured to: divide the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
  • the apparatus may be configured to: multiply the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor; divide the Winograd filter tensor by the balancing tensor to obtain a balanced Winograd filter tensor;
  • the apparatus may be further configured to: combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balanced scale tensor and a second balanced scale tensor;
  • a second aspect of the present disclosure provides a computer-implemented method for performing Winograd convolution of a floating point neural network.
  • the method comprises the following steps:
- generating a Winograd input tensor based on an original input tensor, wherein the Winograd input tensor comprises one or more first floating point channels;
- generating a Winograd filter tensor based on a filter tensor of the floating point neural network, wherein the Winograd filter tensor comprises one or more second floating point channels;
- determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
- determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels into one or more first integer channels and one or more second integer channels, respectively; and
- performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • the floating point neural network may be a trained neural network.
  • the trained neural network may be used for image processing, such as image classification and image feature extraction.
  • the method may further comprise: obtaining an image or a feature map of the image as the original input; determining the balancing tensor by minimizing quantization loss generated during the determining of the first scale factor and the second scale factor;
  • the first floating point channel and the second floating point channel may be in a one-to-one correspondence, and the balancing tensor may comprise one or more balancing coefficients.
  • the determining of the balancing tensor may comprise:
  • each balancing coefficient may be based on the following equation: b_k = sqrt(r_k^V / r_k^U), wherein b_k is a balancing coefficient for channel k, r_k^V is a quantization range of channel k of the one or more first floating point channels, r_k^U is a quantization range of channel k of the one or more second floating point channels, and k is a positive integer.
  • the determining of each balancing coefficient may be based on the following equation:
  • the method may further comprise obtaining a set of sample inputs.
  • the set of sample inputs may be a part of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the method may further comprise determining each balancing coefficient for each channel based on the sample inputs, and applying each determined balancing coefficient to each corresponding channel for later input(s).
  • the performing of the Winograd-based convolution may comprise the following steps:
  • the performing of the Winograd-based convolution may comprise the following steps: multiplying the Winograd input tensor by the balancing tensor to obtain a balanced Winograd input tensor;
  • the quantized Winograd input tensor comprises the one or more first integer channels
  • the method may further comprise the following steps:
  • a third aspect of the present disclosure provides a computer program product comprising a program code for performing the method according to the second aspect or any implementation form thereof, when executed on a computer.
  • a fourth aspect of the present disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to any one of the second aspect or any implementation form thereof.
  • a fifth aspect of the present disclosure provides a chipset comprising instructions which, when executed by the chipset, cause the chipset to carry out the method according to any one of the second aspect or any implementation form thereof.
  • FIG. 1 shows an example of a Winograd convolution performed by an apparatus
  • FIG. 2 shows an example of an apparatus for performing a Winograd convolution
  • FIG. 3 shows a method for performing a Winograd convolution
  • FIG. 4 shows an application scenario
  • FIG. 5A-5C show results based on different methods for performing convolution
  • FIG. 6 shows an illustrative example of a conventional Winograd convolution.
  • a framework for performing a Winograd convolution in a neural network is provided.
  • improvements are introduced based on the conventional Winograd convolution illustrated in FIG. 6.
  • a solution for reducing errors generated by quantization of the neural network is provided.
  • quantization may be referred to as a process of mapping a set of input values to a smaller set of values.
  • the conversion of floating point numbers to fixed point (e.g., integer) numbers may be a process of quantization.
  • embodiments disclosed herein may be applied to a neural network that is adapted for image processing, such as image classification, image enhancement, and image recognition.
  • image processing such as image classification, image enhancement, and image recognition.
  • information loss of the image processing based on the neural network according to embodiments disclosed herein may be reduced.
  • a neural network may be referred to as a neural network model or simply as a model.
  • a Winograd parameter shall be understood as a parameter in the Winograd domain.
  • FIG. 1 shows an example of a Winograd convolution performed by an apparatus for a floating point neural network.
  • the Winograd convolution comprises four phases: a transformation phase 101, a balancing phase 102, a quantization phase 103, and an output phase 104.
  • an original input tensor and a filter tensor of the neural network are transformed into a Winograd input tensor, denoted as V, and a Winograd filter tensor, denoted as U, respectively.
  • the input tensor is denoted as X in the present disclosure and is exemplarily shown as a 4x4 input tile in FIG. 1.
  • the filter tensor of the neural network (also referred to as an original filter tensor) is denoted as W in the present disclosure and is exemplarily shown as a 3x3 filter tile in FIG. 1.
  • the apparatus may be configured to apply the following equations: V = B^T · X · B (3) and U = G · W · G^T (4), where B and G are the Winograd transformation matrices.
  • the Winograd input tensor comprises one or more first floating point channels
  • the Winograd filter tensor comprises one or more second floating point channels.
  • In FIG. 1, it is exemplarily shown that the original input tensor X and the filter tensor W both comprise three channels, and operations according to equations (3) and (4) may be performed for each channel comprised therein. Then, each channel may be carried over respectively into the Winograd domain during the transformation according to equations (3) and (4).
  • the neural network may be a trained neural network.
  • the trained neural network may be configured to perform image processing such as image classification and image feature extraction.
  • the original input tensor may be or may be part of an image.
  • the original input tensor may be or may be part of a feature map of an image.
  • the feature map may be obtained at a hidden layer of the neural network.
  • the feature map may be an output of a previous layer at a certain hidden layer of the neural network.
  • the original filter tensor may be dependent upon the hidden layer. That is, the neural network may comprise different original filter tensors at different hidden layers.
  • the apparatus may perform one or more Winograd convolutions at any layer comprised in the neural network. Therefore, the original input tensor and the original filter tensor may be associated with each layer comprised in the neural network.
  • the original input tensor and the original filter tensor are transformed into the Winograd domain, where transformation matrices B and G may be used to facilitate the transformation.
  • Any input tensor not transformed into the Winograd domain may be referred to as an original input. That is, elements of the original input tensor are not altered and may still reflect original true data. Details regarding the transformation matrices B and G may depend on the specific Winograd algorithm applied to the Winograd convolution.
  • the transformation matrices B and G for an F(2×2, 3×3) Winograd convolution with a 4x4 input tile and a 3x3 filter tile may be the matrices commonly known in the art:
B^T =
[  1   0  -1   0 ]
[  0   1   1   0 ]
[  0  -1   1   0 ]
[  0   1   0  -1 ]
G =
[  1     0     0   ]
[  1/2   1/2   1/2 ]
[  1/2  -1/2   1/2 ]
[  0     0     1   ]
It is noted that an F(m×n, r×s) Winograd convolution may denote a 2D Winograd convolution used to compute an m×n feature map with r×s filters. Other types of Winograd convolutions, such as an F(4×4, 3×3) and an F(6×6, 3×3) Winograd convolution, may also be commonly used in the art, in particular for image processing.
  • F(2×2, 3×3), F(4×4, 3×3) and F(6×6, 3×3) Winograd convolutions may simply be referred to as F(2,3), F(4,3) and F(6,3) Winograd convolutions, especially for image processing where a 2D convolution may be considered as default.
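  • As an illustration only, the F(2×2, 3×3) case can be checked numerically with the commonly known transformation matrices; the sketch below (single channel, NumPy, variable names are assumptions) applies equations (3) and (4) to one tile, performs the element-wise Winograd multiplication, and verifies the result against a direct convolution:

```python
import numpy as np

# Commonly known F(2x2, 3x3) transformation matrices
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 4))   # 4x4 input tile
W = rng.normal(size=(3, 3))   # 3x3 filter tile

V = BT @ X @ BT.T             # Winograd input tensor, equation (3)
U = G @ W @ G.T               # Winograd filter tensor, equation (4)
Y = AT @ (U * V) @ AT.T       # 2x2 output tile transformed back to the normal domain

# Direct 2x2 "valid" convolution (correlation) for comparison
Y_direct = np.array([[np.sum(X[i:i + 3, j:j + 3] * W) for j in range(2)]
                     for i in range(2)])
assert np.allclose(Y, Y_direct)
```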
  • the Winograd filter tensor U may have a dimension (C,K,a,a), and the Winograd input tensor V may have a dimension (P,C,a,a), where C is the number of input channels, K is the number of output channels, P is the number of tiles of a complete input, and (a, a) determines the dimension of the Winograd domain.
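  • The batched Winograd multiplication implied by these dimensions can be sketched as follows (illustrative; the sizes are assumed example values):

```python
import numpy as np

C, K, P, a = 3, 8, 5, 4                                    # assumed example sizes
rng = np.random.default_rng(7)
U = rng.normal(size=(C, K, a, a))                          # Winograd filter tensor
V = rng.normal(size=(P, C, a, a))                          # Winograd input tensor

# Winograd multiplication: element-wise in the (a, a) Winograd domain,
# summed over the C input channels for each tile p and output channel k.
Y_wino = np.einsum('ckij,pcij->pkij', U, V)
print(Y_wino.shape)   # (P, K, a, a); each (a, a) slice is transformed back with A
```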
  • the apparatus may be configured to split a complete original input tensor into a plurality of tiles.
  • Each tile may comprise a part of the original input tensor.
  • the size of each tile may be based on the size of the Winograd input tensor of the specific Winograd algorithm applied to the Winograd convolution.
  • each tile may be equivalent to the original input, because it also carries original data that is not transformed into the Winograd domain.
  • the 4x4 input tile in FIG. 1 may be an image segment that is split from a complete image of a dimension of 80x80 (pixels).
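  • A possible tiling sketch (illustrative; for F(2×2, 3×3), neighbouring 4x4 tiles are assumed to overlap by two pixels so that the 2x2 output tiles cover the output without gaps; other Winograd configurations use other tile sizes):

```python
import numpy as np

def split_into_tiles(image, tile=4, stride=2):
    """Split a 2D input into overlapping 4x4 tiles suitable for F(2x2, 3x3)."""
    h, w = image.shape
    tiles = [image[i:i + tile, j:j + tile]
             for i in range(0, h - tile + 1, stride)
             for j in range(0, w - tile + 1, stride)]
    return np.stack(tiles)   # shape (P, 4, 4), P = number of tiles

image = np.arange(80 * 80, dtype=float).reshape(80, 80)  # e.g. an 80x80 image
tiles = split_into_tiles(image)
print(tiles.shape)  # (1521, 4, 4) -> P = 39 * 39 tiles for an 80x80 input
```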
  • the apparatus is configured to determine a balancing tensor, denoted as b in the present disclosure.
  • the balancing tensor b may have a dimension (C, a, a).
  • the balancing tensor b may be used to balance channel ranges of the Winograd input and filter tensors.
  • the precision of channel k in Winograd domain (i, j) of the Winograd filter tensor U may be denoted as p_ij^k = r_ij^k / R_ij, where r_ij^k is the quantization range of channel k in Winograd domain (i, j) of U, and R_ij is the quantization range of all channels in Winograd domain (i, j) of U.
  • a quantization range may be referred to as a range between a minimum value and a maximum value of a domain before quantization.
  • the precision of channel k in Winograd domain (i, j) of the Winograd input tensor V may be denoted as s_ij^k = t_ij^k / T_ij, where t_ij^k is the quantization range of channel k in Winograd domain (i, j) of V, and T_ij is the quantization range of all channels in Winograd domain (i, j) of V.
  • the apparatus may be configured to obtain parameters such as s_ij^k by obtaining a set of sample data without training.
  • An optimal balancing tensor b may be used to achieve a maximized total precision of all channels. That is, the apparatus may be configured to determine the balancing tensor b such that the total precision of all channels, Σ_{i,j,k} p_ij^k · s_ij^k, is maximized.
  • the apparatus may be configured to determine the balancing tensor b as b_ij^k = sqrt(t_ij^k / r_ij^k), wherein b_ij^k denotes each balancing coefficient for channel k in Winograd domain (i, j) comprised in the balancing tensor b.
  • each balancing coefficient in a same Winograd domain may also be denoted as:
  • the apparatus may be configured to obtain a set of sample inputs.
  • the set of sample inputs may be a part (e.g., 5.0-20.0 %) of a complete set of inputs that are to be applied to the trained floating point neural network in the inference phase.
  • the set of sample inputs may be similar to a set of inputs to be applied to the trained floating point neural network in the inference phase.
  • the apparatus may be configured to determine each balancing coefficient for each channel based on the sample inputs, and apply each determined balancing coefficient to each corresponding channel for later input(s).
  • a complete set of inputs to be applied to the trained floating point neural network may comprise 100 images, each image comprising three channels (RGB channels).
  • the apparatus may be configured to obtain 5 to 20 images from the 100 images as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 images. Then, the apparatus may be configured to apply the determined balancing tensor b for other images of the 100 images.
  • a complete set of inputs to be applied to the trained floating point neural network may comprise 100 tiles split from a complete image. Each tile is a part of the complete image and comprises three channels (RGB channels).
  • the apparatus may be configured to obtain 5 to 20 tiles from the 100 tiles as the sample inputs, and determine the balancing tensor b for the three channels based on the 5 to 20 tiles. Then, the apparatus may be configured to apply the determined balancing tensor b for other tiles of the 100 tiles.
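  • A possible calibration sketch following this example (illustrative; helper names, shapes and the square-root form of the balancing coefficients are assumptions):

```python
import numpy as np

def calibrate_balancing(sample_winograd_inputs, winograd_filter):
    """Estimate a per-channel balancing tensor from a small set of sample inputs.

    sample_winograd_inputs: array of shape (N, C, a, a) with a few Winograd-
    transformed sample tiles; winograd_filter: array of shape (C, a, a).
    """
    v = sample_winograd_inputs
    # Per-channel quantization ranges collected over all samples
    r_v = v.max(axis=(0, 2, 3)) - v.min(axis=(0, 2, 3))
    r_u = winograd_filter.max(axis=(1, 2)) - winograd_filter.min(axis=(1, 2))
    return np.sqrt(r_v / r_u)[:, None, None]   # broadcastable to (C, a, a)

rng = np.random.default_rng(4)
samples = rng.normal(scale=3.0, size=(10, 3, 4, 4))   # e.g. 10 of 100 sample tiles
U = rng.normal(scale=0.2, size=(3, 4, 4))
b = calibrate_balancing(samples, U)
# b is then reused to balance the remaining inputs during inference.
```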
  • the apparatus may be configured to determine the balancing tensor b based on part or all of training samples given during the training phase of the floating point neural network.
  • the floating point neural network may be referred to as the trained floating point neural network.
  • the apparatus may be configured to determine the balancing tensor b in a learnable way. That is, the balancing tensor b may be seen as one of trainable parameters in a so-called “quantization aware training”. That is, the balancing tensor b, as well as other neural network parameters such as weights and/or biases, may be fine-tuned in the training phase of the neural network before the inference phase.
  • the apparatus may be configured to divide the Winograd input tensor by the balancing tensor b to obtain a balanced input tensor V_b, and multiply the Winograd filter tensor by the balancing tensor b to obtain a balanced filter tensor U_b.
  • the apparatus may be configured to apply the following equations:
  • V_b = V / b, (10)
  • U_b = U · b. (11)
  • channel ranges of the one or more first floating point channels and the one or more second floating point channels may be balanced. This may facilitate the quantization that is to be performed in a later stage, for example, where quantization errors may be reduced.
  • the apparatus may be configured to divide the Winograd input tensor by the balancing tensor b to obtain a balanced input tensor V_b, and multiply the Winograd filter tensor by the balancing tensor b to obtain a balanced filter tensor U_b.
  • the apparatus may be configured to determine each balancing coefficient in a same Winograd domain as:
  • the apparatus may be configured to apply the following equations based on equation (12) in the balancing phase 102:
  • V_b = V / b, (13)
  • U_b = U · b. (14)
  • the apparatus is configured to determine a first scale factor (denoted as “scale_V”) for the Winograd input tensor that is balanced and a second scale factor (denoted as “scale_U”) for the Winograd filter tensor that is balanced.
  • the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively. It is noted that determining a scale factor for a quantized Winograd convolution is commonly known in the field. Therefore, it is not described in detail herein.
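  • For illustration, one common choice (an assumption, not necessarily the scheme used by the apparatus) is a per-tensor symmetric INT8 scale factor:

```python
import numpy as np

def int8_scale(t):
    """Per-tensor symmetric INT8 scale factor: maps the largest magnitude to 127."""
    return np.max(np.abs(t)) / 127.0

def quantize(t, scale):
    """Quantize a balanced floating point tensor to INT8 integers."""
    return np.clip(np.round(t / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(5)
V_b = rng.normal(size=(3, 4, 4))          # balanced Winograd input tensor
U_b = rng.normal(size=(3, 4, 4))          # balanced Winograd filter tensor
scale_V, scale_U = int8_scale(V_b), int8_scale(U_b)
V_quant, U_quant = quantize(V_b, scale_V), quantize(U_b, scale_U)
# The integer Winograd multiplication U_quant * V_quant is rescaled by
# scale_U * scale_V when transforming back to the normal domain.
```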
  • the operation of U_quant · V_quant may also be referred to as a Winograd multiplication, which may be understood as a multiplication in the Winograd domain.
  • the transformation matrix A is used to transform the output in the Winograd domain back to the normal domain.
  • Similar to the transformation matrices B and G, details regarding the transformation matrix A may depend on the specific Winograd algorithm applied to the Winograd convolution.
  • the apparatus may be configured to combine the balancing tensor with the first scale factor and the second scale factor, respectively, to obtain a first balancing scale tensor and a second balancing scale tensor. That is, the first scale factor and the balancing tensor are combined, and the second scale factor and the balancing tensor are also combined. In this way, the apparatus may only need to store these combined parameters, and a simplified balancing and quantization of the Winograd convolution may be achieved. In this case, the apparatus may be configured to perform the same number of operations in the inference phase as in the case of a conventional Winograd convolution.
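  • A sketch of this combination (illustrative; it assumes the divide-input / multiply-filter variant and scalar scale factors), showing that the fused parameters reproduce the two-step balancing and scaling:

```python
import numpy as np

rng = np.random.default_rng(6)
b = rng.uniform(0.5, 2.0, size=(3, 4, 4))   # balancing tensor
scale_V, scale_U = 0.05, 0.01               # scalar scale factors (assumed values)

# Fused "balanced scale tensors": balancing and scaling become a single
# element-wise division, so the inference-time operation count matches the
# conventional quantized Winograd convolution.
fused_scale_V = scale_V * b                 # V_quant = round(V / fused_scale_V)
fused_scale_U = scale_U / b                 # U_quant = round(U / fused_scale_U)

V = rng.normal(size=(3, 4, 4))
two_step = (V / b) / scale_V                # balance first, then scale
fused = V / fused_scale_V                   # one fused division
assert np.allclose(two_step, fused)         # rounding both gives the same integers
```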
  • By applying the balancing tensor to both the Winograd input tensor and the Winograd filter tensor of a Winograd convolution, information loss caused by the quantization may be minimized. This may, for example, increase the precision and accuracy of the result obtained by the neural network. For example, the quality of the image processing may be increased when a neural network uses a Winograd convolution based on the embodiments of the present disclosure.
  • Since the apparatus applies reverse operations of multiplication and division to the Winograd input tensor and the Winograd filter tensor based on the same balancing tensor, the balancing tensor itself may be canceled during the operation of the Winograd multiplication in the output phase 104. Therefore, the apparatus may not need any additional operations to reverse the balancing. Hence, the Winograd convolution based on the embodiments of the present disclosure may be more efficient.
  • FIG. 2 shows an example of an apparatus 200 for performing a Winograd convolution.
  • the apparatus 200 may comprise four units, which are shown exemplarily in FIG. 2 as a transformation unit 201, a balancing unit 202, a quantization unit 203, and an output unit 204.
  • the transformation unit 201 may be configured to obtain the input tensor and the filter tensor.
  • the transformation unit 201 may be further configured to transform the input tensor and the filter tensor from the normal domain into Winograd domain.
  • the balancing unit 202 may be configured to determine a balancing tensor.
  • the balancing unit 202 may further be configured to perform channel balancing on the Winograd input tensor and the Winograd filter tensor based on the determined balancing tensor.
  • the quantization unit 203 may be configured to quantize the (balanced) Winograd input tensor and the (balanced) Winograd filter tensor.
  • the output unit 204 may be configured to compute the final output. It is noted that the units 201-204 in FIG. 2 may correspond to phases 101-104 in FIG. 1, respectively.
  • the apparatus 200 may be or may be part of a neural processing unit (NPU) or an AI processor.
  • the apparatus 200 may be a matrix/vector/scalar computation unit comprised in an AI core.
  • The AI core, optionally along with a number of other identical AI cores, may be comprised in a chipset or system-on-chip (SoC).
  • the chipset or SoC may be configured to perform neural network related operations, such as training and/or inferencing.
  • the chipset or SoC may be configured to perform image processing, speech recognition, text recognition and the like based on artificial intelligence by using the Winograd convolution according to the present disclosure.
  • FIG. 3 shows a method 300 for performing a Winograd convolution.
  • the method 300 is performed by an apparatus for performing the Winograd convolution of a floating point neural network.
  • the method 300 comprises the following steps: step 301: generating a Winograd input tensor based on an original input tensor.
  • the transformed input tensor comprises one or more first floating point channels;
  • step 302: generating a Winograd filter tensor based on a filter tensor of the floating point neural network.
  • the transformed filter tensor comprises one or more second floating point channels;
  • step 303: determining a balancing tensor based on the Winograd input tensor and the Winograd filter tensor, wherein the balancing tensor is adapted to balance the one or more first floating point channels and the one or more second floating point channels;
  • step 304: determining a first scale factor for the Winograd input tensor and a second scale factor for the Winograd filter tensor, wherein the first and the second scale factors are adapted to quantize the one or more first floating point channels and the one or more second floating point channels to one or more first integer channels and one or more second integer channels, respectively;
  • step 305: performing the Winograd convolution based on the balancing tensor, the first scale factor, and the second scale factor.
  • steps 301 and 302 may correspond to phase 101 and may be performed by the transformation unit 201.
  • step 303 may correspond to phase 102 and may be performed by the balancing unit 202.
  • Step 304 may correspond to phase 103 and may be performed by the quantization unit 203.
  • Step 305 may correspond to phase 104 and may be performed by the output unit 204.
  • the corresponding method implementations are not described in detail again at this point.
  • An application scenario of the present disclosure is that the method for performing a Winograd convolution may be applied to a neural network model that involves convolution operations. Applying the neural network model may often involve two phases: a training phase and an inference phase. For preparing the model in the training phase, the following steps may be executed, e.g., by the apparatus according to the present disclosure:
  • Step 401. Obtaining the neural network model that comprises conventional direct convolution operations.
  • Step 402. Replacing the conventional direct convolutions with the Winograd convolutions according to the present disclosure.
  • Step 403. Passing several data samples, which may be referred to as sample inputs, without training to the model.
  • Step 404. Collecting statistics about minimum and maximum values of the sample inputs in the Winograd domain (i.e., Winograd inputs).
  • Step 405. Calculating balancing coefficients based on the collected statistics according to the present disclosure.
  • Step 406. Applying the calculated balancing coefficients to filters in the Winograd domain (i.e., Winograd filters).
  • Step 407. Calculating a scaling factor of the balanced Winograd filters and quantizing the Winograd filters.
  • the apparatus may be configured to perform the following step 408a and the optional step 409a.
  • Step 408a. Calculating a scaling factor of the balanced Winograd sample inputs.
  • Step 409a. Fusing the scaling factor of the balanced Winograd sample inputs and the balancing coefficients.
  • the apparatus may be configured to perform the following step 408b.
  • Step 408b. Storing the balancing coefficients for further usage in the inference phase.
  • the neural network model has been trained and fine-tuned for actual application.
  • an apparatus using the trained neural network model may be configured to perform the following steps 501 and 502. It is noted that the apparatus using the trained neural network model may also be the apparatus according to the present disclosure.
  • Step 501. Obtaining balanced and quantized Winograd filters.
  • Step 502. Transforming actual inputs into Winograd inputs.
  • the apparatus may be configured to perform the following step 502a.
  • Step 502a. Balancing and quantizing the Winograd inputs by using the balancing coefficients and the scaling factor determined based on the Winograd sample inputs before the inference phase, or by applying a fused scaling factor and the balancing coefficients if available.
  • the apparatus may be configured to perform the following steps 502b, 503 and 504.
  • Step 502b. Balancing the Winograd input by using the balancing coefficients.
  • Step 503. Determining a current scale factor based on the balanced Winograd input.
  • Step 504. Quantizing the balanced Winograd input based on the current scale factor.
  • the apparatus may be configured to perform the following steps 505 and 506.
  • Step 505. Calculating Winograd multiplication based on the balanced and quantized Winograd filters and the balanced and quantized Winograd inputs.
  • Step 506. Transforming the Winograd product back to the normal (or spatial) domain as the final output.
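  • The inference-phase steps above may be sketched end-to-end for a single tile as follows (a minimal single-channel illustration with the commonly known F(2×2, 3×3) matrices and an assumed symmetric INT8 quantization; not the only possible realization):

```python
import numpy as np

BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G = np.array([[1, 0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1]], float)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

rng = np.random.default_rng(9)
W, X = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
b = rng.uniform(0.5, 2.0, size=(4, 4))      # balancing tensor (assumed, from calibration)

# Preparation before inference: balanced and quantized Winograd filter (step 501)
U_b = (G @ W @ G.T) * b
scale_U = np.abs(U_b).max() / 127.0
U_q = np.round(U_b / scale_U)

# Step 502: transform the actual input into the Winograd domain
V = BT @ X @ BT.T
# Steps 502b-504: balance the Winograd input, determine a current scale factor, quantize
V_b = V / b
scale_V = np.abs(V_b).max() / 127.0
V_q = np.round(V_b / scale_V)

# Step 505: integer Winograd multiplication, rescaled back to floating point
Y_wino = (U_q * V_q) * (scale_U * scale_V)
# Step 506: transform back to the normal (spatial) domain
Y = AT @ Y_wino @ AT.T
print(Y)                                    # approximates the 2x2 direct convolution output
```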
  • FIG. 4 shows an application scenario of the present disclosure.
  • In FIG. 4, an example of a quantized Efficient Sub-Pixel CNN (ESPCNN) for 3x image super-resolution is illustrated.
  • the direct 2D convolutions of the four Convolution 32x32 layers and of the Convolution 32x27 layer may be performed based on the Winograd convolution according to the present disclosure.
  • a symmetric quantization scheme, an F(4,3) Winograd algorithm and a quantized INT8 model may be used for the Winograd convolution.
  • FIG. 5A-5C show results based on different methods for performing convolution.
  • FIG. 5A shows a result of image super-resolution by using the ESPCNN based on a conventional Winograd convolution that is without balancing.
  • FIG. 5B shows a result of image super-resolution by using the ESPCNN based on the Winograd convolution according to the present disclosure.
  • FIG. 5C shows a result of image super-resolution by using a full precision model. It can be seen that the perceptual quality of FIG. 5B obtained based on the Winograd convolution according to the present disclosure significantly exceeds the perceptual quality of FIG. 5A obtained based on the conventional Winograd convolution and is close to the perceptual quality of FIG. 5C obtained based on the full precision model.
  • the apparatus in the present disclosure may comprise processing circuitry configured to perform, conduct or initiate the various operations of the device described herein, respectively.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the processing circuitry comprises one or more processors and a non- transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device to perform, conduct or initiate the operations or methods described herein, respectively.
  • the apparatus in the present disclosure may be a single electronic device capable of computing, or may comprise a set of connected electronic components or modules capable of computing with shared system memory. It is well known in the art that such computing capabilities may be incorporated into many different devices, and therefore the term “apparatus” may comprise a chip, chipset, artificial intelligence accelerator, neural processing unit, computer, mobile terminal, tablet, wearable device, game console, graphic processing unit, graphic card, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a method and an apparatus for performing a Winograd convolution of a neural network. In particular, before the input and filter tensors of the Winograd convolution in a Winograd domain are quantized into integer data, the apparatus is configured to determine a balancing tensor and to balance channel ranges of the input and filter tensors based on the balancing tensor. After the channel ranges of the input and filter tensors are balanced, the apparatus is then configured to perform the quantization. In this way, the information loss caused by the quantization can be reduced. Consequently, the precision and accuracy of the neural network can be improved.
PCT/RU2021/000416 2021-10-04 2021-10-04 Appareil et procédé de convolution winograd WO2023059215A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2021/000416 WO2023059215A1 (fr) 2021-10-04 2021-10-04 Appareil et procédé de convolution winograd

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2021/000416 WO2023059215A1 (fr) 2021-10-04 2021-10-04 Appareil et procédé de convolution winograd

Publications (2)

Publication Number Publication Date
WO2023059215A1 true WO2023059215A1 (fr) 2023-04-13
WO2023059215A8 WO2023059215A8 (fr) 2024-01-25

Family

ID=78725583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2021/000416 WO2023059215A1 (fr) 2021-10-04 2021-10-04 Appareil et procédé de convolution winograd

Country Status (1)

Country Link
WO (1) WO2023059215A1 (fr)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024093A1 (fr) * 2018-07-30 2020-02-06 Intel Corporation Procédé et appareil pour maintenir une précision d'inférence statistique avec une convolution de winograd à 8 bits

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DI WU et al.: "EasyQuant: Post-training Quantization via Scale Optimization", arXiv.org, Cornell University Library, Ithaca, NY 14853, 30 June 2020 (2020-06-30), XP081710713 *
MAZAHERI, ARYA et al.: "Accelerating winograd convolutions using symbolic computation and meta-programming", Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, ACM, New York, NY, USA, 15 April 2020 (2020-04-15), pages 1-14, XP058553058, ISBN: 978-1-4503-6894-0, DOI: 10.1145/3342195.3387549 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492779A (zh) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 神经网络模型的运行方法、可读介质和电子设备

Also Published As

Publication number Publication date
WO2023059215A8 (fr) 2024-01-25

Similar Documents

Publication Publication Date Title
CN109949255B (zh) 图像重建方法及设备
US10977001B2 (en) Asymmetric quantization of multiple-and-accumulate operations in deep learning processing
JP7325158B2 (ja) ニューラル・ネットワーク・コアにおける動的精度のためのデータ表現
WO2021018163A1 (fr) Procédé et appareil de recherche de réseau neuronal
CN110852416B (zh) 基于低精度浮点数数据表现形式的cnn硬件加速计算方法及***
CN112508125A (zh) 一种图像检测模型的高效全整数量化方法
CN111340077B (zh) 基于注意力机制的视差图获取方法和装置
CN113191489B (zh) 二值神经网络模型的训练方法、图像处理方法和装置
CN102054177B (zh) 一种图像相似度计算方法和装置
WO2020001401A1 (fr) Procédé et appareil de fonctionnement d'une couche réseau dans un réseau neuronal profond
CN111696149A (zh) 针对基于cnn的立体匹配算法的量化方法
EP4318313A1 (fr) Procédé de traitement de données, procédé d'entraînement pour modèle de réseau neuronal et appareil
CN115326809A (zh) 一种隧道衬砌表观裂纹检测方法及检测装置
CN114519667A (zh) 一种图像超分辨率重建方法及***
WO2023059215A1 (fr) Appareil et procédé de convolution winograd
TW202004568A (zh) 應用在深度神經網路的全指數運算方法、電腦裝置及電腦可讀取的記錄媒體
EP4170547A1 (fr) Procédé d'extraction de caractéristiques de données et appareil associé
CN112085175A (zh) 基于神经网络计算的数据处理方法和装置
CN112532251A (zh) 一种数据处理的方法及设备
CN112561050A (zh) 一种神经网络模型训练方法及装置
WO2021179117A1 (fr) Procédé et appareil de recherche de nombre de canaux de réseau de neurones artificiels
CN112712461B (zh) 一种图像反卷积处理方法、装置及终端设备
CN113177546A (zh) 一种基于稀疏注意力模块的目标检测方法
US20240036816A1 (en) Systems and methods for identifying scaling factors for deep neural networks
WO2021249520A1 (fr) Procédé et appareil de traitement d'images et support de stockage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21811533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE