CN113762498A

CN113762498A - Method for quantizing RoiAlign operator

Info

Publication number: CN113762498A
Application number: CN202010497788.7A
Authority: CN
Inventors: 张东
Original assignee: Hefei Ingenic Technology Co ltd
Current assignee: Hefei Ingenic Technology Co ltd
Priority date: 2020-06-04
Filing date: 2020-06-04
Publication date: 2021-12-07
Anticipated expiration: 2040-06-04
Also published as: CN113762498B

Abstract

The invention provides a method for quantizing the ROIAlign operator, comprising the following steps: S1, inputting a feature map and quantizing the data to obtain low-bit data; S2, calculating coordinates and weights according to the input ROI and quantizing the weights, obtaining the position indices of the final output feature map and the corresponding weight values, and performing the RoiAlign operation on the quantized data, wherein, because the ROI coordinates are floating-point numbers, the corresponding weights are also floating-point numbers, and the weights are therefore converted to fixed point when they are calculated; S3, calculating the final result according to the indices and weights obtained in the preceding step, performing an AvgPooling operation for each unit, and obtaining an output of poolHeight × poolWidth × Channel. In the method, ROIAlign processes low-bit data directly, without first converting it into full-precision data.

Description

Method for quantizing RoiAlign operator

Technical Field

The invention relates to the technical field of neural network acceleration, in particular to a method for quantizing a RoiAlign operator.

Background

In recent years, with the rapid development of science and technology, the era of big data has arrived. Deep learning, which uses the Deep Neural Network (DNN) as its model, has achieved remarkable results in many key fields of artificial intelligence, such as image recognition, reinforcement learning, and semantic analysis. The Convolutional Neural Network (CNN) is a typical DNN structure; it can effectively extract the hidden-layer features of an image and classify the image accurately, and it has been widely applied to image recognition and detection in recent years.

In particular, regarding the ROIAlign operator of object detection networks: ROIAlign is a regional feature aggregation method proposed in the Mask R-CNN paper (authors Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick; see https://arxiv.org/abs/1703.06870), used when generating a fixed-size feature map from a candidate-box region proposal.

However, in the prior art this operator uses floating-point operations. For a quantized model, the ROIAlign operator must first convert the quantized input data into floating-point numbers and then operate on them, which reduces the operating efficiency of the whole quantized model and increases its bandwidth requirement.

Furthermore, the common terminology in the prior art is as follows:

Convolutional Neural Network (CNN): a type of feedforward neural network that contains convolution calculations and has a deep structure.

Quantization: the process of approximating the continuous values (or a large number of possible discrete values) of a signal by a finite number (or a smaller number) of discrete values.

Low bit: quantizing data into data with a bit width of 8 bits, 4 bits, or 2 bits.

Disclosure of Invention

To solve the above technical problem, the present application provides a method for quantizing the RoiAlign operator. It aims to overcome the defects of the prior art by improving model inference efficiency and reducing the bandwidth requirement, and it solves the problem that an existing low-bit model must convert its input into floating-point numbers during inference.

The method quantizes the ROIAlign operator so that it operates directly on the quantized input data, without first converting that data into floating-point (full-precision) numbers.

Specifically, the invention provides a method for quantizing ROIAlign operator, which comprises the following steps:

S1, inputting a feature map and quantizing the data to obtain low-bit data;

S2, calculating coordinates and weights according to the input ROI and quantizing the weights, obtaining the position indices of the final output feature map and the corresponding weight values, and performing the RoiAlign operation on the quantized data, wherein, because the ROI coordinates are floating-point numbers, the corresponding weights are also floating-point numbers, and the weights are therefore converted to fixed point when they are calculated;

S3, calculating the final result according to the indices and weights obtained in the preceding step, performing an AvgPooling operation for each unit, and obtaining an output of poolHeight × poolWidth × Channel.

In step S1, data quantization: the data to be quantized is quantized according to formula (1) to obtain low-bit data,

formula (1)

Description of variables: W_f is the full-precision data (an array), W_q is the quantized data, max_w is the maximum value in the full-precision data W_f, min_w is the minimum value in the full-precision data W_f, and b is the bit width after quantization.
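Formula (1) itself is not reproduced in this text, so the following Python sketch only illustrates one common scheme that uses exactly the variables listed above: linear min-max quantization onto the integer levels 0 to 2^b - 1. The function name quantize_min_max and the choice of rounding are assumptions made for illustration and may differ from the formula in the original filing.

```python
def quantize_min_max(w_f, b):
    """Map a list of floats onto integer levels 0 .. 2**b - 1.

    Assumed scheme (linear min-max); not taken from formula (1).
    """
    min_w, max_w = min(w_f), max(w_f)
    levels = (1 << b) - 1          # highest representable level
    if max_w == min_w:             # constant input: avoid division by zero
        return [0 for _ in w_f]
    scale = levels / (max_w - min_w)
    return [int(round((v - min_w) * scale)) for v in w_f]


# Four evenly spaced values quantized to 2 bits land on the four levels.
w_q = quantize_min_max([0.0, 1.0, 2.0, 3.0], b=2)
print(w_q)  # [0, 1, 2, 3]
```

With b = 8 the same function maps the minimum of the array to 0 and the maximum to 255.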

The step S2 of calculating coordinates and weights and quantizing the weights further includes:

S2.1, setting a structure Point comprising four members xMin, yMin, rWidth and rHeight, wherein xMin is the minimum value of x, yMin is the minimum value of y, rWidth is the width of r, and rHeight is the height of r; the structure Point represents a target box of object detection, and xMin, yMin, rWidth and rHeight respectively represent the coordinates of the upper-left corner and the width and height of the target box on the feature map;

S2.2, declaring poolHeight, poolWidth, binSize, downSample, fixedWidth, width and height as integers; wherein,

poolHeight is the height of the fixed-size feature map produced for each ROI;

poolWidth is the width of the fixed-size feature map produced for each ROI;

binSize is the number of sampling points per region;

downSample indicates that the features of this layer are obtained by downsampling the original image by a factor of N, where N is a positive integer;

fixedWidth is the bit width used to quantize the fractional part of a coordinate;

S2.3, a list of structures: roi;

assigning getNum(roi) to roiNum;

shifting 1 left by fixedWidth bits and assigning the result to fixedScale;

S2.4, for tag from 0 to roiNum in sequence, performing the following operations:

assigning roi(tag) to xMin, xMax, yMin, yMax;

assigning rHeight / poolHeight to roiBinH;

assigning rWidth / poolWidth to roiBinW;

for ph from 0 to poolHeight in sequence, performing the following operations:

for pw from 0 to poolWidth in sequence, performing the following operations:

for bh from 0 to binSize in sequence, performing the following operations:

assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

for bw from 0 to binSize in sequence, performing the following operations:

assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

assigning int(x) to xLow and int(y) to yLow;

assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

assigning fixedScale - ly to hy and fixedScale - lx to hx;

assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

assigning tag + (ph * poolHeight + pw) * channel to index;

calculating calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4).
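The innermost part of step S2.4 can be sketched in Python for a single sampling point. The variable names follow the text; the function name fixed_point_weights is hypothetical, truncation via int() when converting the fractional offsets to fixed point is an assumption, and no bounds clamping is done because the steps above do not describe any.

```python
def fixed_point_weights(x, y, width, fixed_width):
    """Return the four neighbour positions and fixed-point bilinear weights
    for one sampling point (x, y) on a feature map of the given width."""
    fixed_scale = 1 << fixed_width           # fixedScale = 1 << fixedWidth
    x_low, y_low = int(x), int(y)
    x_high, y_high = x_low + 1, y_low + 1
    # Fractional offsets scaled to integers (truncation is an assumption).
    lx = int((x - x_low) * fixed_scale)
    ly = int((y - y_low) * fixed_scale)
    hx = fixed_scale - lx
    hy = fixed_scale - ly
    weights = (hy * hx, hy * lx, ly * hx, ly * lx)      # w1 .. w4
    positions = (y_low * width + x_low,                 # pos1
                 y_low * width + x_high,                # pos2
                 y_high * width + x_low,                # pos3
                 y_high * width + x_high)               # pos4
    return positions, weights


positions, weights = fixed_point_weights(x=2.25, y=1.5, width=8, fixed_width=4)
print(positions)  # (10, 11, 18, 19)
print(weights)    # (96, 32, 96, 32)
```

In this example the four weights sum to 256, which is fixedScale squared, so all later arithmetic can stay in integers until a final right shift.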

The step S3 of calculating the final output according to the position index and the weight further includes:

S3.1, featureMap: multidimensional data with dimensions {height, width, channel};

S3.2, outfeatureMap: multidimensional data with dimensions {roiNum, poolHeight, poolWidth, channel};

S3.3, declaring fixedWidth as an integer;

S3.4, the function calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4), which implements a bilinear interpolation process, wherein pos1, pos2, pos3 and pos4 are the positions of the pixels surrounding the point to be computed, and w1, w2, w3 and w4 are the weights of those surrounding pixels in the calculation;

S3.5, assigning featureMap[pos1 * channel] to dataPos1;

assigning featureMap[pos2 * channel] to dataPos2;

assigning featureMap[pos3 * channel] to dataPos3;

assigning featureMap[pos4 * channel] to dataPos4;

S3.6, for tag from 0 to channel in sequence, performing the following operations:

assigning the value of the expression (not reproduced in this text) to tmpValue;

dataPos1++, dataPos2++, dataPos3++, dataPos4++;

assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;

shifting tmpValue right by rightShift bits and assigning the result to outfeatureMap[index + tag].
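A minimal Python sketch of step S3 for one output cell follows. The expression assigned to tmpValue is not reproduced in this text, so the weighted sum of the four neighbouring pixels, accumulated over all sampling points of the cell, is an assumption based on the statement that calRoiAlign performs bilinear interpolation. The function name cal_roi_align_cell, the flat height-width-channel layout, and the use of integer division in the shift amount are also assumptions. The example uses binSize = 2, for which binSize * binSize / 2 equals 2, the base-2 logarithm of the four sampling points.

```python
def cal_roi_align_cell(feature_map, channel, samples, fixed_width, bin_size):
    """Average the fixed-point bilinear samples of one output cell.

    feature_map: flat list of quantized integers in height-width-channel order.
    samples: list of ((pos1..pos4), (w1..w4)) tuples, one per sampling point.
    """
    # Shift amount as written in the text (integer division assumed).
    right_shift = fixed_width + fixed_width + bin_size * bin_size // 2
    out = []
    for c in range(channel):
        tmp_value = 0
        for (p1, p2, p3, p4), (w1, w2, w3, w4) in samples:
            # Assumed form of the missing expression: weighted neighbour sum.
            tmp_value += (w1 * feature_map[p1 * channel + c]
                          + w2 * feature_map[p2 * channel + c]
                          + w3 * feature_map[p3 * channel + c]
                          + w4 * feature_map[p4 * channel + c])
        out.append(tmp_value >> right_shift)
    return out


# 2 x 2 feature map with 2 channels; pixels are (10, 1), (20, 2), (30, 3), (40, 4).
feature_map = [10, 1, 20, 2, 30, 3, 40, 4]
# Four sampling points at the cell centre: with fixedWidth = 4 each weight is 8 * 8 = 64.
samples = [((0, 1, 2, 3), (64, 64, 64, 64))] * 4
result = cal_roi_align_cell(feature_map, channel=2, samples=samples,
                            fixed_width=4, bin_size=2)
print(result)  # [25, 2]
```

Channel 0 gives exactly the mean 25, while channel 1 shows the truncation of the right shift (a mean of 2.5 becomes 2).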

Thus, the present application has the advantages that:

(1) the RoiAlign operator takes quantized data as input and performs the RoiAlign operation on it directly, without first converting it into full-precision data;

(2) the inference process and speed of the low-bit model are optimized, and the requirements on bandwidth and memory are reduced.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of the prior-art RoiAlign operator process.

Fig. 2 is a schematic diagram of the quantized ROIAlign operator process according to the present invention.

Fig. 3 is a schematic flow diagram of the method of the present invention.

Fig. 4 is a flow chart of the code for calculating coordinates and weights and quantizing the weights in the method of the present invention.

Fig. 5 is a flow chart of the code for calculating the final output according to the position indices and weights in the method of the present invention.

Detailed Description

In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.

As shown in Fig. 3, a method for quantizing the RoiAlign operator according to the present invention includes the following steps:

S1, inputting a feature map and quantizing the data to obtain low-bit data;

Specifically, the present invention can also be explained as follows:

In the prior art, the ROIAlign operator is mainly implemented in two steps: 1. the position indices of the final output feature map and the corresponding weight values are obtained according to the input ROI; 2. the final result is calculated according to the indices and weights obtained in the first step, with multiple AvgPooling operations performed for each unit, giving an output of roiNum × poolHeight × poolWidth × Channel. The flow chart is shown in Fig. 1.

The quantized ROIAlign operator according to the present invention is also implemented in two steps: 1. the position indices of the final output feature map and the corresponding weight values are obtained according to the input ROI; however, because the ROI coordinates are floating-point numbers, the corresponding weights are also floating-point numbers, so the weights are converted to fixed point when they are calculated; 2. the final result is calculated according to the indices and weights obtained in the first step, with multiple AvgPooling operations performed for each unit, giving an output of poolHeight × poolWidth × Channel. The flow chart is shown in Fig. 2.

The code of the method for calculating coordinates and weights and quantizing the weights is shown in Fig. 4:

setting a structure Point comprising four members xMin, yMin, rWidth and rHeight, wherein xMin is the minimum value of x, yMin is the minimum value of y, rWidth is the width of r, and rHeight is the height of r; the structure Point here represents a target box of object detection, so xMin, yMin, rWidth and rHeight respectively represent the coordinates of the upper-left corner and the width and height of the target box on the feature map;

declaring poolHeight, poolWidth, binSize, downSample, fixedWidth, width and height as integers; wherein,

poolHeight is the height of the fixed-size feature map produced for each ROI;

poolWidth is the width of the fixed-size feature map produced for each ROI;

binSize is the number of sampling points per region;

downSample indicates by what factor the features of this layer are downsampled from the original image;

a list of structures: roi;

assigning getNum(roi) to roiNum;

shifting 1 left by fixedWidth bits and assigning the result to fixedScale;

for tag from 0 to roiNum in sequence, performing the following operations:

assigning roi(tag) to xMin, xMax, yMin, yMax;

assigning rHeight / poolHeight to roiBinH;

assigning rWidth / poolWidth to roiBinW;

for ph from 0 to poolHeight in sequence, performing the following operations:

for pw from 0 to poolWidth in sequence, performing the following operations:

for bh from 0 to binSize in sequence, performing the following operations:

assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

for bw from 0 to binSize in sequence, performing the following operations:

assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

assigning int(x) to xLow and int(y) to yLow;

assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

assigning fixedScale - ly to hy and fixedScale - lx to hx;

assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

assigning tag + (ph * poolHeight + pw) * channel to index;

calculating calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4), a function that implements a bilinear interpolation process, wherein pos1, pos2, pos3 and pos4 are the positions of the pixels surrounding the point to be computed, and w1, w2, w3 and w4 are the weights of those surrounding pixels in the calculation.
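A short derivation, using only the quantities defined in the listing above, shows why a right shift by 2 × fixedWidth bits normalizes a single interpolated sample. Since hy + ly = fixedScale and hx + lx = fixedScale by construction, the four integer weights always sum to fixedScale squared:

```latex
\begin{aligned}
w_1 + w_2 + w_3 + w_4
  &= h_y h_x + h_y l_x + l_y h_x + l_y l_x \\
  &= (h_y + l_y)(h_x + l_x) \\
  &= \mathrm{fixedScale}^2 = 2^{\,2 \cdot \mathrm{fixedWidth}}
\end{aligned}
```

Dividing the weighted sum by this constant is therefore the same as shifting it right by fixedWidth + fixedWidth bits, which accounts for the first two terms of rightShift.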

The code of the method for calculating the final output according to the position indices and weights is shown in Fig. 5:

featureMap: multidimensional data with dimensions {height, width, channel};

outfeatureMap: multidimensional data with dimensions {roiNum, poolHeight, poolWidth, channel};

declaring fixedWidth as an integer;

the function calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4);

assigning featureMap[pos1 * channel] to dataPos1;

assigning featureMap[pos2 * channel] to dataPos2;

assigning featureMap[pos3 * channel] to dataPos3;

assigning featureMap[pos4 * channel] to dataPos4;

for tag from 0 to channel in sequence, performing the following operations:

assigning the value of the expression (not reproduced in this text) to tmpValue;

dataPos1++, dataPos2++, dataPos3++, dataPos4++;

assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;

shifting tmpValue right by rightShift bits and assigning the result to outfeatureMap[index + tag].
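As a rough sanity check of the fixed-point approach described by the two listings, the Python sketch below compares a floating-point bilinear interpolation with an integer-only version for one sampling point. Both helper functions are hypothetical, the weighted-sum form of the interpolation is an assumption (the tmpValue expression is not reproduced in this text), and only the 2 × fixedWidth part of the shift is applied because a single sample is not averaged.

```python
def bilinear_float(fm, width, x, y):
    """Reference bilinear interpolation in floating point."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = fm[y0 * width + x0] * (1 - fx) + fm[y0 * width + x0 + 1] * fx
    bot = fm[(y0 + 1) * width + x0] * (1 - fx) + fm[(y0 + 1) * width + x0 + 1] * fx
    return top * (1 - fy) + bot * fy


def bilinear_fixed(fm, width, x, y, fixed_width):
    """Same interpolation with integer weights and a final right shift."""
    fixed_scale = 1 << fixed_width
    x0, y0 = int(x), int(y)
    lx = int((x - x0) * fixed_scale)   # truncation is an assumption
    ly = int((y - y0) * fixed_scale)
    hx, hy = fixed_scale - lx, fixed_scale - ly
    acc = (hy * hx * fm[y0 * width + x0]
           + hy * lx * fm[y0 * width + x0 + 1]
           + ly * hx * fm[(y0 + 1) * width + x0]
           + ly * lx * fm[(y0 + 1) * width + x0 + 1])
    return acc >> (2 * fixed_width)


fm = [0, 64, 128, 255]                 # 2 x 2 single-channel map of 8-bit values
ref = bilinear_float(fm, 2, 0.3, 0.7)
approx = bilinear_fixed(fm, 2, 0.3, 0.7, fixed_width=8)
print(ref, approx)
```

For this input the integer result is 121 against a floating-point reference of about 122.03; the gap comes from truncating the fractional offsets and from the final shift.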

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of quantizing ROIAlign operators, the method comprising the steps of:

S1, inputting a feature map and quantizing the data to obtain low-bit data;

S2, calculating coordinates and weights according to the input ROI and quantizing the weights, obtaining the position indices of the final output feature map and the corresponding weight values, and performing the RoiAlign operation on the quantized data, wherein, because the ROI coordinates are floating-point numbers, the corresponding weights are also floating-point numbers, and the weights are therefore converted to fixed point when they are calculated;

2. The method of claim 1, wherein in step S1 the data is quantized as follows: the data to be quantized is quantized according to formula (1) to obtain low-bit data,

formula (1)

Description of variables: W_f is the full-precision data (an array), W_q is the quantized data, max_w is the maximum value in the full-precision data W_f, min_w is the minimum value in the full-precision data W_f, and b is the bit width after quantization.

3. The method of claim 1, wherein the step of calculating coordinates and weights and quantizing the weights in step S2 further comprises:

poolHeight is the height of the fixed-size feature map produced for each ROI;

poolWidth is the width of the fixed-size feature map produced for each ROI;

binSize is the number of sampling points per region;

downSample indicates that the features of this layer are obtained by downsampling the original image by a factor of N, where N is a positive integer;

fixedWidth is the bit width used to quantize the fractional part of a coordinate;

S2.3, a list of structures: roi;

assigning getNum(roi) to roiNum;

shifting 1 left by fixedWidth bits and assigning the result to fixedScale;

S2.4, for tag from 0 to roiNum in sequence, performing the following operations:

assigning roi(tag) to xMin, xMax, yMin, yMax;

assigning rHeight / poolHeight to roiBinH;

assigning rWidth / poolWidth to roiBinW;

assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

assigning int(x) to xLow and int(y) to yLow;

assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

assigning fixedScale - ly to hy and fixedScale - lx to hx;

assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

assigning tag + (ph * poolHeight + pw) * channel to index;

calculating calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4).

4. The method of claim 1, wherein the step of calculating the final output according to the position index and the weight in step S3 further comprises:

S3.1, featureMap: multidimensional data with dimensions {height, width, channel};

S3.3, declaring fixedWidth as an integer;

S3.5, assigning featureMap[pos1 * channel] to dataPos1;

assigning featureMap[pos2 * channel] to dataPos2;

assigning featureMap[pos3 * channel] to dataPos3;

assigning featureMap[pos4 * channel] to dataPos4;

assigning the value of the expression (not reproduced in this text) to tmpValue;

dataPos1++, dataPos2++, dataPos3++, dataPos4++;

assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;