CN113762498B - Method for quantizing RoiAlign operator

Info

Publication number: CN113762498B
Application number: CN202010497788.7A
Authority: CN
Inventors: 张东
Original assignee: Hefei Ingenic Technology Co ltd
Current assignee: Hefei Ingenic Technology Co ltd
Priority date: 2020-06-04
Filing date: 2020-06-04
Publication date: 2024-01-23
Anticipated expiration: 2040-06-04
Also published as: CN113762498A

Abstract

The invention provides a method for quantizing the ROIAlign operator, comprising the following steps: S1, inputting the feature map and quantizing the data to obtain low-bit data; S2, calculating coordinates and weights from the input ROIs and quantizing the weights, thereby obtaining the position indexes and corresponding weight values of the final output feature map, so that the RoiAlign operation is performed on the quantized data, wherein, because the ROI coordinates are floating-point numbers and the corresponding weights are therefore floating-point numbers, the weights are converted to fixed point when they are calculated; S3, calculating the final result from the indexes and weights obtained in the preceding steps, and obtaining an output of poolHeight × poolWidth × channel for each unit through multiple AvgPooling operations. ROIAlign thus processes the low-bit data directly, without conversion to full precision.

Description

Method for quantizing RoiAlign operator

Technical Field

The invention relates to the technical field of neural network acceleration, and in particular to a method for quantizing the RoiAlign operator.

Background

In recent years, with the rapid development of technology, the era of big data has arrived. Deep learning, which uses the Deep Neural Network (DNN) as its model, has achieved remarkable results in many key fields of artificial intelligence, such as image recognition, reinforcement learning and semantic analysis. The Convolutional Neural Network (CNN), a typical DNN structure, can effectively extract hidden-layer features of images and classify images accurately, and has been widely applied to image recognition and detection in recent years.

In particular, regarding the ROIAlign operator in target detection networks: ROIAlign is a region feature aggregation method proposed in the Mask R-CNN paper (authors Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick; see https://arxiv.org/abs/1703.06870) that generates a fixed-size feature map from each candidate box (region proposal).

However, in the prior art this operator is computed in floating point. For a quantized model, the quantized input data must be converted to floating-point numbers whenever the ROIAlign operator is encountered, which lowers the operating efficiency of the whole quantized model and raises its bandwidth requirement.

Furthermore, the common terminology in the prior art is as follows:

Convolutional Neural Network (CNN): a type of feedforward neural network that includes convolution calculations and has a deep structure.

Quantization: the process of approximating the continuous values (or a large number of possible discrete values) of a signal by a finite number (or a smaller number) of discrete values.

Low-bit: data quantized to a width of 8 bits, 4 bits or 2 bits.

Disclosure of Invention

To solve the above technical problems, the application provides a method for quantizing the ROIAlign operator. It aims to overcome the defects of the prior art by accelerating model inference and reducing bandwidth requirements, and it solves the problem that existing low-bit models must convert their input to floating-point numbers during inference.

The method of the invention quantizes the ROIAlign operator: the quantized input data are operated on directly, without first being converted to floating-point numbers. ROIAlign thus processes the low-bit data directly, without conversion to full precision.

Specifically, the invention provides a method for quantizing the ROIAlign operator, comprising the following steps:

S1, inputting the feature map and quantizing the data to obtain low-bit data;

S2, calculating coordinates and weights from the input ROIs and quantizing the weights, thereby obtaining the position indexes and corresponding weight values of the final output feature map, so that the RoiAlign operation is performed on the quantized data, wherein, because the ROI coordinates are floating-point numbers and the corresponding weights are therefore floating-point numbers, the weights are converted to fixed point when they are calculated;

S3, calculating the final result from the indexes and weights obtained in the preceding steps, and obtaining an output of poolHeight × poolWidth × channel for each unit through multiple AvgPooling operations.

In step S1, data quantization is performed: the data to be quantized are quantized according to formula (1) to obtain low-bit data.

formula (1)

Description of variables: W_f is the full-precision data (an array), W_q is the quantized data, max_w is the maximum value of the full-precision data W_f, min_w is the minimum value of the full-precision data W_f, and b is the quantized bit width.
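Formula (1) itself is not reproduced in this text. Purely as an illustration, the sketch below assumes a common min-max uniform scheme that maps the range [min_w, max_w] onto the integer codes 0 to 2^b - 1. The function name `quantize_min_max`, the unsigned code range and the rounding rule are all assumptions and may differ from the formula actually used.

```python
def quantize_min_max(w_f, b):
    """Assumed min-max uniform quantization; not necessarily formula (1).

    w_f: list of full-precision values, b: quantized bit width.
    Returns the integer codes and the real-valued step size.
    """
    max_w = max(w_f)
    min_w = min(w_f)
    levels = (1 << b) - 1              # number of steps between the b-bit codes
    if max_w == min_w:                 # constant input: every value maps to code 0
        return [0 for _ in w_f], 1.0
    scale = (max_w - min_w) / levels   # real-valued size of one step
    w_q = [int(round((v - min_w) / scale)) for v in w_f]
    return w_q, scale
```

Under this assumed scheme the smallest input always maps to code 0 and the largest to code 2^b - 1, so the codes fit in b bits.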

The calculating of coordinates and weights and the quantizing of weights in step S2 further includes:

S2.1, defining a structure Point comprising four members xMin, yMin, rWidth and rHeight, wherein xMin is the minimum x coordinate, yMin is the minimum y coordinate, rWidth is the width of the ROI, and rHeight is the height of the ROI; here, the structure Point represents a target box for target detection, and xMin, yMin, rWidth and rHeight represent the upper-left corner coordinates and the width and height of the target box on the feature map, respectively;

S2.2, declaring poolHeight, poolWidth, binSize, downSample, fixedWidth, width and height as integers; wherein,

poolHeight represents the height of the fixed-size feature map produced for each ROI;

poolWidth represents the width of the fixed-size feature map produced for each ROI;

binSize represents the number of sampling points per region;

downSample indicates that the feature map is obtained by downsampling the original image by a factor of N, N being a positive integer;

fixedWidth represents the bit width used to quantize the fractional part of the coordinates;

S2.3, structure list: roi;

assigning getNum(roi) to roiNum;

assigning 1 left-shifted by fixedWidth to fixedScale;

S2.4, for tag taking each value from 0 to roiNum in sequence, performing the following operations:

assigning the members of roi[tag] to xMin, xMax, yMin, yMax;

assigning rHeight / poolHeight to roiBinH;

assigning rWidth / poolWidth to roiBinW;

for ph taking each value from 0 to poolHeight in sequence:

for pw taking each value from 0 to poolWidth in sequence:

for bh taking each value from 0 to binSize in sequence, performing the following operations:

assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

for bw taking each value from 0 to binSize in sequence, performing the following operations:

assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

assigning int(x) to xLow and int(y) to yLow;

assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

assigning fixedScale - ly to hy and fixedScale - lx to hx;

assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

assigning tag + (ph * poolHeight + pw) * channel to index;

calling calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4).
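The innermost part of step S2.4, which turns one floating-point sampling point into four integer positions and four fixed-point weights, might look roughly like the sketch below. The function name `fixed_point_weights` is hypothetical, truncation with `int()` is an assumption about how the scaled fractions become integers, and no boundary handling is included for points on the last row or column.

```python
def fixed_point_weights(x, y, width, fixed_width):
    """Positions and fixed-point weights for one sampling point (x, y).

    Assumes x >= 0 and y >= 0 so that int() behaves like floor.
    """
    fixed_scale = 1 << fixed_width           # fixedScale = 1 << fixedWidth
    x_low, y_low = int(x), int(y)            # upper-left neighboring pixel
    x_high, y_high = x_low + 1, y_low + 1    # lower-right neighboring pixel
    # Fractional offsets scaled to integers (truncation is an assumption).
    ly = int((y - y_low) * fixed_scale)
    lx = int((x - x_low) * fixed_scale)
    hy = fixed_scale - ly
    hx = fixed_scale - lx
    weights = (hy * hx, hy * lx, ly * hx, ly * lx)        # w1..w4
    positions = (y_low * width + x_low, y_low * width + x_high,
                 y_high * width + x_low, y_high * width + x_high)  # pos1..pos4
    return positions, weights
```

Because hx + lx and hy + ly both equal fixedScale, the four weights always sum to fixedScale * fixedScale, which is why a later right shift by fixedWidth + fixedWidth brings the weighted sum back to the scale of the input data.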

The calculating of the final output from the position indexes and the weights in step S3 further includes:

S3.1, featureMap: multidimensional data with dimensions {height, width, channel};

S3.2, outFeatureMap: multidimensional data with dimensions {roiNum, poolHeight, poolWidth, channel};

S3.3, declaring fixedWidth as an integer;

S3.4, defining the function calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4), which implements bilinear interpolation, wherein pos1, pos2, pos3 and pos4 are the positions of the pixels surrounding the point to be calculated, and w1, w2, w3 and w4 are the weights of those surrounding pixels;

S3.5, assigning featureMap[pos1 * channel] to dataPos1;

assigning featureMap[pos2 * channel] to dataPos2;

assigning featureMap[pos3 * channel] to dataPos3;

assigning featureMap[pos4 * channel] to dataPos4;

S3.6, for tag taking each value from 0 to channel in sequence, performing the following operations:

assigning the value of the expression (formula not reproduced in this text) to tmpValue;

dataPos1++, dataPos2++, dataPos3++, dataPos4++;

rightShift ← fixedWidth + fixedWidth + binSize * binSize / 2;

that is, assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;

assigning tmpValue right-shifted by rightShift to outFeatureMap[index + tag].
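As an illustration of the integer-only interpolation in steps S3.5 and S3.6, the sketch below handles a single sampling point. The expression assigned to tmpValue is not reproduced in this text, so the weighted sum of the four neighboring pixels used here is an assumption based on the statement that the function implements bilinear interpolation. The function name `cal_roi_align_point` is hypothetical, and the binSize term of rightShift is deliberately left out because only one sampling point is processed.

```python
def cal_roi_align_point(feature_map, channel, positions, weights, fixed_width):
    """Interpolate one sampling point for every channel using integers only.

    feature_map is a flat list of quantized values in {height, width, channel}
    order; positions are pos1..pos4 and weights are the fixed-point w1..w4.
    """
    right_shift = fixed_width + fixed_width   # undoes the fixedScale * fixedScale factor
    out = []
    for c in range(channel):
        tmp_value = 0
        for pos, w in zip(positions, weights):
            # Assumed weighted sum of the four surrounding pixels.
            tmp_value += feature_map[pos * channel + c] * w
        out.append(tmp_value >> right_shift)  # integer result, rounded down
    return out
```

For a 2 × 2 feature map with two channels and a sampling point at the center, `cal_roi_align_point([10, 1, 20, 2, 30, 3, 40, 4], 2, (0, 1, 2, 3), (64, 64, 64, 64), 4)` returns `[25, 2]`; the second channel shows the truncation of 2.5 introduced by the shift.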

Thus, the present application has the advantages that:

(1) For the RoiAlign operator, the input is quantized data, and the RoiAlign operation can be performed directly on the quantized data without converting it to full-precision data;

(2) The inference process and speed of the low-bit model are optimized, and the bandwidth and memory requirements are reduced.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate and together with the description serve to explain the invention.

FIG. 1 is a schematic diagram of the ROIAlign operator flow in the prior art.

FIG. 2 is a schematic diagram of the quantized ROIAlign operator flow of the present invention.

FIG. 3 is a schematic flow chart of the method of the present invention.

FIG. 4 is a flow chart of the procedure for calculating the coordinates and weights and quantizing the weights in the method of the present invention.

FIG. 5 is a flow chart of the procedure for calculating the final output from the position indexes and the weights in the method of the present invention.

Detailed Description

In order that the technical content and advantages of the present invention may be more clearly understood, a further detailed description of the present invention will now be made with reference to the accompanying drawings.

As shown in FIG. 3, a method of quantizing the ROIAlign operator according to the present invention includes the following steps:

S1, inputting the feature map and quantizing the data to obtain low-bit data;

Specifically, the present invention can also be interpreted as follows:

In the prior art, the implementation of the ROIAlign operator is mainly divided into two steps: 1. The position indexes and corresponding weight values of the final output feature map are obtained from the input ROIs. 2. The final result is calculated from the indexes and weights obtained in the first step, and an output of roiNum × poolHeight × poolWidth × channel is obtained by applying the AvgPooling operation to each unit. The flow is shown in FIG. 1.

The implementation of the quantized ROIAlign operator is likewise divided into two steps: 1. The position indexes and corresponding weight values of the final output feature map are obtained from the input ROIs; however, because the ROI coordinates are floating-point numbers, the corresponding weights are also floating-point numbers, so the weights are converted to fixed point when they are calculated. 2. The final result is calculated from the indexes and weights obtained in the first step, and an output of poolHeight × poolWidth × channel is obtained for each unit by the AvgPooling operation. The flow is shown in FIG. 2.
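The difference between the two flows comes down to how a fractional weight is represented. The sketch below shows one way a floating-point fraction could be turned into a fixed-point integer and back; the helper names `to_fixed` and `from_fixed` are hypothetical, and truncation toward zero is an assumption.

```python
def to_fixed(fraction, fixed_width):
    """Scale a fraction in [0, 1) to an integer with fixed_width fractional bits."""
    return int(fraction * (1 << fixed_width))   # truncation is an assumption


def from_fixed(code, fixed_width):
    """Recover the real value represented by a fixed-point code."""
    return code / (1 << fixed_width)
```

With 8 fractional bits, `to_fixed(0.3, 8)` gives 76, which represents 0.296875. The truncation error stays below 2^-fixedWidth, so a larger fixedWidth gives weights closer to the floating-point ones at the cost of wider intermediate integers.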

The method of calculating the coordinates and weights and quantizing the weights is shown in FIG. 4:

01: defining a structure Point comprising four members xMin, yMin, rWidth and rHeight, wherein xMin is the minimum x coordinate, yMin is the minimum y coordinate, rWidth is the width of the ROI, and rHeight is the height of the ROI; here, the structure Point represents a target box for target detection, so xMin, yMin, rWidth and rHeight represent the upper-left corner coordinates and the width and height of the target box on the feature map, respectively;

02: declaring poolHeight, poolWidth, binSize, downSample, fixedWidth, width and height as integers; wherein,

poolHeight represents the height of the fixed-size feature map produced for each ROI;

poolWidth represents the width of the fixed-size feature map produced for each ROI;

binSize represents the number of sampling points per region;

downSample indicates by what factor the feature map is downsampled from the original image;

03: structure list: roi;

04: assigning getNum(roi) to roiNum;

05: assigning 1 left-shifted by fixedWidth to fixedScale;

06: for tag taking each value from 0 to roiNum in sequence:

07: assigning the members of roi[tag] to xMin, xMax, yMin, yMax;

08: assigning rHeight / poolHeight to roiBinH;

09: assigning rWidth / poolWidth to roiBinW;

10: for ph taking each value from 0 to poolHeight in sequence, performing the following operations:

11: for pw taking each value from 0 to poolWidth in sequence:

12: for bh taking each value from 0 to binSize in sequence, performing the following operations:

13: assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

14: for bw taking each value from 0 to binSize in sequence, performing the following operations:

15: assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

16: assigning int(x) to xLow and int(y) to yLow;

17: assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

18: assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

19: assigning fixedScale - ly to hy and fixedScale - lx to hx;

20: assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

21: assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

22: assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

23: assigning tag + (ph * poolHeight + pw) * channel to index;

24: calling calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4), which implements bilinear interpolation, wherein pos1, pos2, pos3 and pos4 are the positions of the pixels surrounding the point to be calculated, and w1, w2, w3 and w4 are the weights of those surrounding pixels;
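The loop nest over ph, pw, bh and bw in FIG. 4 enumerates every sampling point of one ROI. The sketch below illustrates that enumeration using the conventional RoIAlign sampling grid from the Mask R-CNN formulation, where each output bin spans roiBinH × roiBinW and holds binSize samples per side. That grid is an assumption and may not match the exact coordinate expressions of lines 13 and 15; the generator name `sample_points` is hypothetical.

```python
def sample_points(x_min, y_min, r_width, r_height,
                  pool_height, pool_width, bin_size):
    """Yield (ph, pw, x, y) for every sampling point of one ROI.

    Uses the conventional RoIAlign grid (an assumption): samples are
    placed at the centers of bin_size x bin_size sub-cells of each bin.
    """
    roi_bin_h = r_height / pool_height    # roiBinH: height of one output bin
    roi_bin_w = r_width / pool_width      # roiBinW: width of one output bin
    for ph in range(pool_height):
        for pw in range(pool_width):
            for bh in range(bin_size):
                y = y_min + ph * roi_bin_h + (bh + 0.5) * roi_bin_h / bin_size
                for bw in range(bin_size):
                    x = x_min + pw * roi_bin_w + (bw + 0.5) * roi_bin_w / bin_size
                    yield ph, pw, x, y
```

For a 4 × 4 ROI at the origin with a 2 × 2 output and binSize 2, this yields 16 points, starting at (0.5, 0.5) and ending at (3.5, 3.5).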

The method of calculating the final output from the position indexes and the weights is shown in FIG. 5:

01: featureMap: multidimensional data with dimensions {height, width, channel};

02: outFeatureMap: multidimensional data with dimensions {roiNum, poolHeight, poolWidth, channel};

02: declaring fixedWidth as an integer;

03: defining the function calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4);

04: assigning featureMap[pos1 * channel] to dataPos1;

05: assigning featureMap[pos2 * channel] to dataPos2;

06: assigning featureMap[pos3 * channel] to dataPos3;

07: assigning featureMap[pos4 * channel] to dataPos4;

08: for tag taking each value from 0 to channel in sequence, performing the following operations:

09: assigning the value of the expression (formula not reproduced in this text) to tmpValue;

10: dataPos1++, dataPos2++, dataPos3++, dataPos4++;

11: rightShift ← fixedWidth + fixedWidth + binSize * binSize / 2;

12: that is, assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;

13: assigning tmpValue right-shifted by rightShift to outFeatureMap[index + tag].
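One possible reading of the binSize term in rightShift is that it folds the averaging over the sampling points of a bin into the same shift. For binSize = 2, binSize * binSize / 2 equals 2, which is also log2 of the 4 samples per bin. The sketch below assumes that case and assumes that the weighted sums of all sampling points are accumulated before the shift; neither assumption is confirmed by the text, and the name `average_bin` is hypothetical.

```python
def average_bin(samples, fixed_width, bin_size=2):
    """Average the fixed-point interpolations of one bin with a single shift.

    samples: list of (values, weights) pairs, one per sampling point, where
    values are the four neighboring quantized pixels and weights are w1..w4.
    """
    assert len(samples) == bin_size * bin_size
    # Expression from the text; equals 2 * fixed_width + log2(bin_size ** 2)
    # only when bin_size == 2 (assumed here).
    right_shift = fixed_width + fixed_width + bin_size * bin_size // 2
    acc = 0
    for values, weights in samples:
        acc += sum(v * w for v, w in zip(values, weights))  # assumed accumulation
    return acc >> right_shift
```

With fixedWidth = 4 and four identical samples whose interpolated value is 25, the accumulated sum is 25600 and the shift by 10 returns 25, so the average is obtained without any division.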

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of quantizing the ROIAlign operator, the method comprising the following steps:

S1, inputting the feature map and quantizing the data to obtain low-bit data;

S2, calculating coordinates and weights from the input ROIs and quantizing the weights, thereby obtaining the position indexes and corresponding weight values of the final output feature map, so that the RoiAlign operation is performed on the quantized data, wherein, because the ROI coordinates are floating-point numbers and the corresponding weights are therefore floating-point numbers, the weights are converted to fixed point when they are calculated;

the calculating of coordinates and weights and the quantizing of weights further comprises:

poolHeight represents the height of the fixed-size feature map produced for each ROI;

poolWidth represents the width of the fixed-size feature map produced for each ROI;

binSize represents the number of sampling points per region;

downSample indicates that the feature map is obtained by downsampling the original image by a factor of N, N being a positive integer; fixedWidth represents the bit width used to quantize the fractional part of the coordinates;

S2.3, structure list: roi;

assigning getNum(roi) to roiNum;

assigning 1 left-shifted by fixedWidth to fixedScale;

assigning the members of roi[tag] to xMin, xMax, yMin, yMax;

assigning rHeight / poolHeight to roiBinH;

assigning rWidth / poolWidth to roiBinW;

for ph taking each value from 0 to poolHeight in sequence:

assigning yMin + ph * binSize + (bh + 0.5) * binSize / roiBinH to y;

assigning xMin + pw * binSize + (bw + 0.5) * binSize / roiBinW to x;

assigning int(x) to xLow and int(y) to yLow;

assigning xLow + 1 to xHigh and yLow + 1 to yHigh;

assigning (y - yLow) * fixedScale to ly and (x - xLow) * fixedScale to lx;

assigning fixedScale - ly to hy and fixedScale - lx to hx;

assigning hy * hx to w1, hy * lx to w2, ly * hx to w3, and ly * lx to w4;

assigning yLow * width + xLow to pos1 and yLow * width + xHigh to pos2;

assigning yHigh * width + xLow to pos3 and yHigh * width + xHigh to pos4;

assigning tag + (ph * poolHeight + pw) * channel to index;

calling calRoiAlign(index, pos1, pos2, pos3, pos4, w1, w2, w3, w4);

S3, calculating the final result from the indexes and weights obtained in the preceding steps, and obtaining an output of poolHeight × poolWidth × channel for each unit through multiple AvgPooling operations;

the calculating of the final output from the position indexes and the weights in step S3 further includes:

S3.1, featureMap: multidimensional data with dimensions {height, width, channel};

S3.3, declaring fixedWidth as an integer;

S3.5, assigning featureMap[pos1 * channel] to dataPos1;

assigning featureMap[pos2 * channel] to dataPos2;

assigning featureMap[pos3 * channel] to dataPos3;

assigning featureMap[pos4 * channel] to dataPos4;

assigning the value of the expression (formula not reproduced in this text) to tmpValue;

dataPos1++, dataPos2++, dataPos3++, dataPos4++;

rightShift ← fixedWidth + fixedWidth + binSize * binSize / 2,

that is, assigning fixedWidth + fixedWidth + binSize * binSize / 2 to rightShift;

assigning tmpValue right-shifted by rightShift to outFeatureMap[index + tag].

2. The method of quantizing the ROIAlign operator according to claim 1, wherein step S1 is data quantization: the data to be quantized are quantized according to formula (1) to obtain low-bit data,

formula (1)

Description of variables: W_f is the full-precision data (an array), W_q is the quantized data, max_w is the maximum value of the full-precision data W_f, min_w is the minimum value of the full-precision data W_f, b is the quantized bit width, and S_W is the scaling factor used to quantize the floating-point data.