CN111126585A - Method and device for optimizing filling parameters and computer-readable storage medium - Google Patents


Info

Publication number
CN111126585A
CN111126585A (application CN201911360230.8A)
Authority
CN
China
Prior art keywords
size
input
padding
optimized
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911360230.8A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201911360230.8A priority Critical patent/CN111126585A/en
Publication of CN111126585A publication Critical patent/CN111126585A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a method, an apparatus, and a computer-readable storage medium for optimizing padding parameters. The apparatus for optimizing the padding parameters may be included in a combined processing device, which may also include a general interconnection interface and other processing devices. The apparatus for optimizing the padding parameters interacts with the other processing devices to jointly complete the computation specified by the user. The combined processing device may further comprise a storage device connected to the apparatus for optimizing the padding parameters and to the other processing devices, respectively, to serve the data of both. By means of the method and apparatus, the generation of redundancy is overcome, the input scale is reduced, memory occupation is reduced, and operation speed is increased.

Description

Method and device for optimizing filling parameters and computer-readable storage medium
Technical Field
The present disclosure relates generally to the field of computers, and more particularly to a method, an apparatus, and a computer-readable storage medium for optimizing padding parameters.
Background
TensorFlow is a symbolic mathematical system based on dataflow programming and is widely used to implement various machine learning algorithms. PyTorch is the Python version of Torch, a neural network framework open-sourced by Facebook, Inc. Unlike the static computation graph of TensorFlow, the PyTorch computation graph is dynamic and can be changed in real time according to computation needs. In both the TensorFlow framework and the PyTorch framework currently used in the field of artificial intelligence, padding (zero padding) is often used in convolution operations.
However, because the padding parameter cannot be set freely under the TensorFlow framework, its padding method has many functional defects and cannot realize multiple padding cases. The PyTorch framework allows the padding parameter to be set to any non-negative integer value and does not share the functional defects of the TensorFlow framework, but it is redundant in that identical zero padding is applied at both ends of each corresponding dimension.
How to remedy at least some of these defects of the prior art is an urgent problem for those skilled in the art.
Disclosure of Invention
In order to at least solve the problems described in the above background art, so as to overcome the occurrence of redundancy, reduce the input scale, reduce the memory usage, and increase the operation speed, the present disclosure provides the following technical solutions and a plurality of embodiments thereof.
According to a first aspect of the present disclosure, there is provided a method of optimizing a padding parameter (padding), which may include the following steps:
calculating an output size (output_size) of the output data (output) from the input data (input);
calculating an effective input size (equivalent_input_size) of the input data from the output size and the input data; and
calculating an optimized padding parameter (optimized_padding) based on a comparison between the effective input size and the corresponding parameters in the input data.
According to a second aspect of the present disclosure, there is provided an apparatus for optimizing a padding parameter, which may include:
a processor configured to execute program instructions; and
a memory configured to store program instructions that, when loaded and executed by the processor, cause the apparatus to perform the method described above.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium in which program instructions are stored, the program instructions being adapted to be loaded by a processor so as to perform the method described above.
According to a fourth aspect of the present disclosure, there is provided a chip for optimizing a padding parameter, which may include: an input configured to receive input data; and a processor configured to: calculate an output size of the output data from the input data; calculate an effective input size of the input data based on the output size and the input data; and calculate the optimized padding parameter based on a comparison between the effective input size and the corresponding parameters in the input data.
According to a fifth aspect of the present disclosure, there is provided an integrated circuit for optimizing a padding parameter, comprising the chip described above.
By adopting the above technical solutions, different numbers of rows or columns of zeros can be added at the two ends of each corresponding dimension of the input data during the operation; that is, the number of rows zero-padded at the top may differ from the number of rows zero-padded at the bottom, and the number of columns zero-padded on the left may differ from the number of columns zero-padded on the right, so that setting to any non-negative integer is realized. The redundancy generated during convolution computation is thereby overcome, and the effects of reducing the input scale, reducing memory occupation and increasing operation speed are achieved. The redundancy-overcoming scheme mentioned in the present disclosure can be applied not only to the operation of convolution layers but also to the operation of pooling layers and the like.
Drawings
Fig. 1 schematically shows a flow chart according to a first aspect of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of optimized filling according to one embodiment of the present disclosure;
FIG. 3 schematically illustrates a block diagram of a combined processing device according to one embodiment of the present disclosure; and
fig. 4 schematically illustrates a structural diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
In the existing PyTorch framework, the padding parameter (zero padding) is allowed to be set to any non-negative integer value, so the functional defect of the TensorFlow framework does not exist; however, a redundancy phenomenon still arises because the same padding is applied at the two ends of each corresponding dimension. The redundancy present in the prior-art PyTorch framework is described below to assist those skilled in the art in understanding.
Taking the convolution operation as an example, and for ease of discussion, the values of the height (H) dimension equal the values of the width (W) dimension in the following prior-art example, where the parameters are set as follows:
stride (stride) = 3, convolution kernel dilation (dilation) = 1, input size of the input data (input_size) = 7, convolution kernel size (filter_size) = 4, padding (padding) = 2:
the input data (input) is:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
TABLE 1
From Table 1 it can be seen that the input data is a 7 × 7 matrix (i.e., each value in the height (H) dimension equals each value in the width (W) dimension, as shown in Table 1), and the input size (input_size) of the input data is 7.
The convolution kernel (filter) used in the convolution operation is:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
TABLE 2
From Table 2 it can be seen that the convolution kernel is a 4 × 4 matrix, and the convolution kernel size (filter_size) is 4.
Table 3 below shows that, when the padding parameter is 2, the padded input data (input with padding) is an 11 × 11 matrix, and the input size of the padded input data (input_with_padding_size) is 11.
The padded input data is:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
TABLE 3
All the 0s in Table 3 are padded values (the same convention applies to the later tables, e.g., Table 5, Table 8 and Table 10). In Table 3 the same padding (zero padding) is applied at both ends of each dimension of the input data. In general, for two-dimensional data (Table 3 shows two-dimensional data), the padding parameter in the height direction can be divided into a padding parameter at the top (padding_top) and a padding parameter at the bottom (padding_bottom), and the padding parameter in the width direction can be divided into a padding parameter on the left (padding_left) and a padding parameter on the right (padding_right). In Table 3 the top and bottom padding parameters are both 2, i.e., two rows of 0s are padded at the top and at the bottom (rows 1-2 and rows 10-11, respectively), and the left and right padding parameters are both 2, i.e., two columns of 0s are padded on the left and on the right (columns 1-2 and columns 10-11, respectively).
The output data after convolution operation is:
4 8 6
8 16 12
6 12 9
TABLE 4
The output data in Table 4 is a 3 × 3 matrix, and the output size (output_size) of the output data is 3.
The inventors noticed that, because the convolution operation starts from the upper-left corner of the padded input data (Table 3), the padded data in the rightmost column and the bottom row (i.e., the 0s padded in the 11th column and the 11th row) do not participate in the operation.
It should be noted that the columns and rows mentioned in the description of the present disclosure are numbered from top to bottom and from left to right. For example, the 11th column and the 11th row of Table 3 refer to the rightmost column and the bottom row of Table 3, respectively.
The inventors of the present disclosure found that the effective input size of the input data in this convolution operation is 10, because the 0s padded in the 11th column and the 11th row do not participate in the operation. The concept of the "effective input size of the input data" (equivalent_input_size) introduced here is an inventive concept proposed by the inventors of the present disclosure, and the present disclosure is developed on the basis of the "effective input size of the input data".
The effective input data (equivalent input) corresponding to the above example is:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
TABLE 5
All the 0s in Table 5 are padded values (the same applies in the following embodiments). In Table 5 the input data is padded (zero padded) differently at the two ends of each dimension: the top padding parameter is 2, the bottom padding parameter is 1, the left padding parameter is 2, and the right padding parameter is 1. That is, two rows of 0s are padded at the top (rows 1 and 2), one row of 0s at the bottom (row 10), two columns of 0s on the left (columns 1 and 2), and one column of 0s on the right (column 10).
In other words, the effective input data in Table 5 is the input data that is actually meaningful in the convolution operation.
As analyzed above, the inventors found that, in the PyTorch framework, part of the zero padding in the convolution operation is redundant, which leads to defects such as large memory consumption and slow operation speed (e.g., the 0s padded in the 11th column and the 11th row of Table 3 do not participate in the operation).
In addition, it should be noted that the terms "data" and "padding parameter" and the other input and output data mentioned in this disclosure should be understood in a broad sense: the input and output data may be images, speech, text, or the like, and the method described herein may also be applied to speech, image, or natural language processing applications; the present disclosure places no limitation thereon.
Having introduced the redundancy phenomenon existing in prior-art convolution operations (especially under the PyTorch framework), the inventors propose the following improvements:
a method 100 of optimizing a padding parameter (padding) according to the first aspect of the present disclosure, schematically illustrated in fig. 1, may generally comprise the following steps:
s102, the processor calculates the output size (output _ size) of the output data according to the input data;
s104, the processor calculates the effective input size (equivalent _ input _ size) of the input data according to the output size (output _ size) and the input data;
s106, the processor calculates an optimized filling parameter (optimized _ padding) according to the comparison between the effective input size (equivalent _ input _ size) and the corresponding parameter in the input data.
In one embodiment of the present disclosure, the corresponding parameters in the input data may include:
the input size (input_size) of the input data, the padding parameter (padding), the convolution kernel (filter), the convolution kernel dilation parameter (dilation), and the stride (stride).
In another embodiment of the present disclosure, a method of optimizing a padding parameter may include the following steps:
calculating the size of the dilated convolution kernel (dilation_filter_size); calculating the input size of the padded input data (input_with_padding_size) (for example, under the PyTorch framework); calculating the output size (output_size) of the output data; calculating the effective input size (equivalent_input_size) of the input data; judging the size relationship between the effective input size (equivalent_input_size), the padding parameter (padding), and the input size (input_size); calculating the optimized padding parameter (optimized_padding); and adding the optimized padding parameter (optimized_padding) to the input size (input_size) to obtain the input size with the optimized padding parameter (input_with_optimized_padding_size).
A detailed description will now be given with reference to the flow chart of optimized padding according to one embodiment of the present disclosure, schematically illustrated in Fig. 2. The description basically follows the flow chart from top to bottom and from left to right.
At the beginning of the flow chart shown in Fig. 2, the corresponding input data (which includes the input size of the input data) and the parameters corresponding to the input data are input, including the input size of the input data, the padding parameter, the convolution kernel dilation parameter, and the stride. These input parameters are the parameters needed in the subsequent convolution operation, as shown by the two diamond boxes "input data, padding parameter" and "input convolution kernel, convolution kernel dilation parameter, stride" in Fig. 2.
First, calculate the input size of the conventionally padded input data (e.g., under the PyTorch framework):
input_with_padding_size = input_size + 2 × padding
That is, the input size of the conventionally padded input data equals the input size of the input data plus twice the padding parameter, as shown by the rectangular box "calculate input_with_padding_size" in Fig. 2.
Next, calculate the size of the dilated convolution kernel:
dilation_filter_size = dilation × (filter_size - 1) + 1
as shown by the rectangular box "calculate dilation_filter_size" in Fig. 2.
It should be noted that there is no required order between calculating the input size of the conventionally padded input data (input_with_padding_size) and calculating the size of the dilated convolution kernel (dilation_filter_size).
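As an illustration, the two preparatory quantities above can be sketched in plain Python (the function names are ours, not the patent's):

```python
def dilation_filter_size(filter_size: int, dilation: int) -> int:
    # Size of the convolution kernel after dilation:
    # dilation_filter_size = dilation * (filter_size - 1) + 1
    return dilation * (filter_size - 1) + 1

def input_with_padding_size(input_size: int, padding: int) -> int:
    # Conventional symmetric padding adds `padding` at both ends of a dimension:
    # input_with_padding_size = input_size + 2 * padding
    return input_size + 2 * padding

# Parameters of the worked example: filter_size = 4, dilation = 1,
# input_size = 7, padding = 2
print(dilation_filter_size(4, 1))      # 4
print(input_with_padding_size(7, 2))   # 11
```

With the example parameters this reproduces the values used later in the description (a dilated kernel size of 4 and a padded input size of 11).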
Then, the output size of the output data can be calculated:
output_size = floor((input_with_padding_size - dilation_filter_size) / stride) + 1
where floor(·) denotes the rounding-down operation. This is shown by the rectangular box "calculate output_size" in Fig. 2.
Next, the effective input size of the input data can be calculated, as shown by the rectangular box "calculate equivalent_input_size" in Fig. 2. The effective input size of the input data can be calculated from the output size and the input data: the product of the stride and the output size minus one, plus the size of the dilated convolution kernel, gives the effective input size of the input data, that is:
equivalent_input_size = (output_size - 1) × stride + dilation_filter_size
This step computes, a priori, the effective input size of the input data, which describes the shape of the input data that will actually participate in the convolution operation; the subsequent steps can optimize the padding parameter on this basis.
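A minimal sketch of the two formulas above, checked against the parameters of the worked example (stride 3, dilation 1, input size 7, kernel size 4, padding 2); the function names are illustrative:

```python
def output_size(input_size, padding, filter_size, dilation, stride):
    # output_size = floor((input_with_padding_size - dilation_filter_size) / stride) + 1
    padded = input_size + 2 * padding
    dilated = dilation * (filter_size - 1) + 1
    return (padded - dilated) // stride + 1

def equivalent_input_size(out_size, stride, filter_size, dilation):
    # equivalent_input_size = (output_size - 1) * stride + dilation_filter_size
    dilated = dilation * (filter_size - 1) + 1
    return (out_size - 1) * stride + dilated

o = output_size(7, 2, 4, 1, 3)          # 3, matching Table 4
e = equivalent_input_size(o, 3, 4, 1)   # 10, the effective input size
```

Note that the effective input size is obtained purely arithmetically; no convolution has to be executed to find it.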
Calculating the optimized padding parameter (optimized_padding) based on the comparison between the effective input size and the corresponding parameters in the input data may further comprise:
judging the size relationship between the sum of the input size and the padding parameter on the one hand and the effective input size on the other hand, and calculating the optimized padding parameter according to that size relationship.
It should be noted that there are two optimized padding parameters for each dimension. For example, in the case of two-dimensional data, the optimized padding parameters may include optimized_padding1-1 and optimized_padding1-2 along the first dimension, and optimized_padding2-1 and optimized_padding2-2 along the second dimension. In the case of three-dimensional data, they may further include optimized_padding3-1 and optimized_padding3-2 along the third dimension, and so on for data of four or more dimensions. For convenience of description, the embodiments below use two-dimensional data and assume that the parameters of the two dimensions are the same, so the optimized top padding parameter ("optimized_padding1-1", or "optimized_padding_top" for convenience) and the optimized left padding parameter ("optimized_padding2-1", or "optimized_padding_left") are equal and are simply denoted optimized_padding1. Similarly, the optimized bottom padding parameter ("optimized_padding1-2", or "optimized_padding_bottom") and the optimized right padding parameter ("optimized_padding2-2", or "optimized_padding_right") are equal and are simply denoted optimized_padding2.
That is, in the case of two-dimensional data (assuming, as mentioned above, that the parameters of the two dimensions are the same), the optimized padding parameter optimized_padding1 indicates the number of rows and columns that need to be padded with 0s (zero padding) at the top and on the left of the input data, respectively, and the optimized padding parameter optimized_padding2 indicates the number of rows and columns that need to be padded with 0s at the bottom and on the right of the input data, respectively.
The flow chart in Fig. 2 then judges the size relationship between the sum of the input size and the padding parameter (input_size + padding) and the effective input size (equivalent_input_size), as shown by the diamond box "equivalent_input_size > input_size + padding?" in Fig. 2.
In the case where the sum of the input size and the padding parameter (input_size + padding) is smaller than the effective input size (equivalent_input_size) (corresponding to the "yes" branch in Fig. 2):
optimized_padding1 = padding
optimized_padding2 = equivalent_input_size - (input_size + padding)
as shown by the rectangular box "optimized_padding1 = padding; optimized_padding2 = equivalent_input_size - (input_size + padding)" in Fig. 2.
The optimized padding parameter optimized_padding1 indicates that the number of columns and rows to be padded with 0s on the left and at the top of the input data equals the padding parameter contained in the initially input data; in other words, the zero-padding amount at the sliding start end of the corresponding dimension of the convolution is the same as the padding parameter. The optimized padding parameter optimized_padding2 indicates that the number of columns and rows to be padded with 0s on the right and at the bottom, i.e., the zero-padding amount at the sliding end of the corresponding dimension of the convolution, equals the effective input size minus the sum of the input size of the input data and the padding parameter. It should be noted that the terms "top", "left", "right", and "bottom" used in the various embodiments of this specification apply only to the two-dimensional case; the terms more commonly used in the art are "sliding start end of the corresponding dimension of the convolution" and "sliding end of the corresponding dimension of the convolution". For example, in the two-dimensional case, the former corresponds to the top and the left, and the latter corresponds to the right and the bottom.
In the case where the sum of the input size and the padding parameter is greater than or equal to the effective input size, i.e., (input_size + padding) ≥ equivalent_input_size (corresponding to the "no" branch in Fig. 2):
optimized_padding1 = padding
optimized_padding2 = 0
as shown by the rectangular box "optimized_padding1 = padding; optimized_padding2 = 0" in Fig. 2.
Here again, optimized_padding1 indicates the number of columns and rows to be padded with 0s on the left and at the top of the input data, and optimized_padding2 indicates the number of columns and rows to be padded with 0s on the right and at the bottom.
That is, when the sum of the input size and the padding parameter is greater than or equal to the effective input size, the zero-padding amount at the sliding start end of the corresponding dimension of the convolution equals the padding parameter, and the zero-padding amount at the sliding end equals zero.
In both branches of this size comparison, the padding parameter contained in the initial input data is optimized on the basis of the effective input size. The purpose is to ensure that the optimized padding parameter (optimized_padding) will actually participate in the operation, thereby avoiding redundancy.
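The two branches above can be sketched as a single function (a hedged illustration: the function name is ours, and the scalar per-dimension treatment follows the two-dimensional, equal-parameter assumption used in this description):

```python
def optimized_padding(input_size, padding, equivalent_input_size):
    # The zero padding at the sliding start end of each dimension is kept as-is.
    optimized_padding1 = padding
    if input_size + padding < equivalent_input_size:
        # "yes" branch: pad the sliding end only as much as actually participates
        optimized_padding2 = equivalent_input_size - (input_size + padding)
    else:
        # "no" branch: the sliding end needs no padding at all
        optimized_padding2 = 0
    return optimized_padding1, optimized_padding2

# Worked example: input_size = 7, padding = 2, equivalent_input_size = 10
print(optimized_padding(7, 2, 10))  # (2, 1)
```

With the example values, the start end keeps its two rows/columns of zeros while the sliding end is trimmed from two to one, which is exactly the effective input data of Table 5.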
The input data with the optimized padding parameters (input with optimized padding) is derived from the optimized padding parameters and the input data, as indicated by the box "calculate input data with optimized padding parameters" in Fig. 2.
The input data with the optimized padding parameters is then output for the convolution operation, as indicated by the box "output input data with optimized padding parameters" in Fig. 2.
Applying the optimized padding parameters (optimized_padding) to the input data yields the input data with the optimized padding parameters (input_with_optimized_padding); this result is used directly in the convolution operation, and no redundancy occurs.
As analyzed above, after judging the size relationship between the sum of the input size and the padding parameter and the effective input size, and calculating the optimized padding parameters according to that relationship, the input data with the optimized padding parameters, whose input size is input_with_optimized_padding_size, can be obtained from the optimized padding parameters and the input data, for example by padding the initially input data with the optimized padding parameters.
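Applying the optimized padding parameters to the input data can be sketched in plain Python; the helper pad_2d is ours, for illustration only. The point is that the amounts padded at the two ends of each dimension may differ:

```python
def pad_2d(data, pad_top, pad_bottom, pad_left, pad_right):
    """Zero-pad a 2-D list, with possibly different amounts on each side."""
    width = pad_left + len(data[0]) + pad_right
    out = [[0] * width for _ in range(pad_top)]
    for row in data:
        out.append([0] * pad_left + list(row) + [0] * pad_right)
    out.extend([0] * width for _ in range(pad_bottom))
    return out

x = [[1] * 7 for _ in range(7)]   # the 7 x 7 input of 1s (Table 1)
y = pad_2d(x, 2, 1, 2, 1)         # optimized padding: top/left 2, bottom/right 1
print(len(y), len(y[0]))          # 10 10 -- the input size with optimized padding
```

The result is a 10 × 10 matrix rather than the 11 × 11 matrix produced by symmetric padding, matching the effective input data described above.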
In one embodiment according to the present disclosure, the above method of optimizing the padding parameter can be used with the PyTorch framework.
For a better understanding of the disclosure, the following two examples are given.
Example 1
The example above, in which the prior-art padding method produces redundancy, is now processed with an embodiment of the present disclosure to see whether the redundancy persists.
Assuming that each value of the height (H) dimension equals each value of the width (W) dimension in Example 1, the corresponding parameters in the input data are:
stride (stride) = 3, convolution kernel dilation parameter (dilation) = 1, input size (input_size) = 7, convolution kernel size (filter_size) = 4, padding parameter (padding) = 2,
the input data (input) is:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
TABLE 6
Then, the size of the dilated convolution kernel is dilation_filter_size = dilation × (filter_size - 1) + 1 = 1 × (4 - 1) + 1 = 4.
Dilated convolution kernel:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
TABLE 7
The input size of the padded input data is input_with_padding_size = input_size + 2 × padding = 7 + 2 × 2 = 11.
The padded input data (input with padding), which under the conventional padding method contains data redundancy (as analyzed further below), is:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
TABLE 8
output_size = floor((input_with_padding_size - dilation_filter_size) / stride) + 1 = floor((11 - 4) / 3) + 1 = 3
where floor denotes rounding (11 - 4)/3 down to the nearest integer.
The output data after convolution operation is:
4 8 6
8 16 12
6 12 9
TABLE 9
The effective input size is equivalent_input_size = (output_size - 1) × stride + dilation_filter_size = (3 - 1) × 3 + 4 = 10. Since the convolution operation proceeds from left to right and from top to bottom, the effective input data (equivalent input) is:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
TABLE 10
Because the effective input size (equivalent_input_size) is 10 and the sum of the input size and the padding parameter is input_size + padding = 7 + 2 = 9, we have equivalent_input_size > input_size + padding. Therefore the optimized first padding parameter optimized_padding1 = padding = 2; that is, the optimized top padding parameter (optimized_padding_top) and the optimized left padding parameter (optimized_padding_left) are both 2. The optimized second padding parameter optimized_padding2 = equivalent_input_size - (padding + input_size) = 10 - 9 = 1; that is, the optimized bottom padding parameter (optimized_padding_bottom) and the optimized right padding parameter (optimized_padding_right) are both 1.
Obtaining input data (input with optimized padding) according to the optimized first padding parameter (optimized _ padding1), the optimized second padding parameter (optimized _ padding2) and the input data which is just input:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
TABLE 11
Note that the input data with the optimized padding parameters in Table 11 is exactly equal to the valid input data (equivalent input) in Table 5, so no redundancy occurs when it is used as the convolution input. The method reduces the input size, lowering memory occupation and improving operation speed.
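For reference, the calculation above can be sketched in Python (a minimal illustration of the described procedure; the function name and structure are ours, not taken from the disclosure):

```python
import math

def optimize_padding(input_size, filter_size, stride, dilation, padding):
    """Compute asymmetric padding that removes redundant zero rows/columns.

    Returns (optimized_padding1, optimized_padding2): zeros added at the
    start (top/left) and end (bottom/right) of each dimension.
    """
    # Size of the convolution kernel after expansion (dilation).
    dilation_filter_size = dilation * (filter_size - 1) + 1
    # Output size under conventional symmetric padding.
    input_with_padding_size = input_size + 2 * padding
    output_size = math.floor(
        (input_with_padding_size - dilation_filter_size) / stride) + 1
    # Smallest input extent the sliding window actually touches.
    equivalent_input_size = (output_size - 1) * stride + dilation_filter_size
    # Start-end padding keeps the original value; end padding shrinks to
    # only what the last window position needs (never below zero).
    optimized_padding1 = padding
    optimized_padding2 = max(0, equivalent_input_size - (padding + input_size))
    return optimized_padding1, optimized_padding2

# Example 1 of the text: stride 3 leaves one trailing zero row/column.
print(optimize_padding(7, 4, 3, 1, 2))  # -> (2, 1)
```

With stride 4 (Example 3 below) the same call returns (2, 0), since the trailing padding is clamped at zero.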
Example 2
Assume in Example 2 that the height (H) and width (W) dimensions take the same values; the corresponding parameters of the input data are:
step size (stride) is 2, convolution kernel dilation parameter (dilation) is 1, input size (input_size) is 7, convolution kernel size (filter_size) is 4, and padding parameter (padding) is 2.
the input data (input) is:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
TABLE 12
Then the size of the convolution kernel after expansion is dilation_filter_size = dilation × (filter_size - 1) + 1 = 1 × (4 - 1) + 1 = 4.
Expanded convolution kernel:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
TABLE 13
The padded input size (input_with_padding_size) = input_size + 2 × padding = 7 + 2 × 2 = 11.
The padded input data (input with padding) is shown below; under the conventional padding method this padded input contains redundant data, as analyzed further below:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
TABLE 14
output_size = floor[(input_with_padding_size - dilation_filter_size) / stride] + 1 = floor[(11 - 4) / 2] + 1 = 4
The output data after convolution operation is:
4 8 8 6
8 16 16 12
8 16 16 12
6 12 12 9
TABLE 15
The valid input size equivalent_input_size = (output_size - 1) × stride + dilation_filter_size = (4 - 1) × 2 + 4 = 10. Since the convolution operation proceeds from left to right and top to bottom, the valid input data (equivalent input) is:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
TABLE 16
It should be noted that the valid input size (equivalent_input_size) presented here is obtained purely by calculation and does not require the convolution operation to be actually performed first.
Because the valid input size equivalent_input_size = 10 and the sum of the input size and the padding parameter input_size + padding = 7 + 2 = 9, we have equivalent_input_size > input_size + padding. Therefore the optimized first padding parameter optimized_padding1 = padding = 2; that is, the optimized top padding parameter (optimized_padding_top) and the optimized left padding parameter (optimized_padding_left) are both 2. The optimized second padding parameter optimized_padding2 = equivalent_input_size - (padding + input_size) = 10 - 9 = 1; that is, the optimized bottom padding parameter (optimized_padding_bottom) and the optimized right padding parameter (optimized_padding_right) are both 1.
Get input data with optimized padding parameters (input with optimized padding):
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0
TABLE 17
Note that the input data with the optimized padding parameters in Table 17 is exactly equal to the valid input data (equivalent input) in Table 16, so no redundancy occurs when it is used as the convolution input. The method reduces the input size, lowering memory occupation and improving operation speed.
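The claimed equivalence for Example 2 can be checked numerically with a small pure-Python convolution (the helpers `pad` and `conv2d` are ours, written only for this sketch):

```python
def pad(data, top, bottom, left, right):
    """Zero-pad a matrix asymmetrically on each side."""
    width = len(data[0]) + left + right
    out = [[0] * width for _ in range(top)]
    out += [[0] * left + row + [0] * right for row in data]
    out += [[0] * width for _ in range(bottom)]
    return out

def conv2d(data, kernel, stride):
    """Plain strided 2-D correlation (all-ones kernel, so flipping is moot)."""
    k = len(kernel)
    out_size = (len(data) - k) // stride + 1
    return [[sum(data[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_size)] for i in range(out_size)]

ones = [[1] * 7 for _ in range(7)]       # input of Table 12
kernel = [[1] * 4 for _ in range(4)]     # dilated kernel of Table 13
conventional = conv2d(pad(ones, 2, 2, 2, 2), kernel, stride=2)  # Table 14 input
optimized = conv2d(pad(ones, 2, 1, 2, 1), kernel, stride=2)     # Table 17 input
print(conventional == optimized)  # -> True
print(conventional[0])            # -> [4, 8, 8, 6], the first row of Table 15
```

The optimized 10×10 input produces exactly the same 4×4 output as the conventional 11×11 padded input, which is the point of the optimization.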
Example 3
Assume in Example 3 that the height (H) and width (W) dimensions take the same values; the corresponding parameters of the input data are:
step size (stride) is 4, convolution kernel dilation parameter (dilation) is 1, input size (input_size) is 7, convolution kernel size (filter_size) is 4, and padding parameter (padding) is 2.
the input data (input) is:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
TABLE 18
Then the size of the convolution kernel after expansion is dilation_filter_size = dilation × (filter_size - 1) + 1 = 1 × (4 - 1) + 1 = 4.
Expanded convolution kernel:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
TABLE 19
The padded input size (input_with_padding_size) = input_size + 2 × padding = 7 + 2 × 2 = 11.
The padded input data (input with padding) is shown below; under the conventional padding method this padded input contains redundant data, as analyzed further below:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
TABLE 20
output_size = floor[(input_with_padding_size - dilation_filter_size) / stride] + 1 = floor[(11 - 4) / 4] + 1 = 2
The output data after convolution operation is:
4 8
8 16
TABLE 21
The valid input size equivalent_input_size = (output_size - 1) × stride + dilation_filter_size = (2 - 1) × 4 + 4 = 8. Since the convolution operation proceeds from left to right and top to bottom, the valid input data (equivalent input) is:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1
TABLE 22
It should be noted that the valid input size (equivalent_input_size) presented here is obtained purely by calculation and does not require the convolution operation to be actually performed first.
Because the valid input size equivalent_input_size = 8 and the sum of the input size and the padding parameter input_size + padding = 7 + 2 = 9, we have equivalent_input_size < input_size + padding. Therefore the optimized first padding parameter optimized_padding1 = padding = 2; that is, the optimized top padding parameter (optimized_padding_top) and the optimized left padding parameter (optimized_padding_left) are both 2. The optimized second padding parameter optimized_padding2 = 0; that is, the optimized bottom padding parameter (optimized_padding_bottom) and the optimized right padding parameter (optimized_padding_right) are both 0.
Get input data with optimized padding parameters (input with optimized padding):
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
TABLE 23
Note that the input data with the optimized padding parameters in Table 23 is not identical to the valid input data (equivalent input) in Table 22, because in Example 3 part of the original input data (input) goes unused: the rightmost column and bottom row of Table 23 (i.e., its 9th column and 9th row) are never read during the convolution operation. However, since the original input data is actual data already stored in the storage device, it is not clipped; instead the optimized second padding parameter (optimized_padding2) is simply set to 0. As for the zeros padded into the input (the zeros in columns 1-2 and rows 1-2 of Table 23), no redundancy occurs when this data is used as the convolution input. The method again reduces the input size, lowering memory occupation and improving operation speed.
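The clamping case of Example 3 can be traced step by step in plain Python (variable names mirror the text; the trace itself is ours):

```python
# Parameters of Example 3 (Table 18): an all-ones 7x7 input.
stride, dilation, input_size, filter_size, padding = 4, 1, 7, 4, 2

dilation_filter_size = dilation * (filter_size - 1) + 1                        # 4
output_size = (input_size + 2 * padding - dilation_filter_size) // stride + 1  # 2
equivalent_input_size = (output_size - 1) * stride + dilation_filter_size      # 8

# equivalent_input_size (8) < input_size + padding (9), so the trailing
# padding clamps to zero and one input row/column simply goes unread.
optimized_padding1 = padding                                                   # 2
optimized_padding2 = max(0, equivalent_input_size - (padding + input_size))    # 0

# The last window starts at (output_size - 1) * stride = 4 and covers
# indices 4..7, so index 8 (the 9th row/column of Table 23) is unused.
last_window_end = (output_size - 1) * stride + dilation_filter_size - 1
print(optimized_padding1, optimized_padding2, last_window_end)  # -> 2 0 7
```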
Fig. 3 is a block diagram illustrating a combined processing device 300 according to an embodiment of the present disclosure. As shown, the combined processing device 300 includes an optimized filling parameter device 302 having the optimized filling parameter architecture described above, which can be configured to perform the method for optimizing filling parameters described above with reference to the drawings. In one or more embodiments, the means 302 for optimizing the filling parameter may also be a chip, an integrated circuit, for optimizing the filling parameter. In addition, the combined processing device includes a universal interconnect interface 304 and other processing devices 306. The apparatus 302 for optimizing filling parameters according to the present disclosure may interact with other processing apparatuses 306 through the universal interconnection interface 304 to jointly complete the operation specified by the user.
According to aspects of the present disclosure, the other processing devices may include one or more types of general and/or special purpose processors, such as a central processing unit ("CPU"), a graphics processing unit ("GPU"), an artificial intelligence processor, etc., and their number may be determined according to actual needs rather than being limited. In one or more embodiments, the other processing device may comprise the aforementioned reference hardware platform or reference computing device, so that it may form a tested system with the device for optimizing filling parameters of the test hardware platform. In one or more embodiments, the other processing device may serve as the interface between the device for optimizing filling parameters of the present disclosure (which may be embodied as an artificial intelligence-related computing device) and external data and control, performing basic control including, but not limited to, data handling and starting or stopping the machine learning computing device; the other processing device may also cooperate with the machine learning related computing device to complete computing tasks.
According to aspects of the present disclosure, the universal interconnect interface may be used to transfer data and control instructions between the device that optimizes the fill parameters and other processing devices. For example, the device for optimizing the filling parameter may obtain required input data from other processing devices via the universal interconnect interface, and write the input data into a storage device (or memory) on the device chip for optimizing the filling parameter. Further, the apparatus for optimizing padding parameters may obtain the control instruction from the other processing apparatus via the universal interconnect interface, and write the control instruction into the control cache on the apparatus chip for optimizing padding parameters. Alternatively or optionally, the universal interconnect interface may also read data in a memory module of the device for optimizing the filling parameters and transmit the data to other processing devices.
Optionally, the combined processing means may further comprise a storage means 308, which may be connected to the means for optimizing the filling parameters and the further processing means, respectively. In one or more embodiments, the storage device may be used to store data for the means for optimizing fill parameters and other processing means, particularly data that may not be stored in its entirety in internal or on-chip storage within the means for optimizing fill parameters or other processing means.
Depending on the application scenario, the combined processing device can serve as an SOC (system-on-chip) for equipment such as mobile phones, robots, drones and video-monitoring devices, effectively reducing the core area of the control portion, increasing the processing speed and lowering the overall power consumption. In this case, the universal interconnect interface of the combined processing device is connected with certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip comprising the above-mentioned device for optimizing filling parameters or the combined processing device. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above chip.
In some embodiments, the present disclosure also discloses a board card, which includes the above chip packaging structure. Referring to fig. 4, the aforementioned exemplary board is provided, which may include other accessories in addition to the aforementioned chip 402, including but not limited to: a memory device 404, an interface device 406, and a control device 408.
The memory device is connected with the chip in the chip packaging structure through a bus and used for storing data. The memory device may include a plurality of groups of memory cells 410. Each group of memory cells is connected with the chip through a bus. It is understood that each group of memory cells may be a DDR SDRAM ("Double Data Rate SDRAM").
DDR can double the speed of SDRAM without increasing the clock frequency, since it allows data to be read on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of memory cells, each group comprising multiple DDR4 chips. In one embodiment, the chip may internally comprise four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking.
In one embodiment, each group of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each memory unit.
The interface device is electrically connected with the chip in the chip packaging structure. The interface means is used to enable data transfer between the chip and an external device 412, such as a server or a computer. For example, in one embodiment, the interface device may be a standard PCIE interface. For example, the data to be processed is transmitted to the chip by the server through the standard PCIE interface, so that data transfer is implemented. In another embodiment, the interface device may also be another interface, and the present disclosure does not limit the specific representation of the other interface, and the interface unit may implement the switching function. In addition, the calculation result of the chip is still transmitted back to the external device (e.g., server) by the interface device.
The control device is electrically connected with the chip and is used for monitoring its state. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). In one or more embodiments, a chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, which may drive multiple loads; the chip can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, processing cores and/or processing circuits in the chip.
In some embodiments, the present disclosure also discloses an electronic device or apparatus, which includes the above board card. According to different application scenarios, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. Vehicles include airplanes, boats, and/or vehicles; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
According to different application scenarios, the device for optimizing filling parameters, or the combined processing device including the device for optimizing filling parameters, the chip for optimizing filling parameters, and the corresponding computer-readable storage medium, and the integrated circuit for optimizing filling parameters of the present disclosure may be applied to data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, home appliances, and/or medical devices, and the like. Vehicles include airplanes, boats, and/or vehicles; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, optical, acoustic, magnetic or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, when the technical solution of the present disclosure can be embodied in the form of a software product, the computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
In the above embodiments of the present disclosure, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing may be better understood in light of the following clauses:
clause a1, a method of optimizing fill parameters, may include:
calculating an output size of the output data according to the input data;
calculating an effective input size of the input data based on the output size and the input data;
the optimized fill parameters are calculated based on a comparison between the valid input dimensions and corresponding parameters in the input data.
Clause a2, the method according to clause a1, the corresponding parameters in the input data include:
input size of input data, fill parameter, convolution kernel dilation parameter, and step size.
Clause A3, the method according to clause a1 or a2, further comprising:
and obtaining input data with the optimized filling parameters according to the optimized filling parameters and the input data, and outputting the input data with the optimized filling parameters for convolution operation.
Clause a4, the method according to any one of clauses a1-a3, wherein calculating the valid input size of the input data from the output size and the input data comprises: multiplying the output size minus one by the step size and adding the size of the convolution kernel after expansion to obtain the valid input size of the input data.
Clause a5, the method according to any one of clauses a1-a4, wherein calculating the optimized filling parameters based on the comparison between the valid input dimensions and the corresponding parameters in the input data comprises:
and judging the size relation between the sum of the input size and the filling parameter and the effective input size, and calculating the optimized filling parameter according to the size relation.
Clause a6, the method according to any one of clauses a1-a5, wherein determining a size relationship between a sum of the input size and the fill parameter and the valid input size, and calculating the optimized fill parameter according to the size relationship comprises:
and under the condition that the sum of the input size and the filling parameter is smaller than the effective input size, the zero filling amount of the corresponding dimension convolution sliding starting end is the same as that of the filling parameter, and the zero filling amount of the corresponding dimension convolution sliding ending end is equal to the sum of the effective input size minus the filling parameter and the input size of the input data.
Clause a7, the method according to any one of clauses a1-a6, wherein determining a size relationship between a sum of the input size and the fill parameter and the valid input size, and calculating the optimized fill parameter according to the size relationship comprises:
and under the condition that the sum of the input size and the filling parameter is larger than or equal to the effective input size, the zero filling amount of the corresponding dimension convolution sliding starting end is the same as that of the filling parameter, and the zero filling amount of the corresponding dimension convolution sliding ending end is equal to zero.
Clause A8, the method according to any one of clauses a1-a7, wherein deriving the input data having the optimized fill parameter from the optimized fill parameter and the input data comprises:
and filling the input data by using the optimized filling parameters.
Clause a9, the method according to any one of clauses a1-A8, wherein the method of optimizing the filling parameters is for a Pytorch framework.
Clause a10, an apparatus for optimizing filling parameters, may include:
a processor configured to execute program instructions; and
a memory configured to store program instructions that, when loaded and executed by the processor, cause the apparatus to perform a method according to any one of clauses a1-a 9.
Clause a11, a computer readable storage medium having stored therein program instructions adapted to be loaded by a processor and to perform the method according to any one of clauses a1-a 9.
Clause a12, a chip for optimizing filling parameters, may include:
an input configured to receive input data;
a processor configured to:
calculating an output size of the output data according to the input data;
calculating an effective input size of the input data based on the output size and the input data;
the optimized fill parameters are calculated based on a comparison between the valid input dimensions and corresponding parameters in the input data.
Clause a13, the chip for optimizing filling parameters according to clause a12, wherein the corresponding parameters in the input data comprise:
input size, fill parameter, convolution kernel dilation parameter, and step size.
Clause a14, the chip for optimizing filling parameters according to clause a12 or a13, wherein the processor is further configured to:
and obtaining the input size with the optimized filling parameter according to the optimized filling parameter and the input size, and outputting the input size with the optimized filling parameter for convolution operation.
Clause a15, the chip for optimizing filling parameters according to any one of clauses a12-a14, wherein the processor is further configured to:
and calculating the corresponding filled input size and the size after the convolution kernel expansion according to the input size, the filling parameter, the convolution kernel expansion parameter and the step length, and calculating the output size according to the filled input size and the size after the convolution kernel expansion.
Clause a16, the chip for optimizing filling parameters according to any one of clauses a12-a15, wherein the processor is further configured to: calculate the effective input size according to the output size, the step size and the size after the convolution kernel expansion:
equivalent_input_size=(output_size-1)×stride+dilation_filter_size,
where equivalent_input_size represents the valid input size, output_size represents the output size, stride represents the step size, and dilation_filter_size represents the size after convolution kernel expansion.
Clause a17, the chip for optimizing filling parameters according to any one of clauses a12-a16, wherein the processor is further configured to:
and judging the size relation between the sum of the input size and the filling parameter and the effective input size, and calculating the optimized filling parameter according to the size relation.
Clause a18, the chip for optimizing filling parameters according to any one of clauses a12-a17, wherein the processor is further configured to:
in the case of (input_size + padding) < equivalent_input_size:
padding1=padding
padding2=equivalent_input_size-(padding+input_size)
where input_size indicates the input size, padding indicates the padding parameter, padding1 indicates the number of columns and rows that need to be padded with 0 on the left and upper sides of the input, and padding2 indicates the number of columns and rows that need to be padded with 0 on the right and lower sides of the input.
Clause a19, the chip for optimizing filling parameters according to any one of clauses a12-a18, wherein the processor is further configured to:
when (input_size + padding) ≥ equivalent_input_size:
padding1=padding
padding2=0
where input_size indicates the input size, padding indicates the padding parameter, padding1 indicates the number of columns and rows that need to be padded with 0 on the left and upper sides of the input, and padding2 indicates the number of columns and rows that need to be padded with 0 on the right and lower sides of the input.
Clause a20, the chip for optimizing filling parameters according to any one of clauses a12-a19, wherein the processor is further configured to:
and adding the optimized filling parameters and the input size to obtain the input size with the optimized filling parameters.
Clause a21, an integrated circuit for optimizing fill parameters, comprising a chip according to any one of clauses a12-a20.
Clause a22, a system for optimizing fill parameters, may include:
a receiving end configured for receiving input data;
a first calculation unit configured to calculate an output size of the output data from the input data;
a second calculation unit configured to calculate an effective input size of the input data from the output size and the input data;
a determination unit configured to calculate an optimized filling parameter based on a comparison between the valid input size and a corresponding parameter in the input data.
Clause a23, the system for optimizing fill parameters according to clause a22, wherein the respective parameters in the input data comprise:
input size, fill parameter, convolution kernel dilation parameter, and step size.
Clause a24, the system for optimizing filling parameters according to clause a22 or a23, further comprising:
a third calculation unit configured to obtain an input size with the optimized padding parameter according to the optimized padding parameter and the input size, and output the input size with the optimized padding parameter for convolution operation.
Clause a25, the system for optimizing filling parameters according to any one of clauses a22-a24, wherein the first calculation unit is further configured to:
and calculating the corresponding filled input size and the size after the convolution kernel expansion according to the input size, the filling parameter, the convolution kernel expansion parameter and the step length, and calculating the output size according to the filled input size and the size after the convolution kernel expansion.
Clause a26, the system for optimizing filling parameters according to any one of clauses a22-a25, wherein the second calculation unit is further configured to:
calculating the effective input size according to the output size, the step length and the size after the convolution kernel expansion:
equivalent_input_size=(output_size-1)×stride+dilation_filter_size,
where equivalent_input_size represents the valid input size, output_size represents the output size, stride represents the step size, and dilation_filter_size represents the size after convolution kernel expansion.
Clause a27, the system for optimizing filling parameters according to any one of clauses a22-a26, wherein the determining unit is further configured to:
and judging the size relation between the sum of the input size and the filling parameter and the effective input size, and calculating the optimized filling parameter according to the size relation.
Clause a28, the system for optimizing filling parameters according to any one of clauses a22-a27, wherein the determining unit is further configured to:
in the case of (input_size + padding) < equivalent_input_size:
padding1=padding
padding2=equivalent_input_size-(padding+input_size)
where input_size indicates the input size, padding indicates the padding parameter, padding1 indicates the number of columns and rows that need to be padded with 0 on the left and upper sides of the input, and padding2 indicates the number of columns and rows that need to be padded with 0 on the right and lower sides of the input.
Clause a29, the system for optimizing filling parameters according to any one of clauses a22-a28, wherein the determining unit is further configured to:
when (input_size + padding) ≥ equivalent_input_size:
padding1 = padding
padding2 = 0
where input_size denotes the input size, padding denotes the padding parameter, padding1 denotes the number of columns and rows to be zero-padded on the left and upper sides of the input, and padding2 denotes the number of columns and rows to be zero-padded on the right and lower sides of the input.
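The two branches of clauses a28 and a29 can be combined into one sketch (names are illustrative assumptions):

```python
def optimized_padding(input_size, padding, equivalent_input_size):
    """Sketch of clauses a28/a29: split the padding into a start-side
    amount (padding1) and an end-side amount (padding2)."""
    # The start of the convolution slide always keeps the original padding.
    padding1 = padding
    if input_size + padding < equivalent_input_size:
        # Pad the end only as far as the convolution windows actually reach.
        padding2 = equivalent_input_size - (padding + input_size)
    else:
        # The windows never reach beyond the input; no end padding is needed.
        padding2 = 0
    return padding1, padding2
```

For an input of size 6 with padding 1 and effective input size 7, the end-side padding drops to 0, saving one zero-filled column compared with symmetric padding.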
Clause a30, the system for optimizing filling parameters according to any one of clauses a22-a29, wherein the determining unit is further configured to:
adding the optimized padding parameters to the input size to obtain the input size with optimized padding parameters.
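Putting clauses a25-a30 together, the whole pipeline can be sketched end to end (all names are illustrative assumptions, not from the disclosure):

```python
def optimized_input_size(input_size, padding, filter_size, dilation, stride):
    """Sketch of the full clause a25-a30 pipeline: compute the output size,
    the effective input size, the optimized padding, and finally the
    input size after optimized padding."""
    dilation_filter_size = (filter_size - 1) * dilation + 1
    output_size = (input_size + 2 * padding - dilation_filter_size) // stride + 1
    equivalent = (output_size - 1) * stride + dilation_filter_size
    padding1 = padding
    # End-side padding is clamped at zero (clauses a28/a29 combined).
    padding2 = max(equivalent - (padding + input_size), 0)
    return input_size + padding1 + padding2
```

For input size 6, padding 1, a 3-wide kernel, dilation 1 and stride 2, the optimized padded input is 7 wide instead of the naive 6 + 2×1 = 8, with the output size unchanged.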
Clause a31, an integrated circuit for optimizing filling parameters, comprising a system according to any one of clauses a22-a30.
The embodiments of the present disclosure are described in detail above; specific examples are used herein to explain the principles and implementations of the present disclosure, and the descriptions of the embodiments serve only to aid understanding of the method and core idea of the present disclosure. Meanwhile, those skilled in the art may, based on the idea of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

Claims (11)

1. A method of optimizing a fill parameter, comprising:
calculating an output size of output data according to input data;
calculating an effective input size of the input data according to the output size and the input data; and
calculating an optimized fill parameter based on a comparison between the effective input size and corresponding parameters in the input data.
2. The method for optimizing fill parameters according to claim 1, wherein the corresponding parameters in the input data comprise:
the input size of the input data, the filling parameter, the convolution kernel dilation parameter, and the step size.
3. The method of optimizing fill parameters according to claim 2, further comprising:
obtaining input data with the optimized fill parameters according to the optimized fill parameters and the input data, and outputting the input data with the optimized fill parameters for a convolution operation.
4. The method for optimizing fill parameters according to claim 3, wherein said calculating an effective input size of said input data from the output size and said input data comprises: subtracting one from the output size, multiplying the result by the step size, and adding the dilated convolution kernel size to obtain the effective input size of the input data.
5. The method of optimizing filling parameters of claim 4, wherein said calculating optimized filling parameters based on a comparison between valid input dimensions and corresponding parameters in the input data comprises:
determining a magnitude relationship between the sum of the input size and the fill parameter and the effective input size, and calculating the optimized fill parameter according to the magnitude relationship.
6. The method of optimizing a fill parameter of claim 5, wherein the determining a magnitude relationship between a sum of the input dimension and the fill parameter and the effective input dimension, and the calculating an optimized fill parameter from the magnitude relationship comprises:
when the sum of the input size and the fill parameter is smaller than the effective input size, the zero-padding amount at the start of the convolution slide in the corresponding dimension equals the fill parameter, and the zero-padding amount at the end of the convolution slide in the corresponding dimension equals the effective input size minus the sum of the fill parameter and the input size of the input data.
7. The method of optimizing a fill parameter of claim 5, wherein the determining a magnitude relationship between a sum of the input dimension and the fill parameter and the effective input dimension, and the calculating an optimized fill parameter from the magnitude relationship comprises:
when the sum of the input size and the fill parameter is greater than or equal to the effective input size, the zero-padding amount at the start of the convolution slide in the corresponding dimension equals the fill parameter, and the zero-padding amount at the end of the convolution slide in the corresponding dimension is zero.
8. The method of optimizing padding parameters according to claim 6 or 7, wherein deriving input data having optimized padding parameters from the optimized padding parameters and the input data comprises:
padding the input data using the optimized fill parameters.
9. The method of optimizing padding parameters according to claim 8, wherein the method of optimizing padding parameters is applied to the PyTorch framework.
10. An apparatus for optimizing a fill parameter, comprising:
a processor configured to execute program instructions; and
a memory configured to store the program instructions, which when loaded and executed by the processor, cause the apparatus to perform the method of any of claims 1-9.
11. A computer-readable storage medium, in which program instructions are stored, the program instructions being adapted to be loaded by a processor and to perform the method according to any of claims 1-9.
CN201911360230.8A 2019-12-25 2019-12-25 Method and device for optimizing filling parameters and computer-readable storage medium Withdrawn CN111126585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911360230.8A CN111126585A (en) 2019-12-25 2019-12-25 Method and device for optimizing filling parameters and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911360230.8A CN111126585A (en) 2019-12-25 2019-12-25 Method and device for optimizing filling parameters and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111126585A true CN111126585A (en) 2020-05-08

Family

ID=70502557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911360230.8A Withdrawn CN111126585A (en) 2019-12-25 2019-12-25 Method and device for optimizing filling parameters and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111126585A (en)

Similar Documents

Publication Publication Date Title
CN110096309B (en) Operation method, operation device, computer equipment and storage medium
CN108229654B (en) Neural network convolution operation device and method
US10540574B2 (en) Image compression method and related device
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
EP3825842A1 (en) Data processing method and apparatus, and related product
US20220188071A1 (en) Computing apparatus and method, board card, and computer readable storage medium
CN111125628A (en) Method and apparatus for processing two-dimensional data matrix by artificial intelligence processor
CN111124995A (en) Method and apparatus for processing a one-dimensional complex array by an artificial intelligence processor
US20220405349A1 (en) Data processing method and apparatus, and related product
CN110647356A (en) Arithmetic device and related product
CN111027696A (en) Method, apparatus, computer-readable storage medium for calculating gradient of input data
CN111143766A (en) Method and apparatus for processing two-dimensional complex matrix by artificial intelligence processor
CN111126585A (en) Method and device for optimizing filling parameters and computer-readable storage medium
CN111258537B (en) Method, device and chip for preventing data overflow
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
EP4053753A1 (en) Operation apparatus and related product
EP4053746A1 (en) Winograd convolution operation method, apparatus, and device, and storage medium
US20230039892A1 (en) Operation apparatus
CN115454923A (en) Data calculation device, board card, method and storage medium
EP4024288B1 (en) Computing apparatus, method, board card and computer-readable storage medium
CN111125627A (en) Method for pooling multi-dimensional matrices and related products
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN113469333B (en) Artificial intelligence processor, method and related products for executing neural network model
CN112395009A (en) Operation method, operation device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200508

WW01 Invention patent application withdrawn after publication