CN108415881A - Arithmetic device and method for convolutional neural networks - Google Patents

Arithmetic device and method for convolutional neural networks Download PDF

Info

Publication number
CN108415881A
CN108415881A CN201710072906.8A CN201710072906A
Authority
CN
China
Prior art keywords
data
layer
pooling
input data
weighted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710072906.8A
Other languages
Chinese (zh)
Inventor
李雷
李一雷
杜源
杜力
管延城
刘峻诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kneron Co., Ltd.
Original Assignee
Kneron Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kneron Co., Ltd.
Priority to CN201710072906.8A priority Critical patent/CN108415881A/en
Publication of CN108415881A publication Critical patent/CN108415881A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Complex Calculations (AREA)

Abstract

An operation method of a convolutional neural network, including: performing an addition operation on a plurality of input data to output accumulated data; performing a bit shift operation on the accumulated data to output shifted data; and performing a weighting operation on the shifted data to output weighted data, wherein the factor of the weighting operation depends on the quantity of input data, the number of bits shifted right in the bit shift operation, and the scaling weight of a subsequent layer of the convolutional neural network.

Description

Arithmetic device and method for convolutional neural networks
Technical field
The present invention relates to an operation method of a convolutional neural network, and more particularly to a device and method for performing an average pooling operation.
Background
A convolutional neural network (CNN) is a feed-forward neural network that generally comprises multiple convolution layers and pooling layers. A pooling layer performs a max pooling or average pooling operation on a particular feature over a region of the input data, reducing the parameter count and the amount of computation in the network. For average pooling, the conventional approach is to perform an addition operation first and then divide the accumulated result. Division, however, consumes considerable processor effort and easily overburdens hardware resources. Moreover, when accumulating many data items, overflow is also prone to occur.
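The conventional flow described above can be sketched as follows; this is an illustrative Python sketch written for this description, not code from the patent, and the function name and sample values are invented:

```python
def average_pool_naive(window):
    """Conventional average pooling: accumulate, then divide."""
    total = 0
    for value in window:
        total += value  # a fixed-width hardware accumulator could overflow here
    return total / len(window)  # one division per pooling window burdens the processor

print(average_pool_naive([3, 5, 7, 9]))  # 6.0
```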
Therefore, providing a pooling scheme that can perform average pooling with less processor effort is currently an important subject.
Summary of the invention
In view of this, an object of the present invention is to provide a convolution operation device and pooling operation method that avoid overburdening hardware resources and thereby improve the efficiency of the pooling operation.
An operation method of a convolutional neural network includes: performing an addition operation on a plurality of input data to output accumulated data; performing a bit shift operation on the accumulated data to output shifted data; and performing a weighting operation on the shifted data to output weighted data, wherein the factor of the weighting operation depends on the quantity of input data, the number of bits shifted right in the bit shift operation, and the scaling weight of a subsequent layer of the convolutional neural network.
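A behavioural sketch of this two-stage method, assuming floating-point arithmetic for the final factor; the function name, arguments and sample values are illustrative, and real hardware would use fixed-point values:

```python
def average_pool_two_stage(inputs, shift_bits, scale_weight):
    """Stage 1 (pooling layer): addition plus a right shift, no division.
    Stage 2 (subsequent layer): one multiplication by a precomputed factor."""
    acc = sum(inputs)            # addition only
    shifted = acc >> shift_bits  # partial division by 2**shift_bits
    # the factor depends on the input count, the shift amount and the
    # subsequent layer's scaling weight, as described above
    factor = scale_weight * (1 << shift_bits) / len(inputs)
    return shifted * factor

# 2x2 window: 4 inputs, shift by 2 bits, so the residual factor reduces to W itself
print(average_pool_two_stage([8, 12, 6, 10], shift_bits=2, scale_weight=1.0))  # 9.0
```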
In one embodiment, the factor of the weighting operation is proportional to the scaling weight, grows with the number of bits shifted right in the bit shift operation (by a factor of two per bit), and is inversely proportional to the quantity of input data; the weighted data equals the shifted data multiplied by the factor.
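In symbols, with W the scaling weight, n the number of bits shifted right, N the quantity of input data and x_i the input data (notation introduced here for clarity), and ignoring the truncation of the integer shift:

```latex
\text{factor} = \frac{W \cdot 2^{n}}{N}, \qquad
\text{weighted} = \text{shifted} \times \text{factor}
               = \frac{\sum_{i} x_i}{2^{n}} \cdot \frac{W \cdot 2^{n}}{N}
               = \frac{W}{N} \sum_{i} x_i
```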
In one embodiment, the number of bits shifted right in the bit shift operation depends on the size of the pooling window, and the quantity of input data depends on the size of the pooling window.
In one embodiment, the subsequent layer is the next convolution layer of the convolutional neural network, the scaling weights are the filter coefficients of the next convolution layer, and the addition operation and the bit shift operation are operations in the pooling layer of the convolutional neural network.
In one embodiment, the division operation of the pooling layer is merged into the multiplication of the next convolution layer.
An operation method of a convolutional neural network includes: performing an addition operation on a plurality of input data in a pooling layer to output accumulated data; and performing a weighting operation on the accumulated data in a subsequent layer to output weighted data, wherein the factor of the weighting operation depends on the quantity of input data and the scaling weight of the subsequent layer, and the weighted data equals the accumulated data multiplied by the factor.
In one embodiment, the subsequent layer is the next convolution layer, the scaling weights are filter coefficients, the weighting operation is a convolution operation, and the factor of the weighting operation equals the filter coefficient divided by the quantity of input data.
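A sketch of this folding under the stated assumptions (the pooling layer emits raw sums and the next layer convolves them with coefficients F); all names and numbers are illustrative:

```python
def conv_with_folded_division(pool_sums, filter_coeffs, pool_size):
    """The pooling layer outputs only sums; the division by the pooling
    window size rides along in the next convolution's multiplication."""
    folded = [f / pool_size for f in filter_coeffs]  # factor = F / N
    return sum(c * s for c, s in zip(folded, pool_sums))

# equivalent to convolving the true averages with the original coefficients
sums = [36, 20]       # two pooled sums from 2x2 windows (N = 4), averages 9 and 5
coeffs = [0.5, -1.0]
print(conv_with_folded_division(sums, coeffs, pool_size=4))  # 0.5*9 - 1.0*5 = -0.5
```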
In one embodiment, the quantity of input data depends on the size of the pooling window.
An operation method of a convolutional neural network includes: multiplying a scaling weight by the original filter coefficients to produce weighted filter coefficients; and performing a convolution operation on the input data and the weighted filter coefficients in a convolution layer.
In one embodiment, the operation method further includes: performing a bit shift operation on the input data; and inputting the shifted input data to the convolution layer, wherein the scaling weight depends on the original scaling weight and the number of bits shifted right in the bit shift operation.
An arithmetic device of a convolutional neural network can carry out the foregoing methods.
In summary, the arithmetic device and operation method of the present invention perform average pooling in two stages: the pooling unit only performs an addition operation, paired with a bit shift operation to avoid overflow during accumulation, and the output of the pooling unit is then weighted to obtain the final average. Since the pooling unit performs no division, the processor is spared the extra effort, achieving the effect of improving the efficiency of the pooling operation.
Description of the drawings
Fig. 1 is a schematic diagram of some layers of a convolutional neural network.
Fig. 2 is a schematic diagram of the merged operations of a convolutional neural network.
Fig. 3 is a schematic diagram of a convolutional neural network.
Fig. 4 is a functional block diagram of a convolution operation device according to an embodiment of the invention.
Detailed description of the embodiments
The convolution operation device and method according to specific embodiments of the present invention are described below with reference to the relevant drawings, in which identical components are designated by the same reference numerals. The drawings are for illustration only and are not intended to limit the invention.
Fig. 1 is a schematic diagram of some layers of a convolutional neural network. Referring to Fig. 1, a convolutional neural network has multiple operation layers, such as convolution layers and pooling layers; the convolution layers and pooling layers can each be multiple, and the output of each layer can serve as the input of another or a subsequent layer. For example, the output of the Nth convolution layer is the input of the Nth pooling layer or of another subsequent layer, the output of the Nth pooling layer is the input of the (N+1)th convolution layer or of another subsequent layer, and, in general, the output of the Nth operation layer can be the input of the (N+1)th operation layer.
To improve operation efficiency, operations of different layers that are similar in nature can be suitably combined. For example, if the pooling operation of the pooling layer is average pooling, its division can be merged into the next operation layer, for instance a convolution layer; that is, the division of the pooling layer's average pooling is carried out together with the convolution multiplication of the next convolution layer. In addition, the pooling layer can also perform a shift operation to substitute part of the division needed for averaging, and the part not yet divided out is merged into the next operation layer; that is, the part of the average pooling division that cannot be fully replaced by the shift operation is carried out together with the convolution multiplication of the next convolution layer.
Fig. 2 is a schematic diagram of the merged operations of a convolutional neural network. Referring to Fig. 2, in the convolution layer, a plurality of data P1~Pn and a plurality of filter coefficients F1~Fn undergo a convolution operation to produce a plurality of data C1~Cn, which serve as the input data of the pooling layer. In the pooling layer, the input data undergo an addition operation to output accumulated data. In a subsequent layer, the accumulated data undergo a weighting operation to output weighted data, wherein the factor of the weighting operation depends on the quantity of input data and the scaling weight W of the subsequent layer, and the weighted data equals the accumulated data multiplied by the factor.
For example, the subsequent layer can be the next convolution layer, the scaling weights are filter coefficients, and the weighting operation is a convolution operation; the factor of the weighting operation equals the filter coefficient divided by the quantity of input data. The quantity of input data depends on the size of the pooling window.
On the other hand, before the accumulated data is processed in another layer, a partial division result can be obtained by a shift operation. For example, the accumulated data can undergo a bit shift operation to output shifted data, and the shifted data then undergoes a weighting operation to output weighted data, wherein the factor of the weighting operation depends on the quantity of input data, the number of bits shifted right in the bit shift operation, and the scaling weight of the subsequent layer of the convolutional neural network. The factor is proportional to the scaling weight, grows with the number of bits shifted right, and is inversely proportional to the quantity of input data; the weighted data equals the shifted data multiplied by the factor.
The number of bits shifted right in the bit shift operation depends on the size of the pooling window: shifting right by one bit is equivalent to one division by 2, and if the number of bits shifted right is n, then 2 to the power n is the largest power of two not exceeding the size of the pooling window. For a 2×2 pooling window, the window size is 4, n is 2, and the shift is 2 bits to the right; for a 3×3 pooling window, the window size is 9, n is 3, and the shift is 3 bits to the right.
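The shift amount can therefore be computed as the largest n with 2^n not exceeding the window size; a minimal sketch (the helper name is invented):

```python
def right_shift_bits(pool_window_size):
    """Largest n such that 2**n <= pool_window_size."""
    n = 0
    while (1 << (n + 1)) <= pool_window_size:
        n += 1
    return n

assert right_shift_bits(4) == 2  # 2x2 window: shift right by 2 bits
assert right_shift_bits(9) == 3  # 3x3 window: shift right by 3 bits
```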
The quantity of input data depends on the size of the pooling window. The subsequent layer is the next convolution layer of the convolutional neural network, the scaling weights are the filter coefficients of that convolution layer, and the addition operation and the bit shift operation are operations in the pooling layer of the convolutional neural network.
For example, if 9 data in a certain feature region are to undergo average pooling, the 9 data can first be accumulated to obtain an accumulated value. To prevent the accumulated value from overflowing, a bit shift operation can be applied to it, for example shifting the accumulated value right by two bits to obtain a shift value, which has the effect of dividing the accumulated value by 4; the shift value is then multiplied by a weighting coefficient to obtain a weighted value. The weighting coefficient is chosen according to the shift amount: in this embodiment the weighting coefficient is 1/2.25, so the weighted value finally obtained equals the accumulated value divided by 9. Since the bit shift operation and the weighting operation occupy few processing resources, this two-stage scheme of shifting and weighting lets the processor perform average pooling with less effort, thereby improving the efficiency of the pooling operation.
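The arithmetic of this example can be checked numerically; the input values below are made up, and real division stands in for the integer shift (which would truncate) to show the intended equivalence:

```python
values = [4, 8, 12, 16, 20, 24, 28, 32, 36]  # nine hypothetical inputs
acc = sum(values)                 # 180
shifted = acc / 4                 # right shift by two bits: divide by 4 -> 45.0
weighted = shifted * (1 / 2.25)   # 4 * 2.25 == 9, so the net effect is /9
assert abs(weighted - acc / 9) < 1e-9  # 20.0, the true average
```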
Fig. 3 is a schematic diagram of the merged operations of a convolutional neural network. Referring to Fig. 3, the convolution operation of the convolution layer multiplies the input data with the filter coefficients. When the input data need to be weighted or scaled, this weighting or scaling can be merged into the convolution operation; that is, the input weighting (or scaling) of the convolution layer and the convolution operation can be completed in the same multiplication.
The data P1~Pn input to the convolution layer can be pixels of an image or the output of the previous layer of the convolutional neural network, for example a preceding pooling layer or hidden layer. In Fig. 3, the operation method of the convolutional neural network includes: multiplying the scaling weight W by the original filter coefficients F1~Fn to produce weighted filter coefficients WF1~WFn; and performing, in the convolution layer, a convolution operation on the input data P1~Pn and the weighted filter coefficients WF1~WFn. The original convolution operation multiplies the input data P1~Pn with the original filter coefficients F1~Fn; to merge in the weighting or scaling, the coefficients actually used by the convolution layer are the weighted filter coefficients WF1~WFn rather than the original filter coefficients F1~Fn. The input of the convolution layer therefore requires no additional multiplication for weighting or scaling.
In addition, when the weighting or scaling would require a division, or when the weighting or scaling value is less than 1, the operation method can first apply a bit shift operation to the input data and then feed the shifted input data to the convolution layer. The scaling weight W depends on the original scaling weight and the number of bits shifted right in the bit shift operation. For example, if the original scaling weight is 0.4 and the bit shift operation is set to one bit to the right (equivalent to multiplying by 0.5), then the scaling weight W is set to 0.8, so the overall result is still equivalent to multiplying the input data by the original scaling weight (0.5*0.8=0.4). Replacing division with shifting reduces the hardware burden, and the input of the convolution layer requires no additional multiplication for weighting or scaling.
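A numeric sketch combining both ideas, pre-weighted coefficients plus an input shift, using the 0.4 = 0.5 × 0.8 example above; it assumes NumPy is available, and all array names are illustrative:

```python
import numpy as np

F = np.array([1.0, -2.0, 3.0])   # original filter coefficients F1..Fn
x = np.array([4.0, 5.0, 6.0])    # input data P1..Pn

original_scale = 0.4             # desired overall input scaling
shift_bits = 1                   # shift the input right by one bit (x0.5)
W = original_scale * (1 << shift_bits)   # compensated scaling weight: 0.8
WF = W * F                       # pre-weighted coefficients WF1..WFn

shifted_x = x / (1 << shift_bits)        # the bit shift applied to the input
merged = (shifted_x * WF).sum()          # single multiply inside the convolution
reference = original_scale * (x * F).sum()
assert np.isclose(merged, reference)     # 0.5 * 0.8 == 0.4
```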
Fig. 4 is a functional block diagram of a convolution operation device according to an embodiment of the invention. Referring to Fig. 4, the convolution operation device includes a memory 1, a buffer device 2, a convolution operation module 3, an interleaving sum unit 4, an accumulation buffer unit 5, a coefficient fetch controller 6 and a control unit 7. The convolution operation device can be used in convolutional neural network (CNN) applications.
The memory 1 stores data awaiting convolution, which can be, for example, image, video, audio or statistical data, or the data of one layer of a convolutional neural network. For image data, these are for example pixel data; for video data, for example the pixel data or motion vectors of video frames, or messages within the video; for one layer of a convolutional neural network, typically a two-dimensional data array, which for image data is typically a two-dimensional array of pixel data. In this embodiment, the memory 1 is exemplified by static random-access memory (SRAM); besides the data awaiting convolution, it can also store data for which convolution has completed, and it can have a multi-layer memory structure storing pending and finished data separately. In other words, the memory 1 can serve as a cache memory inside the convolution operation device.
In practical applications, all or most of the data can first be stored elsewhere, for example in another memory such as a dynamic random access memory (DRAM) or another type of memory. When the convolution operation device is to perform a convolution operation, all or part of the data are loaded from the other memory into the memory 1 and then input through the buffer device 2 to the convolution operation module 3 for convolution. If the input data are streaming data, the latest stream data can be written into the memory 1 at any time for convolution.
The buffer device 2 is coupled to the memory 1, the convolution operation module 3 and the accumulation buffer unit 5, and is also coupled to other components of the convolution operation device, such as the interleaving sum unit 4 and the control unit 7. For image data or video frame operations, processing proceeds column by column while multiple rows are read, so within one clock cycle the buffer device 2 receives from the memory 1 data belonging to different rows of the same column; in this respect, the buffer device 2 of this embodiment acts as a column buffer. When an operation is to be performed, the buffer device 2 first fetches from the memory 1 the data required by the convolution operation module 3 and adjusts the fetched data into a data pattern that can be written smoothly into the convolution operation module 3. On the other hand, since the buffer device 2 is also coupled to the accumulation buffer unit 5, the data produced by the accumulation buffer unit 5 are also buffered and reordered in the buffer device 2 before being written back to the memory 1 for storage. In other words, besides column buffering, the buffer device 2 also relays temporary data; it can be regarded as a data buffer with reordering capability.
It is noted that the buffer device 2 further includes a memory control unit 21, which controls the fetches from and writes to the memory 1. Moreover, since the storage access width between the buffer device 2 and the memory 1, also known as the bandwidth, is limited, the convolution the convolution operation module 3 can actually perform is also tied to the access width of the memory 1. In other words, the operation performance of the convolution operation module 3 can be limited by this access width; if the input from the memory 1 becomes a bottleneck, the efficiency of the convolution operation will suffer.
The convolution operation module 3 has a plurality of convolution units; each convolution unit performs a convolution operation based on a filter and a plurality of current data, and retains part of the current data after the convolution operation. The buffer device 2 obtains a plurality of new data from the memory 1 and inputs the new data to the convolution units, the new data not duplicating the current data. The convolution units of the convolution operation module 3 then perform the next round of convolution based on the filter, the retained current data and the new data. The interleaving sum unit 4 is coupled to the convolution operation module 3 and generates a feature output result according to the convolution results. The accumulation buffer unit 5 is coupled to the interleaving sum unit 4 and the buffer device 2 and temporarily stores the feature output results; when the convolution operations of a specified range are completed, the buffer device 2 writes all the data held in the accumulation buffer unit 5 to the memory 1.
The coefficient fetch controller 6 is coupled to the convolution operation module 3, and the control unit 7 is coupled to the buffer device 2. In practice, besides the data themselves, the convolution operation module 3 also needs filter coefficients as input before it can operate; in this embodiment, the coefficients are input to a 3×3 convolution unit array. The coefficient fetch controller 6 can input the filter coefficients directly from an external memory by direct memory access. Besides being coupled to the convolution operation module 3, the coefficient fetch controller 6 can also be connected to the buffer device 2 to receive instructions from the control unit 7, so that the convolution operation module 3 can control the coefficient fetch controller 6 through the control unit 7 to carry out the input of filter coefficients.
The control unit 7 can include an instruction decoder 71 and a data read controller 72. The instruction decoder 71 obtains a control instruction from the data read controller 72 and decodes it, thereby obtaining the size of the current input data, the number of rows of the input data, the number of columns of the input data, the number of features of the input data, and the start address of the input data in the memory 1. The instruction decoder 71 can also obtain from the data read controller 72 information about the filter and the number of output features, and output appropriate idle signals to the buffer device 2. The buffer device 2 then runs according to the decoded information and in turn controls the operation of the convolution operation module 3 and the accumulation buffer unit 5, for example the timing at which data are input from the memory 1 to the buffer device 2 and the convolution operation module 3, the scale of the convolution performed by the convolution operation module 3, the addresses from which data are read from the memory 1 to the buffer device 2, the addresses at which data are written from the accumulation buffer unit 5 to the memory 1, and the convolution mode in which the convolution operation module 3 and the buffer device 2 operate.
On the other hand, the control unit 7 can likewise fetch the required control instructions and convolution information from an external memory by direct memory access; after the instruction decoder 71 decodes them, these control instructions and convolution information are fetched by the buffer device 2. The instructions can include the step size of the moving window, the address of the moving window, and the numbers of rows and columns of the image data of the features to be extracted.
The accumulation buffer unit 5 is coupled to the interleaving sum unit 4 and includes a partial-sum block 51 and a pooling unit 52. The partial-sum block 51 temporarily stores the data output by the interleaving sum unit 4, and the pooling unit 52 performs a pooling operation, either max pooling or average pooling, on the data held in the partial-sum block 51.
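A minimal sketch of the pooling unit's two modes, assuming the shift-only partial division described earlier for the average mode; the function and argument names are invented for illustration:

```python
def pooling_unit(window, mode="max"):
    """Pool one window of data staged in the partial-sum block."""
    if mode == "max":
        return max(window)
    # average mode: addition plus a right shift only; the residual
    # division is deferred to the next convolution layer's multiply
    n = 0
    while (1 << (n + 1)) <= len(window):
        n += 1                      # largest n with 2**n <= window size
    return sum(window) >> n

print(pooling_unit([3, 7, 2, 5], mode="max"))      # 7
print(pooling_unit([3, 7, 2, 5], mode="average"))  # 17 >> 2 = 4, residual deferred
```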
For example, the accumulation buffer unit 5 can temporarily store in the partial-sum block 51 the feature results computed by the convolution operation module 3 and output by the interleaving sum unit 4. The pooling unit 52 then performs a pooling operation on the data held in the partial-sum block 51; the pooling can target a particular feature over a region of the input data and take its average or its maximum as a summarized feature extraction or statistical feature output. This statistical feature not only has lower dimensionality than the previous feature but can also improve the processing result of the operation.
It should be noted that what is stored here is still the partial sums of the input data, kept in the partial-sum block 51 after addition, which is why they are called the partial-sum block 51 and the accumulation buffer unit 5, also referred to as the PSUM unit and the PSUM BUFFER module. The pooling operation of the pooling unit 52 of this embodiment can use the aforementioned average pooling scheme to obtain the statistical feature output. After all the input data have been processed by the convolution operation module 3 and the interleaving sum unit 4, the accumulation buffer unit 5 outputs the final data processing result, which can likewise be returned through the buffer device 2 to the memory 1, or output from the memory 1 to other components. Meanwhile, the convolution operation module 3 and the interleaving sum unit 4 continue to acquire and process data features, improving the processing efficiency of the convolution operation device.
When the aforementioned average pooling is used, the filter coefficients of the convolution layer originally stored in memory must be adjusted; what actually enters the convolution operation module 3 are the adjusted coefficients, that is, the factor used by the aforementioned merged operation of the pooling layer and the next convolution layer. Since the generation of this factor has been explained in the foregoing embodiments, it is not repeated here. When the convolution operation device processes the convolution layer and pooling layer of the current layer, the pooling unit 52 can leave the division part of the current pooling layer's average pooling unprocessed, and when the device proceeds to the next convolution layer, the convolution operation module 3 merges this still-unprocessed division part into the convolution multiplication. Alternatively, when processing the convolution layer and pooling layer of the current layer, the pooling unit 52 can use a shift operation as a partial division, leaving an uncompleted division part of the average pooling, and when the device proceeds to the next convolution layer, the convolution operation module 3 merges this remaining division part into the convolution multiplication.
The convolution operation device can include a plurality of convolution operation modules 3; the convolution units of the convolution operation modules 3 and the interleaving sum unit 4 can selectively operate in a low-scale convolution mode or a high-scale convolution mode. In the low-scale convolution mode, the interleaving sum unit 4 is configured to interleave and sum the convolution results of corresponding sequences in the convolution operation modules 3 and output the respective sums. In the high-scale convolution mode, the interleaving sum unit 4 interleaves and sums the convolution results of all the convolution units as the output.
In conclusion in the arithmetic unit of the present invention and operation method, average pond operation, Chi Hua are carried out with two benches Unit only carries out add operation, and displacement bit arithmetic of arranging in pairs or groups, to avoid the overflow caused by cumulative process, then to Chi Huadan The output result of member is weighted, and obtains final average result.Since pond unit does not do division arithmetic, therefore can The effect of avoiding processor from expending more efficiency, and then reaching the efficiency for promoting pond operation.
The above embodiments do not limit the present invention. Any equivalent modification or change made by a person skilled in the art without departing from the spirit and scope of the present invention shall be covered by the appended claims.

Claims (11)

1. An operation method of a convolutional neural network, comprising:
performing an addition operation on a plurality of input data to output accumulated data;
performing a bit shift operation on the accumulated data to output shifted data; and
performing a weighting operation on the shifted data to output weighted data, wherein a factor of the weighting operation depends on the quantity of the input data, the number of bits shifted right in the bit shift operation, and a scaling weight of a subsequent layer of the convolutional neural network.
2. The method as claimed in claim 1, wherein the factor of the weighting operation is proportional to the scaling weight, grows with the number of bits shifted right in the bit shift operation, and is inversely proportional to the quantity of the input data, and the weighted data equals the shifted data multiplied by the factor.
3. The method as claimed in claim 1, wherein the number of bits shifted right in the bit shift operation depends on the size of a pooling window, and the quantity of the input data depends on the size of the pooling window.
4. The method as claimed in claim 1, wherein the subsequent layer is a next convolution layer of the convolutional neural network, the scaling weights are the filter coefficients of the next convolution layer, and the addition operation and the bit shift operation are operations in a pooling layer of the convolutional neural network.
5. The method as claimed in claim 4, wherein a division operation of the pooling layer is merged into a multiplication of the next convolution layer.
6. An operation method of a convolutional neural network, comprising:
performing an addition operation on a plurality of input data in a pooling layer to output accumulated data; and
performing a weighting operation on the accumulated data in a subsequent layer to output weighted data, wherein a factor of the weighting operation depends on the quantity of the input data and a scaling weight of the subsequent layer, and the weighted data equals the accumulated data multiplied by the factor.
7. The method as claimed in claim 6, wherein the subsequent layer is a next convolution layer, the scaling weights are filter coefficients, the weighting operation is a convolution operation, and the factor of the weighting operation equals the filter coefficient divided by the quantity of the input data.
8. The method as claimed in claim 6, wherein the quantity of the input data depends on the size of a pooling window.
9. An operation method of a convolutional neural network, comprising:
multiplying a scaling weight by original filter coefficients to produce weighted filter coefficients; and
performing a convolution operation on input data and the weighted filter coefficients in a convolution layer.
10. The method as claimed in claim 9, further comprising:
performing a bit shift operation on the input data; and
inputting the shifted input data to the convolution layer,
wherein the scaling weight depends on an original scaling weight and the number of bits shifted right in the bit shift operation.
11. An arithmetic device of a convolutional neural network, performing the method as claimed in any one of claims 1 to 10.
CN201710072906.8A 2017-02-10 2017-02-10 Arithmetic device and method for convolutional neural networks Pending CN108415881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710072906.8A CN108415881A (en) 2017-02-10 2017-02-10 Arithmetic device and method for convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710072906.8A CN108415881A (en) 2017-02-10 2017-02-10 Arithmetic device and method for convolutional neural networks

Publications (1)

Publication Number Publication Date
CN108415881A true CN108415881A (en) 2018-08-17

Family

ID=63124915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710072906.8A Pending CN108415881A (en) Arithmetic device and method for convolutional neural networks

Country Status (1)

Country Link
CN (1) CN108415881A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087742A1 (en) * 2018-11-02 2020-05-07 深圳云天励飞技术有限公司 Processing element, apparatus and method used for implementing convolution operation
CN112346703A (en) * 2020-11-24 2021-02-09 华中科技大学 Global average pooling circuit for convolutional neural network calculation
CN112633462A (en) * 2019-10-08 2021-04-09 黄朝宗 Block type inference method and system for memory optimization of convolution neural network


Similar Documents

Publication Publication Date Title
CN106875011B (en) Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
JP7007488B2 (en) Hardware-based pooling system and method
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
CN106445471A (en) Processor and method for executing matrix multiplication on processor
CN108073977A (en) Convolution algorithm device and convolution algorithm method
CN108573305B (en) Data processing method, equipment and device
TWI630544B (en) Operation device and method for convolutional neural network
CN106874219A (en) A kind of data dispatching method of convolutional neural networks, system and computer equipment
CN110188869B (en) Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm
CN110415157A (en) A kind of calculation method and device of matrix multiplication
CN108073549B (en) Convolution operation device and method
CN108415881A (en) The arithmetic unit and method of convolutional neural networks
CN110147252A (en) A kind of parallel calculating method and device of convolutional neural networks
WO2022110386A1 (en) Data processing method and artificial intelligence processor
CN110555516A (en) FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method
CN108416430A (en) The pond arithmetic unit and method of convolutional neural networks
CN110490308B (en) Design method of acceleration library, terminal equipment and storage medium
CN113806261B (en) Vector processor oriented pooling vectorization realization method
JP2022137247A (en) Processing for a plurality of input data sets
KR102290531B1 (en) Apparatus for Reorganizable neural network computing
CN109416743B (en) Three-dimensional convolution device for identifying human actions
CN109800867B (en) Data calling method based on FPGA off-chip memory
CN116090518A (en) Feature map processing method and device based on systolic operation array and storage medium
KR101989793B1 (en) An accelerator-aware pruning method for convolution neural networks and a recording medium thereof
CN101339649A (en) Computing unit and image filtering device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180817