CN108415881A - Arithmetic device and method for convolutional neural networks - Google Patents
Arithmetic device and method for convolutional neural networks
- Publication number
- CN108415881A CN108415881A CN201710072906.8A CN201710072906A CN108415881A CN 108415881 A CN108415881 A CN 108415881A CN 201710072906 A CN201710072906 A CN 201710072906A CN 108415881 A CN108415881 A CN 108415881A
- Authority
- CN
- China
- Prior art keywords
- data
- layer
- pooling
- input data
- weighted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
An operation method for a convolutional neural network includes: performing an addition operation on multiple input data to output accumulated data; performing a bit-shift operation on the accumulated data to output shifted data; and performing a weighting operation on the shifted data to output weighted data, wherein the factor of the weighting operation depends on the number of input data, the number of bits shifted right in the bit-shift operation, and the scaling weight of a subsequent layer of the convolutional neural network.
Description
Technical field
The present invention relates to an operation method for convolutional neural networks, and more particularly to a device and method for performing average pooling operations.
Background technology
Convolutional neural networks (Convolutional Neural Network, CNN) are a kind of feed-forward type neural networks,
Generally comprise multigroup convolutional layer (convolution layer) and pond layer (pooling layer).Pond layer can be directed to defeated
Enter the special characteristic on some region of data and carries out maximum pond (max pooling) or average pond (average
Pooling) operation, to reduce the operation in parameter amount and neural network.For average pond operation, traditional mode is first
Add operation is carried out, then result will be added up and carry out division arithmetic.However, division arithmetic need to expend more processor efficiency, therefore
Be easy to causeing hardware resource, over-burden.In addition, when carrying out the accumulating operation of multiple data, it is also easy to happen overflow
(overflow) situation.
Therefore, how a kind of pond operation mode is provided, less processor efficiency can be used to execute average pond operation,
Actually current important one of project.
Summary of the invention

In view of this, an object of the present invention is to provide a convolution arithmetic device and pooling operation method that avoid overburdening hardware resources, thereby improving the efficiency of the pooling operation.
An operation method for a convolutional neural network includes: performing an addition operation on multiple input data to output accumulated data; performing a bit-shift operation on the accumulated data to output shifted data; and performing a weighting operation on the shifted data to output weighted data, wherein the factor of the weighting operation depends on the number of input data, the number of bits shifted right in the bit-shift operation, and the scaling weight of a subsequent layer of the convolutional neural network.
In one embodiment, the factor of the weighting operation is proportional to the scaling weight and to the number of bits shifted right in the bit-shift operation, and inversely proportional to the number of input data; the weighted data equals the shifted data multiplied by the factor.
In one embodiment, the number of bits shifted right in the bit-shift operation depends on the scale of the pooling window, and the number of input data depends on the scale of the pooling window.
In one embodiment, the subsequent layer is the next convolution layer of the convolutional neural network, the scaling weight is a filter coefficient of the next convolution layer, and the addition operation and the bit-shift operation are operations in a pooling layer of the convolutional neural network.
In one embodiment, the division in the pooling layer is merged into the multiplication of the next convolution layer.
An operation method for a convolutional neural network includes: performing an addition operation on multiple input data in a pooling layer to output accumulated data; and performing a weighting operation on the accumulated data in a subsequent layer to output weighted data, wherein the factor of the weighting operation depends on the number of input data and the scaling weight of the subsequent layer, and the weighted data equals the accumulated data multiplied by the factor.
In one embodiment, the subsequent layer is the next convolution layer, the scaling weight is a filter coefficient, the weighting operation is a convolution operation, and the factor of the weighting operation equals the filter coefficient divided by the number of input data.
In one embodiment, the number of input data depends on the scale of the pooling window.
An operation method for a convolutional neural network includes: multiplying a scaling weight by original filter coefficients to produce weighted filter coefficients; and performing a convolution operation in a convolution layer on input data and the weighted filter coefficients.
In one embodiment, the operation method further includes: performing a bit-shift operation on the input data; and inputting the bit-shifted input data to the convolution layer, wherein the scaling weight depends on an original scaling weight and the number of bits shifted right in the bit-shift operation.
An arithmetic device for a convolutional neural network may carry out any of the foregoing methods.
As described above, in the arithmetic device and operation method of the present invention, average pooling is performed in two stages: the pooling unit performs only addition, together with a bit-shift operation that avoids overflow during accumulation, and the output of the pooling unit is then weighted to obtain the final average. Since the pooling unit performs no division, the processor is spared extra effort, improving the efficiency of the pooling operation.
Description of the drawings
Fig. 1 is a schematic diagram of some layers of a convolutional neural network.
Fig. 2 is a schematic diagram of the merged operations of a convolutional neural network.
Fig. 3 is a schematic diagram of a convolutional neural network.
Fig. 4 is a functional block diagram of a convolution arithmetic device according to an embodiment of the invention.
Detailed description of the embodiments

Specific embodiments of the convolution arithmetic device and method of the present invention are described below with reference to the drawings, in which identical elements are labeled with identical reference symbols. The drawings are for illustration only and are not intended to limit the invention.
Fig. 1 is a schematic diagram of some layers of a convolutional neural network. Referring to Fig. 1, a convolutional neural network has multiple operation layers, such as convolution layers and pooling layers; both may be present in multiple stages, and the output of each layer can serve as the input of another or subsequent layer. For example, the output of the Nth convolution layer is the input of the Nth pooling layer or of another subsequent layer, the output of the Nth pooling layer is the input of the (N+1)th convolution layer or of another subsequent layer, and the output of the Nth operation layer can be the input of the (N+1)th operation layer.
To improve operational efficiency, operations of different layers that are similar in nature can be combined appropriately. For example, where the pooling operation of the pooling layer is average pooling, its division can be merged into the next operation layer, e.g. a convolution layer; that is, the division of the pooling layer's average pooling is computed together with the convolution multiplication of the next convolution layer. Alternatively, the pooling layer can perform a shift operation to replace part of the division required for averaging, with the not-yet-divided remainder merged into the next operation layer; that is, the part of the average-pooling division that the shift operation cannot fully replace is computed together with the convolution multiplication of the next convolution layer.
Fig. 2 is a schematic diagram of the merged operations of a convolutional neural network. Referring to Fig. 2, in the convolution layer, multiple data P1–Pn are convolved with multiple filter coefficients F1–Fn to generate multiple data C1–Cn, which serve as the multiple input data of the pooling layer. In the pooling layer, the multiple input data undergo an addition operation to output accumulated data. In a subsequent layer, a weighting operation is performed on the accumulated data to output weighted data, where the scaling weight W of the weighting operation depends on the number of input data and the scaling weight of the subsequent layer, and the weighted data equals the accumulated data multiplied by the scaling weight W.

For example, the subsequent layer can be the next convolution layer, the scaling weight a filter coefficient, and the weighting operation a convolution operation; the factor of the weighting operation then equals the filter coefficient divided by the number of input data. The number of input data depends on the scale of the pooling window.
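The merging described above can be sketched in a few lines of plain Python (an illustration only — the function names are hypothetical and this is not the patented hardware): the pooling layer emits only the raw sum, and the division by the window size is folded into the next layer's filter coefficient.

```python
def pooled_then_conv(window, coeff):
    """Reference form: average-pool a window, then multiply by a filter coefficient."""
    avg = sum(window) / len(window)      # division performed in the pooling layer
    return avg * coeff

def merged(window, coeff):
    """Merged form: the pooling layer only accumulates; the division by the
    window size is folded into the filter coefficient of the next layer."""
    acc = sum(window)                    # pooling layer: addition only
    factor = coeff / len(window)         # factor = filter coefficient / input count
    return acc * factor                  # single multiply in the convolution layer

# a 3x3 pooling window, flattened; both forms give the same result
window = [3, 1, 4, 1, 5, 9, 2, 6, 5]
assert abs(pooled_then_conv(window, 0.8) - merged(window, 0.8)) < 1e-12
```

The merged form removes one division per pooling window; the single remaining multiply is work the convolution layer would perform anyway.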
On the other hand, before the accumulated data is processed at another layer, part of the division result can be obtained by a shift operation. For example, the accumulated data can undergo a bit-shift operation to output shifted data, and a weighting operation is then performed on the shifted data to output weighted data, where the scaling weight W of the weighting operation depends on the number of input data, the number of bits shifted right in the bit-shift operation, and the scaling weight of the subsequent layer of the convolutional neural network. The scaling weight W is proportional to the subsequent layer's scaling weight and to the number of bits shifted right, and inversely proportional to the number of input data; the weighted data equals the shifted data multiplied by the scaling weight W.
The number of bits shifted right in the bit-shift operation depends on the scale of the pooling window: shifting right by one bit is equivalent to dividing by 2 once, and if the number of bits shifted right is n, then 2 to the nth power is the power of two closest to, but not exceeding, the scale of the pooling window. For a 2×2 pooling window, the scale is 4, n is 2, and the shift is 2 bits; for a 3×3 pooling window, the scale is 9, n is 3, and the shift is 3 bits.
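The rule above — shift by the largest n such that 2^n does not exceed the window scale — can be sketched as follows (plain Python, illustrative only; the helper name is hypothetical):

```python
def shift_bits_for_window(window_scale):
    """Choose the right-shift amount n so that 2**n is the largest power of
    two not exceeding the pooling-window scale."""
    n = 0
    while (1 << (n + 1)) <= window_scale:
        n += 1
    return n

assert shift_bits_for_window(4) == 2   # 2x2 window: shift 2 bits
assert shift_bits_for_window(9) == 3   # 3x3 window: shift 3 bits
```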
The number of input data depends on the scale of the pooling window. The subsequent layer is the next convolution layer of the convolutional neural network, the scaling weight is a filter coefficient of that convolution layer, and the addition and bit-shift operations are operations in the pooling layer of the convolutional neural network.
For example, if 9 data in a feature region are to undergo average pooling, the 9 data can first be accumulated to obtain an accumulated value. To avoid overflow of the accumulated value, a bit-shift operation can be applied to it, e.g. shifting it right by two bits to obtain a shifted value, which is equivalent to dividing the accumulated value by 4. The shifted value is then multiplied by a weighting coefficient to obtain a weighted value. The weighting coefficient is chosen according to the shift amount; in this embodiment, the weighting coefficient is 1/2.25, so the final weighted value is equivalent to the accumulated value divided by 9. Since bit-shift and weighting operations do not occupy many processing cycles, this two-stage scheme of shifting and weighting allows the processor to perform average pooling with less effort, improving the efficiency of the pooling operation.
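The two-stage example above can be sketched in plain Python (an illustration, not the patented hardware; the integer right shift truncates, so the result is an approximation of the exact average):

```python
def two_stage_average(values, shift_bits):
    """Two-stage average pooling: accumulate and right-shift in the pooling
    unit, then apply the remaining weighting factor in a later multiply."""
    acc = sum(values)                       # stage 1: accumulate (addition only)
    shifted = acc >> shift_bits             # right shift ~ divide by 2**shift_bits
    # remaining factor so the overall effect approximates acc / len(values);
    # for 9 values and a 2-bit shift, 4/9 == 1/2.25, matching the embodiment
    factor = (1 << shift_bits) / len(values)
    return shifted * factor

vals = [10, 20, 30, 40, 50, 60, 70, 80, 90]     # 9 values, as in a 3x3 window
approx = two_stage_average(vals, 2)              # shift by 2 bits, weight by 1/2.25
exact = sum(vals) / 9
assert abs(approx - exact) < 1.0                 # small truncation error only
```

The truncation introduced by the shift is bounded by the weighting factor times the shift remainder, which is why the two-stage result closely tracks the exact average.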
Fig. 3 is a schematic diagram of the merged operations of a convolutional neural network. Referring to Fig. 3, the convolution operation of the convolution layer multiplies the input data by filter coefficients. When the input data needs to be weighted or scaled, the weighting or scaling can be merged into the convolution operation; that is, the input weighting (or scaling) and the convolution of the convolution layer can be completed in the same multiplication.
The data P1–Pn input to the convolution layer can be the pixels of an image or the output of the previous layer of the convolutional neural network, e.g. a preceding pooling layer or hidden layer. In Fig. 3, the operation method of the convolutional neural network includes: multiplying the scaling weight W by the original filter coefficients F1–Fn to produce weighted filter coefficients WF1–WFn; and performing a convolution operation in the convolution layer on the input data P1–Pn and the weighted filter coefficients WF1–WFn. The original convolution multiplies the input data P1–Pn by the original filter coefficients F1–Fn; to merge in the weighting or scaling, the coefficients actually used by the convolution layer are the weighted filter coefficients WF1–WFn rather than the original coefficients F1–Fn. The input of the convolution layer therefore needs no additional multiplication for weighting or scaling.
In addition, when the weighting or scaling requires a division, or the weighting or scaling value is less than 1, the method can first perform a bit-shift operation on the input data and then feed the shifted input data to the convolution layer. The scaling weight W then depends on the original scaling weight and the number of bits shifted right in the bit-shift operation. For example, if the original scaling weight is 0.4 and the bit-shift operation is set to shift right by 1 bit (equivalent to multiplying by 0.5), then the scaling weight W is set to 0.8, so the overall result is still equivalent to multiplying the input data by the original scaling weight (0.5 × 0.8 = 0.4). Replacing the division with a shift operation reduces the hardware burden, and the input of the convolution layer needs no additional multiplication for weighting or scaling.
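The 0.4 = 0.5 × 0.8 example above can be checked with a small Python sketch (illustrative only; `conv1d_dot` is a hypothetical single-tap convolution, not the patent's hardware): shifting the input right by 1 bit and folding the residual 0.8 into the filter coefficients reproduces the original 0.4 scaling.

```python
def conv1d_dot(data, coeffs):
    """A single convolution tap: elementwise multiply-accumulate."""
    return sum(d * c for d, c in zip(data, coeffs))

data = [8.0, 16.0, 32.0]
filt = [1.0, 2.0, 3.0]
orig_scale = 0.4

# reference: scale the input by 0.4, then convolve with the original filter
reference = conv1d_dot([d * orig_scale for d in data], filt)

shifted_in = [d / 2 for d in data]        # right shift by 1 bit (x0.5)
weighted_filt = [c * 0.8 for c in filt]   # scaling weight W = 0.4 / 0.5 = 0.8
folded = conv1d_dot(shifted_in, weighted_filt)

assert abs(reference - folded) < 1e-9     # same effective 0.4 scaling
```

Pre-scaling the coefficients is done once per filter, whereas scaling the input would cost one multiply per data element, which is why the folded form is cheaper.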
Fig. 4 is a functional block diagram of the convolution arithmetic device according to an embodiment of the invention. Referring to Fig. 4, the convolution arithmetic device includes a memory 1, a buffer device 2, a convolution operation module 3, an interleaved summing unit 4, an accumulation buffer unit 5, a coefficient fetch controller 6, and a control unit 7. The convolution arithmetic device can be used in convolutional neural network (CNN) applications.
The memory 1 stores data awaiting convolution, which may be, for example, image, video, audio, or statistical data, or the data of one layer of a convolutional neural network. For image data this is, for example, pixel data; for video data, the pixel data of video frames, or motion vectors or messages in the video; for the data of one layer of a convolutional neural network, typically a two-dimensional array of data — for image data, a two-dimensional array of pixel data. In the present embodiment, the memory 1 is, for example, a static random-access memory (SRAM); besides storing the data awaiting convolution, it can also store data for which convolution is complete, and it can have a multi-layer memory structure that stores pending and completed data separately. In other words, the memory 1 can serve as a cache memory inside the convolution arithmetic device.

In practice, all or most of the data can first be stored elsewhere, e.g. in another memory such as a dynamic random-access memory (DRAM) or another type of memory. When the convolution arithmetic device is to perform convolution, all or part of the data is loaded from that other memory into the memory 1 and then input through the buffer device 2 to the convolution operation module 3 for convolution. If the input is streaming data, the newest stream data can be written into the memory 1 at any time for convolution.
The buffer device 2 is coupled to the memory 1, the convolution operation module 3, and the accumulation buffer unit 5, and also to other components of the convolution arithmetic device, such as the interleaved summing unit 4 and the control unit 7. For image data or video-frame data, processing proceeds row by row while reading multiple columns, so within one clock cycle the buffer device 2 is fed data from different columns of the same row of the memory 1; in this respect, the buffer device 2 of the present embodiment acts as a column buffer. When an operation is to be performed, the buffer device 2 first fetches from the memory 1 the data required by the convolution operation module 3 and adjusts the fetched data into a pattern that can be written smoothly into the convolution operation module 3. Since the buffer device 2 is also coupled to the accumulation buffer unit 5, the data produced by the accumulation buffer unit 5 is also temporarily held and reordered in the buffer device 2 before being sent back to the memory 1 for storage. In other words, besides column buffering, the buffer device 2 also relays temporary data; it can be regarded as a data buffer with reordering capability.
Note that the buffer device 2 further includes a memory control unit 21, which controls data fetches from, and writes to, the memory 1. Since the access width between the buffer device 2 and the memory 1 — also called bandwidth — is limited, the convolution that the convolution operation module 3 can actually perform is related to the access width of the memory 1. In other words, the operational efficiency of the convolution operation module 3 can be limited by this access width; if the input from the memory 1 becomes a bottleneck, the efficiency of the convolution will suffer.
The convolution operation module 3 has multiple convolution units, each of which performs convolution based on a filter and multiple current data, retaining part of the current data after the convolution. The buffer device 2 obtains multiple new data from the memory 1 and inputs them to the convolution units; the new data do not repeat the current data. The convolution units of the convolution operation module 3 then perform the next round of convolution based on the filter, the retained current data, and the new data. The interleaved summing unit 4 is coupled to the convolution operation module 3 and generates feature output results from the convolution results. The accumulation buffer unit 5 is coupled to the interleaved summing unit 4 and the buffer device 2 and temporarily stores the feature output results; when the convolution over a specified range is complete, the buffer device 2 writes the accumulated data held in the accumulation buffer unit 5 to the memory 1.
The coefficient fetch controller 6 is coupled to the convolution operation module 3, and the control unit 7 is coupled to the buffer device 2. In practice, besides the data itself, the convolution operation module 3 also requires filter coefficients as input before it can operate; in the present embodiment, this refers to the coefficient input of a 3×3 array of convolution units. The coefficient fetch controller 6 can input the filter coefficients directly from an external memory by direct memory access (DMA). Besides being coupled to the convolution operation module 3, the coefficient fetch controller 6 can also connect to the buffer device 2 to receive instructions from the control unit 7, so that the convolution operation module 3 can control the coefficient fetch controller 6 through the control unit 7 to input the filter coefficients.
The control unit 7 may include an instruction decoder 71 and a data read controller 72. The instruction decoder 71 obtains control instructions from the data read controller 72 and decodes them, thereby obtaining the size of the current input data, the numbers of rows and columns of the input data, the number of features of the input data, and the start address of the input data in the memory 1. The instruction decoder 71 can also obtain information about the filter and the number of output features from the data read controller 72, and output appropriate idle signals to the buffer device 2. The buffer device 2 then operates according to the decoded information and in turn controls the operation of the convolution operation module 3 and the accumulation buffer unit 5 — for example, the timing at which data is input from the memory 1 to the buffer device 2 and the convolution operation module 3, the scale of the convolution performed by the convolution operation module 3, the read address for data moving from the memory 1 to the buffer device 2, the write address for data moving from the accumulation buffer unit 5 to the memory 1, and the convolution mode in which the convolution operation module 3 and the buffer device 2 operate.
On the other hand, the control unit 7 can likewise fetch the required control instructions and convolution information from the external memory by direct memory access; after the instruction decoder 71 decodes them, these control instructions and convolution information are fetched by the buffer device 2. The instructions may include the step size of the moving window, the address of the moving window, and the row and column counts of the image data whose features are to be extracted.
The accumulation buffer unit 5 is coupled to the interleaved summing unit 4 and includes a partial-sum block 51 and a pooling unit 52. The partial-sum block 51 temporarily stores the data output by the interleaved summing unit 4, and the pooling unit 52 performs a pooling operation on the data stored in the partial-sum block 51. The pooling operation is max pooling or average pooling.

For example, the accumulation buffer unit 5 can temporarily store the feature results computed by the convolution operation module 3 and output by the interleaved summing unit 4 in the partial-sum block 51. The pooling unit 52 then performs pooling on the data stored in the partial-sum block 51; the pooling can target a particular feature over a region of the input data, taking its average or its maximum as a summarized feature extraction or statistical feature output. Compared with the original features, this statistical feature not only has lower dimensionality but can also improve the processing results.
It should be noted that what is temporarily stored here is the partial sums of the input data, kept in the partial-sum block 51 — hence the names partial-sum block 51 and accumulation buffer unit 5, which can also be referred to as the PSUM unit and PSUM BUFFER module. The pooling operation of the pooling unit 52 of the present embodiment can use the average-pooling computation described above to obtain the statistical feature output. After all input data have been processed by the convolution operation module 3 and the interleaved summing unit 4, the accumulation buffer unit 5 outputs the final data processing result, which can likewise be returned to the memory 1 through the buffer device 2, or output from the memory 1 to other components. Meanwhile, the convolution operation module 3 and the interleaved summing unit 4 continue acquiring and computing data features, improving the throughput of the convolution arithmetic device.
When the aforementioned average pooling is used, the filter coefficients of the convolution layer originally held in the memory must be adjusted; what actually enters the convolution operation module 3 are the adjusted factors — the factors used in the merged operation of the pooling layer and the next convolution layer described above. Since the generation of these factors has been explained in the previous embodiments, it is not repeated here. When the convolution arithmetic device processes the convolution layer and pooling layer of the current layer, the pooling unit 52 can leave the division part of the current pooling layer's average pooling unprocessed; when the device proceeds to the next convolution layer, the convolution operation module 3 merges the division part left unprocessed by the pooling unit 52 into the convolution multiplication. Alternatively, when processing the convolution layer and pooling layer of the current layer, the pooling unit 52 can use a shift operation as a partial division, leaving the average-pooling division incomplete; when the device proceeds to the next convolution layer, the convolution operation module 3 merges the remaining division part left by the pooling unit 52 into the convolution multiplication.
The convolution arithmetic device may include multiple convolution operation modules 3. The convolution units of the convolution operation module 3 and the interleaved summing unit 4 can selectively operate in a low-scale convolution mode or a high-scale convolution mode. In the low-scale convolution mode, the interleaved summing unit 4 is configured to interleave and sum the convolution results corresponding to the ordering within the convolution operation module 3 and output separate sums. In the high-scale convolution mode, the interleaved summing unit 4 interleaves and sums the convolution results of all the convolution units into a single output.
In conclusion, in the arithmetic device and operation method of the present invention, average pooling is performed in two stages: the pooling unit performs only addition, together with a bit-shift operation that avoids overflow during accumulation, and the output of the pooling unit is then weighted to obtain the final average. Since the pooling unit performs no division, the processor is spared extra effort, achieving the effect of improving the efficiency of the pooling operation.
The above embodiments do not limit the present invention. Any person skilled in the art may make equivalent modifications or changes without departing from the spirit and scope of the invention, which is to be limited only by the appended claims.
Claims (11)
1. An operation method for a convolutional neural network, comprising:
performing an addition operation on multiple input data to output accumulated data;
performing a bit-shift operation on the accumulated data to output shifted data; and
performing a weighting operation on the shifted data to output weighted data, wherein a factor of the weighting operation depends on the number of the input data, the number of bits shifted right in the bit-shift operation, and a scaling weight of a subsequent layer of the convolutional neural network.
2. The method of claim 1, wherein the factor of the weighting operation is proportional to the scaling weight and to the number of bits shifted right in the bit-shift operation, and inversely proportional to the number of the input data, and the weighted data equals the shifted data multiplied by the factor.
3. The method of claim 1, wherein the number of bits shifted right in the bit-shift operation depends on the scale of a pooling window, and the number of the input data depends on the scale of the pooling window.
4. The method of claim 1, wherein the subsequent layer is a next convolution layer of the convolutional neural network, the scaling weight is a filter coefficient of the next convolution layer, and the addition operation and the bit-shift operation are operations in a pooling layer of the convolutional neural network.
5. The method of claim 4, wherein a division operation in the pooling layer is merged into a multiplication operation of the next convolution layer.
6. An operation method for a convolutional neural network, comprising:
performing an addition operation on multiple input data in a pooling layer to output accumulated data; and
performing a weighting operation on the accumulated data in a subsequent layer to output weighted data, wherein a factor of the weighting operation depends on the number of the input data and a scaling weight of the subsequent layer, and the weighted data equals the accumulated data multiplied by the factor.
7. The method of claim 6, wherein the subsequent layer is a next convolutional layer, the scaling weight is a filter coefficient, the weighting operation is a convolution operation, and the factor of the weighting operation equals the filter coefficient divided by the number of the input data.
8. The method of claim 6, wherein the number of the input data depends on the size of the pooling window.
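Claims 6 to 8 defer the pooling division entirely: the pooling layer only sums, and the next convolutional layer uses a factor equal to its filter coefficient divided by the window size N. A small numerical sketch, with an assumed non-overlapping 2x2 window and a scalar stand-in for the filter coefficient, shows the two orderings agree:

```python
import numpy as np

def sum_pool_2x2(x):
    # Pooling layer performs only the addition (claim 6):
    # non-overlapping 2x2 sum pooling, N = 4 inputs per window (claim 8).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
filt = 0.25   # a 1x1 "filter coefficient", chosen only for illustration
n = 4         # number of input data per pooling window

reference = (sum_pool_2x2(x) / n) * filt   # average pool, then weight
fused = sum_pool_2x2(x) * (filt / n)       # claim 7: factor = coefficient / N
assert np.allclose(reference, fused)
print(fused)
```

The fused form removes the per-window division from the pooling hardware; the division happens once, offline, when the factor is formed.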
9. A method of operating a convolutional neural network, comprising:
multiplying a scaling weight by original filter coefficients to produce weighted filter coefficients; and
performing a convolution operation on input data and the weighted filter coefficients in a convolutional layer.
10. The method of claim 9, further comprising:
performing a bit-shift operation on input data; and
inputting the bit-shifted input data into the convolutional layer,
wherein the scaling weight depends on an original scaling weight and the number of bits shifted right in the bit-shift operation.
11. A convolutional neural network arithmetic device that performs the method of any one of claims 1 to 10.
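Claims 9 and 10 pre-multiply the scaling weight into the filter coefficients so the convolutional layer needs no separate scaling step at run time. A sketch under assumed values (a 1-D filter and a scaling weight compensating an earlier right-shift; none of these specifics come from the patent):

```python
import numpy as np

def convolve_1d(x, k):
    # Simple valid-mode 1-D convolution, standing in for the conv layer.
    return np.convolve(x, k, mode="valid")

original_filter = np.array([1.0, 2.0, 1.0])
scaling_weight = 0.125   # e.g. compensating a prior 3-bit right-shift (claim 10)

# Claim 9: fold the scaling weight into the coefficients ahead of time.
weighted_filter = scaling_weight * original_filter

x = np.array([4.0, 8.0, 12.0, 16.0])
out_two_step = scaling_weight * convolve_1d(x, original_filter)  # scale after conv
out_fused = convolve_1d(x, weighted_filter)                      # pre-weighted conv
assert np.allclose(out_two_step, out_fused)
print(out_fused)
```

Because convolution is linear, scaling the coefficients offline is exactly equivalent to scaling the output, which is why the run-time multiply can be dropped.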
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710072906.8A CN108415881A (en) | 2017-02-10 | 2017-02-10 | The arithmetic unit and method of convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710072906.8A CN108415881A (en) | 2017-02-10 | 2017-02-10 | The arithmetic unit and method of convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108415881A true CN108415881A (en) | 2018-08-17 |
Family
ID=63124915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710072906.8A Pending CN108415881A (en) | 2017-02-10 | 2017-02-10 | The arithmetic unit and method of convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108415881A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020087742A1 (en) * | 2018-11-02 | 2020-05-07 | 深圳云天励飞技术有限公司 | Processing element, apparatus and method used for implementing convolution operation |
CN112346703A (en) * | 2020-11-24 | 2021-02-09 | 华中科技大学 | Global average pooling circuit for convolutional neural network calculation |
CN112633462A (en) * | 2019-10-08 | 2021-04-09 | 黄朝宗 | Block type inference method and system for memory optimization of convolution neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106875011B (en) | Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof | |
JP7007488B2 (en) | Hardware-based pooling system and method | |
CN107633297B (en) | Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm | |
CN106445471A (en) | Processor and method for executing matrix multiplication on processor | |
CN108073977A (en) | Convolution algorithm device and convolution algorithm method | |
CN108573305B (en) | Data processing method, equipment and device | |
TWI630544B (en) | Operation device and method for convolutional neural network | |
CN106874219A (en) | A kind of data dispatching method of convolutional neural networks, system and computer equipment | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
CN110415157A (en) | A kind of calculation method and device of matrix multiplication | |
CN108073549B (en) | Convolution operation device and method | |
CN108415881A (en) | The arithmetic unit and method of convolutional neural networks | |
CN110147252A (en) | A kind of parallel calculating method and device of convolutional neural networks | |
WO2022110386A1 (en) | Data processing method and artificial intelligence processor | |
CN110555516A (en) | FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method | |
CN108416430A (en) | The pond arithmetic unit and method of convolutional neural networks | |
CN110490308B (en) | Design method of acceleration library, terminal equipment and storage medium | |
CN113806261B (en) | Vector processor oriented pooling vectorization realization method | |
JP2022137247A (en) | Processing for a plurality of input data sets | |
KR102290531B1 (en) | Apparatus for Reorganizable neural network computing | |
CN109416743B (en) | Three-dimensional convolution device for identifying human actions | |
CN109800867B (en) | Data calling method based on FPGA off-chip memory | |
CN116090518A (en) | Feature map processing method and device based on systolic operation array and storage medium | |
KR101989793B1 (en) | An accelerator-aware pruning method for convolution neural networks and a recording medium thereof | |
CN101339649A (en) | Computing unit and image filtering device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180817 |