CN108647777A - Data mapping system and method for realizing parallel convolution computation - Google Patents

Data mapping system and method for realizing parallel convolution computation

Info

Publication number
CN108647777A
CN108647777A
Authority
CN
China
Prior art keywords
convolution
data
feature
module
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810432269.5A
Other languages
Chinese (zh)
Inventor
聂林川
姜凯
王子彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd filed Critical Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201810432269.5A priority Critical patent/CN108647777A/en
Publication of CN108647777A publication Critical patent/CN108647777A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a data mapping system and method for realizing parallel convolution computation, belonging to the field of neural network technology. The data mapping system comprises an input feature map buffer module, a mapping logic module, an output feature map buffer module, a weight buffer module, a convolution computing array and a control logic module. The input feature map buffer module is connected to the control logic module and the mapping logic module; the weight buffer module is connected to the control logic module and the mapping logic module; the computing array is connected to the control logic module, the mapping logic module and the output feature map buffer module; and the output feature map buffer module is connected to the control logic module. The data mapping system of the invention eliminates computing resources that are idle or perform invalid work, improves the utilization of computing resources, and has good application value.

Description

Data mapping system and method for realizing parallel convolution computation
Technical field
The present invention relates to the field of neural network technology, and specifically provides a data mapping system and method for realizing parallel convolution computation.
Background technology
With the development of artificial intelligence (AI), convolutional neural networks (CNNs) have come into wide use. Mainstream CNN models are not only complex, with large volumes of computation, but also differ greatly in architecture from layer to layer, so it is difficult for a hardware circuit to achieve both high performance and high generality; resource utilization and energy efficiency must both be considered. Implementing every layer of an entire network model in hardware at once is impractical, since power consumption, area and resource utilization cannot all reach satisfactory results. The usual way to solve this problem is to trade time for area: the model is partitioned into layers and blocks, the circuit is designed as general basic units, the entire model is constructed by time-multiplexing these units under a control circuit, and resource utilization and circuit performance are improved through efficient data mapping. In the prior art, when a hardware circuit computes a CNN model whose convolution kernel sliding stride is greater than 1, invalid computations occur and resource utilization drops. On the other hand, with a fixed computing-array circuit design, if the output feature map does not match the computing array size, some resources do not participate in computation and are wasted; such waste of computing resources prevents the overall performance from reaching an ideal result.
Invention content
The technical task of the present invention is, in view of the above problems, to provide a data mapping system for realizing parallel convolution computation that can eliminate computing resources that are idle or perform invalid work and improve the utilization of computing resources.
A further technical task of the present invention is to provide a data mapping method for realizing parallel convolution computation.
To achieve the above objects, the present invention provides the following technical solutions:
A data mapping system for realizing parallel convolution computation, the system comprising an input feature map buffer module, a mapping logic module, an output feature map buffer module, a weight buffer module, a convolution computing array and a control logic module. The input feature map buffer module is connected to the control logic module and the mapping logic module; the weight buffer module is connected to the control logic module and the mapping logic module; the convolution computing array is connected to the control logic module, the mapping logic module and the output feature map buffer module; and the output feature map buffer module is connected to the control logic module.
The data mapping system for realizing parallel convolution computation increases the parallelism of convolution computation by recombining the input feature map, eliminating computing resources that are idle or perform invalid work. Specifically, the input feature map is partitioned into regular blocks and recombined by effective mapping means, so that portions that are invalid or do not participate in computation are replaced with portions that compute effectively. This increases the parallelism of the overall convolution computation, improves the utilization of computing resources, and improves system performance.
Preferably, the input feature map buffer module serves as a buffer for externally input data. The mapping logic module obtains data from the input feature map buffer module and the weight buffer module according to commands issued by the control logic module, and sends the obtained data to the convolution computing array; the convolution computing array sends the completed results to the output feature map buffer module.
Preferably, the convolution computing array uses N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected.
Each convolution computing unit contains a 2x2 grid of PEs (processing elements). During convolution computation, each PE corresponds to the calculation of one pixel of one output feature map.
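As a hedged illustration of this layout (a sketch under assumptions, not the patent's circuit), output pixels can be addressed by which convolution unit and which PE inside its 2x2 grid they fall on; the function name and indexing convention below are illustrative, not taken from the patent.

```python
# Illustrative mapping of output-feature-map pixels onto an array of
# convolution units, each containing a 2x2 grid of PEs. Row-major,
# zero-based indexing is an assumption for the sketch.

def pe_for_output_pixel(row, col):
    """Return ((unit_row, unit_col), (pe_row, pe_col)) for output pixel (row, col)."""
    unit = (row // 2, col // 2)   # which convolution unit in the N x N array
    pe = (row % 2, col % 2)       # which PE inside that unit's 2x2 grid
    return unit, pe

# A 4x4 block of output pixels spans a 2x2 grid of units with 2x2 PEs each.
unit, pe = pe_for_output_pixel(3, 2)
```

Under this convention, output pixel (3, 2) lands on unit (1, 1), PE (1, 0): one PE per output pixel, as the description requires.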
A method of realizing the data mapping of parallel convolution computation: the method partitions the input feature map into regular blocks and recombines the input feature map by mapping means, increasing the parallelism of convolution computation. The mapping logic sends the data obtained from the recombined input feature map to the convolution computing array, and the convolution computing array sends the completed results to the output feature map buffer module.
Preferably, when the convolution kernel sliding stride is greater than 1, the portions of the input feature map over which the kernel slides that produce invalid computation are filled with portions that compute effectively, and the recombined input feature map is used as the convolution unit input.
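The waste this clause addresses is easy to quantify. As a minimal sketch (the helper name is an assumption), with a square array sized for stride 1, a stride greater than 1 shrinks the output feature map while the array stays fixed, so utilization is the ratio of output points to PEs:

```python
# Utilization of a square PE array when only out_side x out_side output
# points are produced. Matches the figures used later in the description:
# a 4x4 array producing a 2x2 output (stride 2, 1x1 kernel) uses 1/4 of
# the PEs; a 3x3 array producing a 2x2 output uses 4/9.

def utilization(array_side, out_side):
    return (out_side * out_side) / (array_side * array_side)

u_stride2 = utilization(4, 2)    # 0.25
u_mismatch = utilization(3, 2)   # 4/9
```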
Preferably, to fill the invalid-computation portions of the kernel sliding with effectively computing portions of the input feature map, the invalid-computation part of the array is filled with the data of the effectively computing position at the upper right of the matrix: the data in the input feature map that participate in effective computation are translated rightward and downward and copied into adjacent convolution computing units.
Preferably, the data copied into the adjacent convolution computing units undergo convolution computation with the convolution kernel weight values read from the weight buffer module until the newly combined feature map has traversed all weight values, and the results are sent to the output feature map buffer module.
Preferably, when the output feature map does not match the computing array size, the multi-channel input feature map is divided into smaller feature map units, and the feature map units at the same position of adjacent channels are recombined into a new input feature map, which is used as the convolution computing array input.
Preferably, the division proportion of the multi-channel input feature map depends on the output feature map size, and the number of channels depends on the convolution computing array size and the output feature map size.
Compared with the prior art, the data mapping method for realizing parallel convolution computation of the present invention has the following outstanding beneficial effects: the method recombines the input feature map by effective mapping means and increases the parallelism of convolution computation. Specifically, the input feature map is partitioned into regular blocks, portions that are invalid or do not participate in computation are replaced with effectively computing portions, computing resources that are idle or perform invalid work are eliminated, the parallelism of the overall convolution computation is increased, the utilization of computing resources is improved, and system performance is improved. The method therefore has good application value.
Description of the drawings
Fig. 1 is a topology diagram of the data mapping system for realizing parallel convolution computation of the present invention;
Fig. 2 is a topology diagram of a convolution computing unit performing convolution computation in the data mapping system for realizing parallel convolution computation of the present invention;
Fig. 3 is a schematic diagram of the data mapping method for realizing parallel convolution computation of the present invention when the convolution kernel sliding stride is greater than 1;
Fig. 4 is a schematic diagram of the data mapping method for realizing parallel convolution computation of the present invention when the output feature map and the computing array size do not match.
Specific implementation mode
The data mapping system and method for realizing parallel convolution computation of the present invention are described in further detail below with reference to the drawings and an embodiment.
Embodiment
As shown in Fig. 1, the data mapping system for realizing parallel convolution computation of the present invention includes an input feature map buffer module, a mapping logic module, an output feature map buffer module, a weight buffer module, a convolution computing array and a control logic module.
The input feature map buffer module serves as a buffer for externally input data and is connected to the control logic module and the mapping logic module.
The convolution computing array uses N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected. As shown in Fig. 2, each convolution computing unit contains a 2x2 grid of PEs; during convolution computation, each PE corresponds to the calculation of one pixel of one output feature map.
The mapping logic module obtains data from the input feature map buffer module and the weight buffer module according to commands issued by the control logic module, and sends the obtained data to the convolution computing array; the convolution computing array sends the completed results to the output feature map buffer module.
The weight buffer module is connected to the control logic module and the mapping logic module. The convolution computing array is connected to the control logic module, the mapping logic module and the output feature map buffer module. The output feature map buffer module is connected to the control logic module.
The data mapping method for realizing parallel convolution computation of the present invention partitions the input feature map into regular blocks and recombines it by mapping means, increasing the parallelism of convolution computation. The mapping logic sends the data obtained from the recombined input feature map to the convolution computing array, and the convolution computing array sends the completed results to the output feature map buffer module.
When the convolution kernel sliding stride is greater than 1, the portions of the input feature map over which the kernel slides that produce invalid computation are filled with effectively computing portions: the invalid-computation part of the array is filled with the data of the effectively computing position at the upper right of the matrix, and the data in the input feature map that participate in effective computation are translated rightward and downward and copied into adjacent computing units. The data copied into the adjacent computing units undergo convolution computation with the convolution kernel weight values read from the weight buffer module until the newly combined feature map has traversed all weight values; the recombined input feature map serves as the convolution unit input, and the results are sent to the output feature map buffer module. The specific implementation process is shown in Fig. 3 and illustrated with an example in which the convolution computing array size is 4x4, the output feature map is 2x2, the convolution kernel weight matrix is 1x1, and the convolution kernel sliding stride is 2. With a sliding stride of 2, one invalid computation is performed for every effective output point, so the effective utilization of the whole computing array is (2x2)/(4x4) = 1/4 and computing resources are wasted. To make full use of the computing resources, the effective computation is replicated in parallel, each copy is convolved with a different convolution kernel, and the intermediate results are cached.
1. At time T0 of the first cycle, the control logic commands the mapping logic to read the value of point 11 of the input feature map from the input feature map buffer into the computing array, and to read the corresponding weight k1 from the weight buffer into the computing array.
2. At time T1, the value of point 11 and weight k1 are computed in the computing array, and the result out0 is sent to the output feature map buffer; at the same time, point 11 is copied to position 12 of the computing array.
3. At time T2, the value at position 12 and weight k2 are computed in the computing array, and the result out1 is sent to the output feature map buffer; at the same time, point 11 is copied to position 21 of the computing array.
4. At time T3, the value at position 21 and weight k3 are computed in the computing array, and the result out2 is sent to the output feature map buffer; at the same time, point 11 is copied to position 22 of the computing array.
5. At time T4, the value at position 22 is computed with weight k4, and the result out3 is sent to the output feature map buffer.
The other computing units are processed in the same way until the first feature value of the input feature map has been computed with all of the weights and the intermediate results have been preserved; the next feature value is then processed, and so on. After the entire input feature map of the first channel has been computed, the input feature map of the next channel enters computation, and the intermediate results of the different channels at corresponding positions are summed.
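The T0-T4 schedule above can be sketched in a few lines of Python (a behavioral model under assumptions, not the patent's circuit; function and variable names are illustrative). One valid input value is copied step by step into the 2x2 positions 11, 12, 21, 22 of a convolution unit and multiplied at each position by a different 1x1 kernel weight k1..k4, so PEs that would otherwise perform invalid computation produce useful partial results:

```python
# Behavioral sketch of the per-unit schedule: value v is copied across the
# four PE positions over successive cycles and multiplied by one kernel
# weight per position; each product (out0..out3) goes to the output
# feature map buffer.

def run_unit_schedule(v, weights):
    """Return [(position, output), ...] for one valid input value v.

    weights corresponds to k1..k4 read from the weight buffer.
    """
    positions = ["11", "12", "21", "22"]   # PE positions inside one 2x2 unit
    return [(pos, v * k) for pos, k in zip(positions, weights)]

trace = run_unit_schedule(3.0, [1.0, 2.0, 0.5, -1.0])
```

Each cycle yields one output, matching the one-result-per-time-step walkthrough above.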
When the output feature map does not match the computing array size, the multi-channel input feature map is divided into smaller feature map units, and the feature map units at the same position of adjacent channels are recombined into a new input feature map, which serves as the convolution computing array input. The division proportion of the multi-channel input feature map depends on the output feature map size, and the number of channels depends on the computing array size and the output feature map size. The specific implementation process is shown in Fig. 4 and illustrated with an example in which the convolution computing array size is 3x3, the output feature map size is 2x2, the convolution kernel size is 1x1, and the sliding stride is 1. In this case the computing array size is larger than the output feature map size and is not an integer multiple of it, so the computing resource utilization is (2x2)/(3x3) = 4/9 and the resources not participating in computation are wasted. The input feature map is therefore cut into blocks, and the blocks from different channels are combined so that all resources of the computing array are fully used. The specific process is as follows:
1. At time T0 of the first cycle, the control logic commands the mapping logic to input the point-11 values at the same position of channels one, two, three and four into positions 11, 12, 21 and 22 of the computing array respectively; the four ways compute in parallel and simultaneously obtain the point-11 values of 4 output feature maps, which are temporarily stored in the output feature map buffer.
2. At time T1, the control logic commands the mapping logic to input the point-12 values at the same position of channels one, two, three and four into positions 11, 12, 21 and 22 of the computing array; the four ways compute in parallel and simultaneously obtain the point-12 values of 4 output feature maps, which are temporarily stored in the output feature map buffer.
3. At time T2, the control logic commands the mapping logic to input the point-21 values at the same position of channels one, two, three and four into positions 11, 12, 21 and 22 of the computing array; the four ways compute in parallel and simultaneously obtain the point-21 values of 4 output feature maps, which are temporarily stored in the output feature map buffer.
4. At time T3, the control logic commands the mapping logic to input the point-22 values at the same position of channels one, two, three and four into positions 11, 12, 21 and 22 of the computing array; the four ways compute in parallel and simultaneously obtain the point-22 values of 4 output feature maps, which are temporarily stored in the output feature map buffer.
At the end of time T3, all point values of the output feature maps of the four channels have been computed.
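The four-cycle channel-tiling schedule can likewise be sketched as a behavioral model (a sketch under assumptions; the function name and the use of NumPy are illustrative, and the per-channel 1x1 weights are made-up values). Each cycle, the same-position pixel of four input channels occupies the four array positions, so four output feature maps are produced in parallel:

```python
import numpy as np

def channel_tiled_outputs(blocks, weights):
    """blocks: four 2x2 input feature map units, shape (4, 2, 2).

    weights: one 1x1 kernel weight per channel. Cycle t processes one
    output position; at that cycle all four channels compute in parallel
    on the four positions of the computing array.
    """
    blocks = np.asarray(blocks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    out = np.empty_like(blocks)
    for r in range(2):
        for c in range(2):   # T0..T3: one output point per cycle
            out[:, r, c] = blocks[:, r, c] * weights   # four-way parallel
    return out

blocks = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]],
          [[9, 10], [11, 12]],
          [[13, 14], [15, 16]]]
res = channel_tiled_outputs(blocks, [1, 2, 3, 4])
```

After four cycles every output point of all four channels is filled, which is the schedule's whole point: no PE position sits idle waiting for a single channel's 2x2 output.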
The embodiment described above is only a preferred specific implementation of the present invention; the usual variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention should all be included within the scope of protection of the present invention.

Claims (9)

1. A data mapping system for realizing parallel convolution computation, characterized in that the system comprises an input feature map buffer module, a mapping logic module, an output feature map buffer module, a weight buffer module, a convolution computing array and a control logic module; the input feature map buffer module is connected to the control logic module and the mapping logic module; the weight buffer module is connected to the control logic module and the mapping logic module; the convolution computing array is connected to the control logic module, the mapping logic module and the output feature map buffer module; and the output feature map buffer module is connected to the control logic module.
2. The data mapping system for realizing parallel convolution computation according to claim 1, characterized in that the input feature map buffer module serves as a buffer for externally input data; the mapping logic module obtains data from the input feature map buffer module and the weight buffer module according to commands issued by the control logic module and sends the obtained data to the convolution computing array; and the convolution computing array sends the completed results to the output feature map buffer module.
3. The data mapping system for realizing parallel convolution computation according to claim 1 or 2, characterized in that the convolution computing array uses N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected.
4. A data mapping method for realizing parallel convolution computation, characterized in that the method partitions the input feature map into regular blocks and recombines the input feature map by mapping means, increasing the parallelism of convolution computation; the mapping logic sends the data obtained from the recombined input feature map to the convolution computing array; and the convolution computing array sends the completed results to the output feature map buffer module.
5. The data mapping method for realizing parallel convolution computation according to claim 4, characterized in that, when the convolution kernel sliding stride is greater than 1, the portions of the input feature map over which the kernel slides that produce invalid computation are filled with effectively computing portions, and the recombined input feature map is used as the convolution unit input.
6. The data mapping method for realizing parallel convolution computation according to claim 4 or 5, characterized in that said filling of the invalid-computation portions of the kernel sliding with effectively computing portions of the input feature map fills the invalid-computation part of the array with the data of the effectively computing position at the upper right of the matrix, and the data in the input feature map that participate in effective computation are translated rightward and downward and copied into adjacent convolution computing units.
7. The data mapping method for realizing parallel convolution computation according to claim 6, characterized in that the data copied into the adjacent convolution computing units undergo convolution computation with the convolution kernel weight values read from the weight buffer module until the newly combined feature map has traversed all weight values, and the results are sent to the output feature map buffer module.
8. The data mapping method for realizing parallel convolution computation according to claim 4, characterized in that, when the output feature map does not match the computing array size, the multi-channel input feature map is divided into smaller feature map units, and the feature map units at the same position of adjacent channels are recombined into a new input feature map used as the convolution computing array input.
9. The data mapping method for realizing parallel convolution computation according to claim 8, characterized in that the division proportion of the multi-channel input feature map depends on the output feature map size, and the number of channels depends on the convolution computing array size and the output feature map size.
CN201810432269.5A 2018-05-08 2018-05-08 Data mapping system and method for realizing parallel convolution computation Pending CN108647777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810432269.5A CN108647777A (en) 2018-05-08 2018-05-08 Data mapping system and method for realizing parallel convolution computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810432269.5A CN108647777A (en) 2018-05-08 2018-05-08 Data mapping system and method for realizing parallel convolution computation

Publications (1)

Publication Number Publication Date
CN108647777A true CN108647777A (en) 2018-10-12

Family

ID=63749398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810432269.5A Pending CN108647777A (en) 2018-05-08 2018-05-08 Data mapping system and method for realizing parallel convolution computation

Country Status (1)

Country Link
CN (1) CN108647777A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163338A (en) * 2019-01-31 2019-08-23 腾讯科技(深圳)有限公司 Chip operation method, device, terminal and chip with operation array
CN112101284A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Image recognition method, training method, device and system of image recognition model
CN112966807A (en) * 2019-12-13 2021-06-15 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN114429207A (en) * 2022-01-14 2022-05-03 支付宝(杭州)信息技术有限公司 Convolution processing method, device, equipment and medium for feature map
CN114565501A (en) * 2022-02-21 2022-05-31 格兰菲智能科技有限公司 Data loading method and device for convolution operation
CN116306855A (en) * 2023-05-17 2023-06-23 之江实验室 Data processing method and device based on memory and calculation integrated system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection
CN106446546A (en) * 2016-09-23 2017-02-22 西安电子科技大学 Meteorological data complement method based on automatic convolutional encoding and decoding algorithm
CN106779060A (en) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization
CN107153873A (en) * 2017-05-08 2017-09-12 中国科学院计算技术研究所 A kind of two-value convolutional neural networks processor and its application method
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection
CN107506828A (en) * 2016-01-20 2017-12-22 南京艾溪信息科技有限公司 Computing device and method
CN106446546A (en) * 2016-09-23 2017-02-22 西安电子科技大学 Meteorological data complement method based on automatic convolutional encoding and decoding algorithm
CN106779060A (en) * 2017-02-09 2017-05-31 武汉魅瞳科技有限公司 A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization
CN107153873A (en) * 2017-05-08 2017-09-12 中国科学院计算技术研究所 A kind of two-value convolutional neural networks processor and its application method
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN110163338A (en) * 2019-01-31 2019-08-23 腾讯科技(深圳)有限公司 Chip operation method, device, terminal and chip with operation array
WO2020156508A1 (en) * 2019-01-31 2020-08-06 腾讯科技(深圳)有限公司 Method and device for operating on basis of chip with operation array, and chip
CN110163338B (en) * 2019-01-31 2024-02-02 腾讯科技(深圳)有限公司 Chip operation method and device with operation array, terminal and chip
CN112966807A (en) * 2019-12-13 2021-06-15 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN112966807B (en) * 2019-12-13 2022-09-16 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN112101284A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Image recognition method, training method, device and system of image recognition model
CN114429207A (en) * 2022-01-14 2022-05-03 支付宝(杭州)信息技术有限公司 Convolution processing method, device, equipment and medium for feature map
CN114565501A (en) * 2022-02-21 2022-05-31 格兰菲智能科技有限公司 Data loading method and device for convolution operation
CN114565501B (en) * 2022-02-21 2024-03-22 格兰菲智能科技有限公司 Data loading method and device for convolution operation
CN116306855A (en) * 2023-05-17 2023-06-23 之江实验室 Data processing method and device based on memory and calculation integrated system
CN116306855B (en) * 2023-05-17 2023-09-01 之江实验室 Data processing method and device based on memory and calculation integrated system

Similar Documents

Publication Publication Date Title
CN108647777A (en) Data mapping system and method for realizing parallel convolution computation
CN111178519B (en) Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN108875958A (en) Use the primary tensor processor of outer product unit
CN107590085B (en) A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN108564168A (en) A kind of design method to supporting more precision convolutional neural networks processors
CN107852379A (en) For the two-dimentional router of orientation of field programmable gate array and interference networks and the router and other circuits of network and application
CN104200045B (en) The parallel calculating method of a kind of basin large scale water system sediments formula hydrodynamic model
CN104145281A (en) Neural network computing apparatus and system, and method therefor
CN108875956A (en) Primary tensor processor
CN109447241A (en) A kind of dynamic reconfigurable convolutional neural networks accelerator architecture in internet of things oriented field
CN107506329B (en) A kind of coarse-grained reconfigurable array and its configuration method of automatic support loop iteration assembly line
CN110163354A (en) A kind of computing device and method
CN105373517A (en) Spark-based distributed matrix inversion parallel operation method
CN105426918B (en) Normalize associated picture template matching efficient implementation method
US20220222513A1 (en) Neural network processor system and methods of operating and forming thereof
CN102214086A (en) General-purpose parallel acceleration algorithm based on multi-core processor
CN104239595B (en) For realizing the method and apparatus for design planning and the system level design tool of framework exploration
CN102497411A (en) Intensive operation-oriented hierarchical heterogeneous multi-core on-chip network architecture
CN105468439A (en) Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework
CN103761072A (en) Coarse granularity reconfigurable hierarchical array register file structure
CN108805285A (en) A kind of convolutional neural networks pond unit design method
CN111079078B (en) Lower triangular equation parallel solving method for structural grid sparse matrix
CN108875957B (en) Primary tensor processor and the system for using primary tensor processor
Xu et al. CMSA: Configurable multi-directional systolic array for convolutional neural networks
CN102446342A (en) Reconfigurable binary arithmetical unit, reconfigurable binary image processing system and basic morphological algorithm implementation method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181012