JP6929958B2 - Low latency matrix multiply unit - Google Patents
Low latency matrix multiply unit
- Publication number
- JP6929958B2 (application JP2019553237A)
- Authority
- JP
- Japan
- Prior art keywords
- weight
- register
- matrix
- multiplication unit
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8046—Systolic arrays
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F5/01—Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
- G06F5/015—Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising having at least two separately controlled shifting levels, e.g. using shifting matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
- Magnetic Resonance Imaging Apparatus (AREA)
- Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
- Fuel Cell (AREA)
- Shift Register Type Memory (AREA)
Description
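This specification relates to performing neural network computations in hardware.
This specification describes technologies relating to special-purpose hardware circuits that train neural networks, compute neural network inferences, or both, and in particular to special-purpose hardware circuits that reduce latency across a matrix multiply unit by increasing the rate at which weight values are loaded into the weight matrix registers within the matrix multiply unit.
Detailed Description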
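A neural network having multiple layers can be trained and then used to compute inferences. For example, the neural network has parameters that are each initialized with a value. During training, the neural network performs a neural network training procedure to adjust the values of its parameters, e.g., to determine trained values of the parameters from their initial values using backpropagation. The trained neural network can then compute inferences, i.e., process an input through the layers of the neural network to generate a neural network output for that input.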
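To make the inference step concrete, the following is a minimal Python sketch, not taken from the patent, of processing an input through the layers of a network as a sequence of matrix-vector multiplications. The function names (`matvec`, `relu`, `infer`), the choice of a ReLU activation, and the example weights are all illustrative assumptions.

```python
def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise activation (an assumed choice; the text names none)."""
    return [max(0.0, v) for v in x]


def infer(layers, x):
    """Process an input through each layer in order and return the output."""
    for weights in layers:
        x = relu(matvec(weights, x))
    return x


# Hypothetical trained parameters: two layers of weights.
layers = [
    [[1.0, -1.0], [0.5, 0.5]],  # layer 1: 2 inputs -> 2 outputs
    [[2.0, 1.0]],               # layer 2: 2 inputs -> 1 output
]
output = infer(layers, [3.0, 1.0])
```

Each layer here reduces to one matrix-vector product, which is the kind of operation a matrix multiply unit performs in hardware.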
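When multiple weight values are available, the interface shifts the weight values, by a shift chain and on a clock cycle, into the weight shift registers of the cells 435 within a multi-cell (504).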
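The shifting step (504) can be pictured with a small software model. This is a sketch under simplifying assumptions: the chain is one-dimensional, one value enters it per clock cycle, and the names `shift_in` and `load_weights` are hypothetical. The text above does not specify the circuit at this level of detail.

```python
def shift_in(chain, value):
    """Model one clock cycle of a shift chain.

    Every register takes the value held by its upstream neighbour, and the
    first register takes the new value supplied by the interface.
    """
    return [value] + chain[:-1]


def load_weights(weights, n_cells):
    """Shift a stream of weight values into a chain of n_cells registers."""
    chain = [None] * n_cells  # None marks a register that holds no weight yet
    history = []
    for w in weights:
        chain = shift_in(chain, w)
        history.append(list(chain))  # snapshot after each clock cycle
    return chain, history


chain, history = load_weights([10, 20, 30, 40], n_cells=4)
```

In this model the first value injected ends up in the register farthest from the interface, so filling a chain of N registers takes N clock cycles.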
Claims (12)
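- A matrix multiply unit implemented as a systolic array of cells, each cell of the systolic array of cells comprising:
a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register;
a transposed weight shift register configured to receive, from a first direction of a two-dimensional format, a weight input to be stored in the weight matrix register;
a non-transposed weight shift register configured to receive, from a second direction of the two-dimensional format that is perpendicular to the first direction, a weight input to be stored in the weight matrix register; and
a multiply unit coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input to obtain a multiplication result.
- The matrix multiply unit of claim 1, wherein the weight input is a neural network weight input and the vector data input is a neural network vector data input.
- The matrix multiply unit of claim 1 or 2, wherein each cell further comprises a multiplexer configured to select between the weight input of the transposed weight shift register and the weight input of the non-transposed weight shift register and to forward the selected weight input to the weight matrix register.
- The matrix multiply unit of any one of claims 1 to 3, further comprising a first weight holding register configured to hold a weight value from either the transposed weight shift register or the non-transposed weight shift register.
- The matrix multiply unit of claim 4, further comprising a second weight holding register configured to hold a weight value from either the transposed weight shift register or the non-transposed weight shift register.
- The matrix multiply unit of claim 5, wherein a weight value is loaded from the transposed weight shift register into the first weight holding register, and a weight value is loaded from a vertical direction into the second weight holding register.
- The matrix multiply unit of claim 5 or claim 6, wherein the weight matrix register is loaded with a value from the first weight holding register or the second weight holding register.
- The matrix multiply unit of any one of claims 1 to 7, wherein, when data is in the weight matrix register, the data is used in any number of multiply cycles.
- The matrix multiply unit of claim 8, wherein, during the any number of multiply cycles, more weights are shifted into the weight shift registers in the background in preparation for the next set of multiplications.
- The matrix multiply unit of claim 8, wherein, during the any number of multiply cycles, the weight input of the weight matrix register is multiplied with a vector data input to obtain a multiplication result.
- The matrix multiply unit of any one of claims 1 to 10, wherein a vector data input moves by one multi-cell per clock cycle.
- The matrix multiply unit of any one of claims 1 to 11, wherein, when an instruction is received, weights are shifted based on the instruction.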
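As an illustration of the cell structure recited in the claims, the following Python sketch models each cell with a transposed weight shift register, a non-transposed weight shift register, a multiplexer, and a weight matrix register. It is a behavioural toy under stated assumptions, not the patented circuit: the class and method names are hypothetical, the first direction is taken to be horizontal and the second vertical, the holding registers are omitted, and the row sums are computed directly rather than accumulated cell by cell.

```python
class Cell:
    """One cell: two weight shift registers, a multiplexer, a weight matrix register."""

    def __init__(self):
        self.transposed_shift = 0      # fed from the first (horizontal) direction
        self.non_transposed_shift = 0  # fed from the second (vertical) direction
        self.weight_matrix_reg = 0     # the value actually used for multiplication

    def latch(self, use_transposed):
        # Multiplexer: select which shift register feeds the weight matrix register.
        self.weight_matrix_reg = (
            self.transposed_shift if use_transposed else self.non_transposed_shift
        )


class Array:
    """An n x n grid of cells with vertical and horizontal shift chains."""

    def __init__(self, n):
        self.n = n
        self.cells = [[Cell() for _ in range(n)] for _ in range(n)]

    def shift_vertical(self, values):
        """One cycle: existing values move down one row; new values enter at the top."""
        for r in range(self.n - 1, 0, -1):
            for c in range(self.n):
                self.cells[r][c].non_transposed_shift = (
                    self.cells[r - 1][c].non_transposed_shift
                )
        for c in range(self.n):
            self.cells[0][c].non_transposed_shift = values[c]

    def shift_horizontal(self, values):
        """One cycle: existing values move right one column; new values enter at the left."""
        for c in range(self.n - 1, 0, -1):
            for r in range(self.n):
                self.cells[r][c].transposed_shift = self.cells[r][c - 1].transposed_shift
        for r in range(self.n):
            self.cells[r][0].transposed_shift = values[r]

    def latch(self, use_transposed):
        for row in self.cells:
            for cell in row:
                cell.latch(use_transposed)

    def weights(self):
        return [[cell.weight_matrix_reg for cell in row] for row in self.cells]

    def matvec(self, x):
        """Multiply the latched weights by a vector (row sums computed directly)."""
        return [sum(cell.weight_matrix_reg * v for cell, v in zip(row, x))
                for row in self.cells]


W = [[1, 2], [3, 4]]
x = [1, 1]
array = Array(2)

# Load W through the vertical (non-transposed) chain, last row first.
for row in reversed(W):
    array.shift_vertical(row)
array.latch(use_transposed=False)
plain = array.weights()
y_plain = array.matvec(x)

# Shift the same stream in horizontally while the old weights stay in use.
for row in reversed(W):
    array.shift_horizontal(row)
y_during = array.matvec(x)  # unchanged: the shift registers are not yet latched

array.latch(use_transposed=True)
transposed = array.weights()
y_transposed = array.matvec(x)
```

Feeding the same stream of values from the perpendicular direction leaves the transpose of `W` in the weight matrix registers, and the multiplication results do not change until the new weights are latched, which mirrors the background shifting described in the claims.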
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021131278A JP7135181B2 (ja) | 2017-05-17 | 2021-08-11 | Low latency matrix multiply unit |
JP2022138332A JP7444936B2 (ja) | 2017-05-17 | 2022-08-31 | Low latency matrix multiply unit |
JP2024024596A JP2024063060A (ja) | 2017-05-17 | 2024-02-21 | Low latency matrix multiply unit |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762507766P | 2017-05-17 | 2017-05-17 | |
US62/507,766 | 2017-05-17 | ||
PCT/US2018/033261 WO2018213628A1 (en) | 2017-05-17 | 2018-05-17 | Low latency matrix multiply unit |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021131278A Division JP7135181B2 (ja) | 2017-05-17 | 2021-08-11 | Low latency matrix multiply unit |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2020516991A (ja) | 2020-06-11 |
JP6929958B2 (ja) | 2021-09-01 |
Family
ID=62815117
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019553237A Active JP6929958B2 (ja) | 2017-05-17 | 2018-05-17 | Low latency matrix multiply unit |
JP2021131278A Active JP7135181B2 (ja) | 2017-05-17 | 2021-08-11 | Low latency matrix multiply unit |
JP2022138332A Active JP7444936B2 (ja) | 2017-05-17 | 2022-08-31 | Low latency matrix multiply unit |
JP2024024596A Pending JP2024063060A (ja) | 2017-05-17 | 2024-02-21 | Low latency matrix multiply unit |
Family Applications After (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021131278A Active JP7135181B2 (ja) | 2017-05-17 | 2021-08-11 | Low latency matrix multiply unit |
JP2022138332A Active JP7444936B2 (ja) | 2017-05-17 | 2022-08-31 | Low latency matrix multiply unit |
JP2024024596A Pending JP2024063060A (ja) | 2017-05-17 | 2024-02-21 | Low latency matrix multiply unit |
Country Status (8)
Country | Link |
---|---|
US (8) | US10635740B2 (ja) |
EP (4) | EP3757823B1 (ja) |
JP (4) | JP6929958B2 (ja) |
KR (1) | KR102302608B1 (ja) |
CN (4) | CN116661732A (ja) |
BR (2) | BR112019023395B1 (ja) |
TW (5) | TWI771155B (ja) |
WO (2) | WO2018213635A1 (ja) |
-
2018
- 2018-05-17 EP EP20188875.7A patent/EP3757823B1/en active Active
- 2018-05-17 US US15/983,043 patent/US10635740B2/en active Active
- 2018-05-17 JP JP2019553237A patent/JP6929958B2/ja active Active
- 2018-05-17 EP EP18738023.3A patent/EP3526683B1/en active Active
- 2018-05-17 BR BR112019023395-4A patent/BR112019023395B1/pt active IP Right Grant
- 2018-05-17 KR KR1020197026961A patent/KR102302608B1/ko active IP Right Grant
- 2018-05-17 WO PCT/US2018/033270 patent/WO2018213635A1/en unknown
- 2018-05-17 TW TW110130201A patent/TWI771155B/zh active
- 2018-05-17 US US15/983,037 patent/US10698974B2/en active Active
- 2018-05-17 WO PCT/US2018/033261 patent/WO2018213628A1/en unknown
- 2018-05-17 BR BR112019022916-7A patent/BR112019022916A2/pt unknown
- 2018-05-17 EP EP20198533.0A patent/EP3800563B1/en active Active
- 2018-05-17 TW TW112131703A patent/TW202349233A/zh unknown
- 2018-05-17 CN CN202310556021.0A patent/CN116661732A/zh active Pending
- 2018-05-17 CN CN202310303331.1A patent/CN116414350A/zh active Pending
- 2018-05-17 EP EP18737049.9A patent/EP3500945B1/en active Active
- 2018-05-17 CN CN201880004576.1A patent/CN109997132B/zh active Active
- 2018-05-17 TW TW111127382A patent/TWI816475B/zh active
- 2018-05-17 TW TW109105071A patent/TW202024961A/zh unknown
- 2018-05-17 CN CN201880004328.7A patent/CN109937416B/zh active Active
- 2018-05-17 TW TW107116872A patent/TWI685757B/zh active
-
2019
- 2019-08-01 US US16/529,662 patent/US10698976B2/en active Active
-
2020
- 2020-03-26 US US16/830,894 patent/US11500961B2/en active Active
- 2020-06-29 US US16/915,286 patent/US10970362B2/en active Active
-
2021
- 2021-03-23 US US17/210,293 patent/US11599601B2/en active Active
- 2021-08-11 JP JP2021131278A patent/JP7135181B2/ja active Active
-
2022
- 2022-08-31 JP JP2022138332A patent/JP7444936B2/ja active Active
- 2022-11-10 US US17/985,069 patent/US11989259B2/en active Active
-
2023
- 2023-02-17 US US18/111,468 patent/US11907330B2/en active Active
-
2024
- 2024-02-21 JP JP2024024596A patent/JP2024063060A/ja active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6929958B2 (ja) | Low latency matrix multiply unit | |
JP2022106737A (ja) | Performing matrix multiplication in hardware |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2019-10-17 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2019-10-17 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
2020-11-12 | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007 |
2020-12-01 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
2021-02-15 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| TRDD | Decision of grant or rejection written | |
2021-07-13 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
2021-08-11 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 6929958 Country of ref document: JP Free format text: JAPANESE INTERMEDIATE CODE: R150 |