TW305998B - Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode - Google Patents

Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode

Info

Publication number
TW305998B
TW305998B
Authority
TW
Taiwan
Prior art keywords
value
input
neural network
weight
aforementioned
Prior art date
Application number
TW85107139A
Other languages
Chinese (zh)
Inventor
Jyh-Dar Chiue
Hwai-Tzuu Jang
Yeh-Rong Shyu
Fenq-Lin Yang
Jong-Jyh Jang
Original Assignee
Ind Tech Res Inst
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ind Tech Res Inst filed Critical Ind Tech Res Inst
Priority to TW85107139A priority Critical patent/TW305998B/en
Application granted granted Critical
Publication of TW305998B publication Critical patent/TW305998B/en

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

An artificial neural network structure includes: 1) plural APUs (arithmetic processing units) that accept the input variables in bit-serial mode and multiply the weights with the input variables to obtain the output variables Yi = A(Σj Wij · Xj), which are also output in bit-serial mode; 2) plural switching boxes with dynamic routing capability. The APUs and switching boxes are connected into a two-dimensional-array network structure.

Description

Artificial neural networks (ANNs) can be classified, according to their learning (encoding) and recall (decoding) schemes, into four types: feedforward supervised learning networks, feedback supervised learning networks, feedforward unsupervised learning networks, and feedback unsupervised learning networks (see Figure 1).

The present invention provides a weighted-sum ANN architecture with bit-serial input/output and a reconfigurable network topology. Besides being easy to reconfigure and supporting scalable chip-to-chip connection, it also reduces the chip pin count. The hardware architecture is a dedicated design for feedforward ANN algorithms, such as multilayer perceptron (MLP, BP) networks and the adaptive linear element networks ADALINE and MADALINE; its arithmetic processing units efficiently execute the forward inference computation of such algorithms. There is no prior art identical to this invention, i.e., a reconfigurable, cascadable ANN implemented with bit-serial input/output. With the support of a development system and VLSI technology, a commercial family of ANN products could be launched quickly, realizing the goal of putting ANNs into hardware.

The forward inference computation of each neuron i in an ANN is characterized by Equation (1) (see Figure 2):

Yi = A( Σ_{j=1}^{N} F(Wij, Xj) )    (1)

where Xj is input variable j of neuron i, Wij is the synapse weight of neuron i associated with input j, Yi is the output signal of neuron i, and A(·) is an activation function that transforms the net input of the network function. The activation function A(·) may be a sigmoid function, a Gaussian function, or a linear function. It may also be a step function, i.e., A(k) = 1 when k ≥ 0 and A(k) = −1 when k < 0; or a square function, i.e., A(k) = −1 when |k| > δ and A(k) = 1 when |k| ≤ δ, where δ is a constant. The activation function A(·) is not limited to the forms listed above. The term F(Wij, Xj) may be the weight product Wij · Xj or the difference-square (Wij − Xj)². The ANN architecture of this invention is designed specifically for the weighted-sum network function Σ_{j=1}^{N} Wij · Xj.

Existing ANN hardware architectures can be roughly divided into the following categories:

(I) One- or multi-dimensional arrays of single processors, digital signal processors, or floating-point processors that simulate ANN execution. Because these are not dedicated ANN hardware, digital chips of this kind reach execution speeds on the order of only hundreds of thousands of connection operations per second. U.S. Patent 5,204,938 discloses an example of this category.
(II) Networks in which the synapse weight values and input values are interleaved to form a two-dimensional multiplier array, generally fabricated as VLSI hardware with current-summing analog techniques. Their execution speed is on the order of billions of connection operations per second, but they require additional A/D or D/A converters to exchange data signals with other digital systems. U.S. Patent 5,087,826 discloses an example of this category.

(III) If a network of the preceding type were built entirely from digital circuits, its hardware cost would be excessive, so the network must be mapped onto a streamlined architecture, using techniques such as pipelining and time interleaving to accelerate the ANN computation; execution speed can reach millions of connection operations per second. U.S. Patent 5,091,864 discloses an example of this category.

The bit-serial architecture of the present invention belongs to category (III). Besides disclosing the structure of a dedicated bit-serial processing element (PE, whose function corresponds to the functional definition of a neuron), it also discloses a plurality of switch boxes that, together with a plurality of weight multiply-accumulate processing units, form a two-dimensional array. Each switch box has a dynamic routing capability similar to that of commercial FPGAs, so network topologies such as MLP, BP, ADALINE, and MADALINE are easily mapped onto the array, and architectural reconfiguration is easy to achieve. Moreover, when chips are actually fabricated, die-size limits force the ANN to be partitioned across several chips that are then interconnected. With the bit-serial I/O architecture provided by this invention, inter-chip data transfer is bit-serial and therefore not limited by the chip pin count, making it convenient to expand the network topology and reconfigure the architecture. In the disclosed embodiments, because the neuron functions are connected bit-serially, the hardware design is simple and chip area is saved.

Each of the invention's weight multiply-accumulate processing units i, i = 1, 2, ..., I, operates in bit-serial I/O mode. The unit receives the input variables Xj (j = 1, 2, ..., N) bit-serially, multiplies each Xj by the synapse weight Wij stored in the unit, following the bit order of Xj, to obtain the weight product Wij · Xj, and accumulates the products.

Accumulating all N weight products yields the weighted sum Σ_{j=1}^{N} Wij · Xj. This accumulated value is then transformed by the activation function A(·) stored in the processing unit to obtain Yi = A(Σ_{j=1}^{N} Wij · Xj), which is finally output bit-serially.
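As a reading aid, the following minimal Python sketch models the behavior of Equation (1) for one neuron; the function and variable names are illustrative and not part of the patent.

```python
# Behavioral sketch of Equation (1) for one neuron (illustrative only).
import math

def step(k):
    # Step activation: A(k) = 1 when k >= 0, else -1.
    return 1 if k >= 0 else -1

def square(k, delta=1.0):
    # Square activation: A(k) = 1 when |k| <= delta, else -1.
    return 1 if abs(k) <= delta else -1

def sigmoid(k):
    return 1.0 / (1.0 + math.exp(-k))

def neuron_forward(weights, inputs, activation=step):
    # Weighted-sum network function: net = sum_j Wij * Xj, then Yi = A(net).
    net = sum(w * x for w, x in zip(weights, inputs))
    return activation(net)

# Example: N = 3 inputs into one neuron.
print(neuron_forward([0.5, -1.0, 0.25], [1, 2, 4]))  # net = -0.5, step -> -1
```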

Brief Description of the Drawings

Figure 1 shows the classification of ANN models.
Figure 2 is a schematic diagram of the computation performed by an artificial neuron.
Figure 3 is a schematic diagram of the reconfigurable, bit-serial, weighted-sum ANN chip architecture of the invention.
Figure 4 is a schematic diagram of the bit-serial weight multiply-accumulate processing unit of the invention.
Figures 5(a) and 5(b) are schematic diagrams of a network topology embodiment of the invention; unconnected routing is omitted from the drawings.
Detailed Description of the Invention

The functions generally performed on an ANN chip are the following:
(I) multiplication and accumulation (MAC), or difference-square and accumulation (DSA);
(II) the activation function of neurons, for example a sigmoid or Gaussian function;
(III) storage of synapse weights, either in analog or in digital form;
(IV) a mechanism for adjusting weights, if the ANN chip has learning or adaptation capability.

For items (I) and (II), the parts that involve mathematical operations or signal processing, when implemented with digital circuits, do not differ much from ordinary digital signal-processing circuits. However, because an ANN is a massively parallel computation that requires a very large amount of arithmetic, a digital implementation may be slower than an analog one. Analog ANN chips, on the other hand, are difficult to commercialize: their process parameters are hard to control under temperature variations, so chip quality is unstable, and such chips have generally remained laboratory publications.

For a general-purpose ANN integrated circuit there is a further important issue: the architecture. Since a single chip may have to execute many different ANN topologies, how to quickly reconfigure the hardware architecture to execute different topologies efficiently is a problem every general-purpose ANN chip must address. One objective of the present invention is an architecture designed for exactly this kind of reconfiguration.

Most conventional digital ANN chips take words as the input unit and operate bit-parallel, with higher complexity and larger chip area. For example, suppose N is the number of input variables of the input layer and the chip holds I neurons in total. The input variables Xj (j = 1, 2, ..., N) are broadcast as words of r bits, while the synapse weights Wij (i = 1, 2, ..., I) are input in parallel as words of b bits (through I ports), so I neurons can compute simultaneously. Because X is broadcast, a number of clocks proportional to N is needed to complete the network-function computation of the I neurons. Such digital ANN chips form their input and output ports in word form, so the bus width is large.
Reconfiguration then requires word multiplexers to construct the signals, so circuit complexity is also high. Moreover, if one neuron's output is to be connected to another neuron's input through a crossbar switch, with r bits per word, a total of [number of output-layer neurons I] × [number of input-layer neurons N] × [number of bits r] switch boxes is required; as the number of neurons in the network grows, the number of switch boxes becomes enormous.

Furthermore, in a digital ANN chip the I/O bandwidth of the input variables Xj is far smaller than that of the synapse weights Wij, because each Xj is reused I times in the weighted-sum computation of Equation (2):

Σ_{j=1}^{N} (Wij · Xj),  i = 1, 2, ..., I    (2)

Because the processing units of the present invention transfer their input and output variables bit-serially, the neuron I/O bandwidth is greatly reduced and the number of switch boxes can be decreased.

Suppose an ANN has N input variables and the weighted sums of I neurons must be computed; take N = 20, I = 80, and 8-bit resolution as an example. If the ANN chip holds only 10 neurons (equivalent to 10 MAC functions) and one input variable Xj (j = 1, 2, ..., N) is applied at a time, the chip completes the weighted-sum computation of 10 neurons after 20 clocks. After another 7 × 20 clocks, the weighted sums of the remaining 70 neurons are complete, but the 20 values Xj have then been re-input 8 times. If instead Xj is input bit-serially and the synapse weights Wij are applied in parallel (assumed built into the chip), a chip of the same area can hold about 80 serial-parallel multiply-accumulators (SPMACs). Such a chip executes the computation of 80 neurons simultaneously, so all the weighted sums are completed within 8 × 20 = 160 clocks, the same number of clocks as the original bit-parallel (word-unit) input scheme with its 10 neurons.

Figure 3 shows the bit-serial, reconfigurable, weighted-sum ANN chip (100) of the invention. The architecture is a regular two-dimensional array combining a plurality of switch boxes (10) and a plurality of bit-serial weight multiply-accumulate processing units (20). Each switch box s, s = 1, 2, ..., S, can be programmed, or fused under control, into various connection patterns to realize the ANN topologies to be executed; the inter-chip data flow can then be chained bit-serially, so the chip has FPGA-like reconfigurability, is easy to cascade, and is highly scalable.
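A short script reproduces the clock-count arithmetic of the N = 20, I = 80 example above; the variable names are ours, chosen only for illustration.

```python
# Clock-count comparison for the N = 20, I = 80, 8-bit example (illustrative).
N, I, r = 20, 80, 8
NEURONS_PER_WORD_PARALLEL_CHIP = 10

# Word-parallel chip: N clocks per batch of 10 neurons, I/10 batches,
# and the N inputs are re-broadcast once per batch (8 times in total).
word_parallel_clocks = (I // NEURONS_PER_WORD_PARALLEL_CHIP) * N   # 8 * 20

# Bit-serial chip: ~80 SPMACs fit in the same area and run in parallel,
# but each input word is streamed in over r clocks.
bit_serial_clocks = r * N                                          # 8 * 20

assert word_parallel_clocks == bit_serial_clocks == 160
```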
As shown in the schematic diagram of Figure 4, the bit-serial weight multiply-accumulate processing unit (20) of the ANN has a function definition equivalent to that of a neural-network neuron. This processing unit (20) (i.e., neuron i) includes a serial-input parallel-output multiply-accumulator (22), abbreviated SPMAC, which mainly uses add and shift operations to complete the weight multiply-accumulate computation Σ_{j=1}^{N} Wij · Xj step by step in bit-serial fashion.

The output of the SPMAC (22) is transformed by the activation function unit (24) and output in parallel to the shift register (26). Finally, the shift register (26) outputs in bit-serial mode to the next connected processing unit, or to a bit-serial output pin of the ANN chip.

Since the invention executes Equation (1) in bit-serial I/O mode, the multiply-accumulate function must be designed as serial-input parallel-output. The composition of the multiply-accumulator (22) is likewise shown in Figure 4.

The processing unit (20) (neuron i) comprises: a serial-input parallel-output multiply-accumulator (SPMAC) (22), responsible for the weight multiply-accumulate operation, which receives the input variables Xj (j = 1, 2, ..., N) bit-serially (201), multiplies each Xj by the synapse weight Wij stored in the SPMAC in the bit order of Xj to compute the weight product Wij · Xj, accumulates the N products, and outputs the accumulated weighted sum in parallel (202); a parallel-I/O activation function unit (24), which converts the input weighted sum into the value Yi and outputs it (203); and a shift register (26), which takes the parallel input Yi (203) and outputs it bit-serially (204).

The multiply-accumulator (22) inside neuron i (the processing unit) operates in serial-input parallel-output mode. It contains a plurality of AND gates (223), one for each bit of the synapse weight. At each clock, every AND gate combines the serially presented bit (201) of the r-bit input variable Xj (j = 1, 2, ..., N) with one bit (222) of the b-bit synapse weight Wij (i = 1, 2, ..., I; j = 1, 2, ..., N) stored in the weight memory (221), and the AND results are output together (224). The adder (225) of the multiply-accumulator (22) then adds, clock by clock, this input (224) to the value (228) fed back from the shifter (227) (initially 0) and outputs the sum (226) to the shifter (227). While a weight product is being computed, the shifter (227) shifts right by 1 bit each clock before feeding back (228) to the adder (225); when the weight product is finished and is about to be accumulated, the shifter first shifts left by (r − 1) bits and then completes the accumulation. Following this clock sequence, the N input variables Xj are multiplied bit-serially by the synapse weights Wij to obtain the weight products, which are accumulated.
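The shift-and-add datapath just described can be checked with a small bit-exact Python model. This is a sketch under our own naming, not the patent's circuit: it keeps the accumulator pre-scaled by 2^(r−1) so no low-order bits are lost during the right shifts (in the real datapath those bits remain in the low half of a wide register, and the final left shift by r − 1 restores the alignment).

```python
# Bit-exact sketch of the SPMAC serial-parallel multiply-accumulate
# (illustrative Python model; names are ours, not the patent's).

def serial_parallel_mac(weights, inputs, r):
    """Compute sum_j Wij * Xj with each Xj fed in LSB-first bit order."""
    total = 0
    for w, x in zip(weights, inputs):
        acc = 0                             # shifter register, scaled by 2**(r-1)
        for k in range(r):                  # r clocks per input word
            x_bit = (x >> k) & 1            # serial input bit (201)
            gated = w if x_bit else 0       # AND-gate array output (224)
            acc += gated << (r - 1)         # adder: AND output + feedback (228)
            if k < r - 1:
                acc >>= 1                   # shifter: right shift between adds
        # The patent's "shift left by (r - 1) bits before accumulating" is
        # already folded into the 2**(r-1) scale factor, so acc == w * x here.
        total += acc
    return total

# Example: three synapses, 8-bit inputs.
W, X = [3, 5, 2], [10, 7, 200]
assert serial_parallel_mac(W, X, r=8) == sum(w * x for w, x in zip(W, X))  # 465
```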

The final accumulated weighted sum Σ_{j=1}^{N} Wij · Xj is then output in parallel (202).

In the topology mapping embodiment of Figures 5(a) and 5(b), the layers of the ANN are mapped in order onto the ANN chip; after mapping, each layer forms a group, and the groups use the switch boxes to route the bit-serial data flows between them. Figure 5(b) draws only the connected routing; for clarity of explanation, unused processing units and switch boxes are omitted from the drawing. In this embodiment the inter-chip data flow is chained bit-serially.

If the neurons (i.e., processing units) within one ANN chip are insufficient, a partitioned mapping must be applied. Topology partitioning follows from physical limits: the number of neurons (processing units) one ANN chip can hold is bounded, so if the actual topology requires more neurons than the chip provides, the network must be partitioned. For example, in the ANN topology of Figure 5(a), the input layer has N inputs (30), the first hidden layer has H neurons (40), the second hidden layer has L neurons (50), and the output layer has K neurons (60). If an ANN chip can map only (H + L) neurons, and K < (H + L), the network must be partitioned: the first ANN integrated circuit holds (H + L) neurons, covering the first and second hidden layers of the topology, and the K neurons of the output layer are placed in a second ANN chip. The individual ANN chips remain connected bit-serially, so the arrangement is easy to cascade and highly scalable, the hardware design is simple, and chip area is saved.

Finally, the embodiments above are discussed for illustration only; those of ordinary skill in the art may obtain other embodiments without departing from the scope and spirit of the following claims.
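To restate the Figure 5 partitioning rule in executable form, here is a minimal sketch; the greedy helper and all names are hypothetical illustrations, not part of the claimed design.

```python
# Illustrative layer-to-chip partitioning for the Figure 5 example
# (hypothetical helper; not part of the claimed design).

def partition_layers(layer_sizes, chip_capacity):
    """Pack whole layers, in order, onto chips of a given neuron capacity."""
    chips, current, used = [], [], 0
    for size in layer_sizes:
        if size > chip_capacity:
            raise ValueError("a single layer exceeds one chip's capacity")
        if used + size > chip_capacity:
            chips.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    chips.append(current)
    return chips

# Hidden layers of H and L neurons, output layer of K neurons,
# chip capacity H + L, with K < H + L (the input layer holds no neurons).
H, L, K = 30, 20, 10
print(partition_layers([H, L, K], chip_capacity=H + L))
# -> [[30, 20], [10]]: hidden layers on chip 1, output layer on chip 2,
#    with the two chips linked bit-serially.
```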

Claims (1)

1. An artificial neural network architecture, comprising:
a plurality of processing units, each processing unit i (i = 1, 2, ..., I) receiving input variables Xj (j = 1, 2, ..., N) bit-serially, multiplying each Xj by the synapse weight Wij stored in the processing unit, following the bit order of Xj, to compute the weight product Wij · Xj, accumulating the N products to obtain a weighted sum, and converting this accumulated value through the activation function A(·) stored in the processing unit into Yi = A(Σ_{j=1}^{N} Wij · Xj), which is output bit-serially; and
a plurality of switch boxes with dynamic routing capability;
wherein the plurality of processing units and the plurality of switch boxes are arranged and combined in a two-dimensional array to form the neural network.

2. The neural network architecture of claim 1, wherein the plurality of switch boxes can be programmed into various connection patterns to reconfigure different neural network topologies.

3. The neural network architecture of claim 1, wherein the plurality of switch boxes can be fused under control into various connection patterns to reconfigure different neural network topologies.

4. The neural network architecture of claim 1, 2, or 3, wherein each of the plurality of processing units comprises:
a serial-input parallel-output multiply-accumulator (SPMAC) that receives the aforementioned input variables Xj bit-serially, multiplies each Xj by the synapse weight Wij stored in the SPMAC following the bit order of Xj, performs the aforementioned weight multiply-accumulate operation, and outputs the accumulated weighted sum in parallel;
a parallel-I/O activation function unit storing the aforementioned activation function A(·), which converts the parallel-input weighted sum into the value Yi, output in parallel; and
a shift register that outputs the parallel-input Yi bit-serially.

5. The neural network architecture of claim 4, wherein the multiply-accumulator comprises:
a memory for storing the aforementioned synapse weights;
a plurality of AND gates, their number corresponding to the number of bits of the synapse weight, each AND gate combining at each clock the serially presented current bit of the variable Xj (j = 1, 2, ..., N) with one bit of the synapse weight, the AND results being output in parallel to an adder;
the adder, which adds this input to the value fed back from a shifter and outputs the sum to the shifter; and
the shifter, which shifts the value received from the adder and feeds it back to the adder;
whereby the products are computed in bit-serial order and accumulated N times to obtain the weighted sum Σ_{j=1}^{N} Wij · Xj, which the shifter then outputs in parallel.

6. The neural network architecture of claim 5, wherein during the aforementioned weight multiply-accumulate operation the shifter shifts right by 1 bit at each clock, and before accumulating a finished weight product Wij · Xj it first shifts left by the number of bits of the input variable Xj minus 1, then completes the accumulation.

7. The neural network architecture of claim 1, 2, or 3, wherein the input and output are bit-serial.

8. The neural network architecture of claim 7, wherein each of the plurality of processing units comprises:
a serial-input parallel-output multiply-accumulator (SPMAC) that receives the aforementioned input variables Xj bit-serially, multiplies each Xj by the synapse weight Wij stored in the SPMAC following the bit order of Xj, performs the aforementioned weight multiply-accumulate operation, and outputs the accumulated weighted sum in parallel;
a parallel-I/O activation function unit storing the aforementioned activation function A(·), which converts the parallel-input weighted sum into the value Yi, output in parallel; and
a shift register that outputs the parallel-input Yi bit-serially.

9. The neural network architecture of claim 8, wherein the multiply-accumulator comprises:
a memory for storing the aforementioned synapse weights;
a plurality of AND gates, their number corresponding to the number of bits of the synapse weight, each AND gate combining at each clock the serially presented current bit of the variable Xj with one bit of the synapse weight, the results being output to an adder;
the adder, which adds this input to the value fed back from a shifter and outputs the sum to the shifter; and
the shifter, which shifts the value received from the adder and feeds it back to the adder;
whereby the products are computed in bit-serial order and accumulated N times to obtain the weighted sum, which the shifter then outputs in parallel.

10. The neural network architecture of claim 9, wherein during the aforementioned weight multiply-accumulate operation the shifter shifts right by 1 bit at each clock, and before accumulating a finished weight product Wij · Xj it first shifts left by the number of bits of the input variable Xj minus 1, then completes the accumulation.

11. The neural network architecture of claim 1, wherein the activation function A(·) may be a sigmoid function, a Gaussian function, a linear function, a step function, or a square function.

12. The neural network architecture of claim 1, wherein the number of bits of the input variable Xj may differ from the number of bits of the synapse weight Wij.

13. The neural network architecture of claim 1, wherein the number of bits of the input variable Xj may be the same as the number of bits of the synapse weight Wij.
TW85107139A 1996-06-14 1996-06-14 Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode TW305998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW85107139A TW305998B (en) 1996-06-14 1996-06-14 Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW85107139A TW305998B (en) 1996-06-14 1996-06-14 Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode

Publications (1)

Publication Number Publication Date
TW305998B true TW305998B (en) 1997-05-21

Family

ID=51566030

Family Applications (1)

Application Number Title Priority Date Filing Date
TW85107139A TW305998B (en) 1996-06-14 1996-06-14 Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode

Country Status (1)

Country Link
TW (1) TW305998B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650303B2 (en) 2017-02-14 2020-05-12 Google Llc Implementing neural networks in fixed point arithmetic computing systems
TWI728230B (en) * 2017-02-14 2021-05-21 Google LLC Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems
US11868864B2 (en) 2017-02-14 2024-01-09 Google Llc Implementing neural networks in fixed point arithmetic computing systems
US10896367B2 (en) 2017-03-07 2021-01-19 Google Llc Depth concatenation using a matrix computation unit
TWI739225B (en) * 2017-03-07 2021-09-11 Google LLC Method and computer storage medium for performing neural network computations, and neural network system

Similar Documents

Publication Publication Date Title
US5131072A (en) Neurocomputer with analog signal bus
US5506998A (en) Parallel data processing system using a plurality of processing elements to process data and a plurality of trays connected to some of the processing elements to store and transfer data
CN109800876B (en) Data operation method of neural network based on NOR Flash module
CN205139973U (en) BP neural network based on FPGA device founds
JP2679738B2 (en) Learning processing method in neurocomputer
US5542054A (en) Artificial neurons using delta-sigma modulation
CN107633298B (en) Hardware architecture of recurrent neural network accelerator based on model compression
JPH04290155A (en) Parallel data processing system
US11922169B2 (en) Refactoring mac operations
KR102396447B1 (en) Deep learning apparatus for ANN with pipeline architecture
EP4024283A1 (en) Method and apparatus for processing data, and related product
CN112989273A (en) Method for carrying out memory operation by using complementary code
TW305998B (en) Weighted sum type of artificial neural network with reconfigurable structure and bit-serial I/O mode
US12020141B2 (en) Deep learning apparatus for ANN having pipeline architecture
JP2001117900A (en) Neural network arithmetic device
JP2760170B2 (en) Learning machine
International Neural Network Society (INNS), the IEEE Neural Network Council Cooperating Societies et al. Exploiting the inherent parallelism of artificial neural networks to achieve 1300 million interconnects per second
CN115879530A (en) Method for optimizing array structure of RRAM (resistive random access memory) memory computing system
JPH076146A (en) Parallel data processing system
CN115099395A (en) Neural network construction method, device, equipment and medium
JPH04182769A (en) Digital neuro processor
US20240054330A1 (en) Exploitation of low data density or nonzero weights in a weighted sum computer
JP2677656B2 (en) Central control system for neurocomputer
CN116523011B (en) Memristor-based binary neural network layer circuit and binary neural network training method
CN114997392B (en) Architecture and architectural methods for neural network computing

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees