RU2011221C1

RU2011221C1 - Device for multiplying matrixes

Info

Publication number: RU2011221C1
Authority: RU
Inventors: В.П. Якуш; Н.А. Лиходед; П.И. Соболевский; В.В. Косьянчук
Original assignee: Якуш Виктор Павлович
Priority date: 1991-07-03
Filing date: 1991-07-03
Publication date: 1994-04-15

Abstract

FIELD: computer engineering. SUBSTANCE: the device has n computational modules, where n is the order of the matrices to be multiplied. Each computational module has two groups of registers, four registers, a multiplier, an adder, two flip-flops, two groups of AND gates, a group of OR gates, an AND gate and a NOT gate. EFFECT: reduced hardware costs; increased speed of operation. 3 dwg, 1 tbl

Description

The invention relates to computer engineering and can be used for matrix multiplication in high-performance special-purpose computers and signal processing devices.

FIG. 1 shows a block diagram of the matrix multiplication device; FIG. 2 shows the block diagram of the device for n = 4; FIG. 3 shows a functional diagram of a computational module.

The matrix multiplication device contains first 1, second 2 and third 3 data inputs, a logic-zero input 4, first 5 and second 6 mode-setting inputs, a clock input 7, computational modules 8_i (i = 1, ..., n) and an output 9.

Computational module 8 (FIG. 3) contains first 10, second 11 and third 12 data inputs, first 13 and second 14 mode-setting inputs, a clock input 15, a multiplier 16, an adder 17, a first group of registers 18, a second group of registers 19, first 20, second 21, third 22 and fourth 23 registers, first 24 and second 25 flip-flops, first 26 and second 27 groups of AND gates, a group of OR gates 28, an AND gate 29, a NOT gate 30, first 31, second 32 and third 33 data outputs, and first 34 and second 35 mode-setting outputs.

The operation of the device is based on an algorithm for multiplying two (n × n) matrices that uses the recurrence relations

$$c_{ij}^{(0)} = 0, \quad i, j = 1, \ldots, n;$$
$$c_{ij}^{(k)} = c_{ij}^{(k-1)} + a_{ik} b_{kj}, \quad k, i, j = 1, \ldots, n;$$
$$c_{ij} = c_{ij}^{(n)}, \quad i, j = 1, \ldots, n.$$

Computational module 8 (FIG. 3) can implement the following functions:

$$V^{j+1} = \alpha^j, \qquad W^{j+1} = \beta^j, \qquad A^{j+2} = a^j, \qquad B^{j+l} = b^j,$$
$$C^{j+1} = c^j + d^j e^j,$$

with $d^j$ and $e^j$ selected, in accordance with the four operating modes described below, as

$$d^j = \begin{cases} a^j, & \beta^j = 1, \\ a^{j-p}, & \beta^j = 0, \end{cases} \qquad e^j = \begin{cases} b^j, & \alpha^j = 1, \\ b^{j-n}, & \alpha^j = 0, \end{cases}$$

where $\alpha^j$ and $\beta^j$ are the values of the control signals at the first and second mode-setting inputs of the computational module, respectively, on the j-th clock cycle;
$V^{j+1}$ and $W^{j+1}$ are the values of the control signals at the first and second mode-setting outputs of the computational module, respectively, on the (j + 1)-th clock cycle;
$a^j$, $b^j$ and $c^j$ are the values of the numbers at the second, first and third data inputs of the computational module, respectively, on the j-th clock cycle;
$A^j$, $B^j$ and $C^j$ are the values of the numbers at the first, second and third data outputs of the computational module, respectively, on the j-th clock cycle;
$p = 0, \ldots, n-1$ is a parameter determined by the algorithm.
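The recurrence relations above can be illustrated in software. The following is a minimal sketch in Python, not a model of the hardware: the function name `multiply_by_recurrence` and the list-of-lists matrix representation are assumptions made only for illustration. It accumulates the partial sums c_ij^(k) in place, one value of k at a time.

```python
def multiply_by_recurrence(a, b):
    """Multiply two n x n matrices using c_ij^(k) = c_ij^(k-1) + a_ik * b_kj.

    Hypothetical helper for illustration; matrices are lists of lists.
    """
    n = len(a)
    # c_ij^(0) = 0 for all i, j
    c = [[0] * n for _ in range(n)]
    # After the iteration for a given k, c[i][j] holds c_ij^(k+1) in
    # 1-based notation (k runs over 0..n-1 here).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][k] * b[k][j]
    # c_ij = c_ij^(n)
    return c


product = multiply_by_recurrence([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For the 2 × 2 operands used here, `product` is `[[19, 22], [43, 50]]`. The loop over k is outermost to mirror the superscript (k) in the recurrence; the device itself distributes these accumulations over n modules and clock cycles.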

Computational module 8 operates in four modes, which are selected by the combination of control signals α and β applied to inputs 13 and 14, respectively.

In all modes, element $b^j$ is fed to input 10, delayed by registers 18 for l clock cycles and delivered to output 31 on the (j + l + 1)-th cycle; element $a^j$ is fed to input 11, delayed by registers 20 and 22 and delivered to output 32 on the (j + 2)-th cycle; the control signals α and β are delayed for one clock cycle by flip-flops 24 and 25, respectively, and delivered to outputs 34 and 35; and adder 17 forms the value c + a·b at its output (element c is fed to input 12).
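The fixed delays described above behave like simple shift registers. The following is a minimal Python sketch under stated assumptions: the class `DelayLine` and its `clock` method are hypothetical names, registers are assumed to start at zero, and only the two-cycle delay of a and the one-cycle delay of the control signals are modelled (the b chain would be the same structure with its own depth).

```python
from collections import deque


class DelayLine:
    """Hypothetical shift-register model: a value clocked in on cycle j
    is returned by the call made on cycle j + depth."""

    def __init__(self, depth, fill=0):
        # Registers start in the zero state (assumption matching the
        # initial state described in the text).
        self._regs = deque([fill] * depth, maxlen=depth)

    def clock(self, value):
        out = self._regs[-1]          # content of the last register
        self._regs.appendleft(value)  # shift; the oldest value is dropped
        return out


a_path = DelayLine(depth=2)      # two registers in series: A^{j+2} = a^j
alpha_path = DelayLine(depth=1)  # one flip-flop: V^{j+1} = alpha^j

a_out = [a_path.clock(x) for x in [10, 20, 30, 40]]
alpha_out = [alpha_path.clock(x) for x in [1, 0, 1, 1]]
```

Here `a_out` is `[0, 0, 10, 20]` and `alpha_out` is `[0, 1, 0, 1]`, which shows each stream re-emerging after its register depth.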

In the first mode, (α, β) = (1, 1). Element $b^j$ is written into register 19₁ through the AND gates of group 26 and the OR gates of group 28; element $a^j$ is written into register 21, because AND gate 29 is open and the write into register 21 takes place on the trailing edge of the clock pulse; adder 17 forms the value $c^j + a^j b^j$, which is delivered to output 33.

In the second mode, (α, β) = (1, 0). Element $b^j$ is written into register 19₁. Register 21 holds the element $a^{j-p}$ ($p = 0, \ldots, n-1$), written earlier on the (j − p)-th cycle. Adder 17 forms the value $c^j + a^{j-p} b^j$.

In the third mode, (α, β) = (0, 1). A logic-one signal is formed at the output of NOT gate 30 and opens the AND gates of group 27; element $b^{j-n}$ from the output of the n-th register 19ₙ is written into register 19₁ through the AND gates of group 27 and the OR gates of group 28. Adder 17 forms the value $c^j + a^j b^{j-n}$.

In the fourth mode, (α, β) = (0, 0). Element $b^{j-n}$ from the n-th register 19ₙ is written into register 19₁. Register 21 holds the element $a^{j-p}$. Adder 17 forms the value $c^j + a^{j-p} b^{j-n}$.
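The four modes can be summarised as a choice of operands for the multiply-accumulate step. The following is a behavioural Python sketch only, under stated assumptions: the class `ModuleSketch` and its `step` method are hypothetical names, the second register group 19 is modelled as an n-deep ring, register 21 as a single held value, and the pipeline latency of the individual registers is ignored.

```python
from collections import deque


class ModuleSketch:
    """Hypothetical behavioural model of operand selection in one module."""

    def __init__(self, n):
        self.ring = deque([0] * n, maxlen=n)  # stands in for registers 19_1..19_n
        self.a_held = 0                       # stands in for register 21

    def step(self, alpha, beta, a, b, c):
        # alpha = 1: take the fresh b; alpha = 0: recirculate b from n steps ago.
        b_used = b if alpha == 1 else self.ring[-1]
        # beta = 1: latch and use the fresh a; beta = 0: reuse the held a.
        if beta == 1:
            self.a_held = a
        a_used = self.a_held
        # The selected b is written into the first position of the ring.
        self.ring.appendleft(b_used)
        return c + a_used * b_used


m = ModuleSketch(n=2)
results = [
    m.step(1, 1, a=2, b=3, c=0),    # mode 1: c + a*b
    m.step(1, 0, a=9, b=4, c=1),    # mode 2: c + held_a*b
    m.step(0, 1, a=5, b=99, c=0),   # mode 3: c + a*b_from_ring
    m.step(0, 0, a=7, b=8, c=10),   # mode 4: c + held_a*b_from_ring
]
```

With these inputs `results` is `[6, 9, 15, 30]`: the third step reuses the b value 3 stored two steps earlier, and the fourth step combines the held a value 5 with the recirculated b value 4.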

Consider the operation of the device.

In the initial state, all registers and flip-flops of the computational modules 8 are set to zero. Inputs 1, 2 and 3 are supplied, respectively, with the elements $b_{kj}$ ($j = \ldots$, $k = \ldots$), $a_{ik}$ ($i = \ldots$, $k = \ldots$) and $b_{kj}$ ($j = \ldots$, $k = \ldots$) at the corresponding times:

$$t = -ni - k + n\lceil n/2 \rceil - 2n + 1, \quad i, k = \ldots;$$
$$t = nk + j + n\lceil n/2 \rceil - 1, \quad j = \ldots,\ k = \ldots;$$
$$t = nk + j + n\lceil n/2 \rceil - n^2 - 2, \quad j = 1, \ldots, n,\ k = \ldots, n.$$

Input 4 is constantly supplied with a zero value.

Inputs 5 and 6 are supplied with the control signals $\tau_{ij} = (\alpha, \beta)$ in the form of a matrix (matrix not reproduced).

The elements $\tau_{ij}$ are supplied at the times

$$t = ni + j + n\lceil n/2 \rceil - 2n - 1.$$

At output 9, the elements $c_{ij}$ are formed at the times

$$t = ni + j + n\lceil n/2 \rceil - n - 2.$$
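The timing expressions for the control signals and the results can be tabulated directly. The following is a minimal Python sketch, assuming that the bracketed term in the formulas is n/2 rounded up; the function names `control_time` and `output_time` are hypothetical. It evaluates the schedule for n = 4, the case shown in FIG. 2.

```python
import math


def control_time(i, j, n):
    """Cycle on which tau_ij is applied to inputs 5 and 6 (1-based i, j)."""
    return n * i + j + n * math.ceil(n / 2) - 2 * n - 1


def output_time(i, j, n):
    """Cycle on which c_ij appears at output 9 (1-based i, j)."""
    return n * i + j + n * math.ceil(n / 2) - n - 2


n = 4
schedule = {
    (i, j): output_time(i, j, n)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
}
first_cycle = schedule[(1, 1)]
last_cycle = schedule[(n, n)]
```

For n = 4 the results leave the device on consecutive cycles, from `first_cycle == 7` to `last_cycle == 22`, in row-major order, and each c_ij appears n − 1 cycles after its control signal τ_ij is applied.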

The last element $c_{nn}$ is formed on cycle $(3/2 \cdot n^2 - n/2 - 1)$ for odd n and on cycle $(3/2 \cdot n^2 - 2)$ for even n.

FIG. 2 shows the organization of the input and output streams for n = 4. The table lists the states of the registers and flip-flops and the values at the outputs of adders 17 and at outputs 33 of computational modules 8₁, 8₂, 8₃ and 8₄ when computing the elements c_ij for n = 4.

(56) Kung H. T., Leiserson C. E. Systolic Arrays (for VLSI). Sparse Matrix Proc. 1978, Society for Industrial and Applied Mathematics, 1979, p. 262, fig. 3-2.

USSR Author's Certificate No. 1619305, cl. G 06 F 15/347, 1991.

Claims

A DEVICE FOR MATRIX MULTIPLICATION, comprising n computational modules (n being the order of the matrices to be multiplied), each of which comprises a first group of registers, first, second and third registers, a multiplier, an adder, a first flip-flop and a first group of AND gates, wherein the first and second data inputs and the first mode-setting input of the first computational module are connected respectively to the first and second data inputs and the first mode-setting input of the device; the first data output of the i-th computational module (i = …, where ⌈·⌉ denotes a number rounded up to the nearest integer) is connected to the first data input of the (i + 1)-th computational module; the second data output and the first mode-setting output of the j-th computational module (j = 1, ..., n − 1) are connected respectively to the second data input and the first mode-setting input of the (j + 1)-th computational module; the third data output of the n-th computational module is connected to the output of the device, the clock input of which is connected to the clock inputs of all the computational modules; in each of the computational modules, the second data input of the module is connected to the data input of the first register, whose output is connected to the data input of the second register, whose output is connected to the second data output of the module; the first mode-setting input of the module is connected to the data input of the first flip-flop, whose output is connected to the first mode-setting output of the module; the output of the multiplier is connected to the first-addend input of the adder; and the clock input of the module is connected to the clock inputs of the first and second registers, of the first flip-flop and of the registers of the first group; characterized in that, in order to reduce hardware costs and increase speed, each of the computational modules additionally comprises a second group of registers, a fourth register, a second flip-flop, a second group of AND gates, an AND gate and a NOT gate, wherein the third data input and the second mode-setting input of the first computational module are connected respectively to the logic-zero input and the second mode-setting input of the device; the third data output and the second mode-setting output of the j-th computational module are connected respectively to the third data input and the second mode-setting input of the (j + 1)-th computational module; the first data input of the n-th computational module is connected to the third data input of the device; the first data input of the m-th computational module (m = …) is connected to the first data output of the (m + 1)-th computational module; in each computational module, the first data input of the module is connected to the first inputs of the AND gates of the first group and to the data input of the first register of the first group; the output of the K-th register of the first group (K = …; l = n + 1 for the computational modules from the first to the …-th, and l = n − 1 for the computational modules from the …-th to the n-th) is connected to the data input of the (K + 1)-th register of the first group; the output of the l-th register of the first group is connected to the first data output of the module; the second data input of the module is connected to the data input of the third register, whose output is connected to the first input of the multiplier, whose second input is connected to the output of the first register of the second group; the third data input of the module is connected to the data input of the fourth register, whose output is connected to the second-addend input of the adder, whose output is connected to the third data output of the module; the first mode-setting input is connected to the input of the NOT gate and to the second inputs of the AND gates of the first group, whose outputs are connected to the first inputs of the OR gates of the group, whose second inputs are connected to the outputs of the AND gates of the second group, whose first inputs are connected to the output of the n-th register of the second group; the output of the j-th register of the second group is connected to the data input of the (j + 1)-th register of the second group; the outputs of the OR gates of the group are connected to the data inputs of the first register of the second group; the second mode-setting input of the module is connected to the data input of the second flip-flop and to the first input of the AND gate, whose output is connected to the clock input of the third register; the output of the second flip-flop is connected to the second mode-setting output of the module; the output of the NOT gate is connected to the second inputs of the AND gates of the second group; and the clock input of the module is connected to the clock inputs of the registers of the second group, of the fourth register and of the second flip-flop and to the second input of the AND gate.