CN109902795B - Artificial intelligent module and system chip with processing unit provided with input multiplexer - Google Patents


Info

Publication number
CN109902795B
Authority
CN
China
Prior art keywords
data
multiplexer
module
processing unit
input
Prior art date
Legal status
Active
Application number
CN201910104131.7A
Other languages
Chinese (zh)
Other versions
CN109902795A (en)
Inventor
连荣椿
王海力
马明
Current Assignee
Jingwei Qili Beijing Technology Co ltd
Original Assignee
Jingwei Qili Beijing Technology Co ltd
Priority date
2019-02-01
Filing date
2019-02-01
Publication date
2023-05-23
2019-02-01 Application filed by Jingwei Qili Beijing Technology Co ltd
2019-02-01 Priority to CN201910104131.7A
2019-06-18 Publication of CN109902795A
Application granted
2023-05-23 Publication of CN109902795B
Status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Logic Circuits (AREA)

Abstract

An artificial intelligence (AI) module, and a system chip containing it, are provided in which each processing unit has an input multiplexer. In an embodiment, the AI module comprises a plurality of processing units arranged in a two-dimensional array along a first dimension and a second dimension, each processing unit being able to perform logic and/or multiply-add operations. Each processing unit comprises an enable input that receives an enable signal and suspends or starts the operation of the processing unit depending on that signal. Each processing unit further comprises at least one input multiplexer, which receives input data from different directions along the first dimension and/or the second dimension and selects one of them for processing by the processing unit. All processing units in the two-dimensional array operate on the same shared clock signal, and the first dimension and the second dimension are perpendicular to each other. The AI module and system chip of the embodiments of the invention provide a more diverse range of AI module structures, so that more complex operations can be executed.

Description

Artificial intelligent module and system chip with processing unit provided with input multiplexer
Technical Field
The invention relates to the field of integrated circuits, and in particular to an artificial intelligence (AI) module, and a system chip containing it, in which each processing unit is provided with an input multiplexer.
Background
In recent years, artificial intelligence has developed in successive waves. Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It mainly covers the principles by which computers realize intelligence and the construction of computers that resemble human-brain intelligence, so that computers can support higher-level applications.
As artificial intelligence research deepens and its applications become widespread, better AI modules are needed.
In addition, an AI module is typically accessed by the processor over a bus. Because the bus bandwidth is limited, such an architecture has difficulty meeting the high bandwidth requirements of an AI module.
Disclosure of Invention
According to a first aspect, there is provided a chip circuit comprising an AI module, the AI module comprising a plurality of processing units arranged in a two-dimensional array along a first dimension and a second dimension, each processing unit being able to perform logic and/or multiply-add operations; wherein each processing unit comprises an enable input for receiving an enable signal and suspending or starting the operation of the processing unit in dependence on the enable signal; each processing unit further comprises at least one input multiplexer, which receives input data from different directions along the first dimension and/or the second dimension and selects one of them for processing by the processing unit; all processing units in the two-dimensional array operate on the same shared clock signal; and the first dimension and the second dimension are perpendicular to each other.
Preferably, the processing unit comprises an arithmetic unit, an adder, a first register, a second register, the at least one input multiplexer, and a fourth multiplexer. The arithmetic unit operates on first data and second data received from the at least one input multiplexer to obtain an operation result; the fourth multiplexer selects, as its output, either the output of the first register or third data received from the at least one input multiplexer; the adder adds the operation result to the output of the fourth multiplexer, and the resulting sum is registered in the first register; the first data is also registered in the second register and output via the second register's output.
Preferably, the at least one input multiplexer comprises a first multiplexer; the first multiplexer selects the first data from the plurality of ports of the first data input.
Preferably, the at least one input multiplexer comprises a second multiplexer; the second multiplexer selects the second data from the plurality of ports of the coefficient input.
Preferably, the at least one input multiplexer comprises a third multiplexer; the third multiplexer selects third data from the plurality of ports of the second data input.
Preferably, the arithmetic unit performs the operation specified by a broadcast operation code.
According to a second aspect, there is provided a system-on-chip comprising: the chip circuit of the first aspect; and an FPGA module coupled to the AI module to send data to, or receive data from, the AI module.
Preferably, the AI module is embedded in the FPGA module so as to reuse (multiplex) the routing architecture of the FPGA module, so that data sent to or received from the AI module travels over the FPGA's routing architecture in both directions.
The AI module and system chip of the embodiments of the invention provide a more diverse range of AI module structures, so that more complex operations can be executed.
Drawings
FIG. 1 is a schematic diagram of a 2-dimensional AI module in accordance with an embodiment of the invention;
FIG. 2 is a schematic diagram of a processing unit;
FIG. 3 is a schematic diagram of another processing unit;
FIG. 4 is a schematic diagram of the architecture of a system chip integrated with FPGA and AI modules;
FIG. 5 is a schematic diagram of the structure of the FPGA circuit.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
In the description of the present application, the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate an orientation or positional relationship based on that shown in the drawings, merely for convenience of description and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application.
Fig. 1 is a schematic diagram of a two-dimensional AI module according to an embodiment of the invention. As shown in Fig. 1, the AI module is a two-dimensional array, for example one comprising 4x4 processing units PE. The array has two dimensions, a first dimension and a second dimension, which are perpendicular to each other. For convenience, the horizontal dimension is referred to below as the first dimension and the vertical dimension as the second dimension. Take a first, a second, and a third processing unit as examples. The first and second processing units are adjacent along the second dimension and have the same first-dimension value, and the second output of the first processing unit is coupled to the second input of the second processing unit. The first and third processing units are adjacent along the first dimension and have the same second-dimension value, and the first output of the first processing unit is coupled to the first input of the third processing unit.
The processing unit further comprises at least one input multiplexer; the at least one input multiplexer is configured to receive input data in different directions along the first dimension and/or the second dimension and to select one of the data for processing. After being input to the processing unit, the data undergoes various operations in the processing unit, such as addition, subtraction, multiplication, division, logical operations, and the like.
Thus, data, both before and after an operation, may flow in either direction along the first dimension: for example, it may enter from the left side of the array and pass in sequence, under the same clock, through each processing unit with the same second-dimension value, or enter from the right side of the array and pass through those processing units in sequence. Data obtained after an operation may flow in either direction along the second dimension: for example, it may enter from above the array and pass in sequence, under the same clock, through each processing unit with the same first-dimension value, or enter from below the array and pass through those processing units in sequence.
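As a rough illustration of this flow, the following minimal cycle-level sketch models one row of four PEs, assuming each PE simply latches its neighbour's value on every clock edge. The function `shift_row` and the `"east"`/`"west"` direction labels are hypothetical names chosen for this sketch, not taken from the patent.

```python
from typing import List, Optional


def shift_row(row: List[Optional[int]], incoming: Optional[int], direction: str) -> List[Optional[int]]:
    """Advance one clock: every PE in the row latches its neighbour's value.

    "east": data enters at the left edge and moves right.
    "west": data enters at the right edge and moves left.
    """
    if direction == "east":
        return [incoming] + row[:-1]
    if direction == "west":
        return row[1:] + [incoming]
    raise ValueError(f"unknown direction: {direction}")


# A row of four PEs sharing the same second-dimension value, initially empty.
row: List[Optional[int]] = [None] * 4
for value in [10, 20, 30]:
    row = shift_row(row, value, "east")
print(row)  # [30, 20, 10, None]
```

After three clocks the first value has travelled three PEs to the right. Flow along the second dimension would use the same shift applied to a column instead of a row.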
Each processing unit in the two-dimensional array shares the same clock signal for operation.
It should be noted that each data line in Fig. 1 may represent either a single-bit signal or a multi-bit signal, for example one that is 8, 16, or 32 bits wide.
In one example, a two-dimensional array may implement matrix multiplication. In another example, the two-dimensional array may implement a convolution algorithm. Of course, other arithmetic and logical operations may also be performed by the two-dimensional array.
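The patent does not say how matrix multiplication is mapped onto the array. One common mapping, assumed here purely for illustration, is an output-stationary systolic schedule. Each row of A enters from the left and each column of B enters from the top, both skewed by one clock per row or column. Every PE multiplies the two operands it currently holds, adds the product into its own running sum (the role REG1 plays), and passes A east and B south. `systolic_matmul` is a hypothetical name, and the sketch accepts only square n x n inputs.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Simulate an n x n output-stationary PE array computing C = A @ B."""
    n = len(a)
    if len(b) != n or any(len(r) != n for r in a + b):
        raise ValueError("square n x n inputs expected")
    acc = [[0] * n for _ in range(n)]    # per-PE running sum (REG1-like)
    a_reg = [[0] * n for _ in range(n)]  # A operand latched in each PE
    b_reg = [[0] * n for _ in range(n)]  # B operand latched in each PE
    for t in range(3 * n - 2):           # clocks until the last operands meet
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:               # left edge: row i of A, delayed i clocks
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:                    # otherwise take A from the west neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:               # top edge: column j of B, delayed j clocks
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:                    # otherwise take B from the north neighbour
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b      # all PEs update on the same clock edge
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The skew ensures that, at clock t, PE(i, j) holds a[i][k] and b[k][j] with k = t - i - j. Each PE therefore accumulates exactly the terms of its own dot product, and the results never have to move during the computation.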
Fig. 2 is a schematic diagram of a processing unit. As shown in Fig. 2, the processing unit includes an arithmetic unit ALU and an adder ADD. The processing unit may be provided with at least one MUX at its input ports, such as a first multiplexer MUX1 at DI, a second multiplexer MUX2 at CI, and a third multiplexer MUX3 at PI. In one example, first data arriving from the different directions of the first and second dimensions (north, east, south, and west; NESW) is input at the first data input port DI, and MUX1 gates one of them through. In another example, second data arriving from the NESW directions is input at the coefficient input port CI, and MUX2 gates one of them through. In yet another example, third data arriving from the NESW directions is input at the second data input port PI, and MUX3 gates one of them through. The selections of MUX1, MUX2, and MUX3 are determined by configuration signals. In one example, the first data may be the raw input data for an AI operation, the second data may be weight data for the AI operation, and the third data may be operation results from other processing units of the AI module.
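A minimal sketch of how a static configuration might steer the three input multiplexers. It assumes each MUX chooses among exactly four directional inputs. The `Dir` encoding, the `input_mux` helper, and the `config` dictionary are hypothetical illustrations; the patent does not define the configuration format.

```python
from enum import IntEnum
from typing import Dict


class Dir(IntEnum):
    """The four neighbour directions (the 2-bit encoding is a hypothetical choice)."""
    N = 0
    E = 1
    S = 2
    W = 3


def input_mux(ports: Dict[Dir, int], select: Dir) -> int:
    """Gate exactly one directional input through, as MUX1, MUX2 or MUX3 would."""
    return ports[select]


# Hypothetical per-PE configuration: one selection per input multiplexer.
config = {"MUX1": Dir.W, "MUX2": Dir.N, "MUX3": Dir.N}

di_ports = {Dir.N: 1, Dir.E: 2, Dir.S: 3, Dir.W: 4}  # candidates at DI (raw data)
ci_ports = {Dir.N: 5, Dir.E: 6, Dir.S: 7, Dir.W: 8}  # candidates at CI (weights)
pi_ports = {Dir.N: 0, Dir.E: 0, Dir.S: 0, Dir.W: 0}  # candidates at PI (partial results)

first_data = input_mux(di_ports, config["MUX1"])   # value arriving from the west
second_data = input_mux(ci_ports, config["MUX2"])  # value arriving from the north
third_data = input_mux(pi_ports, config["MUX3"])   # value arriving from the north
```

Because the selections are set by configuration rather than per clock, the same physical array can be reconfigured so that data streams in from any edge.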
At the ALU, the first data and the second data are operated on. The operations include arithmetic operations, logical operations, and the like. The adder ADD then adds the operation result to the third data (or gates it through), and the resulting sum (or gated value) is registered in register REG1. At the next clock, the sum S is output via the first output PO. The output value may be transmitted to neighboring PEs in any direction.
Of course, the first data may also be registered in register REG2 and output via the second output DO under clock control. This output value may likewise be transmitted to neighboring PEs in any direction.
The processing unit further includes a fourth MUX coupled between the third MUX and ADD. One input of the fourth MUX is coupled to the output of the third MUX, the other input is coupled to the output of REG1, and its output is coupled to the ADD input that would otherwise receive the output of the third MUX. Thus, the sum held in REG1, which is derived from the ALU result, can be fed back to the ADD input through the fourth MUX and accumulated repeatedly.
The enable signal EN starts or pauses the processing unit, and the clock CK paces its processing.
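The Fig. 2 datapath can be sketched at the cycle level as follows. This is a minimal model, assuming the ALU multiplies, the inputs have already passed through MUX1 to MUX3, and MUX4 is a static select bit. `ProcessingUnit`, its `accumulate` flag, and the `clock` method are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    """Cycle-level sketch of the Fig. 2 datapath (ALU assumed to multiply)."""
    reg1: int = 0             # REG1: registered sum, drives output PO
    reg2: int = 0             # REG2: registered first data, drives output DO
    accumulate: bool = False  # MUX4 select: True feeds REG1 back, False selects third data

    def clock(self, first: int, second: int, third: int, en: bool = True) -> None:
        """Apply one CK edge with the already-selected DI, CI and PI values."""
        if not en:                       # EN low: the PE is paused and holds its state
            return
        alu_result = first * second      # ALU operation (multiply assumed)
        mux4_out = self.reg1 if self.accumulate else third
        self.reg1 = alu_result + mux4_out  # ADD output latched into REG1
        self.reg2 = first                  # first data latched into REG2

    @property
    def po(self) -> int:
        return self.reg1

    @property
    def do(self) -> int:
        return self.reg2


# Accumulate a dot product through the MUX4 feedback path.
mac = ProcessingUnit(accumulate=True)
for x, w in zip([1, 2, 3], [4, 5, 6]):
    mac.clock(first=x, second=w, third=0)
mac.clock(first=9, second=9, third=0, en=False)  # paused: state unchanged

# Single multiply-add using third data from a neighbouring PE.
single = ProcessingUnit()
single.clock(first=2, second=3, third=10)  # 2 * 3 + 10 latched into REG1
```

With feedback enabled, `mac.po` ends at 1*4 + 2*5 + 3*6 = 32, and the paused clock edge has no effect. With feedback disabled, the PE performs one multiply-add on the third data instead.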
Fig. 3 is a schematic diagram of another processing unit. The functions of the ALU include, but are not limited to, addition, subtraction, multiplication, division, and logic operations. Fig. 3 differs from Fig. 2 in that an operation code is broadcast to the processing units of the AI module, and each ALU performs the function corresponding to that operation code.
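A minimal sketch of the broadcast idea: one opcode is written to every PE, and each ALU then applies the corresponding function. The opcode numbering, the `OPCODES` table, `BroadcastPE`, and `broadcast` are all hypothetical, since the patent does not define an encoding. For brevity, division is omitted from the table.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical opcode table; the patent does not specify encodings.
OPCODES: Dict[int, Callable[[int, int], int]] = {
    0: operator.add,
    1: operator.sub,
    2: operator.mul,
    3: operator.and_,
    4: operator.or_,
    5: operator.xor,
}


class BroadcastPE:
    """A PE whose ALU function is selected by the most recently broadcast opcode."""

    def __init__(self) -> None:
        self.op: Callable[[int, int], int] = OPCODES[2]  # default: multiply

    def load_opcode(self, opcode: int) -> None:
        self.op = OPCODES[opcode]

    def alu(self, a: int, b: int) -> int:
        return self.op(a, b)


def broadcast(pes: List[BroadcastPE], opcode: int) -> None:
    """Drive the same opcode to every PE, as a shared broadcast line would."""
    for pe in pes:
        pe.load_opcode(opcode)


grid = [BroadcastPE() for _ in range(16)]  # a 4x4 array, flattened
broadcast(grid, 1)                          # every PE now subtracts
results = [pe.alu(7, 3) for pe in grid]
```

A single broadcast retargets the entire array at once. This avoids reconfiguring each PE individually, which appears to be the point of the Fig. 3 variant.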
Fig. 4 is a schematic diagram of the architecture of a system chip integrated with FPGA and AI modules. As shown in Fig. 4, at least one FPGA circuit and at least one AI module are integrated on a system chip. The AI module may be the AI module shown in Fig. 1.
Each of the at least one FPGA circuit may implement various functions such as logic, computation, and control. The FPGA implements combinational logic using small look-up tables (e.g., 16x1 RAM). Each look-up table is connected to the input of a D flip-flop, which in turn drives other logic circuits or I/O. Together these form basic logic cell modules that implement both combinational and sequential logic functions, and the modules are connected to one another or to the I/O modules by metal wires. The FPGA's logic is defined by loading programming data into internal static memory cells. The stored values determine the logic function of each logic cell and the connections between modules, or between modules and I/O, and thus ultimately determine the functions the FPGA implements.
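To make the look-up-table idea concrete, here is a minimal sketch of a 4-input LUT stored as a 16x1 RAM, followed by a D flip-flop. Programming the LUT means writing the function's truth table into the 16 memory bits, and evaluating it means using the four inputs as the read address. The bit ordering and the names `program_lut4`, `lut4_read`, and `DFlipFlop` are assumptions for this illustration only.

```python
from typing import Callable, List


def program_lut4(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Build the 16 configuration bits of a 4-input LUT (a 16x1 RAM).

    Bit k holds func(a, b, c, d), where k = a | b << 1 | c << 2 | d << 3.
    """
    return [func(k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1) & 1 for k in range(16)]


def lut4_read(bits: List[int], a: int, b: int, c: int, d: int) -> int:
    """Combinational read: the four inputs form the RAM address."""
    return bits[a | (b << 1) | (c << 2) | (d << 3)]


class DFlipFlop:
    """Registers the LUT output on a clock edge, giving sequential behaviour."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int) -> None:
        self.q = d


xor4 = program_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)  # 4-input parity
ff = DFlipFlop()
ff.clock(lut4_read(xor4, 1, 0, 1, 1))  # registered output
```

Any 4-input Boolean function fits in the same 16 bits. The programming data alone decides whether this cell acts as a parity gate, an AND gate, or anything else.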
An interface corresponding to the AI module is also provided on the system chip, and the FPGA module communicates with the AI module through this interface module. The interface module may be a routing (e.g., XBAR) module composed, for example, of a plurality of selectors (multiplexers) and selection bits. The interface module may also be a FIFO (first in, first out). The interface module may also be a synchronizer consisting, for example, of two flip-flops (FF) in series.
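The two-flip-flop synchronizer can be sketched behaviourally as follows. Both flops are clocked by the receiving side's clock, and each edge shifts the input one stage further. This is a minimal sketch: `TwoFlopSynchronizer` is a hypothetical name, and a software model like this cannot represent metastability, which is the actual reason for the second flop.

```python
from typing import List


class TwoFlopSynchronizer:
    """Two D flip-flops in series, clocked by the destination clock domain."""

    def __init__(self) -> None:
        self.ff1 = 0  # first stage: samples the asynchronous input
        self.ff2 = 0  # second stage: the synchronized output

    def clock(self, async_in: int) -> int:
        # Both flops update on the same edge: FF2 takes FF1's previous value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
outputs: List[int] = [sync.clock(x) for x in [1, 1, 1, 0, 0]]
```

Each input change shows up at the output one destination clock later than it was first sampled by FF1. This latency is the price paid for giving FF1 a full cycle to settle.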
The FPGA module and the AI module may be arranged side by side, in which case the FPGA module can transfer data to the AI module and provide control for it. Alternatively, the AI module may be embedded in the FPGA module, in which case the AI module reuses (multiplexes) the routing architecture of the FPGA module and receives and transmits data over it.
Fig. 5 is a schematic diagram of the structure of the FPGA circuit. As shown in Fig. 5, the FPGA circuit may include a plurality of programmable logic modules (LOGIC, also referred to as PLBs), embedded memory blocks (EMBs), multiply-accumulate (MAC) modules, and the like, together with the corresponding routing (XBAR). The FPGA circuit is also provided with related resources such as clock/configuration modules (backbone). Where an EMB or MAC module is required, it takes the place of several PLB modules, because its area is much larger than that of a PLB.
The routing resource XBAR is the interconnection point between modules and is distributed uniformly throughout the FPGA module. All resources in the FPGA module (PLB, EMB, MAC, and IO) are routed to one another through identical XBAR routing units. In terms of routing, the whole array is uniform: the regularly arranged XBAR units form a grid that connects all modules in the FPGA.
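An XBAR unit can be pictured as a set of multiplexers, one per output, each steered by stored selection bits. The sketch below assumes a simple full crossbar in which every output may select any input. `XbarUnit`, `configure`, `route`, and `config_bits` are hypothetical names, and real FPGA switch boxes are typically sparser than a full crossbar.

```python
from typing import List


class XbarUnit:
    """Hypothetical full crossbar: each output is a MUX over all inputs,
    steered by a static selection value loaded at configuration time."""

    def __init__(self, n_inputs: int, n_outputs: int) -> None:
        self.n_inputs = n_inputs
        self.select: List[int] = [0] * n_outputs

    def configure(self, output: int, input_index: int) -> None:
        if not 0 <= input_index < self.n_inputs:
            raise ValueError("selection out of range")
        self.select[output] = input_index

    def route(self, inputs: List[int]) -> List[int]:
        """Combinational routing: each output copies its selected input."""
        return [inputs[s] for s in self.select]

    def config_bits(self) -> int:
        """ceil(log2(n_inputs)) selection bits per output (0 if only one input)."""
        return len(self.select) * (self.n_inputs - 1).bit_length()


xbar = XbarUnit(n_inputs=4, n_outputs=3)
xbar.configure(0, 2)
xbar.configure(1, 0)
xbar.configure(2, 2)
routed = xbar.route([10, 20, 30, 40])
```

Because every XBAR unit in the grid is identical, the same configuration scheme repeats across the whole array. The routing is then determined entirely by the selection bits loaded into each unit.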
The LOGIC module may contain, for example, eight 6-input look-up tables and 18 registers. The EMB module may be, for example, one 36-kbit memory or two 18-kbit memories. The MAC module may be, for example, one 25x18 multiplier or two 18x18 multipliers. The proportions of LOGIC, MAC, and EMB modules in the FPGA array are not limited, and the size of the array is likewise determined at design time by the needs of the actual application.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit its scope. Any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A chip circuit comprising an artificial intelligence AI module, the AI module comprising: a plurality of processing units PE arranged in a two-dimensional array along a first dimension and a second dimension, each processing unit being able to perform logic and/or multiply-add operations; wherein the processing unit comprises an enable input for receiving an enable signal and suspending or starting operation of the processing unit in dependence on the enable signal; the processing unit further comprises at least one input multiplexer MUX1, MUX2, MUX3; the input multiplexer is used for receiving input data from different directions in the first dimension and/or the second dimension and selecting one of the input data for processing by the processing unit; each processing unit in the two-dimensional array operates on the same shared clock signal; and the first dimension and the second dimension are perpendicular to each other.
2. The chip circuit according to claim 1, wherein the processing unit comprises an arithmetic unit ALU, an adder ADD, a first register REG1, a second register REG2, the at least one input multiplexer, and a fourth multiplexer MUX4; the arithmetic unit is used for operating on first data and second data from the at least one input multiplexer to obtain an operation result; the fourth multiplexer MUX4 selects and outputs either the output data of the first register or third data from the at least one input multiplexer; the adder adds the operation result and the output of the fourth multiplexer, and the resulting sum is registered in the first register REG1; the first data is also registered in the second register and output via its output.
3. The chip circuit of claim 2, wherein the at least one input multiplexer comprises a first multiplexer MUX1; the first multiplexer selects the first data from the plurality of ports of the first data input.
4. The chip circuit of claim 2, wherein the at least one input multiplexer comprises a second multiplexer MUX2; the second multiplexer selects the second data from the plurality of ports of the coefficient input.
5. The chip circuit of claim 2, wherein the at least one input multiplexer comprises a third multiplexer MUX3; the third multiplexer selects third data from the plurality of ports of the second data input.
6. The chip circuit according to claim 2, wherein the arithmetic unit performs an operation determined by a broadcast operation code.
7. A system-on-chip, comprising: the chip circuit according to any one of claims 1 to 6;
and an FPGA module coupled to the AI module to send data to, or receive data from, the AI module.
8. The system chip of claim 7, wherein the AI module is embedded in the FPGA module to multiplex the routing architecture of the FPGA module, so that data is sent to or received from the AI module, in both cases via the multiplexed routing architecture of the FPGA.
CN201910104131.7A 2019-02-01 2019-02-01 Artificial intelligent module and system chip with processing unit provided with input multiplexer Active CN109902795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910104131.7A CN109902795B (en) 2019-02-01 2019-02-01 Artificial intelligent module and system chip with processing unit provided with input multiplexer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910104131.7A CN109902795B (en) 2019-02-01 2019-02-01 Artificial intelligent module and system chip with processing unit provided with input multiplexer

Publications (2)

Publication Number Publication Date
CN109902795A (en) 2019-06-18
CN109902795B (en) 2023-05-23

Family

ID=66944716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910104131.7A Active CN109902795B (en) 2019-02-01 2019-02-01 Artificial intelligent module and system chip with processing unit provided with input multiplexer

Country Status (1)

Country Link
CN (1) CN109902795B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7412591B2 (en) * 2005-06-18 2008-08-12 Industrial Technology Research Institute Apparatus and method for switchable conditional execution in a VLIW processor
US9830150B2 (en) * 2015-12-04 2017-11-28 Google Llc Multi-functional execution lane for image processor
CN106406813B (en) * 2016-08-31 2019-01-29 宁波菲仕电机技术有限公司 General-purpose servo control arithmetic logic unit
US10838910B2 (en) * 2017-04-27 2020-11-17 Falcon Computing Systems and methods for systolic array design from a high-level program
CN107578098B (en) * 2017-09-01 2020-10-30 中国科学院计算技术研究所 Neural network processor based on systolic array

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
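Zhang Jun; Wang Feiyue. Knowledge-programmable intelligent chip ***: concepts, architecture and prospects. Pattern Recognition and Artificial Intelligence, 2018, no. 10, full text. *
Shi Yuxia. Research on artificial intelligence chip technology. Telecommunications Network Technology, 2016, no. 12, full text. *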

Also Published As

Publication number Publication date
CN109902795A (en) 2019-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant