CN113517009A - Storage and calculation integrated intelligent chip, control method and controller - Google Patents

Publication number
CN113517009A
Authority
CN
China
Prior art keywords
storage
integrated
calculation
bus
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110645465.2A
Other languages
Chinese (zh)
Inventor
梁龙飞
陈小刚
阿西木约麦尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai New Helium Brain Intelligence Technology Co ltd
Original Assignee
Shanghai New Helium Brain Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai New Helium Brain Intelligence Technology Co ltd filed Critical Shanghai New Helium Brain Intelligence Technology Co ltd
Priority to CN202110645465.2A priority Critical patent/CN113517009A/en
Publication of CN113517009A publication Critical patent/CN113517009A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/12 Bit line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, equalising circuits, for bit lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C8/00 Arrangements for selecting an address in a digital store
    • G11C8/08 Word line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, for word lines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Logic Circuits (AREA)

Abstract

The invention provides a storage and calculation integrated intelligent chip, a control method and a controller, wherein the storage and calculation integrated intelligent chip is provided with standard storage bus interfaces, the storage bus interfaces are very easy to mount on an external bus of a processor when a system is upgraded, and the system is not required to be greatly changed when the storage bus interfaces are used; the intelligent chip comprises a storage and calculation integrated array area, and the calculation acceleration of the artificial neural network can be realized through an analog signal circuit; the intelligent chip can convert an access instruction transmitted by the storage bus into a configuration read-write instruction of a storage-computation integrated array area, a control instruction of training/prediction computation and a control instruction of von Neumann architecture, and the cooperative operation of the processor and the memory conforms to the most basic von Neumann architecture at present, so that the intelligent chip is easier to realize compared with a heterogeneous computation cooperative architecture of the processor and a computation acceleration chip; a basic operating system is not required to be modified; the nonvolatile characteristic of the system can be fully utilized, and the system is very convenient for realizing multi-process sharing or remote sharing or migration of data in different front-end nodes.

Description

Storage and calculation integrated intelligent chip, control method and controller
Technical Field
The application relates to the technical field of artificial intelligence and storage and calculation integrated technology and chips, in particular to a storage and calculation integrated intelligent chip, a control method and a controller.
Background
With the rapid development of artificial intelligence technology, the scale of artificial neural networks is getting larger and larger, the desire for computing power is getting stronger, the computing efficiency of general-purpose CPUs and GPUs can not meet the demand of artificial neural network computing, and special artificial neural network chips have become a new computing chip and need to work in cooperation with general-purpose CPUs.
The development of nanotechnology promotes a new generation of nonvolatile storage technology represented by phase change storage, bionic synapses are designed by utilizing the characteristic that the resistance of a nanometer device changes under the excitation of electric pulses, and the functions of calculation and weight storage are simultaneously realized on one device by utilizing the nonlinear simulation characteristic of the bionic synapses to form a storage and calculation integrated technology, so that the calculation energy efficiency of an artificial neural network is expected to be improved by multiple orders of magnitude, and the calculation bottleneck of artificial intelligence application is relieved.
However, existing artificial neural network computing chips mainly appear in systems as auxiliary computing accelerators, and their target applications are back-end servers for big data or cloud computing. In such computing, all data involved, including the weight data, are temporary and need no non-volatile storage: the weight array data differ between computing tasks, so they must be retransmitted each time a computation starts and need not be kept after it finishes, and in a machine room with excellent power supply conditions they do not need non-volatile storage either. In addition, the medium storing the weight array data is updated repeatedly during computation, so its write-erase cycle life must be long enough. Only DRAM can meet these requirements.
The storage and calculation integrated technology originates from non-volatile storage technology. Although it can accelerate computation with analog signals and greatly improve energy efficiency, when a storage and calculation integrated device is used as a bionic synapse to store weights, it cannot be updated as fast as DRAM (dynamic random access memory), and its characteristics change after too many updates, which affects computing precision. That is, in such applications the non-volatility of the storage and calculation integrated device is not exploited, while its limited performance and lifetime become obstacles to popularization and application.
Compared with back-end artificial intelligence computing scenarios, intelligent applications in Internet of Things front-end devices are better suited to the storage and calculation integrated technology. Because an Internet of Things front end usually serves a specific function, the neural network it needs in a given scenario is typically an already-trained network that is used for the lifetime of the device and is updated only a few times. An update lifetime of more than one million cycles for the storage and calculation integrated device is therefore entirely sufficient, and its extremely high energy efficiency directly addresses the pain points of front-end devices, namely limited configuration and inconvenient power supply. A storage and calculation integrated chip structure is thus most meaningful when it is designed for front-end computing.
From the perspective of the chip, front-end application requirements differ greatly from those of the back end. Because front-end environments and requirements are diverse, a relatively uniform operating environment like that of a back-end server cannot be provided: operating speed, main control chip, peripherals, interface conditions, functional requirements and so on all vary. The conventional way of using an artificial intelligence chip amounts to a brand-new system, with major changes to both the hardware interface and the software system. If front-end devices had to be changed substantially, the engineering effort would be huge and many specific problems might be impossible to solve, which makes it hard to bring the new technology to market and to integrate it into existing systems. How to design a storage and calculation integrated chip that can adapt to the diversity of front-end applications is therefore an urgent problem to be solved.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present application is to provide a storage and calculation integrated intelligent chip, a control method and a controller, to solve the problem that existing chips are difficult to integrate into existing systems, that is, how to design a storage and calculation integrated chip that can adapt to the diversity of front-end applications.
To achieve the above and other related objects, a first aspect of the present application provides a storage and calculation integrated intelligent chip, including: a storage bus interface for connecting an external processor; a storage and computation integrated controller connected to the storage bus interface; an internal bus connected to the storage and computation integrated controller; an input module connected to the internal bus, which receives input data from the storage and computation integrated controller and converts it into corresponding analog input signals; a storage and calculation integrated array area connected to the input module and the internal bus, which performs prediction calculation of the artificial neural network and outputs an analog output signal containing the prediction calculation result; and an output module connected to the storage and calculation integrated array area and the internal bus, which converts the analog output signals received from the storage and calculation integrated array area into digital signals for the storage and computation integrated controller to read.
In some embodiments of the first aspect of the present application, the storage and computation integrated controller comprises: an external storage bus controller connected to the storage bus interface; an internal bus controller connected to the internal bus to access storage resources through the internal bus; and a processor core provided with an embedded code memory and an embedded operating memory.
In some embodiments of the first aspect of the present application, the storage and calculation integrated array area comprises a plurality of storage and computation integrated units, all of which are connected through a transverse bus and a longitudinal bus.
In some embodiments of the first aspect of the present application, the storage and computation integrated unit comprises: an input switch array, a calculation acceleration array, an output switch array and a cross bus switch array; wherein an input switch matrix is arranged in the input switch array; a weight matrix is arranged in the calculation acceleration array; an output switch matrix is arranged in the output switch array; and a cross bus switch matrix is arranged in the cross bus switch array.
In some embodiments of the first aspect of the present application, the calculation acceleration array is formed by connecting storage and computation integrated devices in rows and columns, with a gating device configured for each storage and computation integrated device according to read-write gating and isolation requirements, and implements analog computation by using an analog circuit and Ohm's law; the storage and computation integrated device comprises at least one of a phase change memory device, a resistive memory device and a magnetic memory device.
In some embodiments of the first aspect of the present application, the calculation acceleration array includes a read-write circuit for reading and writing the resistance values of the storage and computation integrated devices; the storage and computation integrated controller reads and writes the resistance values of the storage and computation integrated devices in the calculation acceleration array through the internal bus, and the storage and computation integrated devices, arranged in a defined order, are mapped to the address space of the internal bus to form the readable and writable weight matrix.
In some embodiments of the first aspect of the present application, the input switch array comprises a set of analog switches, each of the analog switches being controlled by a state of a memory cell of one bit; the storage and computation integrated controller reads and writes the storage units of the analog switches through the internal bus, the storage units are arranged according to a certain sequence and are mapped to the address space of the internal bus, and the readable and writable input switch matrix is formed.
In some embodiments of the first aspect of the present application, the output switch array comprises a set of analog switches, each of the analog switches being controlled by a state of a memory cell of one bit; the storage and computation integrated controller reads and writes the storage units of the analog switches through the internal bus, the storage units are arranged according to a certain sequence and are mapped to the address space of the internal bus, and the readable and writable output switch matrix is formed.
In some embodiments of the first aspect of the present application, the storage and computation integrated controller writes to the output switch matrix via the internal bus to select whether the signal output by the cross bus switch array or the signal output by the calculation acceleration array is output.
In some embodiments of the first aspect of the present application, the cross bus switch array comprises a set of analog switches, each of which is controlled by the state of a one-bit memory cell; the storage and computation integrated controller reads and writes the memory cells of the analog switches through the internal bus, and the memory cells, arranged in a defined order, are mapped to the address space of the internal bus to form a readable and writable cross bus switch matrix.
In some embodiments of the first aspect of the present application, the storage and computation integrated controller writes to the cross bus switch matrix through the internal bus, selecting and connecting a corresponding input signal for each output signal so as to cross-connect the transverse bus and the longitudinal bus.
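The one-bit-per-switch model of the cross bus switch matrix can be illustrated with a minimal sketch. The class name `CrossBusSwitchMatrix`, the row-major bit ordering and the method names are assumptions made for illustration only; the application does not specify a register layout.

```python
class CrossBusSwitchMatrix:
    """Hypothetical model: one bit per analog switch. A set bit at
    (out, inp) means output line `out` is connected to input line `inp`."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # Row-major bit order is an assumption, not taken from the application.
        self.bits = [0] * (n_inputs * n_outputs)

    def write_bit(self, offset, value):
        """Emulates one write over the internal bus to a switch memory cell."""
        self.bits[offset] = 1 if value else 0

    def read_bit(self, offset):
        """Emulates one read over the internal bus from a switch memory cell."""
        return self.bits[offset]

    def connect(self, out, inp):
        """Select exactly one input for an output: clear its row, set one bit."""
        base = out * self.n_inputs
        for i in range(self.n_inputs):
            self.write_bit(base + i, i == inp)

    def route(self, input_signals):
        """Propagate analog values; outputs with no closed switch float (None)."""
        outputs = []
        for out in range(self.n_outputs):
            base = out * self.n_inputs
            selected = [input_signals[i]
                        for i in range(self.n_inputs) if self.bits[base + i]]
            outputs.append(selected[0] if selected else None)
        return outputs


xbar = CrossBusSwitchMatrix(n_inputs=3, n_outputs=2)
xbar.connect(out=0, inp=2)   # output 0 takes its signal from input 2
xbar.connect(out=1, inp=0)   # output 1 takes its signal from input 0
routed = xbar.route([0.1, 0.2, 0.3])
```

In this sketch a write to the matrix is the only way the routing changes, which mirrors the idea that configuration is expressed as ordinary storage accesses.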
In some embodiments of the first aspect of the present application, the input module has a built-in input buffer for buffering input data received from the storage and computation integrated controller.
In some embodiments of the first aspect of the present application, the output module has a built-in output buffer for buffering the data obtained by acquiring and digitizing the analog output signal received from the storage and calculation integrated array area.
To achieve the above and other related objects, a second aspect of the present application provides a storage and computation integrated control method for controlling a storage and calculation integrated array area; the storage and calculation integrated array area comprises a calculation acceleration array, an input switch array, an output switch array and a cross bus switch array which are connected through a transverse bus and a longitudinal bus; the control method comprises any one or a combination of the following: reading and writing the resistance values of the storage and computation integrated devices in the calculation acceleration array, so that the storage and computation integrated devices, arranged in a defined order, are mapped into an address space of an internal bus to form a readable and writable weight matrix; reading and writing the memory cells of the analog switches in the input switch array, so that the memory cells, arranged in a defined order, are mapped to an address space of the internal bus to form a readable and writable input switch matrix; reading and writing the memory cells of the analog switches in the output switch array, so that the memory cells, arranged in a defined order, are mapped to an address space of the internal bus to form a readable and writable output switch matrix; and writing to the cross bus switch matrix to select and connect a corresponding input signal for each output signal, so as to cross-connect the transverse bus and the longitudinal bus.
To achieve the above and other related objects, a third aspect of the present application provides a storage and computation integrated controller, comprising: a processor and a memory; the memory is used for storing a computer program; the processor is used for executing the computer program stored in the memory so as to enable the controller to execute the storage and calculation integrated control method.
As described above, the storage and computation integrated intelligent chip, the control method and the controller of the present application have the following beneficial effects:
1. All processors have storage bus interfaces, so the chip is very easy to mount on the external bus of a processor when a system is upgraded, and the system does not need to be greatly changed.
2. The cooperative operation of the processor and the memory conforms to the most basic von Neumann architecture at present, and is easier to realize compared with a heterogeneous computing cooperative architecture of the processor and a computing acceleration chip.
3. Software supporting the storage and calculation integrated chip based on the storage model is easy to establish, even in a system without an operating system, and the basic operating system does not need to be modified.
4. The storage and computation integrated chip is established based on the storage model, the nonvolatile characteristic of the chip can be fully utilized, and the method is very convenient for realizing multi-process sharing or remote sharing or migration of data in different front-end nodes.
Drawings
Fig. 1 is a schematic structural diagram of a storage and calculation integrated intelligent chip according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating a storage and computation integrated control method according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a storage and computation integrated controller according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a storage and computation integrated device array according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings which illustrate several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," "above," "below," and the like, may be used herein to facilitate describing one element or feature's relationship to another element or feature as illustrated in the figures.
In this application, unless expressly stated or limited otherwise, the terms "mounted," "connected," "secured," "retained," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and/or "including" specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. It should be further understood that the terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
In order to adapt to the diversity of front-end applications, the invention provides a storage and calculation integrated intelligent chip and a using method thereof, aiming at constructing the storage and calculation integrated chip based on a storage model instead of a calculation model, wherein the chip is provided with an interface similar to a storage bus and converts the data input and output of intelligent calculation into read-write storage access. In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are further described in detail by the following embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 shows a schematic structural diagram of a storage and calculation integrated intelligent chip according to an embodiment of the present invention. The storage and calculation integrated intelligent chip in this embodiment comprises a storage bus interface 11, a storage and computation integrated controller 12, an internal bus 13, an input module 14, a storage and calculation integrated array area 16 and an output module 17. The components of the storage and calculation integrated intelligent chip, and the connections and interactions between them, are described in detail below.
In this embodiment, the storage bus interface 11 includes a random access memory interface and/or a fast-access storage interface. The random access memory is used for storing data, can be read and written at any time and is usually used as a temporary storage medium for an operating system or other running programs; the random access memory interface includes, but is not limited to, a static random access memory interface (SRAM interface), a dynamic random access memory interface (DRAM interface, such as a DDR interface, a DDR2 interface or a DDR3 interface), a synchronous DRAM interface, and the like. The fast-access storage interface includes, but is not limited to, an ONFI interface, a Toggle DDR interface, an eMMC interface, an SDIO interface, and the like.
It should be understood that ONFI (Open NAND Flash Interface) is an interface standard for NAND flash; the ONFI 1.0 standard supports SDR, the ONFI 2.0 standard supports both SDR and NV-DDR, the ONFI 3.0 standard adds NV-DDR2, and the ONFI 4.0 standard adds NV-DDR3. Toggle is a DDR-based flash interface standard established by Samsung and Toshiba to compete with ONFI; Toggle 1.0 corresponds to DDR1 and Toggle 2.0 corresponds to DDR2; flash with a Toggle interface generally also supports switching to the legacy interface (SDR).
In this embodiment, the storage and computation integrated controller 12 is connected to the storage bus interface 11, and specifically includes an external storage bus controller 121, an internal bus controller 122, a processor core 123, an embedded code memory 124, and an embedded operating memory 125; the external storage bus controller 121 is connected to the storage bus interface 11; the internal bus controller 122 is connected to the internal bus 13 of the storage and calculation integrated intelligent chip.
Specifically, the external storage bus controller 121 connects to an external processor through the storage bus interface 11 and performs configuration, read, write and similar operations when accessed. The processor core 123 supports relatively complex bus protocols, control logic and the like; it is essentially a reduced instruction set processor core (such as ARM or RISC-V) and is configured with an embedded code memory 124 (such as embedded NOR Flash or embedded phase change memory) and an embedded operating memory 125 (such as embedded SRAM). The internal bus controller 122 accesses all storage resources in the storage and calculation integrated intelligent chip through the internal bus 13, and realizes configuration and operation control of the storage and calculation integrated intelligent computation.
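As an illustration of how a storage-bus access might be translated into an internal operation, the following minimal sketch decodes an address into a region and an offset. The address ranges, region names and the function `decode_access` are all hypothetical; the application does not disclose an address map or a firmware interface.

```python
# Hypothetical address map; the application does not specify any addresses.
REGIONS = [
    (0x0000, 0x0FFF, "config"),   # switch matrices and weight matrix (assumed)
    (0x1000, 0x10FF, "input"),    # input buffer (assumed)
    (0x1100, 0x11FF, "output"),   # output buffer (assumed)
    (0x1200, 0x1200, "control"),  # single control/status register (assumed)
]


def decode_access(address, is_write):
    """Translate one storage-bus access into an (operation, offset) pair."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            if name == "control":
                # A write starts an operation; a read returns status.
                return ("control_write" if is_write else "status_read", 0)
            suffix = "_write" if is_write else "_read"
            return (name + suffix, address - start)
    raise ValueError(f"address {address:#x} is not mapped")


op_a = decode_access(0x0010, is_write=True)    # configuration write
op_b = decode_access(0x1104, is_write=False)   # read a digitized result
op_c = decode_access(0x1200, is_write=True)    # trigger a computation
```

Under these assumptions the external processor needs nothing beyond ordinary loads and stores, which is the point of building the chip around a storage model.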
In this embodiment, the input module 14 is connected to the internal bus 13 and has an input buffer 141 therein. The input module 14 receives input data from the storage and computation integrated controller 12, buffers it in the input buffer 141, converts it into an analog input signal 15 according to a control command, and transmits the signal to the storage and calculation integrated array area 16 for calculation.
In this embodiment, the output module 17 is connected to the internal bus 13 and has an output buffer 171 therein. The output module 17 receives the analog output signal 18 from the storage and calculation integrated array area 16, converts it into a digital signal, buffers the digital signal in the output buffer 171, and then notifies the storage and computation integrated controller 12 to read it.
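The buffering and conversion roles of the input and output modules can be sketched as follows. The 8-bit resolution, the 1.0 V full-scale value, the ideal converter behaviour and the class names are assumptions for illustration; the application does not specify converter parameters.

```python
from collections import deque

V_REF = 1.0               # assumed full-scale voltage
BITS = 8                  # assumed converter resolution
FULL = (1 << BITS) - 1    # largest digital code


class InputModule:
    def __init__(self):
        self.buffer = deque()   # plays the role of the input buffer 141

    def write(self, codes):
        """Controller writes digital input data into the buffer."""
        self.buffer.extend(codes)

    def to_analog(self, n):
        """Convert the next n buffered codes to voltages (ideal DAC)."""
        return [self.buffer.popleft() * V_REF / FULL for _ in range(n)]


class OutputModule:
    def __init__(self):
        self.buffer = deque()   # plays the role of the output buffer 171
        self.ready = False      # stands in for notifying the controller

    def capture(self, voltages):
        """Digitize analog outputs (ideal ADC with clamping) and buffer them."""
        for v in voltages:
            clamped = min(max(v, 0.0), V_REF)
            self.buffer.append(round(clamped * FULL / V_REF))
        self.ready = True

    def read(self):
        """Controller reads and drains the buffered digital results."""
        data = list(self.buffer)
        self.buffer.clear()
        self.ready = False
        return data


inp, out = InputModule(), OutputModule()
inp.write([0, 128, 255])
analog = inp.to_analog(3)
out.capture(analog)   # pass-through in place of the array, to exercise the path
digital = out.read()
```

Here the array area is replaced by a pass-through so that only the conversion and buffering path is exercised.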
In this embodiment, the storage and calculation integrated array area 16 is the core area of the storage and calculation integrated intelligent chip and completes the prediction calculation of the configured artificial neural network. The calculation is carried out by a resistor network formed by interconnected storage and computation integrated devices together with the necessary analog circuits, and its input and output signals are analog signals. The area also contains digitally controlled switches, through which a user can configure the network structure of the artificial neural network.
Specifically, the storage and calculation integrated array area 16 is composed of a plurality of storage and computation integrated units 161 and the analog signal buses connecting them, for example the two-dimensional grid array of four storage and computation integrated units 161 shown on the right of the storage and calculation integrated array area 16 in the figure. (Hereinafter, the four extension directions of the two-dimensional grid are referred to as up, down, left and right; these do not represent directions in an actual physical implementation, and the actual orientation may be adjusted as needed.) Each storage and computation integrated unit 161 has the same internal structure and comprises an input switch array 1611, a calculation acceleration array 1612, an output switch array 1613 and a cross bus switch array 1614. The input switch array 1611 contains an input switch matrix 1615, the calculation acceleration array 1612 contains a weight matrix 1616, the output switch array 1613 contains an output switch matrix 1617, and the cross bus switch array 1614 contains a cross bus switch matrix 1618.
In general, the two-dimensional grid format is capable of supporting both top-to-bottom and left-to-right signal flow, and for ease of description and understanding, the analog signal bus carrying top-to-bottom signal flow is referred to as the vertical bus 162 and the analog signal bus carrying left-to-right signal flow is referred to as the horizontal bus 163. However, these two types of buses are not in the conventional bus form, signals pass through a plurality of selection switches during transmission, and the selection of the switch state may cause the interruption, crossing or replacement of the signals.
In some examples, the interfaces and connections of the storage and computation integrated unit 161 are as follows. Every storage and computation integrated unit 161 has the same interfaces: four groups of analog signal buses and one internal bus digital interface. The internal bus digital interfaces of all storage and computation integrated units 161 are mounted on the same internal bus 13 driven by the storage and computation integrated controller 12. Depending on cost and performance considerations, this bus may be designed as a serial or a parallel bus with a suitable bit width, so that the storage and computation integrated controller can access the internal resource data of each unit. The four groups of analog signal buses are led out in the up, down, left and right directions and connect to the adjacent storage and computation integrated unit in each direction; for a unit at the edge of the two-dimensional grid array, the buses on one or two sides are left floating, depending on its position. In particular, one unit at a corner (for example, the upper left corner) has a group of buses connected to the input module 14 to receive input signals, and another unit at a corner (for example, the lower right corner) has a group of buses connected to the output module 17 to send output signals.
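The neighbour connections of the two-dimensional grid can be sketched as follows. The function name `neighbours`, the (row, column) indexing, and the choice of which side of a corner unit attaches to the input or output module are assumptions; the text only gives the upper left and lower right corners as examples.

```python
def neighbours(row, col, n_rows, n_cols):
    """Return what each of the four analog bus groups of a unit connects to.

    A value of None means the bus on that side is left floating.
    """
    candidates = {
        "up": (row - 1, col),
        "down": (row + 1, col),
        "left": (row, col - 1),
        "right": (row, col + 1),
    }
    links = {}
    for side, (r, c) in candidates.items():
        inside = 0 <= r < n_rows and 0 <= c < n_cols
        links[side] = (r, c) if inside else None
    # Assumed corner wiring, following the example corners in the text:
    # the upper-left unit receives input, the lower-right unit sends output.
    if (row, col) == (0, 0):
        links["left"] = "input_module"
    if (row, col) == (n_rows - 1, n_cols - 1):
        links["right"] = "output_module"
    return links


# The four-unit (2 x 2) grid used as the example in the description.
top_left = neighbours(0, 0, 2, 2)
bottom_right = neighbours(1, 1, 2, 2)
top_right = neighbours(0, 1, 2, 2)
```

For the 2 x 2 example, the upper right unit ends up with two floating sides, while each of the two corner units attached to a module has one floating side.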
In some examples, the calculation acceleration array 1612 is formed by connecting storage and calculation integrated devices in rows and columns, and a gating device is configured for each storage and calculation integrated device according to read-write gating and isolation requirements; analog computation is then realized with the aid of analog circuitry, using Ohm's law. The storage and calculation integrated device can be manufactured using device technologies such as phase change memory (PCM), resistive random access memory (ReRAM), magnetic memory (MRAM) and the like; the gating device may be an ovonic threshold switch (OTS) device, a MOS transistor, a bipolar transistor, a diode, and the like, which is not limited in this embodiment.
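The following Python sketch illustrates, under stated assumptions, how Ohm's law yields a multiply-accumulate result in such an array. It assumes that input voltages are applied per row and that the currents of all devices in a column add up on a shared line; the summation step is the usual resistive-crossbar scheme and is an assumption here, since the text above names only Ohm's law. The function name `column_currents` is hypothetical.

```python
from typing import List, Sequence


def column_currents(voltages: Sequence[float],
                    resistances: Sequence[Sequence[float]]) -> List[float]:
    """Current collected on each column of an idealised resistive array.

    Each device contributes I = V / R (Ohm's law); the contributions of
    all devices in one column are summed (assumed shared column line).
    """
    if len(voltages) != len(resistances):
        raise ValueError("one input voltage is needed per row")
    n_cols = len(resistances[0])
    return [
        sum(v / row[j] for v, row in zip(voltages, resistances))
        for j in range(n_cols)
    ]


# Two rows, two columns: voltages in volts, resistances in ohms.
currents = column_currents([1.0, 2.0], [[1.0, 2.0], [4.0, 1.0]])
```

With conductance 1/R playing the role of a weight, each column current is a weighted sum of the input voltages, which is why the resistance values can stand in for neural network weights.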
Furthermore, the resistance value of each storage and calculation integrated device corresponds to a weight value of the artificial neural network. The calculation acceleration array 1612 includes a read-write circuit for reading and writing the resistance values of the storage and calculation integrated devices, and the storage and calculation integrated controller 12 can directly read and write the resistance values in the calculation acceleration array 1612 through the internal bus 13. The resistance values of the storage and calculation integrated devices in the calculation acceleration array 1612 are arranged in a certain order and, from the viewpoint of storage logic, can be mapped to the internal bus address space to form a readable and writable weight matrix 1616. Since the resistance value of a storage and calculation integrated device is retained after power is removed, the weight matrix 1616 is stored in the calculation acceleration array 1612 in a non-volatile manner.
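As a minimal sketch of the address mapping described above, the Python class below exposes device resistances as a contiguous window of bus addresses. The row-major ordering, the base address, the one-address-per-device granularity and the class name `WeightMatrixWindow` are all assumptions for illustration; the description only states that the values are arranged "in a certain order".

```python
class WeightMatrixWindow:
    """Row-major window of device resistances in a bus address space (hypothetical)."""

    def __init__(self, base: int, rows: int, cols: int) -> None:
        self.base = base
        self.rows = rows
        self.cols = cols
        self._ohms = [[0.0] * cols for _ in range(rows)]

    def address_of(self, row: int, col: int) -> int:
        """Bus address of the device at (row, col)."""
        return self.base + row * self.cols + col

    def locate(self, address: int) -> tuple:
        """Inverse mapping: bus address back to (row, col)."""
        offset = address - self.base
        if not 0 <= offset < self.rows * self.cols:
            raise IndexError("address outside the weight matrix window")
        return divmod(offset, self.cols)

    def write(self, address: int, ohms: float) -> None:
        row, col = self.locate(address)
        self._ohms[row][col] = ohms

    def read(self, address: int) -> float:
        row, col = self.locate(address)
        return self._ohms[row][col]


window = WeightMatrixWindow(base=0x1000, rows=2, cols=3)
window.write(window.address_of(1, 1), 5000.0)
```

Seen from the controller, programming a weight is then an ordinary write to an address, and reading it back is an ordinary read.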
Regarding the calculation acceleration array, one possible implementation is shown in fig. 4. One end of each storage and calculation integrated device 401 in the storage and calculation integrated device array 400 is connected to the drain of one MOS gating transistor 402. The other end of the storage and calculation integrated device is connected to a row bit line 403; the devices in the same row of the array are connected to the same row bit line, and all the row bit lines are connected to the output of the input switch matrix 404. The source of the MOS gating transistor 402 is connected to a column bit line 405; the gating transistors in the same column of the array are connected to the same column bit line, and all the column bit lines are connected to the input of the output switch matrix 406. The gate of the MOS gating transistor 402 is connected to a word line 407; the gating transistors in the same column of the array are connected to the same word line, and all the word lines are connected to the read-write circuit 408. All the row bit lines are also connected to the read-write circuit, which is mounted on the internal bus 409. On receiving a read or write command, the read-write circuit gates the column designated by the command through the word line and outputs a read or write pulse signal to the designated row bit line to complete the operation; at this time, the output switch array hands control of the row bit lines to the read-write circuit. On receiving a prediction command, the read-write circuit hands control of the row bit lines to the output switch array and turns on all the gating transistors through the word lines, forming the resistor array required by the prediction command.
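The two operating modes of the read-write circuit can be sketched behaviourally as follows. This is a simplified model under assumptions: the class `GatedArray` and its method names are hypothetical, the pulse signal is reduced to a direct assignment, and the hand-over of bit line control is not modelled. Only the word line behaviour (one column gated for read or write, all columns turned on for prediction) follows the description.

```python
class GatedArray:
    """Behavioural model of a device array with one word line per column."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cols = cols
        self.resistance = [[1.0] * cols for _ in range(rows)]
        self.word_lines = [False] * cols  # True means the column's gates conduct

    def _gate_column(self, col: int) -> None:
        # Read/write command: only the designated column is gated.
        self.word_lines = [c == col for c in range(self.cols)]

    def write(self, row: int, col: int, ohms: float) -> None:
        self._gate_column(col)
        self.resistance[row][col] = ohms  # stands in for the write pulse

    def read(self, row: int, col: int) -> float:
        self._gate_column(col)
        return self.resistance[row][col]  # stands in for the read pulse

    def enter_prediction(self) -> None:
        # Prediction command: every gate conducts, exposing the full resistor array.
        self.word_lines = [True] * self.cols


array = GatedArray(rows=2, cols=3)
array.write(0, 2, 4700.0)
gated_during_write = list(array.word_lines)
array.enter_prediction()
```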
In some examples, the input switch array 1611 is composed of a set of analog switches, each controlled by the state of a one-bit storage unit. The storage and calculation integrated controller 12 can directly read and write the storage units of the analog switches through the internal bus 13. The storage units are arranged in a certain order and, from the viewpoint of storage logic, can be mapped to the address space of the internal bus 13 to form a readable and writable input switch matrix 1615. Write operations to the input switch matrix 1615 selectively connect input signals to designated inputs of the calculation acceleration array 1612; in addition, the current switch states can be sent to the storage and calculation integrated controller 12 by a read operation.
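A possible reading of this selective connection is sketched below in Python. The layout of the switch bits as a grid indexed by bus line and array input, the function name `route_inputs`, and the rule that two closed switches on one array input are an error are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def route_inputs(switch_bits: Sequence[Sequence[int]],
                 bus_signals: Sequence[float]) -> List[Optional[float]]:
    """Apply an input switch matrix to a set of incoming bus signals.

    switch_bits[line][target] == 1 closes the analog switch that connects
    bus line `line` to array input `target`. Unconnected inputs are None.
    """
    n_targets = len(switch_bits[0])
    routed: List[Optional[float]] = [None] * n_targets
    for line, row_bits in enumerate(switch_bits):
        for target, closed in enumerate(row_bits):
            if not closed:
                continue
            if routed[target] is not None:
                # Assumed policy: two sources must not drive one input.
                raise ValueError(f"array input {target} driven by two bus lines")
            routed[target] = bus_signals[line]
    return routed


# Bus line 0 feeds array input 1, bus line 1 feeds array input 0.
routed = route_inputs([[0, 1, 0], [1, 0, 0]], [0.3, 0.7])
```

Because every switch is just one stored bit, rewriting the bit pattern is enough to re-route the inputs without touching the weights.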
In some examples, the output switch array 1613 is composed of a set of analog switches, each controlled by the state of a one-bit storage unit. The storage and calculation integrated controller 12 can directly read and write the storage units of the analog switches through the internal bus 13. The storage units are arranged in a certain order and, from the viewpoint of storage logic, can be mapped to the address space of the internal bus 13 to form a readable and writable output switch matrix 1617. By writing the output switch matrix 1617, each signal path can be set either to continue transmitting the signal passed on by the cross bus switch array 1614 or to replace it with the output signal of the calculation acceleration array 1612; in addition, the current output bus selection state can be sent to the storage and calculation integrated controller 12 by a read operation.
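The per-path choice between passing a signal on and replacing it can be sketched as a simple selection in Python. The encoding (bit 1 selects the array output, bit 0 keeps the signal from the cross bus switch array) and the function name `select_outputs` are assumptions made for this example.

```python
from typing import List, Sequence


def select_outputs(select_bits: Sequence[int],
                   passed_through: Sequence[float],
                   array_outputs: Sequence[float]) -> List[float]:
    """Choose, per signal path, between two candidate sources.

    Bit 0 keeps the signal coming from the cross bus switch array;
    bit 1 replaces it with the calculation acceleration array output.
    """
    if not (len(select_bits) == len(passed_through) == len(array_outputs)):
        raise ValueError("all three sequences need one entry per signal path")
    return [
        computed if bit else through
        for bit, through, computed in zip(select_bits, passed_through, array_outputs)
    ]


# Path 0 keeps the bus signal, paths 1 and 2 carry the array result.
selected = select_outputs([0, 1, 1], [0.1, 0.2, 0.3], [9.1, 9.2, 9.3])
```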
In some examples, the cross bus switch array 1614 is composed of a set of analog switches, each controlled by the state of a one-bit storage unit. The storage and calculation integrated controller 12 can directly read and write the storage units of the analog switches through the internal bus 13. The storage units are arranged in a certain order and, from the viewpoint of storage logic, can be mapped to the address space of the internal bus 13 to form a readable and writable cross bus switch matrix 1618. By writing the cross bus switch matrix 1618, the input signal connected to each output signal can be selected, so that cross interconnection between the horizontal (transverse) bus 163 and the vertical (longitudinal) bus 162 is realized; in addition, the current bus selection state can be sent to the storage and calculation integrated controller 12 by a read operation.
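As a hedged sketch of the cross interconnection, the Python function below picks one input for every output from a one-bit-per-switch matrix. Listing the horizontal bus lines before the vertical bus lines in the input vector, allowing at most one closed switch per output, and the name `crossbar_route` are assumptions for this example, not details of the embodiment.

```python
from typing import List, Optional, Sequence


def crossbar_route(switch_bits: Sequence[Sequence[int]],
                   inputs: Sequence[str]) -> List[Optional[str]]:
    """Connect each output to the single input whose switch is closed.

    switch_bits[out][inp] == 1 closes the switch between input `inp`
    and output `out`. An output with no closed switch is interrupted (None).
    """
    outputs: List[Optional[str]] = []
    for out, row_bits in enumerate(switch_bits):
        closed = [inp for inp, bit in enumerate(row_bits) if bit]
        if len(closed) > 1:
            # Assumed policy: one output is driven by at most one input.
            raise ValueError(f"output {out} has {len(closed)} closed switches")
        outputs.append(inputs[closed[0]] if closed else None)
    return outputs


# Inputs: two horizontal lines followed by two vertical lines (assumed order).
bus_inputs = ["h0", "h1", "v0", "v1"]
crossed = crossbar_route([[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]], bus_inputs)
```

In the usage above the first output is fed from a vertical line and the second from a horizontal line, while the third output is left interrupted.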
Fig. 2 is a schematic flow chart of a storage and calculation integrated control method according to an embodiment of the present invention. The storage and calculation integrated control method of the present embodiment can be applied to the storage and calculation integrated controller described above, and is used for controlling the storage and calculation integrated array area; the storage and calculation integrated array area comprises a calculation acceleration array, an input switch array, an output switch array and a cross bus switch array which are connected through a transverse bus and a longitudinal bus; the control method comprises any one or more of the following in combination:
Step S21: reading and writing the resistance values of the storage and calculation integrated devices in the calculation acceleration array, so that the storage and calculation integrated devices arranged in a certain order are mapped to the address space of an internal bus to form a readable and writable weight matrix.
Step S22: reading and writing the storage units of the analog switches in the input switch array, so that the storage units arranged in a certain order are mapped to the address space of the internal bus to form a readable and writable input switch matrix.
Step S23: reading and writing the storage units of the analog switches in the output switch array, so that the storage units arranged in a certain order are mapped to the address space of the internal bus to form a readable and writable output switch matrix.
Step S24: performing a write operation on the cross bus switch matrix to select and connect the corresponding input signal for each output signal, so as to perform cross interconnection between the transverse bus and the longitudinal bus.
Since the implementation of the storage and calculation integrated control method of the present embodiment is similar to that of the storage and calculation integrated intelligent chip described above, it is not described again.
Fig. 3 is a schematic structural diagram of a storage and calculation integrated controller according to an embodiment of the present invention. The storage and calculation integrated controller provided by this embodiment comprises a processor 31 and a memory 32. The memory 32 is connected to the processor 31 through a system bus and the two communicate with each other; the memory 32 is used for storing a computer program, and the processor 31 is used for running the computer program, so that the controller executes the steps of the storage and calculation integrated control method.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the present application provides a storage and calculation integrated intelligent chip, a control method, and a controller, with the following advantages. All processors have storage bus interfaces, so when a system is upgraded the chip can easily be mounted on the external bus of the processor through such an interface without major changes to the system. The cooperative operation of processor and memory conforms to the basic von Neumann architecture in use today, and is easier to implement than a heterogeneous computing architecture in which a processor cooperates with a calculation acceleration chip. Software for a storage and calculation integrated chip that supports the storage model can be built without modifying the underlying operating system, and is also easy to build in a system that has no operating system. Because the storage and calculation integrated chip is built on the storage model, its non-volatile characteristic can be fully exploited, which makes it very convenient to share data among multiple processes, to share it remotely, or to migrate it between different front-end nodes. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (15)

1. A storage and calculation integrated intelligent chip, characterized by comprising:
the memory bus interface is used for connecting an external processor;
the storage and calculation integrated controller is connected with the storage bus interface;
the internal bus is connected with the storage and calculation integrated controller;
the input module is connected with the internal bus; the input module receives input data from the storage and computation integrated controller and correspondingly converts the input data into analog input signals;
the storage and calculation integrated array area is connected with the input module and the internal bus; the storage and calculation integrated array area is used for performing prediction calculation on the artificial neural network and outputting an analog output signal containing a prediction calculation result;
the output module is connected with the storage and calculation integrated array area and the internal bus; and the output module correspondingly converts the analog output signals received from the storage and calculation integrated array area into digital signals for the storage and calculation integrated controller to read.
2. The storage and calculation integrated intelligent chip of claim 1, wherein the storage and calculation integrated controller comprises:
the external storage bus controller is connected with the storage bus interface;
an internal bus controller connected to the internal bus to access a storage resource through the internal bus;
and the processor core is provided with an embedded code memory and an embedded operation memory.
3. The storage and calculation integrated intelligent chip of claim 1, wherein the storage and calculation integrated array area comprises a plurality of storage and calculation integrated units; and all the storage and calculation integrated units are connected through a transverse bus and a longitudinal bus.
4. The storage and calculation integrated intelligent chip of claim 3, wherein each storage and calculation integrated unit comprises: an input switch array, a calculation acceleration array, an output switch array and a cross bus switch array; wherein an input switch matrix is arranged in the input switch array; a weight matrix is arranged in the calculation acceleration array; an output switch matrix is arranged in the output switch array; and a cross bus switch matrix is arranged in the cross bus switch array.
5. The storage and calculation integrated intelligent chip of claim 4, wherein the calculation acceleration array is formed by connecting storage and calculation integrated devices in rows and columns, a gating device is configured for each storage and calculation integrated device according to read-write gating and isolation requirements, and analog computation is realized through an analog circuit using Ohm's law; the storage and calculation integrated device at least comprises a phase change memory device, a resistive memory device and a magnetic memory device.
6. The storage and calculation integrated intelligent chip of claim 5, wherein the calculation acceleration array comprises a read-write circuit for reading and writing the resistance values of the storage and calculation integrated devices; and the storage and calculation integrated controller reads and writes the resistance values of the storage and calculation integrated devices in the calculation acceleration array through the internal bus, the storage and calculation integrated devices arranged in a certain order being mapped to the address space of the internal bus to form the readable and writable weight matrix.
7. The storage and calculation integrated intelligent chip of claim 4, wherein the input switch array comprises a set of analog switches, each of the analog switches being controlled by the state of a one-bit storage unit; the storage and calculation integrated controller reads and writes the storage units of the analog switches through the internal bus, and the storage units are arranged in a certain order and mapped to the address space of the internal bus to form the readable and writable input switch matrix.
8. The storage and calculation integrated intelligent chip of claim 4, wherein the output switch array comprises a set of analog switches, each of the analog switches being controlled by the state of a one-bit storage unit; the storage and calculation integrated controller reads and writes the storage units of the analog switches through the internal bus, and the storage units are arranged in a certain order and mapped to the address space of the internal bus to form the readable and writable output switch matrix.
9. The storage and calculation integrated intelligent chip of claim 8, wherein the storage and calculation integrated controller writes the output switch matrix through the internal bus to select whether the signal output by the cross bus switch array or the signal output by the calculation acceleration array is output.
10. The storage and calculation integrated intelligent chip of claim 4, wherein the cross bus switch array comprises a set of analog switches, each of the analog switches being controlled by the state of a one-bit storage unit; the storage and calculation integrated controller reads and writes the storage units of the analog switches through the internal bus, and the storage units are arranged in a certain order and mapped to the address space of the internal bus to form the readable and writable cross bus switch matrix.
11. The storage and calculation integrated intelligent chip of claim 10, wherein the storage and calculation integrated controller performs a write operation on the cross bus switch matrix through the internal bus to select and connect a corresponding input signal for each output signal, so as to perform cross interconnection between the transverse bus and the longitudinal bus.
12. The storage and calculation integrated intelligent chip of claim 1, wherein the input module has a built-in input buffer for buffering input data received from the storage and calculation integrated controller.
13. The storage and calculation integrated intelligent chip of claim 1, wherein the output module has a built-in output buffer for buffering the data obtained by sampling and digitizing the analog output signals received from the storage and calculation integrated array area.
14. A storage and calculation integrated control method is characterized by being used for controlling a storage and calculation integrated array area; the storage and calculation integrated array area comprises a calculation acceleration array, an input switch array, an output switch array and a cross bus switch array which are connected through a transverse bus and a longitudinal bus; the control method comprises any one or more of the following combinations:
reading and writing the resistance values of all the storage and computation integrated devices in the calculation acceleration array, so that the storage and computation integrated devices arranged according to a certain sequence are mapped into an address space of an internal bus to form a readable and writable weight matrix;
reading and writing the storage units of the analog switches in the input switch array, so that the storage units arranged according to a certain sequence are mapped to an address space of an internal bus to form a readable and writable input switch matrix;
reading and writing the storage units of the analog switches in the output switch array, so that the storage units arranged according to a certain sequence are mapped to an address space of an internal bus to form a readable and writable output switch matrix;
and performing write operation on the cross bus switch matrix, and selecting and connecting corresponding input signals for each path of output signals so as to perform cross interconnection between the transverse bus and the longitudinal bus.
15. A storage and calculation integrated controller, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to cause the controller to execute the storage and calculation integrated control method of claim 14.
CN202110645465.2A 2021-06-10 2021-06-10 Storage and calculation integrated intelligent chip, control method and controller Pending CN113517009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645465.2A CN113517009A (en) 2021-06-10 2021-06-10 Storage and calculation integrated intelligent chip, control method and controller


Publications (1)

Publication Number Publication Date
CN113517009A true CN113517009A (en) 2021-10-19

Family

ID=78065383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645465.2A Pending CN113517009A (en) 2021-06-10 2021-06-10 Storage and calculation integrated intelligent chip, control method and controller

Country Status (1)

Country Link
CN (1) CN113517009A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300278A1 (en) * 2000-10-06 2018-10-18 Pact Xpp Technologies Ag Array Processor Having a Segmented Bus System
KR20080081606A (en) * 2007-03-06 2008-09-10 엠텍비젼 주식회사 Dual port memory having common signal line
CN105718380A (en) * 2015-07-29 2016-06-29 上海磁宇信息科技有限公司 Cell array calculation system
US10467527B1 (en) * 2018-01-31 2019-11-05 Pure Storage, Inc. Method and apparatus for artificial intelligence acceleration
WO2020133317A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Computing resource allocation technology and neural network system
CN111614353A (en) * 2019-02-26 2020-09-01 北京知存科技有限公司 Digital-to-analog conversion circuit and analog-to-digital conversion circuit multiplexing device in storage and calculation integrated chip
CN110334799A (en) * 2019-07-12 2019-10-15 电子科技大学 Integrated ANN Reasoning and training accelerator and its operation method are calculated based on depositing
CN110647983A (en) * 2019-09-30 2020-01-03 南京大学 Self-supervision learning acceleration system and method based on storage and calculation integrated device array
CN110990060A (en) * 2019-12-06 2020-04-10 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN112148669A (en) * 2020-10-01 2020-12-29 北京知存科技有限公司 Pulse storage and calculation integrated chip and electronic equipment
CN112183732A (en) * 2020-10-22 2021-01-05 中国人民解放军国防科技大学 Convolutional neural network acceleration method and device and computer equipment
CN112836814A (en) * 2021-03-02 2021-05-25 清华大学 Storage and computation integrated processor, processing system and method for deploying algorithm model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116504281A (en) * 2022-01-18 2023-07-28 浙江力德仪器有限公司 Computing unit, array and computing method
CN116124334A (en) * 2023-01-10 2023-05-16 杭州未名信科科技有限公司 Pressure detection system, method, equipment and medium
CN116306855A (en) * 2023-05-17 2023-06-23 之江实验室 Data processing method and device based on memory and calculation integrated system
CN116306855B (en) * 2023-05-17 2023-09-01 之江实验室 Data processing method and device based on memory and calculation integrated system
CN116821047A (en) * 2023-08-31 2023-09-29 北京犀灵视觉科技有限公司 Sensing and storing integrated circuit, system and method
CN116821047B (en) * 2023-08-31 2023-10-31 北京犀灵视觉科技有限公司 Sensing and storing integrated circuit, system and method
CN116881195A (en) * 2023-09-04 2023-10-13 北京怀美科技有限公司 Chip system facing detection calculation and chip method facing detection calculation
CN116881195B (en) * 2023-09-04 2023-11-17 北京怀美科技有限公司 Chip system facing detection calculation and chip method facing detection calculation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination