CN117709420A - Method and apparatus for neuromorphic computing system

Method and apparatus for neuromorphic computing system

Info

Publication number
CN117709420A
Authority
CN
China
Prior art keywords
pulse
encoding
coding
processor
input data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311643144.4A
Other languages
Chinese (zh)
Inventor
***
曹露
程宏
蔡东琪
张益民
吴浩洋
刘晓龙
张丽丹
郭萍
陈益斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel China Research Center Co ltd
Original Assignee
Intel China Research Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel China Research Center Co ltd filed Critical Intel China Research Center Co ltd
Priority to CN202311643144.4A
Publication of CN117709420A

Landscapes

  • Advance Control (AREA)

Abstract

The present application relates to methods and apparatus for neuromorphic computing systems. A method for neuromorphic computation is provided, the method comprising: receiving input data from one or more data sources, wherein the input data is in a non-pulse input format; determining encoding parameters for pulse encoding the input data; pulse encoding the input data in time steps based on the encoding parameters; and, once the pulse encoding result for each time step is obtained, transmitting that result directly to the corresponding neuromorphic core for data processing.

Description

Method and apparatus for neuromorphic computing system
Technical Field
The present application relates to the field of neural networks, and more particularly, to methods and apparatus for neuromorphic computing systems.
Background
Neuromorphic computing, which aims to simulate the behavior of the brain, has developed across various areas of computer science over recent decades. The neuromorphic core, a type of neuromorphic hardware, shows potential for many applications because it can directly provide real-time, low-power complex data processing using a spiking neural network (SNN) based computational paradigm. However, neuromorphic cores cannot be deployed on their own, as they require an external host to handle their configuration and data input management. As a low-power, open-source processor, RISC-V is widely used as the external host core for neuromorphic cores. Some researchers have even extended SNN computation units directly into RISC-V so that a single core can perform data management, configuration, and SNN computation at the same time.
Biological neurons process information and communicate via pulses or spikes, which are electrical impulses of roughly 100 mV amplitude. Many computational neuron models reduce these voltage bursts to discrete single-bit events of "1" or "0". In hardware, a "1" or "0" is a much simpler representation than a high-precision value. Thus, apart from inputs already in a direct pulse format (e.g., the output of neuromorphic sensors), the external host core needs to convert high-precision input values into pulse sequences whose length is measured in time steps. There are four typical pulse coding methods, which model a pulse sequence as a function of time and use the input value as a parameter of that function to generate the pulse sequence. For example, rate coding, the most widely used coding scheme in SNN models, treats each input value as a firing rate and uses that firing rate to convert the input into a Poisson pulse train.
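As a concrete illustration of rate coding, the following is a minimal sketch (not from the patent) in which each input value, normalized to [0, 1], acts as the per-time-step firing probability of a Bernoulli process, approximating a Poisson pulse train; all function and parameter names here are hypothetical:

```python
import random

def rate_encode(value, num_steps, max_value=255, rng=None):
    """Rate-code a scalar input as a 0/1 pulse train of length num_steps.

    The normalized value acts as the firing probability at each time step,
    so larger inputs produce proportionally more pulses on average.
    """
    rng = rng or random.Random()
    p = value / max_value  # firing probability per time step
    return [1 if rng.random() < p else 0 for _ in range(num_steps)]

# A bright pixel (233) fires far more often than a dim one (22),
# as in the MNIST example of Fig. 1.
rng = random.Random(0)
bright = rate_encode(233, num_steps=100, rng=rng)
dim = rate_encode(22, num_steps=100, rng=rng)
```

Over enough time steps, the pulse count of each train approaches the normalized input value times the number of steps.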
Disclosure of Invention
The present application provides a novel mechanism for efficiently implementing pulse encoding in a neuromorphic computing system, thereby accelerating the pulse encoding process, saving the time the system spends on pulse encoding, and greatly reducing the latency of the system.
According to an embodiment of the present disclosure, there is provided a method for neuromorphic computation, the method comprising: receiving input data from one or more data sources, wherein the input data is in a non-pulse input format; determining encoding parameters for pulse encoding the input data; pulse encoding the input data in time steps based on the encoding parameters; and, once the pulse encoding result for each time step is obtained, transmitting that result directly to the corresponding neuromorphic core for data processing.
According to an embodiment of the present disclosure, there is provided a processor comprising at least one processor core configured to: receive input data from one or more data sources, wherein the input data is in a non-pulse input format; determine encoding parameters for pulse encoding the input data; pulse encode the input data in time steps based on the encoding parameters; and, once the pulse encoding result for each time step is obtained, transmit that result directly to the corresponding neuromorphic core for data processing.
According to an embodiment of the present disclosure, there is provided a computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to perform the above-described method for neuromorphic computation.
The present invention provides a scheme for accelerating the pulse encoding flow of neuromorphic computing systems: the encoding result generated in each iteration of the pulse encoding loop (i.e., element 0 or 1) is transmitted directly to the neuromorphic core as soon as that iteration completes, without waiting for the loop that generates the entire pulse sequence to finish. The scheme accelerates the pulse encoding process of the neuromorphic computing system, saves the time the system spends on pulse encoding, and greatly reduces the latency of the system.
Drawings
Embodiments of the present disclosure will now be described, by way of example and not limitation, with reference to the figures of the accompanying drawings in which like reference numerals refer to similar elements and in which:
FIG. 1 illustrates an exemplary schematic diagram of the pulse encoding flow of a neuromorphic computing system.
Fig. 2 illustrates a flowchart of a method 200 for neuromorphic computation, according to an embodiment of the present disclosure.
Fig. 3 illustrates a code representation of the set_spike_encoding.vx instruction added to the vector extension of RISC-V in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates a code representation of the get_spike_encoding.vx instruction added to the vector extension of RISC-V according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of a neuromorphic chip including a RISC-V core and a neuromorphic core, according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of the workflow of pulse encoding by the encoding unit.
Fig. 7A-7D illustrate schematic diagrams of example implementations of hardware pulse encoding for different types of encoding methods, according to embodiments of the present disclosure.
FIG. 8 illustrates an example computing system.
Fig. 9 shows a block diagram of an example processor and/or SoC 900 that may have one or more cores and an integrated memory controller.
Fig. 10 is a block diagram illustrating components capable of reading instructions from a machine-readable or computer-readable medium (e.g., a non-transitory machine-readable storage medium) and performing any one or more of the methods discussed herein, according to some example embodiments.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by showing an example of the present application. The present application is in no way limited to any particular configuration set forth below, but rather covers any modification, substitution, or improvement of elements, components, and algorithms without departing from the spirit of the present application. In the drawings and following description, well-known structures and techniques are not shown in order to avoid unnecessarily obscuring the present application.
Moreover, various operations will be described as multiple discrete operations in a manner that is most helpful in understanding the illustrative embodiments; however, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation.
The phrases "in an embodiment," "in one embodiment," and "in some embodiments" are used repeatedly herein. These phrases generally do not refer to the same embodiment; however, they may. The terms "comprising," "having," and "including" are synonymous, unless the context dictates otherwise. The phrases "A or B" and "A/B" mean "(A), (B), or (A and B)".
FIG. 1 illustrates an exemplary schematic diagram of the pulse encoding flow of a neuromorphic computing system. As shown in fig. 1, each pixel in an input image from the MNIST image dataset is encoded as a pulse sequence of length num_steps, where the pixel value determines the probability of emitting a pulse at each time step. A larger value produces more pulses than a smaller one; for example, in fig. 1, the pulse train for pixel value 233 contains far more pulses than the one for pixel value 22. After pulse encoding the input values (e.g., pixel values), the external host core needs to send the pulse signal of each pulse sequence, element by element, to the corresponding neuron. However, existing instructions do not support this pulse encoding flow efficiently. The existing pulse encoding method implements the flow with an encoding module and a transmission module. The pulse sequence is first generated by the encoding module: the input data in a non-pulse input format is converted into the pulse input format (i.e., a 0-1 array) and stored in a storage device in that format. Specifically, the encoding module uses a loop to generate the pulse sequence from the input value; if the current iteration of the loop satisfies the pulse-triggering condition, the code generates element 1; otherwise it generates element 0. The results of the iterations performed over the defined number of time steps are combined into a pulse sequence in the pulse input format, which is then stored in the storage device. Next, the transmission module reads the elements of the array out of the storage device element by element, in time steps, and sends them to the corresponding neuromorphic cores.
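The two-phase prior-art flow just described — encode the whole sequence into a stored 0-1 array, then transmit it element by element — can be sketched as follows. This is an illustrative model rather than the patent's implementation; rate coding is assumed for the encoding module, and all names are hypothetical:

```python
import random

def encode_full_sequence(value, num_steps, max_value=255, rng=None):
    """Prior-art phase 1: generate the whole pulse sequence and return it
    for storage as a 0-1 array (here, a Python list)."""
    rng = rng or random.Random()
    p = value / max_value
    return [1 if rng.random() < p else 0 for _ in range(num_steps)]

def transmit_sequence(sequence, send):
    """Prior-art phase 2: read the stored array element by element,
    in time steps, and send each element to the neuromorphic core."""
    for t, element in enumerate(sequence):
        send(t, element)

# Only after the entire sequence exists does transmission begin.
sent = []
seq = encode_full_sequence(128, num_steps=8, rng=random.Random(1))
transmit_sequence(seq, lambda t, e: sent.append((t, e)))
```

The intermediate storage between the two phases is exactly what the invention removes.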
In other words, the prior art requires that the input values first be pulse encoded to generate the corresponding pulse sequence, which is stored in the pulse input format of a 0-1 array, then read out element by element in time steps, with a pulse fired to the neuromorphic core whenever an element 1 is read. This process is very time consuming and greatly increases the latency of the system. In addition, if memory reads and writes, cache misses, and the like occur during the intermediate storage, the neuromorphic computation is adversely affected further. To this end, the present invention provides a scheme for accelerating pulse encoding in neuromorphic computing systems: the encoding result generated in each iteration of the pulse encoding loop (i.e., element 0 or 1) is transmitted directly to the neuromorphic core as soon as that iteration completes, without waiting for the loop that generates the entire pulse sequence to finish.
Fig. 2 illustrates a flowchart of a method 200 for neuromorphic computation, according to an embodiment of the present disclosure. In one embodiment, the method 200 may be performed by a neuromorphic computing device. In one example, the method 200 may be performed by a processor, such as a neuromorphic processor or a RISC-V processor. In one example, the method 200 may be performed by a neuromorphic processor that includes a RISC-V core. The method 200 may include steps S202, S204, S206, and S208. However, in some embodiments, method 200 may include more or fewer steps, and the present disclosure is not limited thereto.
In step S202, input data is received from one or more data sources, wherein the input data is in a non-pulsed input format.
In embodiments of the present disclosure, the data source may include a sensor device that may be used to detect and/or measure properties of the environment and generate sensor data describing or capturing characteristics of the environment. For example, a given sensor may be configured to detect a characteristic such as movement, weight, physical contact, temperature, wind, noise, light, computer communication, wireless signals, humidity, radiation, or the presence of a particular compound. The sensors may generate digital data, audio data, photographic images, video, and other sensor data describing these attributes.
In embodiments of the present disclosure, the data source may also include a data storage device, such as a database of one or more computing systems, that may aggregate data and/or generate additional data (e.g., based on post-processing of the aggregated data), such as related to a government, enterprise, science, or other entity or project.
In embodiments of the present disclosure, the input data is non-pulse input format data indicating that the input data is not pulse sequence data in a 0-1 array format.
In embodiments of the present disclosure, the input data may be any input data capable of being pulse encoded.
In step S204, encoding parameters for pulse-encoding the input data are determined.
In one embodiment, the encoding parameters include the encoding type and the sample length. In one embodiment, the encoding parameters may also include the input data type. In one embodiment, the encoding type is one of the following: rate coding, time-to-first-spike (TTFS) coding, phase coding, or burst coding. In another embodiment, the encoding type includes any other coding scheme that may be used to pulse encode the input data.
In one embodiment, the sampling length may be used to define the sequence length of the pulse sequence corresponding to the input data, i.e. to define the number of time steps that need to be performed to generate the pulse sequence.
In step S206, the input data is pulse-encoded in time steps based on the determined encoding parameters.
In one embodiment, the input data is pulse encoded based on the determined encoding type (e.g., rate coding, time-to-first-spike coding, phase coding, or burst coding) and sample length to obtain a pulse encoding result (i.e., element 0 or 1) for each time step; the full pulse sequence of the input data is obtained after all of the sample-length time steps have been executed.
In step S208, after the pulse coding result of each time step is obtained, the pulse coding result is directly transmitted to the corresponding neuromorphic core for corresponding data processing.
For example, after pulse encoding is performed at the i-th time step and a pulse encoding result, e.g., 1, is obtained, that result can be transmitted directly to the corresponding neuromorphic core, without waiting for the encoding of all of the sample-length time steps to complete. The neuromorphic core performs further data processing based on the received pulse encoding results, e.g., machine and deep learning for perceptual processing, motion control, information-based decision-making, and so on.
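The streaming behavior of steps S206 and S208 can be sketched as a single loop that forwards each time step's result immediately instead of buffering the full sequence. Rate coding is assumed for illustration, and all names are hypothetical:

```python
import random

def stream_encode(value, num_steps, send, max_value=255, rng=None):
    """Encode one time step at a time (S206) and transmit each result
    immediately (S208), with no intermediate 0-1 array in memory."""
    rng = rng or random.Random()
    p = value / max_value
    for t in range(num_steps):
        spike = 1 if rng.random() < p else 0
        send(t, spike)  # forwarded to the neuromorphic core right away

# The core's receive path is modeled as a callback.
received = []
stream_encode(200, num_steps=10,
              send=lambda t, s: received.append((t, s)),
              rng=random.Random(2))
```

Each result reaches the core as soon as its time step completes, which is the source of the latency reduction claimed by the scheme.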
In an embodiment of the present disclosure, the method 200 may be implemented by invoking an instruction pair including a pulse code setting (set_spike_encoding) instruction and a pulse code acquisition (get_spike_encoding) instruction.
The set_spike_encoding instruction indicates how pulses are to be generated, and may have the following parameters:
a. Data (data): represents the high-precision input values. The data may be vector-valued, since pulse encoding can be parallelized. The high-precision values may represent, for example, pixel values from an RGB camera, change values from an event camera, force sensor signals, olfactory sensor signals, laser signals, an embedding representation, and so on. In fact, the data may be any signal usable for neuromorphic computation.
b. Encoding type (encoding_type): represents the coding scheme selected for pulse encoding. In one embodiment, the encoding type may be one of rate coding, time-to-first-spike coding, phase coding, or burst coding. This parameter is also used to select which scheme is used for decoding.
c. Sample length (sample_length): defines the sequence length of the pulse sequence.
In one embodiment, the set_spike_encoding instruction may be split into multiple instructions based on different encoding types.
The get_spike_encoding instruction reads out the pulse encoding result element by element, one time step at a time. After the set_spike_encoding instruction has executed, the get_spike_encoding instruction may be invoked repeatedly, with the number of invocations depending on the value of the sample length. Each invocation of the get_spike_encoding instruction yields the pulse encoding result for the corresponding time step.
The pseudo-operation details of the set_spike_encoding and get_spike_encoding instruction pairs are shown below. Note that for simplicity, the input data is treated here as a scalar, however, in other embodiments, the operations may be vectorized.
In the case where the encoding type employs rate encoding, a set_spike_encoding and get_spike_encoding instruction pair may be invoked to perform the following:
Specifically, a set_spike_encoding instruction is first invoked, and a firing probability threshold for the pulse is computed from its parameters. The get_spike_encoding instruction is then invoked to generate a random value, compare it with the firing probability threshold, and output the pulse encoding result for the current time step, i.e., 0 or 1, based on the comparison.
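A software model of this pseudo-operation for rate coding might look as follows, with set_spike_encoding computing the firing probability threshold and each get_spike_encoding call drawing a random value and comparing. This is a hypothetical sketch of the described semantics, not the actual instruction implementation; the normalization by max_value is an assumption:

```python
import random

class RateEncoder:
    """Sketch of the set_/get_spike_encoding pair for rate coding."""

    def set_spike_encoding(self, data, sample_length, max_value=255, seed=None):
        self.threshold = data / max_value  # firing probability threshold
        self.remaining = sample_length     # how many get_ calls remain
        self.rng = random.Random(seed)

    def get_spike_encoding(self):
        assert self.remaining > 0, "called more times than sample_length"
        self.remaining -= 1
        # Random draw vs. threshold decides this time step's result.
        return 1 if self.rng.random() < self.threshold else 0

enc = RateEncoder()
enc.set_spike_encoding(data=233, sample_length=4, seed=0)
spikes = [enc.get_spike_encoding() for _ in range(4)]
```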
In the case where the encoding type employs a first pulse trigger time encoding, a set_spike_encoding and get_spike_encoding instruction pair may be invoked to perform the following:
Specifically, a set_spike_encoding instruction is first invoked to set the initial value of the index idx and compute the pulse time from the parameters of set_spike_encoding. get_spike_encoding is then invoked to compare idx with the computed pulse time and output the pulse encoding result for the current time step, i.e., 0 or 1, based on the comparison.
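For TTFS coding, the pair's behavior might be modeled as below: set_spike_encoding computes the single pulse time, and each get_spike_encoding compares idx against it. The linear mapping from input value to pulse time is an assumption for illustration — the patent does not give the formula:

```python
class TTFSEncoder:
    """Sketch of the instruction pair for time-to-first-spike (TTFS)
    coding: a single pulse fires at a time inversely related to the
    input value (larger inputs fire earlier)."""

    def set_spike_encoding(self, data, sample_length, max_value=255):
        self.idx = 0
        # Hypothetical linear mapping: max input -> step 0, zero -> last step.
        self.spike_time = round((1 - data / max_value) * (sample_length - 1))

    def get_spike_encoding(self):
        spike = 1 if self.idx == self.spike_time else 0
        self.idx += 1
        return spike

enc = TTFSEncoder()
enc.set_spike_encoding(data=255, sample_length=8)
early = [enc.get_spike_encoding() for _ in range(8)]   # fires at step 0
enc.set_spike_encoding(data=0, sample_length=8)
late = [enc.get_spike_encoding() for _ in range(8)]    # fires at step 7
```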
In the case of phase encoding of the encoding type, the set_spike_encoding and get_spike_encoding instruction pair may be invoked to perform the following:
Specifically, a set_spike_encoding instruction is first invoked to set the initial value of idx and compute a phase mask from the parameters of set_spike_encoding. get_spike_encoding is then invoked to perform a logical AND test between the index idx and the computed phase mask, and output the pulse encoding result for the current time step, i.e., 0 or 1, based on the result of that test.
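For phase coding, one plausible model is that the phase mask holds the input's binary digits and each get_spike_encoding tests the bit selected by idx. The MSB-first bit ordering and the mask construction here are assumptions for illustration, not the patent's definition:

```python
class PhaseEncoder:
    """Sketch of the instruction pair for phase coding: the input's
    binary digits are emitted one per time step, MSB first."""

    def set_spike_encoding(self, data, sample_length=8):
        self.idx = 0
        self.sample_length = sample_length
        # Keep only sample_length bits of the input as the phase mask.
        self.phase_mask = data & ((1 << sample_length) - 1)

    def get_spike_encoding(self):
        # AND the mask with the bit position selected by idx (MSB first).
        bit = self.sample_length - 1 - self.idx
        spike = 1 if (self.phase_mask >> bit) & 1 else 0
        self.idx += 1
        return spike

enc = PhaseEncoder()
enc.set_spike_encoding(data=0b10110010)
spikes = [enc.get_spike_encoding() for _ in range(8)]
# spikes reproduces the bit pattern of the input: [1,0,1,1,0,0,1,0]
```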
In the case where the encoding type employs burst encoding, a set_spike_encoding and get_spike_encoding instruction pair may be invoked to perform the following:
Specifically, a set_spike_encoding instruction is first invoked, the pulse firing interval is computed from the parameters of set_spike_encoding, and the initial value of the counter is set. get_spike_encoding is then invoked to evaluate the counter against the pulse firing interval and output the pulse encoding result for the current time step, i.e., 0 or 1, based on that evaluation.
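For burst coding, the counter-and-interval behavior might be modeled as below. The mapping from input value to inter-spike interval (larger input, shorter interval, denser burst) is an assumption for illustration:

```python
class BurstEncoder:
    """Sketch of the instruction pair for burst coding: a pulse fires
    every `interval` time steps, with the interval derived from the
    input value."""

    def set_spike_encoding(self, data, max_value=255, max_interval=8):
        # Hypothetical mapping: larger input -> smaller interval (min 1).
        self.interval = max(1, round((1 - data / max_value) * max_interval))
        self.counter = 0

    def get_spike_encoding(self):
        self.counter += 1
        if self.counter >= self.interval:  # interval elapsed: fire
            self.counter = 0
            return 1
        return 0

enc = BurstEncoder()
enc.set_spike_encoding(data=255)   # interval = 1: fires every step
dense = [enc.get_spike_encoding() for _ in range(8)]
enc.set_spike_encoding(data=0)     # interval = 8: fires every 8th step
sparse = [enc.get_spike_encoding() for _ in range(8)]
```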
In one embodiment, the set_spike_encoding and get_spike_encoding instruction pairs may be implemented in a new execution unit. In one embodiment, the set_spike_encoding and get_spike_encoding instructions are extended as a pair of instructions. With these two instructions, data from, for example, a non-pulse input format sensor can be processed as follows:
sensor_data = read_from_sensor()
set_spike_encoding(sensor_data, encoding_type, sample_length)
encoded_data = get_spike_encoding()
transmit encoded_data to neuromorphic core
……
encoded_data = get_spike_encoding()
transmit encoded_data to neuromorphic core
Specifically, sensor data (sensor_data) is read from a non-pulse-input sensor, and a set_spike_encoding instruction is invoked with the relevant parameters, namely the sensor data, encoding type, and sample length. The get_spike_encoding instruction is then invoked to pulse encode the sensor data, obtaining an encoded result (i.e., 0 or 1) that is sent to the corresponding neuromorphic core; the get_spike_encoding instruction is invoked repeatedly, a number of times equal to the sample length. In embodiments of the present disclosure, the above instructions may be invoked by a neuromorphic processor to process data from non-pulse input format sensors. In embodiments of the present disclosure, the neuromorphic processor may support input data from both pulse input format sensors and non-pulse input format sensors.
In one embodiment of the present disclosure, the set_spike_encoding and get_spike_encoding instructions may be SIMD instructions. In another embodiment of the present disclosure, the set_spike_encoding and get_spike_encoding instructions may be SISD instructions.
In embodiments of the present disclosure, the set_spike_encoding and get_spike_encoding instruction pairs may be implemented based on RISC-V. In another embodiment of the present disclosure, the set_spike_encoding and get_spike_encoding instruction pairs may be implemented based on ARM or x86, etc.
In embodiments of the present disclosure, the set_spike_encoding and get_spike_encoding instruction pairs may be added to the vector extension of RISC-V, with the corresponding instructions described as follows:
Here, vector.set_spike_encoding.vx may use a vector register to represent the input data; when RISC-V vector registers are used, a user may use the vsetdcfg instruction to configure the vector register type to accommodate different numbers and types of inputs. The instruction may also use a scalar register to represent the sample length, and encode the encoding type in the function field of the instruction. Fig. 3 shows a code representation of set_spike_encoding.vx added to the vector extension of RISC-V according to an embodiment of the present disclosure.
The vector.get_spike_encoding.vx instruction is similar to a vector load instruction and is responsible for reading the data generated by the pulse encoding unit into a specific vector register for subsequent operations. FIG. 4 illustrates a code representation of get_spike_encoding.vx added to the vector extension of RISC-V according to an embodiment of the present disclosure.
In embodiments of the present disclosure, the vector.set_spike_encoding.vx and vector.get_spike_encoding.vx instructions may be defined as SIMD-type instructions, because such instructions can exploit the register file width to improve parallel processing capability. Following classical RISC-V vector conventions, the vsetdcfg instruction is required to configure the SIMD information of the proposed pulse encoding and pulse read instructions.
In an embodiment of the present disclosure, to support these two instructions, a dedicated hardware functional unit (i.e., an encoding unit) is added to the RISC-V core for pulse encoding. The low-power RISC-V core is naturally suited to control and management work, but its pulse encoding capability is weak. Adding the hardware encoding unit to the RISC-V core supplements its encoding capability and forms, together with the neuromorphic core, a complete neuromorphic computing system. Fig. 5 shows a schematic diagram of a neuromorphic chip including a RISC-V core and a neuromorphic core, according to an embodiment of the present disclosure. As shown in fig. 5, the RISC-V core included in the neuromorphic chip has a hardware encoding unit for pulse encoding. In one embodiment, the encoding unit may implement pulse encoding by invoking the two instructions described above.
Fig. 6 shows a schematic diagram of the workflow of pulse encoding by the encoding unit shown in fig. 5. At 601, the encoding unit of the RISC-V core is invoked. At 602, the encoding configuration is set by invoking a set_spike_encoding instruction. At 603, the encoder parameters are set, including the encoding type, input data type, and sample length shown at 604, e.g., the encoding type and sample length defined in the vector.set_spike_encoding.vx instruction. At 605, the input data is read from the register file and, at 606, distributed to the corresponding encoding blocks based on the encoding parameters. At 607, the input data is pulse encoded in time steps at the encoding block to generate a pulse encoding result; a counter is set to the sample length value, and encoding ends when the counter is determined to equal 0 at 608. The encoded results are written to the corresponding neuromorphic core interface at 609. In an embodiment of the present disclosure, the encoded result (i.e., 0 or 1) generated at each time step at 607 is sent directly to the neuromorphic core without waiting for the counter to reach 0.
Fig. 7A-7D illustrate schematic diagrams of example implementations of hardware pulse encoding for different types of encoding methods, according to embodiments of the present disclosure. Specifically, when the encoding type is rate coding, the corresponding data may be fed into the pulse encoding hardware for rate coding shown in fig. 7A; when the encoding type is phase coding, into the pulse encoding hardware for phase coding shown in fig. 7B; when the encoding type is time-to-first-spike coding, into the pulse encoding hardware for TTFS coding shown in fig. 7C; and when the encoding type is burst coding, into the pulse encoding hardware for burst coding shown in fig. 7D. It should be appreciated that the hardware shown in fig. 7A-7D is one illustrative example of a hardware implementation of the present disclosure, and that a variety of other hardware implementations exist.
The present invention provides a scheme for accelerating the implementation of pulse coding for neuromorphic computing systems that directly transmits the generated coding result (i.e., element 0 or 1) to the neuromorphic core upon completion of the coding for each iteration in the pulse coding loop, without waiting for completion of the loop for generating the entire pulse sequence. The scheme accelerates the pulse coding process of the neuromorphic computing system, saves the time of the system for pulse coding, and greatly reduces the delay of the system.
FIG. 8 illustrates an example computing system. Multiprocessor system 800 is an interfaced system and includes a plurality of processors or cores, including a first processor 870 and a second processor 880 coupled via an interface 850, such as a point-to-point (P-P) interconnect, a fabric, and/or a bus. In some examples, the first processor 870 and the second processor 880 are homogeneous. In some examples, the first processor 870 and the second processor 880 are heterogeneous. Although the example system 800 is shown with two processors, the system may have three or more processors, or may be a single-processor system. In some embodiments, the computing system is a SoC, for example, having a neuromorphic chip as shown in fig. 5.
Processors 870 and 880 are shown including integrated memory controller (integrated memory controller, IMC) circuits 872 and 882, respectively. Processor 870 also includes interface circuits 876 and 878; similarly, the second processor 880 includes interface circuits 886 and 888. Processors 870, 880 may exchange information via an interface 850 using interface circuits 878, 888. IMCs 872 and 882 couple processors 870, 880 to respective memories, namely a memory 832 and a memory 834, which may be portions of main memory locally attached to the respective processors.
Processors 870, 880 may each exchange information with a network interface (network interface, NW I/F) 890 via respective interfaces 852, 854 using interface circuits 876, 894, 886, 898. Network interface 890 (e.g., one or more of an interconnect, bus, and/or fabric, in some examples a chipset) may optionally exchange information with coprocessor 838 via interface circuit 892. In some examples, the coprocessor 838 is a special-purpose processor, such as a high-throughput processor, network or communication processor, compression engine, graphics processor, general-purpose graphics processing unit (general purpose graphics processing unit, GPGPU), neural Network Processing Unit (NPU), embedded processor, or the like.
A shared cache (not shown) may be included in either processor 870, 880, or external to both processors but connected to them via an interface (e.g., a P-P interconnect), such that local cache information of either or both processors may be stored in the shared cache if a processor is placed into a low-power mode.
The network interface 890 may be coupled to the first interface 816 via an interface circuit 896. In some examples, the first interface 816 may be an interface such as a peripheral component interconnect (Peripheral Component Interconnect, PCI) interconnect, a PCI Express (PCI Express) interconnect, or another I/O interconnect. In some examples, the first interface 816 is coupled to a power control unit (power control unit, PCU) 817, the PCU 817 may include circuitry, software, and/or firmware to perform power management operations with respect to the processors 870, 880 and/or co-processor 838. The PCU 817 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate an appropriate regulated voltage. The PCU 817 also provides control information to control the generated operating voltage. In various examples, PCU 817 may include various power management logic units (circuits) to perform hardware-based power management. Such power management may be entirely processor controlled (e.g., by various processor hardware and may be triggered by workload and/or power constraints, thermal constraints, or other processor constraints), and/or power management may be performed in response to an external source (e.g., a platform or power management source or system software).
The PCU 817 is illustrated as logic separate from the processor 870 and/or processor 880. In other cases, the PCU 817 may execute on one or more of the cores (not shown) of the processor 870 or 880. In some cases, the PCU 817 may be implemented as a microcontroller (dedicated or general purpose) or other control logic configured to execute its own dedicated power management code (sometimes referred to as P-code). In still other examples, the power management operations to be performed by the PCU 817 may be implemented external to the processor, such as by a separate power management integrated circuit (power management integrated circuit, PMIC) or another component external to the processor. In still other examples, the power management operations to be performed by the PCU 817 may be implemented within a BIOS or other system software.
Various I/O devices 814 and a bus bridge 818 may be coupled to the first interface 816, with the bus bridge 818 coupling the first interface 816 to a second interface 820. In some examples, additional processor(s) 815 are coupled to the first interface 816, such as coprocessors, high-throughput many integrated core (many integrated core, MIC) processors, GPGPUs, accelerators (e.g., graphics accelerators or digital signal processing (digital signal processing, DSP) units), field programmable gate arrays (field programmable gate array, FPGA), or any other processor (e.g., a processor according to the present disclosure). In some examples, the second interface 820 may be a low pin count (Low Pin Count, LPC) interface. Various devices may be coupled to the second interface 820, including, for example, a keyboard and/or mouse 822, communication devices 827, and storage circuitry 828. The storage circuitry 828 may be one or more non-transitory machine-readable storage media, such as a disk drive or other mass storage device, which in some examples may include instructions/code and data 830. In addition, an audio I/O 824 may be coupled to the second interface 820. Note that other architectures are possible besides the point-to-point architecture described above. For example, a system such as the multiprocessor system 800 may implement a multi-drop interface or other such architecture, rather than a point-to-point architecture.
Fig. 9 illustrates a block diagram of an example processor and/or SoC 900 that may have one or more cores and an integrated memory controller. The processor 900 illustrated with solid-line boxes has a single core 902(A), a system agent unit circuit 910, and a set of one or more interface controller unit circuits 916, while the optional addition of the dashed-line boxes illustrates an alternative processor 900 having a plurality of cores 902(A)-(N), a set of one or more integrated memory controller unit circuits 914, dedicated logic 908, and a set of one or more interface controller unit circuits 916 in the system agent unit circuit 910.
Different implementations of the processor 900 may include: 1) a CPU, wherein the dedicated logic 908 is integrated graphics and/or scientific (throughput) logic (which may include one or more cores, not shown), and the cores 902(A)-(N) are one or more general-purpose cores (e.g., general-purpose in-order cores, general-purpose out-of-order cores, or a combination of the two); 2) a coprocessor, wherein the cores 902(A)-(N) are a large number of specialized cores intended primarily for graphics and/or scientific (throughput) purposes; and 3) a coprocessor, wherein the cores 902(A)-(N) are a large number of general-purpose in-order cores. Thus, the processor 900 may be a general-purpose processor, a coprocessor, or a special-purpose processor, such as, for example, a network or communication processor, a compression engine, a graphics processor, a GPGPU (general purpose graphics processing unit), a high-throughput many integrated core (MIC) coprocessor (including 30 or more cores), an embedded processor, or the like. The processor may be implemented on one or more chips. The processor 900 may be part of, and/or may be implemented on, one or more substrates using any of a variety of process technologies, such as complementary metal oxide semiconductor (complementary metal oxide semiconductor, CMOS), bipolar CMOS (BiCMOS), P-type metal oxide semiconductor (P-type metal oxide semiconductor, PMOS), or N-type metal oxide semiconductor (N-type metal oxide semiconductor, NMOS).
The memory hierarchy includes one or more levels of cache unit circuitry 904(A)-(N) within the cores 902(A)-(N), a set of one or more shared cache unit circuits 906, and external memory (not shown) coupled to the set of integrated memory controller unit circuits 914. The set of one or more shared cache unit circuits 906 may include one or more intermediate-level caches, such as a level 2 (L2), level 3 (L3), level 4 (L4), or other level of cache, a last level cache (LLC), and/or combinations of these. While in some examples the interface network circuitry 912 (e.g., a ring interconnect) provides an interface to the dedicated logic 908 (e.g., integrated graphics logic), the set of shared cache unit circuits 906, and the system agent unit circuit 910, alternative examples use any number of well-known techniques to provide interfaces to these units. In some examples, coherency is maintained between one or more of the shared cache unit circuits 906 and the cores 902(A)-(N). In some examples, the interface controller unit circuits 916 couple the cores to one or more other devices 918, such as one or more I/O devices, storage, one or more communication devices (e.g., wireless networks, wired networks, etc.), and so forth.
In some examples, one or more of the cores 902(A)-(N) have multi-threading capability. The system agent unit circuit 910 includes those components that coordinate and operate the cores 902(A)-(N). The system agent unit circuit 910 may include, for example, a power control unit (power control unit, PCU) circuit and/or a display unit circuit (not shown). The PCU may be (or may include) the logic and components needed to regulate the power states of the cores 902(A)-(N) and/or the dedicated logic 908 (e.g., integrated graphics logic). The display unit circuit is used to drive one or more externally connected displays.
Cores 902 (a) - (N) may be homogenous in terms of instruction set architecture (instruction set architecture, ISA). Alternatively, cores 902 (A) - (N) may also be heterogeneous with respect to the ISA; that is, a subset of cores 902 (a) - (N) may be capable of executing one ISA, while other cores may be capable of executing only a subset of that ISA or capable of executing another ISA. The processor cores 902 (A) - (N) may employ, in whole or in part, a RISC-V instruction set architecture. Processor cores 902 (a) - (N) may be used, in whole or in part, in a neuromorphic computing system.
Fig. 10 is a block diagram illustrating components capable of reading instructions from a machine-readable or computer-readable medium (e.g., a non-transitory machine-readable storage medium) and performing any one or more of the methods discussed herein, according to some example embodiments. In particular, Fig. 10 shows a schematic diagram of a hardware resource 1000, the hardware resource 1000 comprising one or more processors (or processor cores) 1010, one or more memory/storage devices 1020, and one or more communication resources 1030, wherein each of these processors, memory/storage devices, and communication resources may be communicatively coupled via a bus 1040 or other interface circuitry. For embodiments that utilize node virtualization, such as network function virtualization (NFV), the hypervisor 1002 may be executed to provide an execution environment for one or more network slices/sub-slices to utilize the hardware resource 1000.
Processor 1010 may include, for example, a processor 1012 and a processor 1014. The processor 1010 may be, for example, a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP) such as a baseband processor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Radio Frequency Integrated Circuit (RFIC), another processor (including those discussed herein), or any suitable combination thereof.
Memory/storage 1020 may include main memory, disk storage, or any suitable combination thereof. Memory/storage 1020 may include, but is not limited to, any type of volatile, non-volatile, or semi-volatile memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, solid-state memory, and the like.
The communication resources 1030 may include interconnection or network interface controllers, components, or other suitable devices to communicate with one or more peripheral devices 1004, one or more databases 1006, or other network elements via the network 1008. For example, the communication resources 1030 may include wired communication components (e.g., for coupling via USB, Ethernet, etc.), cellular communication components, near field communication (NFC) components, Bluetooth® (or Bluetooth® Low Energy) components, Wi-Fi® components, and other communication components.
The instructions 1050 may comprise software, a program, an application, an applet, an app, or other executable code for causing at least any of the processors 1010 to perform any one or more of the methods discussed herein. The instructions 1050 may reside, completely or partially, within at least one of the processors 1010 (e.g., within a processor's cache), the memory/storage 1020, or any suitable combination thereof. Furthermore, any portion of the instructions 1050 may be transferred to the hardware resource 1000 from any combination of the peripheral devices 1004 or the databases 1006. Accordingly, the memory of the processors 1010, the memory/storage 1020, the peripheral devices 1004, and the databases 1006 are examples of computer-readable and machine-readable media.
Some examples may be implemented with or as an article of manufacture or at least one computer readable medium. The computer readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. In some examples, logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination of these.
According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that, when executed by a machine, computing device, or system, cause the machine, computing device, or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predetermined computer language, manner or syntax, for instructing a machine, computing device, or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represent various logic within a processor, which when read by a machine, computing device, or system, cause the machine, computing device, or system to fabricate logic to perform the techniques described herein. Such a representation, referred to as an "IP core," may be stored on a tangible machine readable medium and provided to various customers or manufacturing facilities for loading into the production machine that actually produces the logic or processor.
The appearances of the phrase "one example" or "an example" are not necessarily all referring to the same example or embodiment. Any aspect described herein may be combined with any other aspect or similar aspect described herein, whether or not those aspects are described with respect to the same figure or element. The division, omission, or inclusion of the block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software, and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
Some examples may be described using the expression "coupled" and "connected" along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, a description using the terms "connected" and/or "coupled" may indicate that two or more elements are in direct physical or electrical contact with each other. However, the term "coupled" may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
The terms "first," "second," and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms "a" and "an" herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item. The term "assert" as used herein with reference to a signal refers to a state of the signal in which the signal is active, and which can be achieved by applying any logic level (whether a logic 0 or a logic 1) to the signal. The term "subsequently" or "after" may refer to immediately following or following some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, depending on the particular application, additional steps may be added or removed. Any combination of the variations may be used, and many variations, modifications, and alternative embodiments thereof will be understood by those of ordinary skill in the art having the benefit of this disclosure.
Unless specifically stated otherwise, disjunctive language such as the phrase "at least one of X, Y or Z" is understood within the context to generally recite an item, term, etc. may be X, Y or Z, or any combination thereof (e.g., X, Y and/or Z). Thus, such disjunctive language is generally not intended nor should it be implied that certain embodiments require the presence of each of at least one X, at least one Y, or at least one Z. Furthermore, unless specifically stated otherwise, a connectivity language such as the phrase "at least one of X, Y and Z" should also be understood to refer to X, Y, Z or any combination thereof, including "X, Y and/or Z".
Illustrative examples of the devices, systems, and methods disclosed herein are provided below. Embodiments of the devices, systems, and methods may include any one or more of the examples described below, as well as any combination thereof.
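As a non-authoritative illustration of the claimed flow — non-pulse input data, encoding parameters comprising an encoding type and a sampling length, per-time-step pulse encoding, and immediate dispatch of each time step's result to a neuromorphic core — the following Python sketch uses rate coding, one of the recited encoding types. The function names, the normalization of inputs to [0.0, 1.0], and the `send` callback are illustrative assumptions, not taken from the patent:

```python
import random

def rate_encode(value, sample_length, seed=None):
    """Rate-code a normalized scalar in [0.0, 1.0] into binary pulses,
    yielding one pulse per time step; the expected number of pulses over
    `sample_length` time steps is proportional to the input value."""
    rng = random.Random(seed)
    for _ in range(sample_length):
        # Bernoulli draw: fire with probability equal to the input value.
        yield 1 if rng.random() < value else 0

def encode_and_stream(values, sample_length, send, seed=0):
    """Pulse-encode `values` time step by time step and forward each
    per-step result immediately via `send`, rather than buffering the
    whole pulse sequence before dispatching it."""
    encoders = [rate_encode(v, sample_length, seed + i)
                for i, v in enumerate(values)]
    for t in range(sample_length):
        step_result = [next(enc) for enc in encoders]  # one pulse per input
        send(t, step_result)  # e.g., hand off to the corresponding core

# Collect the streamed per-time-step results for inspection.
pulses = []
encode_and_stream([0.0, 1.0], sample_length=8,
                  send=lambda t, step: pulses.append(step))
# An input of 0.0 never fires; an input of 1.0 fires at every time step.
```

The streaming structure, not the particular Bernoulli rule, is the point: each time step's result is available to the consumer as soon as it is produced, matching the per-time-step dispatch recited in the claims below.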

Claims (15)

1. A method for neuromorphic computation, comprising:
receiving input data from one or more data sources, wherein the input data is in a non-pulsed input format;
determining encoding parameters for pulse encoding the input data;
pulse encoding the input data in time steps based on the encoding parameters; and
after the pulse encoding result of each time step is obtained, directly sending the pulse encoding result to a corresponding neuromorphic core for corresponding data processing.
2. The method of claim 1, wherein the encoding parameters include an encoding type and a sampling length.
3. The method of claim 2, wherein pulse encoding results for a plurality of time steps constitute a pulse sequence corresponding to the input data, and wherein a number of the plurality of time steps is equal to a value of the sampling length.
4. The method of claim 2, wherein the encoding type comprises one of: rate coding, first-pulse-triggered time coding, phase coding, and burst coding.
5. The method of claim 1, wherein the method is implemented by invoking an instruction pair comprising a pulse code setting instruction for indicating how pulses are to be generated and a pulse code acquisition instruction for obtaining the pulse encoding result time step by time step.
6. The method of claim 5, wherein the pulse code setting instruction comprises the following parameters: the input data, the encoding type, and the sampling length.
7. The method of any of claims 1-6, wherein the method is performed by a neuromorphic processor or a RISC-V processor.
8. The method of any of claims 1-6, wherein the one or more data sources comprise a sensor.
9. A processor, comprising:
at least one processor core, the at least one processor core to perform operations comprising:
receiving input data from one or more data sources, wherein the input data is in a non-pulsed input format;
determining coding parameters for pulse coding the input data;
pulse encoding the input data in time steps based on the encoding parameters; and
after the pulse encoding result of each time step is obtained, directly sending the pulse encoding result to a corresponding neuromorphic core for corresponding data processing.
10. The processor of claim 9, wherein the encoding parameters include an encoding type and a sampling length.
11. The processor of claim 10, wherein pulse encoding results for a plurality of time steps constitute a pulse sequence corresponding to the input data, and wherein a number of the plurality of time steps is equal to a value of the sampling length.
12. The processor of claim 10, wherein the encoding type comprises one of: rate coding, first-pulse-triggered time coding, phase coding, and burst coding.
13. The processor of claim 9, wherein the processor performs the operations by invoking an instruction pair comprising a pulse code setting instruction and a pulse code acquisition instruction, wherein the pulse code setting instruction is to indicate how pulses are to be generated and the pulse code acquisition instruction is to obtain the pulse encoding result time step by time step.
14. The processor of claim 13, wherein the pulse code setting instruction comprises the following parameters: the input data, the encoding type, and the sampling length.
15. The processor of any of claims 9-14, wherein the processor comprises a RISC-V core, wherein the RISC-V core has dedicated encoding units for pulse encoding.
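The instruction pair recited in claims 5 and 13 — a setting instruction that configures how pulses are generated and an acquisition instruction that returns the encoding result for one time step per invocation — can be modeled in software as a sketch. The class and method names (`pcset`, `pcget`) and the particular latency-coding rule are hypothetical assumptions for illustration only; the patent does not specify them:

```python
class PulseEncoder:
    """Software model of the claimed instruction pair: `pcset` records the
    encoding parameters (input data, encoding type, sampling length), and
    `pcget` returns the pulse encoding result for exactly one time step."""

    def __init__(self):
        self._values = []
        self._length = 0
        self._t = 0

    def pcset(self, input_data, coding_type, sample_length):
        # Model of the pulse code setting instruction; its parameters follow
        # claims 6 and 14: input data, encoding type, sampling length.
        if coding_type != "time_to_first_spike":
            raise NotImplementedError("only a latency code is modeled here")
        self._values = list(input_data)
        self._length = sample_length
        self._t = 0

    def pcget(self):
        # Model of the pulse code acquisition instruction: one call yields
        # the result for the current time step, then advances the step.
        if self._t >= self._length:
            raise RuntimeError("sampling length exhausted; call pcset again")
        t, n = self._t, self._length
        # Latency code: a larger input fires earlier (at a smaller step).
        step = [1 if round((1.0 - v) * (n - 1)) == t else 0
                for v in self._values]
        self._t += 1
        return step

enc = PulseEncoder()
enc.pcset([1.0, 0.0], coding_type="time_to_first_spike", sample_length=4)
sequence = [enc.pcget() for _ in range(4)]
# Input 1.0 fires at the first time step; input 0.0 fires at the last.
```

Calling `pcget` in a loop reproduces the time-step-by-time-step acquisition of the claims: the consumer can forward each returned step before requesting the next one.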
CN202311643144.4A 2023-12-04 2023-12-04 Method and apparatus for neuromorphic computing system Pending CN117709420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311643144.4A CN117709420A (en) 2023-12-04 2023-12-04 Method and apparatus for neuromorphic computing system

Publications (1)

Publication Number Publication Date
CN117709420A true CN117709420A (en) 2024-03-15

Family

ID=90161600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311643144.4A Pending CN117709420A (en) 2023-12-04 2023-12-04 Method and apparatus for neuromorphic computing system

Country Status (1)

Country Link
CN (1) CN117709420A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination