WO2023230764A1 - Optical computing system and chip - Google Patents

Optical computing system and chip

Info

Publication number
WO2023230764A1
Authority
WO
WIPO (PCT)
Prior art keywords
optical
data
sub
processing unit
bit
Prior art date
Application number
PCT/CN2022/095978
Other languages
English (en)
French (fr)
Inventor
李芮
周雷
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2022/095978
Publication of WO2023230764A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06E OPTICAL COMPUTING DEVICES; COMPUTING DEVICES USING OTHER RADIATIONS WITH SIMILAR PROPERTIES
    • G06E3/00 Devices not provided for in group G06E1/00, e.g. for processing analogue or hybrid data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation

Definitions

  • the present application relates to the field of optical computing, and in particular, to an optical computing system and a chip.
  • optical devices can be integrated into one chip.
  • Multiple optical devices process optical signals to achieve mathematical calculations, that is, optical calculations are performed.
  • AI artificial intelligence
  • Optical computing opens up a different technical path from AI computing, which can effectively avoid the above problems and effectively increase the computing speed without relying on advanced processes or greatly increasing power consumption.
  • in an analog computing architecture based on optoelectronic fusion, data multiplication is achieved by digital-to-analog converting the corresponding data to obtain analog signals, which are then input to the optical modulator in the optical computing system to complete the electro-optical conversion; the multiplication operations are then performed on these data in the optical domain.
  • the signal input to the optical computing system is an analog signal, that is, a multi-level signal with a certain voltage amplitude in the electrical domain, such as a 4-level electrical signal.
  • this requires a modulator that can support multi-level electrical signals and a higher driving voltage loaded on the modulator, resulting in greater power consumption and cost.
  • This application provides an optical computing system and chip, which can reduce power consumption.
  • the present application provides an optical computing system including a first optical processing array and a second optical processing array, wherein the first optical processing array may include a plurality of first optical processing units, and each first optical processing unit is configured to output a first optical signal according to first sub-data separated from the first data.
  • the first optical signal indicates the first sub-data, and the first sub-data is a bit of the first data; the second optical processing unit in the second optical processing array is used to output a second optical signal according to a first optical signal and second sub-data separated from the second data.
  • the second optical processing unit is an optical switch.
  • the electrical signal input to the optical domain is an analog electrical signal indicating the first data
  • a digital-to-analog converter is not needed, and the digital electrical signal indicating the first data is directly input to the optical domain.
  • the first data is binary data
  • the first data may include multiple first sub-data
  • each first sub-data may be a bit of the first data
  • each bit may be 0 or 1.
  • each first optical processing unit only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • the optical modulator needs only a relatively low signal driving voltage (generally speaking, the multi-level driving voltage of the analog solution is as high as several volts, so a voltage amplifier needs to be deployed before the optical modulator; in the embodiment of the present application, a driving voltage at the hundred-millivolt level is sufficient to achieve 2-level electro-optical conversion, so this solution does not require the additional device or the corresponding power consumption).
  • because the input first data is carried on the digital signal as multiple first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and none of the corresponding power consumption, is required.
  • DACs digital-to-analog converters
  • ADCs analog-to-digital converters
  • the size of the optical device used in the embodiment of this application is at the micron level, and the power consumption is at the milliwatt level. Therefore, this solution has greater benefits in terms of power consumption and size.
  • Using a mature process, such as a 130 nm silicon photonics process, monolithic integration of the optical devices can be achieved, which also greatly reduces the difficulty of packaging.
  • the multi-unit array architecture in the embodiment of the present application also has the advantage of scalable accuracy, that is, after increasing the number of optical processing units, higher-precision optical bit operation processing can be achieved.
  • the second sub-data itself is carried in the electrical signal.
  • the optical switch is used to realize the multiplication operation of the first sub-data and the second sub-data in the optical domain, which is equivalent to using the optical switch to realize both electro-optical conversion and the multiplication operation; no additional devices are required to achieve electro-optical conversion, which can reduce the number of required devices.
  • the above-mentioned process of separating the first data into sub-data to obtain the first sub-data can be implemented by a device with a data separation function, or by inputting each bit of the first data to a preset data path, which is equivalent to realizing data separation.
  • for the process of separating the second data into sub-data, refer to the description of the first data; the similarities will not be described again.
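The data separation described above can be sketched in software as follows. This is only an illustration under stated assumptions: the function name `split_bits`, the fixed `width` argument and the most-significant-bit-first ordering are hypothetical choices, not something the application specifies.

```python
def split_bits(value: int, width: int) -> list[int]:
    """Return the bits of an unsigned integer, most significant bit first.

    Each returned bit plays the role of one sub-data item that would be
    routed to its own optical processing unit (an assumption of this sketch).
    """
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


# Example: a 4-bit first data word separated into four 1-bit sub-data items.
first_sub_data = split_bits(0b1011, 4)
print(first_sub_data)  # [1, 0, 1, 1]
```

Every element of the result is 0 or 1, so each downstream unit only ever sees a 2-level input.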
  • the second optical processing unit is specifically configured to: obtain an optical signal corresponding to the second sub-data based on the electrical signal; and perform the product operation of the first sub-data and the second sub-data based on the first optical signal and the optical signal corresponding to the second sub-data, to output the second optical signal.
  • timing sequence of electro-optical conversion and product operation is not limited here.
  • the optical switch performs the above two functions at the same time.
  • the first optical processing unit is a two-level driven optical device.
  • each first optical processing unit only needs to be able to support the input of 2 levels of electrical signals and output of 2 levels of optical signals.
  • the input data is carried on a digital signal, so no additional digital-to-analog converters, analog-to-digital converters and corresponding power consumption are required.
  • the second optical processing unit is a two-level driven optical device.
  • each second optical processing unit only needs to be able to support the input of 2 levels of electrical signals and output of 2 levels of optical signals.
  • the input data is carried on the digital signal, so no additional digital-to-analog converters, analog-to-digital converters and corresponding power consumption are required.
  • the driving voltage of the first light processing unit is less than 1000 millivolts; for example, it can be 100 millivolts, 200 millivolts, 300 millivolts, 400 millivolts, 500 millivolts, 600 millivolts, 700 millivolts, 800 millivolts, 900 millivolts, etc.
  • each first optical processing unit only needs to be able to support the input of 2 levels of electrical signals and output of 2 levels of optical signals.
  • the optical modulator needs a lower signal driving voltage (that is, no amplifier needs to be deployed before the optical modulator, and thus no additional components and corresponding power consumption are required).
  • the driving voltage of the second light processing unit is less than 1000 millivolts; for example, it can be 100 millivolts, 200 millivolts, 300 millivolts, 400 millivolts, 500 millivolts, 600 millivolts, 700 millivolts, 800 millivolts, 900 millivolts, etc.
  • each second optical processing unit only needs to be able to support the input of 2 levels of electrical signals and output of 2 levels of optical signals.
  • the optical modulator needs a lower signal driving voltage (that is, no amplifier needs to be deployed before the optical modulator, and thus no additional components and corresponding power consumption are required).
  • the signal input frequency of the first data is higher, so the first optical processing unit needs to be able to support a higher signal input frequency (that is, a higher maximum signal input frequency, which can be understood as a property of the device itself); correspondingly, the cost of an optical processing unit that supports higher signal input frequencies is also higher.
  • the second data is coefficient data (weight)
  • the refresh frequency of the second data as coefficient data is lower than the calculation frequency, so the second optical processing unit can select a low-speed device (compared to the first optical processing unit). That is, the signal input frequency supported by the first optical processing unit may be greater than the signal input frequency supported by the second optical processing unit, thereby reducing the cost of the selected second optical processing unit.
  • the second light processing unit is an optical switch.
  • the multiplication unit (second optical processing unit) only needs to realize two states, [0] or [1], so an optical switch can be used; in the prior art, by contrast, the coefficient multiplication unit needs to implement PAM-N electro-optical modulation, so a modulator needs to be used. The power consumption and cost of the optical switch are smaller than those of the modulator.
  • the first optical processing units in the first optical processing array are arranged in order, and the first sub-data are input to the respective first optical processing units in order from the high bit to the low bit of the first data.
  • the first optical processing unit is specifically configured to output the first optical signal according to the first sub-data carried in the digital electrical signal; or,
  • the second optical processing unit is configured to output a second optical signal according to a first optical signal and second sub-data carried in the digital electrical signal.
  • the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are used to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
  • the optical computing system further includes: a beam splitter.
  • the first optical processing unit is specifically configured to output the first optical signal to a beam splitter, and the beam splitter is configured to power divide the first optical signal to obtain N first optical signals and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the beam splitter may be a 1×N power beam splitter.
  • the beam splitter may output the N first optical signals to the second optical processing array.
  • the second light processing array can implement product operations between each first sub-data of the first data and each second sub-data of the second data.
  • the second optical processing units in the second optical processing array are arranged in order, and the second sub-data are input to the respective second optical processing units of each group in order from the high bit to the low bit of the second data.
  • the second data may include N second sub-data.
  • each first optical processing unit can output a first optical signal to a beam splitter, and the beam splitter can divide the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be equally divided into N first optical signals); the N first optical signals indicate the same first sub-data, that is, each first optical signal among the N first optical signals indicates the same first sub-data.
  • optical signals before and after power division are given the same name; this does not require the optical signals before and after power division to be completely identical, but means that the optical signals before and after power division indicate the same data.
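As a rough illustration of the equal power division mentioned above, and assuming an ideal, lossless 1×N splitter (an assumption of this note, not a statement from the application), the power of each of the N first optical signals and the corresponding splitting loss can be written as:

```latex
\[
P_k = \frac{P_{\mathrm{in}}}{N}, \quad k = 1, \dots, N,
\qquad
L_{\mathrm{split}} = 10 \log_{10} N \ \text{dB}
\]
```

Here \(P_{\mathrm{in}}\) is the power of the first optical signal before division and \(P_k\) the power of the k-th copy. For N = 4, for example, each copy carries one quarter of the input power, a splitting loss of about 6 dB, while still indicating the same first sub-data.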
  • the second data includes N bits
  • the second light processing array includes multiple groups of light processing units
  • each group of light processing units includes N second light processing units
  • the beam splitter is specifically used to output the N first optical signals to the same group of optical processing units in the second optical processing array.
  • the second light processing array may include M groups of second light processing units, each group of second light processing units may include N second light processing units, and each second light processing unit may obtain a second optical signal, where the second optical signal indicates the product result of the first sub-data and the second sub-data; the second optical processing array can thus obtain M*N second optical signals, and the M*N second optical signals can indicate the product operation results between each first sub-data of the first data and each second sub-data of the second data.
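The M*N bit products described above can be modelled behaviourally as follows. This is a sketch under simplifying assumptions: optical signals are reduced to the bit they indicate, the optical switch is idealized as passing light only when its control bit is 1, and the names `optical_switch` and `second_array` are hypothetical.

```python
def optical_switch(first_signal: int, second_sub_data: int) -> int:
    """Idealized 2-state switch: pass the incoming bit only when the control bit is 1."""
    return first_signal if second_sub_data == 1 else 0


def second_array(first_bits: list[int], second_bits: list[int]) -> list[list[int]]:
    """Return the M x N grid of 1-bit products (the second optical signals)."""
    grid = []
    for a in first_bits:
        # 1xN beam splitter: N copies that all indicate the same first sub-data.
        copies = [a] * len(second_bits)
        # One group of N switches, each driven by one bit of the second data.
        grid.append([optical_switch(c, b) for c, b in zip(copies, second_bits)])
    return grid


# First data 0b10 (M = 2) and second data 0b11 (N = 2).
print(second_array([1, 0], [1, 1]))  # [[1, 1], [0, 0]]
```

Each entry equals the logical AND of one bit of the first data and one bit of the second data, which is why a two-state device is sufficient for the multiplication unit.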
  • the second optical processing array is specifically configured to output the second optical signal to the optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals, each The third optical signal indicates a third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is a bit in the third data.
  • the optical computing system further includes: an optical full adder.
  • part of the data (for example, the last bit) obtained by the second optical processing array can be directly output without passing through the optical full adder, and then serves as the last bit of the product of the first data and the second data.
  • the optical full adder can also be in the form of an array. Specifically, it can include multiple optical full adders. Each optical full adder can add two operands (and, optionally, the carry result of the previous addition operation).
  • the optical full adder may be a nonlinear adder.
  • it can be a photonic crystal structure, a coupling structure based on highly nonlinear materials (such as a nonlinear waveguide adder), or others.
  • Adders based on nonlinearity can be implemented through a single structure (excluding the coupling structure of subunits), thereby reducing the number and complexity of devices.
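How an array of full adders can turn the 1-bit products into the bits of the final product can be sketched in software as below. This is a behavioural model only: the ripple-carry, shift-and-add arrangement and all function names are assumptions chosen for clarity, and the application does not state that the optical adder array is wired this way.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus a carry; return (sum bit, carry out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length LSB-first bit lists; the result has one extra bit."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]


def multiply(a_lsb: list[int], b_lsb: list[int]) -> list[int]:
    """Multiply two LSB-first bit lists using only AND products and full adders."""
    width = len(a_lsb) + len(b_lsb)
    acc = [0] * width
    for i, a in enumerate(a_lsb):
        # Row i of 1-bit products, shifted left by i positions and zero-padded.
        row = [0] * i + [a & b for b in b_lsb]
        row += [0] * (width - len(row))
        # The product fits in `width` bits, so the extra carry bit is always 0.
        acc = ripple_add(acc, row)[:width]
    return acc


def to_int(bits_lsb: list[int]) -> int:
    """Convert an LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits_lsb))


print(multiply([1, 1], [1, 1]))  # 3 * 3 = 9 -> [1, 0, 0, 1] (LSB first)
```

In this model the lowest result bit is simply the AND of the two lowest input bits, which matches the remark above that the last bit can bypass the adder.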
  • the optical full adder is used to input a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • when the above-described optical processing system processes a multiplication operation of only two data objects (for example, 2-bit*2-bit), or the last product operation of a multiplication operation of more than two objects (for example, for 2-bit*2-bit*2-bit, the first multiplication operation yields 4-bit*2-bit, and 4-bit*2-bit is the last product operation), the output of the optical full adder (multiple third optical signals) can represent the final result of the multiplication operation, which needs to be passed to the electrical domain. Therefore, the output of the optical full adder can be passed to the PD, and the PD can generate an electrical signal indicating the third data according to the plurality of third optical signals (or, described as photoelectric conversion of the plurality of third optical signals).
  • the third data is binary data, and each third optical signal indicates one bit of the third data;
  • the optical computing system further includes: a third optical processing array, including a plurality of third optical processing units, where each third optical processing unit is configured to output a fourth optical signal according to a third optical signal and the fourth sub-data in the fourth data, and the fourth optical signal indicates the product of the bit indicated by the third optical signal and the fourth sub-data.
  • the output of the optical full adder (multiple third optical signals) can represent an intermediate result of the multiplication operation, which needs to be passed to the next-level light processing array (that is, the third light processing array in the embodiment of the present application), so that the third light processing array can perform the next product operation. Therefore, the output of the optical full adder can be passed to the third optical processing array.
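The bit widths in the 2-bit*2-bit*2-bit example follow from a general bound on the product of two unsigned binary numbers. As a short derivation (the symbols M and N for the operand widths follow the usage earlier in this text), an M-bit operand is at most \(2^M - 1\) and an N-bit operand is at most \(2^N - 1\), so:

```latex
\[
(2^{M} - 1)(2^{N} - 1) = 2^{M+N} - 2^{M} - 2^{N} + 1 < 2^{M+N}
\]
```

The product therefore always fits in M + N bits. For M = N = 2 the largest product is 3 × 3 = 9, which needs 4 bits, so the intermediate result passed to the next-level array is 4 bits wide; the following 4-bit*2-bit stage then fits in 6 bits.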
  • the first optical processing unit is an optical modulator or an optical switch.
  • the optical modulator may be a low driving voltage electro-optical modulator, such as a silicon photonic integrated microring modulator, a Mach-Zehnder modulator, or others. It should be noted that, optionally, the operating frequency of the optical modulator is consistent with the calculation frequency.
  • the optical switch may be a silicon photonic integrated electro-optical Mach-Zehnder interferometer, a thermo-optical Mach-Zehnder interferometer, or others.
  • each first optical processing unit is specifically configured to modulate the first sub-data of the first data onto the laser carrier provided by the light source to output the first optical signal.
  • the system further includes: a light source.
  • the above-mentioned first data and second data may be two matrix elements that need to be multiplied in the product between matrices (for example, if the product operation of matrix A and matrix B is to be performed, the first data can be an element of matrix A, and the second data can be an element of matrix B).
  • the above-described optical processing system (including the first optical processing array, the second optical processing array and the optical full adder) can execute the product operation of the first data and the second data.
  • the optical processing system may also include multiple subsystems similar to the first optical processing array, the second optical processing array and the optical full adder. Each subsystem can implement the multiplication between the matrices.
  • each subsystem can obtain a product result (for example, a subsystem including a first light processing array, a second light processing array, and an optical full adder can obtain multiple third optical signals), and the optical processing system may also include an optical full adder for adding the product results obtained by each subsystem to obtain the product result between matrices.
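The way per-element products from several subsystems combine into a matrix product can be sketched as follows. This is a software analogy under stated assumptions: `bit_product` stands in for one subsystem (bit products followed by addition), `matmul` stands in for the additional adder that sums the subsystem results, and both names and the unsigned fixed-width operands are hypothetical.

```python
def bit_product(a: int, b: int, width: int) -> int:
    """Multiply two unsigned integers using only 1-bit AND products and addition."""
    total = 0
    for i in range(width):
        for j in range(width):
            # 1-bit product of bit i of a and bit j of b, weighted by 2**(i + j).
            total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


def matmul(A: list[list[int]], B: list[list[int]], width: int) -> list[list[int]]:
    """Matrix product where every element product comes from bit_product."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(bit_product(A[r][k], B[k][c], width) for k in range(inner))
         for c in range(cols)]
        for r in range(rows)
    ]


A = [[1, 2], [3, 0]]
B = [[2, 1], [1, 3]]
print(matmul(A, B, width=2))  # [[4, 7], [6, 3]]
```

Each call to `bit_product` corresponds to one element of matrix A multiplied by one element of matrix B; the outer sum is the extra addition step across subsystems.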
  • embodiments of the present application can also be used for direct calculation of optical signals, such as optical fiber distributed acoustic sensing (DAS) systems or laser radar (Light Detection and Ranging, LiDAR) systems.
  • DAS optical fiber distributed acoustic sensing
  • LiDAR laser radar
  • optical computing system includes:
  • a second optical processing array including a second optical processing unit configured to execute, according to the first optical signal and the second sub-data separated from the second data, the product operation of the first sub-data and the second sub-data to output a second optical signal, wherein the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates a product result of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
  • the second light processing unit is specifically used for:
  • the optical signal corresponding to the second sub-data is obtained
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the second optical processing unit is a two-level driven optical device.
  • the driving voltage of the second light processing unit is less than 1000 millivolts.
  • each second optical processing unit is configured to output a second optical signal according to the first optical signal and the second sub-data carried in the digital electrical signal.
  • the second data includes N bits
  • the second light processing array includes multiple groups of light processing units
  • each group of light processing units includes N second light processing units
  • the beam splitter is used to power divide the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be equally divided into N first optical signals), and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the second optical processing array is specifically configured to output the second optical signal to the optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals, each The third optical signal indicates a third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is a bit in the third data.
  • the optical computing system further includes: an optical full adder.
  • the optical full adder is used to input a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the optical full adder is a nonlinear adder.
  • the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes:
  • the third optical processing array includes a plurality of third optical processing units.
  • Each third optical processing unit is configured to output a fourth optical signal according to a third optical signal and fourth sub-data in the fourth data.
  • the fourth optical signal indicates a product of a bit indicated by the third optical signal and the fourth sub-data.
  • the second light processing unit is an optical switch.
  • this application provides an optical computing chip, which includes a system as described in any one of the first aspects, or a system as described in any one of the second aspects, an input interface, and an output interface;
  • the input interface is communicatively connected with the processor or the memory, and is used to obtain the first data and the second data sent from the processor or the memory;
  • the output interface is communicatively connected with the processor or the memory, and is used to transfer the product result obtained according to the first data and the second data to the processor or the memory.
  • this application provides a computing device, including a processor and an optical computing chip as described in the third aspect; the processor is communicatively connected to the optical computing chip.
  • the computing device is a terminal device or a server.
  • the application provides an optical computing method.
  • the method is applied to an optical computing system; the optical computing system includes a first optical processing array and a second optical processing array, the first optical processing array includes a first optical processing unit, and the second optical processing array includes a second optical processing unit; the second optical processing unit is an optical switch; the method includes:
  • the first optical processing unit outputs a first optical signal according to the first sub-data separated from the first data, wherein the first data is binary data and the first sub-data is one bit of the first data;
  • the second light processing unit performs a product operation of the first sub-data and the second sub-data according to the first light signal and the second sub-data separated from the second data, and outputs a second light signal, wherein the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
  • outputting the first optical signal according to the first sub-data separated from the first data includes:
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the first optical processing unit is a two-level driven optical device.
  • the second optical processing unit is a two-level driven optical device.
  • the driving voltage of the first light processing unit is less than 1000 millivolts.
  • the driving voltage of the second light processing unit is less than 1000 millivolts.
  • the maximum signal input frequency of the first optical processing unit is greater than the maximum signal input frequency of the second optical processing unit.
  • outputting the first optical signal according to the first sub-data separated from the first data includes:
  • performing the product operation according to the first optical signal and the second sub-data separated from the second data includes:
  • a second optical signal is output based on the first optical signal and the second sub-data carried in the digital electrical signal.
  • the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are used to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
  • the optical computing system further includes: a beam splitter.
  • the method also includes:
  • each first optical processing unit outputs a first optical signal to a beam splitter.
  • the beam splitter is used to divide the power of the first optical signal to obtain N first optical signals, and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the second data includes N bits
  • the second optical processing array includes multiple groups of optical processing units
  • each first optical processing unit is specifically configured to output the first optical signal to the beam splitter.
  • the beam splitter is used to divide the power of the first optical signal to obtain N first optical signals, and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the method further includes: the second optical processing array outputs the second optical signal to the optical full adder, so that the optical full adder obtains a plurality of third optical signals according to the plurality of second optical signals, each A third optical signal indicates a third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is a bit in the third data.
  • the optical computing system further includes: an optical full adder.
  • the method further includes: the optical full adder inputs a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the third data is binary data, and each third optical signal indicates one bit of the third data;
  • the optical computing system further includes: a third optical processing array, including a plurality of third optical processing units;
  • the method also includes: each third optical processing unit outputs a fourth optical signal according to a third optical signal and fourth sub-data in the fourth data, and the fourth optical signal indicates the product result of a bit indicated by the third optical signal and the fourth sub-data.
  • the first optical processing unit is an optical modulator or an optical switch.
  • the method further includes: each first optical processing unit modulates the first sub-data of the first data onto the laser carrier provided by the light source to output the first optical signal.
  • the application provides an optical computing method.
  • the method is applied to an optical computing system.
  • the optical computing system includes: a second optical processing array, including a plurality of second optical processing units; the method includes:
  • the second optical processing unit performs a product operation of the first sub-data and the second sub-data according to the first optical signal and the second sub-data separated from the second data, and outputs a second optical signal,
  • the second sub-data is one bit of the second data
  • the second data is binary data
  • the second data is carried in an electrical signal
  • the second optical signal indicates a product result of the first sub-data and the second sub-data
  • the second light processing unit is an optical switch.
  • Optical signals including:
  • the optical signal corresponding to the second sub-data is obtained
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the second optical processing unit is a two-level driven optical device.
  • the driving voltage of the second light processing unit is less than 1k millivolts.
  • outputting the second optical signal according to the first optical signal and the second sub-data separated from the second data includes: outputting the second optical signal based on the first optical signal and the second sub-data carried in the digital electrical signal.
  • the second data includes N bits
  • the second optical processing array includes multiple groups of optical processing units, each group of optical processing units includes N second optical processing units
  • the method further includes: the beam splitter divides the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be equally divided among the N first optical signals), and outputs each of the N first optical signals to one second optical processing unit, where the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the method further includes: the second optical processing array outputs a second optical signal to an optical full adder, so that the optical full adder obtains a plurality of third optical signals based on the plurality of second optical signals.
  • each third optical signal indicates a third sub-data in the third data
  • the third data is the product of the first data and the second data
  • the third sub-data is a bit in the third data.
  • the optical computing system further includes: an optical full adder.
  • the method further includes: the optical full adder inputs a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the optical full adder is a nonlinear adder
  • the third data is binary data, and each third optical signal indicates one bit of the third data;
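  • As a hedged illustration of how the bits of the third data can be derived from the 1-bit products, the following Python sketch models the addition purely logically. The optical full adder is treated here as an ideal 1-bit full adder, and the names `full_adder`, `add_bits` and `product_bits` are hypothetical; the sketch makes no claim about how the nonlinear optical adder is physically built.

```python
def full_adder(a, b, carry_in):
    """Ideal 1-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def add_bits(x, y):
    """Ripple-carry addition of two LSB-first bit lists of equal length."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out


def product_bits(first_bits, second_bits):
    """Combine 1-bit products into the bits of the product (MSB-first in and out)."""
    a = first_bits[::-1]   # LSB-first view of the first data
    b = second_bits[::-1]  # LSB-first view of the second data
    width = len(a) + len(b)
    acc = [0] * width
    for j, bj in enumerate(b):
        # Row j holds the 1-bit products a_i AND b_j, shifted left by j positions.
        row = [0] * j + [ai & bj for ai in a]
        row += [0] * (width - len(row))
        # The final carry is always 0 because the product fits in `width` bits.
        acc = add_bits(acc, row)[:width]
    return acc[::-1]


# 1101 (13) times 1011 (11) is 143, that is, 10001111.
print(product_bits([1, 1, 0, 1], [1, 0, 1, 1]))  # [1, 0, 0, 0, 1, 1, 1, 1]
```

  • In this model each output bit corresponds to one third sub-data, so an M-bit by N-bit product yields M+N third optical signals.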
  • the optical computing system further includes: a third optical processing array, including a third optical processing unit;
  • the method also includes:
  • Each third optical processing unit outputs a fourth optical signal according to a third optical signal and the fourth sub-data in the fourth data.
  • the fourth optical signal indicates the product result of the bit indicated by the third optical signal and the fourth sub-data.
  • the second light processing unit is an optical switch.
  • this application also provides a computer program product, including a program code.
  • the instructions included in the program code are executed by a computer to implement the optical computing method in the fifth aspect or any possible implementation manner of the fifth aspect, or the optical computing method in the sixth aspect or any possible implementation manner of the sixth aspect.
  • this application also provides a computer-readable storage medium.
  • the computer-readable storage medium is used to store program codes.
  • the instructions included in the program code are executed by a computer to implement the optical computing method in the fifth aspect or any possible implementation manner of the fifth aspect, or the optical computing method in the sixth aspect or any possible implementation manner of the sixth aspect.
  • Figure 1 is an architectural diagram provided by this application.
  • Figure 2 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 3 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 4 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 5 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 6 is a schematic diagram of a product operation provided by this application.
  • Figure 7 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 8 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 9 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 10 is an effect diagram provided by this application.
  • Figure 11 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 12 is a schematic diagram of a product operation provided by this application.
  • Figure 13 is an effect diagram provided by this application.
  • Figure 14 is a schematic diagram of the architecture of an optical processing system provided by this application.
  • Figure 15 is a schematic diagram of the architecture of an optical processing device provided by this application.
  • Figure 16 is a schematic diagram of the architecture of an execution device provided by this application.
  • Figure 17 is a schematic diagram of a server architecture provided by this application.
  • Figure 18 is a schematic diagram of a chip architecture provided by this application.
  • the terms "substantially", "about" and similar terms are used as terms of approximation, not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by one of ordinary skill in the art. Furthermore, the use of "may" when describing embodiments of the inventive concept means "one or more possible embodiments." As used herein, the terms "use", "using", and "used" may be considered synonymous with the terms "utilize", "utilizing", and "utilized", respectively. Additionally, the term "exemplary" is intended to refer to an example or illustration.
  • ANN: artificial neural network
  • NN: neural network
  • CNN: convolutional neural network
  • MLP: multilayer perceptron
  • the algorithm of the neural network system is complex and the amount of calculation is huge, which places high requirements on the calculation efficiency of the data.
  • optical computing has been applied that uses the physical properties of the optical device itself to complete the corresponding mathematical operation process.
  • Deep learning is widely used in image recognition, speech recognition, natural language processing and other fields. Deep learning is a neural network constructed to imitate the human brain, which can achieve better recognition results than traditional shallow learning methods. Due to the complexity of deep learning algorithms and the huge amount of calculations, traditional central processing units are inefficient when processing large-scale operations. Hardware research for AI acceleration has gradually become a research hotspot.
  • this application can be applied to AI-related computing scenarios, such as image-related AI operations, audio-related AI operations, video-related AI operations, text-related AI operations, and so on. More specifically, it can be applied to product operations in AI operations, such as the product operation between input data and coefficient data (for example, it can also be called weight).
  • this application can be applied to an AI optoelectronic fusion computing system.
  • FIG. 1 is a schematic structural diagram of an application system according to an embodiment of the present application.
  • the system may include an electrical domain and an optical domain.
  • the electrical domain may include a processor (such as a CPU, GPU, TPU, etc.) for performing AI operations and a memory (such as RAM). The processor can perform instruction control, and the memory can be used to store data. The processor can load the data in the memory (for example, input data and coefficient data) into the optical domain in the form of electrical signals. The optical processor system (such as an optical modulator, etc.) in the optical domain performs operations on the data in the dimension of the optical signal (such as the product operation in the embodiment of the present application), converts the operation results into electrical signals, and transmits them back to the electrical domain.
  • photons propagate at the speed of light, which is 1 to 2 orders of magnitude higher than the speed of electrons, and the power consumption of optical devices in the optical domain is only proportional to the clock frequency (while the power consumption of the electrical devices of a microelectronic chip is proportional to the cube of the clock frequency).
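  • The scaling relationship stated above can be made concrete with a small numerical sketch. It assumes ideal power laws with exponent 1 for optical devices and exponent 3 for electrical devices, exactly as stated in the text; the function name `relative_power` is hypothetical and the numbers are ratios, not measured power figures.

```python
def relative_power(freq_ratio, exponent):
    """Power relative to the baseline when the clock is scaled by freq_ratio."""
    return freq_ratio ** exponent


for ratio in (1, 2, 4):
    optical = relative_power(ratio, 1)     # proportional to clock frequency
    electrical = relative_power(ratio, 3)  # proportional to its cube
    print(f"{ratio}x clock: optical {optical}x, electrical {electrical}x")
```

  • Under these assumptions, doubling the clock frequency doubles the optical power consumption but multiplies the electrical power consumption by eight.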
  • Optical device performance is independent of the manufacturing process.
  • photons unlike electrons, are not limited by the Pauli exclusion principle and can be multiplexed in multiple dimensions such as wavelength and polarization, which can further increase the overall calculation rate.
  • a digital-to-analog converter is also known as a D/A converter, or DAC for short. It usually refers to a device that converts digital signals into analog signals.
  • the D/A converter basically consists of 4 parts, namely the weight resistor network, the operational amplifier, the reference power supply and the analog switch.
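  • an analog-to-digital converter is also known as an A/D converter, or ADC for short. It usually refers to a device that converts analog signals into digital signals.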
  • Optical modulation technology is a modulation technology that superimposes electrical signals carrying information onto carrier light waves.
  • Light modulation can cause certain parameters of light waves, such as amplitude, frequency, phase, polarization state and duration, to change according to certain rules.
  • the device that implements light modulation is called an optical modulator.
  • An optical switch is an optical device with one or more optional transmission ports. Its function is to physically switch or logically operate optical signals in optical transmission lines or integrated optical circuits.
  • a beam splitter (or power divider) is an optical device that splits an incident beam (for example, a laser beam) into two or more beams that may not have the same power.
  • the precision can represent the number of bits contained in the binary data.
  • the data precision of binary data 1011 can be 4.
  • the electrical domain part includes: a processor for instruction control, a memory for storing data, and a drive circuit.
  • the digital-to-analog converter DAC in the drive circuit is used to read the data in the memory (for example, the data can be input data and coefficient data) and perform digital-to-analog conversion on it to obtain corresponding electrical signals, which are loaded onto the modulator and the coefficient multiplication unit respectively.
  • the driving voltage output by the DAC is only at the level of a few hundred millivolts, which cannot effectively drive the modulator to achieve electro-optical conversion.
  • An amplifier needs to be used after the DAC to increase the driving voltage to the volt level.
  • the analog-to-digital converter ADC in the driving circuit can perform analog-to-digital conversion on the data output from the optical domain to the electrical domain, and feed the output data back to the memory or the processor.
  • the optical domain part may include: a light source (for example, a laser used to generate laser light of a specific wavelength), a modulator, a coefficient multiplication unit, and a photodetector (PD).
  • the modulator modulates the input data (carried on an analog electrical signal) onto the laser carrier to form optical signal input data.
  • the coefficient multiplication unit converts the coefficient data (carried on an analog electrical signal) into optical signal coefficient data, and multiplies it with the optical signal input data in the optical domain to form optical signal output data.
  • the photodetector can photoelectrically convert the optical signal output data to form optical domain output data (that is, electrical signal output data).
  • input data, coefficient data and output data are all analog electrical signals, that is, they are multi-level signals with a certain voltage amplitude in the electrical domain, such as 4-level electrical signals. Therefore, a modulator capable of modulating multi-level electrical signals is required; that is, the signal driving voltage loaded on the modulator needs to be increased, which brings additional power consumption and cost.
  • the existing solution requires a DAC to read the data and perform digital-to-analog conversion.
  • the driving voltage output by the DAC is only in the order of hundreds of millivolts, which cannot effectively drive the modulator to achieve electro-optical conversion.
  • An amplifier needs to be used after the DAC to increase the driving voltage to the volt level. The existing technology therefore relies on DACs and amplifiers, which have high power consumption, high cost, and large device size, resulting in low integration.
  • high-precision DAC also relies heavily on the advanced process of very large scale integration (VLSI) circuits.
  • the existing solution also requires ADCs for analog-to-digital conversion, and their accuracy requirements are higher than those of DACs. For example, after multiplying two 4-bit signals, the output signal is 8 bits; if there is an addition operation, the output signal accuracy is even higher. Therefore, the accuracy requirement for the DAC is 4 bits, but the accuracy requirement for the ADC is 8 bits or even higher.
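  • The bit-width arithmetic behind this example can be checked with a short sketch. It assumes unsigned binary operands; the function names `product_width` and `accumulated_width` and the choice of 16 accumulated terms are hypothetical and only illustrate why an addition stage raises the required output precision further.

```python
def product_width(a_bits, b_bits):
    """Bits needed to hold the largest product of two unsigned operands."""
    max_product = (2 ** a_bits - 1) * (2 ** b_bits - 1)
    return max_product.bit_length()


def accumulated_width(a_bits, b_bits, terms):
    """Bits needed when `terms` such products are added together."""
    max_sum = terms * (2 ** a_bits - 1) * (2 ** b_bits - 1)
    return max_sum.bit_length()


print(product_width(4, 4))          # 8: 15 * 15 = 225 fits in 8 bits
print(accumulated_width(4, 4, 16))  # 12: 16 * 225 = 3600 needs 12 bits
```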
  • High-precision ADCs also have the problems of high power consumption, high cost, and low integration. They also rely more on the advanced process of VLSI circuits.
  • FIG. 2 is a schematic diagram of the architecture of an optical computing system provided by an embodiment of the present application.
  • the optical computing system may include a light source 201, and the light source 201 may be a laser 2011 for providing a laser carrier wave.
  • the input light source 201 can use one laser 2011 to provide laser light, and the laser light can be transmitted to a 1×M power beam splitter 2012 to obtain M laser lights (for example, the M laser lights have the same wavelength).
  • the input light source 201 can use M lasers 2011 to provide M lasers (one laser 2011 provides one laser, and for example, the M lasers are lasers with the same wavelength).
  • the laser carrier provided by the light source 201 may be a continuous wave (continuous wave, CW) laser, for example, it may be a single-wavelength CW laser.
  • the light source 201 can be a III-V integrated distributed feedback (DFB) laser 2011 or another type of laser;
  • the 1×M power beam splitter 2012 can be a 1×M multi-mode interferometer, a cascade of log2(M) stages of 1×2 beam splitters, or another structure, which is not limited in this application.
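  • For the cascaded option, a minimal sketch can show how the number of stages and the power per output follow from M. It assumes M is a power of two and that every 1-to-2 splitter is ideal and lossless with an equal split, which the application does not state; the function names are hypothetical.

```python
import math


def cascade_stages(m):
    """Number of 1-to-2 splitter stages needed for m outputs (m a power of two)."""
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of two")
    return m.bit_length() - 1


def splitter_count(m):
    """Total 1-to-2 splitters in the binary tree: 1 + 2 + ... + m/2 = m - 1."""
    return m - 1


def per_output_fraction(m):
    """Fraction of the input power at each output for an ideal equal split."""
    return 1.0 / m


def splitting_loss_db(m):
    """Intrinsic splitting loss per output in dB for an ideal equal split."""
    return 10 * math.log10(m)


print(cascade_stages(4), splitter_count(4), per_output_fraction(4))  # 2 3 0.25
print(round(splitting_loss_db(4), 2))  # 6.02
```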
  • the optical computing system may include a first optical processing array 202, which may load data indicated by electrical signals from the electrical domain (exemplarily, the loading may be modulation) into the laser carrier wave provided by the light source 201.
  • the optical computing system can perform product operations, and the product operations can be the product of data A and data B.
  • data A can be input data
  • data B can be coefficient data (or called weight).
  • the data indicated by the electrical signal may be the above-mentioned data A. That is to say, the first light processing array 202 may load the data A indicated by the electrical signal from the electrical domain into the laser carrier provided by the light source 201 .
  • the optical computing system may include a first optical processing array 202 , which may load first data from the electrical domain into the laser carrier provided by the light source 201 .
  • in the existing solution, the electrical signal input into the optical domain is an analog electrical signal indicating the first data.
  • in the embodiment of the present application, a digital-to-analog converter is not needed, and the digital electrical signal indicating the first data is directly input into the optical domain.
  • the first data is binary data, the first data may include multiple first sub-data, each first sub-data may be one bit of the first data, and each bit may be 0 or 1.
  • the first data may be 1101 (the corresponding decimal value is 13), and the first data may include 4 bits, specifically: “1", “1", “0” and "1".
  • Each bit of data (that is, the first sub-data) may be input to a first light processing unit 2021 in the first light processing array 202 .
  • in the existing solution, 1101 is input to the optical modulator in the optical domain in the form of an analog signal. Since 1101 is data with a precision of 4 bits, the optical modulator needs to support modulation with 4-bit precision, that is, it needs to support the input of 2^4 (that is, 16) levels of electrical signals and output 2^4 (that is, 16) levels of optical signals.
  • on the one hand, the power consumption required for the optical modulation itself is high; on the other hand, the optical modulator requires a higher signal driving voltage (that is, an amplifier needs to be deployed before the optical modulator, which in turn requires additional components and corresponding power consumption).
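  • The difference in the number of levels can be sketched as follows. The sketch only counts levels; it assumes one modulator per word in the analog case and one 2-level unit per bit in the bit-wise case, and the function names are hypothetical.

```python
def analog_levels(precision_bits):
    """Levels one modulator must resolve when the whole word is one analog symbol."""
    return 2 ** precision_bits


def bitwise_levels(precision_bits):
    """Levels per unit when each bit drives its own 2-level unit."""
    return [2] * precision_bits


word = "1101"
print(analog_levels(len(word)))   # 16 levels for a single modulator
print(int(word, 2))               # 13: the level index that 1101 selects
print(bitwise_levels(len(word)))  # [2, 2, 2, 2]: four units, two levels each
```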
  • the first optical processing array 202 may include a plurality of first optical processing units 2021, each of the first optical processing units 2021 being configured to output a first optical signal according to the first sub-data of the first data, where the first optical signal indicates the first sub-data.
  • That is to say, each first optical processing unit 2021 can load the first sub-data of the first data into an optical signal (for example, this can be called electro-optical conversion of the first sub-data) to obtain the first optical signal.
  • each first optical processing unit 2021 only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • on the one hand, the power consumption required for the optical modulation itself is low; on the other hand, the optical modulator needs only a lower signal driving voltage (generally speaking, the multi-level driving voltage of the analog solution is as high as several volts, so an electrical amplifier needs to be deployed before the optical modulator, whereas in the embodiment of the present application a driving voltage on the order of a hundred millivolts is enough to achieve 2-level electro-optical conversion, so this solution does not require additional devices and the corresponding power consumption).
  • in addition, because the input first data is carried on digital signals as multiple first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and no corresponding power consumption, is required.
  • DACs, amplifiers and ADCs are millimeter-sized devices with power consumption in the watt level, and the packaging with optical devices increases packaging complexity.
  • the size of the optical device used in the embodiment of this application is at the micron level, and the power consumption is at the milliwatt level. Therefore, this solution has greater benefits in terms of power consumption and size.
  • monolithic integration of optical devices can be achieved, which also greatly reduces the difficulty of packaging.
  • the multi-unit array architecture in the embodiment of the present application also has the advantage of scalable accuracy, that is, after increasing the number of optical processing units, higher-precision optical bit operation processing can be achieved.
  • the first optical processing array 202 may include M first optical processing units, and the light source 201 may provide M laser carriers, where each laser carrier may be input to one first optical processing unit.
  • each first optical processing unit 2021 can load one first sub-data in the first data into the corresponding laser carrier to obtain the first optical signal. The first optical processing array 202 can thus obtain M first optical signals, and the M first optical signals may indicate the first data.
  • the first optical processing array 202 may include four first optical processing units (exemplarily, the first optical processing unit 1, the first optical processing unit 2, the first optical processing unit 3, and the first optical processing unit 4), and the first data may be 1101. Then bit "1" can be input to the first optical processing unit 1, bit "1" can be input to the first optical processing unit 2, bit "0" can be input to the first optical processing unit 3, and bit "1" can be input to the first optical processing unit 4.
  • the first optical processing unit 1 can load the bit "1" into the input optical signal to obtain the first optical signal 1 (indicating "1")
  • the first optical processing unit 2 can load the bit "1" into the input optical signal to obtain the first optical signal 2 (indicating "1")
  • the first optical processing unit 3 can load the bit "0" into the input optical signal to obtain the first optical signal 3 (indicating "0").
  • the first optical processing unit 4 can load the bit "1" into the input optical signal to obtain the first optical signal 4 (indicating "1").
  • each first optical processing unit in the first optical processing array 202 is arranged in order, and the first sub-data are input to the first optical processing units in order from the high bit to the low bit of the first data.
  • the first optical processing array 202 may include a first optical processing unit 1, a first optical processing unit 2, a first optical processing unit 3, and a first optical processing unit 4.
  • the first optical processing unit 1 is configured to process the highest bit in the input data
  • the first optical processing unit 2 is configured to process the second highest bit in the input data
  • the first optical processing unit 3 is configured to process the third highest bit in the input data
  • the first optical processing unit 4 is configured to process the fourth highest bit in the input data.
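  • The bit-to-unit assignment described above can be sketched in a few lines. The sketch assumes simple on-off keying, where a "1" passes the carrier and a "0" blocks it, and a normalized carrier power of 1.0; neither detail is specified in the text, and the function names are hypothetical.

```python
def to_first_sub_data(value, m):
    """Split an unsigned integer into m bits, highest bit first."""
    if not 0 <= value < 2 ** m:
        raise ValueError("value does not fit in m bits")
    return [(value >> (m - 1 - k)) & 1 for k in range(m)]


def first_optical_signals(bits, carrier_power=1.0):
    """On-off keying model: unit k outputs the carrier for 1 and no light for 0."""
    return [carrier_power * b for b in bits]


bits = to_first_sub_data(0b1101, 4)
print(bits)                         # [1, 1, 0, 1]
print(first_optical_signals(bits))  # [1.0, 1.0, 0.0, 1.0]
```

  • List position k corresponds to the first optical processing unit k+1, so position 0 carries the highest bit.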
  • the first optical processing unit 2021 may be an optical modulator or an optical switch.
  • the first light processing unit 2021 is a two-level driven optical device.
  • each first optical processing unit 2021 only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • the power consumption required for the optical modulation itself is low.
  • DAC is not required for digital-to-analog conversion, which greatly saves power consumption, size, packaging difficulty, etc.
  • the driving voltage of the first optical processing unit 2021 is less than 1000 millivolts (that is, 1 volt), for example, it can be 100 millivolts, 200 millivolts, 300 millivolts, 400 millivolts, 500 millivolts, 600 millivolts, 700 millivolts, 800 millivolts, 900 millivolts, etc.
  • each first optical processing unit 2021 only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • the optical modulator needs only a lower signal driving voltage (that is, there is no need to deploy an amplifier before the optical modulator, and thus no additional components and corresponding power consumption are required).
  • the optical modulator may be a low driving voltage electro-optical modulator, such as a silicon photonic integrated microring modulator, a Mach-Zehnder modulator, or others. It should be noted that, optionally, the operating frequency of the optical modulator is consistent with the computation frequency.
  • in order to record and modulate the intensity of incident light, the modulator can be implemented using structures based on different principles, such as doped silicon waveguides, electroabsorption modulators, and semiconductor optical amplifiers (SOA).
  • the incident light intensity can be recorded by detecting the magnitude of the incident photocurrent of the SOA.
  • the intensity of the optical signal passing through the SOA can also be changed by changing the transmittance of light.
  • SOA can be made using semiconductor quantum well materials. Since different voltages can cause different changes in the light transmittance of SOA, the light transmittance of SOA can be changed between 0 and 1 through voltage control. Specifically, when the voltage passing through the SOA is a reverse bias voltage, the incident light will generate a photocurrent in the SOA, and the light intensity distribution can be obtained by detecting the size of the photocurrent.
  • in order to realize the product operation of the first data and the second data, a product operation needs to be performed between each bit of the operands (specifically, between each first sub-data of the first data and each second sub-data of the second data), and an addition operation needs to be performed between the results of the product operations; for example, refer to Figure 6.
  • the second data carried in the electrical signal may include N second sub-data. In order to realize the product operation between each first sub-data of the first data and each second sub-data of the second data, each of the first optical processing units 2021 can specifically output a first optical signal to the beam splitter 204, and the beam splitter 204 can divide the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be equally divided among the N first optical signals).
  • the N first optical signals indicate the same first sub-data, that is, each of the N first optical signals indicates the same first sub-data.
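  • A minimal sketch of the power division follows. It assumes an ideal, lossless, equal 1-to-N split and a receiver-side decision threshold set at half of the per-branch power of a "1"; the threshold rule and the function names are illustrative assumptions rather than details from the application.

```python
def split_equally(power, n):
    """Ideal 1-to-n power splitter: every branch carries power / n."""
    return [power / n] * n


def detect_bit(branch_power, one_power, n):
    """Decide the bit on one branch; threshold is half of the '1' branch power."""
    return 1 if branch_power > 0.5 * one_power / n else 0


one_power = 1.0  # normalized optical power that represents bit 1
for bit in (1, 0):
    branches = split_equally(one_power * bit, 4)
    decoded = [detect_bit(p, one_power, 4) for p in branches]
    print(bit, branches, decoded)
```

  • Every branch decodes to the same bit as the undivided signal, which is the property the N first optical signals rely on, although each branch carries only 1/N of the power.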
  • the second sub-data itself is carried in the electrical signal.
  • the optical switch is used to realize the multiplication operation of the first sub-data and the second sub-data in the optical domain, which is equivalent to using the optical switch to realize both electro-optical conversion and the multiplication operation. No additional devices are required to achieve electro-optical conversion, which can reduce the number of required devices.
  • the first optical processing array 202 may include four first optical processing units (exemplarily, the first optical processing unit 1, the first optical processing unit 2, the first optical processing unit 3, and the first optical processing unit 4), the first data can be 1101, and the second data can be 1011. Then the bit "1" can be input to the first optical processing unit 1, the bit "1" can be input to the first optical processing unit 2, the bit "0" can be input to the first optical processing unit 3, and the bit "1" can be input to the first optical processing unit 4.
  • the first optical processing unit 1 can load the bit "1" into the input optical signal to obtain the first optical signal 1 (indicating "1")
  • the first optical processing unit 2 can load the bit "1" into the input optical signal to obtain the first optical signal 2 (indicating "1")
  • the first optical processing unit 3 can load the bit "0" into the input optical signal to obtain the first optical signal 3 (indicating "0").
  • the first optical processing unit 4 can load the bit "1" into the input optical signal to obtain the first optical signal 4 (indicating "1").
  • the beam splitter 204 may include beam splitter 1, beam splitter 2, beam splitter 3, and beam splitter 4. The first optical processing unit 1 may input the first optical signal 1 (indicating "1") to the beam splitter 1 to obtain four first optical signals 1 (indicating "1"), the first optical processing unit 2 may input the first optical signal 2 (indicating "1") to the beam splitter 2 to obtain four first optical signals 2 (indicating "1"), the first optical processing unit 3 may input the first optical signal 3 (indicating "0") to the beam splitter 3 to obtain four first optical signals 3 (indicating "0"), and the first optical processing unit 4 may input the first optical signal 4 (indicating "1") to the beam splitter 4 to obtain four first optical signals 4 (indicating "1").
  • beam splitter 204 may be a 1×N power beam splitter.
  • the beam splitter 204 may output the N first optical signals to the second light processing array 203 .
  • the second light processing array 203 can implement product operations between each first sub-data of the first data and each second sub-data of the second data.
  • the second light processing array 203 may include a plurality of second light processing units 2031, and each second light processing unit 2031 may implement a product between a set of first sub-data and the second sub-data.
  • the first data may include M pieces of first sub-data
  • the second data may include N pieces of second sub-data, then a total of N*M groups of first sub-data and second sub-data need to be multiplied, that is to say, N*M second optical processing units are needed.
  • the second data includes the N bits
  • the second light processing array 203 includes M groups of light processing units
  • each group of the light processing units includes the N second light processing units 2031
  • the beam splitter is used to input the N first light signals to the same group of light processing units in the second light processing array 203. That is to say, each group of light processing units can process the product operation of the same first sub-data and each second sub-data.
  • each of the second optical processing units 2031 can output a second optical signal according to one of the first optical signals and the second sub-data in the second data, wherein:
  • the second data is binary data
  • the second sub-data is one bit of the second data
  • the second optical signal indicates the product result of the first sub-data and the second sub-data.
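  • Because both operands are single bits, the product reduces to a logical AND, which an on/off switch can realize. The sketch below models an idealized optical switch that either passes or blocks the incoming light according to the electrical drive bit; the function name `optical_switch` and the lossless, perfectly extinguishing behavior are assumptions.

```python
def optical_switch(input_power, drive_bit):
    """Ideal switch: passes the input light when driven with 1, blocks it with 0."""
    return input_power if drive_bit else 0.0


# Truth table: light at the output only when both bits are 1.
for first_bit in (0, 1):
    for second_bit in (0, 1):
        out = optical_switch(float(first_bit), second_bit)
        print(first_bit, second_bit, int(out > 0))
```

  • The first sub-data arrives as light and the second sub-data as the electrical drive, so no separate electro-optical conversion of the second sub-data appears in this model.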
  • each second optical processing unit 2031 only needs to be able to support the input of two 2-level signals and output the product result of the data indicated by the two signals (the product result is represented by a 2-level optical signal).
  • the optical modulator needs only a lower signal driving voltage (generally speaking, the multi-level driving voltage of the analog solution is as high as several volts, so an electrical amplifier needs to be deployed before the optical modulator, whereas in the embodiment of the present application a driving voltage on the order of a hundred millivolts is enough to achieve 2-level electro-optical conversion, so this solution does not require additional devices and the corresponding power consumption).
  • in addition, because the input first data is carried on digital signals as multiple first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and no corresponding power consumption, is required.
  • the second light processing array 203 may include M groups of second light processing units 2031, and each group of second light processing units 2031 may include N second light processing units 2031.
  • Each second optical processing unit 2031 can obtain a second optical signal, which indicates the product result of the first sub-data and the second sub-data, and the second optical processing array 203 can obtain M*N second optical signals. The M*N second optical signals may indicate the product operation results between the respective first sub-data of the first data and the respective second sub-data of the second data.
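  • The M*N product results can be pictured as a grid, one row per first sub-data and one column per second sub-data. The sketch below builds that grid for 1101 and 1011 and recombines it with binary weights to confirm that it encodes the full product. Both inputs are assumed to be highest bit first; the function names and the weighted-sum recombination are illustrative and stand in for whatever adder follows the array.

```python
def partial_products(first_bits, second_bits):
    """Grid of 1-bit products: row i uses first bit i, column j uses second bit j."""
    return [[a & b for b in second_bits] for a in first_bits]


def recombine(grid):
    """Weighted sum of the grid; bit positions are counted from the highest bit."""
    m, n = len(grid), len(grid[0])
    total = 0
    for i, row in enumerate(grid):
        for j, p in enumerate(row):
            total += p << ((m - 1 - i) + (n - 1 - j))
    return total


grid = partial_products([1, 1, 0, 1], [1, 0, 1, 1])
for row in grid:
    print(row)
print(recombine(grid))  # 143, that is, 13 * 11
```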
  • the first light processing array 202 may include four first light processing units (exemplarily, may include the first light processing unit 1, the first light processing unit 2, the first light processing unit 3, and the first light processing unit 202). 4), the second light processing array 203 may include 16 second light processing units (exemplarily, may include the second light processing unit 1, the second light processing unit 2, the second light processing unit 3, the second light processing unit Unit 4, second light processing unit 5, second light processing unit 6, second light processing unit 7, second light processing unit 8, second light processing unit 9, second light processing unit 10, second light processing unit 11.
  • the bit "1" in the second data can be input to the second light processing unit 1, the bit "0" in the second data can be input to the second light processing unit 2, the bit "1" in the second data can be input to the second light processing unit 3, and the bit "1" in the second data can be input to the second optical processing unit 4.
  • the second optical processing unit 1 can obtain the second optical signal 1 according to the first optical signal 1 (indicating "1") and the bit "1" in the second data.
  • the second optical processing unit 2 can obtain the second optical signal 2 (indicating "0") based on the first optical signal 2 (indicating "1") and the bit "0" in the second data, and the second optical processing unit 3 can obtain the second optical signal 3 (indicating "0") according to the first optical signal 3 (indicating "0") and the bit "1" in the second data.
  • the second optical processing unit 4 can obtain the second optical signal 4 (indicating "1") based on the first optical signal 4 (indicating "1") and the bit "1" in the second data, and so on by analogy for the remaining second optical processing units.
  • each second sub-data in the second data can be simultaneously loaded, from high bit to low bit, into the second light processing units -1/2/.../N-1, -1/2/.../N-2, ..., -1/2/.../N-N; that is, the highest bit of the coefficient data carried in the electrical signal is loaded on the second light processing unit-1-1, the second light processing unit-2-1, ..., and the second light processing unit-N-1, and the lowest bit is loaded on the second light processing unit-1-N, the second light processing unit-2-N, ..., and the second light processing unit-N-N.
  • the second light processing units in the second light processing array 203 are arranged in order, and the second sub-data are input, in order from high bit to low bit of the second data, to the second light processing units of each group.
  • the second light processing unit 2031 is an optical switch. It should be pointed out that because the solution of the present invention adopts bit operation, the multiplication unit only needs to realize two states, [0] or [1], so an optical switch can be used; in the existing technology, by contrast, the coefficient multiplication unit needs to realize PAM-N electro-optical modulation, which requires a modulator. The power consumption and cost of an optical switch are lower than those of a PAM-N modulator.
  • the second sub-data itself is carried in the electrical signal.
  • the optical switch is used to realize the multiplication of the first sub-data and the second sub-data in the optical domain, which is equivalent to using the optical switch to realize both electro-optical conversion and multiplication; no additional devices are required for electro-optical conversion, which reduces the number of required devices.
  • the second light processing unit 2031 is a two-level driven optical device.
  • each second optical processing unit 2031 only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • the optical modulation itself requires little power.
  • DAC is not required for digital-to-analog conversion, which greatly saves power consumption, size, packaging difficulty, etc.
  • the driving voltage of the second light processing unit 2031 is less than 1000 millivolts; for example, it can be 100, 200, 300, 400, 500, 600, 700, 800 or 900 millivolts.
  • each second optical processing unit 2031 only needs to be able to support inputting 2 levels of electrical signals and outputting 2 levels of optical signals.
  • the optical modulator needs only a lower signal driving voltage (that is, there is no need to deploy an amplifier before the optical modulator, and thus no additional components and corresponding power consumption are required).
  • the optical switch may be a silicon-photonic integrated electro-optical Mach-Zehnder interferometer, a thermo-optical Mach-Zehnder interferometer, or another device.
  • the signal input frequency supported by the first optical processing unit 2021 may be greater than the signal input frequency supported by the second optical processing unit 2031 .
  • the refresh frequency of the second data is lower than the calculation frequency, so a low-speed device (compared to the first optical processing unit 2021) can be selected as the second optical processing unit 2031, thereby reducing the power consumption of the second light processing unit 2031.
  • a product operation is performed between each bit of the objects of the product operation (the first data and the second data), specifically between each first sub-data of the first data and each second sub-data of the second data, and an addition operation is performed on the results of the product operations; for example, refer to FIG. 6.
  • the second optical processing array 203 is specifically configured to output the second optical signals to the optical full adder 205, so that the optical full adder 205 obtains a plurality of third optical signals according to the multiple second optical signals. Each of the third optical signals indicates a third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is one bit in the third data.
  • the optical full adder 205 can perform pairwise addition and carry for each bit that completes the multiplication operation to complete the output of each bit of the optical signal output data.
  • part of the data obtained by the second optical processing array 203 (for example, the last bit) can be output directly without passing through the optical full adder 205, and serves as the last bit of the product of the first data and the second data.
  • the optical full adder 205 may also be in the form of an array. Specifically, it may include multiple optical full adders 205, and each optical full adder 205 may perform an addition operation on two objects (which may also be combined with the carry result of a previous addition operation).
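  • The behaviour of one full adder, as described above, can be sketched as follows. This is only a logical model of the addition of two bits plus an incoming carry; it says nothing about how the optical device is physically built, and the function name `full_adder` is illustrative.

```python
def full_adder(a, b, carry_in=0):
    """Add three single bits and return (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total % 2, total // 2


# Print the full truth table.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```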
  • the optical full adder 205 may be a photonic crystal structure, a coupling structure based on highly nonlinear materials, or others.
  • when the above-described optical processing system processes the multiplication of only two data objects (for example, 2-bit*2-bit), or the last product operation of a multiplication of more than two objects (for example, in 2-bit*2-bit*2-bit, the first multiplication leaves 4-bit*2-bit, and 4-bit*2-bit is the last product operation), the output of the optical full adder 205 (a plurality of third optical signals) can represent the final result of the multiplication operation, which needs to be passed to the electrical domain.
  • the output of the optical full adder 205 can be passed to the PD, and the PD can generate an electrical signal indicating the third data according to the plurality of third optical signals (this can also be described as photoelectric conversion of the plurality of third optical signals).
  • the PD may be in the form of an array and may include multiple PD units; each PD unit may perform photoelectric conversion on one of the multiple third optical signals to generate an electrical signal of one third sub-data of the third data (together, these form the electrical signal of the third data).
  • the PD can output an electrical signal of the third data to an electrical domain (such as a memory or processor).
  • the PD may be a waveguide photodetector integrated with germanium and silicon or others, which is not limited by the embodiments of this application.
  • the output of the optical full adder 205 can represent the intermediate result of the multiplication operation, and the intermediate result needs to be passed to the next-level light processing array (that is, the third light processing array in the embodiment of the present application), so that the third light processing array can perform the next step.
  • the output of the optical full adder 205 can be passed to the third optical processing array.
  • the third data is binary data, and each third optical signal indicates one bit of the third data;
  • the third optical processing array includes a plurality of third optical processing units, each of which is configured to output a fourth optical signal according to one of the third optical signals and fourth sub-data in the fourth data, where the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
  • the above-mentioned first data and second data may be two matrix elements that need to be multiplied in a product between matrices (for example, to perform the product operation of matrix A and matrix B, the first data can be an element of matrix A, and the second data can be an element of matrix B).
  • the above-described optical processing system (including the first optical processing array 202, the second optical processing array 203 and the optical full adder 205) can perform a product operation between the first data and the second data.
  • the light processing system can also include a plurality of subsystems similar to the first light processing array 202, the second light processing array 203, and the optical full adder 205.
  • each subsystem can obtain a product result (for example, each subsystem including the first light processing array 202, the second light processing array 203, and the optical full adder 205 can obtain multiple third optical signals), and the optical processing system may also include an optical full adder 205 for adding the product results obtained by the subsystems to obtain the product result between matrices.
  • the specific description of each subsystem may refer to the description of the subsystem including the first light processing array 202, the second light processing array 203, and the optical full adder 205 in the above embodiment, and the similarities will not be repeated again.
  • The single-wavelength CW optical power of the laser 2011 is divided equally into 4 parts, which are input to 4 modulators.
  • the input data is 13, that is, [1101], which is loaded on the 4 modulators from high bit to low bit: the highest bit [1] is loaded on modulator-1, the second bit [1] is loaded on modulator-2, the third bit [0] is loaded on modulator-3, and the lowest bit [1] is loaded on modulator-4.
  • the optical signals from high bit to low bit are therefore also [1], [1], [0], [1], and the power of each optical signal is divided equally and loaded on four optical switches.
  • the coefficient data is 11, that is, [1011], and each group of 4 optical switches is loaded with the coefficient from high bit to low bit: the highest bit [1] is loaded on optical switches-1, 5, 9, 13, the second bit [0] is loaded on optical switches-2, 6, 10, 14, the third bit [1] is loaded on optical switches-3, 7, 11, 15, and the lowest bit [1] is loaded on optical switches-4, 8, 12, 16.
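  • The loading order of this example can be checked with a short sketch. It assumes the optical switches are numbered 1 to 16 group by group, so that the switch at a given position of every group receives the same coefficient bit; the helper names `to_bits` and `switch_loading` are illustrative and not from the application.

```python
def to_bits(value, width):
    """Return the bits of value, ordered from highest to lowest."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def switch_loading(coefficient, width):
    """Map each switch number to the coefficient bit that drives it.

    Switches are numbered 1..width*width, one group after another.
    """
    bits = to_bits(coefficient, width)
    return {
        group * width + pos + 1: bits[pos]
        for group in range(width)
        for pos in range(width)
    }


input_bits = to_bits(13, 4)      # bits for modulator-1 .. modulator-4
loading = switch_loading(11, 4)  # bits for optical switch-1 .. switch-16
print(input_bits)
print(sorted(n for n, bit in loading.items() if bit == 0))
```

The second print lists the switches that receive the only zero bit of the coefficient, which are switches 2, 6, 10 and 14 in this numbering.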
  • the above structure realizes the multiplication of each bit of 4-bit input data and each bit of 4-bit coefficient data, and the multiplication result is output after each optical switch in the form of optical signal intensity.
  • the corresponding bit result after the multiplication needs to be calculated by optical domain addition.
  • the optical switch-16 outputs the result of multiplying the lowest bit of the input data and the lowest bit of the coefficient data, which is directly photoelectrically converted by the PD as the lowest bit of the output data.
  • the optical switch-15 outputs the result of multiplying the lowest bit of the input data and the third bit of the coefficient data, and the optical switch-12 outputs the result of multiplying the third bit of the input data and the lowest bit of the coefficient data. The two belong to the same bit position and are added by the optical full adder-12.
  • the optical switch-14 outputs the result of multiplying the lowest bit of the input data and the second bit of the coefficient data, the optical switch-11 outputs the result of multiplying the third bit of the input data and the third bit of the coefficient data, and the optical switch-8 outputs the result of multiplying the second bit of the input data and the lowest bit of the coefficient data. The three belong to the same bit position and need to be added together with the carry of the previous optical full adder-12.
  • the optical full adder-11 adds the results of the optical switches-14 and 11 and the carry of the optical full adder-12; the sum is output to the optical full adder-10, where it is added to the result of the optical switch-8, and the result is photoelectrically converted by the PD as the sixth bit of the output data.
  • the carry bits of the optical full adder-11 and the optical full adder-10 are output to the next layer of optical full adders.
  • N = 4, so the structure has a total of 4 modulators, 16 optical switches, 12 optical full adders and 8 PDs.
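  • The 4-bit example (input data 13, coefficient data 11) can be reproduced end to end with a bit-level sketch. It assumes that each optical switch acts as an AND gate and that the additions are done with ripple-carry full adders, one row of partial products at a time; this ordering of the additions is a simplification and does not reproduce the exact wiring of the 12 optical full adders in the example. All function names are illustrative.

```python
def to_bits(value, width):
    """Bits of value from highest to lowest."""
    return [(value >> s) & 1 for s in range(width - 1, -1, -1)]


def full_adder(a, b, carry_in=0):
    """Add three single bits and return (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total % 2, total // 2


def bitwise_multiply(x, y, width=4):
    """Multiply two width-bit numbers using only AND and full adders.

    Returns the 2*width output bits from highest to lowest.
    """
    x_low_first = to_bits(x, width)[::-1]
    y_low_first = to_bits(y, width)[::-1]
    out_width = 2 * width
    acc = [0] * out_width  # acc[k] is the output bit of weight 2**k
    for i, a in enumerate(x_low_first):
        row = [a & b for b in y_low_first]  # one group of switches
        carry = 0
        for k in range(i, out_width):
            j = k - i
            p = row[j] if j < width else 0
            acc[k], carry = full_adder(acc[k], p, carry)
    return acc[::-1]


bits = bitwise_multiply(13, 11)
print(bits, int("".join(map(str, bits)), 2))
```

The eight output bits correspond to the eight PDs of the example, and they encode 13 * 11 = 143.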
  • the above example realizes 4-bit all-optical digital computing, and realizes digital output through digital input and digital operation.
  • the optical devices used are low-driving-voltage, low-power devices.
  • compared with the existing optoelectronic fusion analog computing solution, the most outstanding effect is the realization of 4-bit calculation, which cannot be achieved by the existing solution.
  • it also has the beneficial effects of low power consumption, high integration, and no dependence on advanced processes.
  • the power consumption comparison here is estimated based on the calculation rate of 50Gb/s.
  • the specific evaluation method is as follows:
  • there is currently no 4-bit high-speed DAC and RF driver, so its power consumption is estimated as twice that of a 4-channel 2-bit high-speed DAC and RF driver;
  • the power consumption of input data reading is calculated at 50Gb/s;
  • the power consumption of coefficient data reading is estimated at a coefficient refresh rate of 1Gb/s;
  • the input modulator power consumption is estimated from the total dynamic power consumption of 30fJ/bit and the static power consumption of 10mW of a silicon-photonic micro-ring modulator, where the static power consumption is the thermal tuning power required to stabilize the operating wavelength of the modulator at the wavelength of the laser 2011;
  • the coefficient modulator is estimated from the static power consumption of 10mW of the silicon-photonic micro-ring modulator;
  • the low-speed optical switch is estimated from the dynamic power consumption of 5mW of a silicon-photonic Mach-Zehnder optical switch;
  • the PD is estimated from the dynamic power consumption of 50fJ/bit of a germanium-silicon waveguide photodetector;
  • the 8-bit ADC power consumption is estimated as twice the power consumption of a 3-bit ADC in a 65nm process;
  • the 1×N power beam splitter and the optical full adder in the scheme of the present invention are passive components and have no power consumption, so they are not listed in the table.
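  • The per-bit energy figures quoted in the evaluation method can be converted to power at the stated 50Gb/s rate with a short calculation. This is a sketch of the unit conversion only; whether the table applies these figures per device or per array is not stated here, so the per-device reading is an assumption, and the function name `dynamic_power_mw` is illustrative.

```python
def dynamic_power_mw(energy_fj_per_bit, rate_gbps):
    """Dynamic power in mW for a per-bit energy (fJ) at a bit rate (Gb/s).

    1 fJ/bit * 1 Gb/s = 1e-15 J * 1e9 1/s = 1e-6 W = 1e-3 mW.
    """
    return energy_fj_per_bit * rate_gbps * 1e-3


RATE_GBPS = 50
modulator_dynamic = dynamic_power_mw(30, RATE_GBPS)  # about 1.5 mW
pd_dynamic = dynamic_power_mw(50, RATE_GBPS)         # about 2.5 mW
modulator_total = modulator_dynamic + 10             # plus 10 mW static

print(f"{modulator_dynamic:.2f} {pd_dynamic:.2f} {modulator_total:.2f}")
```

Under this reading, the 10mW static (thermal tuning) term dominates the modulator power over its dynamic term.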
  • ADC, DAC and RF amplifier account for the largest power consumption in the existing solution.
  • the total power consumption of the existing solution is 7665mW, of which the 4-bit DAC and RF amplifier account for ~44.2% and the 8-bit ADC accounts for ~52.2%.
  • the technical solution of the present invention adopts bit operation processing and does not require analog-to-digital conversion, digital-to-analog conversion or any electrical signal amplifier, so the power consumption gain is about 10 times.
  • the largest proportion of power consumption in this solution is the reading of input data, which accounts for ~56.2%; the laser 2011 accounts for ~21%, and the optical devices that realize the bit operation account for only ~20%.
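  • The shares quoted for the existing solution can be turned into absolute figures as a sanity check. The sketch below uses only the 7665mW total and the two approximate percentages given above, so the results are approximate as well; the function name `power_breakdown` and the dictionary labels are illustrative.

```python
def power_breakdown(total_mw, shares):
    """Return the power in mW attributed to each named share."""
    return {name: total_mw * share for name, share in shares.items()}


existing_total_mw = 7665
existing_shares = {
    "4-bit DAC and RF amplifier": 0.442,
    "8-bit ADC": 0.522,
}
breakdown = power_breakdown(existing_total_mw, existing_shares)
other_share = 1 - sum(existing_shares.values())

for name, mw in breakdown.items():
    print(f"{name}: ~{mw:.0f} mW")
print(f"everything else: ~{other_share:.1%}")
```

The converters and the RF amplifier together account for roughly 96% of the existing solution's power, which is why removing them yields most of the saving.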
  • DAC, RF amplifier and ADC are all millimeter-sized devices, and the higher the accuracy, the more dependent they are on the advanced VLSI process.
  • the size of the optical devices used in this solution is at the micron level, and the mature 130nm silicon photonic process can be used to achieve single-chip integration. Therefore, the solution of the present invention also has greater gains in size and optoelectronic packaging integration level, and does not rely on advanced manufacturing processes.
  • the single-wavelength CW optical power of the laser 2011 is divided equally into 2 parts, which are input to 2 modulators.
  • the input data is 3, that is, [11], which is loaded on two modulators from high to low, that is, the highest bit [1] is loaded on modulator-1, and the lowest bit [1] is loaded on modulator-2.
  • the optical signals from high to low are also [1] and [1] respectively.
  • the power of each optical signal is loaded on two optical switches.
  • the coefficient data of the first layer is 3, that is, [11], and each group of 2 optical switches is loaded with the coefficient from high bit to low bit: the highest bit [1] is loaded on optical switch-1-1 and optical switch-2-1, and the lowest bit [1] is loaded on optical switch-1-2 and optical switch-2-2.
  • the above structure realizes the multiplication of each bit of the 2-bit input data with each bit of the first layer 2-bit coefficient data, and the multiplication result is output after each optical switch in the form of optical signal intensity.
  • the corresponding bit result after completion of multiplication needs to be calculated by optical domain addition.
  • the optical switch-2-2 realizes the multiplication of the lowest bit of the input data and the lowest bit of the first-layer coefficient; its result is directly output, as the lowest bit of the first-layer output, to the second-layer multiplication.
  • the optical switch-2-1 realizes the multiplication of the lowest bit of the input data and the highest bit of the first-layer coefficient.
  • the optical switch-1-2 realizes the multiplication of the highest bit of the input data and the lowest bit of the first-layer coefficient.
  • the optical switch-1-1 realizes the multiplication of the highest bit of the input data and the highest bit of the first-layer coefficient. After this result is added to the carry of the optical full adder-1 by the optical full adder-2, the sum and the carry serve respectively as the second bit and the highest bit of the first-layer output data and are output to the second-layer multiplication.
  • the output result of the first layer that is, the input data of the second layer, is 4-bit.
  • the power of each bit of data is divided equally into 2 and loaded on the 2 optical switches.
  • the coefficient data of the second layer is 3, that is, [11], then each group of 2 optical switches is loaded with the high to low bits of the coefficient.
  • the above structure realizes the multiplication of each bit of the second layer 4-bit input data with each bit of the second layer 2-bit coefficient data, and the multiplication result is output after each optical switch in the form of optical signal intensity.
  • the corresponding bit result after completion of multiplication needs to be calculated by optical domain addition.
  • the principle is the same as the first layer.
  • the output result of the second layer that is, the input data of the third layer, is 6-bit.
  • the power of each bit of data is divided equally into 2 and loaded on the 2 optical switches.
  • the coefficient data of the third layer is 3, that is, [11], then each group of 2 optical switches is loaded with the high to low bits of the coefficient.
  • the optical full adder array implements optical domain addition, and the output of the third layer is 8-bit. After 8 PD photoelectric conversions, the final output result is [01010001], which is 81.
  • This example implements three-layer 2-bit operation processing in the optical domain. Because the input is 2-bit, the structure has a total of 2 modulators; and the coefficient is 2-bit and has three layers, so 24 optical switches, 12 optical full adders and 8 PDs are required.
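  • The three-layer example can be traced numerically. The sketch below assumes that each layer multiplies its input by a 2-bit coefficient, that the output width grows by the coefficient width per layer, and that each layer needs (input width × coefficient width) optical switches; the function name `layered_product` is illustrative.

```python
def layered_product(value, coefficients, input_width, coeff_width):
    """Multiply value by each coefficient in turn.

    Returns (result, output width of each layer, switches per layer).
    """
    width = input_width
    widths, switches = [], []
    for coefficient in coefficients:
        switches.append(width * coeff_width)  # one switch per bit pair
        value *= coefficient
        width += coeff_width
        widths.append(width)
    return value, widths, switches


result, widths, switches = layered_product(3, [3, 3, 3], 2, 2)
print(format(result, f"0{widths[-1]}b"), widths, switches, sum(switches))
```

The layer widths come out as 4, 6 and 8 bits, the final bit pattern is 01010001 (81), and the switch counts 4, 8 and 12 add up to the 24 optical switches stated for this example.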
  • This example implements 2-bit three-layer all-optical digital computing. Like the first example, it has the beneficial effects of low power consumption, high integration, and no dependence on advanced processes.
  • the detailed analysis of the power consumption comparison between this example and the existing technology is as follows in Table 2.
  • the specific evaluation method is basically the same as the first example. The difference is that the 2-bit high-speed DAC and RF driver are evaluated according to actual conditions.
  • the power consumption of this example is 11.4% of that of the existing solution. Since 2-bit calculation is implemented, the input driver of the existing technology, namely the 2-bit DAC and RF amplifier, accounts for a relatively small ~9% of the power consumption, while the output requires an 8-bit ADC, which accounts for ~85% of the total power consumption. The technical solution of this example does not require a high-precision ADC and still saves most of the power consumption while using more optical devices, of which the lasers account for ~28% and the input data accounts for ~37.2%.
  • each 4-bit modulator group includes 4 modulators to realize electro-optical modulation of 4-bit input data.
  • each group includes 4 low-speed optical switches.
  • the optical switch group in the figure is simplified and drawn as 4 groups.
  • the input data $a_{11}$ needs to complete the multiplication with $b_{11}$ and $b_{12}$, the input data $a_{12}$ needs to complete the multiplication with $b_{21}$ and $b_{22}$, and so on; that is, each input data needs to be connected to two sets of 4-bit optical switch multiplication units, so a total of 8 sets of optical switches are required, that is, 32 optical switches.
  • the optical full adder group implements the addition.
  • the principle is the same as the first example.
  • $c_{11} = a_{11} \times b_{11} + a_{12} \times b_{21}$, so an additional set of optical full adders is required to complete the addition; the same applies to $c_{12}$, $c_{21}$ and $c_{22}$.
  • Each output is 9-bit, so 4 groups of PD arrays are required, each group has 9 PDs. After completing the photoelectric conversion, the final operation result is output.
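  • The 9-bit output width follows from the worst case of two 4-bit products added together, which the sketch below checks for the four outputs c11, c12, c21 and c22. The matrix values are arbitrary 4-bit numbers chosen for illustration, and the function name `matmul_2x2` is not from the application.

```python
def matmul_2x2(a, b):
    """Return c with c[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


MAX_4BIT = 2**4 - 1                 # largest 4-bit value, 15
worst_case = 2 * MAX_4BIT * MAX_4BIT  # two maximal products summed
output_bits = worst_case.bit_length()

a = [[13, 11], [3, 15]]  # illustrative 4-bit inputs
b = [[11, 2], [7, 5]]    # illustrative 4-bit coefficients
c = matmul_2x2(a, b)
print(c, worst_case, output_bits)
```

Each 4-bit by 4-bit product fits in 8 bits, and the extra addition can carry into one more bit, which is why each output needs 9 PDs.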
  • This example implements an optical-domain 4-bit 4 × 4 matrix calculation processor. As in the first example, compared with the existing optoelectronic fusion analog computing scheme, the most outstanding effect of this example is the realization of 4-bit calculation, which cannot be achieved by the existing scheme. In addition, it also has the beneficial effects of low power consumption, high integration, and no dependence on advanced processes.
  • embodiments of the present application can also be used for direct calculation of optical signals, such as optical fiber distributed acoustic sensing (distributed fiber acoustic sensing, DAS) systems or laser radar (Light Detection and Ranging, LiDAR) systems.
  • the optical computing system includes: a first optical processing array 202, including a plurality of first optical processing units 2021, where each of the first optical processing units 2021 is configured to output a first optical signal according to one first sub-data of the first data, the first optical signal indicates the first sub-data, the first data is binary data, and the first sub-data is one bit of the first data; and a second light processing array 203, including a plurality of second light processing units 2031, where each of the second light processing units 2031 is configured to perform a second light processing step according to one of the first light signals and the second data.
  • each optical processing unit only needs to be able to support input of 2 levels of electrical signals and output of 2 levels of optical signals.
  • the optical modulator needs a lower signal driving voltage. Generally speaking, the multi-level driving voltage of the analog solution is as high as several volts, which requires an electrical amplifier to be deployed before the optical modulator, whereas in the embodiment of the present application a driving voltage on the order of a hundred millivolts is enough for 2-level electro-optical conversion, so this solution requires no additional devices and none of the corresponding power consumption.
  • the input first data is carried on the digital signal through multiple first sub-data, so no additional digital-to-analog converters or analog-to-digital converters, and none of the corresponding power consumption, are needed.
  • DACs, amplifiers and ADCs are millimeter-sized devices with power consumption in the watt level, and the packaging with optical devices increases packaging complexity.
  • the size of the optical device used in the embodiment of this application is at the micron level, and the power consumption is at the milliwatt level. Therefore, this solution has greater benefits in terms of power consumption and size.
  • monolithic integration of optical devices can be achieved, which also greatly reduces the difficulty of packaging.
  • the multi-unit array architecture in the embodiment of the present application also has the advantage of scalable accuracy, that is, after increasing the number of optical processing units, higher-precision optical bit operation processing can be achieved.
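  • How the device count scales with bit width can be estimated with a small sketch. The formulas are assumptions inferred from the 4-bit example in this description (4 modulators, 16 optical switches, 12 optical full adders, 8 PDs); in particular, the adder count of input bits × (coefficient bits - 1) is an inference that matches that example and is not stated as a general rule in the application.

```python
def device_counts(input_bits, coeff_bits):
    """Estimate device counts for one input_bits x coeff_bits multiplier.

    Assumed formulas, inferred from the 4-bit example only.
    """
    return {
        "modulators": input_bits,
        "optical_switches": input_bits * coeff_bits,
        "optical_full_adders": input_bits * (coeff_bits - 1),
        "photodetectors": input_bits + coeff_bits,
    }


for width in (2, 4, 8):
    print(width, device_counts(width, width))
```

Under these assumptions the switch and adder counts grow quadratically with the bit width, while the modulators and PDs grow only linearly.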
  • This application also provides an optical computing system, which includes:
  • the second optical processing array 203 includes a plurality of second optical processing units 2031.
  • Each second optical processing unit 2031 is configured to output a second optical signal according to a first optical signal and second sub-data carried in the electrical signal, wherein the first optical signal indicates first sub-data, the first data is binary data, the first sub-data is one bit of the first data, the second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product result of the first sub-data and the second sub-data, and the second optical processing unit 2031 is an optical switch.
  • each second light processing unit 2031 is specifically used to:
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the second light processing unit 2031 is a two-level driven optical device.
  • the driving voltage of the second light processing unit 2031 is less than 1000 millivolts.
  • each second optical processing unit 2031 is configured to output a second optical signal according to one of the first optical signals and the second sub-data carried in the digital signal.
  • the second data includes N bits
  • the second light processing array includes multiple groups of light processing units
  • each group of light processing units includes N second light processing units
  • the beam splitter is used to power-divide the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be divided equally among the N first optical signals) and to output each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
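  • The equal power division performed by the beam splitter can be modelled in a few lines. The sketch assumes an ideal lossless splitter and on-off signalling, where any optical power above a threshold is read as the bit 1; the threshold, the function names `split_power` and `indicated_bit`, and the power values are illustrative assumptions.

```python
def split_power(power_mw, n):
    """Divide one optical signal equally into n copies (lossless model)."""
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")
    return [power_mw / n] * n


def indicated_bit(power_mw, threshold_mw=0.0):
    """Read a bit from optical power: light present means 1."""
    return 1 if power_mw > threshold_mw else 0


copies_of_one = split_power(1.0, 4)   # a signal indicating bit 1
copies_of_zero = split_power(0.0, 4)  # a signal indicating bit 0
print([indicated_bit(p) for p in copies_of_one])
print([indicated_bit(p) for p in copies_of_zero])
```

Every copy carries a fraction of the power but still indicates the same first sub-data as the original signal.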
  • the second optical processing array is specifically configured to output the second optical signal to the optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals, each The third optical signal indicates a third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is a bit in the third data.
  • the optical computing system further includes: an optical full adder.
  • the optical full adder is used to input a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the optical full adder is a nonlinear adder.
  • the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes:
  • the third optical processing array includes a plurality of third optical processing units.
  • Each third optical processing unit is configured to output a fourth optical signal according to a third optical signal and fourth sub-data in the fourth data, where the fourth optical signal indicates the product of the bit indicated by the third optical signal and the fourth sub-data.
  • the second light processing unit is an optical switch.
  • Embodiments of the present application also provide an optical computing method, which method is applied to an optical computing system.
  • the optical computing system includes a first optical processing array 202 and a second optical processing array 203; the first optical processing array 202 comprises a plurality of first light processing units 2021, and the second light processing array 203 includes a plurality of second light processing units 2031; the method includes:
  • Each of the first optical processing units outputs a first optical signal according to the first sub-data of the first data, where the first optical signal indicates the first sub-data, the first data is binary data, and the first sub-data is one bit of the first data;
  • Each of the second optical processing units outputs a second optical signal according to one of the first optical signals and second sub-data carried in the electrical signal, wherein the second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product result of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
  • outputting a second optical signal according to one of the first optical signals and the second sub-data carried in the electrical signal includes:
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the first optical processing unit is a two-level driven optical device.
  • the second optical processing unit is a two-level driven optical device.
  • the driving voltage of the first light processing unit is less than 1000 millivolts.
  • the driving voltage of the second light processing unit is less than 1000 millivolts.
  • the signal input frequency supported by the first optical processing unit is greater than the signal input frequency supported by the second optical processing unit.
  • outputting the first optical signal according to the first sub-data of the first data includes:
  • the first optical signals and the second sub-data carried in the electrical signal includes:
  • Each second optical processing unit is configured to output a second optical signal according to a first optical signal and second sub-data carried in the digital electrical signal.
  • the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are used to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
  • the optical computing system further includes: a beam splitter.
  • the method also includes:
  • Each first optical processing unit outputs a first optical signal to a beam splitter.
  • the beam splitter is used to divide the power of the first optical signal to obtain N first optical signals, and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the second data includes N bits
  • the second optical processing array includes multiple groups of optical processing units
  • each of the first optical processing units is specifically configured to output the first optical signal to the beam splitter.
  • the beam splitter is used to divide the power of the first optical signal to obtain N first optical signals, and output each of the N first optical signals to a second optical processing unit
  • the N first optical signals indicate the same first sub-data
  • the N is an integer greater than 1.
  • the method further includes: the second optical processing array outputs the second optical signal to the optical full adder, so that the optical full adder obtains a plurality of third optical signals according to the plurality of second optical signals; each third optical signal indicates one piece of third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
  • the optical computing system further includes: an optical full adder.
  • the method further includes: the optical full adder inputs a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the third data is binary data, and each third optical signal indicates one bit of the third data;
  • the optical computing system further includes: a third optical processing array, including a plurality of third optical processing units;
  • the method also includes: each third optical processing unit outputs a fourth optical signal according to a third optical signal and fourth sub-data in the fourth data, and the fourth optical signal indicates the product of the bit indicated by the third optical signal and the fourth sub-data.
  • the first optical processing unit is an optical modulator or an optical switch.
  • the method further includes: each first optical processing unit modulating the first sub-data of the first data into the laser carrier provided by the light source to output the first optical signal.
  • Embodiments of the present application also provide an optical computing method.
  • the method is applied to an optical computing system.
  • the optical computing system includes: a second optical processing array, including a plurality of second optical processing units; the method includes:
  • a second optical signal is output according to a first optical signal and second sub-data carried in the electrical signal, wherein the first optical signal indicates the first sub-data, the first data is binary data, the first sub-data is one bit of the first data, the second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
  • each of the second optical processing units is specifically used to:
  • a product operation of the first sub-data and the second sub-data is performed to output the second optical signal.
  • the second optical processing unit is a two-level driven optical device.
  • the driving voltage of the second optical processing unit is less than 1000 millivolts.
  • each second optical processing unit is configured to output a second optical signal according to a first optical signal and second sub-data carried in the digital electrical signal.
  • the second data includes N bits
  • the second optical processing array includes multiple groups of optical processing units
  • each group of optical processing units includes N second optical processing units
  • the beam splitter is used to divide the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal can be divided equally among the N first optical signals), and to output each of the N first optical signals to a second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  • the second optical processing array is specifically configured to output the second optical signal to the optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals; each third optical signal indicates one piece of third sub-data in the third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
  • the optical computing system further includes: an optical full adder.
  • the optical full adder is used to input a plurality of third optical signals to the photodetector PD, so that the photodetector PD generates an electrical signal indicating the third data according to the plurality of third optical signals.
  • the optical full adder is a nonlinear adder.
  • the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes:
  • the third optical processing array includes a plurality of third optical processing units.
  • Each third optical processing unit is configured to output a fourth optical signal according to a third optical signal and fourth sub-data in the fourth data; the fourth optical signal indicates the product of the bit indicated by the third optical signal and the fourth sub-data.
  • the second optical processing unit is an optical switch.
  • FIG. 15 is a schematic diagram of an optical computing device provided by an embodiment of the present application.
  • the optical computing device 1500 includes a processor 1501 and a chip 1502, where the chip 1502 may include the optical processing system described in the above embodiment;
  • the processor 1501 is used to transfer multiplication objects (such as first data and second data) to the chip 1502, and receive the product calculation result of the first data and the second data sent by the chip 1502.
  • the chip 1502 can be integrated with the processor 1501 on the substrate in the form of a system on chip (SoC).
  • placing the chip 1502 close to the processor 1501 enables high-speed communication between them; the processor 1501 is good at logical operations, while the chip 1502 offers high parallelism and light-speed execution.
  • the chip in the embodiment of the present application can be placed in a computing device, and the computing device can perform AI operations.
  • the computing device can be a terminal device or a server.
  • the computing device can be an execution device or a training device for a neural network.
  • FIG 16 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • the execution device 1600 can be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, etc., which is not limited here.
  • the execution device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603, a memory 1604 (the number of processors 1603 in the execution device 1600 can be one or more; one processor is taken as an example in Figure 16) and the chip 1502, wherein the processor 1603 may include an application processor 16031 and a communication processor 16032.
  • the receiver 1601, the transmitter 1602, the processor 1603, the memory 1604, and the chip 1502 may be connected through a bus or other means.
  • Memory 1604 may include read-only memory and random access memory and provides instructions and data to processor 1603 .
  • a portion of memory 1604 may also include non-volatile random access memory (NVRAM).
  • the memory 1604 stores processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1603 controls the execution of operations of the device.
  • various components of the execution device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • various buses are called bus systems in the figure.
  • the processor 1603 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by integrated logic circuits in hardware or by instructions in software form in the processor 1603.
  • the above-mentioned processor 1603 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1604.
  • the processor 1603 reads the information in the memory 1604 and completes the steps of the method in combination with its hardware.
  • the receiver 1601 may be configured to receive input numeric or character information and generate signal inputs related to performing relevant settings and functional controls of the device.
  • the transmitter 1602 can be used to output numeric or character information; the transmitter 1602 can also be used to send instructions to the disk group to modify data in the disk group.
  • the chip 1502 can interact with the memory 1604 and the processor 1603. For example, the chip 1502 can obtain the objects of a product operation (such as first data and second data) from the memory 1604 or the processor 1603, perform the product operation on those objects, and return the result to the memory 1604 or the processor 1603.
  • FIG. 17 is a schematic structural diagram of the server provided by the embodiment of the present application.
  • the server 1700 is implemented by one or more servers.
  • the server 1700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1717 (for example, one or more processors), memory 1732, and one or more storage media 1730 (e.g., one or more mass storage devices) storing applications 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1717 may be configured to communicate with the storage medium 1730 and execute a series of instruction operations in the storage medium 1730 on the server 1700 .
  • the server 1700 may also include one or more power supplies 1717, one or more wired or wireless network interfaces 1750, one or more input and output interfaces 1758, and/or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
  • the chip 1502 can interact with the memory 1732 and the central processor 1717.
  • the chip 1502 can obtain the objects of a product operation (such as the first data and the second data) from the memory 1732 or the central processor 1717, perform the product operation on those objects, and return the result to the memory 1732 or the central processor 1717.
  • An embodiment of the present application also provides a computer program product including computer-readable instructions which, when run on a computer, cause the computer to perform the steps performed by the foregoing execution device, or cause the computer to perform the steps performed by the foregoing training device.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program for performing signal processing.
  • When the program is run on a computer, it causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
  • Figure 18 is a structural schematic diagram of a system provided by an embodiment of the present application.
  • the system may include a neural network processor NPU 1800.
  • the NPU 1800 serves as a coprocessor mounted on the host CPU, and tasks are allocated by the host CPU.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1803 is a two-dimensional systolic array.
  • the arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes matrix A data and matrix B from the input memory 1801 to perform matrix operations, and the partial result or final result of the matrix is stored in an accumulator (accumulator) 1808 .
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • BIU is the Bus Interface Unit, that is, the bus interface unit 1810, which is used for the interaction between the AXI bus and the DMAC and the Instruction Fetch Buffer (IFB) 1809.
  • the bus interface unit 1810 (BIU) is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and by the direct memory access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 or the weight data to the weight memory 1802 or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear or nonlinear function to the output of the arithmetic circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or to a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the instruction fetch buffer 1809 are all on-chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any of the above places can be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above programs.
  • the chip 1502 can interact with the memory and the main CPU.
  • the chip 1502 can obtain the objects of a product operation (such as the first data and the second data) from the memory or the main CPU, perform the product operation on those objects, and return the result to the memory or the main CPU.
  • the device embodiments described above are only illustrative.
  • the division of modules is only a logical function division, and there may be other division methods in actual implementation.
  • multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented.
  • the connections between the modules discussed in the above embodiments may be electrical, mechanical or other forms.
  • the modules described as separate components may or may not be physically separated.
  • Components displayed as modules may or may not be physical modules.
  • each functional module in each embodiment of the present application may exist independently or may be integrated into one processing module.
  • An embodiment of the present invention also provides a computer program product for data processing, which includes a computer-readable storage medium storing program code.
  • the program code includes instructions for executing the method flow described in any of the foregoing method embodiments.
  • the aforementioned storage media include: USB flash drive, removable hard disk, magnetic disk, optical disk, random-access memory (RAM), solid state disk (SSD), non-volatile memory and other non-transitory machine-readable media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Optical Modulation, Optical Deflection, Nonlinear Optics, Optical Demodulation, Optical Logic Elements (AREA)

Abstract

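An optical computing system, applied to the field of optical processing. A first optical processing unit in a first optical processing array of the optical computing system is configured to output a first optical signal according to first sub-data separated from first data; the first optical signal indicates one bit of the first data, that is, the first sub-data. A second optical processing unit in a second optical processing array of the optical computing system is configured to output a second optical signal according to the first optical signal and second sub-data separated from second data; the second sub-data is one bit of the second data, and the second optical signal indicates the product of the first sub-data and the second sub-data. The present application can reduce the required power consumption and device size, and lowers packaging difficulty.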

Description

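Optical computing system and chip
Technical Field
The present application relates to the field of optical computing, and in particular to an optical computing system and a chip.
Background
With the rapid development of optical integration technology, multiple optical devices can be integrated on a single chip. When these devices carry out mathematical calculations by processing optical signals, optical computing is performed.
In recent years, with growing demand from scientific research, meteorological and hydrological forecasting, autonomous driving and the like, and with the development of industries such as artificial intelligence, big data and cloud computing, artificial intelligence (AI) computing has developed rapidly and been widely applied. As process technology has advanced, integrated circuits have improved considerably in power consumption, latency and other respects, but the speed at which electrons travel between logic gates still limits their overall operation rate. Meanwhile, the power consumption of integrated circuits grows exponentially as the computing rate increases.
Optical computing opens up a technical path different from that of AI computing. It can effectively avoid the above problems and raise the computing rate without relying on advanced process nodes or greatly increasing power consumption. In an analog computing architecture that fuses optics and electronics, to multiply data, the data are first converted from digital to analog to obtain analog signals, which are then fed to optical modulators in the optical computing system to complete electro-optic conversion; the multiplication is then performed on the data in the optical domain.
However, the signal input to the optical computing system is an analog signal, that is, a multi-level signal with a certain voltage amplitude in the electrical domain, for example a 4-level electrical signal. Accordingly, a modulator capable of supporting multi-level electrical signals is needed, and the driving voltage applied to the modulator must be raised, which results in high power consumption and cost.
Summary of the Invention
The present application provides an optical computing system and a chip that can reduce power consumption.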
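In a first aspect, the present application provides an optical computing system. A first optical processing array may include a plurality of first optical processing units, where each first optical processing unit is configured to output a first optical signal according to first sub-data separated from first data; the first optical signal indicates the first sub-data, and the first sub-data is one bit of the first data. A second optical processing unit in a second optical processing array is configured to output a second optical signal according to one first optical signal and second sub-data separated from second data, where the second data is carried in an electrical signal, the second sub-data is one bit of the second data, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
Unlike existing solutions, in which the electrical signal input to the optical domain is an analog electrical signal indicating the first data, the embodiments of the present application need no digital-to-analog converter: the digital electrical signal indicating the first data is input directly to the optical domain. The first data is binary data and may include multiple pieces of first sub-data; each piece of first sub-data may be one bit of the first data, and each bit may be 0 or 1.
In the embodiments of the present application, each first optical processing unit only needs to support an electrical input signal with 2 levels and to output an optical signal with 2 levels. On the one hand, the optical modulator then requires a lower signal driving voltage. (Generally, the multi-level driving voltage of an analog scheme is as high as several volts, which requires an electrical amplifier in front of the optical modulator, whereas 2-level electro-optic conversion in the embodiments of the present application needs a driving voltage of only a few hundred millivolts, so this scheme needs neither the amplifier nor the corresponding power consumption.) On the other hand, because the input first data is carried on a digital signal as multiple pieces of first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and none of the corresponding power consumption, is needed. In general, digital-to-analog converters (DAC), electrical amplifiers and analog-to-digital converters (ADC) are millimeter-scale devices with watt-level power consumption, and co-packaging them with optical devices increases packaging complexity. The optical devices used in the embodiments of the present application are micrometer-scale with milliwatt-level power consumption, so this scheme offers considerable gains in both power consumption and size. Monolithic integration of the optical devices can be achieved with a mature process, such as a 130 nm silicon photonics process, which also greatly reduces packaging difficulty. In addition, the multi-unit array architecture in the embodiments of the present application has the advantage of scalable precision: adding optical processing units enables optical bit operations of higher precision.
Moreover, the second sub-data is itself carried in an electrical signal, and the multiplication of the first sub-data and the second sub-data in the optical domain is implemented by an optical switch. In effect, the optical switch performs both the electro-optic conversion and the multiplication, so no additional device is needed for electro-optic conversion, which reduces the number of devices required.
It should be understood that the separation of the first data into first sub-data may be implemented by a device with a data separation function, or by a preset data path that feeds each bit of the first data to its corresponding first optical processing unit, which is equivalent to separating the data. The second data may be separated into sub-data in the same way; the similarities are not repeated here.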
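The separation of binary data into one-bit sub-data can be illustrated with a short software sketch. This is only a numerical analogy, not the hardware data path of the application; the function name `split_bits` and the fixed `width` parameter are assumptions made for illustration.

```python
def split_bits(value: int, width: int) -> list[int]:
    """Return the bits of `value`, most significant bit first.

    Each returned bit plays the role of one piece of sub-data that would be
    fed to its own optical processing unit (illustrative model only).
    """
    if value < 0 or value >= 1 << width:
        raise ValueError("value does not fit in the given width")
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


first_data = 0b1011
first_sub_data = split_bits(first_data, 4)
print(first_sub_data)  # [1, 0, 1, 1]
```

Each list element corresponds to one 2-level signal, which is why no multi-level (analog) drive is required in this model.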
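In a possible implementation, the second optical processing unit is specifically configured to: obtain, according to the electrical signal, an optical signal corresponding to the second sub-data; and perform, according to the first optical signal and the optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data to output the second optical signal.
It should be understood that no temporal order is imposed here between the electro-optic conversion and the product operation; optionally, the optical switch performs both functions simultaneously.
In a possible implementation, the first optical processing unit is a two-level driven optical device. In the embodiments of the present application, each first optical processing unit only needs to support an electrical input signal with 2 levels and to output an optical signal with 2 levels. In this case the input data is carried on a digital signal, and no additional digital-to-analog converter or analog-to-digital converter, and none of the corresponding power consumption, is needed.
In a possible implementation, the second optical processing unit is a two-level driven optical device. In the embodiments of the present application, each second optical processing unit only needs to support an electrical input signal with 2 levels and to output an optical signal with 2 levels. In this case the input data is carried on a digital signal, and no additional digital-to-analog converter or analog-to-digital converter, and none of the corresponding power consumption, is needed.
In a possible implementation, the driving voltage of the first optical processing unit is less than 1000 millivolts, for example 100, 200, 300, 400, 500, 600, 700, 800 or 900 millivolts. In the embodiments of the present application, each first optical processing unit only needs to support an electrical input signal with 2 levels and to output an optical signal with 2 levels. In this case the optical modulator requires a lower signal driving voltage (that is, no amplifier needs to be deployed in front of the optical modulator, so no additional device or corresponding power consumption is needed).
In a possible implementation, the driving voltage of the second optical processing unit is less than 1000 millivolts, for example 100, 200, 300, 400, 500, 600, 700, 800 or 900 millivolts. In the embodiments of the present application, each second optical processing unit only needs to support an electrical input signal with 2 levels and to output an optical signal with 2 levels. In this case the optical modulator requires a lower signal driving voltage (that is, no amplifier needs to be deployed in front of the optical modulator, so no additional device or corresponding power consumption is needed).
In a possible implementation, when the first data serves as input data, its signal input frequency is high, so the first optical processing unit must support a high signal input frequency (that is, a high maximum signal input frequency, which can be understood as a property of the device itself); correspondingly, an optical processing unit that supports a higher signal input frequency costs more. When the second data is coefficient data (weights), its refresh frequency is lower than the computing frequency, so a lower-speed device (relative to the first optical processing unit) can be chosen as the second optical processing unit. In other words, the signal input frequency supported by the first optical processing unit may be greater than that supported by the second optical processing unit, which reduces the cost of the second optical processing unit.
In a possible implementation, the second optical processing unit is an optical switch. It should be noted that, because the embodiments of the present application use bit operations, the multiplication unit (the second optical processing unit) only needs to realize two states, [0] or [1], so an optical switch can be used. In the prior art, by contrast, the coefficient multiplication unit must realize PAM-N electro-optic modulation and therefore requires a modulator; the power consumption and cost of an optical switch are lower than those of a modulator.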
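Why a two-state switch suffices as a one-bit multiplier can be seen from the truth table: the product of two bits equals their logical AND. The sketch below models an idealized switch that passes the incoming light only when its electrical drive bit is 1; the name `optical_switch` and the 0/1 encoding of "light present" are assumptions for illustration, not a device model from the application.

```python
def optical_switch(first_optical_signal: int, second_sub_data: int) -> int:
    """Idealized switch: pass the incoming light only when the drive bit is 1."""
    return first_optical_signal if second_sub_data == 1 else 0


# The switch output matches the 1-bit product for every input combination.
for a in (0, 1):
    for b in (0, 1):
        assert optical_switch(a, b) == a * b
        print(a, b, optical_switch(a, b))
```

In this model the electrical bit never has to be converted into light separately, which mirrors the statement that the switch combines electro-optic conversion and multiplication.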
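In a possible implementation, the first optical processing units in the first optical processing array are arranged in order, and the pieces of first sub-data are input to the respective first optical processing units in order from the most significant bit to the least significant bit of the first data.
In a possible implementation, the first optical processing unit is specifically configured to output the first optical signal according to first sub-data carried in a digital electrical signal; or,
the second optical processing unit is configured to output the second optical signal according to one first optical signal and second sub-data carried in a digital electrical signal.
In a possible implementation, the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are configured to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
In a possible implementation, the optical computing system further includes a beam splitter.
In a possible implementation, the first optical processing unit is specifically configured to output the first optical signal to the beam splitter; the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and to output each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the beam splitter may be a 1×N power splitter.
In a possible implementation, the beam splitter may output the N first optical signals to the second optical processing array. The second optical processing array can perform the product operations between each piece of first sub-data of the first data and each piece of second sub-data of the second data.
In a possible implementation, the second optical processing units in the second optical processing array are arranged in order, and the pieces of second sub-data are input to the second optical processing units of each group in order from the most significant bit to the least significant bit of the second data.
In a possible implementation, the second data may include N pieces of second sub-data. To perform the product operations between each piece of first sub-data of the first data and each piece of second sub-data of the second data, each first optical processing unit may output its first optical signal to the beam splitter, and the beam splitter may divide the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal may be divided equally among the N first optical signals). The N first optical signals indicate the same first sub-data; that is, each of the N first optical signals indicates the same first sub-data.
It should be understood that the embodiments of the present application use the same name for the optical signal before and after power division. This does not mean that the signals before and after power division are identical; rather, it means that they indicate the same data.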
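The point that the split signals differ in power yet indicate the same bit can be sketched numerically. The example assumes an ideal lossless 1×N splitter with equal division and a simple power threshold for deciding the bit; the names `split_power` and `indicated_bit`, the milliwatt values and the threshold are all assumptions for illustration.

```python
def split_power(power_mw: float, n: int) -> list[float]:
    """Ideal lossless 1xN splitter: each output carries 1/N of the input power."""
    return [power_mw / n] * n


def indicated_bit(power_mw: float, threshold_mw: float) -> int:
    """Decide which bit a signal indicates by comparing its power to a threshold."""
    return 1 if power_mw > threshold_mw else 0


N = 4
one_level_mw = 1.0                   # assumed power of a signal indicating bit 1
threshold_mw = one_level_mw / N / 2  # halfway between 0 and the split "1" level

copies_of_one = split_power(one_level_mw, N)
copies_of_zero = split_power(0.0, N)
print([indicated_bit(p, threshold_mw) for p in copies_of_one])   # [1, 1, 1, 1]
print([indicated_bit(p, threshold_mw) for p in copies_of_zero])  # [0, 0, 0, 0]
```

The copies carry a quarter of the original power but still decode to the same bit, which is the sense in which the signals before and after division share a name.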
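In a possible implementation, the second data includes N bits, the second optical processing array includes multiple groups of optical processing units, each group includes N second optical processing units, and the beam splitter is specifically configured to output the N first optical signals to the same group of optical processing units in the second optical processing array.
In a possible implementation, the second optical processing array may include M groups of second optical processing units, and each group may include N second optical processing units. Each second optical processing unit produces a second optical signal that indicates the product of a piece of first sub-data and a piece of second sub-data, so the second optical processing array produces M*N second optical signals, which indicate the results of the product operations between each piece of first sub-data of the first data and each piece of second sub-data of the second data.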
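The M*N second optical signals correspond to the grid of one-bit partial products familiar from binary long multiplication. A minimal sketch, assuming bits are given most significant first and that each grid cell stands for one second optical processing unit (the function name `partial_products` is hypothetical):

```python
def partial_products(first_bits: list[int], second_bits: list[int]) -> list[list[int]]:
    """Return an M x N grid; row i holds first_bits[i] AND each bit of second_bits.

    Row i models one group of N switches that all receive copies of the same
    first optical signal (illustrative model only).
    """
    return [[a & b for b in second_bits] for a in first_bits]


first_bits = [1, 1]   # first data  = 0b11
second_bits = [1, 0]  # second data = 0b10
grid = partial_products(first_bits, second_bits)
print(grid)  # [[1, 0], [1, 0]]
print(sum(len(row) for row in grid))  # 4, i.e. M*N signals
```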
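In a possible implementation, the second optical processing array is specifically configured to output the second optical signals to an optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals. Each third optical signal indicates one piece of third sub-data in third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes an optical full adder.
It should be understood that part of the data obtained by the second optical processing array (for example, the last bit) may be output directly without passing through the optical full adder, and thus serves as the last bit of the product of the first data and the second data.
In a possible implementation, the optical full adder may also take the form of an array that includes multiple optical full adders, each of which adds two operands (and may also take in the carry from the previous addition).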
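How an array of full adders turns one-bit partial products into the bits of the product can be sketched with ordinary ripple-carry logic. This is a conventional shift-and-add arrangement chosen for illustration; the application does not specify the adder topology, and the function names and the least-significant-bit-first ordering are assumptions.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus a carry; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def add_bits(x: list[int], y: list[int]) -> list[int]:
    """Ripple-carry addition of two equal-length bit lists (LSB first)."""
    result, carry = [], 0
    for a, b in zip(x, y):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    result.append(carry)
    return result


def multiply_bits(first_bits: list[int], second_bits: list[int]) -> list[int]:
    """Multiply two LSB-first bit lists by summing shifted partial-product rows."""
    width = len(first_bits) + len(second_bits)
    acc = [0] * width
    for i, a in enumerate(first_bits):
        row = [0] * i + [a & b for b in second_bits]  # one row of 1-bit products
        row += [0] * (width - len(row))
        acc = add_bits(acc, row)[:width]  # the product always fits in `width` bits
    return acc


product = multiply_bits([1, 1], [1, 1])  # 3 * 3, LSB first
print(product)  # [1, 0, 0, 1], i.e. 0b1001 = 9
```

In this arrangement the least significant product bit is simply the AND of the two least significant input bits and never meets a carry, which matches the remark that the last bit can bypass the full adder.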
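In a possible implementation, the optical full adder may be a nonlinear adder, for example a photonic crystal structure, a coupling structure based on highly nonlinear materials (for example a nonlinear waveguide adder), or another structure. A nonlinear adder can be implemented with a single structure (not a coupled structure of sub-units), which reduces the number and complexity of devices.
In a possible implementation, the optical full adder is configured to input the multiple third optical signals to a photodetector (PD), so that the PD generates an electrical signal indicating the third data according to the multiple third optical signals.
In a possible implementation, if the optical processing system described above (including the first optical processing array, the second optical processing array and the optical full adder) performs a multiplication of only two data objects (for example 2-bit*2-bit), or the last product operation of a multiplication of more than two objects (for example, in 2-bit*2-bit*2-bit the first multiplication yields 4-bit*2-bit, and 4-bit*2-bit is the last product operation), the output of the optical full adder (the multiple third optical signals) represents the final result of the multiplication, and that result needs to be passed to the electrical domain. The output of the optical full adder can therefore be passed to the PD, which generates an electrical signal indicating the third data according to the multiple third optical signals (in other words, performs photoelectric conversion on them).
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data. The optical computing system further includes a third optical processing array that includes multiple third optical processing units; each third optical processing unit is configured to output a fourth optical signal according to one third optical signal and fourth sub-data in fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, if the optical processing system described above (including the first optical processing array, the second optical processing array and the optical full adder) performs a product operation that is not the last one in a multiplication of more than two objects (for example, in 2-bit*2-bit*2-bit, the first multiplication 2-bit*2-bit), the output of the optical full adder (the multiple third optical signals) represents an intermediate result of the multiplication. That intermediate result needs to be passed to the next-stage optical processing array (the third optical processing array in the embodiments of the present application) so that it can perform the next product operation. The output of the optical full adder can therefore be passed to the third optical processing array.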
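The cascaded case (2-bit*2-bit*2-bit) can be traced with a small numerical sketch in which each stage forms one-bit products and sums them by bit position. The function name `bitwise_multiply` and the explicit width arguments are assumptions; the sketch only illustrates how the intermediate 4-bit result feeds the next stage without leaving the bit-level representation.

```python
def bitwise_multiply(x: int, x_width: int, y: int, y_width: int) -> int:
    """Multiply by summing 1-bit products, each weighted by its bit position."""
    total = 0
    for i in range(x_width):
        for j in range(y_width):
            bit_product = ((x >> i) & 1) & ((y >> j) & 1)  # one switch output
            total += bit_product << (i + j)
    return total


# Stage 1: 2-bit * 2-bit; the intermediate result fits in 4 bits.
intermediate = bitwise_multiply(3, 2, 2, 2)
# Stage 2 (the last product operation): 4-bit * 2-bit.
final = bitwise_multiply(intermediate, 4, 3, 2)
print(intermediate, final)  # 6 18
```

Only `final` would need photoelectric conversion in this picture; `intermediate` stays in the role of the third optical signals passed to the third optical processing array.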
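In a possible implementation, the first optical processing unit is an optical modulator or an optical switch.
In a possible implementation, the optical modulator may be a low-drive-voltage electro-optic modulator, for example a silicon-photonics-integrated microring modulator, a Mach-Zehnder modulator, or another type. It should be noted that, optionally, the operating frequency of the optical modulator matches the computing frequency.
In a possible implementation, the optical switch may be a silicon-photonics-integrated electro-optic Mach-Zehnder interferometer, a thermo-optic Mach-Zehnder interferometer, or another type.
In a possible implementation, each first optical processing unit is specifically configured to modulate the first sub-data of the first data onto a laser carrier provided by a light source to output the first optical signal.
In a possible implementation, the system further includes a light source.
It should be understood that, in a possible implementation, the first data and the second data may be two matrix elements that need to be multiplied in a matrix product (for example, to multiply matrix A by matrix B, the first data may be an element of matrix A and the second data an element of matrix B). The optical processing system described above (including the first optical processing array, the second optical processing array and the optical full adder) can perform the product operation between the first data and the second data. The optical processing system may further include multiple subsystems similar to the combination of the first optical processing array, the second optical processing array and the optical full adder; each subsystem multiplies two matrix elements that need to be multiplied in the matrix product and obtains one product result (for example, the subsystem including the first optical processing array, the second optical processing array and the optical full adder obtains multiple third optical signals). The optical processing system may further include an optical full adder for summing the product results obtained by the subsystems to obtain the result of the matrix product. For details of each subsystem, refer to the description in the above embodiments of the subsystem including the first optical processing array, the second optical processing array and the optical full adder; the similarities are not repeated here.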
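The matrix case can be sketched by treating each element-by-element product as one subsystem and the final summation as the additional full adder. The helper names `multiply_elements` and `matmul`, the shared element `width`, and the example matrices are assumptions for illustration; plain integer addition stands in for the summing adder.

```python
def multiply_elements(a: int, b: int, width: int) -> int:
    """One subsystem: multiply two `width`-bit elements via 1-bit products."""
    total = 0
    for i in range(width):
        for j in range(width):
            total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


def matmul(A: list[list[int]], B: list[list[int]], width: int) -> list[list[int]]:
    """Matrix product where each term comes from a subsystem and terms are summed."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(multiply_elements(A[r][k], B[k][c], width) for k in range(inner))
         for c in range(cols)]
        for r in range(rows)
    ]


A = [[1, 2], [3, 0]]
B = [[2, 1], [1, 3]]
print(matmul(A, B, width=2))  # [[4, 7], [6, 3]]
```

Because the subsystems are independent, the element products for one output entry could in principle be computed in parallel before summation.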
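In addition to optical-electronic fusion computing, the embodiments of the present application can also be used for direct computation on optical signals, for example in a fiber-optic distributed acoustic sensing (DAS) system or a light detection and ranging (LiDAR) system. In such applications the first optical processing array is not needed for electro-optic conversion of the input data, because the input data is already an optical signal; the input data only needs to be processed by the second optical processing array for coefficient multiplication and by the optical full adder for addition.
In a second aspect, the present application provides an optical computing system, which includes:
a second optical processing array including a second optical processing unit, where the second optical processing unit is configured to perform, according to the first optical signal and second sub-data separated from second data, the product operation of the first sub-data and the second sub-data and to output a second optical signal, where the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
In a possible implementation, the second optical processing unit is specifically configured to:
obtain, according to the electrical signal, an optical signal corresponding to the second sub-data; and
perform, according to the first optical signal and the optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data to output the second optical signal.
In a possible implementation, the second optical processing unit is a two-level driven optical device.
In a possible implementation, the driving voltage of the second optical processing unit is less than 1000 millivolts.
In a possible implementation, each second optical processing unit is configured to output the second optical signal according to the first optical signal and second sub-data carried in a digital electrical signal.
In a possible implementation, the second data includes N bits, the second optical processing array includes multiple groups of optical processing units, and each group includes N second optical processing units. A beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal may be divided equally among the N first optical signals) and to output each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the second optical processing array is specifically configured to output the second optical signals to an optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals. Each third optical signal indicates one piece of third sub-data in third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes an optical full adder.
In a possible implementation, the optical full adder is configured to input the multiple third optical signals to a photodetector (PD), so that the PD generates an electrical signal indicating the third data according to the multiple third optical signals.
In a possible implementation, the optical full adder is a nonlinear adder.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes:
a third optical processing array including multiple third optical processing units, where each third optical processing unit is configured to output a fourth optical signal according to one third optical signal and fourth sub-data in fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the second optical processing unit is an optical switch.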
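In a third aspect, the present application provides an optical computing chip. The chip includes the system described in any implementation of the first aspect or the system described in any implementation of the second aspect, an input interface and an output interface;
the input interface is communicatively connected to a processor or a memory and is configured to obtain the first data and the second data sent by the processor or the memory;
the output interface is communicatively connected to the processor or the memory and is configured to pass the product obtained from the first data and the second data to the processor or the memory.
In a fourth aspect, the present application provides a computing device including a processor and the optical computing chip described in the third aspect; the processor is communicatively connected to the optical computing chip.
In a possible implementation, the computing device is a terminal device or a server.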
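In a fifth aspect, the present application provides an optical computing method applied to an optical computing system that includes a first optical processing unit and a second optical processing array, the second optical processing array including a second optical processing unit, and the second optical processing unit being an optical switch. The method includes:
the first optical processing unit outputs a first optical signal according to first sub-data separated from first data, where the first data is binary data and the first sub-data is one bit of the first data;
the second optical processing unit performs, according to the first optical signal and second sub-data separated from second data, the product operation of the first sub-data and the second sub-data and outputs a second optical signal, where the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
In a possible implementation, outputting the first optical signal according to the first sub-data separated from the first data includes:
performing, according to the first optical signal and the optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data to output the second optical signal.
In a possible implementation, the first optical processing unit is a two-level driven optical device.
In a possible implementation, the second optical processing unit is a two-level driven optical device.
In a possible implementation, the driving voltage of the first optical processing unit is less than 1000 millivolts.
In a possible implementation, the driving voltage of the second optical processing unit is less than 1000 millivolts.
In a possible implementation, the maximum signal input frequency of the first optical processing unit is greater than the maximum signal input frequency of the second optical processing unit.
In a possible implementation, outputting the first optical signal according to the first sub-data separated from the first data includes:
outputting the first optical signal according to first sub-data carried in a digital electrical signal; or,
the step performed according to the first optical signal and the second sub-data separated from the second data includes:
outputting the second optical signal according to the first optical signal and second sub-data carried in a digital electrical signal.
In a possible implementation, the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are configured to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
In a possible implementation, the optical computing system further includes a beam splitter.
In a possible implementation, the method further includes:
each first optical processing unit outputs the first optical signal to the beam splitter; the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and to output each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the second data includes N bits, the second optical processing array includes multiple groups of optical processing units, and each first optical processing unit is specifically configured to output the first optical signal to the beam splitter; the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and to output each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the method further includes: the second optical processing array outputs the second optical signals to an optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals. Each third optical signal indicates one piece of third sub-data in third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes an optical full adder.
In a possible implementation, the method further includes: the optical full adder inputs the multiple third optical signals to a photodetector (PD), so that the PD generates an electrical signal indicating the third data according to the multiple third optical signals.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data. The optical computing system further includes a third optical processing array that includes multiple third optical processing units. The method further includes: each third optical processing unit outputs a fourth optical signal according to one third optical signal and fourth sub-data in fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the first optical processing unit is an optical modulator or an optical switch.
In a possible implementation, the method further includes: each first optical processing unit modulates the first sub-data of the first data onto a laser carrier provided by a light source to output the first optical signal.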
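In a sixth aspect, the present application provides an optical computing method applied to an optical computing system. The optical computing system includes a second optical processing array that includes multiple second optical processing units. The method includes:
the second optical processing unit performs, according to the first optical signal and second sub-data separated from second data, the product operation of the first sub-data and the second sub-data and outputs a second optical signal, where the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
In a possible implementation, performing the product operation of the first sub-data and the second sub-data according to the first optical signal and the second sub-data separated from the second data, and outputting the second optical signal, includes:
obtaining, according to the electrical signal, an optical signal corresponding to the second sub-data; and
performing, according to the first optical signal and the optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data to output the second optical signal.
In a possible implementation, the second optical processing unit is a two-level driven optical device.
In a possible implementation, the driving voltage of the second optical processing unit is less than 1000 millivolts.
In a possible implementation, the step performed according to the first optical signal and the second sub-data separated from the second data includes: outputting the second optical signal according to the first optical signal and second sub-data carried in a digital electrical signal.
In a possible implementation, the second data includes N bits, the second optical processing array includes multiple groups of optical processing units, and each group includes N second optical processing units. The method further includes: a beam splitter divides the power of the first optical signal to obtain N first optical signals (for example, the power of the first optical signal may be divided equally among the N first optical signals) and outputs each of the N first optical signals to one second optical processing unit; the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the method further includes: the second optical processing array outputs the second optical signals to an optical full adder, so that the optical full adder obtains multiple third optical signals according to the multiple second optical signals. Each third optical signal indicates one piece of third sub-data in third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes an optical full adder.
In a possible implementation, the method further includes: the optical full adder inputs the multiple third optical signals to a photodetector (PD), so that the PD generates an electrical signal indicating the third data according to the multiple third optical signals.
In a possible implementation, the optical full adder is a nonlinear adder.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes a third optical processing array that includes a third optical processing unit;
the method further includes:
each third optical processing unit outputs a fourth optical signal according to one third optical signal and fourth sub-data in fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the second optical processing unit is an optical switch.
In a seventh aspect, the present application further provides a computer program product including program code. Instructions included in the program code are executed by a computer to implement the optical computing method in the fifth aspect or any possible implementation of the fifth aspect, or the optical computing method in the sixth aspect or any possible implementation of the sixth aspect.
In an eighth aspect, the present application further provides a computer-readable storage medium configured to store program code. Instructions included in the program code are executed by a computer to implement the optical computing method in the fifth aspect or any possible implementation of the fifth aspect, or the optical computing method in the sixth aspect or any possible implementation of the sixth aspect.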
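Brief Description of the Drawings
FIG. 1 is a schematic diagram of an architecture provided by the present application;
FIG. 2 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 3 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 4 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 5 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 6 is a schematic diagram of a product operation provided by the present application;
FIG. 7 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 8 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 9 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 10 is a schematic diagram of an effect provided by the present application;
FIG. 11 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 12 is a schematic diagram of a product operation provided by the present application;
FIG. 13 is a schematic diagram of an effect provided by the present application;
FIG. 14 is a schematic architecture diagram of an optical processing system provided by the present application;
FIG. 15 is a schematic architecture diagram of an optical processing apparatus provided by the present application;
FIG. 16 is a schematic architecture diagram of an execution device provided by the present application;
FIG. 17 is a schematic architecture diagram of a server provided by the present application;
FIG. 18 is a schematic architecture diagram of a chip provided by the present application.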
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、***、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
应理解,当称一元件或层位于另一元件或层“上(on)”、“连接到(connected to)”或“耦合到(coupled to)”另一元件或层时,所述元件或层可直接位于所述另一元件或层上、直接连接到 或直接耦合到所述另一元件或层,抑或可存在一个或多个中间元件或层。还应理解,当称一元件或层位于两个元件或层“之间(between)”时,所述元件或层可为所述两个元件或层之间的唯一元件或层,或者也可存在一个或多个中间元件或层。
本文中所用用语“基本(substantially)”、“大约(about)”及类似用语用作近似用语、而并非用作程度用语,且旨在考虑到所属领域中的普通技术人员将知的测量值或计算值的固有偏差。此外,在阐述本发明概念的实施例时使用“可(may)”是指“可能的一个或多个实施例”。本文中所用用语“使用(use)”、“正使用(using)”、及“被使用(used)”可被视为分别与用语“利用(utilize)”、“正利用(utilizing)”、及“被利用(utilized)”同义。另外,用语“示例性(exemplary)”旨在指代实例或例示。
人工神经网络(artificial neural network,ANN),简称为神经网络(neural network,NN)或类神经网络,在机器学习和认知科学领域,是一种模仿生物神经网络(动物的中枢神经***,特别是大脑)的结构和功能的数学模型或计算模型,用于对函数进行估计或近似。人工神经网络可以包括卷积神经网络(convolutional neural network,CNN)、深度神经网络(deep neural network,DNN)、多层感知器(multilayer perceptron,MLP)等神经网络。神经网络***的算法复杂,计算量巨大,对数据的计算效率提出了很高的要求。为了提升计算效率,利用光学器件本身的物理特性来完成对应的数学运算过程的光计算被得到了应用。
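The application scenarios of this application are introduced first.
With the growth of Internet data volume and the rapid development of the AI field, deep learning (DL) is widely used in image recognition, speech recognition, natural language processing, and other fields. Deep learning is a type of neural network built to imitate the structure of the human brain, and it can achieve better recognition results than conventional shallow learning. Because deep learning algorithms are complex and computationally intensive, and a conventional central processing unit is inefficient at large-scale operations, hardware for AI acceleration has gradually become a research focus.
Compared with conventional microelectronic chips, optical computing offers considerably better performance in some applications.
In a possible implementation, this application may be applied to AI-related computing scenarios, such as image-related, audio-related, video-related, and text-related AI operations. More specifically, it may be applied to product operations in AI computation, for example the product of input data and coefficient data (which, by way of example, may also be called weights).
In a possible implementation, this application may be applied to a system for optoelectronic hybrid AI computing.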
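Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an application system according to an embodiment of this application. The system may include an electrical domain and an optical domain. The electrical domain may include a processor for performing AI operations (for example, a CPU, GPU, or TPU) and a memory (for example, RAM). The processor handles instruction control and the memory stores data. The processor may load data from the memory (for example, input data and coefficient data) into the optical domain in the form of electrical signals. An optical processor system in the optical domain (for example, optical modulators) performs operations on the data in the optical-signal dimension (for example, the product operation in the embodiments of this application), and the result is converted into an electrical signal and returned to the electrical domain. When operations are performed in the optical domain, photons propagate at the speed of light, one to two orders of magnitude faster than electrons, and the power consumption of optical devices is proportional only to the clock frequency (whereas the power consumption of microelectronic devices is proportional to the cube of the clock frequency). The performance of optical devices does not depend on the process node. In addition, photons are bosons and, unlike electrons, are not subject to the Pauli exclusion principle, so they can be multiplexed in several dimensions such as wavelength and polarization, which further increases the total computing rate.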
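For ease of understanding, terms related to the embodiments of this application are introduced first.
(1) Digital-to-analog converter (DAC)
Also called a D/A converter, or DAC for short, this is generally a device that converts a digital signal into an analog signal. A D/A converter basically consists of four parts: a weighted-resistor network, an operational amplifier, a reference power supply, and analog switches.
(2) Analog-to-digital converter (ADC)
Also called an A/D converter, or ADC for short, this is generally a device that converts an analog signal into a digital signal.
(3) Optical modulator
Optical modulation is a technique that superimposes an information-carrying electrical signal onto an optical carrier. It causes certain parameters of the light wave, such as amplitude, frequency, phase, polarization state, and duration, to vary according to a given rule. A device that implements optical modulation is called an optical modulator.
(4) Optical switch
An optical switch is an optical device with one or more selectable transmission ports. It physically switches, or performs logical operations on, optical signals in an optical transmission line or an integrated optical circuit.
(5) Power splitter
A beam splitter (also called a power divider) is an optical device that splits an incident beam (for example, a laser beam) into two or more beams, which do not necessarily have the same power.
(6) Precision
Precision denotes the number of bits contained in binary data. For example, the precision of the binary data 1011 is 4.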
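In an existing implementation, the electrical-domain part includes a processor for instruction control, a memory for storing data, and a drive circuit. A digital-to-analog converter (DAC) in the drive circuit reads data from the memory (for example, input data and coefficient data), converts it into corresponding analog electrical signals, and loads these onto a modulator and a coefficient multiplication unit respectively. For high-speed modulation, the drive voltage output by the DAC is only on the order of a hundred millivolts, which cannot effectively drive the modulator to perform electro-optic conversion, so an amplifier is needed after the DAC to raise the drive voltage to the volt level. An analog-to-digital converter (ADC) in the drive circuit converts the data output from the optical domain to the electrical domain and feeds the output data back to the memory or the processor.
The optical-domain part may include a light source (for example, a laser that generates laser light of a specific wavelength), a modulator, a coefficient multiplication unit, and a photodetector (PD). The modulator modulates the input data (carried on an analog electrical signal) onto the laser carrier to form optical input data. The coefficient multiplication unit converts the coefficient data (carried on an analog electrical signal) into optical coefficient data and multiplies it with the optical input data in the optical domain to form optical output data. The photodetector converts the optical output data into optical-domain output data (that is, electrical output data).
In the prior art, the input data, coefficient data, and output data are all analog electrical signals, that is, multi-level signals with certain voltage amplitudes in the electrical domain, for example four-level electrical signals. A modulator capable of modulating multi-level electrical signals is therefore required, which means raising the signal drive voltage applied to the modulator and incurring additional power consumption and cost.
Moreover, the existing solution requires a DAC to read the data and perform digital-to-analog conversion. For high-speed modulation, the drive voltage output by the DAC is only on the order of a hundred millivolts, which cannot effectively drive the modulator, so an amplifier is needed after the DAC to raise the drive voltage to the volt level. The prior art thus depends on a DAC and an amplifier, which means high power consumption, high cost, and large device size that leads to low integration. High-precision DACs also depend heavily on advanced very-large-scale integration (VLSI) process nodes.
The existing solution also requires an ADC for analog-to-digital conversion, with an even higher precision requirement than the DAC. For example, multiplying two 4-bit signals produces an 8-bit output, and if additions follow, the output precision is higher still. The DAC therefore needs 4-bit precision, but the ADC needs 8-bit precision or more. High-precision ADCs likewise suffer from high power consumption, high cost, and low integration, and depend even more on advanced VLSI process nodes.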
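The precision argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the application; the function names are illustrative assumptions, and unsigned binary operands are assumed.

```python
def product_bits(m: int, n: int) -> int:
    """Bits needed to hold the product of an m-bit and an n-bit unsigned value."""
    return ((2**m - 1) * (2**n - 1)).bit_length()


def accumulated_bits(m: int, n: int, terms: int) -> int:
    """Bits needed to hold the sum of `terms` such products (worst case)."""
    return (terms * (2**m - 1) * (2**n - 1)).bit_length()


# Two 4-bit operands give an 8-bit product, so the DAC side needs 4 bits
# while the ADC side needs 8 bits.
print(product_bits(4, 4))         # 8
# Adding two such products needs one more bit.
print(accumulated_bits(4, 4, 2))  # 9
```

The second call shows why any addition after the multiplication pushes the required output precision beyond 8 bits.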
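In summary, the existing solution suffers from limited computing precision, high power consumption, low integration, and dependence on advanced process nodes.
To solve the foregoing problems, refer to FIG. 2, which is a schematic architecture diagram of an optical computing system according to an embodiment of this application.
In a possible implementation, the optical computing system may include a light source 201, which may be a laser 2011 that provides a laser carrier. For example, referring to FIG. 3, the input light source 201 may use one laser 2011 whose output is passed to a 1×M power splitter 2012 to obtain M laser beams (for example, M beams of the same wavelength). Alternatively, referring to FIG. 4, the input light source 201 may use M lasers 2011 to provide M laser beams (each laser 2011 provides one beam; for example, the M beams have the same wavelength).
In a possible implementation, the laser carrier provided by the light source 201 may be a continuous-wave (CW) laser, for example a single-wavelength CW laser. Optionally, the light source 201 may be a III/V-integrated distributed feedback (DFB) laser 2011 or another laser. The 1×M power splitter 2012 may be a 1×M multimode interferometer, a cascade of log2(M) stages of 1×2 splitters, or another structure. This is not limited in this application.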
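In a possible implementation, the optical computing system may include a first optical processing array 202, which loads (for example, modulates) the data indicated by an electrical signal from the electrical domain onto the laser carrier provided by the light source 201. The optical computing system may perform a product operation, namely the product of data A and data B. In an AI computing scenario, data A may be input data and data B may be coefficient data (also called weights). The data indicated by the electrical signal may be data A; that is, the first optical processing array 202 loads data A, indicated by the electrical signal from the electrical domain, onto the laser carrier provided by the light source 201.
The following description takes data A as the first data.
In a possible implementation, the optical computing system may include the first optical processing array 202, which loads the first data from the electrical domain onto the laser carrier provided by the light source 201.
In the existing solution, the electrical signal input to the optical domain is an analog electrical signal indicating the first data. In the embodiments of this application, by contrast, no digital-to-analog converter is needed, and a digital electrical signal indicating the first data is input directly to the optical domain. The first data is binary data and may include a plurality of first sub-data. Each first sub-data is one bit of the first data, and each bit is 0 or 1.
For example, the first data may be 1101 (13 in decimal), in which case the first data includes four bits: "1", "1", "0", and "1". The data of each bit (that is, each first sub-data) may be input to one first optical processing unit 2021 in the first optical processing array 202. In an existing implementation, 1101 is input to the optical modulator in the optical domain as an analog signal. Because 1101 is data with 4-bit precision, the optical modulator must support 4-bit modulation, that is, accept an electrical signal with 2^4 (that is, 16) levels and output an optical signal with 2^4 (that is, 16) levels. In that case, the optical modulation itself consumes considerable power, and the optical modulator also needs a high signal drive voltage (an amplifier must be deployed before the optical modulator, which requires an additional device and its power consumption).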
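In the embodiments of this application, referring to FIG. 5, the first optical processing array 202 may include a plurality of first optical processing units 2021. Each first optical processing unit 2021 outputs a first optical signal according to a first sub-data of the first data, and the first optical signal indicates that first sub-data. In other words, each first optical processing unit 2021 loads one first sub-data of the first data onto an optical signal (which may be described as electro-optic conversion of the first sub-data) to obtain a first optical signal.
That is, each first optical processing unit 2021 only needs to accept a two-level electrical signal and output a two-level optical signal. On the one hand, the optical modulator then needs only a low signal drive voltage. Generally, the multi-level drive voltage of an analog scheme reaches several volts, so an electrical amplifier must be deployed before the optical modulator. In the embodiments of this application, two-level electro-optic conversion needs a drive voltage of only about a hundred millivolts, so no amplifier is required, and neither is the additional device or its power consumption. On the other hand, because the input first data is carried on digital signals as a plurality of first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and none of their power consumption, is required. Generally, DACs, amplifiers, and ADCs are millimeter-scale devices with watt-level power consumption, and co-packaging them with optical devices increases packaging complexity. The optical devices used in the embodiments of this application are micrometer-scale with milliwatt-level power consumption, so this solution offers substantial gains in both power consumption and size. Monolithic integration of the optical devices can be achieved with a mature process, such as a 130 nm silicon photonics process, which also greatly reduces packaging difficulty. In addition, the multi-unit array architecture in the embodiments of this application is scalable in precision: adding optical processing units allows optical bit operations of higher precision.
In a possible implementation, referring to FIG. 5, the first optical processing array 202 may include M first optical processing units, and the light source 201 may provide M laser carriers. Each laser carrier is input to one first optical processing unit 2021, and each first optical processing unit 2021 loads one first sub-data of the first data onto the corresponding laser carrier to obtain a first optical signal. The first optical processing array 202 thereby obtains M first optical signals, which together indicate the first data.
For example, the first optical processing array 202 may include four first optical processing units (first optical processing units 1, 2, 3, and 4), and the first data may be 1101. Bit "1" is input to first optical processing unit 1, bit "1" to first optical processing unit 2, bit "0" to first optical processing unit 3, and bit "1" to first optical processing unit 4. First optical processing unit 1 loads bit "1" onto its input optical signal to obtain first optical signal 1 (indicating "1"); first optical processing unit 2 loads bit "1" to obtain first optical signal 2 (indicating "1"); first optical processing unit 3 loads bit "0" to obtain first optical signal 3 (indicating "0"); and first optical processing unit 4 loads bit "1" to obtain first optical signal 4 (indicating "1").
In a possible implementation, the first optical processing units in the first optical processing array 202 are arranged in order, and the first sub-data are input to them in order from the most significant bit to the least significant bit of the first data. For example, the first optical processing array 202 may include first optical processing units 1, 2, 3, and 4, where unit 1 is configured to process the most significant bit of the input data, unit 2 the second most significant bit, unit 3 the third most significant bit, and unit 4 the fourth most significant bit.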
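The bit-to-unit assignment described above can be sketched in a few lines. This is an illustrative model only; the function names are assumptions, and unit numbering starts at 1 with unit 1 receiving the most significant bit, as in the example.

```python
def to_bits_msb_first(value: int, width: int) -> list[int]:
    """Split an unsigned integer into `width` bits, most significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def assign_to_units(value: int, width: int) -> dict[int, int]:
    """Map first optical processing unit index (1-based) to the bit it modulates."""
    return dict(enumerate(to_bits_msb_first(value, width), start=1))


# First data 1101 (decimal 13) spread over four two-level units.
print(assign_to_units(13, 4))  # {1: 1, 2: 1, 3: 0, 4: 1}
```

Each unit only ever sees a 0 or a 1, which is why a two-level drive is sufficient.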
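In a possible implementation, the first optical processing unit 2021 may be an optical modulator or an optical switch.
In a possible implementation, the first optical processing unit 2021 is a two-level-driven optical device. In the embodiments of this application, each first optical processing unit 2021 only needs to accept a two-level electrical signal and output a two-level optical signal. In that case, the optical modulation itself consumes little power, and no DAC is needed for digital-to-analog conversion, which greatly reduces power consumption, size, and packaging difficulty.
In a possible implementation, the drive voltage of the first optical processing unit 2021 is less than 1000 millivolts, for example 100, 200, 300, 400, 500, 600, 700, 800, or 900 millivolts. Because each first optical processing unit 2021 only needs to accept a two-level electrical signal and output a two-level optical signal, the optical modulator needs only a low signal drive voltage (no amplifier has to be deployed before the optical modulator, so no additional device or corresponding power consumption is required).
In a possible implementation, the optical modulator may be a low-drive-voltage electro-optic modulator, for example a silicon-photonics-integrated microring modulator, a Mach-Zehnder modulator, or another modulator. Optionally, the operating frequency of the optical modulator is the same as the computing frequency.
For example, to record and modulate the intensity of incident light, the modulator may be implemented with structures based on different principles, such as a doped silicon waveguide, an electro-absorption modulator, or a semiconductor optical amplifier (SOA). When an SOA is used as the modulator, the incident light intensity can be recorded by detecting the photocurrent generated in the SOA by the incident light, and the intensity of the optical signal passing through the SOA can be changed by changing its transmittance. In practice, an SOA can be made from semiconductor quantum-well material. Because different voltages change the transmittance of the SOA differently, voltage control can vary the transmittance between 0 and 1. Specifically, when the voltage across the SOA is a reverse bias, incident light generates a photocurrent in the SOA, and the distribution of light intensity can be obtained by measuring that photocurrent.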
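To multiply binary data, the bits of the two operands (the first data and the second data) must be multiplied pairwise (that is, each first sub-data of the first data with each second sub-data of the second data), and the results of these products must then be added, as shown for example in FIG. 6.
In a possible implementation, the second data carried in the electrical signal may include N second sub-data. To multiply each first sub-data of the first data with each second sub-data of the second data, each first optical processing unit 2021 outputs its first optical signal to a beam splitter 204. The beam splitter 204 divides the power of the first optical signal to obtain N first optical signals (for example, by dividing the power equally into N first optical signals). The N first optical signals indicate the same first sub-data; that is, each of them indicates the same first sub-data.
The second sub-data itself is carried in an electrical signal. Using an optical switch to multiply the first sub-data and the second sub-data in the optical domain means that the optical switch performs both the electro-optic conversion and the multiplication, so no additional device is needed for electro-optic conversion, which reduces the number of devices required.
For example, the first optical processing array 202 may include four first optical processing units (first optical processing units 1, 2, 3, and 4), the first data may be 1101, and the second data may be 1011. Bit "1" is input to first optical processing unit 1, bit "1" to first optical processing unit 2, bit "0" to first optical processing unit 3, and bit "1" to first optical processing unit 4. First optical processing unit 1 loads bit "1" onto its input optical signal to obtain first optical signal 1 (indicating "1"); unit 2 loads bit "1" to obtain first optical signal 2 (indicating "1"); unit 3 loads bit "0" to obtain first optical signal 3 (indicating "0"); and unit 4 loads bit "1" to obtain first optical signal 4 (indicating "1").
The beam splitter 204 may include beam splitters 1, 2, 3, and 4. First optical processing unit 1 inputs first optical signal 1 (indicating "1") to beam splitter 1 to obtain four first optical signals 1 (each indicating "1"); unit 2 inputs first optical signal 2 (indicating "1") to beam splitter 2 to obtain four first optical signals 2 (each indicating "1"); unit 3 inputs first optical signal 3 (indicating "0") to beam splitter 3 to obtain four first optical signals 3 (each indicating "0"); and unit 4 inputs first optical signal 4 (indicating "1") to beam splitter 4 to obtain four first optical signals 4 (each indicating "1").
In a possible implementation, the beam splitter 204 may be a 1×N power splitter.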
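As an illustration of why N copies from a 1×N power splitter can all indicate the same bit, the sketch below models an ideal, lossless, equal-ratio splitter. The function names, the power values, and the half-level decision threshold are assumptions made for this example and are not taken from the application.

```python
def split_power(power_mw: float, n: int) -> list[float]:
    """Ideal 1xN power splitter: N outputs, each carrying 1/N of the input power."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [power_mw / n] * n


def logical_level(power_mw: float, one_level_mw: float) -> int:
    """Read a two-level optical signal: 1 if above half the expected '1' power."""
    return 1 if power_mw > one_level_mw / 2 else 0


N = 4
one_in_mw = 1.0              # assumed power of a '1' before the splitter
one_out_mw = one_in_mw / N   # expected power of a '1' after the splitter

copies_of_one = split_power(one_in_mw, N)
copies_of_zero = split_power(0.0, N)
print([logical_level(p, one_out_mw) for p in copies_of_one])   # [1, 1, 1, 1]
print([logical_level(p, one_out_mw) for p in copies_of_zero])  # [0, 0, 0, 0]
```

Because only two levels have to be distinguished, the reduced power per branch changes the reference level but not the bit each copy indicates.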
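In a possible implementation, the beam splitter 204 outputs the N first optical signals to the second optical processing array 203, which multiplies each first sub-data of the first data with each second sub-data of the second data.
In a possible implementation, the second optical processing array 203 may include a plurality of second optical processing units 2031, each of which multiplies one pair of first sub-data and second sub-data. For example, if the first data includes M first sub-data and the second data includes N second sub-data, N*M pairs of first sub-data and second sub-data must be multiplied in total, so N*M second optical processing units are needed.
In a possible implementation, referring to FIG. 7, the second data includes N bits, the second optical processing array 203 includes M groups of optical processing units, each group includes N second optical processing units 2031, and the beam splitter inputs the N first optical signals to the same group of optical processing units in the second optical processing array 203. In other words, each group of optical processing units handles the products of one first sub-data with every second sub-data.
In a possible implementation, each second optical processing unit 2031 outputs a second optical signal according to one first optical signal and a second sub-data of the second data, where the second data is binary data, the second sub-data is one bit of the second data, and the second optical signal indicates the product of the first sub-data and the second sub-data.
That is, each second optical processing unit 2031 only needs to accept two two-level inputs and to output a two-level optical signal that indicates the product of the data they represent. On the one hand, the optical device then needs only a low signal drive voltage. Generally, the multi-level drive voltage of an analog scheme reaches several volts, so an electrical amplifier must be deployed before the optical modulator. In the embodiments of this application, two-level electro-optic conversion needs a drive voltage of only about a hundred millivolts, so no amplifier is required, and neither is the additional device or its power consumption. On the other hand, because the input first data is carried on digital signals as a plurality of first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and none of their power consumption, is required.
In a possible implementation, the second optical processing array 203 may include M groups of second optical processing units 2031, each group including N second optical processing units 2031. Each second optical processing unit 2031 obtains a second optical signal that indicates the product of a first sub-data and a second sub-data. The second optical processing array 203 thereby obtains M*N second optical signals, which indicate the products of each first sub-data of the first data with each second sub-data of the second data.
For example, the first optical processing array 202 may include four first optical processing units (first optical processing units 1 to 4), and the second optical processing array 203 may include 16 second optical processing units (second optical processing units 1 to 16). The first data may be 1101 and the second data may be 1011. First optical signal 1 (indicating "1") is split into four first optical signals 1 (each indicating "1"), which are input to one group of second optical processing units (second optical processing units 1, 2, 3, and 4), one copy per unit. Bit "1" of the second data is input to second optical processing unit 1, bit "0" to second optical processing unit 2, bit "1" to second optical processing unit 3, and bit "1" to second optical processing unit 4. Second optical processing unit 1 obtains second optical signal 1 (indicating "1") from its copy of first optical signal 1 (indicating "1") and bit "1" of the second data; second optical processing unit 2 obtains second optical signal 2 (indicating "0") from its copy and bit "0"; second optical processing unit 3 obtains second optical signal 3 (indicating "1") from its copy and bit "1"; and second optical processing unit 4 obtains second optical signal 4 (indicating "1") from its copy and bit "1". The remaining groups operate in the same way on the copies of first optical signals 2, 3, and 4, yielding second optical signals 5 to 16.
In a possible implementation, the second sub-data of the second data are loaded simultaneously, from the most significant bit to the least significant bit, onto the corresponding second optical processing unit of every group. Denoting the j-th unit of the i-th group as second optical processing unit-i-j, the most significant bit of the electrical coefficient data is loaded onto second optical processing unit-1-1, unit-2-1, ..., unit-M-1, and the least significant bit onto second optical processing unit-1-N, unit-2-N, ..., unit-M-N. This process multiplies every bit of the input data (the first data) with every bit of the coefficient data (the second data).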
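The M×N bit-wise products can be modeled as a logical AND between each input bit and each coefficient bit, which is what an on/off optical switch acting on a two-level optical signal amounts to. The following is a minimal sketch under that assumption; the function name and the list-of-lists layout (one row per group) are illustrative choices.

```python
def partial_products(a: int, b: int, m: int, n: int) -> list[list[int]]:
    """Row i holds the N outputs of group i: (bit i of a) AND (each bit of b), MSB first."""
    a_bits = [(a >> s) & 1 for s in range(m - 1, -1, -1)]
    b_bits = [(b >> s) & 1 for s in range(n - 1, -1, -1)]
    return [[a_bit & b_bit for b_bit in b_bits] for a_bit in a_bits]


# First data 1101, second data 1011: 4 groups of 4 switches, 16 outputs in total.
for group, row in enumerate(partial_products(0b1101, 0b1011, 4, 4), start=1):
    print(group, row)
# 1 [1, 0, 1, 1]
# 2 [1, 0, 1, 1]
# 3 [0, 0, 0, 0]
# 4 [1, 0, 1, 1]
```

Every group whose input bit is 1 reproduces the coefficient bits, and the group fed by the 0 bit outputs all zeros.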
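In a possible implementation, the second optical processing units in the second optical processing array 203 are arranged in order, and the second sub-data are input to the units of each group in order from the most significant bit to the least significant bit of the second data.
In a possible implementation, the second optical processing unit 2031 is an optical switch. Because this solution uses bit operations, the multiplication unit only needs to realize the two states [0] and [1], so an optical switch can be used. In the prior art, the coefficient multiplication unit has to realize PAM-N electro-optic modulation and therefore needs a modulator. An optical switch has lower power consumption and cost than a PAM-N modulator.
In addition, the second sub-data itself is carried in an electrical signal. Using an optical switch to multiply the first sub-data and the second sub-data in the optical domain means that the optical switch performs both the electro-optic conversion and the multiplication, so no additional device is needed for electro-optic conversion, which reduces the number of devices required.
In a possible implementation, the second optical processing unit 2031 is a two-level-driven optical device. In the embodiments of this application, each second optical processing unit 2031 only needs to accept a two-level electrical signal and output a two-level optical signal. In that case, the optical modulation itself consumes little power, and no DAC is needed for digital-to-analog conversion, which greatly reduces power consumption, size, and packaging difficulty.
In a possible implementation, the drive voltage of the second optical processing unit 2031 is less than 1000 millivolts, for example 100, 200, 300, 400, 500, 600, 700, 800, or 900 millivolts. Because each second optical processing unit 2031 only needs to accept a two-level electrical signal and output a two-level optical signal, only a low signal drive voltage is needed (no amplifier has to be deployed before the optical device, so no additional device or corresponding power consumption is required).
In a possible implementation, the optical switch may be a silicon-photonics-integrated electro-optic Mach-Zehnder interferometer, a thermo-optic Mach-Zehnder interferometer, or another device.
In a possible implementation, when the second data is coefficient data (weights), the signal input frequency supported by the first optical processing unit 2021 may be greater than that supported by the second optical processing unit 2031. In general, the refresh frequency of the second data is lower than the computing frequency, so a low-speed device (relative to the first optical processing unit 2021) can be chosen for the second optical processing unit 2031, which reduces its power consumption.
To multiply binary data, the bits of the two operands (the first data and the second data) must be multiplied pairwise (that is, each first sub-data of the first data with each second sub-data of the second data), and the results of these products must then be added, as shown for example in FIG. 6.
In a possible implementation, the second optical processing array 203 outputs the second optical signals to an optical full adder 205, so that the optical full adder 205 obtains a plurality of third optical signals from the plurality of second optical signals. Each third optical signal indicates one third sub-data of third data, the third data is the product of the first data and the second data, and each third sub-data is one bit of the third data. In a possible implementation, the optical full adder 205 adds the bit products pairwise with carry propagation to produce each bit of the optical output data.
It should be understood that part of the data obtained by the second optical processing array 203 (for example, the last bit) may be output directly without passing through the optical full adder 205, and then serves as the last bit of the product of the first data and the second data.
In a possible implementation, the optical full adder 205 may also take the form of an array that includes a plurality of optical full adders 205, each of which adds two operands (and possibly the carry from a previous addition).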
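The logical behavior of a single full adder (two operands plus an optional carry-in, producing a sum bit and a carry-out) can be written as the standard Boolean expressions below. This sketch describes only the logic function; how the optical device realizes it is not modeled, and the function name is an assumption.

```python
def full_adder(x: int, y: int, carry_in: int = 0) -> tuple[int, int]:
    """Return (sum_bit, carry_out) for three one-bit inputs."""
    sum_bit = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return sum_bit, carry_out


# Truth table: the pair (sum, carry) always encodes x + y + carry_in.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            s, carry = full_adder(x, y, c)
            print(x, y, c, "->", s, carry)
```

An array of such adders, wired column by column, performs the addition of the bit products.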
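In a possible implementation, the optical full adder 205 in the embodiments of this application may be a photonic crystal structure, a coupling structure based on highly nonlinear materials, or another structure.
In a possible implementation, the optical processing system described above (including the first optical processing array 202, the second optical processing array 203, and the optical full adder 205) may handle a multiplication of only two data objects (for example, 2-bit*2-bit), or the last product operation in a multiplication of more than two objects (for example, in 2-bit*2-bit*2-bit, the first multiplication yields 4-bit*2-bit, and 4-bit*2-bit is the last product operation). In that case, the output of the optical full adder 205 (the plurality of third optical signals) represents the final result of the multiplication, which must be passed to the electrical domain. The output of the optical full adder 205 is therefore passed to a PD, which generates an electrical signal indicating the third data from the plurality of third optical signals (in other words, performs photoelectric conversion on the plurality of third optical signals).
In a possible implementation, referring to FIG. 8, the PD may take the form of an array that includes a plurality of PD units. Each PD unit performs photoelectric conversion on one of the third optical signals to generate the electrical signal of one third sub-data of the third data (together forming the electrical signal of the third data).
In a possible implementation, the PD outputs the electrical signal of the third data to the electrical domain (for example, to the memory or the processor).
In a possible implementation, the PD may be a germanium-silicon integrated waveguide photodetector or another detector. This is not limited in the embodiments of this application.
In a possible implementation, the optical processing system described above (including the first optical processing array 202, the second optical processing array 203, and the optical full adder 205) may handle a product operation other than the last one in a multiplication of more than two objects (for example, in 2-bit*2-bit*2-bit, the first multiplication, 2-bit*2-bit). In that case, the output of the optical full adder 205 (the plurality of third optical signals) represents an intermediate result of the multiplication, which must be passed to the next-stage optical processing array (the third optical processing array in the embodiments of this application) so that it can perform the next product operation. The output of the optical full adder 205 is therefore passed to the third optical processing array.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data. The third optical processing array includes a plurality of third optical processing units, each of which outputs a fourth optical signal according to one third optical signal and a fourth sub-data of fourth data. The fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
For details of the third optical processing array, refer to the description of the second optical processing array in the foregoing embodiments. Similar details are not repeated here.
It should be understood that, in a possible implementation, the first data and the second data may be two matrix elements that need to be multiplied in a matrix product (for example, for the product of matrix A and matrix B, the first data may be an element of matrix A and the second data an element of matrix B). The optical processing system described above (including the first optical processing array 202, the second optical processing array 203, and the optical full adder 205) performs the product of the first data and the second data. The optical processing system may further include a plurality of similar subsystems, each of which multiplies two matrix elements that need to be multiplied in the matrix product and obtains one product result (for example, the subsystem including the first optical processing array 202, the second optical processing array 203, and the optical full adder 205 obtains a plurality of third optical signals). The optical processing system may further include an optical full adder 205 that sums the product results of the subsystems to obtain the result of the matrix product. For details of each subsystem, refer to the description in the foregoing embodiments of the subsystem that includes the first optical processing array 202, the second optical processing array 203, and the optical full adder 205. Similar details are not repeated here.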
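The following three specific examples describe, respectively, a product operation between only two data objects (elements), a product operation between only three data objects (elements), and a product operation between matrices.
I. Product operation between only two data objects (elements)
Referring to FIG. 9, the example shown is 13×11=143, and the corresponding principle is shown in FIG. 6. The single-wavelength CW laser power of the laser 2011 is split equally into four and input to four modulators. The input data is 13, that is, [1101], and is loaded onto the four modulators from the most significant bit to the least significant bit: the most significant bit [1] onto modulator-1, the second bit [1] onto modulator-2, the third bit [0] onto modulator-3, and the least significant bit [1] onto modulator-4. After modulation by the four electro-optic modulators, the optical signals from the most significant bit to the least significant bit are likewise [1], [1], [0], and [1], and the power of each optical signal is split equally and applied to four optical switches. The coefficient data is 11, that is, [1011], and each group of four optical switches is loaded with the coefficient bits from the most significant bit to the least significant bit: the most significant bit [1] onto optical switches-1, 5, 9, and 13; the second bit [0] onto optical switches-2, 6, 10, and 14; the third bit [1] onto optical switches-3, 7, 11, and 15; and the least significant bit [1] onto optical switches-4, 8, 12, and 16. This structure multiplies every bit of the 4-bit input data with every bit of the 4-bit coefficient data, and each product is output after the corresponding optical switch in the form of optical signal intensity. The products at corresponding bit positions must then be added in the optical domain. For example, optical switch-16 outputs the product of the least significant bit of the input data and the least significant bit of the coefficient data, which is converted directly by a PD and serves as the least significant bit of the output data. Optical switch-15 outputs the product of the least significant bit of the input data and the third bit of the coefficient data, and optical switch-12 outputs the product of the third bit of the input data and the least significant bit of the coefficient data. These two belong to the same bit position and are added by optical full adder-12; the sum is converted by a PD and serves as the seventh bit of the output data, and the carry is output to the next layer of optical full adders. Similarly, optical switch-14 outputs the product of the least significant bit of the input data and the second bit of the coefficient data, optical switch-11 outputs the product of the third bit of the input data and the third bit of the coefficient data, and optical switch-8 outputs the product of the second bit of the input data and the least significant bit of the coefficient data. These three belong to the same bit position and must be added together, along with the carry from optical full adder-12. Because an optical full adder adds two inputs and one carry, two full adders are needed here: optical full adder-11 adds the outputs of optical switches-14 and 11 and the carry from optical full adder-12, and its sum is output to optical full adder-10, which adds it to the output of optical switch-8; the result is converted by a PD and serves as the sixth bit of the output data. The carries of optical full adder-11 and optical full adder-10 are both output to the next layer of optical full adders. By analogy, after the products at corresponding bit positions have been added by the optical full adder array, eight results are output in total, each converted by a PD. The final output data is [10001111], that is, 143.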
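The 13×11=143 example can be reproduced with a small software model that uses only bit-wise AND for the switches and one-bit full adders for the additions. This is a sketch of the arithmetic, not of the exact adder wiring of FIG. 9: the row-by-row accumulation order and the function names are assumptions chosen for brevity.

```python
def full_adder(x: int, y: int, c: int = 0) -> tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    return x ^ y ^ c, (x & y) | (c & (x ^ y))


def multiply_bitwise(a: int, b: int, m: int, n: int) -> list[int]:
    """Multiply an m-bit and an n-bit unsigned value; return m+n product bits, MSB first."""
    a_bits = [(a >> i) & 1 for i in range(m)]  # LSB first
    b_bits = [(b >> j) & 1 for j in range(n)]  # LSB first
    acc = [0] * (m + n)                        # running sum, LSB first
    for i, a_bit in enumerate(a_bits):
        carry = 0
        for j, b_bit in enumerate(b_bits):
            # a_bit & b_bit plays the role of one optical switch output.
            acc[i + j], carry = full_adder(acc[i + j], a_bit & b_bit, carry)
        k = i + n
        while carry and k < m + n:             # ripple any remaining carry upward
            acc[k], carry = full_adder(acc[k], carry)
            k += 1
    return acc[::-1]


bits = multiply_bitwise(13, 11, 4, 4)
print(bits)                                # [1, 0, 0, 0, 1, 1, 1, 1]
print(int("".join(map(str, bits)), 2))     # 143
```

The eight output bits match the result stated in the example, with the least significant bit coming straight from a single switch output.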
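In the above example, N=4, so the structure has 4 modulators, 16 optical switches, 12 optical full adders, and 8 PDs in total.
The above example implements 4-bit all-optical digital computing: digital output is obtained through digital input and digital operations, without a 4-bit DAC, an RF amplifier, or an 8-bit ADC, and all the optical devices used are low-drive-voltage, low-power devices. Compared with the prior art, that is, the optoelectronic hybrid analog computing scheme, the most prominent effect is that 4-bit computing is achieved, which the existing scheme cannot do. The example also has the beneficial effects of low power consumption, high integration, and no dependence on advanced process nodes.
Although this example uses more devices than the prior art, its overall power consumption is far lower, about 1/10 of that of the prior art. The detailed analysis is shown in Table 1, where the power comparison is estimated at a computing rate of 50 Gb/s. The estimation method is as follows. No 4-bit high-speed DAC and RF driver is currently available, so their power consumption is estimated as twice that of a 4-channel 2-bit high-speed DAC and RF driver. The power consumption of reading the input data is estimated at a computing rate of 50 Gb/s, and that of reading the coefficient data at a coefficient refresh rate of 1 Gb/s. The input modulator power is estimated as the sum of the dynamic power of a silicon-photonics microring modulator, 30 fJ/bit, and its static power, 10 mW, where the static power is the thermal tuning power needed to stabilize the operating wavelength of the modulator at the wavelength of the laser 2011. The coefficient modulator is estimated at the static power of a silicon-photonics microring modulator, 10 mW. The low-speed optical switch is estimated at the dynamic power of a silicon-photonics Mach-Zehnder optical switch, 5 mW. The PD is estimated at the dynamic power of a germanium-silicon waveguide photodetector, 50 fJ/bit. The 8-bit ADC power is estimated as twice that of a 3-bit ADC in a 65 nm process. The 1×N power splitter and the optical full adder in this solution are passive devices with no power consumption and are therefore not listed in the table. When comparing the two schemes, devices of the same type are given the same power estimates, and the estimates for devices used only in the prior art (such as the DAC, RF amplifier, and ADC) are on the conservative side.
Table 1
Figure PCTCN2022095978-appb-000001
According to Table 1, the largest contributors to power consumption in the existing scheme are the ADC, the DAC, and the RF amplifier. As shown in (a) of FIG. 10, the total power consumption of the existing scheme is 7665 mW, of which the 4-bit DAC and RF amplifier account for about 44.2% and the 8-bit ADC for about 52.2%. The technical solution of the present invention uses bit operations and needs no analog-to-digital conversion, digital-to-analog conversion, or electrical signal amplifier, giving roughly a 10-fold power saving. As shown in (b) of FIG. 10, the largest contributor in this solution is the reading of input data, at about 56.2%; the laser 2011 accounts for about 21%; and the optical devices that perform the bit operations account for only about 20%.
As described above, DACs, RF amplifiers, and ADCs are millimeter-scale devices, and the higher their precision, the more they depend on advanced VLSI process nodes. The optical devices used in this solution are micrometer-scale and can be monolithically integrated with a mature 130 nm silicon photonics process. The solution of the present invention therefore also offers substantial gains in size and in optoelectronic co-packaging integration, and does not depend on advanced process nodes.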
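II. Product operation between only three data objects (elements)
Referring to FIG. 11, the example shown is 3×3×3×3=81, and the corresponding principle is shown in FIG. 12. The single-wavelength CW laser power of the laser 2011 is split equally into two and input to two modulators. The input data is 3, that is, [11], and is loaded onto the two modulators from the most significant bit to the least significant bit: the most significant bit [1] onto modulator-1 and the least significant bit [1] onto modulator-2. After modulation by the two electro-optic modulators, the optical signals from the most significant bit to the least significant bit are likewise [1] and [1]. The power of each optical signal is split equally and applied to two optical switches. The coefficient data of the first layer is 3, that is, [11], and each group of two optical switches is loaded with the coefficient bits from the most significant bit to the least significant bit: the most significant bit [1] onto optical switch-1-1 and optical switch-2-1, and the least significant bit [1] onto optical switch-1-2 and optical switch-2-2. This structure multiplies every bit of the 2-bit input data with every bit of the first-layer 2-bit coefficient data, and each product is output after the corresponding optical switch in the form of optical signal intensity. Following the principle shown in FIG. 12, the products at corresponding bit positions must then be added in the optical domain. Optical switch-2-2 yields the product of the least significant bit of the input data and the least significant bit of the first-layer coefficient, which is output directly to the second-layer multiplication as the least significant bit of the first-layer output. Optical switch-2-1 yields the product of the least significant bit of the input data and the most significant bit of the first-layer coefficient, and optical switch-1-2 yields the product of the most significant bit of the input data and the least significant bit of the first-layer coefficient. These two belong to the same bit position and are added by optical full adder-1; the sum is output to the second-layer multiplication as the third bit of the first-layer output data, and the carry is output to optical full adder-2. Optical switch-1-1 yields the product of the most significant bit of the input data and the most significant bit of the first-layer coefficient; it is added to the carry from optical full adder-1 by optical full adder-2, and the sum and the carry are output to the second-layer multiplication as the second bit and the most significant bit of the first-layer output data, respectively. The first-layer output, that is, the second-layer input data, is 4-bit, and the power of each bit is split equally into two and applied to two optical switches. As in the first layer, the coefficient data of the second layer is 3, that is, [11], and each group of two optical switches is loaded with the coefficient bits from the most significant bit to the least significant bit. This structure multiplies every bit of the second-layer 4-bit input data with every bit of the second-layer 2-bit coefficient data, and each product is output after the corresponding optical switch in the form of optical signal intensity. Following the principle shown in FIG. 12, the products at corresponding bit positions must then be added in the optical domain, in the same way as in the first layer. The second-layer output, that is, the third-layer input data, is 6-bit, and the power of each bit is split equally into two and applied to two optical switches. The coefficient data of the third layer is 3, that is, [11], and each group of two optical switches is loaded with the coefficient bits from the most significant bit to the least significant bit. Likewise, after the optical switches perform the optical-domain multiplication, the optical full adder array performs the optical-domain addition, and the third-layer output is 8-bit. After photoelectric conversion by eight PDs, the final output is [01010001], that is, 81.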
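The growth of the data width from layer to layer in this example (2-bit input, then 4-bit, 6-bit, and 8-bit outputs) can be traced with the short sketch below. It models only the integer arithmetic and the fixed output width of each layer; the function name and the tuple layout are illustrative assumptions.

```python
def cascade(input_value: int, input_bits: int,
            coefficients: list[int], coeff_bits: int) -> list[tuple[int, int, str]]:
    """Return (value, width, bit string) after each layer of multiplication."""
    value, width = input_value, input_bits
    trace = []
    for coeff in coefficients:
        value *= coeff
        width += coeff_bits  # an M-bit by N-bit product occupies M+N bits
        trace.append((value, width, format(value, f"0{width}b")))
    return trace


for layer, (value, width, bits) in enumerate(cascade(3, 2, [3, 3, 3], 2), start=1):
    print(f"layer {layer}: {value} as {width}-bit [{bits}]")
# layer 1: 9 as 4-bit [1001]
# layer 2: 27 as 6-bit [011011]
# layer 3: 81 as 8-bit [01010001]
```

The leading zero in the final bit string corresponds to the most significant of the eight PD outputs.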
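This example implements three layers of 2-bit operations in the optical domain. Because the input is 2-bit, the structure has 2 modulators in total. Because the coefficients are 2-bit and there are three layers, 24 optical switches, 12 optical full adders, and 8 PDs are needed.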
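The device counts quoted for the two examples (4 modulators, 16 switches, 12 full adders, 8 PDs; and 2 modulators, 24 switches, 12 full adders, 8 PDs) can be tabulated with a small helper. The switch count follows from one switch per bit pair. The full-adder formula used here, input width times (coefficient width minus 1) per layer, is an assumption inferred only from the fact that it reproduces both quoted totals; the application does not state a general formula.

```python
def device_counts(input_bits: int, coeff_bits_per_layer: list[int]) -> dict[str, int]:
    """Estimate device counts for cascaded bit-wise multiplication layers."""
    width = input_bits
    switches = 0
    full_adders = 0
    for n in coeff_bits_per_layer:
        switches += width * n           # one optical switch per (input bit, coefficient bit) pair
        full_adders += width * (n - 1)  # assumed formula, matches the two worked examples
        width += n                      # output width of this layer
    return {
        "modulators": input_bits,
        "switches": switches,
        "full_adders": full_adders,
        "photodetectors": width,
    }


print(device_counts(4, [4]))        # single 4-bit x 4-bit product
print(device_counts(2, [2, 2, 2]))  # three layers of 2-bit coefficients
```

For other bit widths the adder figure should be treated as a rough estimate, since it depends on how the adder array is actually wired.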
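This example implements 2-bit, three-layer all-optical digital computing. Like the first example, it has the beneficial effects of low power consumption, high integration, and no dependence on advanced process nodes.
The power comparison between this example and the prior art is analyzed in Table 2. The estimation method is essentially the same as in the first example, except that the 2-bit high-speed DAC and RF driver are evaluated at their actual values.
Table 2
Figure PCTCN2022095978-appb-000002
As shown in FIG. 13, the power consumption of this example is 11.4% of that of the existing scheme. Because 2-bit computing is implemented, the input drive of the prior art, that is, the 2-bit DAC and RF amplifier, accounts for a small share of the power, about 9%, whereas the output requires an 8-bit ADC, which accounts for about 85% of the total. The technical solution of this example needs no high-precision ADC and still saves most of the power even though it uses more optical devices; within it, the laser 2011 accounts for about 28% and the input data for about 37.2%.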
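III. Product operation between matrices
Refer to FIG. 14. The output of the laser 2011 is split 1-to-4 and fed to the 4-bit modulator groups. Each group includes four modulators and performs electro-optic modulation of one 4-bit input data. Each modulator group is followed by 4-bit optical switch groups, each of which includes four low-speed optical switches. Note that the figure is simplified and shows four optical switch groups. In fact, input data a11 must be multiplied by both b11 and b12, input data a12 must be multiplied by both b21 and b22, and so on. Each input data therefore connects to two groups of 4-bit optical switch multiplication units, so eight optical switch groups, that is, 32 optical switches, are needed in total. After each group completes the optical-domain multiplication, an optical full adder group performs the addition, on the same principle as in the first example. According to the matrix multiplication principle of FIG. 14, c11=a11×b11+a12×b21, so an additional group of optical full adders is needed to perform this addition; the same applies to c12, c21, and c22. Each output is 9-bit, so four PD arrays of nine PDs each are needed. After photoelectric conversion, the final result is output.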
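The relation c11=a11×b11+a12×b21 and the 9-bit output width can be illustrated with a small integer model. This sketch assumes a 2×2 case, as suggested by the outputs c11, c12, c21, and c22 named in the example; the function name and the use of bit strings for the outputs are illustrative choices.

```python
def matmul_2x2_bits(a: list[list[int]], b: list[list[int]], bits: int = 4) -> list[list[str]]:
    """Multiply two 2x2 matrices of unsigned `bits`-bit elements.

    Each output is returned as a bit string of width 2*bits + 1, because the sum
    of two (2*bits)-bit products needs one extra bit.
    """
    out_bits = 2 * bits + 1
    limit = 1 << bits
    if any(not 0 <= x < limit for row in a + b for x in row):
        raise ValueError(f"elements must fit in {bits} bits")
    result = []
    for i in range(2):
        row = []
        for j in range(2):
            c_ij = a[i][0] * b[0][j] + a[i][1] * b[1][j]
            row.append(format(c_ij, f"0{out_bits}b"))
        result.append(row)
    return result


# Worst case for 4-bit elements: every entry is 15, so each output is 15*15 + 15*15 = 450.
print(matmul_2x2_bits([[15, 15], [15, 15]], [[15, 15], [15, 15]]))
# [['111000010', '111000010'], ['111000010', '111000010']]
```

Nine bits per output is exactly what the worst case requires, which matches the nine PDs per output array.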
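This example implements an optical-domain 4-bit 4×4 matrix computing processor. As with the first example, compared with the existing optoelectronic hybrid analog computing scheme, the most prominent effect of this example is that 4-bit computing is achieved, which the existing scheme cannot do. It also has the beneficial effects of low power consumption, high integration, and no dependence on advanced process nodes.
Beyond optoelectronic hybrid computing, the embodiments of this application can also be used for direct computation on optical signals, for example in a distributed fiber acoustic sensing (DAS) system or a light detection and ranging (LiDAR) system. In such applications, the first optical processing array 202 is not needed for electro-optic conversion of the input data, because the input data is already an optical signal. It is sufficient to process the input data with the second optical processing array 203 for coefficient multiplication and the optical full adder for addition.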
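An embodiment of this application provides an optical computing system. The optical computing system includes: a first optical processing array 202, including a plurality of first optical processing units 2021, where each first optical processing unit 2021 is configured to output a first optical signal according to a first sub-data of first data, the first optical signal indicates the first sub-data, the first data is binary data, and the first sub-data is one bit of the first data; and a second optical processing array 203, including a plurality of second optical processing units 2031, where each second optical processing unit 2031 is configured to output a second optical signal according to one first optical signal and a second sub-data of second data, the second data is binary data, the second sub-data is one bit of the second data, and the second optical signal indicates the product of the first sub-data and the second sub-data. In this application, each optical processing unit only needs to accept a two-level electrical signal and output a two-level optical signal. On the one hand, the optical modulator then needs only a low signal drive voltage. Generally, the multi-level drive voltage of an analog scheme reaches several volts, so an electrical amplifier must be deployed before the optical modulator. In the embodiments of this application, two-level electro-optic conversion needs a drive voltage of only about a hundred millivolts, so no amplifier is required, and neither is the additional device or its power consumption. On the other hand, because the input first data is carried on digital signals as a plurality of first sub-data, no additional digital-to-analog converter or analog-to-digital converter, and none of their power consumption, is required. Generally, DACs, amplifiers, and ADCs are millimeter-scale devices with watt-level power consumption, and co-packaging them with optical devices increases packaging complexity. The optical devices used in the embodiments of this application are micrometer-scale with milliwatt-level power consumption, so this solution offers substantial gains in both power consumption and size. Monolithic integration of the optical devices can be achieved with a mature process, such as a 130 nm silicon photonics process, which also greatly reduces packaging difficulty. In addition, the multi-unit array architecture in the embodiments of this application is scalable in precision: adding optical processing units allows optical bit operations of higher precision.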
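This application further provides an optical computing system. The optical computing system includes:
a second optical processing array 203, including a plurality of second optical processing units 2031, where each second optical processing unit 2031 is configured to output a second optical signal according to one first optical signal and a second sub-data carried in an electrical signal, the first optical signal indicates a first sub-data, first data is binary data, the first sub-data is one bit of the first data, second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit 2031 is an optical switch.
In a possible implementation, each second optical processing unit 2031 is specifically configured to:
perform, according to one first optical signal and an optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data, to output the second optical signal.
In a possible implementation, the second optical processing unit 2031 is a two-level-driven optical device.
In a possible implementation, the drive voltage of the second optical processing unit 2031 is less than 1000 millivolts.
In a possible implementation, each second optical processing unit 2031 is configured to output the second optical signal according to one first optical signal and the second sub-data carried in a digital signal.
In a possible implementation, the second data includes N bits, the second optical processing array includes a plurality of groups of optical processing units, each group includes N second optical processing units, and a beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals (for example, by dividing the power of the first optical signal equally into N first optical signals) and output each of the N first optical signals to one second optical processing unit, where the N first optical signals indicate the same first sub-data and N is an integer greater than 1.
In a possible implementation, the second optical processing array is specifically configured to output the second optical signals to an optical full adder, so that the optical full adder obtains a plurality of third optical signals from a plurality of second optical signals, where each third optical signal indicates one third sub-data of third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes the optical full adder.
In a possible implementation, the optical full adder is configured to input the plurality of third optical signals to a photodetector (PD), so that the photodetector generates an electrical signal indicating the third data from the plurality of third optical signals.
In a possible implementation, the optical full adder is a nonlinear adder.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data. The optical computing system further includes:
a third optical processing array, including a plurality of third optical processing units, where each third optical processing unit is configured to output a fourth optical signal according to one third optical signal and a fourth sub-data of fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the second optical processing unit is an optical switch.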
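An embodiment of this application further provides an optical computing method. The method is applied to an optical computing system that includes a first optical processing array 202 and a second optical processing array 203, where the first optical processing array 202 includes a plurality of first optical processing units 2021 and the second optical processing array 203 includes a plurality of second optical processing units 2031. The method includes:
outputting, by each first optical processing unit, a first optical signal according to a first sub-data of first data, where the first optical signal indicates the first sub-data, the first data is binary data, and the first sub-data is one bit of the first data; and
outputting, by each second optical processing unit, a second optical signal according to one first optical signal and a second sub-data carried in an electrical signal, where second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
In a possible implementation, the outputting a second optical signal according to one first optical signal and a second sub-data carried in an electrical signal includes:
performing, according to one first optical signal and an optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data, to output the second optical signal.
In a possible implementation, the first optical processing unit is a two-level-driven optical device.
In a possible implementation, the second optical processing unit is a two-level-driven optical device.
In a possible implementation, the drive voltage of the first optical processing unit is less than 1000 millivolts.
In a possible implementation, the drive voltage of the second optical processing unit is less than 1000 millivolts.
In a possible implementation, the signal input frequency supported by the first optical processing unit is greater than the signal input frequency supported by the second optical processing unit.
In a possible implementation, the outputting a first optical signal according to a first sub-data of first data includes:
outputting the first optical signal according to the first sub-data carried in a digital electrical signal; or
the outputting a second optical signal according to one first optical signal and a second sub-data carried in an electrical signal includes:
outputting, by each second optical processing unit, the second optical signal according to one first optical signal and the second sub-data carried in a digital electrical signal.
In a possible implementation, the first optical processing array includes M first optical processing units, the first data includes M bits, the M first optical processing units are configured to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
In a possible implementation, the optical computing system further includes a beam splitter.
In a possible implementation, the method further includes:
outputting, by each first optical processing unit, the first optical signal to the beam splitter, where the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and output each of the N first optical signals to one second optical processing unit, the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
In a possible implementation, the second data includes N bits, the second optical processing array includes a plurality of groups of optical processing units, each first optical processing unit is specifically configured to output the first optical signal to the beam splitter, and the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and output each of the N first optical signals to one second optical processing unit, where the N first optical signals indicate the same first sub-data and N is an integer greater than 1.
In a possible implementation, the method further includes: outputting, by the second optical processing array, the second optical signals to an optical full adder, so that the optical full adder obtains a plurality of third optical signals from a plurality of second optical signals, where each third optical signal indicates one third sub-data of third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes the optical full adder.
In a possible implementation, the method further includes: inputting, by the optical full adder, the plurality of third optical signals to a photodetector (PD), so that the photodetector generates an electrical signal indicating the third data from the plurality of third optical signals.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data; the optical computing system further includes a third optical processing array that includes a plurality of third optical processing units; and the method further includes: outputting, by each third optical processing unit, a fourth optical signal according to one third optical signal and a fourth sub-data of fourth data, where the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the first optical processing unit is an optical modulator or an optical switch.
In a possible implementation, the method further includes: modulating, by each first optical processing unit, the first sub-data of the first data onto the laser carrier provided by a light source, to output the first optical signal.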
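An embodiment of this application further provides an optical computing method. The method is applied to an optical computing system that includes a second optical processing array including a plurality of second optical processing units. The method includes:
outputting a second optical signal according to one first optical signal and a second sub-data carried in an electrical signal, where the first optical signal indicates a first sub-data, first data is binary data, the first sub-data is one bit of the first data, second data is binary data, the second sub-data is one bit of the second data, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
In a possible implementation, each second optical processing unit is specifically configured to:
perform, according to one first optical signal and an optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data, to output the second optical signal.
In a possible implementation, the second optical processing unit is a two-level-driven optical device.
In a possible implementation, the drive voltage of the second optical processing unit is less than 1000 millivolts.
In a possible implementation, each second optical processing unit is configured to output the second optical signal according to one first optical signal and the second sub-data carried in a digital electrical signal.
In a possible implementation, the second data includes N bits, the second optical processing array includes a plurality of groups of optical processing units, each group includes N second optical processing units, and a beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals (for example, by dividing the power of the first optical signal equally into N first optical signals) and output each of the N first optical signals to one second optical processing unit, where the N first optical signals indicate the same first sub-data and N is an integer greater than 1.
In a possible implementation, the second optical processing array is specifically configured to output the second optical signals to an optical full adder, so that the optical full adder obtains a plurality of third optical signals from a plurality of second optical signals, where each third optical signal indicates one third sub-data of third data, the third data is the product of the first data and the second data, and the third sub-data is one bit of the third data.
In a possible implementation, the optical computing system further includes the optical full adder.
In a possible implementation, the optical full adder is configured to input the plurality of third optical signals to a photodetector (PD), so that the photodetector generates an electrical signal indicating the third data from the plurality of third optical signals.
In a possible implementation, the optical full adder is a nonlinear adder.
In a possible implementation, the third data is binary data, and each third optical signal indicates one bit of the third data. The optical computing system further includes:
a third optical processing array, including a plurality of third optical processing units, where each third optical processing unit is configured to output a fourth optical signal according to one third optical signal and a fourth sub-data of fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
In a possible implementation, the second optical processing unit is an optical switch.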
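FIG. 15 is a schematic diagram of an optical computing apparatus according to an embodiment of this application. The optical computing apparatus 1500 includes a processor 1501 and a chip 1502, where the chip 1502 may include the optical processing system described in the foregoing embodiments.
The processor 1501 is configured to pass the operands of a multiplication (for example, first data and second data) to the chip 1502 and to receive from the chip 1502 the computed product of the first data and the second data.
The chip 1502 may be integrated with the processor 1501 on a substrate as a system on chip (SoC). The chip 1502 and the processor 1501 then benefit from high-speed communication between close neighbors, while combining the strength of the processor 1501 in logic operations with the high parallelism and light-speed execution of the chip 1502.
In addition, the chip in the embodiments of this application may be placed in a computing device that performs AI operations. In terms of product form, the computing device may be a terminal device or a server. In terms of function, the computing device may be an execution device or a training device for a neural network.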
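An execution device provided in an embodiment of this application is described next. Refer to FIG. 16, which is a schematic structural diagram of the execution device. The execution device 1600 may take the form of a mobile phone, a tablet, a notebook computer, a smart wearable device, or the like, which is not limited here. Specifically, the execution device 1600 includes a receiver 1601, a transmitter 1602, a processor 1603, a memory 1604 (the execution device 1600 may have one or more processors 1603; one processor is used as an example in FIG. 16), and the chip 1502. The processor 1603 may include an application processor 16031 and a communication processor 16032. In some embodiments of this application, the receiver 1601, the transmitter 1602, the processor 1603, the memory 1604, and the chip 1502 may be connected by a bus or in another manner.
The memory 1604 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1603. Part of the memory 1604 may further include a non-volatile random access memory (NVRAM). The memory 1604 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various instructions for implementing various operations.
The processor 1603 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which, in addition to a data bus, may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all referred to as the bus system in the figure.
The processor 1603 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the foregoing method may be completed by integrated logic circuits of hardware in the processor 1603 or by instructions in the form of software. The processor 1603 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. A software module may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1604, and the processor 1603 reads information from the memory 1604 and completes the steps of the method in combination with its hardware.
The receiver 1601 may be configured to receive input digit or character information and to generate signal inputs related to the settings and function control of the execution device. The transmitter 1602 may be configured to output digit or character information, and may further be configured to send instructions to a disk group to modify data in the disk group.
The chip 1502 may interact with the memory 1604 and the processor 1603. For example, the chip 1502 may obtain the operands of a product operation (for example, first data and second data) from the memory 1604 or the processor 1603, perform the product operation on them, and return the result to the memory 1604 or the processor 1603.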
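An embodiment of this application further provides a server. Refer to FIG. 17, which is a schematic structural diagram of the server. Specifically, the server 1700 is implemented by one or more servers and may vary considerably with configuration or performance. It may include one or more central processing units (CPU) 1717 (for example, one or more processors), a memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) that store application programs 1742 or data 1744. The memory 1732 and the storage medium 1730 may provide transient or persistent storage. A program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1717 may be configured to communicate with the storage medium 1730 and to execute, on the server 1700, the series of instruction operations in the storage medium 1730.
The server 1700 may further include one or more power supplies 1717, one or more wired or wireless network interfaces 1750, and one or more input/output interfaces 1758, or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
The chip 1502 may interact with the memory 1732 and the central processing unit 1717. For example, the chip 1502 may obtain the operands of a product operation (for example, first data and second data) from the memory 1732 or the central processing unit 1717, perform the product operation on them, and return the result to the memory 1732 or the central processing unit 1717.
An embodiment of this application further provides a computer program product including computer-readable instructions that, when run on a computer, cause the computer to perform the steps performed by the foregoing execution device, or cause the computer to perform the steps performed by the foregoing training device.
An embodiment of this application further provides a computer-readable storage medium that stores a program for signal processing. When the program is run on a computer, it causes the computer to perform the steps performed by the foregoing execution device, or causes the computer to perform the steps performed by the foregoing training device.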
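Specifically, refer to FIG. 18, which is a schematic structural diagram of a chip according to an embodiment of this application. The chip may include a neural network processing unit NPU 1800. The NPU 1800 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core of the NPU is an operation circuit 1803. A controller 1804 controls the operation circuit 1803 to fetch matrix data from memory and perform multiplication.
In some implementations, the operation circuit 1803 internally includes a plurality of processing engines (PE). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, assume an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from a weight memory 1802 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from an input memory 1801, performs the matrix operation with matrix B, and stores the partial or final results of the matrix in an accumulator 1808.
A unified memory 1806 stores input data and output data. Weight data is transferred directly to the weight memory 1802 through a direct memory access controller (DMAC) 1805. Input data is also transferred to the unified memory 1806 through the DMAC.
A bus interface unit (BIU) 1810 is used for interaction between the AXI bus and the DMAC and an instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 is used by the instruction fetch buffer 1809 to obtain instructions from an external memory, and by the direct memory access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1806, to transfer weight data to the weight memory 1802, or to transfer input data to the input memory 1801.
A vector computing unit 1807 includes a plurality of arithmetic processing units and, where necessary, further processes the output of the operation circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation in the non-convolutional/fully connected layers of a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector computing unit 1807 can store the processed output vector in the unified memory 1806. For example, the vector computing unit 1807 may apply a linear or nonlinear function to the output of the operation circuit 1803, such as linear interpolation of the feature planes extracted by a convolutional layer, or to a vector of accumulated values to generate activation values. In some implementations, the vector computing unit 1807 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1803, for example for use in a subsequent layer of the neural network.
The instruction fetch buffer 1809 connected to the controller 1804 stores the instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802, and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the foregoing programs.
The chip 1502 may interact with the memory and the host CPU. For example, the chip 1502 may obtain the operands of a product operation (for example, first data and second data) from the memory or the host CPU, perform the product operation on them, and return the result to the memory or the host CPU.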
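It can be understood that the apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. For example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the connections between the modules discussed in the foregoing embodiments may be electrical, mechanical, or of another form. Modules described as separate components may or may not be physically separate. Components shown as modules may or may not be physical modules. In addition, the functional modules in the embodiments of this application may exist independently or may be integrated into one processing module.
An embodiment of the present invention further provides a computer program product for data processing, including a computer-readable storage medium that stores program code, where the program code includes instructions for performing the method procedure of any one of the foregoing method embodiments. A person of ordinary skill in the art can understand that the foregoing storage medium includes various non-transitory machine-readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random-access memory (RAM), a solid-state disk (SSD), or a non-volatile memory.
It should be noted that the embodiments provided in this application are merely illustrative. A person skilled in the art can clearly understand that, for convenience and brevity, each of the foregoing embodiments has its own emphasis, and for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The features disclosed in the embodiments, claims, and accompanying drawings of the present invention may exist independently or in combination. Features described in the embodiments of the present invention in hardware form may be implemented in software, and vice versa. This is not limited here.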

Claims (20)

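  1. An optical computing system, characterized in that the optical computing system comprises:
    a first optical processing array, comprising a first optical processing unit, wherein the first optical processing unit is configured to output a first optical signal according to a first sub-data separated from first data, the first data is binary data, and the first sub-data is one bit of the first data; and
    a second optical processing array, comprising a second optical processing unit, wherein the second optical processing unit is configured to perform, according to the first optical signal and a second sub-data separated from second data, a product operation of the first sub-data and the second sub-data, and output a second optical signal, wherein the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.
  2. The system according to claim 1, characterized in that the second optical processing unit is specifically configured to:
    obtain, according to the electrical signal, an optical signal corresponding to the second sub-data; and
    perform, according to the first optical signal and the optical signal corresponding to the second sub-data, the product operation of the first sub-data and the second sub-data, to output the second optical signal.
  3. The system according to claim 1 or 2, characterized in that the maximum signal input frequency of the first optical processing unit is greater than the maximum signal input frequency of the second optical processing unit.
  4. The system according to any one of claims 1 to 3, characterized in that the optical computing system further comprises an optical full adder, and the second optical processing array comprises a plurality of optical processing units including the second optical processing unit;
    the second optical processing array is specifically configured to output the second optical signal to the optical full adder; and
    the optical full adder is configured to obtain a plurality of third optical signals according to a plurality of second optical signals output by the plurality of optical processing units, wherein each third optical signal indicates one third sub-data of third data, the third data is the product of the first data and the second data, the third data is binary data, and the third sub-data is one bit of the third data.
  5. The system according to claim 4, characterized in that the optical full adder is a nonlinear adder.
  6. The system according to claim 4 or 5, characterized in that the optical full adder is configured to input the plurality of third optical signals to a photodetector (PD), so that the photodetector generates, according to the plurality of third optical signals, an electrical signal indicating the third data.
  7. The system according to any one of claims 1 to 6, characterized in that the first optical processing array comprises M first optical processing units, the first data comprises M bits, the M first optical processing units are configured to output M first optical signals, the M first optical signals indicate the first data, and M is an integer greater than 1.
  8. The system according to any one of claims 1 to 7, characterized in that the system further comprises a beam splitter;
    each first optical processing unit is specifically configured to output a first optical signal to the beam splitter, and the beam splitter is configured to divide the power of the first optical signal to obtain N first optical signals and output each of the N first optical signals to one second optical processing unit, wherein the N first optical signals indicate the same first sub-data, and N is an integer greater than 1.
  9. The system according to claim 7 or 8, characterized in that the second data comprises N bits, the second optical processing array comprises a plurality of groups of optical processing units, each group of optical processing units comprises N second optical processing units, and the beam splitter is specifically configured to output the N first optical signals to the same group of optical processing units in the second optical processing array.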
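  10. The system according to any one of claims 4, 5, and 7 to 9, characterized in that the optical computing system further comprises:
    a third optical processing array, comprising a plurality of third optical processing units, wherein each third optical processing unit is configured to output a fourth optical signal according to one third optical signal and a fourth sub-data of fourth data, and the fourth optical signal indicates the product of the bit indicated by that third optical signal and the fourth sub-data.
  11. The system according to any one of claims 1 to 10, characterized in that the first optical processing unit is an optical modulator or an optical switch.
  12. The system according to any one of claims 1 to 11, characterized in that the first optical processing unit is specifically configured to modulate the first sub-data separated from the first data onto a carrier laser provided by a light source, to output the first optical signal.
  13. The system according to any one of claims 1 to 12, characterized in that the system further comprises:
    the light source.
  14. The system according to any one of claims 1 to 13, characterized in that the first optical processing unit is a two-level-driven optical device.
  15. The system according to any one of claims 1 to 14, characterized in that the drive voltage of the first optical processing unit is less than 1000 millivolts.
  16. The system according to any one of claims 1 to 15, characterized in that the drive voltage of the second optical processing unit is less than 1000 millivolts.
  17. An optical computing chip, characterized in that the chip comprises the system according to any one of claims 1 to 16, an input interface, and an output interface;
    the input interface is communicatively connected to a processor or a memory and is configured to obtain first data and second data sent by the processor or the memory; and
    the output interface is communicatively connected to the processor or the memory and is configured to pass the product obtained from the first data and the second data to the processor or the memory.
  18. A computing device, characterized by comprising a processor and the optical computing chip according to claim 17;
    the processor is communicatively connected to the optical computing chip.
  19. The computing device according to claim 18, characterized in that the computing device is a terminal device or a server.
  20. An optical computing method, characterized in that the method is applied to an optical computing system, the optical computing system comprises a first optical processing array and a second optical processing array, the first optical processing array comprises a first optical processing unit, the second optical processing array comprises a second optical processing unit, and the second optical processing unit is an optical switch; the method comprises:
    outputting, by the first optical processing unit, a first optical signal according to a first sub-data separated from first data, wherein the first data is binary data, and the first sub-data is one bit of the first data; and
    performing, by the second optical processing unit, according to the first optical signal and a second sub-data separated from second data, a product operation of the first sub-data and the second sub-data, and outputting a second optical signal, wherein the second sub-data is one bit of the second data, the second data is binary data, the second data is carried in an electrical signal, the second optical signal indicates the product of the first sub-data and the second sub-data, and the second optical processing unit is an optical switch.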
PCT/CN2022/095978 2022-05-30 2022-05-30 一种光计算***以及芯片 WO2023230764A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/095978 WO2023230764A1 (zh) 2022-05-30 2022-05-30 一种光计算***以及芯片

Publications (1)

Publication Number Publication Date
WO2023230764A1 true WO2023230764A1 (zh) 2023-12-07

Family

ID=89026485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095978 WO2023230764A1 (zh) 2022-05-30 2022-05-30 一种光计算***以及芯片

Country Status (1)

Country Link
WO (1) WO2023230764A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630178A (zh) * 2008-07-16 2010-01-20 中国科学院半导体研究所 一种硅基集成化的光学向量-矩阵乘法器
WO2021021787A1 (en) * 2019-07-29 2021-02-04 Lightmatter, Inc. Systems and methods for analog computing using a linear photonic processor
CN113312023A (zh) * 2021-06-29 2021-08-27 上海交通大学 光电混合乘法器
CN113568470A (zh) * 2020-04-29 2021-10-29 光子智能股份有限公司 光电处理设备、***和方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944106

Country of ref document: EP

Kind code of ref document: A1