Disclosure of Invention
It is an object of the present invention to solve at least the above problems and to provide at least the advantages to be described later.
The invention also aims to provide a memory-computing integrated processing architecture and a software optimization method. With them, an edge device can run a binary neural network efficiently on an STT-MRAM-based accelerator module. This avoids the high data transmission cost of the prior art, in which edge devices must wirelessly transmit large amounts of data to a higher-performance server for processing.
To achieve the above object and some other objects, the present invention adopts the following technical solutions:
A memory-computing integrated processing architecture, comprising:
The system comprises an energy collection and management module, a CPU module and a storage and calculation integrated module. The output end of the energy collection and management module is electrically connected to the input ends of the CPU module and the storage and calculation integrated module, and the CPU module is bidirectionally electrically connected to the storage and calculation integrated module.
An STT-MRAM is arranged in the storage and calculation integrated module. The energy collection and management module contains an energy collector and an energy management unit. The energy management unit comprises an energy storage capacitor and a DC/DC converter, and the output end of the energy storage capacitor is connected to the input end of the DC/DC converter. The output end of the energy collector is electrically connected to the input end of the energy management unit. The energy collector comprises a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy generation module and a thermal energy generation module, and the output ends of all five are electrically connected to the input end of the energy management unit. The STT-MRAM arrays are distributed across the storage and calculation integrated module in a crossbar arrangement, and each STT-MRAM array is built from 1T1MTJ (one-transistor, one-magnetic-tunnel-junction) cells;
the architecture, which is oriented to self-powered systems, further comprises a software optimization module consisting of the following two parts:
Offline modeling part: the power consumption and delay of binary convolution under various logic combinations are analyzed offline, and an offline decision table is derived from the analysis results and the energy levels. For each energy level, the offline decision table records the optimal logic combination to execute together with its power consumption and delay, so that it can supply execution decisions as the energy fluctuates during online simulation. The neural network used for offline modeling is LeNet, treated as a two-layer convolutional neural network;
Online simulation part: the offline decision table and an energy trace table are taken as the inputs of the online simulation. The energy trace table emulates an unstable self-powered scenario, and the offline decision table serves as the basis for adapting to energy changes. The simulation targets the execution of a binary neural network on the in-memory processing architecture in a self-powered system;
in the offline modeling part, the offline decision table is established through the following steps:
Step one: acquire an energy trace, divide it into energy levels according to its characteristics, and determine the interval of each energy level and the number of levels;
Step two: from the measured power consumption and delay and the divided energy levels, determine the logic combination suited to each energy level, thereby obtaining the offline decision table, which provides the input for online simulation;
in the online simulation part, the logic combination is determined through the following steps:
Step one: traverse the energy trace table, determine the energy level of the current energy, select a suitable logic combination from the offline decision table, and execute the binary neural network convolution, so as to cope with the energy fluctuations of a self-powered scenario;
Step two: when the energy level is low, the logic combination with low power consumption is executed so that scarce energy is not wasted; when the energy level is higher, a logic combination with lower delay can be executed; when the energy is higher still, parallel execution can be selected so that as much of the energy as possible is used;
Step three: once the trace table has been fully traversed, the energy efficiency and throughput of the architecture are obtained.
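The three online-simulation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The decision-table entries are hypothetical (power, delay) pairs. Each sampling period is assumed to supply enough energy to sustain the chosen combination for the whole period. Because the text does not define energy efficiency and throughput, they are taken here as completed operations per picojoule and per microsecond. The names `level_of` and `simulate` are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical offline decision table: energy level -> (power_uW, delay_us) of the
# selected logic combination, or None when that level is too low to compute.
DecisionTable = Dict[int, Optional[Tuple[float, float]]]


def level_of(power_uw: float, upper_bounds_uw: List[float]) -> int:
    """Return the 1-based energy level, given ascending upper bounds of levels 1..k."""
    for idx, upper in enumerate(upper_bounds_uw, start=1):
        if power_uw < upper:
            return idx
    return len(upper_bounds_uw) + 1  # at or above the last bound: top level


def simulate(trace_uw: List[float], upper_bounds_uw: List[float],
             table: DecisionTable, period_us: float) -> Dict[str, float]:
    """Steps one to three: traverse the trace, consult the table, tally the metrics."""
    done, backups, energy_pj = 0, 0, 0.0
    for power in trace_uw:
        entry = table[level_of(power, upper_bounds_uw)]
        if entry is None:
            backups += 1  # level too low: back up state instead of computing
            continue
        p_uw, t_us = entry
        runs = int(period_us // t_us)  # operations that fit into one sampling period
        done += runs
        energy_pj += runs * p_uw * t_us  # microwatts x microseconds = picojoules
    total_us = len(trace_uw) * period_us
    return {
        "done": done,
        "backups": backups,
        "throughput_per_us": done / total_us if total_us else 0.0,
        "ops_per_pj": done / energy_pj if energy_pj else 0.0,
    }
```

Try it with the four sampled powers that appear in the detailed description (50, 820, 360 and 550 μW), upper level bounds of 200, 400 and 600 μW, a 10 μs period and the hypothetical table `{1: None, 2: (150.0, 4.0), 3: (300.0, 1.0), 4: (300.0, 1.0)}`. The result is 22 completed operations and one backup period.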
An application method based on the self-powered-system-oriented storage and calculation integrated processing architecture of claim 1, comprising the following steps:
Step 1: design a reconfigurable, STT-MRAM-based in-memory processing architecture for self-powered scenarios that supports efficient execution of a binary neural network;
Step 2: because binary convolution can be realized by different logic combinations, map the binary neural network convolution onto the hardware platform;
Step 3: use the adaptive software optimization method to follow energy fluctuations so that the harvested energy is used as fully as possible.
Preferably, in step 2, the binary neural network is mapped to the hardware platform through the following steps:
Step 2.1: exploiting the reconfigurability of the hardware architecture, carry out the multiply-accumulate operations of the binary neural network using XOR, XNOR, or combinational logic built from AND, OR and NOT;
Step 2.2: obtain the different modes in which the binary neural network computation can be mapped onto the hardware architecture;
Step 2.3: associate each mapping mode with its system power consumption, which provides the basis for offline modeling in the adaptive software optimization method.
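Step 2.1 relies on the standard identity that lets a binary network replace multiplication with XOR. Encode +1 as bit 1 and -1 as bit 0. Each product is then +1 where the bits agree and -1 where they differ. A length-n dot product therefore equals n minus twice the number of XOR mismatches; XNOR counts the matches instead. The Python sketch below demonstrates that identity in software only. The bit encoding and the names `to_bits` and `xor_dot` are assumptions, and the sketch says nothing about how the accumulation is physically performed in the STT-MRAM array.

```python
from typing import List


def to_bits(vec: List[int]) -> int:
    """Pack a +1/-1 vector into an integer: bit i is 1 where vec[i] == +1."""
    bits = 0
    for i, v in enumerate(vec):
        if v not in (1, -1):
            raise ValueError("binary network values must be +1 or -1")
        if v == 1:
            bits |= 1 << i
    return bits


def xor_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two +1/-1 vectors of length n, computed from their packed bits.

    Matching positions contribute +1 and differing ones -1, so with
    d = popcount(a XOR w) the result is (n - d) - d = n - 2 * d.
    """
    d = bin(a_bits ^ w_bits).count("1")
    return n - 2 * d
```

For example, the vectors `[1, -1, 1, 1]` and `[1, 1, -1, 1]` differ in two positions, so `xor_dot` returns 4 - 2 × 2 = 0, which equals the ordinary dot product 1 - 1 - 1 + 1.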
The invention provides at least the following beneficial effects:
1. The invention provides an energy collection and management module, a CPU module and a storage and calculation integrated module. Inside the storage and calculation integrated module, STT-MRAM arrays are distributed in a crossbar arrangement and built from 1T1MTJ cells. Each cell supports AND, OR, NOT and XOR logic, and combining several 1T1MTJ cells realizes further logic functions. This reconfigurability gives the self-powered embedded device hardware support for adapting to energy fluctuations. The device can assist the CPU in processing data while connected directly and locally to the edge device, at low energy cost. The proposed in-memory processing architecture and adaptive software optimization method allow a binary neural network to run efficiently on the architecture. Because STT-MRAM is non-volatile, data are not lost when the device loses power, so fluctuating energy can be fully exploited. The self-powered embedded device can therefore complete intelligent inference locally at the edge without relying on the cloud. This reduces the load on network transmission and solves the problem that existing edge devices must wirelessly transmit large amounts of data to a higher-performance computer for processing.
2. The energy collection and management module supplies energy to the storage and calculation integrated module and the CPU. It comprises an energy collector and an energy management unit. The collector includes a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy generation module and a thermal energy generation module. These convert wind, solar, radio frequency, kinetic and thermal energy, among others, into electric energy for the storage and calculation integrated module and the CPU module. The system is thus self-powered, which saves energy, protects the environment and avoids cumbersome battery maintenance. It can therefore be widely applied and meets the requirements of many kinds of edge devices, such as smart bands, wearable devices, and wildlife monitoring and exploration tools.
Detailed Description
The present invention is described in detail below with reference to the drawings, so that those of ordinary skill in the art can practice it after reading this specification.
As shown in Figs. 1 to 7, a storage and calculation integrated processing architecture comprises an energy collection and management module 1, a CPU module 2 and a storage and calculation integrated module 3. The output end of the energy collection and management module 1 is electrically connected to the input ends of the CPU module 2 and the storage and calculation integrated module 3, and the CPU module 2 is bidirectionally electrically connected to the storage and calculation integrated module 3. STT-MRAM arrays 12 are distributed inside the storage and calculation integrated module 3, and each STT-MRAM array 12 is built from 1T1MTJ cells. The energy collection and management module 1 contains an energy harvester 4 and an energy management unit. The energy harvester 4 includes a photovoltaic solar panel 5, a wind power generation module 6, a wireless radio frequency charging module 7, a kinetic energy generation module 8 and a thermal energy generation module 9, whose output ends are electrically connected to the input end of the energy management unit. The energy management unit includes an energy storage capacitor 10 and a DC/DC converter 11, and the output end of the energy storage capacitor 10 is connected to the input end of the DC/DC converter 11. The output end of the energy harvester 4 is electrically connected to the input end of the energy management unit. The self-powered-system-oriented architecture further includes a software optimization module consisting of two parts:
Offline modeling part: the power consumption and delay of binary convolution under various logic combinations are analyzed offline, and an offline decision table is derived from the analysis results and the energy levels. For each energy level, the offline decision table records the optimal logic combination together with its power consumption and delay, so that it can supply execution decisions as the energy fluctuates during online simulation;
Online simulation part: the offline decision table and an energy trace table are taken as the inputs of the online simulation. The energy trace table emulates an unstable self-powered scenario, and the offline decision table serves as the basis for adapting to energy changes. The simulation targets the execution of a binary neural network on the in-memory processing architecture in a self-powered system.
In the offline modeling part, the offline decision table is established through the following steps:
Step one: acquire an energy trace, divide it into energy levels according to its characteristics, and determine the interval of each energy level and the number of levels;
Step two: from the measured power consumption and delay and the divided energy levels, determine the logic combination suited to each energy level, thereby obtaining the offline decision table, which provides the input for online simulation.
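The two offline-modeling steps can be sketched as follows. Several pieces are assumptions for illustration only: the uniform division of the power range, the selection rule, and all numeric costs. The selection rule used here is: among combinations whose power does not exceed a level's lower bound, take the one with the lowest delay. The description does not specify how the optimal combination is chosen. `LogicCombo`, `divide_levels` and `build_decision_table` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LogicCombo:
    """Offline-measured cost of one logic combination (all values hypothetical)."""
    name: str
    power_uw: float  # power needed while executing, in microwatts
    delay_us: float  # time to finish the operation, in microseconds


def divide_levels(max_power_uw: float, num_levels: int) -> List[Tuple[float, float]]:
    """Step one (assumed uniform split): cut [0, max_power_uw] into equal intervals."""
    width = max_power_uw / num_levels
    return [(i * width, (i + 1) * width) for i in range(num_levels)]


def build_decision_table(
    levels: List[Tuple[float, float]], combos: List[LogicCombo]
) -> Dict[int, Optional[LogicCombo]]:
    """Step two (one possible rule): keep the combinations whose power does not exceed
    the level's lower bound, then choose the one with the lowest delay.
    None means nothing is affordable at that level: back up data and wait."""
    table: Dict[int, Optional[LogicCombo]] = {}
    for idx, (low, _high) in enumerate(levels, start=1):
        affordable = [c for c in combos if c.power_uw <= low]
        table[idx] = min(affordable, key=lambda c: c.delay_us) if affordable else None
    return table
```

As an example, take `divide_levels(800.0, 4)` and two hypothetical combinations: `xor` at 300 μW / 1 μs and `and_or_not` at 150 μW / 4 μs. Level 1 gets no combination, level 2 gets the lower-power `and_or_not`, and levels 3 and 4 get the faster `xor`. This mirrors the rule that low energy favors low power and higher energy favors low delay.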
In this scheme, the energy collector of the energy collection and management module consists of a photovoltaic solar panel, a wind power generation module, a wireless radio frequency charging module, a kinetic energy generation module and a thermal energy generation module. When the module is applied to an edge device, it converts ambient energy such as wind, solar, radio frequency, kinetic and thermal energy into electric energy. The energy collector passes this electric energy to the energy management unit, where it is stored in the energy storage capacitor. After conversion by the DC/DC converter, it meets the power requirements of the CPU module and the storage and calculation integrated module, so the system is self-powered. Self-powered operation is green and economical, saves energy, and avoids cumbersome battery charging, replacement and maintenance. The buck-boost DC/DC converter is an LTC3129. The number of converters depends on how many ambient energy sources need to be converted, so as to keep the power supply stable. The device has an accurate RUN pin threshold and a maximum power point control function, provides voltage regulation, and ensures that the energy collector draws maximum power.
In the architecture, the CPU module serves as the main general control module, while storage is handled by the storage and calculation integrated module. This module is a reconfigurable STT-MRAM-based binary neural network accelerator and is bidirectionally electrically connected to the CPU module. The STT-MRAM is distributed across the accelerator module in crossbar arrays built from 1T1MTJ cells. Each 1T1MTJ cell of the array supports AND, OR, NOT and XOR logic, and combining several 1T1MTJ cells realizes further logic functions. This reconfigurability provides hardware support for coping with energy fluctuations. The module can therefore process data locally in place of a remote computer while connected directly to the edge device, with low energy consumption and no network transmission delay.
The offline modeling part first obtains, for a 1T1MTJ cell in the STT-MRAM array, the power required to execute each logic operation and the delay needed to complete it. The powers required for XOR, AND, OR and NOT logic are $P_{xor}$, $P_{and}$, $P_{or}$ and $P_{not}$ respectively. The corresponding completion delays are $T_{xor}$, $T_{and}$, $T_{or}$ and $T_{not}$ respectively. Next, the ambient energy source is selected and divided into energy levels; the source used here is a home WiFi signal, as shown in Fig. 6. The offline decision table is then designed from three inputs: the power required by each logic operation, the delay for completing it, and the energy-level division. The power and delay required by the various XOR-implementing combinational logics are analyzed offline. The online simulation part is illustrated with four energy sampling periods, as shown in Fig. 7, using the decision table generated by offline modeling. The sampled powers in the four periods are 50 μW, 820 μW, 360 μW and 550 μW.
The neural network used for offline modeling is LeNet, treated as a two-layer convolutional neural network. The first-layer convolution kernel is 6×5×5×1 and the second-layer convolution kernel is 16×5×5×6. After the LeNet network is binarized, one computation of the first layer requires 150 XOR operations (6 × 5 × 5 × 1), and one computation of the second layer requires 2400 XOR operations (16 × 5 × 5 × 6). According to the sampled trace of the home WiFi ambient energy in Fig. 7, the harvested power ranges from 0 to 1000 μW. It is divided into four energy levels: level 1 is 0 to 200 μW, level 2 is 200 to 400 μW, level 3 is 400 to 600 μW, and level 4 is above 600 μW. The three logic mapping modes between the binary neural network and the hardware platform have the following costs:
the first logic requires a power of $150P_{xor}$ for the first-layer convolution, completing with delay $T_{xor}$, and $2400P_{xor}$ for the second-layer convolution, completing with delay $T_{xor}$;
the second logic requires a maximum power of $150P_{and}$ for the first-layer convolution, completing with delay $T_{and}$, and $2400P_{and}$ for the second-layer convolution, completing with delay $T_{and}$;
the third logic requires a maximum power of $150P_{or}$ for the first-layer convolution, completing with delay $T_{or}$, and $2400P_{or}$ for the second-layer convolution, completing with delay $T_{or}$.
In the online simulation part, the logic combination is determined through the following steps:
Step one: traverse the energy trace table, determine the energy level of the current energy, select a suitable logic combination from the offline decision table, and execute the binary neural network convolution, so as to cope with the energy fluctuations of a self-powered scenario;
Step two: when the energy level is low, the logic combination with low power consumption is executed so that scarce energy is not wasted; when the energy level is higher, a logic combination with lower delay can be executed; when the energy is higher still, parallel execution can be selected so that as much of the energy as possible is used;
Step three: once the trace table has been fully traversed, the energy efficiency and throughput of the architecture are calculated.
In the first energy sampling period, the system first determines which layer's convolution was completed last. It then obtains the sampled power of 50 μW and judges that it belongs to energy level 1. According to the decision table, execution cannot continue at this level, so the data are backed up.
In the second sampling period, the system determines from the last completed convolution that the first-layer convolution should be executed next. It obtains the sampled power of 820 μW and judges that it belongs to energy level 4. The decision table for the first-layer convolution indicates that execution can continue at this level, so the corresponding logic is selected to execute the first-layer convolution of the LeNet network. If the sampling period has not yet elapsed after the first layer finishes, the second-layer convolution is executed with the combination logic that the second-layer decision table assigns to this energy level. This process repeats until the next sampling period begins.
In the third sampling period, the system determines that the second-layer convolution should be executed next. It obtains the sampled power of 360 μW and judges that it belongs to energy level 2. The corresponding logic is selected from the second-layer decision table to execute the second-layer convolution of the LeNet network. If the sampling period has not yet elapsed after the second layer finishes, the first-layer convolution is executed with the combination logic that the first-layer decision table assigns to this energy level. This process repeats until the next sampling period begins.
In the fourth sampling period, the system determines that the first-layer convolution should be executed next. It obtains the sampled power of 550 μW and judges that it belongs to energy level 3. The first-layer decision table indicates that execution can continue at this level, so the corresponding logic executes the first-layer convolution of the LeNet network. If the sampling period has not yet elapsed after it finishes, the second-layer convolution is executed with the combination logic that the second-layer decision table assigns to this level. This process repeats until the next sampling period begins.
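The LeNet worked example can be reproduced in a short Python sketch. The kernel shapes, energy-level boundaries and four sampled powers come from the text. The per-layer decision tables (logic name and delay per level) and the 10 μs period are assumptions. So is the rule that a layer runs only if it can finish inside the current period. Backing up data is represented simply by stopping. Progress is tracked at whole-layer granularity, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Kernel shapes from the text: (output channels, kernel height, kernel width, input channels)
LENET_KERNELS = {1: (6, 5, 5, 1), 2: (16, 5, 5, 6)}

# Hypothetical per-layer decision tables: layer -> level -> (logic name, delay_us) or None.
Tables = Dict[int, Dict[int, Optional[Tuple[str, float]]]]
EXAMPLE_TABLES: Tables = {
    1: {1: None, 2: ("and", 2.0), 3: ("xor", 1.0), 4: ("xor", 1.0)},
    2: {1: None, 2: ("and", 4.0), 3: ("xor", 3.0), 4: ("xor", 3.0)},
}


def xor_ops(shape: Tuple[int, int, int, int]) -> int:
    """XOR operations for one computation of a binarized layer (one per kernel weight)."""
    out_c, kh, kw, in_c = shape
    return out_c * kh * kw * in_c


def energy_level(power_uw: float) -> int:
    """Levels 1 to 4 from the text; boundaries treated as half-open [low, high) (assumed)."""
    for level, upper in enumerate((200.0, 400.0, 600.0), start=1):
        if power_uw < upper:
            return level
    return 4


def run_period(power_uw: float, next_layer: int, tables: Tables,
               period_us: float) -> Tuple[int, List[int]]:
    """Execute layers 1 and 2 alternately until the sampling period would be exceeded."""
    level = energy_level(power_uw)
    executed: List[int] = []
    elapsed = 0.0
    while True:
        choice = tables[next_layer][level]
        if choice is None:
            break  # decision table forbids this level: back up data and wait
        _logic, delay = choice
        if delay <= 0:
            raise ValueError("delays must be positive")
        if elapsed + delay > period_us:
            break  # the next layer would not finish inside this period
        elapsed += delay
        executed.append(next_layer)
        next_layer = 2 if next_layer == 1 else 1
    return next_layer, executed


def simulate_trace(trace_uw: List[float], tables: Tables, period_us: float,
                   next_layer: int = 1) -> Tuple[List[List[int]], int]:
    """Run every sampling period in order; return the layers run per period and the next layer."""
    history: List[List[int]] = []
    for power in trace_uw:
        next_layer, executed = run_period(power, next_layer, tables, period_us)
        history.append(executed)
    return history, next_layer
```

`xor_ops` gives 150 and 2400 for the two layers. With the example tables and a 10 μs period, the trace 50, 820, 360, 550 μW executes `[]`, `[1, 2, 1, 2, 1]`, `[2, 1, 2]` and `[1, 2, 1, 2, 1]`. The second period therefore ends with the second layer due next and the third period ends with the first layer due next, matching the order described for the third and fourth periods.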
An application method based on the self-powered-system-oriented storage and calculation integrated processing architecture of claim 1, comprising the following steps:
Step 1: design a reconfigurable, STT-MRAM-based in-memory processing architecture for self-powered scenarios that supports efficient execution of a binary neural network;
Step 2: because binary convolution can be realized by different logic combinations, map the binary neural network convolution onto the hardware platform;
Step 3: use the adaptive software optimization method to follow energy fluctuations so that the harvested energy is used as fully as possible.
In the above scheme, an in-memory computing architecture is adopted to realize efficient binary neural network computation. The accelerator module uses an in-memory processing platform based on spin-transfer torque magnetic random access memory (STT-MRAM). Fig. 4 shows the principle by which the platform implements AND, OR, NOT and XOR logic. Each 1T1MTJ cell can execute a single logic operation, and a control signal C switches it between AND, OR, NOT and XOR. The STT-MRAM array of the hardware platform is composed of many 1T1MTJ cells and is reconfigurable: by configuring the array, multiple 1T1MTJ cells can be combined to realize more complex logic operations.
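The reconfigurable cell can be illustrated with a purely behavioral Python sketch. One function stands for a 1T1MTJ cell whose operation is selected by the control signal C. A column is reconfigured simply by assigning a different control value to each cell. The `Control` encoding and the function names are hypothetical. The sketch models only the logic truth tables, not MTJ resistance states, sensing, or the electrical form of C.

```python
from enum import Enum
from typing import List, Tuple


class Control(Enum):
    """Hypothetical encoding of control signal C; the real encoding is not given."""
    AND = "and"
    OR = "or"
    NOT = "not"
    XOR = "xor"


def cell(c: Control, a: int, b: int = 0) -> int:
    """Behavioral model of one 1T1MTJ cell performing the logic selected by C.

    Inputs and output are single bits; NOT ignores b.
    """
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("inputs must be bits")
    if c is Control.AND:
        return a & b
    if c is Control.OR:
        return a | b
    if c is Control.NOT:
        return 1 - a
    return a ^ b  # Control.XOR


def run_column(controls: List[Control], pairs: List[Tuple[int, int]]) -> List[int]:
    """Reconfigure a column: cell i applies controls[i] to pairs[i]."""
    if len(controls) != len(pairs):
        raise ValueError("one control value is needed per cell")
    return [cell(c, a, b) for c, (a, b) in zip(controls, pairs)]
```

The same cell yields different truth tables depending only on the control value, which is the property the offline decision table exploits when it switches logic combinations.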
In a preferred embodiment, in step 2, the binary neural network is mapped to the hardware platform through the following steps:
Step 2.1: exploiting the reconfigurability of the hardware architecture, carry out the multiply-accumulate operations of the binary neural network using XOR, XNOR, or combinational logic built from AND, OR and NOT;
Step 2.2: obtain the different modes in which the binary neural network computation can be mapped onto the hardware architecture;
Step 2.3: associate each mapping mode with its system power consumption, which provides the basis for offline modeling in the adaptive software optimization method.
In the above scheme, the convolution of a binary neural network can be implemented with XOR, and each 1T1MTJ cell of the accelerator module supports XOR, AND, OR and NOT logic. The convolution can therefore be mapped in several ways, and Fig. 5 shows three mapping modes. The first maps the computation directly onto the first column, where every cell performs XOR. The second uses AND, OR and NOT logic, with columns 2 to 6 forming the combinational logic that implements XOR. The third likewise uses AND, OR and NOT logic, with columns N-7 to N forming the combinational logic that implements XOR.
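The difference between the first mapping mode and the other two can be illustrated in Python. Mode 1 uses a cell configured for XOR directly, while the other modes must build XOR from AND, OR and NOT. The decomposition used here, (a OR b) AND NOT(a AND b), is a standard identity chosen for illustration. The patent's column assignments (columns 2 to 6, or N-7 to N) may rely on a different decomposition, and the function names are hypothetical.

```python
from typing import Callable, List


def and_(a: int, b: int) -> int:
    return a & b


def or_(a: int, b: int) -> int:
    return a | b


def not_(a: int) -> int:
    return 1 - a


def xor_native(a: int, b: int) -> int:
    """Mapping mode 1: a single cell configured for XOR."""
    return a ^ b


def xor_composed(a: int, b: int) -> int:
    """XOR built only from AND, OR and NOT: (a OR b) AND NOT(a AND b).

    This textbook decomposition uses four gate evaluations.
    """
    return and_(or_(a, b), not_(and_(a, b)))


def mismatches(x: List[int], w: List[int], xor: Callable[[int, int], int]) -> int:
    """Number of differing bit positions, the quantity a binary convolution accumulates."""
    return sum(xor(a, b) for a, b in zip(x, w))
```

Both versions give the same mismatch count for a binary convolution window. What changes between mapping modes is the number of cells involved, and therefore the power and delay recorded for each mode.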
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications listed in the specification and embodiments. It can be applied to the various fields for which it is suited, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details and illustrations shown and described herein.