CN111860778A - Full-additive convolution method and device - Google Patents

Full-additive convolution method and device

Info

Publication number
CN111860778A
Authority
CN
China
Prior art keywords
pulse data
weight
input image
address
determining
Prior art date
Legal status
Pending
Application number
CN202010653816.XA
Other languages
Chinese (zh)
Inventor
王红伟
吴臻志
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202010653816.XA
Publication of CN111860778A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a full-additive convolution method and device. Each target weight address for the accumulation operation is determined according to the pulse data of an input image; the weights corresponding to the target weight addresses are accumulated to obtain a full-additive convolution value, and the full-additive convolution value is determined as the feature vector of the input image, the pulse data of the input image being binary pulse data of 0s and 1s. By exploiting the binary nature of the pulse data of the input image, the invention replaces multiply-add with accumulation, which reduces the amount and time of computation, the chip power consumption, and the chip area.

Description

Full-additive convolution method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a full-additive convolution method and device.
Background
In existing artificial intelligence chips, the convolution (multiply-add) operations of a neural network require high-precision input neuron activation values, and both a multiplier and an adder are used in the computation. This consumes computing and storage resources and increases the power consumption and area of the artificial intelligence chip.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a full-additive convolution method and apparatus, which utilize the binary characteristics of the pulse data of the input image, and use accumulation instead of multiply-add, thereby reducing the calculation amount and time, reducing the power consumption of the chip, and reducing the chip area.
The invention provides a full-additive convolution method, which comprises the following steps:
determining each target weight address for accumulation operation according to pulse data of an input image; and performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a characteristic vector of the input image, wherein the pulse data of the input image is pulse data of 0 and 1.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of the input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
As a further improvement of the present invention, the step of performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
As a further improvement of the present invention, the step of performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
When the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
As a further improvement of the present invention, the method further comprises: determining the membrane potential information at time step t according to the feature vector of the input image and the membrane potential information at time step t-1.
As a further improvement of the present invention, determining the membrane potential information at time step t based on the feature vector of the input image and the membrane potential information at time step t-1 comprises:
adding the feature vector of the input image and the membrane potential information at time step t-1 element-wise to obtain the membrane potential information at time step t.
The invention also provides a full-additive convolution device, which comprises:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
Wherein the pulse data of the input image are pulse data of 0 and 1.
As a further improvement of the present invention, the address generating module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, the address generating module is configured to generate each weight address corresponding to pulse data with a value of 1 according to pulse data of the input image; and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, the address generating module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and determining the weight address corresponding to each pulse data as the target weight address.
As a further improvement of the present invention, the apparatus further comprises:
a judging module for judging the pulse data of the input image and determining the pulse data with the value of 1,
and the address generation module is used for generating each weight address corresponding to the pulse data with the numerical value of 1.
As a further improvement of the present invention, the address generation module is further configured to determine pulse data of the input image, and determine pulse data with a value of 1.
As a further improvement of the present invention, the full-adding convolution module comprises a multiplexer and an accumulator;
traversing pulse data of the input image;
when the pulse data is 0, the multiplexer determines 0 as the weight to be accumulated and outputs the weight;
when the pulse data is 1, the multiplexer determines the weight corresponding to a reference weight address as the weight to be accumulated and outputs the weight, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the accumulator performs accumulation operation on each weight to be accumulated output by the multiplexer to obtain a full convolution value.
As a further improvement of the present invention, the full convolution module is an enable accumulator;
traversing pulse data of the input image;
when the pulse data is 0, enabling the enable accumulator to be 0, and not performing accumulation operation;
when the pulse data is 1, enabling the enable accumulator to be 1, and determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
And the enabling accumulator performs accumulation operation on each weight to be accumulated to obtain a full convolution value.
The present invention also provides an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and wherein the one or more computer instructions are executed by the processor to implement the full-additive convolution method.
The invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the full-additive convolution method.
The invention has the beneficial effects that: the binary characteristic of pulse data of an input image is utilized, and accumulation is adopted to replace multiply-add, so that the calculation amount and the calculation time are reduced, the power consumption of a chip is reduced, and the area of the chip is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without undue inventive faculty.
Fig. 1 is a schematic flow chart of a full convolution method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a diagram of a prior art multiply-add convolution device;
FIG. 3 is a schematic diagram of a full convolution device according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a full convolution device according to yet another exemplary embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a zero-skip operation according to an exemplary embodiment of the disclosure;
FIG. 6 is a calculation example of a zero-skip operation according to an exemplary embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a pulse full-additive convolution layer according to an exemplary embodiment of the present disclosure;
fig. 8 is a schematic diagram of a pulse full-additive convolution network according to an exemplary embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that, if directional indications (such as up, down, left, right, front, and back) are involved in the disclosed embodiments, the directional indications are only used to explain the relative positional relationship, motion, and the like between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
In addition, in the description of the present disclosure, the terms used are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The terms "comprises" and/or "comprising" are used to specify the presence of elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements, not necessarily order, and not necessarily limit the elements. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified. These terms are only used to distinguish one element from another. These and/or other aspects will become apparent to those of ordinary skill in the art in view of the following drawings, and the description of the embodiments of the present disclosure will be more readily understood by those of ordinary skill in the art. The drawings are used for the purpose of illustrating embodiments of the disclosure only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated in the present disclosure may be employed without departing from the principles of the present disclosure.
In the full convolution method according to the embodiment of the present disclosure, as shown in fig. 1, each target weight address for performing an accumulation operation is determined according to pulse data of an input image;
performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
As shown in fig. 2, in the prior art, an artificial intelligence chip based on a spiking neural network uses a multiplier and an adder during calculation: an address generation module (AGU) generates each weight address, an input data buffer module (IN-Buffer) provides the input data corresponding to each weight address, the multiplier multiplies the input X element-wise with the weight W, and each product is added to the result obtained for the previous address, finally yielding the convolution value. However, the input of the spiking neural network is binary (0 indicates no pulse, 1 indicates a pulse), that is, the data output by the IN-Buffer takes only the two values 0 and 1, so the multiplication is a redundant operation in the convolution process, which wastes computing resources and increases the power consumption and area of the chip.
The method utilizes the binary nature of the pulse data of the input image and replaces the multiply-add operation of the prior art with an accumulation-only (full-additive) operation. Because the pulse data of the input image is a binary input, each target weight address that requires an accumulation operation can be determined from the pulse data, and the weights corresponding to those target weight addresses are accumulated. For the same convolution task, computing with accumulation alone takes far less time than the conventional multiply-add operation, saving roughly half the time and reducing chip power consumption. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
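To make the contrast concrete, the following is a minimal Python sketch (not taken from the patent; the function names are hypothetical) comparing a conventional multiply-add dot product with the accumulate-only version that binary pulse data permits. Both return the same result, but the second never multiplies.

```python
def multiply_add_conv(x, w):
    """Conventional dot product: one multiply and one add per element."""
    s = 0.0
    for xi, wi in zip(x, w):
        s += xi * wi          # requires a multiplier and an adder
    return s

def full_additive_conv(x, w):
    """Accumulate-only version: binary x makes the multiplication redundant."""
    s = 0.0
    for xi, wi in zip(x, w):
        if xi == 1:           # pulse present: accumulate the weight
            s += wi           # pulse absent: nothing to add
    return s

x = [0, 1, 1, 0, 1]                    # pulse data (0/1 only)
w = [0.2, -0.5, 0.7, 0.1, 0.3]         # convolution weights
assert multiply_add_conv(x, w) == full_additive_conv(x, w)
```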
The disclosed method can be applied to a spiking neural network, a fused network of a spiking neural network and an artificial neural network, and the like; the present disclosure places no particular limitation on the network used.
In an alternative embodiment, determining each target weight address for performing an accumulation operation according to the pulse data of the input image includes:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an alternative embodiment, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an alternative embodiment, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
The method disclosed by the present disclosure may, for example, generate the weight addresses corresponding to all the pulse data, and determine the weight address corresponding to the pulse data with a value of 1 as each target weight address for performing the accumulation operation. For example, it is also possible to directly generate weight addresses corresponding to all pulse data having a value of 1, and determine these weight addresses as the respective target weight addresses for performing the accumulation operation. For example, weight addresses corresponding to all pulse data may be generated, and the weight addresses may be determined as target weight addresses for performing the accumulation operation.
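The three alternatives can be summarized in the short Python sketch below, which uses the flat index of each pulse datum as a stand-in for its weight address; the function names and the index-as-address convention are assumptions made only for illustration.

```python
def addresses_all_then_filter(x):
    """Variant 1: generate an address for every pulse datum, then keep only the
    addresses whose pulse datum equals 1 as target weight addresses."""
    all_addresses = list(range(len(x)))
    return [a for a in all_addresses if x[a] == 1]

def addresses_only_ones(x):
    """Variant 2: directly generate addresses only for pulse data equal to 1."""
    return [i for i, xi in enumerate(x) if xi == 1]

def addresses_all(x):
    """Variant 3: every address is a target address; the zeros are handled
    later, e.g. by a multiplexer that outputs 0 for them."""
    return list(range(len(x)))

x = [0, 1, 0, 1, 1]
print(addresses_all_then_filter(x))   # [1, 3, 4]
print(addresses_only_ones(x))         # [1, 3, 4]
print(addresses_all(x))               # [0, 1, 2, 3, 4]
```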
In an optional implementation manner, performing an accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, 0 is determined as the weight to be accumulated and output, S = S + 0, i = i + 1 is executed, and the next pulse datum is processed;
if X_i = 1, W_i is determined as the weight to be accumulated and placed in the accumulator, S = S + W_i, i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In an optional implementation manner, performing an accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the corresponding weight W_i is not accumulated; i = i + 1 is executed directly, and the next pulse datum is processed;
if X_i = 1, the corresponding weight W_i is determined as the weight to be accumulated and placed in the enable accumulator, S = S + W_i, i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
This embodiment may be understood as using a zero-skip operation in the calculation: when a pulse datum of the input image has the value 0, the weight address corresponding to that 0 value is skipped and the next pulse datum is processed directly. The weight W is a dense matrix and the input X is a sparse matrix; because of the sparsity of X, only a few rows of W need to be accumulated, so the valid data of W can be indexed through the weight addresses, and the weight address index can be computed by using the value 1 in the pulse data X of the input image as a flag, realizing skip-zero direct calculation. Using the zero-skip operation in the convolution process further improves computational efficiency and reduces power consumption.
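The sketch below illustrates this zero-skip accumulation, under the assumption (made only for illustration) that the weights are stored with one row per pulse position: the 1-valued pulse data act as index flags, and rows of W at skipped addresses are never read.

```python
import numpy as np

def zero_skip_accumulate(x, W):
    """x: 1-D array of 0/1 pulse data; W: dense weight matrix, one row per pulse position."""
    s = np.zeros(W.shape[1])
    for address in np.flatnonzero(x):   # weight addresses flagged by the 1s in x
        s += W[address]                 # accumulate; addresses where x == 0 are skipped
    return s

x = np.array([0, 1, 0, 0, 1])
W = np.random.randn(5, 3)
assert np.allclose(zero_skip_accumulate(x, W), x @ W)   # matches the dense multiply-add result
```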
In an alternative embodiment, the membrane potential information U_t at time step t is determined from the feature vector Add_Conv(X, W) of the input image and the membrane potential information U_{t-1} at time step t-1.
In an alternative embodiment, determining the membrane potential information at time step t from the feature vector Add_Conv(X, W) of the input image and the membrane potential information at time step t-1 comprises:
adding the feature vector Add_Conv(X, W) of the input image and the membrane potential information U_{t-1} at time step t-1 element-wise to obtain the membrane potential information U_t at time step t, where U_t = Add_Conv(X, W) + U_{t-1}.
The full convolution device of the embodiment of the present disclosure includes:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
As described above, in the prior art, redundant multiplication operations are performed during convolution, which wastes computing resources and increases the power consumption and area of the chip. The full-additive convolution device of the present disclosure utilizes the binary nature of the pulse data of the input image and replaces the multiply-add operation of the prior art with an accumulation-only (full-additive) operation. Because the pulse data of the input image is a binary input, each target weight address that requires an accumulation operation is determined from the pulse data, and the weights corresponding to those target weight addresses are accumulated. For the same convolution task, computing with accumulation alone takes far less time than the conventional multiply-add operation, saving roughly half the time and reducing chip power consumption. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
The disclosed device can be applied to a spiking neural network, a fused network of a spiking neural network and an artificial neural network, and the like; the present disclosure places no particular limitation on the network used.
In an optional implementation manner, the address generation module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an optional implementation manner, the address generation module is configured to generate, according to pulse data of the input image, each weight address corresponding to pulse data having a value of 1; and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an optional implementation manner, the address generation module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and determining the weight address corresponding to each pulse data as the target weight address.
The address generation module of the present disclosure may, for example, generate the weight addresses corresponding to all the pulse data first, and then determine the weight address corresponding to the pulse data with a value of 1 as each target weight address for performing the accumulation operation. For example, it is also possible to directly generate weight addresses corresponding to all pulse data having a value of 1, and determine these weight addresses as the respective target weight addresses for performing the accumulation operation. For example, weight addresses corresponding to all pulse data may be generated, and the weight addresses may be determined as target weight addresses for performing the accumulation operation.
In an alternative embodiment, the apparatus further comprises:
a judging module for judging the pulse data of the input image and determining the pulse data with the value of 1,
and the address generation module is used for generating each weight address corresponding to the pulse data with the numerical value of 1.
In an optional implementation manner, the address generation module is further configured to determine pulse data of the input image, and determine pulse data with a value of 1.
According to the present disclosure, the judgment module can be set independently, and the function of the judgment module can also be realized through the address generation module.
In an alternative embodiment, the full convolution module includes a multiplexer and an accumulator;
traversing pulse data of the input image;
when the pulse data is 0, the multiplexer determines 0 as the weight to be accumulated and outputs the weight;
when the pulse data is 1, the multiplexer determines the weight corresponding to a reference weight address as the weight to be accumulated and outputs the weight, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the accumulator performs accumulation operation on each weight to be accumulated output by the multiplexer to obtain a full convolution value.
As shown in fig. 3, the address generation module is, for example, an AGU module responsible for determining each target weight address for the accumulation operation according to the pulse data of the input image; the input data buffer module is, for example, an IN-Buffer module responsible for inputting the pulse data of the input image; and W is the convolution weight.
For example, let X be the pulse data of the input image of the spiking neural network and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the multiplexer determines 0 as the weight to be accumulated and outputs 0, S = S + 0, i = i + 1 is executed, and the next pulse datum is processed;
if X_i = 1, the multiplexer determines W_i as the weight to be accumulated and places W_i in the accumulator, which performs the accumulation S = S + W_i; i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In the apparatus of this embodiment, the conventional multiplier is replaced with a multiplexer. When the pulse datum in the input data buffer module (IN-Buffer) corresponding to a target weight address generated by the address generation module (AGU) is 0, the output of the multiplexer (Sel) is 0; when the corresponding pulse datum is 1, the output of the multiplexer is the addressed weight in W. After the multiplexer selection, the accumulator adds the weight output by the multiplexer to the result computed for the previous target weight address, finally completing the accumulation operation. For the same convolution task, computing with the accumulation operation takes far less time than the conventional multiply-add operation, saving roughly half the time, reducing chip power consumption, and reducing chip area.
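A behavioural Python sketch of this multiplexer-plus-accumulator datapath (a software model for illustration, not the hardware itself):

```python
def mux(select_bit, weight):
    """Two-input multiplexer: outputs the addressed weight when the pulse bit is 1, else 0."""
    return weight if select_bit == 1 else 0.0

def mux_accumulator_conv(x, w):
    s = 0.0                        # accumulator register
    for i, xi in enumerate(x):     # sweep over the weight addresses produced by the AGU
        s += mux(xi, w[i])         # accumulate whatever the multiplexer outputs
    return s

print(mux_accumulator_conv([0, 1, 1], [0.5, -0.25, 0.75]))   # 0.5
```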
In an alternative embodiment, the full convolution module is an enable accumulator;
traversing pulse data of the input image;
when the pulse data is 0, enabling the enable accumulator to be 0, and not performing accumulation operation;
when the pulse data is 1, enabling the enable accumulator to be 1, and determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the enabling accumulator performs accumulation operation on each weight to be accumulated to obtain a full convolution value.
As shown in fig. 4, the address generation module is, for example, an AGU module responsible for determining each target weight address for the accumulation operation according to the pulse data of the input image; the input data buffer module is, for example, an IN-Buffer module responsible for inputting the pulse data of the input image; and W is the convolution weight.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the enable of the enable accumulator is 0, the corresponding weight W_i is not accumulated, i = i + 1 is executed directly, and the weight corresponding to the next pulse datum is determined;
if X_i = 1, the enable of the enable accumulator is 1, the corresponding weight W_i is determined as the weight to be accumulated and placed in the enable accumulator, which performs the accumulation S = S + W_i; i = i + 1 is executed, and the weight corresponding to the next pulse datum is determined;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In the full-additive convolution device of this embodiment, the multiplier and the adder are replaced with an enable accumulator. As shown in fig. 5, the enable accumulator uses a zero-skip operation in the calculation: when a pulse datum of the input image has the value 0, the weight address corresponding to that 0 value is skipped and the next pulse datum is calculated. The weight W is a dense matrix and the input X is a sparse matrix; because of the sparsity of X, only a few rows of W need to be accumulated, so the valid data of W can be indexed through the weight addresses, and the weight address index can be computed by using the value 1 in the pulse data X of the input image as a flag, realizing skip-zero direct calculation. By using the zero-skip operation in the convolution process, the full-additive convolution device further improves computational efficiency, reduces chip power consumption, and reduces chip area.
For example, as shown in fig. 6, suppose the target weight addresses generated by the AGU module correspond to the pulse data 0, 1, 0, …, 1 of the input image X in the IN-Buffer module, and the weights at those target weight addresses are W000, W001, W010, W011, …, W311, respectively. When a pulse datum is 0, the enable of the enable accumulator is 0 and the weight W at that address is not accumulated; when a pulse datum is 1, the enable of the enable accumulator is 1 and the weight at that weight address is accumulated. The resulting full-additive convolution value is W001 + W010 + … + W311.
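The sketch below mirrors this worked example with symbolic weight names and an arbitrary numeric instantiation; both the pulse pattern and the weight values are illustrative, not values taken from the patent.

```python
def enable_accumulator(pulses, weights):
    s = 0.0
    for enable, w in zip(pulses, weights):
        if enable:          # enable = 1: accumulate the addressed weight
            s += w          # enable = 0: hold the accumulator (skip)
    return s

names   = ["W000", "W001", "W010", "W011", "W311"]
pulses  = [0, 1, 1, 0, 1]
weights = [0.5, 0.25, -1.0, 2.0, 0.125]
print(" + ".join(n for n, e in zip(names, pulses) if e))   # W001 + W010 + W311
print(enable_accumulator(pulses, weights))                 # 0.25 + (-1.0) + 0.125 = -0.625
```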
The full-additive convolution device of the embodiment of the present disclosure obtains the full-additive convolution value Add_Conv(X, W), i.e. the feature vector of the input image, from the pulse data of the input image, where X denotes the sparse input matrix and W denotes the weights;
the membrane potential module adds the membrane potential value U_{t-1} of time step t-1 and the feature vector Add_Conv(X, W) of the input image element-wise to obtain the membrane potential value U_t of time step t, U_t = Add_Conv(X, W) + U_{t-1};
and the issuing module determines the output value of time step t from the membrane potential value of time step t to serve as the time-series convolution vector, and updates the membrane potential module with the time step update enable.
The full convolution device may adopt, for example, the address generation module, the input data buffer module, and the full convolution module as described in the foregoing embodiments. The full convolution module may use, for example, the multiplexer and accumulator described in the foregoing embodiments, or may use the enable accumulator described in the foregoing embodiments, which will not be described in detail herein.
For example, as shown in fig. 7, the full-additive zero-skip convolution module in the figure may be understood as an implementation in which the full-additive convolution module uses the zero-skip operation, i.e. the pulse full-additive convolution layer uses the enable accumulator, since the zero-skip operation is used in the convolution process. The pulse data of the input image passes through the full-additive zero-skip convolution module to obtain a full-additive convolution value; this value and the membrane potential information of the previous time step (time step t-1) held in the membrane potential module are added element-wise to obtain the membrane potential information of the current time step (time step t); the issuing module processes the membrane potential information of the current time step to obtain an output vector as the time-series convolution vector, and updates the membrane potential module with the time step update enable signal. The pulse full-additive convolution layer is a convolution processing that carries time-series information, and the time-series convolution vector includes a time dimension, so in specific applications the pulse full-additive convolution layer can connect time-series information between feature maps; at the same time, the zero-skip operation reduces the amount of calculation, saves calculation time, reduces chip power consumption, and reduces chip area.
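One time step of this layer could be modelled as in the sketch below; the threshold-and-reset firing rule is an assumption introduced only to make the example runnable, since the exact dynamics equation is not fixed by this disclosure.

```python
import numpy as np

def pulse_full_additive_step(x, W, u_prev, threshold=1.0):
    add_conv = W[np.flatnonzero(x)].sum(axis=0)    # full-additive (zero-skip) convolution value
    u_t = add_conv + u_prev                        # U_t = Add_Conv(X, W) + U_{t-1}
    spikes = (u_t >= threshold).astype(np.int8)    # issuing module output (0/1 pulses)
    u_next = np.where(spikes == 1, 0.0, u_t)       # membrane potential update (assumed reset on firing)
    return spikes, u_next

x = np.array([0, 1, 1, 0], dtype=np.int8)          # pulse data at time step t
W = np.random.randn(4, 3)                          # convolution weights (one row per pulse position)
u = np.zeros(3)                                    # initial membrane potential
out_pulses, u = pulse_full_additive_step(x, W, u)
```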
The pulse full convolution network comprises at least one pulse full convolution layer, and pulse data of an input image are subjected to at least one time sequence convolution processing through the at least one pulse full convolution layer to obtain a time sequence convolution vector. The pulsed full convolution layer is as described in the previous embodiments and will not be described in detail here.
In an alternative embodiment, performing at least one time-series convolution process on pulse data of an input image by using at least one pulse full convolution layer to obtain a time-series convolution vector includes:
performing first time sequence convolution processing on pulse data of an input image through a first pulse full convolution layer to obtain a first vector;
when time sequence convolution processing is carried out for one time, determining a first vector as a time sequence convolution vector;
and when the time sequence convolution processing is performed for n times, pooling the first vector to obtain a first intermediate vector, performing second time sequence convolution processing on the first intermediate vector through a second pulse full convolution layer to obtain a second vector, pooling the second vector to obtain a second intermediate vector, and repeating the steps to determine the nth vector as the time sequence convolution vector, wherein n is an integer greater than 1.
The process of performing the time-series convolution on the pulse data of the input image is as described in the foregoing embodiments, and will not be described in detail here.
For example, as shown in fig. 8, the pulse full convolution network includes two pulse full convolution layers, a first time sequence convolution processing is performed on pulse data of an input image by a first pulse full convolution layer to obtain a first vector, pooling processing is performed on the first vector to obtain a first intermediate vector, a second time sequence convolution processing is performed on the first intermediate vector by a second pulse full convolution layer to obtain a second vector, and the second vector is determined as a time sequence convolution vector and is used as an input of a next layer.
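A compact sketch of this two-layer arrangement is given below; the layer sizes, the binarization rule, and the use of max pooling (chosen because it keeps the values binary) are all assumptions made for illustration, not requirements of the patent.

```python
import numpy as np

def full_additive_layer(x, W):
    """x: 0/1 pulse vector; W: weight matrix with one row per pulse position."""
    return W[np.flatnonzero(x)].sum(axis=0)

def binarize(v, threshold=0.0):
    """Stand-in for the spiking dynamics that turn a layer output back into 0/1 pulses."""
    return (v > threshold).astype(np.int8)

def max_pool(v, k=2):
    return v.reshape(-1, k).max(axis=1)

x  = np.random.randint(0, 2, size=16).astype(np.int8)        # binary pulse input
W1 = np.random.randn(16, 8)                                   # first-layer weights
W2 = np.random.randn(4, 4)                                    # second-layer weights

first_vector       = binarize(full_additive_layer(x, W1))     # first time-series convolution
first_intermediate = max_pool(first_vector)                   # pooled, still 0/1
second_vector      = full_additive_layer(first_intermediate, W2)  # time-series convolution vector
```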
The artificial intelligence chip of the embodiment of the present disclosure comprises the pulse full-additive convolution device described in the foregoing embodiments, which will not be described in detail here. In the calculation process of the chip, the redundant multiplication operations of conventional convolution are removed, so the amount of calculation is greatly reduced and the chip power consumption is reduced. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
The disclosure also relates to an electronic device comprising a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the full convolution method in the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the full convolution method, by running non-volatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the full convolution method in any of the method embodiments described above.
The product can execute the full convolution method provided by the embodiment of the application, has corresponding functional modules and beneficial effects of the execution method, and can refer to the full convolution method provided by the embodiment of the application without detailed technical details in the embodiment.
The present disclosure also relates to a computer-readable storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described embodiments of the full-additive convolution method.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Moreover, those of ordinary skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the disclosure and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present disclosure has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiment disclosed, but that the disclosure will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A full-additive convolution method, the method comprising:
determining each target weight address for accumulation operation according to pulse data of an input image;
and performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a characteristic vector of the input image, wherein the pulse data of the input image is pulse data of 0 and 1.
2. The method of claim 1, wherein determining each target weight address for an accumulation operation based on the pulse data of the input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
3. The method of claim 1, wherein determining each target weight address for performing an accumulation operation based on impulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
4. The method of claim 1, determining each target weight address for performing an accumulation operation based on pulse data of an input image, comprising:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
5. The method of claim 4, wherein performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full-additive convolution value comprises:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
6. The method of claim 4, wherein performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full-additive convolution value comprises:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
When the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
7. The method of any of claims 1-6, further comprising: and determining the membrane potential information of the t time step according to the characteristic vector of the input image and the membrane potential information of the t-1 time step.
8. A full-additive convolution device, the device comprising:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the full-additive convolution method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which computer program is executable by a processor for implementing the full-additive convolution method according to any one of claims 1 to 7.
Application CN202010653816.XA, priority date 2020-07-08, filing date 2020-07-08: Full-additive convolution method and device (CN111860778A, pending)

Priority Applications (1)

Application number: CN202010653816.XA (CN111860778A)
Priority date: 2020-07-08
Filing date: 2020-07-08
Title: Full-additive convolution method and device


Publications (1)

Publication number: CN111860778A
Publication date: 2020-10-30

Family

ID=73152550

Family Applications (1)

Application number: CN202010653816.XA (CN111860778A, pending)
Title: Full-additive convolution method and device

Country Status (1)

CN: CN111860778A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240102A (en) * 2021-05-24 2021-08-10 北京灵汐科技有限公司 Membrane potential updating method of neuron, brain-like neuron device and processing core
CN113240102B (en) * 2021-05-24 2023-11-10 北京灵汐科技有限公司 Membrane potential updating method of neuron, brain-like neuron device and processing core


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination