CN111860778A - Full-additive convolution method and device - Google Patents

Full-additive convolution method and device

Info

Publication number
CN111860778A
Authority
CN
China
Prior art keywords
pulse data
weight
input image
address
determining
Prior art date
Legal status
Pending
Application number
CN202010653816.XA
Other languages
Chinese (zh)
Inventor
王红伟
吴臻志
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202010653816.XA
Publication of CN111860778A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a full-additive convolution method and device. Each target weight address for the accumulation operation is determined according to the pulse data of an input image; the weights corresponding to the target weight addresses are accumulated to obtain a full-additive convolution value, and the full-additive convolution value is determined as the feature vector of the input image, the pulse data of the input image being binary pulse data of 0s and 1s. By exploiting the binary nature of the pulse data of the input image, the invention replaces multiply-add with accumulation, which reduces the amount and time of computation, the chip power consumption, and the chip area.

Description

Full-additive convolution method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a full-additive convolution method and device.
Background
In existing artificial intelligence chips, the convolution (multiply-add) operations of a neural network require high-precision input neuron activation values, and both a multiplier and an adder are used in the computation. This consumes computing and storage resources and increases the power consumption and area of the artificial intelligence chip.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide a full-additive convolution method and apparatus, which utilize the binary characteristics of the pulse data of the input image, and use accumulation instead of multiply-add, thereby reducing the calculation amount and time, reducing the power consumption of the chip, and reducing the chip area.
The invention provides a full-additive convolution method, which comprises the following steps:
determining each target weight address for accumulation operation according to pulse data of an input image; and performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a characteristic vector of the input image, wherein the pulse data of the input image is pulse data of 0 and 1.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of the input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
As a further improvement of the present invention, the step of performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
As a further improvement of the present invention, the step of performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
When the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
As a further improvement of the present invention, the method further comprises: determining the membrane potential information at time step t according to the feature vector of the input image and the membrane potential information at time step t-1.
As a further improvement of the present invention, determining the membrane potential information at time step t based on the feature vector of the input image and the membrane potential information at time step t-1 comprises:
adding the feature vector of the input image and the membrane potential information at time step t-1 element-wise to obtain the membrane potential information at time step t.
The invention also provides a full-additive convolution device, which comprises:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
Wherein the pulse data of the input image are pulse data of 0 and 1.
As a further improvement of the present invention, the address generating module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, the address generating module is configured to generate each weight address corresponding to pulse data with a value of 1 according to pulse data of the input image; and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
As a further improvement of the present invention, the address generating module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and determining the weight address corresponding to each pulse data as the target weight address.
As a further improvement of the present invention, the apparatus further comprises:
a judging module for judging the pulse data of the input image and determining the pulse data with the value of 1,
and the address generation module is used for generating each weight address corresponding to the pulse data with the numerical value of 1.
As a further improvement of the present invention, the address generation module is further configured to determine pulse data of the input image, and determine pulse data with a value of 1.
As a further improvement of the present invention, the full-adding convolution module comprises a multiplexer and an accumulator;
traversing pulse data of the input image;
when the pulse data is 0, the multiplexer determines 0 as the weight to be accumulated and outputs the weight;
when the pulse data is 1, the multiplexer determines the weight corresponding to a reference weight address as the weight to be accumulated and outputs the weight, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the accumulator performs accumulation operation on each weight to be accumulated output by the multiplexer to obtain a full convolution value.
As a further improvement of the present invention, the full convolution module is an enable accumulator;
traversing pulse data of the input image;
when the pulse data is 0, enabling the enable accumulator to be 0, and not performing accumulation operation;
when the pulse data is 1, enabling the enable accumulator to be 1, and determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
And the enabling accumulator performs accumulation operation on each weight to be accumulated to obtain a full convolution value.
The present invention also provides an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and wherein the one or more computer instructions are executed by the processor to implement the full-additive convolution method.
The invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the full-additive convolution method.
The invention has the beneficial effects that: the binary characteristic of pulse data of an input image is utilized, and accumulation is adopted to replace multiply-add, so that the calculation amount and the calculation time are reduced, the power consumption of a chip is reduced, and the area of the chip is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without undue inventive faculty.
Fig. 1 is a schematic flow chart of a full convolution method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a diagram of a prior art multiply-add convolution device;
FIG. 3 is a schematic diagram of a full convolution device according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a full convolution device according to yet another exemplary embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a zero-skip operation according to an exemplary embodiment of the disclosure;
FIG. 6 is a calculation example of a zero-skip operation according to an exemplary embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a pulse full-additive convolution layer according to an exemplary embodiment of the present disclosure;
fig. 8 is a schematic diagram of a pulse full-additive convolution network according to an exemplary embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that, if directional indications (such as up, down, left, right, front, and back) are involved in the disclosed embodiments, the directional indications are only used to explain the relative positional relationship, motion, and the like between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
In addition, in the description of the present disclosure, the terms used are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The terms "comprises" and/or "comprising" are used to specify the presence of elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements, not necessarily order, and not necessarily limit the elements. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified. These terms are only used to distinguish one element from another. These and/or other aspects will become apparent to those of ordinary skill in the art in view of the following drawings, and the description of the embodiments of the present disclosure will be more readily understood by those of ordinary skill in the art. The drawings are used for the purpose of illustrating embodiments of the disclosure only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated in the present disclosure may be employed without departing from the principles of the present disclosure.
In the full convolution method according to the embodiment of the present disclosure, as shown in fig. 1, each target weight address for performing an accumulation operation is determined according to pulse data of an input image;
performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
As shown in fig. 2, in the prior art, an artificial intelligence chip based on a spiking neural network uses a multiplier and an adder during calculation: an address generation module (AGU) generates each weight address, an input data buffer module (IN-Buffer) provides the input data corresponding to each weight address, the multiplier multiplies the input X element-wise with the weight W, and each product is added to the result obtained for the previous address, finally yielding the convolution value. However, the input of the spiking neural network is binary (0 indicates no pulse, 1 indicates a pulse), that is, the data output by the IN-Buffer takes only the two values 0 and 1, so the multiplication is a redundant operation in the convolution process, which wastes computing resources and increases the power consumption and area of the chip.
The method utilizes the binary nature of the pulse data of the input image and replaces the multiply-add operation of the prior art with an accumulation-only (full-additive) operation. Because the pulse data of the input image is a binary input, each target weight address that requires an accumulation operation can be determined from the pulse data, and the weights corresponding to those target weight addresses are accumulated. For the same convolution task, computing with accumulation alone takes far less time than the conventional multiply-add operation, saving roughly half the time and reducing chip power consumption. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
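To make the contrast concrete, the following is a minimal Python sketch (not taken from the patent; the function names are hypothetical) comparing a conventional multiply-add dot product with the accumulate-only version that binary pulse data permits. Both return the same result, but the second never multiplies.

```python
def multiply_add_conv(x, w):
    """Conventional dot product: one multiply and one add per element."""
    s = 0.0
    for xi, wi in zip(x, w):
        s += xi * wi          # requires a multiplier and an adder
    return s

def full_additive_conv(x, w):
    """Accumulate-only version: binary x makes the multiplication redundant."""
    s = 0.0
    for xi, wi in zip(x, w):
        if xi == 1:           # pulse present: accumulate the weight
            s += wi           # pulse absent: nothing to add
    return s

x = [0, 1, 1, 0, 1]                    # pulse data (0/1 only)
w = [0.2, -0.5, 0.7, 0.1, 0.3]         # convolution weights
assert multiply_add_conv(x, w) == full_additive_conv(x, w)
```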
The disclosed method can be applied to a spiking neural network, a fused network of a spiking neural network and an artificial neural network, and the like; the present disclosure places no particular limitation on the network used.
In an alternative embodiment, determining each target weight address for performing an accumulation operation according to the pulse data of the input image includes:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an alternative embodiment, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an alternative embodiment, determining each target weight address for performing an accumulation operation based on pulse data of an input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
The method disclosed by the present disclosure may, for example, generate the weight addresses corresponding to all the pulse data, and determine the weight address corresponding to the pulse data with a value of 1 as each target weight address for performing the accumulation operation. For example, it is also possible to directly generate weight addresses corresponding to all pulse data having a value of 1, and determine these weight addresses as the respective target weight addresses for performing the accumulation operation. For example, weight addresses corresponding to all pulse data may be generated, and the weight addresses may be determined as target weight addresses for performing the accumulation operation.
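The three alternatives can be summarized in the short Python sketch below, which uses the flat index of each pulse datum as a stand-in for its weight address; the function names and the index-as-address convention are assumptions made only for illustration.

```python
def addresses_all_then_filter(x):
    """Variant 1: generate an address for every pulse datum, then keep only the
    addresses whose pulse datum equals 1 as target weight addresses."""
    all_addresses = list(range(len(x)))
    return [a for a in all_addresses if x[a] == 1]

def addresses_only_ones(x):
    """Variant 2: directly generate addresses only for pulse data equal to 1."""
    return [i for i, xi in enumerate(x) if xi == 1]

def addresses_all(x):
    """Variant 3: every address is a target address; the zeros are handled
    later, e.g. by a multiplexer that outputs 0 for them."""
    return list(range(len(x)))

x = [0, 1, 0, 1, 1]
print(addresses_all_then_filter(x))   # [1, 3, 4]
print(addresses_only_ones(x))         # [1, 3, 4]
print(addresses_all(x))               # [0, 1, 2, 3, 4]
```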
In an optional implementation manner, performing an accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, 0 is determined as the weight to be accumulated and output, S = S + 0, i = i + 1 is executed, and the next pulse datum is processed;
if X_i = 1, W_i is determined as the weight to be accumulated and placed in the accumulator, S = S + W_i, i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In an optional implementation manner, performing an accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value includes:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the corresponding weight W_i is not accumulated; i = i + 1 is executed directly, and the next pulse datum is processed;
if X_i = 1, the corresponding weight W_i is determined as the weight to be accumulated and placed in the enable accumulator, S = S + W_i, i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
This embodiment may be understood as using a zero-skip operation in the calculation: when a pulse datum of the input image has the value 0, the weight address corresponding to that 0 value is skipped and the next pulse datum is processed directly. The weight W is a dense matrix and the input X is a sparse matrix; because of the sparsity of X, only a few rows of W need to be accumulated, so the valid data of W can be indexed through the weight addresses, and the weight address index can be computed by using the value 1 in the pulse data X of the input image as a flag, realizing skip-zero direct calculation. Using the zero-skip operation in the convolution process further improves computational efficiency and reduces power consumption.
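The sketch below illustrates this zero-skip accumulation, under the assumption (made only for illustration) that the weights are stored with one row per pulse position: the 1-valued pulse data act as index flags, and rows of W at skipped addresses are never read.

```python
import numpy as np

def zero_skip_accumulate(x, W):
    """x: 1-D array of 0/1 pulse data; W: dense weight matrix, one row per pulse position."""
    s = np.zeros(W.shape[1])
    for address in np.flatnonzero(x):   # weight addresses flagged by the 1s in x
        s += W[address]                 # accumulate; addresses where x == 0 are skipped
    return s

x = np.array([0, 1, 0, 0, 1])
W = np.random.randn(5, 3)
assert np.allclose(zero_skip_accumulate(x, W), x @ W)   # matches the dense multiply-add result
```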
In an alternative embodiment, the membrane potential information U_t at time step t is determined from the feature vector Add_Conv(X, W) of the input image and the membrane potential information U_{t-1} at time step t-1.
In an alternative embodiment, determining the membrane potential information at time step t from the feature vector Add_Conv(X, W) of the input image and the membrane potential information at time step t-1 comprises:
adding the feature vector Add_Conv(X, W) of the input image and the membrane potential information U_{t-1} at time step t-1 element-wise to obtain the membrane potential information U_t at time step t, where U_t = Add_Conv(X, W) + U_{t-1}.
The full convolution device of the embodiment of the present disclosure includes:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
As described above, in the prior art, redundant multiplication operations are performed during convolution, which wastes computing resources and increases the power consumption and area of the chip. The full-additive convolution device of the present disclosure utilizes the binary nature of the pulse data of the input image and replaces the multiply-add operation of the prior art with an accumulation-only (full-additive) operation. Because the pulse data of the input image is a binary input, each target weight address that requires an accumulation operation is determined from the pulse data, and the weights corresponding to those target weight addresses are accumulated. For the same convolution task, computing with accumulation alone takes far less time than the conventional multiply-add operation, saving roughly half the time and reducing chip power consumption. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
The disclosed device can be applied to a spiking neural network, a fused network of a spiking neural network and an artificial neural network, and the like; the present disclosure places no particular limitation on the network used.
In an optional implementation manner, the address generation module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an optional implementation manner, the address generation module is configured to generate, according to pulse data of the input image, each weight address corresponding to pulse data having a value of 1; and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
In an optional implementation manner, the address generation module is configured to generate a weight address corresponding to each pulse data according to the pulse data of the input image; and determining the weight address corresponding to each pulse data as the target weight address.
The address generation module of the present disclosure may, for example, generate the weight addresses corresponding to all the pulse data first, and then determine the weight address corresponding to the pulse data with a value of 1 as each target weight address for performing the accumulation operation. For example, it is also possible to directly generate weight addresses corresponding to all pulse data having a value of 1, and determine these weight addresses as the respective target weight addresses for performing the accumulation operation. For example, weight addresses corresponding to all pulse data may be generated, and the weight addresses may be determined as target weight addresses for performing the accumulation operation.
In an alternative embodiment, the apparatus further comprises:
a judging module for judging the pulse data of the input image and determining the pulse data with the value of 1,
and the address generation module is used for generating each weight address corresponding to the pulse data with the numerical value of 1.
In an optional implementation manner, the address generation module is further configured to determine pulse data of the input image, and determine pulse data with a value of 1.
According to the present disclosure, the judgment module can be set independently, and the function of the judgment module can also be realized through the address generation module.
In an alternative embodiment, the full convolution module includes a multiplexer and an accumulator;
traversing pulse data of the input image;
when the pulse data is 0, the multiplexer determines 0 as the weight to be accumulated and outputs the weight;
when the pulse data is 1, the multiplexer determines the weight corresponding to a reference weight address as the weight to be accumulated and outputs the weight, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the accumulator performs accumulation operation on each weight to be accumulated output by the multiplexer to obtain a full convolution value.
As shown in fig. 3, the address generation module is, for example, an AGU module responsible for determining each target weight address for the accumulation operation according to the pulse data of the input image; the input data buffer module is, for example, an IN-Buffer module responsible for inputting the pulse data of the input image; and W is the convolution weight.
For example, let X be the pulse data of the input image of the spiking neural network and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the multiplexer determines 0 as the weight to be accumulated and outputs 0, S = S + 0, i = i + 1 is executed, and the next pulse datum is processed;
if X_i = 1, the multiplexer determines W_i as the weight to be accumulated and places W_i in the accumulator, which performs the accumulation S = S + W_i; i = i + 1 is executed, and the next pulse datum is processed;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In the apparatus of this embodiment, the conventional multiplier is replaced with a multiplexer. When the pulse datum in the input data buffer module (IN-Buffer) corresponding to a target weight address generated by the address generation module (AGU) is 0, the output of the multiplexer (Sel) is 0; when the corresponding pulse datum is 1, the output of the multiplexer is the addressed weight in W. After the multiplexer selection, the accumulator adds the weight output by the multiplexer to the result computed for the previous target weight address, finally completing the accumulation operation. For the same convolution task, computing with the accumulation operation takes far less time than the conventional multiply-add operation, saving roughly half the time, reducing chip power consumption, and reducing chip area.
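A behavioural Python sketch of this multiplexer-plus-accumulator datapath (a software model for illustration, not the hardware itself):

```python
def mux(select_bit, weight):
    """Two-input multiplexer: outputs the addressed weight when the pulse bit is 1, else 0."""
    return weight if select_bit == 1 else 0.0

def mux_accumulator_conv(x, w):
    s = 0.0                        # accumulator register
    for i, xi in enumerate(x):     # sweep over the weight addresses produced by the AGU
        s += mux(xi, w[i])         # accumulate whatever the multiplexer outputs
    return s

print(mux_accumulator_conv([0, 1, 1], [0.5, -0.25, 0.75]))   # 0.5
```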
In an alternative embodiment, the full convolution module is an enable accumulator;
traversing pulse data of the input image;
when the pulse data is 0, enabling the enable accumulator to be 0, and not performing accumulation operation;
when the pulse data is 1, enabling the enable accumulator to be 1, and determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and the enabling accumulator performs accumulation operation on each weight to be accumulated to obtain a full convolution value.
As shown in fig. 4, the address generation module is, for example, an AGU module responsible for determining each target weight address for the accumulation operation according to the pulse data of the input image; the input data buffer module is, for example, an IN-Buffer module responsible for inputting the pulse data of the input image; and W is the convolution weight.
For example, let X be the pulse data of the input image and W the convolution weights. The pulse data of the input image is traversed, and for each pulse datum X_i in X:
if X_i = 0, the enable of the enable accumulator is 0, the corresponding weight W_i is not accumulated, i = i + 1 is executed directly, and the weight corresponding to the next pulse datum is determined;
if X_i = 1, the enable of the enable accumulator is 1, the corresponding weight W_i is determined as the weight to be accumulated and placed in the enable accumulator, which performs the accumulation S = S + W_i; i = i + 1 is executed, and the weight corresponding to the next pulse datum is determined;
the output full-additive convolution value S serves as the feature vector of the input image; after passing through the dynamics equation of the spiking neural network, it generates a new binary input that can serve as the input of the next layer of the spiking neural network.
In the full-additive convolution device of this embodiment, the multiplier and the adder are replaced with an enable accumulator. As shown in fig. 5, the enable accumulator uses a zero-skip operation in the calculation: when a pulse datum of the input image has the value 0, the weight address corresponding to that 0 value is skipped and the next pulse datum is calculated. The weight W is a dense matrix and the input X is a sparse matrix; because of the sparsity of X, only a few rows of W need to be accumulated, so the valid data of W can be indexed through the weight addresses, and the weight address index can be computed by using the value 1 in the pulse data X of the input image as a flag, realizing skip-zero direct calculation. By using the zero-skip operation in the convolution process, the full-additive convolution device further improves computational efficiency, reduces chip power consumption, and reduces chip area.
For example, as shown in fig. 6, suppose the target weight addresses generated by the AGU module correspond to the pulse data 0, 1, 0, …, 1 of the input image X in the IN-Buffer module, and the weights at those target weight addresses are W000, W001, W010, W011, …, W311, respectively. When a pulse datum is 0, the enable of the enable accumulator is 0 and the weight W at that address is not accumulated; when a pulse datum is 1, the enable of the enable accumulator is 1 and the weight at that weight address is accumulated. The resulting full-additive convolution value is W001 + W010 + … + W311.
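The sketch below mirrors this worked example with symbolic weight names and an arbitrary numeric instantiation; both the pulse pattern and the weight values are illustrative, not values taken from the patent.

```python
def enable_accumulator(pulses, weights):
    s = 0.0
    for enable, w in zip(pulses, weights):
        if enable:          # enable = 1: accumulate the addressed weight
            s += w          # enable = 0: hold the accumulator (skip)
    return s

names   = ["W000", "W001", "W010", "W011", "W311"]
pulses  = [0, 1, 1, 0, 1]
weights = [0.5, 0.25, -1.0, 2.0, 0.125]
print(" + ".join(n for n, e in zip(names, pulses) if e))   # W001 + W010 + W311
print(enable_accumulator(pulses, weights))                 # 0.25 + (-1.0) + 0.125 = -0.625
```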
The full-additive convolution device of the embodiment of the present disclosure obtains the full-additive convolution value Add_Conv(X, W), i.e. the feature vector of the input image, from the pulse data of the input image, where X denotes the sparse input matrix and W denotes the weights;
the membrane potential module adds the membrane potential value U_{t-1} of time step t-1 and the feature vector Add_Conv(X, W) of the input image element-wise to obtain the membrane potential value U_t of time step t, U_t = Add_Conv(X, W) + U_{t-1};
and the issuing module determines the output value of time step t from the membrane potential value of time step t to serve as the time-series convolution vector, and updates the membrane potential module with the time step update enable.
The full convolution device may adopt, for example, the address generation module, the input data buffer module, and the full convolution module as described in the foregoing embodiments. The full convolution module may use, for example, the multiplexer and accumulator described in the foregoing embodiments, or may use the enable accumulator described in the foregoing embodiments, which will not be described in detail herein.
For example, as shown in fig. 7, the full-additive zero-skip convolution module in the figure may be understood as an implementation in which the full-additive convolution module uses the zero-skip operation, i.e. the pulse full-additive convolution layer uses the enable accumulator, since the zero-skip operation is used in the convolution process. The pulse data of the input image passes through the full-additive zero-skip convolution module to obtain a full-additive convolution value; this value and the membrane potential information of the previous time step (time step t-1) held in the membrane potential module are added element-wise to obtain the membrane potential information of the current time step (time step t); the issuing module processes the membrane potential information of the current time step to obtain an output vector as the time-series convolution vector, and updates the membrane potential module with the time step update enable signal. The pulse full-additive convolution layer is a convolution processing that carries time-series information, and the time-series convolution vector includes a time dimension, so in specific applications the pulse full-additive convolution layer can connect time-series information between feature maps; at the same time, the zero-skip operation reduces the amount of calculation, saves calculation time, reduces chip power consumption, and reduces chip area.
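One time step of this layer could be modelled as in the sketch below; the threshold-and-reset firing rule is an assumption introduced only to make the example runnable, since the exact dynamics equation is not fixed by this disclosure.

```python
import numpy as np

def pulse_full_additive_step(x, W, u_prev, threshold=1.0):
    add_conv = W[np.flatnonzero(x)].sum(axis=0)    # full-additive (zero-skip) convolution value
    u_t = add_conv + u_prev                        # U_t = Add_Conv(X, W) + U_{t-1}
    spikes = (u_t >= threshold).astype(np.int8)    # issuing module output (0/1 pulses)
    u_next = np.where(spikes == 1, 0.0, u_t)       # membrane potential update (assumed reset on firing)
    return spikes, u_next

x = np.array([0, 1, 1, 0], dtype=np.int8)          # pulse data at time step t
W = np.random.randn(4, 3)                          # convolution weights (one row per pulse position)
u = np.zeros(3)                                    # initial membrane potential
out_pulses, u = pulse_full_additive_step(x, W, u)
```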
The pulse full convolution network comprises at least one pulse full convolution layer, and pulse data of an input image are subjected to at least one time sequence convolution processing through the at least one pulse full convolution layer to obtain a time sequence convolution vector. The pulsed full convolution layer is as described in the previous embodiments and will not be described in detail here.
In an alternative embodiment, performing at least one time-series convolution process on pulse data of an input image by using at least one pulse full convolution layer to obtain a time-series convolution vector includes:
performing first time sequence convolution processing on pulse data of an input image through a first pulse full convolution layer to obtain a first vector;
when time sequence convolution processing is carried out for one time, determining a first vector as a time sequence convolution vector;
and when the time sequence convolution processing is performed for n times, pooling the first vector to obtain a first intermediate vector, performing second time sequence convolution processing on the first intermediate vector through a second pulse full convolution layer to obtain a second vector, pooling the second vector to obtain a second intermediate vector, and repeating the steps to determine the nth vector as the time sequence convolution vector, wherein n is an integer greater than 1.
The process of performing the time-series convolution on the pulse data of the input image is as described in the foregoing embodiments, and will not be described in detail here.
For example, as shown in fig. 8, the pulse full convolution network includes two pulse full convolution layers, a first time sequence convolution processing is performed on pulse data of an input image by a first pulse full convolution layer to obtain a first vector, pooling processing is performed on the first vector to obtain a first intermediate vector, a second time sequence convolution processing is performed on the first intermediate vector by a second pulse full convolution layer to obtain a second vector, and the second vector is determined as a time sequence convolution vector and is used as an input of a next layer.
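A compact sketch of this two-layer arrangement is given below; the layer sizes, the binarization rule, and the use of max pooling (chosen because it keeps the values binary) are all assumptions made for illustration, not requirements of the patent.

```python
import numpy as np

def full_additive_layer(x, W):
    """x: 0/1 pulse vector; W: weight matrix with one row per pulse position."""
    return W[np.flatnonzero(x)].sum(axis=0)

def binarize(v, threshold=0.0):
    """Stand-in for the spiking dynamics that turn a layer output back into 0/1 pulses."""
    return (v > threshold).astype(np.int8)

def max_pool(v, k=2):
    return v.reshape(-1, k).max(axis=1)

x  = np.random.randint(0, 2, size=16).astype(np.int8)        # binary pulse input
W1 = np.random.randn(16, 8)                                   # first-layer weights
W2 = np.random.randn(4, 4)                                    # second-layer weights

first_vector       = binarize(full_additive_layer(x, W1))     # first time-series convolution
first_intermediate = max_pool(first_vector)                   # pooled, still 0/1
second_vector      = full_additive_layer(first_intermediate, W2)  # time-series convolution vector
```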
The artificial intelligence chip of the embodiment of the present disclosure comprises the pulse full-additive convolution device described in the foregoing embodiments, which will not be described in detail here. In the calculation process of the chip, the redundant multiplication operations of conventional convolution are removed, so the amount of calculation is greatly reduced and the chip power consumption is reduced. For example, a multiply-add at fp32 precision consumes about 3.7 + 0.9 = 4.6 pJ, whereas an fp32 addition alone consumes only about 0.9 pJ, so replacing multiply-add with accumulation significantly reduces the power consumption of the convolution process. With the multiplier removed, the chip saves computation area, the chip area is greatly reduced, and the integration level of the chip can be improved.
The disclosure also relates to an electronic device comprising a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the full convolution method in the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the full convolution method, by running non-volatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the full convolution method in any of the method embodiments described above.
The product can execute the full convolution method provided by the embodiment of the application, has corresponding functional modules and beneficial effects of the execution method, and can refer to the full convolution method provided by the embodiment of the application without detailed technical details in the embodiment.
The present disclosure also relates to a computer-readable storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described embodiments of the full-additive convolution method.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Moreover, those of ordinary skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the disclosure and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present disclosure has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiment disclosed, but that the disclosure will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A full-additive convolution method, the method comprising:
determining each target weight address for accumulation operation according to pulse data of an input image;
and performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a characteristic vector of the input image, wherein the pulse data of the input image is pulse data of 0 and 1.
2. The method of claim 1, wherein determining each target weight address for an accumulation operation based on the pulse data of the input image comprises:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and traversing the pulse data of the input image, and determining a weight address corresponding to the pulse data with the value of 1 as the target weight address.
3. The method of claim 1, wherein determining each target weight address for performing an accumulation operation based on impulse data of an input image comprises:
generating each weight address corresponding to the pulse data with the value of 1 according to the pulse data of the input image;
and determining each weight address corresponding to the pulse data with the value of 1 as the target weight address.
4. The method of claim 1, determining each target weight address for performing an accumulation operation based on pulse data of an input image, comprising:
generating a weight address corresponding to each pulse data according to the pulse data of the input image;
and determining the weight address corresponding to each pulse data as the target weight address.
5. The method of claim 4, wherein performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full-additive convolution value comprises:
traversing pulse data of the input image;
when the pulse data is 0, determining 0 as the weight to be accumulated;
when the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
6. The method of claim 4, wherein performing an accumulation operation on the weights corresponding to the target weight addresses to obtain a full-additive convolution value comprises:
traversing pulse data of the input image;
when the pulse data is 0, the accumulation operation is not executed;
When the pulse data is 1, determining the weight corresponding to a reference weight address as the weight to be accumulated, wherein the reference weight address is a target weight address corresponding to the pulse data with the value of 1;
and performing accumulation operation on each weight to be accumulated to obtain a full convolution value.
7. The method of any of claims 1-6, further comprising: and determining the membrane potential information of the t time step according to the characteristic vector of the input image and the membrane potential information of the t-1 time step.
8. A full-additive convolution device, the device comprising:
the address generation module is used for determining each target weight address for accumulation operation according to the pulse data of the input image;
the input data buffer module is used for inputting pulse data of the input image;
the full convolution module is used for performing accumulation operation on the weight corresponding to each target weight address to obtain a full convolution value, and determining the full convolution value as a feature vector of the input image;
wherein the pulse data of the input image are pulse data of 0 and 1.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the full-additive convolution method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which computer program is executable by a processor for implementing the full-additive convolution method according to any one of claims 1 to 7.
Application CN202010653816.XA, priority date 2020-07-08, filing date 2020-07-08: Full-additive convolution method and device (CN111860778A, pending)

Priority Applications (1)

Application number: CN202010653816.XA (CN111860778A)
Priority date: 2020-07-08
Filing date: 2020-07-08
Title: Full-additive convolution method and device


Publications (1)

Publication number: CN111860778A
Publication date: 2020-10-30

Family

ID=73152550

Family Applications (1)

Application number: CN202010653816.XA (CN111860778A, pending)
Title: Full-additive convolution method and device

Country Status (1)

CN: CN111860778A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240102A (en) * 2021-05-24 2021-08-10 北京灵汐科技有限公司 Membrane potential updating method of neuron, brain-like neuron device and processing core
CN113240102B (en) * 2021-05-24 2023-11-10 北京灵汐科技有限公司 Membrane potential updating method of neuron, brain-like neuron device and processing core


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination