CN108345934A - A kind of activation device and method for neural network processor - Google Patents
- Publication number
- CN108345934A (application CN201810038612.8A)
- Authority
- CN
- China
- Prior art keywords
- activation
- neuron
- activated
- arithmetic element
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Advance Control (AREA)
Abstract
The present invention provides an activation device and method for a neural network processor that reduce the idle time of hardware through time-division multiplexing and realize the hardware circuit with a simple structure. The activation device includes: at least one activation arithmetic unit, an activation control unit, an input interface, and an output interface. The maximum amount of data that the activation arithmetic unit can process simultaneously is less than or equal to the amount of pending data input to the activation device at one time. The activation control unit is connected to the activation arithmetic unit and, according to the relationship between the amount of pending data input to the activation device at one time and the processing capacity of the activation arithmetic unit, controls the activation arithmetic unit to perform activation processing in batches on the neurons to be activated that are received at one time from outside the activation device through the input interface, and to output the result of the activation processing from the activation device through the output interface.
Description
Technical field
The present invention relates to neural network processor architecture and design methods, and in particular to the field of neural network computation acceleration.
Background
Deep learning technology has developed rapidly in recent years. In solving high-level abstract cognitive problems, it has been widely applied with outstanding performance in fields such as image recognition, speech recognition, natural language understanding, weather forecasting, gene expression analysis, content recommendation, and intelligent robotics, and has therefore become a research hotspot in both academia and industry.
The deep neural network is one of the perception models with the highest level of development in the field of artificial intelligence. Such networks model the neural connection structure of the human brain and describe data features through layered abstraction across multiple transformation stages, bringing breakthroughs to large-scale data processing tasks such as image, video, and audio processing. The model is a computational model composed of a large number of nodes in a mesh-like interconnection structure; these nodes are called neurons. The connection strength between every two nodes represents a weighted value of the signal passing between them, called a weight, which corresponds to memory in the human neural network.
The research purpose of neural network accelerators is to push neural networks toward broader applications in fields such as smart wearables, intelligent robotics, autonomous driving, and pattern recognition. The computation of a neural network can be divided into steps such as convolution, activation, and pooling, among which the activation operation on neurons is an unavoidable step. Common activation functions include the traditional nonlinear functions represented by sigmoid. However, the computation of these traditional activation functions is complicated, so a single activation computing unit consumes considerable hardware resources when a neural network accelerator implements the corresponding activation module, making it difficult to scale up. By comparison, the ReLU function is linear and contains no complicated nonlinear operations, so an activation module designed around ReLU allows a neural network accelerator to realize the same amount of computation with a relatively small hardware area compared with traditional activation functions.
Moreover, when the prior art designs the activation circuit module of a neural network accelerator, in order to activate the input neurons as early as possible, the activation task is completed with a number of activation units equal to the number of input neurons. However, with this approach the input neuron data cannot be processed continuously, and this discontinuity leaves the activation module idle and reduces the utilization of hardware resources.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art and to provide an activation device for a neural network processor, including:
at least one activation arithmetic unit, an activation control unit, an input interface, and an output interface;
wherein the maximum amount of data that the activation arithmetic unit can process simultaneously is less than or equal to the amount of pending data input to the activation device at one time;
and the activation control unit is connected to the activation arithmetic unit and, according to the relationship between the amount of pending data input to the activation device at one time and the processing capacity of the activation arithmetic unit, controls the activation arithmetic unit to perform activation processing in batches on the neurons to be activated received at one time from outside the activation device through the input interface, and to output the result of the activation processing from the activation device through the output interface.
Preferably, in the activation device, the activation arithmetic unit is a ReLU activation arithmetic unit, or an activation arithmetic unit for the sigmoid or tanh function.
Preferably, in the activation device, at least one of the activation arithmetic units is a ReLU activation computing circuit, including:
at least one inverting element and N-1 gating elements;
wherein N is the number of bits of the neuron value to be activated; the inverting element takes the 1-bit sign bit of the N-bit neuron value to be activated as its input, and its output is connected to the control bit of each of the N-1 gating elements; each of the N-1 gating elements takes one of the remaining N-1 value bits of the neuron value to be activated as its input;
the ReLU activation computing circuit takes the 1-bit sign bit of the neuron value to be activated together with the outputs of the N-1 gating elements as its output.
Preferably, in the activation device, the amount of pending data input to the activation device at one time is the input bandwidth of the neurons to be activated of the neural network processor, or the bit width of the input interface of the activation device.
Preferably, in the activation device, the activation control unit is further configured to control the input interface to start and pause the reception of neurons to be activated according to the input bandwidth of the neurons to be activated of the neural network processor.
Preferably, in the activation device, the number of batches into which the activation control unit divides the neurons to be activated received at one time from outside the activation device through the input interface is equal to the rounded result of the amount of pending data input to the activation device at one time divided by the maximum amount of data that all the activation arithmetic units can process simultaneously.
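As a rough sketch of the rule above, the batch count can be computed as follows; the function and parameter names are illustrative, not from the patent, and rounding up is assumed for the general case (the patent's later worked example, 256 bit across eight 8-bit units, divides evenly).

```python
import math

def batch_count(pending_bits: int, unit_bits: int, num_units: int) -> int:
    """Number of batches = one-shot pending data amount divided by the
    combined capacity of all activation arithmetic units, rounded up."""
    total_capacity = unit_bits * num_units
    return math.ceil(pending_bits / total_capacity)

# the patent's later example: a 256-bit input and eight 8-bit ReLU units
print(batch_count(256, 8, 8))  # -> 4
```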
The present invention also provides an activation method for a neural network processor, the neural network processor including at least one activation arithmetic unit, the maximum amount of data that the activation arithmetic unit can process simultaneously being less than or equal to the amount of data of the neurons to be activated generated by the neural network processor at one time, the method including:
1) dividing the neurons to be activated generated by the neural network processor at one time into batches according to the relationship between the amount of data of those neurons and the processing capacity of the activation arithmetic unit;
2) according to the result of the batch division, supplying the neurons to be activated of each batch in turn to the activation arithmetic unit for activation processing.
Preferably, in the method, the number of batches into which the neurons to be activated generated by the neural network processor at one time are divided is equal to the rounded result of the amount of data of those neurons divided by the maximum amount of data that all the activation arithmetic units can process simultaneously.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed, implements any of the methods described above.
The present invention also provides a system for activating neurons to be activated in a neural network, including:
a neural network processor and a storage device,
wherein the neural network processor includes at least one activation arithmetic unit, and the maximum amount of data that the activation arithmetic unit can process simultaneously is less than or equal to the amount of data of the neurons to be activated generated by the neural network processor at one time;
the storage device stores a computer program which, when executed by the processor, implements any of the methods described above.
Compared with the prior art, the advantages of the present invention are as follows:
based on the idea of time-division multiplexing, the input neurons to be activated are activated in batches, so that the idle time of the activation processing unit during neuron activation is utilized. Relatively few activation arithmetic units can thereby complete an activation task of equal size, achieving the purpose of reducing circuit area.
Description of the drawings
Embodiments of the present invention are further illustrated with reference to the drawings, in which:
Fig. 1 is a structural diagram of a ReLU function activation device according to an embodiment of the invention;
Fig. 2 is a schematic workflow of the ReLU function activation device of Fig. 1 according to an embodiment of the invention;
Fig. 3 is a hardware structure diagram of an 8-bit ReLU activation computing circuit implementing the ReLU activation function for a neural network accelerator according to an embodiment of the invention;
Fig. 4 is an example in which the hardware structure of the ReLU activation computing circuit of Fig. 3 is applied in the ReLU function activation device of Fig. 1 according to an embodiment of the invention.
Detailed description
The present invention is elaborated below with reference to the drawings and specific embodiments.
Based on an analysis of the prior art, the inventor believes that how to design an efficient activation scheme and a dedicated activation device for the ReLU activation function has become a research focus.
As described in the background section, in the design of activation devices for traditional neural network accelerators there exists a technical prejudice: it is believed that, to guarantee the speed of the activation operation, an activation function arithmetic unit of the same scale as the bandwidth of the input data must be used to complete the activation task. However, the inventor found that the intermittent transmission of the data to be activated correspondingly increases the idle time of the activation device, and that if this idle time can be exploited, the utilization of hardware resources can be improved. In this regard, the inventor proposes a feasible scheme: by performing activation on the input neuron data in batches, the scale of the activation function arithmetic unit no longer needs to match the bandwidth, so that a larger neuron activation task can be completed with fewer arithmetic units, improving the utilization of the computing circuits of the neural network accelerator.
In addition, the inventor also found that the prior art implements the ReLU activation unit in a neural network accelerator in the form of a function call. If the ReLU activation unit can instead be realized with a dedicated computing circuit structure, the neural network accelerator can rapidly complete the ReLU activation operation on the input neurons, further improving the efficiency of activation processing.
The structure of the ReLU function activation device for a neural network accelerator proposed by the present invention is introduced below according to one embodiment.
Fig. 1 shows a module diagram of a ReLU function activation device for a neural network processor according to an embodiment of the invention.
Referring to Fig. 1, the ReLU function activation device 101 includes: an input interface unit 102, a neuron temporary storage unit 103, a ReLU activation arithmetic unit 104, an activation control unit 105, and an output interface unit 106. The ReLU function activation device 101 further includes some auxiliary register units and connection lines (not shown) for data transmission between the units.
The input interface unit 102 implements the data communication protocol for communicating with external modules and the internal data transfer protocol for communicating with the other units inside the ReLU function activation device 101, and is used to obtain the neurons to be activated for the device 101.
The neuron temporary storage unit 103 is connected to the input interface unit 102 and temporarily stores the neurons to be activated obtained through the input interface unit 102.
The ReLU activation arithmetic unit 104 is connected to the neuron temporary storage unit 103 and performs activation processing on the neurons to be activated; the maximum amount of data it can process simultaneously is less than or equal to the amount of pending data input to the ReLU function activation device 101 at one time.
The output interface unit 106 is connected to the ReLU activation arithmetic unit 104 and outputs the result of the activation processing from the ReLU function activation device 101.
The activation control unit 105 is connected to the input interface unit 102, the neuron temporary storage unit 103, the ReLU activation arithmetic unit 104, and the output interface unit 106. According to the relationship between the amount of data of the neurons to be activated input to the device 101 at one time and the processing capacity of the ReLU activation arithmetic unit 104, it controls the ReLU activation arithmetic unit 104 to perform activation processing in batches on the neurons to be activated received at one time from outside the device 101 through the input interface unit 102, and controls the output interface unit 106 to output the result of the activation processing.
The above batch activation processing means that, during the period from when the input interface unit 102 starts receiving data until it pauses reception, all the neurons to be activated input to the ReLU function activation device 101 at one time are buffered and divided into multiple batches, and the neurons to be activated of each batch are activated in turn by the ReLU activation arithmetic unit 104.
As stated above, the maximum amount of data that the ReLU activation arithmetic unit 104 can process simultaneously is less than or equal to the amount of data of all the neurons to be activated input to the ReLU function activation device 101 at one time. Therefore, in the ReLU function activation device of Fig. 1, the activation control unit 105 can determine, according to the relationship between the amount of pending data input at one time and the processing capacity of the ReLU activation arithmetic unit 104, how many passes are needed to process the neurons to be activated; the neurons to be activated processed in each pass are called one "batch". In the present invention, this manner of repeatedly reusing the ReLU activation arithmetic unit 104 to process each batch of neurons to be activated in turn is called "time-division multiplexing".
The inventor believes that this "time-division multiplexing" can be applied to the activation of neurons in a neural network processor because, based on the computation rules of neural networks, the neurons to be activated are often generated intermittently. Whether between two adjacent iterations of training the weights of a neural network or of classifying input content with a trained neural network, or after the neural network performs convolution on input data with a convolution kernel, a certain amount of neurons to be activated is generated at one time after each interval. Thus, before the next batch of neurons to be activated is generated, the activation device of the neural network processor receives no new neurons to be activated. It can be understood that a prior-art activation arithmetic unit whose scale matches the input bandwidth of the neurons to be activated, and which can therefore process all the neurons to be activated at one time, completes the activation quickly, but then remains idle until the next neurons to be activated arrive. In contrast, with the ReLU function activation device 101 of Fig. 1, the scale of data that the ReLU activation arithmetic unit 104 can process simultaneously is smaller than the scale of data input to the device 101 at one time, and the ReLU activation arithmetic unit 104 is controlled to process the neurons to be activated of each batch in turn, so that completing the activation is stretched over a longer time interval and the idle time of the ReLU activation arithmetic unit 104 is reduced. Moreover, since the ReLU activation arithmetic unit 104 is of smaller scale, large-sized hardware is not needed to realize its function, reducing energy consumption and hardware cost.
In a preferred embodiment of the present invention, the ReLU function activation device 101 may include multiple ReLU activation arithmetic units 104, which perform the processing task in parallel to meet the real-time requirement of activating high-bandwidth neuron data.
It should also be understood that other activation arithmetic units, such as sigmoid or tanh activation arithmetic units, may be used in the activation device for a neural network processor according to the present invention. The activation arithmetic unit may be realized either as a programmable software module or as a dedicated hardware circuit.
The workflow of the ReLU function activation device of Fig. 1 is introduced below through a specific embodiment in conjunction with Fig. 2. Referring to Fig. 2, the method includes:
Step 1. The ReLU function activation device 101, according to the input bandwidth of the neurons to be activated, controls the input interface unit 102 to obtain the neurons to be activated input to the device 101 at one time, and stores them in the neuron temporary storage unit 103.
The ReLU function activation device 101 may obtain the parameter information about the input bandwidth from outside, or may set the input bandwidth to a fixed value by default.
It can be understood that, for ease of control, the neurons to be activated input to the device 101 at one time can also be determined directly from the bit width of the input interface unit 102.
Step 2. The ReLU function activation device 101, according to the input bandwidth of the neurons to be activated and the maximum amount of data it can process simultaneously, determines a batch division scheme for the neurons to be activated input to the device 101 at one time, and generates a corresponding control encoding.
According to one embodiment of the present invention, this step includes:
Step 2.1. The activation control unit 105, according to the relationship between the input bandwidth parameter and the processing capacity of the ReLU activation arithmetic unit 104, determines into how many batches the neurons to be activated input to the device 101 at one time are divided, and calculates the load address of each batch of neurons in the neuron temporary storage unit 103;
Step 2.2. The activation control unit 105 generates, from the number of batches and the load address of each batch of neurons, the control operations for the ReLU activation arithmetic unit 104 to execute the activation, producing the control encoding.
Step 3. The neuron temporary storage unit 103, according to the control encoding from the activation control unit 105, transmits the neurons to be activated to the ReLU activation arithmetic unit 104 to execute the ReLU activation operation.
Step 4. The ReLU activation arithmetic unit 104 executes the ReLU activation operation and transmits the activation result to the output interface unit 106, while receiving the next batch of neurons and continuing the ReLU activation, until it is determined from the control encoding that the ReLU activation arithmetic unit 104 has completed the activation for every batch of neurons to be activated; the output interface unit 106 then outputs the neuron activation results from the ReLU function activation device 101.
In this way, the activation arithmetic unit is multiplexed at different times, and a more flexible control scheme can match the processing capacity of the activation arithmetic unit against different input bandwidths, avoiding the long idle periods of the activation processing unit that occur in existing schemes using an activation processing unit equal to the input bandwidth.
Furthermore, according to one embodiment of the present invention, a dedicated circuit structure is provided for realizing the above ReLU activation arithmetic unit 104. Fig. 3 shows a schematic circuit structure of an 8-bit ReLU activation arithmetic unit according to an embodiment of the invention. Referring to Fig. 3, the circuit of the ReLU activation arithmetic unit includes one inverting element and seven gating elements; the inverting element takes the sign bit of the neuron value as input, and its output is connected to the control bit of each of the seven gating elements; each of the seven gating elements takes one value bit of the neuron value as input; and the sign bit of the neuron value together with the outputs of the seven gating elements constitutes the output of the ReLU activation arithmetic unit.
The circuit structure of the above 8-bit ReLU activation arithmetic unit satisfies the characteristics of the ReLU function:
when the input neuron value is less than or equal to zero, the activation result equals 0;
when the input neuron value is greater than 0, the activation output is the input value itself.
In the embodiment of Fig. 3, the sign bit line controls the level of the data bit lines: when the sign bit of the input neuron value is 1 (negative number), the value bit levels are blocked, i.e., the value bits output 0; when the sign bit is 0 (positive number), the value bit levels are gated through and the output equals the input. This circuit structure realizes the activation of signed 8-bit neuron data as a whole, and is more efficient and compact than a traditional computing module assembled from general-purpose basic operation units.
The ReLU activation structure shown in the embodiment of Fig. 3 can realize the ReLU activation operation on neurons while consuming fewer circuit resources, effectively reducing the hardware resource consumption of the neural network chip.
It should be appreciated that the dedicated circuit structure of Fig. 3 can be used not only in the ReLU function activation device of Fig. 1 according to the present invention, but also in any other neural network accelerator, to provide a hardware structure for realizing ReLU activation.
A specific example using the dedicated circuit structure of Fig. 3 is introduced below with reference to Fig. 4 to describe the structure and working process of the ReLU function activation device. It can be understood that common input bandwidths of neural network activation modules include 128 bit, 256 bit, 512 bit, 1024 bit, and 2048 bit; an input bandwidth of 256 bit is used here for illustration.
Fig. 4 shows the structural schematic of a ReLU function activation device according to an embodiment of the invention, in which the bandwidth of the input interface unit is 256 bit, the bandwidth of the output interface unit is 64 bit, and the ReLU activation arithmetic unit contains eight of the 8-bit ReLU activation arithmetic units of Fig. 3 (each indicated by a circle). As can be seen from Fig. 4, the activation control unit takes the bandwidth parameter as input and provides the input control signal for the input interface unit, the address encoding for the temporary storage unit, the activation control signal for the ReLU activation arithmetic units, and the output control signal for the output interface unit.
Referring to Fig. 4, the batch division process for the data in the activation temporary storage unit includes the following steps:
Step S21. The 256-bit neurons to be activated input from outside are stored in the activation temporary storage unit, and the activation control unit obtains the bandwidth parameter "256" of the input bandwidth.
Step S22. The activation control unit analyzes the bandwidth parameter and compares it with the processing capacity of the ReLU activation arithmetic units, thereby determining the rule for batch processing of the neurons to be activated.
In this example, the bandwidth parameter of the input bandwidth is 256 bit, and the ReLU activation arithmetic unit contains eight of the 8-bit dedicated circuit structures (which can obviously process 64 bit of data simultaneously). The rule for batch processing of the neurons in the temporary storage unit can thus be determined as: divide into 4 (256/64) batches, with a neuron bandwidth of 64 bit in each batch. The number of batches, 4, here equals the rounded result of the input bandwidth 256 bit divided by the maximum amount of data, 64 bit, that all the activation arithmetic units can process simultaneously. Further, the original load address of each batch of neurons in the temporary storage unit can also be calculated; for example, the start address of the neurons to be activated of the first batch is set to 0, and similarly the start addresses of the remaining batches are set to 63, 127, and 191, respectively.
Step S23. The number of batches, 4, and the start address of each batch of data in the control scheme are combined to generate the control encoding (4.0.63.127.191, in which 4 indicates division into 4 batches, 0 is the start address of the data of the 1st batch, 63 is the start address of the data of the 2nd batch, and so on; it should be appreciated that the control encoding can also be generated in other ways), which is used to control the modules to jointly complete the activation task.
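Step S23 can be sketched as follows. The function and parameter names are illustrative, and the address formula simply reproduces the patent's example values (0, then multiples of 64 minus 1); it is not claimed to be the patent's own addressing rule.

```python
def control_encoding(bandwidth_bits: int, unit_bits: int, num_units: int) -> str:
    """Pack the batch count and per-batch start addresses into a
    dot-separated control encoding, as in the patent's example string."""
    capacity = unit_bits * num_units          # bits processed per pass
    batches = bandwidth_bits // capacity      # divides evenly in the example
    # first batch starts at 0; later ones at the example's addresses 63, 127, 191
    addrs = [0] + [b * capacity - 1 for b in range(1, batches)]
    return ".".join(str(v) for v in [batches] + addrs)

print(control_encoding(256, 8, 8))  # -> "4.0.63.127.191"
```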
Referring to Fig. 4, the time-division-multiplexed ReLU activation of neurons by the device is described in detail below, including the steps:
Step S31. According to the control encoding generated in steps S21-S23, the corresponding batch of neurons to be activated is loaded from the activation temporary storage unit, and the 64 bits of data are transmitted to the eight ReLU activation arithmetic units respectively.
Step S32. Each ReLU activation arithmetic unit executes the ReLU activation operation, and the activation results are spliced into 64-bit data and transmitted to the external receiving unit through the output interface unit.
Step S33: the control module continues to parse the coding. When parsing of the coding is finished (no subsequent batch of data needs to be activated, i.e., all 4 batches of 64-bit-bandwidth neurons have been fully activated), this activation operation ends; when the control coding has not been fully parsed, the next batch of neurons is loaded from the temporary storage unit to the activation arithmetic units according to the starting address in the coding and activated, and this process repeats until parsing of the coding is complete.
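Steps S31-S33 can be sketched as a loop under stated assumptions: the control coding (4, 0, 63, 127, 191) is parsed batch by batch, each 64-bit batch is spread across 8 ReLU units, the results are re-concatenated, and the loop ends when the coding is exhausted. The function names, the scratchpad data structure, and the per-unit value width are all assumptions for illustration.

```python
NUM_UNITS = 8  # assumed number of ReLU activation arithmetic units

def relu_unit(x):
    # One ReLU activation arithmetic unit acting on a single signed value.
    return x if x > 0 else 0

def activate_in_batches(coding, scratchpad):
    num_batches, *start_addrs = coding
    output = []
    for addr in start_addrs:                    # S33: parse coding batch by batch
        batch = scratchpad[addr]                # S31: load batch from temp storage
        assert len(batch) == NUM_UNITS          # one value per ReLU unit
        result = [relu_unit(v) for v in batch]  # S32: activate on the 8 units
        output.append(result)                   # concatenate and forward
    return output
```

A usage example, keyed by the start addresses from the embodiment: `activate_in_batches((4, 0, 63, 127, 191), {0: [...], 63: [...], 127: [...], 191: [...]})`.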
It should be noted that the input/output bandwidth of each functional unit of the neural network activation device proposed by the present invention can be adjusted according to application requirements, improving the temporal utilization of activation and reducing the bandwidth and the size of the arithmetic units while ensuring the continuity of neuron data activation processing and transmission.
According to one embodiment of the present invention, the start and pause process of the ReLU-function activation device comprises: when the number of neurons that the external input unit needs to activate reaches the activation requirement, the external functional module inputs an activation signal to the activation signal input terminal of the activation device provided by the present invention; at this time, the input interface unit opens the protocol and prepares for data transmission with the external input unit, the control unit enables each unit, and activation starts. When activation of the input neurons is completed, the control unit blocks the function of each functional unit and pauses activation.
It can be understood that, since the input of neurons to be activated is discontinuous while the data volume of a single input is large, in the prior art, when activation arithmetic units of the same scale as the corresponding data volume are used, the operation process is likewise discontinuous, so that both the data transmission and the activation arithmetic units exhibit idle periods. The present invention instead uses small-scale arithmetic units equipped with a temporary storage unit that buffers the large volume of input data, then transmits the buffered data in batches to the designated arithmetic units, activating the input neurons batch by batch and thereby putting the otherwise idle time to use. This batch processing is referred to in the present invention as the time-multiplexing method.
It should be noted that, in the neural network activation device for ReLU functions proposed by the present invention, the corresponding internal temporary storage unit can store all neurons received by the input interface unit each time.
When batch-processing the neurons in the temporary storage unit, the controller divides the neurons into batches according to the number of computing elements inside the activation computing module, and, when batching the data, encodes how the neuron data volume of each batch occupies the computing units, so as to ensure the accuracy of the data division.
The activation device based on the time-multiplexed neural network should correspond to the activation process of the neurons in the neural network.
In conclusion the present invention provides a kind of activation device for neural network processor, use at least one
Arithmetic element is activated, and the maximum amount of data that all activation arithmetic element can be handled simultaneously is less than or equal to the input tape
It is wide;When in use, activate the activation control unit in device according to the processing energy of input bandwidth and the activation arithmetic element
Relationship between power carries out batch processing to the neuron to be activated inputted based on the input bandwidth, and controlling will be every
The neuron to be activated of a batch is provided to each activation arithmetic element to be handled into line activating respectively.The activation dress as a result,
Setting can be using the activation arithmetic element with relatively simple circuit structure, in a time multiplexed manner to swashing
Arithmetic element living realizes multiplexing, avoids the idle of activation arithmetic element.Meanwhile it is opposite using circuit structure according to the present invention
Simple activation arithmetic element can reduce the consumption to resource.
In addition, the present invention also provides a dedicated circuit structure for the ReLU activation function, which realizes the ReLU activation function using only simple inverting elements and gating elements. Replacing the traditional ReLU structure with a smaller circuit resource overhead reduces the hardware resources required for ReLU activation operations.
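A behavioral sketch of this circuit (an assumption about its behavior, not the patent's netlist): one inverting element on the sign bit drives the control input of N-1 gating (AND-style) elements on the value bits. A sign-magnitude representation is assumed here, so that sign = 1 with all value bits gated to 0 encodes negative zero, i.e. the 0 output of ReLU.

```python
def relu_circuit(bits):
    # bits[0] is the sign bit; bits[1:] are the N-1 value bits.
    sign = bits[0]
    enable = 1 - sign                        # the inverting element
    gated = [b & enable for b in bits[1:]]   # the N-1 gating elements
    return [sign] + gated                    # output: sign bit + gated value bits

# A positive input passes through unchanged; a negative input has its
# value bits gated to zero.
print(relu_circuit([0, 1, 0, 1]))  # [0, 1, 0, 1]
print(relu_circuit([1, 1, 0, 1]))  # [1, 0, 0, 0]  (negative zero, i.e. 0)
```

No arithmetic comparator is needed: the sign bit alone decides whether the value bits pass through, which is what lets the circuit stay so small.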
It should be noted that none of the steps introduced in the above embodiments is indispensable; those skilled in the art can make appropriate selections, substitutions, modifications and the like according to actual needs.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail above with reference to the embodiments, those of ordinary skill in the art will understand that modifications or equivalent replacements of the technical solution of the present invention, without departing from the spirit and scope of the technical solution of the present invention, should all be covered by the claims of the present invention.
Claims (10)
1. An activation device for a neural network processor, comprising:
at least one activation arithmetic unit, an activation control unit, an input interface and an output interface;
wherein the maximum data volume that the activation arithmetic units can process simultaneously is less than or equal to the amount of pending data input to the activation device at one time;
and the activation control unit is connected to the activation arithmetic units and is configured to, according to the relationship between the amount of pending data input to the activation device at one time and the processing capacity of the activation arithmetic units, control the activation arithmetic units to perform activation processing in batches on the neurons to be activated received at one time from outside the activation device through the input interface, and to output the result of the activation processing out of the activation device through the output interface.
2. The activation device according to claim 1, wherein the activation arithmetic unit is a ReLU activation arithmetic unit, or an activation arithmetic unit for a sigmoid function or a tanh function.
3. The activation device according to claim 2, wherein at least one of the activation arithmetic units is a ReLU activation computing circuit, comprising:
at least one inverting element and N-1 gating elements;
wherein N is the number of bits of the neuron value to be activated; the inverting element takes as input the 1-bit sign bit of the N-bit neuron value to be activated, and its output is connected to the control bit of each of the N-1 gating elements; each of the N-1 gating elements takes as input a respective one of the remaining N-1 value bits of the neuron value to be activated;
the ReLU activation computing circuit takes as its output the 1-bit sign bit of the neuron value to be activated together with the outputs of the N-1 gating elements.
4. The activation device according to any one of claims 1-3, wherein the amount of pending data input to the activation device at one time is the input bandwidth of the neurons to be activated of the neural network processor, or the bit width of the input interface of the activation device.
5. The activation device according to claim 4, wherein the activation control unit is further configured to control the input interface to start and pause the reception of neurons to be activated according to the input bandwidth of the neurons to be activated of the neural network processor.
6. The activation device according to any one of claims 1-3, wherein, when the activation control unit controls the activation arithmetic units to perform activation processing in batches on the neurons to be activated received at one time from outside the activation device through the input interface, the number of batches divided is equal to the amount of pending data input to the activation device at one time divided by the maximum data volume that all the activation arithmetic units can process simultaneously, rounded.
7. An activation method for a neural network processor, the neural network processor comprising at least one activation arithmetic unit, the maximum data volume that the activation arithmetic unit can process simultaneously being less than or equal to the data volume of neurons to be activated generated by the neural network processor at one time, the method comprising:
1) performing batch processing on the neurons to be activated generated by the neural network processor at one time, according to the relationship between the data volume of the neurons to be activated generated by the neural network processor at one time and the processing capacity of the activation arithmetic unit;
2) according to the result of the batch processing, successively providing the neurons to be activated of each batch to the activation arithmetic unit for activation processing.
8. The method according to claim 6, wherein the number of batches into which the neurons to be activated generated by the neural network processor at one time are divided for batch processing is equal to the data volume of the neurons to be activated generated at one time divided by the maximum data volume that all the activation arithmetic units can process simultaneously, rounded.
9. A computer-readable storage medium in which a computer program is stored, the computer program being used, when executed, to implement the method according to any one of claims 7-8.
10. A system for activating neurons to be activated in a neural network, comprising:
a neural network processor and a storage device,
wherein the neural network processor comprises at least one activation arithmetic unit, and the maximum data volume that the activation arithmetic unit can process simultaneously is less than or equal to the data volume of the neurons to be activated generated by the neural network processor at one time;
and the storage device stores a computer program which, when executed by the processor, implements the method according to any one of claims 7-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810038612.8A CN108345934B (en) | 2018-01-16 | 2018-01-16 | Activation device and method for neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108345934A true CN108345934A (en) | 2018-07-31 |
CN108345934B CN108345934B (en) | 2020-11-03 |
Family
ID=62960758
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810038612.8A Active CN108345934B (en) | 2018-01-16 | 2018-01-16 | Activation device and method for neural network processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108345934B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190756A (en) * | 2018-09-10 | 2019-01-11 | 中国科学院计算技术研究所 | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device |
CN109754071A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Activate operation method, device, electronic equipment and readable storage medium storing program for executing |
CN110610235A (en) * | 2019-08-22 | 2019-12-24 | 北京时代民芯科技有限公司 | Neural network activation function calculation circuit |
CN110866595A (en) * | 2018-08-28 | 2020-03-06 | 北京嘉楠捷思信息技术有限公司 | Method, device and circuit for operating activation function in integrated circuit |
CN113378149A (en) * | 2021-06-10 | 2021-09-10 | 青岛海洋科学与技术国家实验室发展中心 | Artificial intelligence-based two-way mobile communication identity verification method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127302A (en) * | 2016-06-23 | 2016-11-16 | 杭州华为数字技术有限公司 | Process the circuit of data, image processing system, the method and apparatus of process data |
US20160342890A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Batch processing in a neural network processor |
CN106203621A (en) * | 2016-07-11 | 2016-12-07 | 姚颂 | The processor calculated for convolutional neural networks |
CN106940815A (en) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | A kind of programmable convolutional neural networks Crypto Coprocessor IP Core |
CN107480782A (en) * | 2017-08-14 | 2017-12-15 | 电子科技大学 | Learn neural network processor on a kind of piece |
- 2018-01-16: Application CN201810038612.8A filed in China; granted as CN108345934B (legal status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160342890A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Batch processing in a neural network processor |
CN106127302A (en) * | 2016-06-23 | 2016-11-16 | 杭州华为数字技术有限公司 | Process the circuit of data, image processing system, the method and apparatus of process data |
CN106203621A (en) * | 2016-07-11 | 2016-12-07 | 姚颂 | The processor calculated for convolutional neural networks |
CN106940815A (en) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | A kind of programmable convolutional neural networks Crypto Coprocessor IP Core |
CN107480782A (en) * | 2017-08-14 | 2017-12-15 | 电子科技大学 | Learn neural network processor on a kind of piece |
Non-Patent Citations (1)
Title |
---|
Shen Zhongru: "Fundamentals of Digital Electronic Technology", 31 August 2010, Xi'an Jiaotong University Press * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866595A (en) * | 2018-08-28 | 2020-03-06 | 北京嘉楠捷思信息技术有限公司 | Method, device and circuit for operating activation function in integrated circuit |
CN110866595B (en) * | 2018-08-28 | 2024-04-26 | 嘉楠明芯(北京)科技有限公司 | Method, device and circuit for operating activation function in integrated circuit |
CN109190756A (en) * | 2018-09-10 | 2019-01-11 | 中国科学院计算技术研究所 | Arithmetic unit based on Winograd convolution and the neural network processor comprising the device |
CN109754071A (en) * | 2018-12-29 | 2019-05-14 | 北京中科寒武纪科技有限公司 | Activate operation method, device, electronic equipment and readable storage medium storing program for executing |
CN109754071B (en) * | 2018-12-29 | 2020-05-05 | 中科寒武纪科技股份有限公司 | Activation operation method and device, electronic equipment and readable storage medium |
CN110610235A (en) * | 2019-08-22 | 2019-12-24 | 北京时代民芯科技有限公司 | Neural network activation function calculation circuit |
CN110610235B (en) * | 2019-08-22 | 2022-05-13 | 北京时代民芯科技有限公司 | Neural network activation function calculation circuit |
CN113378149A (en) * | 2021-06-10 | 2021-09-10 | 青岛海洋科学与技术国家实验室发展中心 | Artificial intelligence-based two-way mobile communication identity verification method and system |
CN113378149B (en) * | 2021-06-10 | 2022-06-03 | 青岛海洋科学与技术国家实验室发展中心 | Artificial intelligence-based two-way mobile communication identity verification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108345934B (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108345934A (en) | A kind of activation device and method for neural network processor | |
Zhou et al. | Edge intelligence: Paving the last mile of artificial intelligence with edge computing | |
CN109858620B (en) | Brain-like computing system | |
CN106529670B (en) | It is a kind of based on weight compression neural network processor, design method, chip | |
CN109753751B (en) | MEC random task migration method based on machine learning | |
CN106650924B (en) | A kind of processor based on time dimension and space dimension data stream compression, design method | |
CN108665059A (en) | Convolutional neural networks acceleration system based on field programmable gate array | |
CN110163016B (en) | Hybrid computing system and hybrid computing method | |
CN110222760B (en) | Quick image processing method based on winograd algorithm | |
CN109478144A (en) | A kind of data processing equipment and method | |
Wang et al. | Deep spiking neural networks with binary weights for object recognition | |
CN105159148A (en) | Robot instruction processing method and device | |
CN107508698B (en) | Software defined service reorganization method based on content perception and weighted graph in fog calculation | |
Gao et al. | Deep neural network task partitioning and offloading for mobile edge computing | |
CN108304925A (en) | A kind of pond computing device and method | |
CN111831358B (en) | Weight precision configuration method, device, equipment and storage medium | |
CN111831355A (en) | Weight precision configuration method, device, equipment and storage medium | |
Dinelli et al. | MEM-OPT: A scheduling and data re-use system to optimize on-chip memory usage for CNNs on-board FPGAs | |
CN116957698A (en) | Electricity price prediction method based on improved time sequence mode attention mechanism | |
CN111831359A (en) | Weight precision configuration method, device, equipment and storage medium | |
CN114169506A (en) | Deep learning edge computing system framework based on industrial Internet of things platform | |
CN109767002A (en) | A kind of neural network accelerated method based on muti-piece FPGA collaboration processing | |
CN111831356B (en) | Weight precision configuration method, device, equipment and storage medium | |
CN109542513B (en) | Convolutional neural network instruction data storage system and method | |
Zou et al. | A scatter-and-gather spiking convolutional neural network on a reconfigurable neuromorphic hardware |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||