CN108345934A - Activation device and method for a neural network processor - Google Patents

Activation device and method for a neural network processor

Info

Publication number
CN108345934A
Authority
CN
China
Prior art keywords
activation
neuron
activated
arithmetic element
neural network
Prior art date
Legal status
Granted
Application number
CN201810038612.8A
Other languages
Chinese (zh)
Other versions
CN108345934B (en)
Inventor
韩银和 (Han Yinhe)
闵丰 (Min Feng)
许浩博 (Xu Haobo)
王颖 (Wang Ying)
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201810038612.8A
Publication of CN108345934A
Application granted
Publication of CN108345934B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)

Abstract

The present invention provides an activation device and method for a neural network processor that reduce the idle time of the hardware through time-division multiplexing and realize the hardware circuit with a simple structure. The activation device includes: at least one activation arithmetic unit, an activation control unit, an input interface and an output interface. The maximum amount of data that the activation arithmetic unit can process at once is less than or equal to the amount of pending data input to the activation device in a single pass. The activation control unit is connected to the activation arithmetic unit and, according to the relationship between the amount of pending data input to the activation device in a single pass and the processing capacity of the activation arithmetic unit, controls the activation arithmetic unit to perform batch activation processing on the neurons to be activated received in one pass from outside the activation device through the input interface, and to output the results of the activation processing out of the activation device through the output interface.

Description

Activation device and method for a neural network processor
Technical field
The present invention relates to neural network processor architectures and design methods, and in particular to the field of neural network computation acceleration.
Background technology
Deep learning technology has developed at full speed in recent years. In solving high-level abstract cognitive problems, in fields such as image recognition, speech recognition, natural language understanding, weather forecasting, gene expression, content recommendation and intelligent robotics, it has been widely applied with outstanding performance, and has therefore become a research hotspot of academia and industry.
A deep neural network is one of the perception models with the highest level of development in the field of artificial intelligence. Such a network models the neural connection structure of the human brain, describing data features through the layering of multiple transformation stages, and has brought breakthroughs to large-scale data processing tasks such as image, video and audio processing. The model structure is a computational model composed of a large number of nodes in a mesh interconnection structure; these nodes are called neurons. The connection strength between every two nodes represents the weighted value of the signal passed between the two nodes, called the weight, corresponding to memory in a human neural network.
The research purpose of neural network accelerators is to push neural networks toward broader applications in fields such as smart wearables, intelligent robotics, autonomous driving and pattern recognition. The computation of a neural network can be divided into steps such as convolution, activation and pooling, among which the activation operation on neurons is an unavoidable part of neural network processing. Common activation functions include the traditional nonlinear functions represented by sigmoid. However, the computation of such traditional activation functions is complex, so that a single activation computing unit consumes considerable hardware resources when a neural network accelerator implements the corresponding activation module, making it difficult to scale the design up. By comparison, the piecewise-linear ReLU function contains no complex nonlinear operations, so an activation module that uses ReLU as the activation function allows a neural network accelerator to realize the same scale of computation with a relatively small hardware area compared with conventional activation functions.
Moreover, when the prior art designs the activation circuit module of a neural network accelerator, in order to activate the input neurons as early as possible, the activation task is completed with a number of activation units identical to the number of input neurons. With this approach, however, the input neuron data cannot be processed continuously, and this discontinuity leaves the activation module idle and reduces the utilization of hardware resources.
Summary of the invention
It is an object of the present invention to overcome the above defects of the prior art and to provide an activation device for a neural network processor, comprising:
at least one activation arithmetic unit, an activation control unit, an input interface and an output interface;
wherein the maximum amount of data that the activation arithmetic unit can process at once is less than or equal to the amount of pending data input to the activation device in a single pass;
and the activation control unit is connected to the activation arithmetic unit and is configured to control, according to the relationship between the amount of pending data input to the activation device in a single pass and the processing capacity of the activation arithmetic unit, the activation arithmetic unit to perform batch activation processing on the neurons to be activated received in one pass from outside the activation device through the input interface, and to output the results of the activation processing out of the activation device through the output interface.
Preferably, in the activation device, the activation arithmetic unit is a ReLU activation arithmetic unit, or an activation arithmetic unit for the sigmoid function or the tanh function.
Preferably, in the activation device, at least one of the activation arithmetic units is a ReLU activation computing circuit, comprising:
at least one inverting element and N-1 gating elements;
wherein N is the number of bits of the neuron value to be activated; the inverting element takes as input the 1-bit sign bit of the N-bit neuron value to be activated, and its output is connected to the control bit of each of the N-1 gating elements; each of the N-1 gating elements takes as input a respective one of the remaining N-1 value bits of the neuron value to be activated;
and the ReLU activation computing circuit takes as its output the 1-bit sign bit of the neuron value to be activated together with the output of each of the N-1 gating elements.
Preferably, in the activation device, the amount of pending data input to the activation device in a single pass is the input bandwidth of the neurons to be activated of the neural network processor, or the bit width of the input interface of the activation device.
Preferably, in the activation device, the activation control unit is further configured to control the input interface to start and pause the reception of neurons to be activated according to the input bandwidth of the neurons to be activated of the neural network processor.
Preferably, in the activation device, the number of batches into which the activation control unit divides the neurons to be activated, received in one pass from outside the activation device through the input interface, for batch activation processing is equal to the amount of pending data input to the activation device in a single pass divided by the maximum amount of data that all of the activation arithmetic units can process at once, rounded to an integer.
The present invention also provides an activation method for a neural network processor, the neural network processor comprising at least one activation arithmetic unit, the maximum amount of data that the activation arithmetic unit can process at once being less than or equal to the amount of neuron data to be activated that the neural network processor generates in a single pass, the method comprising:
1) partitioning into batches the neurons to be activated that the neural network processor generates in a single pass, according to the relationship between the amount of that data and the processing capacity of the activation arithmetic unit;
2) according to the result of the partitioning, feeding the neurons to be activated of each batch in turn to the activation arithmetic unit for activation processing.
Preferably, in the method, the number of batches into which the neurons to be activated generated by the neural network processor in a single pass are partitioned is equal to the amount of that data divided by the maximum amount of data that all of the activation arithmetic units can process at once, rounded to an integer.
The present invention also provides a computer-readable storage medium in which a computer program is stored, the computer program, when executed, implementing any one of the methods described above.
The present invention further provides a system for activating neurons to be activated in a neural network, comprising:
a neural network processor and a storage device,
wherein the neural network processor comprises at least one activation arithmetic unit, the maximum amount of data that the activation arithmetic unit can process at once being less than or equal to the amount of neuron data to be activated that the neural network processor generates in a single pass;
and the storage device is used to store a computer program which, when executed by the processor, implements any one of the methods described above.
Compared with the prior art, the advantages of the present invention are as follows:
Based on the idea of time-division multiplexing, the input neurons to be activated are activated in batches, so that the activation module makes use of the time its activation processing units would otherwise spend idle during neuron activation. A relatively small number of activation arithmetic units can thus complete an activation task of equal size, achieving the goal of reducing the circuit area occupied.
Description of the drawings
Embodiments of the present invention are further illustrated below with reference to the drawings, in which:
Fig. 1 is a structural block diagram of a ReLU function activation device according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the workflow of the ReLU function activation device of Fig. 1 according to an embodiment of the present invention;
Fig. 3 is a hardware structure diagram of an 8-bit ReLU activation computing circuit for realizing the ReLU activation function in a neural network accelerator according to an embodiment of the present invention;
Fig. 4 is an example, according to an embodiment of the present invention, in which the hardware structure of the ReLU activation computing circuit of Fig. 3 is applied in the ReLU function activation device of Fig. 1.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
Based on an analysis of the prior art, the inventors believe that how to design an efficient activation scheme and a dedicated activation device oriented to the ReLU activation function has become a research focus.
As described in the background, the design of activation devices for traditional neural network accelerators is subject to a technical prejudice: it is believed that, to guarantee the speed of the activation operation, an activation-function arithmetic unit of the same scale as the bandwidth of the input data must be used to complete the activation task. The inventors have found, however, that the intermittent nature of the transmission of the data to be activated correspondingly increases the idle time of the activation device, and that if this idle time can be exploited, the utilization of hardware resources can be improved. The inventors therefore propose a feasible scheme: by activating the input neuron data in batches, the scale of the activation-function arithmetic unit no longer needs to match the bandwidth, so that a relatively large neuron activation task can be completed with fewer arithmetic units, improving the utilization of the computing circuits of the neural network accelerator.
In addition, the inventors have also found that the prior art implements the ReLU-oriented activation unit in a neural network accelerator in the form of a function call. If the ReLU-oriented activation unit can instead be realized with a dedicated computing circuit structure, the neural network accelerator can rapidly complete the ReLU function activation of the input neurons, further improving the efficiency of activation processing.
The structure of the ReLU function activation device for a neural network accelerator proposed by the present invention is introduced below according to one embodiment.
Fig. 1 shows a module diagram of a ReLU function activation device oriented to a neural network processor according to an embodiment of the present invention.
Referring to Fig. 1, the ReLU function activation device 101 includes: an input interface unit 102, a neuron buffer unit 103, a ReLU activation arithmetic unit 104, an activation control unit 105 and an output interface unit 106; the ReLU function activation device 101 further includes some auxiliary register units and connecting wires (not shown) for data transmission between the units.
The input interface unit 102 implements both the data communication protocol for communicating with external modules and the internal data transmission protocol for communicating with the other units inside the ReLU function activation device 101, and is used to obtain the neurons to be activated for the ReLU function activation device 101.
The neuron buffer unit 103 is connected to the input interface unit 102 and temporarily stores the neurons to be activated obtained by the input interface unit 102.
The ReLU activation arithmetic unit 104 is connected to the neuron buffer unit 103 to perform activation processing on the neurons to be activated; the maximum amount of data it can process at once is less than or equal to the amount of pending data input to the ReLU function activation device 101 in a single pass.
The output interface unit 106 is connected to the ReLU activation arithmetic unit 104 and outputs the results of the activation processing out of the ReLU function activation device 101.
The activation control unit 105 is connected to the input interface unit 102, the neuron buffer unit 103, the ReLU activation arithmetic unit 104 and the output interface unit 106. According to the relationship between the amount of neuron data to be activated input to the ReLU function activation device 101 in a single pass and the processing capacity of the ReLU activation arithmetic unit 104, it controls the ReLU activation arithmetic unit 104 to perform batch activation processing on the neurons to be activated received in one pass from outside the ReLU function activation device 101 through the input interface unit 102, and controls the output interface unit 106 to output the results of the activation processing.
The above batch activation processing means that, within the period from when the input interface unit 102 starts receiving data until it pauses reception, all the neurons to be activated input to the ReLU function activation device 101 in a single pass are buffered and divided into multiple batches, and the ReLU activation arithmetic unit 104 performs activation processing on the neurons to be activated of each batch in turn.
As described above, the maximum amount of data that the ReLU activation arithmetic unit 104 can process at once is less than or equal to the amount of all the neurons to be activated input to the ReLU function activation device 101 in a single pass. Therefore, in the ReLU function activation device of Fig. 1, the activation control unit 105 can determine, from the relationship between the amount of pending data input to the device in a single pass and the processing capacity of the ReLU activation arithmetic unit 104, over how many passes the neurons to be activated are processed, the neurons to be activated handled in each pass being called one "batch" of the division. In the present invention, this mode of repeatedly multiplexing the ReLU activation arithmetic unit 104 to process each batch of neurons to be activated in turn is called "time-division multiplexing".
The inventors believe that the neurons to be activated in a neural network processor can be activated in the above "time-division multiplexing" manner because, owing to the computation rules of neural networks, the neurons to be activated are often generated intermittently. Whether between two adjacent iterations of training the weights of a neural network or of classifying input content with a trained neural network, or after the neural network applies a convolution kernel to the input data, a certain amount of neurons to be activated is generated in one pass after each interval of time. Consequently, before the next batch of neurons to be activated is generated, the activation device of the neural network processor receives no new neurons to be activated. It will be appreciated that a prior-art activation arithmetic unit whose scale matches the input bandwidth of the neurons to be activated can process all the neurons to be activated in one pass and thus complete activation quickly; however, until the next neurons to be activated arrive, such an activation arithmetic unit remains idle. By contrast, with the ReLU function activation device 101 of Fig. 1, the scale of the data that the ReLU activation arithmetic unit 104 can process at once is smaller than the scale of the data input to the ReLU function activation device 101 in a single pass, and the ReLU activation arithmetic unit 104 is controlled to process each batch of neurons to be activated in turn, so that completing the full activation is stretched over a longer time interval, reducing the idle time of the ReLU activation arithmetic unit 104. Moreover, since the ReLU activation arithmetic unit 104 is of smaller scale, no large-sized hardware is needed to realize its function, reducing energy consumption and hardware cost.
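The following is a minimal software sketch, not part of the patent, of this time-division multiplexing idea; the names activate_burst and unit_width are illustrative, and ReLU is used as the activation function as in the embodiments below:

```python
# A minimal sketch (not part of the patent) of time-division multiplexing:
# one small activation unit is reused, batch by batch, over a burst of
# neurons, instead of a bandwidth-wide unit firing once and then idling.

def relu(x: int) -> int:
    """ReLU: 0 for inputs less than or equal to zero, the input itself otherwise."""
    return x if x > 0 else 0

def activate_burst(burst: list, unit_width: int) -> list:
    """Activate one intermittently generated burst of neurons in batches."""
    results = []
    for start in range(0, len(burst), unit_width):
        batch = burst[start:start + unit_width]   # one batch per pass
        results.extend(relu(v) for v in batch)    # the same unit, reused
    return results

# A 256-value burst handled 64 values at a time finishes in 4 passes,
# filling time the unit would otherwise spend idle between bursts.
out = activate_burst(list(range(-128, 128)), unit_width=64)
assert out[:2] == [0, 0] and out[-1] == 127
```

The point is only the loop structure: a small unit is multiplexed across batches rather than a bandwidth-wide unit being used once per burst and left idle thereafter.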
In a preferred embodiment of the present invention, the ReLU function activation device 101 may include multiple ReLU activation arithmetic units 104, with the processing task executed in parallel by the multiple ReLU activation arithmetic units 104 to meet the real-time requirements of activating high-bandwidth neuron data.
It should also be understood that other activation arithmetic units, such as sigmoid activation arithmetic units or tanh activation arithmetic units, may be employed in the activation device for a neural network processor according to the present invention. Moreover, the activation arithmetic unit may be realized either as a programmable software module or as a dedicated hardware circuit.
The workflow of the ReLU function activation device of Fig. 1 is introduced below through a specific embodiment in conjunction with Fig. 2. Referring to Fig. 2, the method includes:
Step 1. The ReLU function activation device 101, according to the input bandwidth of the neurons to be activated, controls the input interface unit 102 to obtain the neurons to be activated that are input to the device 101 in a single pass, and stores them into the neuron buffer unit 103.
The ReLU function activation device 101 may obtain the parameter describing this input bandwidth from outside, or may default the input bandwidth to a fixed value.
It will be appreciated that, to simplify control, the neurons to be activated input to the ReLU function activation device 101 in a single pass may be determined directly from the bit width of the input interface unit 102.
Step 2. The ReLU function activation device 101, according to the input bandwidth of the neurons to be activated and the maximum amount of data it can process at once, determines a batch-splitting scheme for the neurons to be activated that are input to the device 101 in a single pass, and generates a corresponding control coding.
According to one embodiment of the present invention, this step includes:
Step 2.1. The activation control unit 105 determines, from the relationship between the input-bandwidth parameter and the processing capacity of the ReLU activation arithmetic unit 104, how many batches the neurons to be activated input to the device 101 in a single pass are divided into, and calculates the load address of each batch of neurons in the neuron buffer unit 103;
Step 2.2. The activation control unit 105 generates, from the number of batches so divided and the load address of each batch of neurons, the control operations that direct the ReLU activation arithmetic unit 104 to execute the activation operation, producing the control coding.
Step 3. The neuron buffer unit 103, according to the control coding from the activation control unit 105, transmits neurons to be activated to the ReLU activation arithmetic unit 104 to execute the ReLU function activation operation on the neurons.
Step 4. The ReLU activation arithmetic unit 104 executes the ReLU-oriented activation operation and transmits its activation results to the output interface unit 106, while receiving the next batch of neurons and continuing the ReLU function activation, until it is determined from the control coding that the ReLU activation arithmetic unit 104 has completed the activation operation for the neurons to be activated of every batch, whereupon the output interface unit 106 outputs the neuron activation results out of the ReLU function activation device 101.
In this way, the activation arithmetic unit is multiplexed over different times, and a more flexible control method can match different input bandwidths to the processing capacity of the activation arithmetic unit, avoiding the long idle periods of the activation processing units that occur in existing schemes which use activation processing units equal in scale to the input bandwidth.
Furthermore, according to one embodiment of the present invention, a dedicated circuit structure for realizing the above ReLU activation arithmetic unit 104 is provided. Fig. 3 shows a schematic circuit diagram of an 8-bit ReLU activation arithmetic unit according to an embodiment of the present invention. Referring to Fig. 3, the circuit of the ReLU activation arithmetic unit includes one inverting element and seven gating elements; the inverting element takes the sign bit of the neuron value as input, and its output is connected to the control bit of each of the seven gating elements; each of the seven gating elements takes a respective value bit of the neuron value as input; and the sign bit of the neuron value together with the output of each of the seven gating elements constitutes the output of the ReLU activation arithmetic unit.
The circuit structure of the above 8-bit ReLU activation arithmetic unit satisfies the characteristic of the ReLU function:
when the input neuron value is less than or equal to zero, the activation result equals 0;
when the input neuron value is greater than 0, the activation output is the input value itself.
In the embodiment shown in Fig. 3, the sign-bit line controls the level of the data bit lines: when the sign bit of the input neuron value is 1 (negative), the value-bit levels are blocked, i.e. the value bits output 0; when the sign bit of the input neuron value is 0 (positive), the value-bit levels are gated through and the output value equals the input value. The circuit structure realizes the activation operation on signed 8-bit neuron data as a whole, and is more efficient and compact than a computing module assembled by calling general-purpose basic arithmetic units.
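This gating behaviour can be checked bit by bit in software. The following is a minimal sketch, not part of the patent, assuming the sign-magnitude encoding implied by the sign-bit/value-bit description above (bit 7 the sign, bits 6..0 the magnitude); the function name is illustrative:

```python
def relu8_sign_magnitude(x: int) -> int:
    """Emulate the 8-bit ReLU circuit of Fig. 3 on a sign-magnitude value.

    The sign bit (bit 7) is inverted and used as the control input of seven
    gating (AND) elements, one per value bit (bits 6..0); the sign bit
    itself passes through to the output unchanged.
    """
    assert 0 <= x <= 0xFF
    sign = (x >> 7) & 1           # 1 for negative, 0 for non-negative
    gate = sign ^ 1               # inverting element drives the gate control
    value = x & 0x7F              # the seven value bits
    gated = value if gate else 0  # gating elements block bits when negative
    return (sign << 7) | gated    # output: sign bit plus gated value bits

assert relu8_sign_magnitude(0b0_0101101) == 0b0_0101101  # positive: passes through
assert relu8_sign_magnitude(0b1_0101101) == 0b1_0000000  # negative: value bits -> 0
```

For a negative input the sketch, like the circuit, emits 1000 0000, i.e. negative zero in sign-magnitude, which downstream logic reads as the value 0 required by the ReLU characteristic.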
The ReLU activation structure shown in the embodiment of Fig. 3 can realize the ReLU function activation of neurons while consuming less circuit resource, effectively reducing the hardware resource consumption of the neural network chip.
It should be understood that the above dedicated circuit structure of Fig. 3 is not limited to use in the ReLU function activation device of Fig. 1; it may also be employed in any other neural network accelerator as a hardware structure for realizing ReLU function activation.
The structure and working process of a ReLU function activation device using the dedicated circuit structure of Fig. 3 are introduced below with a specific example, with reference to Fig. 4. It will be appreciated that common input bandwidths of neural network activation modules include 128 bit, 256 bit, 512 bit, 1024 bit and 2048 bit; an input bandwidth of 256 bit is taken as the example here.
Fig. 4 shows a structural schematic diagram of a ReLU function activation device according to an embodiment of the present invention, in which the bandwidth of the input interface unit is 256 bit, the bandwidth of the output interface unit is 64 bit, and the ReLU activation arithmetic unit contains eight of the 8-bit ReLU activation arithmetic units of Fig. 3 (each indicated by a circle). As can be seen from Fig. 4, the activation control unit takes the bandwidth parameter as input and provides the input control signal for the input interface unit, the address coding for the activation buffer unit, the activation control signal for the ReLU activation arithmetic units, and the output control signal for the output interface unit.
Referring to Fig. 4, the batch processing of data for the activation buffer unit includes the following steps:
Step S21. The 256-bit neurons to be activated input from outside are stored into the activation buffer unit, and the activation control unit obtains the bandwidth parameter "256" of the input bandwidth.
Step S22. The activation control unit analyzes the bandwidth parameter and compares it with the processing capacity of the ReLU activation arithmetic units, thereby determining the rule for batch-processing the neurons to be activated.
Taking this example: the bandwidth parameter of the input bandwidth is 256 bit, and the ReLU activation arithmetic unit contains eight of the 8-bit dedicated circuit structures (which evidently can process 64 bits of data at once), so the rule for batch-processing the neurons in the buffer unit can be determined as: processing in 4 (256/64) batches with a neuron bandwidth of 64 bit per batch. The number of batches divided here, 4, thus equals the input bandwidth, 256 bit, divided by the maximum amount of data, 64 bit, that all the activation arithmetic units can process at once, rounded to an integer. Furthermore, the original load address of each batch of neurons in the buffer unit can also be calculated; for example, the start address of the neurons to be activated of the first batch is set to 0, and similarly the start addresses of the neurons to be activated of the remaining batches are set to 63, 127 and 191 respectively.
Step S23. The batch count 4 and the start address of the data of each batch in the control scheme are combined to generate the control coding (4.0.63.127.191, where 4 indicates division into 4 batches, 0 is the start address of the data of the 1st batch, 63 is the start address of the data of the 2nd batch, and so on; it should be understood that the control coding may also be generated in other ways), which is used to direct the modules to jointly complete the activation task.
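As an illustration only, the derivation of such a control coding can be sketched in a few lines; the function name is assumed, the address sequence 0, 63, 127, 191 is reproduced verbatim from the example above, and how the division would round for a bandwidth that is not an exact multiple of the unit width is left open by the patent:

```python
def make_control_coding(input_bandwidth_bits: int, unit_bits: int = 64) -> str:
    """Sketch of steps S22-S23: derive the batch count and start addresses.

    For the example above, a 256-bit input and eight 8-bit ReLU units
    (64 bits per pass) give 4 batches, encoded as "4.0.63.127.191".
    """
    num_batches = input_bandwidth_bits // unit_bits        # 256 // 64 == 4
    # Start addresses as listed in the example: 0, then 63, 127, 191.
    addresses = [0] + [b * unit_bits - 1 for b in range(1, num_batches)]
    return f"{num_batches}." + ".".join(str(a) for a in addresses)

assert make_control_coding(256, 64) == "4.0.63.127.191"
```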
With reference to Fig. 4, the time-division-multiplexed ReLU function activation of neurons by the device is described in detail below, including the steps:
Step S31. According to the control coding generated in steps S21-S23, the corresponding batch of neurons to be activated is loaded from the activation buffer unit, and the 64 bits of data are delivered, one value each, to the eight ReLU activation arithmetic units;
Step S32. Each ReLU activation arithmetic unit executes the ReLU-oriented activation operation, and the activation results are spliced into 64-bit data and transmitted to the external receiving unit through the output interface unit;
Step S33. The control module continues to parse the coding. When the parsing is finished (no subsequent batch of data remains to be activated, i.e. the neurons of all four 64-bit-bandwidth batches have been completely activated), this round of activation ends; while the control coding is not yet fully parsed, the next batch of neurons is loaded from the buffer unit to the activation arithmetic units according to the start address in the coding and activated, and the process repeats until the parsing of the coding is complete.
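Continuing the sketch, the decode-and-execute loop of steps S31-S33 might look as follows; the dict-based buffer and the list of results are stand-ins for the neuron buffer unit and the output interface, and relu8_sign_magnitude is the helper sketched after the Fig. 3 discussion:

```python
def run_activation(coding: str, buffer: dict, activate, output: list) -> None:
    """Sketch of steps S31-S33: parse the control coding, activate in batches.

    `buffer` maps a batch start address to its eight 8-bit neuron values,
    `activate` is the per-neuron activation function, and one 64-bit result
    (eight values) is appended to `output` per pass.
    """
    fields = coding.split(".")
    batch_count, addresses = int(fields[0]), [int(f) for f in fields[1:]]
    assert len(addresses) == batch_count
    for addr in addresses:                       # S31: load one batch per pass
        batch = buffer[addr]                     # eight 8-bit neurons, 64 bits
        results = [activate(n) for n in batch]   # S32: the eight ReLU units
        output.append(results)                   # spliced back into 64 bits
    # S33: every encoded batch parsed; this round of activation ends and the
    # device pauses until the next burst of neurons to be activated arrives.

# Four batches of eight sign-magnitude values each, per the coding above.
buffer = {addr: [0b0_0000101] * 8 for addr in (0, 63, 127, 191)}
results: list = []
run_activation("4.0.63.127.191", buffer, relu8_sign_magnitude, results)
assert len(results) == 4
```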
It should be noted that the input and output bandwidths of each functional unit of the neural network activation device proposed by the present invention can be adjusted according to application requirements, so that while the temporal utilization of activation is improved, the reduced bandwidth and arithmetic-unit scale still guarantee the continuity of the activation processing and transmission of the neuron data.
According to one embodiment of the present invention, the start and pause process of the ReLU function activation device includes: when the number of neurons that the external input unit needs to activate meets the activation requirement, the external functional module inputs an activation signal to the activation-signal input terminal of the activation device provided by the invention; the input interface unit then opens its protocol in preparation for data transmission with the external input unit, the control unit enables each unit, and activation starts. When the activation of the input neurons is complete, the control unit blocks each functional unit and activation pauses.
It will be appreciated that, since the input of neurons to be activated is discontinuous and the amount of data in a single input is large, an activation arithmetic unit of the same scale as the corresponding data, as used in the prior art, also operates discontinuously, so that the activation arithmetic unit sits idle while data are in transmission. The present invention uses small-scale arithmetic units, equipped with a buffer unit that caches the large amount of input data, and then transmits the buffered data in batches to the specified arithmetic units, activating the input neurons batch by batch and thereby putting the formerly idle time to use. This batch processing is referred to in the present invention as the time-division multiplexing method.
It should be noted that in the ReLU-function-oriented neural network activation device proposed by the present invention, the corresponding internal buffer unit can store all the neurons that the input interface unit receives in each pass.
When batch-processing the neurons in the buffer unit, the controller divides the neurons into batches according to the number of computing elements inside the activation computing module, and, when dispatching the data in batches, encodes the occupancy of the computing units by the neuron data of each batch, so as to guarantee the accuracy of the data division.
The time-division-multiplexing-based neural network activation device corresponds in this way to the activation process of the neurons in the neural network.
In conclusion the present invention provides a kind of activation device for neural network processor, use at least one Arithmetic element is activated, and the maximum amount of data that all activation arithmetic element can be handled simultaneously is less than or equal to the input tape It is wide;When in use, activate the activation control unit in device according to the processing energy of input bandwidth and the activation arithmetic element Relationship between power carries out batch processing to the neuron to be activated inputted based on the input bandwidth, and controlling will be every The neuron to be activated of a batch is provided to each activation arithmetic element to be handled into line activating respectively.The activation dress as a result, Setting can be using the activation arithmetic element with relatively simple circuit structure, in a time multiplexed manner to swashing Arithmetic element living realizes multiplexing, avoids the idle of activation arithmetic element.Meanwhile it is opposite using circuit structure according to the present invention Simple activation arithmetic element can reduce the consumption to resource.
In addition, the present invention also provides a kind of special circuit construction for ReLU activation primitives, uses and simply take Not element and gating element can realize the function of ReLU activation primitives.Tradition ReLU is replaced with less circuit resource expense Structure makes the hardware resource of ReLU functions activation operation reduce.
It should be noted that each step introduced in above-described embodiment is all not necessary, those skilled in the art Can carry out according to actual needs it is appropriate accept or reject, replace, modification etc..
It should be noted last that the above examples are only used to illustrate the technical scheme of the present invention and are not limiting.On although Text is described the invention in detail with reference to embodiment, it will be understood by those of ordinary skill in the art that, to the skill of the present invention Art scheme is modified or replaced equivalently, and without departure from the spirit and scope of technical solution of the present invention, should all be covered at this In the right of invention.

Claims (10)

1. An activation device for a neural network processor, comprising:
at least one activation arithmetic unit, an activation control unit, an input interface and an output interface;
wherein the maximum amount of data that the activation arithmetic unit can process at once is less than or equal to the amount of pending data input to the activation device in a single pass;
and the activation control unit is connected to the activation arithmetic unit and is configured to control, according to the relationship between the amount of pending data input to the activation device in a single pass and the processing capacity of the activation arithmetic unit, the activation arithmetic unit to perform batch activation processing on the neurons to be activated received in one pass from outside the activation device through the input interface, and to output the results of the activation processing out of the activation device through the output interface.
2. The activation device according to claim 1, wherein the activation arithmetic unit is a ReLU activation arithmetic unit, or an activation arithmetic unit for the sigmoid function or the tanh function.
3. The activation device according to claim 2, wherein at least one of the activation arithmetic units is a ReLU activation computing circuit, comprising:
at least one inverting element and N-1 gating elements;
wherein N is the number of bits of the neuron value to be activated; the inverting element takes as input the 1-bit sign bit of the N-bit neuron value to be activated, and its output is connected to the control bit of each of the N-1 gating elements; each of the N-1 gating elements takes as input a respective one of the remaining N-1 value bits of the neuron value to be activated;
and the ReLU activation computing circuit takes as its output the 1-bit sign bit of the neuron value to be activated together with the output of each of the N-1 gating elements.
4. The activation device according to any one of claims 1-3, wherein the amount of pending data input to the activation device in a single pass is the input bandwidth of the neurons to be activated of the neural network processor, or the bit width of the input interface of the activation device.
5. The activation device according to claim 4, wherein the activation control unit is further configured to control the input interface to start and pause the reception of neurons to be activated according to the input bandwidth of the neurons to be activated of the neural network processor.
6. The activation device according to any one of claims 1-3, wherein the number of batches into which the activation control unit divides the neurons to be activated, received in one pass from outside the activation device through the input interface, for batch activation processing is equal to the amount of pending data input to the activation device in a single pass divided by the maximum amount of data that all of the activation arithmetic units can process at once, rounded to an integer.
7. An activation method for a neural network processor, the neural network processor comprising at least one activation arithmetic unit, the maximum amount of data that the activation arithmetic unit can process at once being less than or equal to the amount of neuron data to be activated that the neural network processor generates in a single pass, the method comprising:
1) partitioning into batches the neurons to be activated that the neural network processor generates in a single pass, according to the relationship between the amount of that data and the processing capacity of the activation arithmetic unit;
2) according to the result of the partitioning, feeding the neurons to be activated of each batch in turn to the activation arithmetic unit for activation processing.
8. The method according to claim 7, wherein the number of batches into which the neurons to be activated generated by the neural network processor in a single pass are partitioned is equal to the amount of that data divided by the maximum amount of data that all of the activation arithmetic units can process at once, rounded to an integer.
9. A computer-readable storage medium in which a computer program is stored, the computer program, when executed, implementing the method according to any one of claims 7-8.
10. A system for activating neurons to be activated in a neural network, comprising:
a neural network processor and a storage device,
wherein the neural network processor comprises at least one activation arithmetic unit, the maximum amount of data that the activation arithmetic unit can process at once being less than or equal to the amount of neuron data to be activated that the neural network processor generates in a single pass;
and the storage device stores a computer program which, when executed by the processor, implements the method according to any one of claims 7-8.
CN201810038612.8A 2018-01-16 2018-01-16 Activation device and method for neural network processor Active CN108345934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810038612.8A CN108345934B (en) 2018-01-16 2018-01-16 Activation device and method for neural network processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810038612.8A CN108345934B (en) 2018-01-16 2018-01-16 Activation device and method for neural network processor

Publications (2)

Publication Number Publication Date
CN108345934A true CN108345934A (en) 2018-07-31
CN108345934B CN108345934B (en) 2020-11-03

Family

ID=62960758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810038612.8A Active CN108345934B (en) 2018-01-16 2018-01-16 Activation device and method for neural network processor

Country Status (1)

Country Link
CN (1) CN108345934B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160342890A1 (en) * 2015-05-21 2016-11-24 Google Inc. Batch processing in a neural network processor
CN106127302A * 2016-06-23 2016-11-16 杭州华为数字技术有限公司 Circuit for processing data, image processing system, and method and apparatus for processing data
CN106203621A * 2016-07-11 2016-12-07 姚颂 Processor for convolutional neural network computation
CN106940815A * 2017-02-13 2017-07-11 西安交通大学 Programmable convolutional neural network coprocessor IP core
CN107480782A * 2017-08-14 2017-12-15 电子科技大学 On-chip learning neural network processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
申忠如 (Shen Zhongru): 《数字电子技术基础》 (Fundamentals of Digital Electronic Technology), Xi'an Jiaotong University Press, 31 August 2010 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866595A (en) * 2018-08-28 2020-03-06 北京嘉楠捷思信息技术有限公司 Method, device and circuit for operating activation function in integrated circuit
CN110866595B (en) * 2018-08-28 2024-04-26 嘉楠明芯(北京)科技有限公司 Method, device and circuit for operating activation function in integrated circuit
CN109190756A * 2018-09-10 2019-01-11 中国科学院计算技术研究所 Arithmetic device based on Winograd convolution and neural network processor comprising the same
CN109754071A * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Activation operation method and device, electronic equipment and readable storage medium
CN109754071B (en) * 2018-12-29 2020-05-05 中科寒武纪科技股份有限公司 Activation operation method and device, electronic equipment and readable storage medium
CN110610235A (en) * 2019-08-22 2019-12-24 北京时代民芯科技有限公司 Neural network activation function calculation circuit
CN110610235B (en) * 2019-08-22 2022-05-13 北京时代民芯科技有限公司 Neural network activation function calculation circuit
CN113378149A (en) * 2021-06-10 2021-09-10 青岛海洋科学与技术国家实验室发展中心 Artificial intelligence-based two-way mobile communication identity verification method and system
CN113378149B (en) * 2021-06-10 2022-06-03 青岛海洋科学与技术国家实验室发展中心 Artificial intelligence-based two-way mobile communication identity verification method and system

Also Published As

Publication number Publication date
CN108345934B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN108345934A (en) A kind of activation device and method for neural network processor
Zhou et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing
CN109858620B (en) Brain-like computing system
CN106529670B Neural network processor based on weight compression, design method, and chip
CN109753751B (en) MEC random task migration method based on machine learning
CN106650924B Processor based on time-dimension and space-dimension data stream compression, and design method
CN108665059A Convolutional neural network acceleration system based on field programmable gate array
CN110163016B (en) Hybrid computing system and hybrid computing method
CN110222760B (en) Quick image processing method based on winograd algorithm
CN109478144A Data processing device and method
Wang et al. Deep spiking neural networks with binary weights for object recognition
CN105159148A (en) Robot instruction processing method and device
CN107508698B Software-defined service reorganization method based on content awareness and weighted graphs in fog computing
Gao et al. Deep neural network task partitioning and offloading for mobile edge computing
CN108304925A Pooling computing device and method
CN111831358B (en) Weight precision configuration method, device, equipment and storage medium
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
Dinelli et al. MEM-OPT: A scheduling and data re-use system to optimize on-chip memory usage for CNNs on-board FPGAs
CN116957698A Electricity price prediction method based on an improved temporal pattern attention mechanism
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN114169506A (en) Deep learning edge computing system framework based on industrial Internet of things platform
CN109767002A Neural network acceleration method based on multi-FPGA collaborative processing
CN111831356B (en) Weight precision configuration method, device, equipment and storage medium
CN109542513B (en) Convolutional neural network instruction data storage system and method
Zou et al. A scatter-and-gather spiking convolutional neural network on a reconfigurable neuromorphic hardware

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant