CN109670581A - Computing device and board - Google Patents
- Publication number: CN109670581A (application CN201811579542.3A)
- Authority: CN (China)
- Prior art keywords: data, input, result, operator, circuit
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a computing device and a board. The computing device is used to execute LSTM operations. The board includes a memory device, an interface device, a control device, and a neural network chip, and the neural network chip includes the computing device. The memory device is used to store data; the interface device is used to transfer data between the chip and external equipment; the control device is used to monitor the state of the chip. The computing device provided by the application has the advantage of low power consumption.
Description
Technical field
This application relates to the technical field of information processing, and in particular to a computing device and a board.
Background
A long short-term memory (LSTM) network is a type of recurrent neural network (RNN). Owing to its distinctive structural design, an LSTM is well suited to processing and predicting important events in a time series that are separated by very long intervals and delays. Compared with a conventional recurrent neural network, an LSTM network performs better; it is well suited to learning from experience, so that it can classify, process, and predict time series when there are time lags of unknown size between important events. At present, LSTM networks are widely used in many fields such as speech recognition, video description, machine translation, and automatic music synthesis.
Existing LSTM networks are implemented on general-purpose processors, and executing LSTM operations on such processors consumes a large amount of energy.
Summary of the invention
Embodiments of the application provide a computing device and related products that can increase the processing speed of an LSTM and reduce power consumption.
In a first aspect, a computing device is provided for executing LSTM operations. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate. The computing device includes an operation unit, a controller unit, and a storage unit.
The storage unit is configured to store an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t.
The controller unit is configured to obtain the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send them to the operation unit.
The operation unit is configured to execute, according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate to obtain the output result of each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the operation unit includes a main processing circuit and slave processing circuits.
The controller unit is specifically configured to construct, according to the LSTM operator, multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator.
The main processing circuit is specifically configured to reorder the input data X_t, the weight data, and the input state value according to the sort operators, where the weight data includes the weight data of each gate; then, according to the split operators, to broadcast the weight data of each gate and the multiplication operator to the slave processing circuits, to split the input data and the input state value into multiple input data blocks and multiple input state data blocks, and to distribute the input data blocks and the input state data blocks to the slave processing circuits.
The slave processing circuits are configured to multiply, according to the multiplication operator, the input data blocks by the weight data of each gate to obtain an intermediate result for each gate, to multiply the input state data blocks by the weight data of each gate to obtain a state intermediate result for each gate, and to send the intermediate results and the state intermediate results of each gate to the main processing circuit.
The main processing circuit is configured to sort the intermediate results of each gate according to the sort operators to obtain a sorted result for each gate, and to apply a bias operation to the sorted result of each gate according to the addition operator to obtain an operation result for each gate; to sort the state intermediate results of each gate according to the sort operators to obtain a state sorted result for each gate, and to apply a bias operation to the state sorted result of each gate according to the addition operator to obtain a state operation result for each gate; and, according to the addition operator, to add the operation result of each gate to the corresponding state operation result and then apply subsequent processing to obtain the output result of each gate.
Optionally, the main processing circuit is specifically configured to multiply, according to the multiplication operator, the input state value C_{t-1} by the output result f_t of the forget gate to obtain a first result, to multiply the output result g_t of the update state gate by the output result i_t of the input gate to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
Optionally, the main processing circuit is specifically configured to apply an activation operation to the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output result O_t of the output gate by the activation result to obtain the output result h_t.
Optionally, the subsequent processing specifically includes:
for the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
for the update state gate, the subsequent processing is the tanh activation function.
Optionally, the main processing circuit is further configured to use the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
Optionally, where there are multiple slave processing circuits, the operation unit includes a tree module. The tree module includes one root port and multiple branch ports; the root port of the tree module is connected to the main processing circuit, and each branch port of the tree module is connected to one of the slave processing circuits.
The tree module is configured to forward data and operators between the main processing circuit and the slave processing circuits.
Optionally, where there are multiple slave processing circuits, the operation unit further includes one or more branch processing circuits, and each branch processing circuit is connected to at least one slave processing circuit.
The branch processing circuits are configured to forward data and operators between the main processing circuit and the slave processing circuits.
Optionally, where there are multiple slave processing circuits, the slave processing circuits are arranged in an array. Each slave processing circuit is connected to the adjacent slave processing circuits, and the main processing circuit is connected to k of the slave processing circuits. The k slave processing circuits are: the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1.
The k slave processing circuits are configured to forward data and operators between the main processing circuit and the other slave processing circuits.
Optionally, the main processing circuit includes a conversion processing circuit.
The conversion processing circuit is configured to apply conversion processing to data, specifically: to convert the data received by the main processing circuit between a first data structure and a second data structure.
Optionally, each slave processing circuit includes a multiplication processing circuit and an accumulation processing circuit.
The multiplication processing circuit is configured to multiply the element values in a received input data block by the element values at the corresponding positions in the weight of each gate to obtain products for each gate, and to multiply the element values in a received input state data block by the element values at the corresponding positions in the weight of each gate to obtain further products for each gate.
The accumulation processing circuit is configured to accumulate the products for each gate to obtain the intermediate result of each gate, and to accumulate the further products for each gate to obtain the state intermediate result of each gate.
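The two stages of a slave processing circuit described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit itself: the function names `multiply_stage`, `accumulate_stage`, and `slave_intermediate` are hypothetical, and the weight of one gate is assumed to be stored as a list of rows.

```python
def multiply_stage(block, weight_row):
    """Multiplication processing circuit: element-wise products of one
    input block with the weight elements at the corresponding positions."""
    return [x * w for x, w in zip(block, weight_row)]


def accumulate_stage(products):
    """Accumulation processing circuit: running sum of the products."""
    total = 0.0
    for p in products:
        total += p
    return total


def slave_intermediate(block, weight):
    """Intermediate result of one gate: one accumulated value per weight row."""
    return [accumulate_stage(multiply_stage(block, row)) for row in weight]
```

Under the same assumptions, the state intermediate result would be obtained by calling `slave_intermediate` with an input state data block instead of an input data block.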
Optionally, the tree module has an n-ary tree structure, where n is an integer greater than or equal to 2.
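As a rough illustration of how an n-ary tree relates to the number of slave processing circuits, the sketch below computes how many forwarding levels a tree needs so that its leaves can reach every slave. It assumes, beyond what the text states, that every node fans out to exactly n children; the function name `tree_levels` is hypothetical.

```python
def tree_levels(num_slaves, n=2):
    """Smallest number of levels L such that n**L >= num_slaves,
    assuming every node of the tree module has exactly n children."""
    if n < 2:
        raise ValueError("n must be an integer greater than or equal to 2")
    levels, reach = 0, 1
    while reach < num_slaves:
        reach *= n      # one more level multiplies the reachable leaves by n
        levels += 1
    return levels
```

With a binary tree (n = 2), eight slave processing circuits would need three levels under this assumption; a wider fan-out reduces the depth at the cost of more ports per node.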
In a second aspect, an embodiment of the application provides an LSTM operation device. The LSTM operation device includes one or more of the computing devices provided in the first aspect, and is configured to obtain data to be operated on and control information from other processing devices, to execute the specified LSTM operation, and to pass the execution result to the other processing devices through an I/O interface.
When the LSTM operation device includes multiple computing devices, the computing devices can be connected to one another and can transfer data through a specific structure.
Specifically, the computing devices are interconnected and transfer data through a Peripheral Component Interconnect Express (PCIe) bus, to support larger-scale LSTM operations. The computing devices share one control system or have their own control systems; the computing devices share memory or have their own memories; and the computing devices may be interconnected in any interconnection topology.
In a third aspect, a combined processing device is provided. The combined processing device includes the LSTM operation device of the second aspect, a universal interconnection interface, and other processing devices.
The LSTM operation device interacts with the other processing devices to jointly complete the computing operation specified by the user.
In a fourth aspect, a neural network chip is provided. The neural network chip includes the computing device provided in the first aspect, the LSTM operation device provided in the second aspect, or the combined processing device provided in the third aspect.
In a fifth aspect, an electronic device is provided. The electronic device includes the chip provided in the fourth aspect.
In a sixth aspect, a board is provided. The board includes a memory device, an interface device, a control device, and the neural network chip provided in the fourth aspect.
The neural network chip is connected to the memory device, the control device, and the interface device respectively.
The memory device is used to store data.
The interface device is used to transfer data between the chip and external equipment.
The control device is used to monitor the state of the chip.
In a seventh aspect, an embodiment of the application further provides an LSTM operation method. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit stores an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t.
The method includes the following steps:
The controller unit obtains the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and sends them to the operation unit.
The operation unit executes, according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate to obtain the output result of each gate, and obtains the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
In some embodiments, the electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, still camera, video camera, projector, watch, earphone, mobile storage device, wearable device, vehicle, household appliance, and/or medical device.
In some embodiments, the vehicle includes an aircraft, a ship, and/or a road vehicle; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance scanner, a B-mode ultrasound scanner, and/or an electrocardiograph.
Brief description of the drawings
To describe the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an LSTM.
Fig. 2 is a schematic structural diagram of a computing device provided by an embodiment of the application.
Fig. 2a is a schematic structural diagram of an operation unit provided by an embodiment of the application.
Fig. 3 is a schematic structural diagram of another computing device provided by the application.
Fig. 3a is a schematic structural diagram of a main processing circuit provided by the application.
Fig. 4a is a schematic structural diagram of the transmitting end of a tree module provided by the application.
Fig. 4b is a schematic structural diagram of the receiving end of a tree module provided by the application.
Fig. 4c is a schematic diagram of a binary tree structure provided by the application.
Fig. 5 is a structural diagram of a computing device provided by one embodiment of the application.
Fig. 6 is a flow diagram of an LSTM operation method provided by one embodiment of the application.
Fig. 7 is a structural diagram of a combined processing device provided by an embodiment of the application.
Fig. 8 is a structural diagram of another combined processing device provided by an embodiment of the application.
Fig. 9 is a schematic structural diagram of a board provided by an embodiment of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application without creative effort fall within the protection scope of the application.
The terms "first", "second", "third", "fourth", and so on in the description, claims, and drawings of the application are used to distinguish different objects, not to describe a particular order. The terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
A reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearance of the phrase in various places in the description does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an LSTM. As shown in Fig. 1, the LSTM includes an input gate, a forget gate, an update state unit (referred to elsewhere in this document as the update state gate), and an output gate. The corresponding formulas are:
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
g_t = \tanh(W_g [h_{t-1}, x_t] + b_g)
C_t = C_{t-1} \odot f_t + g_t \odot i_t
O_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = O_t \odot \tanh(C_t)
Here x_t is the input data at time t (written X_t elsewhere in this document), and h_{t-1} is the output data at time t-1. W_f, W_i, W_g, and W_o are the weight vectors of the forget gate, the input gate, the update state unit, and the output gate, respectively, and b_f, b_i, b_g, and b_o are the corresponding biases. f_t is the output of the forget gate; it is multiplied element-wise with the state unit of time t-1 to selectively forget past state values. i_t is the output of the input gate; it is multiplied element-wise with the candidate state value obtained at time t to selectively add that candidate state value to the update state unit. g_t is the candidate state value computed at time t. C_t is the new state value obtained by selectively forgetting the state value of time t-1 and selectively adding the state value of time t; C_t is used to compute the final output of the current time and is passed on to the next time. O_t is the selection condition that determines which part of the update state unit is output as the result at time t. h_t is the output at time t, and it is also passed on to the next time (time t+1). \odot denotes the element-wise product of vectors. \sigma is the sigmoid function, computed as
\sigma(x) = \frac{1}{1 + e^{-x}}
and the tanh activation function is computed as
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
In the actual computation, the application combines W_f, W_i, W_g, and W_o into one matrix W, and combines b_f, b_i, b_g, and b_o into one matrix b.
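The six formulas above can be checked with a small software sketch of a single time step. This is a plain Python illustration, not the hardware data path of the application; the function names and the choice to keep the per-gate weights in dictionaries keyed by 'f', 'i', 'g', and 'o' are assumptions made for readability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the formulas for f_t, i_t, g_t, C_t, O_t, h_t.

    W[k] has shape hidden x (hidden + input); b[k] has length hidden.
    """
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    pre = {k: [s + bias for s, bias in zip(matvec(W[k], z), b[k])] for k in "figo"}
    f_t = [sigmoid(v) for v in pre["f"]]      # forget gate
    i_t = [sigmoid(v) for v in pre["i"]]      # input gate
    g_t = [math.tanh(v) for v in pre["g"]]    # candidate state value
    o_t = [sigmoid(v) for v in pre["o"]]      # output gate
    c_t = [c * f + g * i for c, f, g, i in zip(c_prev, f_t, g_t, i_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every sigmoid gate outputs 0.5 and the candidate state is 0, so the new state is simply half of the previous state, which makes a convenient sanity check.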
Referring to Fig. 2, Fig. 2 shows a computing device provided by the application. The computing device is used to execute LSTM operations and includes a controller unit 11, an operation unit 12, and a storage unit 10. The controller unit 11 is connected to the operation unit 12 and the storage unit 10. The operation unit 12 includes one main processing circuit 101 and slave processing circuits 102 (there may be one or more slave processing circuits; multiple slave processing circuits are preferred).
Note that the main processing circuit itself includes a memory (for example, a memory or a register) that can store some of the data of the main processing circuit; a slave processing circuit may optionally carry a memory.
The LSTM includes an input gate, a forget gate, an output gate, and an update state gate.
The storage unit 10 is configured to store an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t.
The controller unit 11 is configured to obtain the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send them to the operation unit.
The operation unit 12 is configured to execute, according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate to obtain the output result of each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the controller unit is specifically configured to construct, according to the LSTM operator, multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator.
The main processing circuit is specifically configured to reorder the input data X_t, the weight data, and the input state value according to the sort operators, where the weight data includes the weight data of each gate; then, according to the split operators, to broadcast the weight data of each gate and the multiplication operator to the slave processing circuits, to split the input data and the input state value into multiple input data blocks and multiple input state data blocks, and to distribute the input data blocks and the input state data blocks to the slave processing circuits.
The slave processing circuits are configured to multiply, according to the multiplication operator, the input data blocks by the weight data of each gate to obtain an intermediate result for each gate, to multiply the input state data blocks by the weight data of each gate to obtain a state intermediate result for each gate, and to send the intermediate results and the state intermediate results of each gate to the main processing circuit.
Note that the operations of the gates are relatively independent of one another, as are their results; that is, each gate has its own weight data. For example, W_f, W_i, W_g, and W_o represent the weight data of the four gates.
Multiplying the input data blocks by the weight data of each gate according to the multiplication operator to obtain the intermediate result of each gate may specifically include:
multiplying the input data blocks by the input gate weight data to obtain the intermediate result of the input gate; multiplying the input data blocks by the output gate weight data to obtain the intermediate result of the output gate; multiplying the input data blocks by the forget gate weight data to obtain the intermediate result of the forget gate; and multiplying the input data blocks by the update state gate weight data to obtain the intermediate result of the update state gate. The state intermediate results of each gate are obtained in the same way and are not described again here.
The main processing circuit is configured to sort the intermediate results of each gate according to the sort operators to obtain a sorted result for each gate, and to apply a bias operation to the sorted result of each gate according to the addition operator to obtain an operation result for each gate; to sort the state intermediate results of each gate according to the sort operators to obtain a state sorted result for each gate, and to apply a bias operation to the state sorted result of each gate according to the addition operator to obtain a state operation result for each gate; and, according to the addition operator, to add the operation result of each gate to the corresponding state operation result and then apply subsequent processing to obtain the output result of each gate.
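The distribute, multiply, sort, and bias flow described above can be sketched for one gate as follows. This is only one possible reading of the text: it assumes the input data blocks are whole vectors dealt out round-robin to the slave processing circuits, and that "sorting" means restoring the original order of the results. The function names `split_blocks`, `slave_multiply`, and `main_collect` are hypothetical.

```python
def split_blocks(vectors, num_slaves):
    """Main circuit: tag each vector with its position and deal the
    vectors out round-robin, one list of (index, vector) per slave."""
    blocks = [[] for _ in range(num_slaves)]
    for idx, vec in enumerate(vectors):
        blocks[idx % num_slaves].append((idx, vec))
    return blocks


def slave_multiply(weight, block):
    """Slave circuit: multiply every vector in its block by the
    broadcast weight of one gate."""
    return [(idx, [sum(w * v for w, v in zip(row, vec)) for row in weight])
            for idx, vec in block]


def main_collect(weight, bias, vectors, num_slaves):
    """Main circuit: distribute, gather, sort back into order, add the bias."""
    results = []
    for block in split_blocks(vectors, num_slaves):
        results.extend(slave_multiply(weight, block))  # arrives out of order
    results.sort(key=lambda item: item[0])             # sort step
    return [[r + b for r, b in zip(res, bias)] for _, res in results]
```

Under the same assumptions, the state operation result would come from a second call to `main_collect` on the input state data blocks, and the two results would then be added and passed through sigmoid or tanh as the subsequent processing.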
The technical solution provided by the application arranges the operation unit in a master-slave structure. For the forward operation of the LSTM, the input data of the current time step and the output data of the forget gate are split and processed in parallel, so that the main processing circuit and the slave processing circuits can operate in parallel on the computation-heavy parts. This increases operation speed, saves operation time, and in turn reduces power consumption.
Optionally, the main processing circuit is specifically configured to multiply, according to the multiplication operator, the input state value C_{t-1} by the output result f_t of the forget gate to obtain a first result, to multiply the output result g_t of the update state gate by the output result i_t of the input gate to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
Optionally, the main processing circuit is specifically configured to apply an activation operation to the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output result O_t of the output gate by the activation result to obtain the output result h_t.
Optionally, the subsequent processing specifically includes:
for the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
for the update state gate, the subsequent processing is the tanh activation function.
Optionally, the main processing circuit is further configured to use the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
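The hand-off of h_t and C_t to the next time step can be illustrated with a short loop. The sketch below is hypothetical: `toy_step` is a scalar stand-in with all weights fixed to 1 and all biases to 0, chosen only so that the example is self-contained, and `run_sequence` accepts any step function with the same signature.

```python
import math


def toy_step(x_t, h_prev, c_prev):
    """Hypothetical scalar LSTM step: all weights are 1 and all biases are 0,
    so the three sigmoid gates share one value s."""
    s = 1.0 / (1.0 + math.exp(-(h_prev + x_t)))
    g = math.tanh(h_prev + x_t)          # candidate state value
    c_t = c_prev * s + g * s             # C_t = C_{t-1} * f_t + g_t * i_t
    h_t = s * math.tanh(c_t)             # h_t = O_t * tanh(C_t)
    return h_t, c_t


def run_sequence(step, xs, h0=0.0, c0=0.0):
    """Feed a sequence through `step`, reusing each h_t and C_t as the
    input result and input state value of the following time step."""
    h, c = h0, c0
    outputs = []
    for x_t in xs:
        h, c = step(x_t, h, c)
        outputs.append(h)
    return outputs, c
```

Passing the step function as a parameter keeps the recurrence separate from the gate arithmetic, so the same loop would work with a full vector-valued step.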
The LSTM may include multiple hidden layers, where h is an integer greater than or equal to 2, and the h-th hidden layer may be any intermediate hidden layer of the LSTM. Multiple LSTM operations are carried out as follows. In the forward operation, after the previous time step t-1 has finished executing and output result t-1 has been obtained, the operation operator of the current time step t uses output result t-1 of the previous time step as the input data of the forget gate. The forget gate uses a sigmoid to determine the pass rate of the output result of the previous time step, which yields the output result of the forget gate at time t. This output result is operated on with the weights. In the other part of the operation, the input data of the input layer at time t serves as another part of the input neurons. The two parts of input neurons are each multiplied by the weights to obtain two operation results, and the two operation results are accumulated to obtain the output result of time t. The output result of time t is then used as the input data of the forget gate at the next time step t+1, so that the pass rate of the result of the previous time step can be determined selectively.
Optionally, the computing device may further include a direct memory access unit 50. The storage unit 10 may include one of a register and a cache, or any combination of them. Specifically, the cache is used to store calculation operators, the register is used to store the input data and scalars, and the cache is a scratchpad cache. The direct memory access unit 50 is used to read data from or store data to the storage unit 10.
Optionally, the controller unit includes an operator storage unit 110, an operator processing unit 111, and a storage queue unit 113.
The operator storage unit 110 is configured to store the calculation operators associated with the LSTM operation.
The operator processing unit 111 is configured to parse a calculation operator to obtain multiple operation operators.
The storage queue unit 113 is configured to store an operator queue, which contains multiple operation operators or calculation operators to be executed in the order of the queue.
Optionally, the controller unit may further include a dependency processing unit 108.
The dependency processing unit 108 is configured, when there are multiple operation operators, to determine whether a first operation operator is associated with a zeroth operation operator that precedes it. If the first operation operator is associated with the zeroth operation operator, the first operation operator is buffered in the operator storage unit, and after the zeroth operation operator has finished executing, the first operation operator is fetched from the operator storage unit and sent to the operation unit.
Determining whether the first operation operator is associated with the preceding zeroth operation operator includes:
extracting, according to the first operation operator, a first storage address interval of the data (for example, a matrix) required by the first operation operator, and extracting, according to the zeroth operation operator, a zeroth storage address interval of the matrix required by the zeroth operation operator. If the first storage address interval overlaps the zeroth storage address interval, the first operation operator and the zeroth operation operator are determined to be associated; if the first storage address interval does not overlap the zeroth storage address interval, the first operation operator and the zeroth operation operator are determined not to be associated.
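The address-interval test used by the dependency processing unit amounts to an interval overlap check, which can be sketched as follows. The `Operator` record, its field names, and the use of half-open intervals [start, end) are assumptions made for this example; the text does not specify how the intervals are encoded.

```python
from collections import namedtuple

# Hypothetical record: an operator name and the half-open address
# interval [start, end) of the data it needs.
Operator = namedtuple("Operator", ["name", "start", "end"])


def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if [a_start, a_end) and [b_start, b_end) share any address."""
    return a_start < b_end and b_start < a_end


def has_dependency(first_op, zeroth_op):
    """The first operator depends on the zeroth if their intervals overlap."""
    return intervals_overlap(first_op.start, first_op.end,
                             zeroth_op.start, zeroth_op.end)


def issue_decision(zeroth_op, first_op):
    """Scheduling decision for first_op given an earlier zeroth_op."""
    if has_dependency(first_op, zeroth_op):
        return "buffer until " + zeroth_op.name + " completes"
    return "send to operation unit"
```

With half-open intervals, two operators whose address ranges merely touch (one ends at address 100, the other starts at 100) are treated as independent; a closed-interval encoding would need `<=` in the comparison instead.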
In another optional embodiment, the operation unit 12, as shown in Fig. 3, may include one main processing circuit 101 and multiple slave processing circuits 102. In one embodiment, as shown in Fig. 3, the slave processing circuits are arranged in an array. Each slave processing circuit is connected to the adjacent slave processing circuits, and the main processing circuit is connected to k of the slave processing circuits. The k slave processing circuits are: the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1. Note that the k slave processing circuits shown in Fig. 3 include only the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1; that is, the k slave processing circuits are the slave processing circuits that are directly connected to the main processing circuit.
K is a from processing circuit, in the main process task circuit and multiple data (data between processing circuit
Can be input block, input state data block, intermediate result, state intermediate result etc.) and operator forwarding.
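To make the wiring rule concrete, the sketch below enumerates which positions of an m-by-n array are directly connected to the main processing circuit. The function name and the 1-based (row, column) coordinates are assumptions for illustration. The text does not say how corner circuits, which belong to both a row and column 1, are counted, so this sketch simply counts each position once.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """Return 1-based (row, column) positions wired to the main circuit.

    Covers row 1, row m, and column 1 of an m-by-n slave array.
    Using a set means shared corner positions are counted once
    (an assumption of this sketch).
    """
    first_row = {(1, col) for col in range(1, n + 1)}
    last_row = {(m, col) for col in range(1, n + 1)}
    first_col = {(row, 1) for row in range(1, m + 1)}
    return first_row | last_row | first_col
```

For a 4-by-3 array this yields 8 distinct positions; interior circuits such as (2, 2) reach the main processing circuit only through their neighbours.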
Optionally, as shown in Fig. 3a, the main processing circuit may further include one of, or any combination of, a conversion processing circuit 110, an activation processing circuit 111, and an addition processing circuit 112.
The conversion processing circuit 110 performs conversion on data. Specifically, it converts the data received by the main processing circuit (including but not limited to the input data X_t, the weight data (the weights of each gate), the input state value C_{t-1}, and the input result h_{t-1}) between a first data structure and a second data structure (for example, between continuous data and discrete data, or between floating-point data and fixed-point data).
The activation processing circuit 111 performs activation operations on data in the main processing circuit.
The addition processing circuit 112 performs addition or accumulation operations.
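As one illustration of the kind of conversion the conversion processing circuit 110 might perform, the sketch below converts between floating-point values and signed fixed-point integers. The text does not specify a fixed-point format; the 16-bit width, the 8 fractional bits, the saturation behaviour, and both function names are assumptions.

```python
def float_to_fixed(value: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer, saturating on overflow.

    The Q-format (total_bits wide, frac_bits fractional) is hypothetical.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def fixed_to_float(raw: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)
```

With 8 fractional bits the resolution is 1/256, so values such as 1.5 survive the round trip exactly while out-of-range values clamp to the largest representable magnitude.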
In another embodiment, the operation operator is a computation operator such as a matrix-by-matrix multiplication operator, an accumulation operator, or an activation operator.
In a kind of optional embodiment, as shown in fig. 4 a, the arithmetic element includes: tree-shaped module 40, the tree
Pattern block includes: a root port 401 and multiple ports 404, and the root port of the tree-shaped module connects the main process task electricity
Road, multiple ports of the tree-shaped module are separately connected multiple one from processing circuit from processing circuit;
Above-mentioned tree-shaped module has transmission-receiving function, such as shown in fig. 4 a, which is sending function, such as Fig. 4 b
Shown, which is receive capabilities.
The tree-shaped module, for forwarding the main process task circuit and the multiple data between processing circuit (should
Data can be input block, input state data block, intermediate result, state intermediate result etc.).
Optionally, which is the optional as a result, it may include at least 1 node layer, the node of computing device
For the cable architecture with forwarding capability, the node itself can not have computing function.If tree-shaped module has zero layer node, i.e.,
Without the tree-shaped module.
Optionally, the tree module may be an n-ary tree structure, for example the binary tree structure shown in Fig. 4c, or a ternary tree structure, where n may be an integer greater than or equal to 2. The specific embodiments of this application do not limit the value of n. The number of layers may also be 2, and the slave processing circuits may be connected to nodes of layers other than the second-to-last layer, for example to the nodes of the last layer as shown in Fig. 4c.
Optionally, the operation unit may carry a separate cache. As shown in Fig. 2a, it may include a neuron cache unit 63, which caches the input neuron vector data and output neuron value data of the slave processing circuits.
As shown in Fig. 2a, the operation unit may also include a weight cache unit 64, which caches the weight data that the slave processing circuits need during computation.
In an optional embodiment, the operation unit 12, as shown in Fig. 5, may include branch processing circuits 103. The specific connection structure is shown in Fig. 5, where:
The branch processing circuit 103 may include a memory. As shown in Fig. 5, the memory of the branch processing circuit 103 may be sized at 2 to 2.5 times the maximum data capacity that a single slave processing circuit needs to store. With this arrangement the slave processing circuits need no memory of their own: one branch processing circuit needs only 2.5*R of storage (R being the capacity required by a single slave processing circuit), whereas without a branch processing circuit 4*R would be needed, and register utilization would also be lower. This structure therefore reduces the total memory capacity and the cost.
The branch processing circuit forwards data (input blocks, input state data blocks, intermediate results, state intermediate results, and so on) between the main processing circuit and the multiple slave processing circuits.
The way the input data is split is illustrated below with an example (the input state data can be split in the same way as the input data). Because the output result and the input data have the same data type, they are split in essentially the same way. Assume the data type is a matrix of size H*W. If H is small (below a set threshold, for example 100), the H*W matrix is split along the H direction into H vectors (each vector is one row of the H*W matrix). Each vector is one input block, and the block is labeled with the position of its first element, giving input block_{h,w}, where h and w are the positions of the block's first element in the H direction and the W direction respectively. For the first input block, h = 1 and w = 1. After a slave processing circuit receives input block_{h,w}, it multiplies the block element by element with each column of the weight and accumulates the products to obtain intermediate result_{w,i}, where w is the w value of the input block and i is the index of the weight column used in the calculation. The main processing circuit determines that the position of this intermediate result in the operation result of the corresponding gate is (w, i). For example, input block_{1,1} and the first column of the weight yield intermediate result_{1,1}, which the main processing circuit places in the first row, first column of the operation result of the corresponding gate.
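The split-and-reassemble flow above can be sketched in plain Python. All function names are hypothetical. One interpretation is made explicit: the text labels the result position (w, i), but since every row block starts at w = 1, this sketch uses the block's row index h to choose the result row, which matches the worked example for input block_{1,1}.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def split_rows(x: Matrix) -> Dict[Tuple[int, int], List[float]]:
    """Split an H*W matrix into H row blocks.

    Each block is keyed by the 1-based (h, w) position of its first element.
    """
    return {(h, 1): row for h, row in enumerate(x, start=1)}


def slave_multiply(block: List[float], weight: Matrix) -> List[float]:
    """Slave side: multiply-accumulate one input block with every weight column."""
    cols = len(weight[0])
    return [
        sum(block[k] * weight[k][i] for k in range(len(block)))
        for i in range(cols)
    ]


def assemble(x: Matrix, weight: Matrix) -> Matrix:
    """Main side: place each intermediate result by block row and weight column."""
    result = [[0.0] * len(weight[0]) for _ in x]
    for (h, _w), block in split_rows(x).items():
        for i, value in enumerate(slave_multiply(block, weight)):
            result[h - 1][i] = value  # row from the block label, column from i
    return result
```

Because each row block is independent, the blocks could be distributed to different slave processing circuits and computed in parallel before the main processing circuit reassembles them.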
This application also provides an LSTM operation method applied to a computing device. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit stores the LSTM operation operator, the input data X_t, the weight data, the output data h_t, the input state value C_{t-1}, the input result h_{t-1}, and the output state value C_t. The method includes the following steps:
Step S601: the controller unit obtains the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and sends them to the operation unit.
Step S602: the operation unit performs the input gate operation, the forget gate operation, the output gate operation, and the update state gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate, and obtains the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the operation unit includes a main processing circuit and slave processing circuits. The operation unit performing the input gate operation, the forget gate operation, the output gate operation, and the update state gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate specifically includes:
The controller unit constructs multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator according to the LSTM operator.
The main processing circuit reorders the input data X_t, the weight data, and the input state value according to the sort operators, where the weight data includes the weight data of each gate. It then broadcasts the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators, splits the input data and the input state value into multiple input blocks and multiple input state data blocks, and distributes the input blocks and input state data blocks to the slave processing circuits.
The slave processing circuits multiply the input blocks with the weight data of each gate according to the multiplication operator to obtain the intermediate results of each gate, multiply the input state data blocks with the weight data of each gate according to the multiplication operator to obtain the state intermediate results of each gate, and send the intermediate results and state intermediate results of each gate to the main processing circuit.
The main processing circuit sorts the intermediate results of each gate according to the sort operators to obtain the sorted result of each gate, and performs a bias operation on the sorted result of each gate according to the addition operator to obtain the operation result of each gate. It likewise sorts the state intermediate results of each gate according to the sort operators to obtain the state sorted result of each gate, and performs a bias operation on the state sorted result of each gate according to the addition operator to obtain the state operation result of each gate. According to the addition operator, it adds the operation result of each gate to the corresponding state operation result of that gate and then performs subsequent processing to obtain the output result of each gate.
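The per-gate pipeline above (multiply, add bias, add the two partial results, then apply the subsequent processing) can be sketched for a single gate as follows. This is a functional illustration only: it ignores splitting, broadcasting, and sorting, and every name is hypothetical. The second operand `s` stands for what the text calls the input state data; in the standard LSTM formulation that operand is the previous output h_{t-1}. The sigmoid and tanh choices follow the subsequent processing named in the claims.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply-accumulate: what the slave circuits compute, block by block."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate_output(
    w_x: Matrix, x: Vector, b_x: Vector,
    w_s: Matrix, s: Vector, b_s: Vector,
    activation: Callable[[float], float],
) -> Vector:
    """Compute one gate's output result from its two partial results."""
    # Operation result: input-side products plus bias.
    op = [p + b for p, b in zip(matvec(w_x, x), b_x)]
    # State operation result: state-side products plus bias.
    state_op = [p + b for p, b in zip(matvec(w_s, s), b_s)]
    # Add the two results, then apply the gate's subsequent processing.
    return [activation(a + c) for a, c in zip(op, state_op)]
```

The forget, input, and output gates would pass `sigmoid` as the activation, while the update state gate would pass `math.tanh`.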
Optionally, obtaining the output state value C_t according to the input state value C_{t-1} and the output result of each gate specifically includes:
The main processing circuit multiplies the input state value C_{t-1} by the output result f_t of the forget gate according to the multiplication operator to obtain a first result, multiplies the output result g_t of the update state gate by the output result i_t of the input gate according to the multiplication operator to obtain a second result, and adds the first result and the second result to obtain the output state value C_t.
Optionally, obtaining the output data h_t according to the input state value C_{t-1} and the output result of each gate specifically includes:
The main processing circuit performs an activation operation on the output state value C_t according to the activation operator to obtain an activation result, and multiplies the output result O_t of the output gate by the activation result to obtain the output result h_t.
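The two formulas above, C_t = C_{t-1} * f_t + g_t * i_t and h_t = O_t * act(C_t), can be written as a short elementwise sketch. The function name is hypothetical, and the use of tanh as the activation applied to C_t is an assumption: the text only says "activation operation", and tanh is the conventional choice in LSTM cells.

```python
import math
from typing import List, Tuple

Vector = List[float]


def lstm_state_update(
    c_prev: Vector, f_t: Vector, i_t: Vector, g_t: Vector, o_t: Vector
) -> Tuple[Vector, Vector]:
    """Return (C_t, h_t) from the previous state and the four gate outputs."""
    # First result C_{t-1} * f_t plus second result g_t * i_t, elementwise.
    c_t = [c * f + g * i for c, f, g, i in zip(c_prev, f_t, g_t, i_t)]
    # Activation of the new state (tanh assumed), scaled by the output gate.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return c_t, h_t
```

Feeding the returned C_t and h_t back in as the input state value and input result of the next time step gives the recurrence used across a sequence.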
This application also discloses an LSTM device that includes one or more of the computing devices described in this application. The LSTM device obtains the data to be operated on and control information from other processing devices, performs the specified LSTM operation, and passes the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one computing device is included, the computing devices can be linked and transmit data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale convolutional neural network training. In this case the computing devices may share one control system or each have an independent control system, and they may share memory or each accelerator may have its own memory. In addition, they may be interconnected in any interconnection topology.
The LSTM device has high compatibility and can be connected to various types of servers through a PCIE interface.
This application also discloses a combined processing device that includes the above LSTM device, a general interconnect interface, and other processing devices. The LSTM operation device interacts with the other processing devices to jointly complete the operation specified by the user. Fig. 7 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the LSTM operation device and external data and control, including data transfer, and perform basic control of the LSTM operation device such as starting and stopping it. The other processing devices can also cooperate with the LSTM operation device to complete operation tasks jointly.
The general interconnect interface transmits data and control operators between the LSTM device and the other processing devices. The LSTM device obtains the required input data from the other processing devices and writes it to the on-chip storage of the LSTM device. It can obtain control operators from the other processing devices and write them to the on-chip control cache of the LSTM device. It can also read the data in the storage module of the LSTM device and transmit it to the other processing devices.
Optionally, as shown in Fig. 8, the structure may also include a storage device connected to both the LSTM device and the other processing devices. The storage device stores the data of the LSTM device and the other processing devices, and is especially suitable for data required for an operation that cannot be held entirely in the internal storage of the LSTM device or the other processing devices.
The combined processing device can serve as the system on chip (SoC) of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnect interface of the combined processing device is connected to certain components of the device, such as a camera, display, mouse, keyboard, network card, or wifi interface.
In some embodiments, a chip is also claimed, which includes the above LSTM device or combined processing device.
In some embodiments, a chip package structure is claimed, which includes the above chip.
In some embodiments, a board is claimed, which includes the above chip package structure. Referring to Fig. 9, which shows such a board: in addition to the chip 389, the board may include other supporting components, including but not limited to a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. Each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers. Of the 72 bits in each controller, 64 bits are used for data transfer and 8 bits are used for ECC checking. When DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
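The 25600 MB/s figure follows directly from the transfer rate and the data width. The sketch below reproduces the arithmetic; the function name is hypothetical, and only the 64 data bits of the 72-bit interface are counted because the remaining 8 bits carry ECC.

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical bandwidth in MB/s for one memory channel.

    mega_transfers_per_s: transfer rate in MT/s (3200 for DDR4-3200).
    data_bits: payload width per transfer, excluding ECC bits.
    """
    bytes_per_transfer = data_bits / 8
    return mega_transfers_per_s * bytes_per_transfer


# DDR4-3200 with a 64-bit data path: 3200 MT/s * 8 bytes per transfer.
per_group = ddr_bandwidth_mb_s(3200, 64)
```

This is a per-controller peak figure; sustained bandwidth in practice is lower because of refresh, bank conflicts, and command overhead.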
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for the DDR is provided in the chip to control the data transfer and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip package structure. The interface device implements data transfer between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the server transfers the data to be processed to the chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for the transfer, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another type of interface. This application does not limit the specific form of such other interfaces, provided the interface unit can implement the transfer function. In addition, the calculation result of the chip is sent back to the external device (such as a server) by the same interface device.
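The 16000 MB/s figure corresponds to the raw signalling rate of a PCIE 3.0 x16 link: 8 GT/s per lane across 16 lanes. The sketch below reproduces that arithmetic and also shows the slightly lower payload rate once the 128b/130b line encoding used by PCIE 3.0 is taken into account. The function name and parameters are hypothetical, and protocol overhead above the line encoding is not modelled.

```python
def pcie_bandwidth_mb_s(
    giga_transfers_per_s: float, lanes: int, encoding_efficiency: float = 1.0
) -> float:
    """One-direction link bandwidth in MB/s.

    Each transfer carries one bit per lane; encoding_efficiency is the
    fraction of bits that carry payload (128/130 for PCIE 3.0).
    """
    megabits_per_s = giga_transfers_per_s * 1000 * lanes * encoding_efficiency
    return megabits_per_s / 8


raw = pcie_bandwidth_mb_s(8.0, 16)                 # raw rate quoted in the text
effective = pcie_bandwidth_mb_s(8.0, 16, 128 / 130)  # after line encoding
```

The raw value matches the theoretical bandwidth stated above, while the encoded value is roughly 15754 MB/s.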
The control device is electrically connected to the chip and monitors the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. The chip can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is claimed, which includes the above board.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, a ship, and/or a car. The household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood. The medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by this application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several operators that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program that directs the relevant hardware. The program can be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, or the like.
The embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help in understanding the method of this application and its core ideas. At the same time, those of ordinary skill in the art may, following the ideas of this application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting this application.
Claims (29)
1. A computing device, characterized in that the computing device is configured to perform an LSTM operation, the LSTM includes an input gate, a forget gate, an output gate, and an update state gate, and the computing device includes an operation unit, a controller unit, and a storage unit;
the storage unit is configured to store an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t;
the controller unit is configured to obtain the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send them to the operation unit;
the operation unit is configured to perform the input gate operation, the forget gate operation, the output gate operation, and the update state gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
2. The device according to claim 1, characterized in that the operation unit includes a main processing circuit and slave processing circuits;
the controller unit is specifically configured to construct multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator according to the LSTM operator;
the main processing circuit is specifically configured to reorder the input data X_t, the weight data, and the input state value according to the sort operators, the weight data including the weight data of each gate, then to broadcast the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators, to split the input data and the input state value into multiple input blocks and multiple input state data blocks, and to distribute the input blocks and input state data blocks to the slave processing circuits;
the slave processing circuits are configured to multiply the input blocks with the weight data of each gate according to the multiplication operator to obtain the intermediate results of each gate, to multiply the input state data blocks with the weight data of each gate according to the multiplication operator to obtain the state intermediate results of each gate, and to send the intermediate results and state intermediate results of each gate to the main processing circuit;
the main processing circuit is configured to sort the intermediate results of each gate according to the sort operators to obtain the sorted result of each gate, to perform a bias operation on the sorted result of each gate according to the addition operator to obtain the operation result of each gate, to sort the state intermediate results of each gate according to the sort operators to obtain the state sorted result of each gate, to perform a bias operation on the state sorted result of each gate according to the addition operator to obtain the state operation result of each gate, and, according to the addition operator, to add the operation result of each gate to the corresponding state operation result of that gate and then perform subsequent processing to obtain the output result of each gate.
3. The device according to claim 2, characterized in that
the main processing circuit is specifically configured to multiply the input state value C_{t-1} by the output result f_t of the forget gate according to the multiplication operator to obtain a first result, to multiply the output result g_t of the update state gate by the output result i_t of the input gate according to the multiplication operator to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
4. The device according to claim 3, characterized in that
the main processing circuit is specifically configured to perform an activation operation on the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output result O_t of the output gate by the activation result to obtain the output result h_t.
5. The device according to claim 2, characterized in that the subsequent processing specifically includes:
for the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
for the update state gate, the subsequent processing is the tanh activation function.
6. The device according to claim 2, characterized in that
the main processing circuit is further configured to use the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
7. The device according to any one of claims 2-6, characterized in that, when there are multiple slave processing circuits, the operation unit includes a tree module, the tree module includes one root port and multiple branch ports, the root port of the tree module is connected to the main processing circuit, and each branch port of the tree module is connected to one of the multiple slave processing circuits;
the tree module is configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
8. The device according to any one of claims 2-6, characterized in that, when there are multiple slave processing circuits, the operation unit further includes one or more branch processing circuits, and each branch processing circuit is connected to at least one slave processing circuit;
the branch processing circuit is configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
9. The device according to any one of claims 2-6, characterized in that, when there are multiple slave processing circuits, the slave processing circuits are arranged in an array; each slave processing circuit is connected to its adjacent slave processing circuits, and the main processing circuit is connected to k of the slave processing circuits, the k slave processing circuits being the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1;
the k slave processing circuits are configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
10. The device according to any one of claims 2-6, characterized in that the main processing circuit includes a conversion processing circuit;
the conversion processing circuit is configured to perform conversion on data, specifically to convert the data received by the main processing circuit between a first data structure and a second data structure.
11. The device according to claims 2-6, characterized in that the slave processing circuit includes a multiplication processing circuit and an accumulation processing circuit;
the multiplication processing circuit is configured to multiply the element values in a received input block by the element values at the corresponding positions in the weights of each gate to obtain the products of each gate, and to multiply the element values in a received input state data block by the element values at the corresponding positions in the weights of each gate to obtain the further products of each gate;
the accumulation processing circuit is configured to accumulate the products of each gate to obtain the intermediate result of each gate, and to accumulate the further products of each gate to obtain the state intermediate result of each gate.
12. The device according to claim 7, characterized in that the tree module is an n-ary tree structure, where n is an integer greater than or equal to 2.
13. An LSTM operation device, characterized in that the LSTM operation device includes one or more computing devices according to any one of claims 1-12, and is configured to obtain the data to be operated on and control information from other processing devices, to perform the specified LSTM operation, and to pass the execution result to the other processing devices through an I/O interface;
when the LSTM operation device includes multiple computing devices, the computing devices can be connected and transmit data through a specific structure;
wherein the multiple computing devices are interconnected and transmit data through a PCIE (Peripheral Component Interconnect Express) bus to support larger-scale LSTM operations; the multiple computing devices share one control system or each have their own control system; the multiple computing devices share memory or each have their own memory; and the multiple computing devices are interconnected in any interconnection topology.
14. A combined processing device, characterized in that the combined processing device comprises the LSTM operation device
according to claim 13, a universal interconnection interface, and other processing devices;
the LSTM operation device interacts with the other processing devices to jointly complete a computing operation specified
by a user.
15. The combined processing device according to claim 14, characterized by further comprising: a storage device, the
storage device being connected to the LSTM operation device and to the other processing devices respectively, and
configured to store data of the LSTM operation device and of the other processing devices.
16. A neural network chip, characterized in that the neural network chip comprises the computing device according to
claim 1, or the LSTM operation device according to claim 13, or the combined processing device according to claim 15.
17. An electronic device, characterized in that the electronic device comprises the chip according to claim 16.
18. A board card, characterized in that the board card comprises: a memory device, an interface device, a control device,
and the neural network chip according to claim 16;
wherein the neural network chip is connected to the memory device, the control device and the interface device
respectively;
the memory device is configured to store data;
the interface device is configured to implement data transmission between the chip and an external device;
the control device is configured to monitor the state of the chip.
19. The board card according to claim 18, characterized in that
the memory device comprises: multiple groups of storage units, each group of storage units being connected to the chip by
a bus, the storage units being DDR SDRAM;
the chip comprises: a DDR controller configured to control data transmission to and data storage in each storage unit;
the interface device is: a standard PCIE interface.
20. An LSTM operation method, characterized in that the method is applied to a computing device; the LSTM comprises: an
input gate, a forget gate, an output gate and an update state gate; the computing device comprises: an operation unit, a
controller unit and a storage unit;
the storage unit stores: an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value
C_{t-1}, an input result h_{t-1}, and an output state value C_t;
the method comprises the following steps:
the controller unit obtains the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}
and the LSTM operation operator, and sends the input data X_t, the weight data, the input state value C_{t-1}, the input
result h_{t-1} and the LSTM operation operator to the operation unit;
the operation unit executes the operation of the input gate, the operation of the forget gate, the operation of the output
gate and the operation of the update state gate according to the input data X_t, the weight data, the input result h_{t-1}
and the LSTM operation operator to obtain an output result for each gate, and obtains the output data h_t and the output
state value C_t according to the input state value C_{t-1} and the output result of each gate.
21. The method according to claim 20, characterized in that the operation unit comprises: a main processing circuit and
slave processing circuits; and the operation unit executing the operation of the input gate, the operation of the forget
gate, the operation of the output gate and the operation of the update state gate according to the input data X_t, the
weight data, the input result h_{t-1} and the LSTM operation operator to obtain the output result of each gate
specifically comprises:
the controller unit constructs multiple split operators, multiple sort operators, a multiplication operator, an activation
operator and an addition operator according to the LSTM operator;
the main processing circuit reorders the input data X_t, the weight data and the input state value according to the sort
operator, the weight data comprising the weight data of each gate; then, according to the split operator, it broadcasts
the weight data of each gate and the multiplication operator to the slave processing circuits, splits the input data and
the input state value into multiple input data blocks and multiple input state data blocks, and distributes the multiple
input data blocks and the multiple input state data blocks to the slave processing circuits;
the slave processing circuits multiply the multiple input data blocks by the weight data of each gate according to the
multiplication operator to obtain an intermediate result for each gate, multiply the multiple input state data blocks by
the weight data of each gate according to the multiplication operator to obtain a state intermediate result for each gate,
and send the intermediate result and the state intermediate result of each gate to the main processing circuit;
the main processing circuit sorts the intermediate results of each gate according to the sort operator to obtain a sorted
result for each gate, and performs a bias operation on the sorted result of each gate according to the addition operator
to obtain an operation result for each gate; it sorts the state intermediate results of each gate according to the sort
operator to obtain a state sorted result for each gate, and performs a bias operation on the state sorted result of each
gate according to the addition operator to obtain a state operation result for each gate; according to the addition
operator, it adds the operation result of each gate to the corresponding state operation result and then performs
subsequent processing to obtain the output result of each gate.
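The split, distribute, multiply, sort and bias flow of claim 21 can be sketched in plain Python. This is an illustrative model under several assumptions that the claim does not fix: each row of the input (or input state) is treated as one data block, blocks are dealt out round-robin and tagged with their original index, the sort operator restores that index order, and the activation applied at the end follows claim 24. All function and variable names (`split`, `slave_multiply`, `gather_sorted`, `gate_outputs`, `w_x`, `w_s`, `b_x`, `b_s`) are hypothetical.

```python
import math

GATES = ("input", "forget", "output", "update")  # gate labels assumed for this sketch


def dot(row, vec):
    """Multiply element by element and accumulate."""
    return sum(w * v for w, v in zip(row, vec))


def split(rows, n_slaves):
    """Assumed split operator: tag each block with its index, deal blocks round-robin."""
    shares = [[] for _ in range(n_slaves)]
    for idx, row in enumerate(rows):
        shares[idx % n_slaves].append((idx, row))
    return shares


def slave_multiply(tagged_blocks, gate_weights):
    """Slave side: multiply every received block by the broadcast weights of each gate."""
    results = []
    for idx, block in tagged_blocks:
        per_gate = {g: [dot(row, block) for row in gate_weights[g]] for g in GATES}
        results.append((idx, per_gate))
    return results


def gather_sorted(per_slave_results):
    """Assumed sort operator: put the returned intermediate results back in block order."""
    merged = [item for result in per_slave_results for item in result]
    merged.sort(key=lambda item: item[0])
    return [per_gate for _, per_gate in merged]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_outputs(x_rows, s_rows, w_x, w_s, b_x, b_s, n_slaves=2):
    """Main side: distribute blocks, collect and sort results, add biases, activate."""
    x_sorted = gather_sorted([slave_multiply(s, w_x) for s in split(x_rows, n_slaves)])
    s_sorted = gather_sorted([slave_multiply(s, w_s) for s in split(s_rows, n_slaves)])
    outputs = []
    for x_res, s_res in zip(x_sorted, s_sorted):
        row_out = {}
        for g in GATES:
            op_result = [v + b for v, b in zip(x_res[g], b_x[g])]      # bias, input side
            state_result = [v + b for v, b in zip(s_res[g], b_s[g])]   # bias, state side
            summed = [a + b for a, b in zip(op_result, state_result)]
            act = math.tanh if g == "update" else sigmoid              # per claim 24
            row_out[g] = [act(v) for v in summed]
        outputs.append(row_out)
    return outputs
```

Because every block carries its index, the result does not depend on how many slave circuits the blocks were dealt to; only the order in which partial results arrive changes, which is what the sorting step undoes.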
22. The method according to claim 21, characterized in that obtaining the output state value C_t according to the input
state value C_{t-1} and the output result of each gate specifically comprises:
the main processing circuit multiplies the input state value C_{t-1} by the output result f_t of the forget gate according
to the multiplication operator to obtain a first result, multiplies the output result g_t of the update state gate by the
output result i_t of the input gate according to the multiplication operator to obtain a second result, and adds the first
result and the second result to obtain the output state value C_t.
23. The method according to claim 21, characterized in that obtaining the output data h_t according to the input state
value C_{t-1} and the output result of each gate specifically comprises:
the main processing circuit performs an activation operation on the output state value C_t according to the activation
operator to obtain an activation result, and multiplies the output result O_t of the output gate by the activation result
to obtain the output data h_t.
24. The method according to claim 21, characterized in that the subsequent processing specifically comprises:
for the forget gate, the input gate and the output gate, the subsequent processing is a sigmoid operation;
for the update state gate, the subsequent processing is the tanh activation function.
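Claims 22 to 24 can be summarized as the relations below. The notation is assumed for illustration: $z^{k}_t$ stands for the sum of the operation result and the state operation result of gate $k$ from claim 21 (before the subsequent processing), $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $\phi$ is the activation applied to $C_t$ in claim 23, which the claim does not name (tanh is the usual choice in an LSTM).

```latex
\begin{aligned}
f_t &= \sigma\!\left(z^{f}_t\right), \qquad
i_t = \sigma\!\left(z^{i}_t\right), \qquad
O_t = \sigma\!\left(z^{o}_t\right), \qquad
g_t = \tanh\!\left(z^{g}_t\right) \\
C_t &= f_t \odot C_{t-1} + g_t \odot i_t \\
h_t &= O_t \odot \phi\!\left(C_t\right)
\end{aligned}
```

The first product in the second line is the "first result" of claim 22 and the second product is its "second result".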
25. The method according to claim 21, characterized in that the method further comprises:
the main processing circuit uses the output data h_t as the input result of the next time step, and uses the output state
value C_t as the input state value of the next time step.
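The feedback of h_t and C_t into the next time step (claim 25), together with the state and output updates of claims 22 and 23, can be sketched as a loop over a sequence. This is a scalar toy model under stated assumptions: `toy_gates` is a hypothetical stand-in for the gate operations with made-up weights, and tanh is assumed for the activation in claim 23.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_gates(x_t, h_prev):
    """Hypothetical scalar gate operations; the weights and bias are made-up numbers."""
    i_t = sigmoid(0.5 * x_t + 0.1 * h_prev)        # input gate
    f_t = sigmoid(0.3 * x_t - 0.2 * h_prev + 1.0)  # forget gate
    o_t = sigmoid(0.8 * x_t + 0.4 * h_prev)        # output gate
    g_t = math.tanh(1.2 * x_t + 0.6 * h_prev)      # update state gate
    return i_t, f_t, o_t, g_t


def lstm_step(x_t, h_prev, c_prev, gates=toy_gates):
    """One time step: gate outputs, then the state and output updates."""
    i_t, f_t, o_t, g_t = gates(x_t, h_prev)
    c_t = f_t * c_prev + g_t * i_t   # claim 22: first result plus second result
    h_t = o_t * math.tanh(c_t)       # claim 23, with tanh assumed as the activation
    return h_t, c_t


def run_sequence(xs, h0=0.0, c0=0.0):
    """Claim 25: h_t and C_t become the input result and input state of the next step."""
    h, c = h0, c0
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c)
        outputs.append(h)
    return outputs, c
```

A call such as `run_sequence([1.0, -0.5, 0.25])` returns one output per time step plus the final state value; the initial values `h0` and `c0` are an assumption of the sketch, since the claims do not say how the first step is seeded.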
26. The method according to any one of claims 20-25, characterized in that, when there are multiple slave processing
circuits, the operation unit comprises: a tree module, the tree module comprising: a root port and multiple ports, the
root port of the tree module being connected to the main processing circuit, and each of the multiple ports of the tree
module being connected to one of the multiple slave processing circuits; the method further comprises:
the tree module forwards data and operators between the main processing circuit and the multiple slave processing
circuits.
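The forwarding role of the tree module in claim 26 can be modeled with a small n-ary tree whose leaves are attached to slave processing circuits. This is only a software analogy under assumptions: the class `TreeNode`, the helpers `build_tree`, `forward` and `make_slave`, and the idea of representing an operator as a Python callable are all hypothetical.

```python
class TreeNode:
    """A forwarding node; a leaf holds the slave attached to one port of the tree."""

    def __init__(self, children=None, slave=None):
        self.children = children or []
        self.slave = slave  # callable for a leaf, None for an inner forwarding node


def build_tree(slaves, n):
    """Build an n-ary tree (n >= 2) bottom-up so that every slave sits on one leaf."""
    if n < 2:
        raise ValueError("n must be an integer greater than or equal to 2")
    if not slaves:
        raise ValueError("at least one slave is required")
    nodes = [TreeNode(slave=s) for s in slaves]
    while len(nodes) > 1:
        nodes = [TreeNode(children=nodes[i:i + n]) for i in range(0, len(nodes), n)]
    return nodes[0]  # the root, which faces the main processing circuit


def forward(node, data, operator):
    """Forward data and operator down to the slaves, then pass their results back up."""
    if node.slave is not None:
        return [node.slave(data, operator)]
    results = []
    for child in node.children:
        results.extend(forward(child, data, operator))
    return results


def make_slave(slave_id):
    """Hypothetical slave: applies the received operator to the received data."""
    def slave(data, operator):
        return (slave_id, operator(data))
    return slave


root = build_tree([make_slave(k) for k in range(5)], n=2)
replies = forward(root, 10, lambda value: value * 2)
```

Inner nodes do no arithmetic in this sketch; they only relay, which matches the claim's description of the tree module as a forwarder between the main processing circuit and the slave processing circuits.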
27. The method according to any one of claims 20-25, characterized in that, when there are multiple slave processing
circuits, the operation unit further comprises one or more branch processing circuits, each branch processing circuit
being connected to at least one slave processing circuit; the method further comprises:
the branch processing circuits forward data and operators between the main processing circuit and the multiple slave
processing circuits.
28. The method according to any one of claims 20-25, characterized in that, when there are multiple slave processing
circuits, the multiple slave processing circuits are distributed in an array; each slave processing circuit is connected
to the other adjacent slave processing circuits; the main processing circuit is connected to k slave processing circuits
among the multiple slave processing circuits, the k slave processing circuits being: the n slave processing circuits in
row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1; the method further
comprises:
the k slave processing circuits forward data and operators between the main processing circuit and the multiple slave
processing circuits.
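Which positions of an m-by-n array belong to the k slave processing circuits of claim 28 can be enumerated directly. The helper below is a hypothetical illustration that uses 1-based (row, column) coordinates; whether the claim counts the two corner circuits shared between a row and column 1 once or twice is not stated, so the sketch simply collects distinct positions.

```python
def boundary_slaves(m, n):
    """Positions wired to the main circuit: row 1, row m, and column 1 of an m x n array."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    positions = set()
    for col in range(1, n + 1):
        positions.add((1, col))  # the n circuits in row 1
        positions.add((m, col))  # the n circuits in row m
    for row in range(1, m + 1):
        positions.add((row, 1))  # the m circuits in column 1
    return sorted(positions)


# Example with a hypothetical 3 x 4 array of slave processing circuits.
wired = boundary_slaves(3, 4)
```

Every other circuit in the array reaches the main processing circuit only through its neighbours and, eventually, through one of these positions.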
29. The method according to claims 20-25, characterized in that the slave processing circuit comprises: a multiplication
processing circuit and an accumulation processing circuit; the method specifically comprises:
the multiplication processing circuit multiplies the element values in a received input data block by the element values
at the corresponding positions in the weights of each gate to obtain product results for each gate, and multiplies the
element values in a received input state data block by the element values at the corresponding positions in the weights of
each gate to obtain further product results for each gate;
the accumulation processing circuit accumulates the product results of each gate to obtain an intermediate result for each
gate, and accumulates the further product results of each gate to obtain a state intermediate result for each gate.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811579542.3A CN109670581B (en) | 2018-12-21 | 2018-12-21 | Computing device and board card |
PCT/CN2019/105932 WO2020125092A1 (en) | 2018-12-20 | 2019-09-16 | Computing device and board card |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811579542.3A CN109670581B (en) | 2018-12-21 | 2018-12-21 | Computing device and board card |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670581A (en) | 2019-04-23 |
CN109670581B (en) | 2023-05-23 |
Family
ID=66147138
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811579542.3A Active CN109670581B (en) | 2018-12-20 | 2018-12-21 | Computing device and board card |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670581B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020125092A1 (en) * | 2018-12-20 | 2020-06-25 | 中科寒武纪科技股份有限公司 | Computing device and board card |
CN112329926A (en) * | 2020-11-30 | 2021-02-05 | 珠海采筑电子商务有限公司 | Quality improvement method and system for intelligent robot |
CN112491555A (en) * | 2020-11-20 | 2021-03-12 | 重庆无缝拼接智能科技有限公司 | Medical electronic signature processing method and electronic equipment |
WO2021088404A1 (en) * | 2019-11-06 | 2021-05-14 | 深圳大普微电子科技有限公司 | Data processing method, apparatus and device, and readable storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6654730B1 (en) * | 1999-12-28 | 2003-11-25 | Fuji Xerox Co., Ltd. | Neural network arithmetic apparatus and neutral network operation method |
US20160342891A1 (en) * | 2015-05-21 | 2016-11-24 | Google Inc. | Neural Network Processor |
US20170103305A1 (en) * | 2015-10-08 | 2017-04-13 | Via Alliance Semiconductor Co., Ltd. | Neural network unit that performs concurrent lstm cell calculations |
CN107341542A (en) * | 2016-04-29 | 2017-11-10 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing Recognition with Recurrent Neural Network and LSTM computings |
WO2018058452A1 (en) * | 2016-09-29 | 2018-04-05 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing artificial neural network operation |
US20180174036A1 (en) * | 2016-12-15 | 2018-06-21 | DeePhi Technology Co., Ltd. | Hardware Accelerator for Compressed LSTM |
WO2018120016A1 (en) * | 2016-12-30 | 2018-07-05 | 上海寒武纪信息科技有限公司 | Apparatus for executing lstm neural network operation, and operational method |
CN108268939A (en) * | 2016-12-30 | 2018-07-10 | 上海寒武纪信息科技有限公司 | For performing the device of LSTM neural network computings and operation method |
Non-Patent Citations (1)
Title |
---|
He Feng et al.: "A two-step mapping method for long short-term memory (LSTM) neuromorphic chip design" (长短期记忆LSTM神经形态芯片设计的两步映射方法) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020125092A1 (en) * | 2018-12-20 | 2020-06-25 | 中科寒武纪科技股份有限公司 | Computing device and board card |
WO2021088404A1 (en) * | 2019-11-06 | 2021-05-14 | 深圳大普微电子科技有限公司 | Data processing method, apparatus and device, and readable storage medium |
CN112491555A (en) * | 2020-11-20 | 2021-03-12 | 重庆无缝拼接智能科技有限公司 | Medical electronic signature processing method and electronic equipment |
CN112491555B (en) * | 2020-11-20 | 2022-04-05 | 山西智杰软件工程有限公司 | Medical electronic signature processing method and electronic equipment |
CN112329926A (en) * | 2020-11-30 | 2021-02-05 | 珠海采筑电子商务有限公司 | Quality improvement method and system for intelligent robot |
Also Published As
Publication number | Publication date |
---|---|
CN109670581B (en) | 2023-05-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109543832A (en) | A kind of computing device and board | |
CN109522052A (en) | A kind of computing device and board | |
CN109670581A (en) | A kind of computing device and board | |
CN109740739A (en) | Neural computing device, neural computing method and Related product | |
CN109189474A (en) | Processing with Neural Network device and its method for executing vector adduction instruction | |
CN109657782A (en) | Operation method, device and Related product | |
CN110163357A (en) | A kind of computing device and method | |
CN109032670A (en) | Processing with Neural Network device and its method for executing vector duplicate instructions | |
CN111047022B (en) | Computing device and related product | |
CN110059797A (en) | A kind of computing device and Related product | |
CN110147249A (en) | A kind of calculation method and device of network model | |
CN109739703A (en) | Adjust wrong method and Related product | |
CN110059809A (en) | A kind of computing device and Related product | |
CN109753319A (en) | A kind of device and Related product of release dynamics chained library | |
CN109711540A (en) | A kind of computing device and board | |
CN109726800A (en) | Operation method, device and Related product | |
CN109711538A (en) | Operation method, device and Related product | |
CN111047021B (en) | Computing device and related product | |
CN109740730A (en) | Operation method, device and Related product | |
CN109740729A (en) | Operation method, device and Related product | |
CN110472734A (en) | A kind of computing device and Related product | |
CN111260070B (en) | Operation method, device and related product | |
CN110515586A (en) | Multiplier, data processing method, chip and electronic equipment | |
CN111738429B (en) | Computing device and related product | |
CN111368990A (en) | Neural network computing device and method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences; Applicant after: Zhongke Cambrian Technology Co.,Ltd.; Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences; Applicant before: Beijing Zhongke Cambrian Technology Co.,Ltd. |
GR01 | Patent grant | |