CN109670581A - Computing device and board - Google Patents

Computing device and board

Info

Publication number
CN109670581A
Authority
CN
China
Prior art keywords
data
input
result
operator
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811579542.3A
Other languages
Chinese (zh)
Other versions
CN109670581B (en
Inventor
Inventor name not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811579542.3A (CN109670581B)
Publication of CN109670581A
Priority to PCT/CN2019/105932 (WO2020125092A1)
Application granted
Publication of CN109670581B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a computing device and a board. The computing device is configured to execute LSTM operations. The board includes a memory device, an interface device, a control device, and a neural network chip, and the neural network chip includes the computing device. The memory device is configured to store data; the interface device is configured to transfer data between the chip and an external device; and the control device is configured to monitor the state of the chip. The computing device provided by the application has the advantage of low power consumption.

Description

Computing device and board
Technical field
This application relates to the technical field of information processing, and in particular to a computing device and a board.
Background
A long short-term memory (LSTM) network is a kind of recurrent neural network (RNN). Owing to its distinctive structural design, an LSTM is well suited to processing and predicting important events that are separated by very long intervals and delays in a time series. Compared with a traditional recurrent neural network, an LSTM network performs better and is well suited to learning from experience, so that it can classify, process, and predict time series when lags of unknown duration separate important events. LSTM networks are currently widely used in many fields, such as speech recognition, video description, machine translation, and automatic music synthesis.
Existing LSTM networks are implemented on general-purpose processors, and executing LSTM operations on such processors consumes a large amount of energy.
Summary of the invention
The embodiments of the application provide a computing device and related products that can increase the processing speed of an LSTM and reduce power consumption.
In a first aspect, a computing device for executing an LSTM operation is provided. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate, and the computing device includes an arithmetic unit, a controller unit, and a storage unit;
The storage unit is configured to store an LSTM operation operator, input data x_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t;
The controller unit is configured to obtain the input data x_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send them to the arithmetic unit;
The arithmetic unit is configured to execute the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate according to the input data x_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, obtaining an output result for each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the arithmetic unit includes a main processing circuit and slave processing circuits;
The controller unit is specifically configured to construct multiple split operators, multiple sorting operators, a multiplication operator, an activation operator, and an addition operator according to the LSTM operator;
The main processing circuit is specifically configured to reorder the input data x_t, the weight data, and the input state value according to the sorting operators, where the weight data includes the weight data of each gate; to broadcast the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators; to split the input data and the input state value into multiple input data blocks and multiple input state data blocks; and to distribute the input data blocks and the input state data blocks to the slave processing circuits;
The slave processing circuits are configured to multiply the multiple input data blocks by the weight data of each gate according to the multiplication operator to obtain an intermediate result for each gate, to multiply the multiple input state data blocks by the weight data of each gate according to the multiplication operator to obtain a state intermediate result for each gate, and to send the intermediate result and the state intermediate result of each gate to the main processing circuit;
The main processing circuit is configured to sort the intermediate result of each gate according to the sorting operators to obtain a sorted result for each gate, and to apply a bias operation to each sorted result according to the addition operator to obtain an operation result for each gate; to sort the state intermediate result of each gate according to the sorting operators to obtain a state sorted result for each gate, and to apply a bias operation to each state sorted result according to the addition operator to obtain a state operation result for each gate; and to add the operation result of each gate to its corresponding state operation result according to the addition operator and then perform subsequent processing to obtain the output result of each gate.
Optionally, the main processing circuit is specifically configured to multiply the input state value C_{t-1} by the output result f_t of the forget gate according to the multiplication operator to obtain a first result, to multiply the output result g_t of the update state gate by the output result i_t of the input gate according to the multiplication operator to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
Optionally, the main processing circuit is specifically configured to apply an activation operation to the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output result O_t of the output gate by the activation result to obtain the output result h_t.
Optionally, the subsequent processing specifically includes:
For the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
For the update state gate, the subsequent processing is a tanh activation operation.
Optionally, the main processing circuit is further configured to use the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
Optionally, when there are multiple slave processing circuits, the arithmetic unit includes a tree module. The tree module includes one root port and multiple branch ports; the root port of the tree module is connected to the main processing circuit, and each branch port of the tree module is connected to one of the multiple slave processing circuits;
The tree module is configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
Optionally, when there are multiple slave processing circuits, the arithmetic unit further includes one or more branch processing circuits, and each branch processing circuit is connected to at least one slave processing circuit;
The branch processing circuits are configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
Optionally, when there are multiple slave processing circuits, the slave processing circuits are distributed in an array. Each slave processing circuit is connected to the adjacent slave processing circuits, and the main processing circuit is connected to k of the slave processing circuits. The k slave processing circuits are the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column;
The k slave processing circuits are configured to forward data and operators between the main processing circuit and the multiple slave processing circuits.
Optionally, the main processing circuit includes a conversion processing circuit;
The conversion processing circuit is configured to perform conversion processing on data, specifically to convert the data received by the main processing circuit between a first data structure and a second data structure.
Optionally, each slave processing circuit includes a multiplication processing circuit and an accumulation processing circuit;
The multiplication processing circuit is configured to multiply each element value in a received input data block by the element value at the corresponding position in the weight of each gate to obtain a product result for each gate, and to multiply each element value in a received input state data block by the element value at the corresponding position in the weight of each gate to obtain another product result for each gate;
The accumulation processing circuit is configured to accumulate the product results of each gate to obtain the intermediate result of each gate, and to accumulate the other product results of each gate to obtain the state intermediate result of each gate.
Optionally, the tree module has an n-ary tree structure, where n is an integer greater than or equal to 2.
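As a rough illustration of how the arity n of such a tree module relates to the number of forwarding levels, the following sketch computes the smallest depth at which a complete n-ary tree offers enough branch ports for a given number of slave processing circuits. The function name `tree_levels` and the assumption of a complete, balanced tree are illustrative only and are not taken from the application.

```python
def tree_levels(num_slaves, n):
    """Smallest number of forwarding levels such that a complete n-ary tree
    (assumed balanced; hypothetical helper) has at least num_slaves leaf ports."""
    if n < 2 or num_slaves < 1:
        raise ValueError("need n >= 2 and at least one slave processing circuit")
    levels, ports = 0, 1  # zero levels: the root port itself is the only port
    while ports < num_slaves:
        ports *= n        # each extra level multiplies the port count by n
        levels += 1
    return levels
```

Under these assumptions, a binary tree (n = 2) needs three levels to reach eight slave processing circuits, while a 4-ary tree reaches sixteen in two levels.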
In a second aspect, the embodiments of the application provide an LSTM arithmetic device. The LSTM arithmetic device includes one or more of the computing devices provided in the first aspect, and is configured to obtain data to be operated on and control information from other processing devices, execute a specified LSTM operation, and pass the execution result to the other processing devices through an I/O interface;
When the LSTM arithmetic device includes multiple computing devices, the computing devices may be connected to one another and transfer data through a specific structure;
Specifically, the multiple computing devices are interconnected and transfer data through a PCIe (Peripheral Component Interconnect Express) bus to support larger-scale LSTM operations; the computing devices share one control system or have their own control systems; the computing devices share memory or have their own memories; and the computing devices may be interconnected in any interconnection topology.
In a third aspect, a combined processing device is provided. The combined processing device includes the LSTM arithmetic device of the second aspect, a general interconnection interface, and other processing devices;
The LSTM arithmetic device interacts with the other processing devices to jointly complete the computing operation specified by the user.
In a fourth aspect, a neural network chip is provided. The neural network chip includes the computing device provided in the first aspect, the LSTM arithmetic device provided in the second aspect, or the combined processing device provided in the third aspect.
In a fifth aspect, an electronic device is provided. The electronic device includes the chip provided in the fourth aspect.
In a sixth aspect, a board is provided. The board includes a memory device, an interface device, a control device, and the neural network chip provided in the fourth aspect;
The neural network chip is connected to the memory device, the control device, and the interface device respectively;
The memory device is configured to store data;
The interface device is configured to transfer data between the chip and an external device;
The control device is configured to monitor the state of the chip.
In a seventh aspect, the embodiments of the application further provide an LSTM operation method. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate; the computing device includes an arithmetic unit, a controller unit, and a storage unit; and the storage unit stores an LSTM operation operator, input data x_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t;
The method includes the following steps:
The controller unit obtains the input data x_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and sends them to the arithmetic unit;
The arithmetic unit executes the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate according to the input data x_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, obtaining an output result for each gate, and obtains the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an aircraft, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
Brief description of the drawings
To describe the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural schematic diagram of an LSTM.
Fig. 2 is a structural schematic diagram of a computing device provided by an embodiment of the application.
Fig. 2a is a structural schematic diagram of an arithmetic unit provided by an embodiment of the application.
Fig. 3 is a structural schematic diagram of another computing device provided by the application.
Fig. 3a is a structural schematic diagram of a main processing circuit provided by the application.
Fig. 4a is a structural schematic diagram of the transmitting end of a tree module provided by the application.
Fig. 4b is a structural schematic diagram of the receiving end of a tree module provided by the application.
Fig. 4c is a schematic diagram of a binary tree structure provided by the application.
Fig. 5 is a structural diagram of a computing device provided by one embodiment of the application.
Fig. 6 is a flow diagram of an LSTM operation method provided by one embodiment of the application.
Fig. 7 is a structural diagram of a combined processing device provided by an embodiment of the application.
Fig. 8 is a structural diagram of another combined processing device provided by an embodiment of the application.
Fig. 9 is a structural schematic diagram of a board provided by an embodiment of the application.
Detailed description
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some, rather than all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the application without creative effort fall within the protection scope of the application.
The terms "first", "second", "third", "fourth", and so on in the specification, claims, and drawings of the application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Referring to Fig. 1, which is a schematic diagram of an LSTM: as shown in Fig. 1, the LSTM includes an input gate, a forget gate, an update state unit, and an output gate, with the following calculation formulas:
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$g_t = \tanh(W_g [h_{t-1}, x_t] + b_g)$$
$$C_t = C_{t-1} \odot f_t + g_t \odot i_t$$
$$O_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$h_t = O_t \odot \tanh(C_t)$$
Here x_t is the input data at time t and h_{t-1} is the output data at time t-1. W_f, W_i, W_g, and W_o are the weight vectors of the forget gate, the input gate, the update state unit, and the output gate respectively, and b_f, b_i, b_g, and b_o are the corresponding biases. f_t is the output of the forget gate; it is multiplied element-wise with the state unit at time t-1 to selectively forget past state values. i_t is the output of the input gate; it is multiplied element-wise with the candidate state value obtained at time t to selectively add that candidate state value to the update state unit. g_t is the candidate state value computed at time t. C_t is the new state value obtained by selectively forgetting the state value at time t-1 and selectively adding the state value at time t; C_t is used in computing the final output at this time step and is passed to the next time step. O_t is the selection condition that determines which part of the update state unit at time t is output as the result. h_t is the output at time t, which is also passed to the next time step (time t+1). ⊙ denotes the element-wise product of vectors. σ is the sigmoid function, calculated as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The activation function tanh is calculated as:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
In the actual computation, the application combines W_f, W_i, W_g, and W_o into a single matrix W, and b_f, b_i, b_g, and b_o into a single matrix b.
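The formulas above, together with the combined matrices W and b, can be sketched in software as a single time step. This is a minimal pure-Python illustration, not the hardware data path of the application; the function names and the row order of the gates inside W (forget, input, update, output) are assumptions made for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    # Plain matrix-vector product over nested lists.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def lstm_step(W, b, h_prev, c_prev, x):
    """One forward step. W has 4*H rows (assumed gate order f, i, g, o)
    and H + len(x) columns; b has 4*H entries."""
    H = len(h_prev)
    # z = W [h_{t-1}, x_t] + b, computed once for all four gates
    z = [p + q for p, q in zip(matvec(W, h_prev + x), b)]
    f = [sigmoid(v) for v in z[0:H]]            # forget gate
    i = [sigmoid(v) for v in z[H:2 * H]]        # input gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c = [cp * ft + gt * it for cp, ft, gt, it in zip(c_prev, f, g, i)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Stacking the four weight blocks means the product with [h_{t-1}, x_t] is computed in one pass and then sliced per gate, which is one way to read the combination of the gate weights into a single W.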
Referring to Fig. 2, a computing device provided by the application is shown. The computing device is configured to execute LSTM operations and includes a controller unit 11, an arithmetic unit 12, and a storage unit 10, where the controller unit 11 is connected to the arithmetic unit 12 and the storage unit 10. The arithmetic unit 12 includes a main processing circuit 101 and slave processing circuits 102 (there may be one or more slave processing circuits, with multiple slave processing circuits preferred);
It should be noted that the main processing circuit itself includes a memory (for example, a memory or a register) that can store some of the main processing circuit's data; a slave processing circuit may optionally carry a memory.
The LSTM includes an input gate, a forget gate, an output gate, and an update state gate;
The storage unit 10 is configured to store an LSTM operation operator, input data x_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1}, and an output state value C_t;
The controller unit 11 is configured to obtain the input data x_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send them to the arithmetic unit;
The arithmetic unit 12 is configured to execute the operation of the input gate, the operation of the forget gate, the operation of the output gate, and the operation of the update state gate according to the input data x_t, the weight data, the input result h_{t-1}, and the LSTM operation operator, obtaining an output result for each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the controller unit is specifically configured to construct multiple split operators, multiple sorting operators, a multiplication operator, an activation operator, and an addition operator according to the LSTM operator;
The main processing circuit is specifically configured to reorder the input data x_t, the weight data, and the input state value according to the sorting operators, where the weight data includes the weight data of each gate; to broadcast the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators; to split the input data and the input state value into multiple input data blocks and multiple input state data blocks; and to distribute the input data blocks and the input state data blocks to the slave processing circuits;
The slave processing circuits are configured to multiply the multiple input data blocks by the weight data of each gate according to the multiplication operator to obtain an intermediate result for each gate, to multiply the multiple input state data blocks by the weight data of each gate according to the multiplication operator to obtain a state intermediate result for each gate, and to send the intermediate result and the state intermediate result of each gate to the main processing circuit;
It should be noted that the operations of the gates are relatively independent of one another, and so are their calculation results; that is, each gate has its own weight data. For example, W_f, W_i, W_g, and W_o represent the weight data of the four gates.
Multiplying the multiple input data blocks by the weight data of each gate according to the multiplication operator to obtain the intermediate result of each gate may specifically include:
Multiplying the multiple input data blocks by the input gate weight data to obtain the intermediate result of the input gate; multiplying the multiple input data blocks by the output gate weight data to obtain the intermediate result of the output gate; multiplying the multiple input data blocks by the forget gate weight data to obtain the intermediate result of the forget gate; and multiplying the multiple input data blocks by the update state gate weight data to obtain the intermediate result of the update state gate. The state intermediate results of the gates are obtained in a similar way to the intermediate results and are not described again here.
The main processing circuit is configured to sort the intermediate result of each gate according to the sorting operators to obtain a sorted result for each gate, and to apply a bias operation to each sorted result according to the addition operator to obtain an operation result for each gate; to sort the state intermediate result of each gate according to the sorting operators to obtain a state sorted result for each gate, and to apply a bias operation to each state sorted result according to the addition operator to obtain a state operation result for each gate; and to add the operation result of each gate to its corresponding state operation result according to the addition operator and then perform subsequent processing to obtain the output result of each gate.
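The split, multiply, and bias flow between the main processing circuit and the slave processing circuits can be sketched for one gate as follows. This is a simplified software analogy under stated assumptions: the helper names are hypothetical, the vector is split into contiguous blocks, each slave multiplies its block against the matching weight columns, and the sorting operators are omitted because the blocks are already returned in order.

```python
def split_blocks(vec, num_slaves):
    """Master: split a non-empty vector into contiguous blocks, at most one per
    slave. Returns (start_index, block) pairs; the last block may be shorter."""
    size = -(-len(vec) // num_slaves)  # ceiling division
    return [(s, vec[s:s + size]) for s in range(0, len(vec), size)]

def slave_multiply(weight_rows, start, block):
    """Slave: multiply-accumulate one block against the matching weight columns,
    giving one partial sum per weight row."""
    return [sum(row[start + j] * v for j, v in enumerate(block))
            for row in weight_rows]

def master_gate_result(weights, bias, vec, num_slaves):
    """Master: distribute blocks, collect the partial sums, add them up,
    then apply the bias."""
    partials = [slave_multiply(weights, s, blk)
                for s, blk in split_blocks(vec, num_slaves)]
    summed = [sum(col) for col in zip(*partials)]
    return [v + b for v, b in zip(summed, bias)]
```

In this sketch the result does not depend on the number of slaves, which only changes how the multiply-accumulate work is partitioned; the activation (sigmoid or tanh) would then be applied to the returned values.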
In the technical solution provided by the application, the arithmetic unit is arranged in a master-slave structure. For the forward operation of the LSTM, the input data at the current time step and the output data of the forget gate are split and processed in parallel, so that the main processing circuit and the slave processing circuits can operate in parallel on the computation-heavy part, which increases operation speed, saves operation time, and in turn reduces power consumption.
Optionally, the main processing circuit is specifically configured to multiply the input state value C_{t-1} by the output result f_t of the forget gate according to the multiplication operator to obtain a first result, to multiply the output result g_t of the update state gate by the output result i_t of the input gate according to the multiplication operator to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
Optionally, the main processing circuit is specifically configured to apply an activation operation to the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output result O_t of the output gate by the activation result to obtain the output result h_t.
Optionally, the subsequent processing specifically includes:
For the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
For the update state gate, the subsequent processing is a tanh activation operation.
Optionally, the main processing circuit is further configured to use the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
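The hand-off of h_t and C_t to the next time step can be illustrated with a small driver loop. Here `step` is a hypothetical callable standing in for whatever computes one time step (for example, the arithmetic unit); its name and signature are assumptions made for this sketch.

```python
def run_sequence(step, inputs, h0, c0):
    """Feed each step's output h_t and state C_t back in as the next step's
    input result h_{t-1} and input state value C_{t-1}."""
    h, c = h0, c0
    outputs = []
    for x in inputs:
        h, c = step(h, c, x)  # step is assumed to return (h_t, C_t)
        outputs.append(h)
    return outputs, h, c
```

Keeping the recurrence in a separate loop like this makes it clear that only two values, the output and the state, cross the boundary between consecutive time steps.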
The LSTM above may include multiple hidden layers, h in number, where h is an integer greater than or equal to 2; the h-th hidden layer may be any intermediate hidden layer in the LSTM. For multiple LSTM operations, the process is as follows. In the forward operation, after execution for the previous time step t-1 completes and yields output result t-1, the operation operators of the current time step t take output result t-1 as the input data of the forget gate. The forget gate uses a sigmoid to determine the pass rate of output result t-1, which yields the output result of the forget gate at time t. One part of the operation combines this output result with the weights; the other part takes the input data of the input layer at time t as a further group of input neurons. The two groups of input neurons are each multiplied by the weights to give two operation results, and the two operation results are accumulated to give the output result at time t. The output result at time t is then used as the input data of the forget gate at the next time step t+1, so that the pass rate of the previous time step's result can be determined selectively.
Optionally, the computing device may further include a direct memory access unit 50. The storage unit 10 may include one of, or any combination of, a register and a cache. Specifically, the cache is configured to store the calculation operator, the register is configured to store the input data and scalars, and the cache is a scratchpad cache. The direct memory access unit 50 is configured to read data from or store data to the storage unit 10.
Optionally, the controller unit includes an operator storage unit 110, an operator processing unit 111, and a storage queue unit 113;
The operator storage unit 110 is configured to store the calculation operator associated with the LSTM operation;
The operator processing unit 111 is configured to parse the calculation operator to obtain multiple operation operators;
The storage queue unit 113 is configured to store an operator queue, which includes multiple operation operators or calculation operators to be executed in the order of the queue.
Optionally, the controller unit may also include:
A dependency processing unit 108, used, when there are multiple operation operators, to determine whether a first operation operator is associated with a zeroth operation operator that precedes it. If the first operation operator is associated with the zeroth operation operator, the first operation operator is buffered in the operator storage unit, and after the zeroth operation operator has finished executing, the first operation operator is fetched from the operator storage unit and transmitted to the arithmetic unit;
Determining whether the first operation operator is associated with the zeroth operation operator that precedes it includes:
Extracting, according to the first operation operator, a first storage address interval of the data (for example, a matrix) required by the first operation operator, and extracting, according to the zeroth operation operator, a zeroth storage address interval of the matrix required by the zeroth operation operator. If the first storage address interval overlaps the zeroth storage address interval, the first operation operator and the zeroth operation operator are determined to be associated; if the two intervals do not overlap, the first operation operator and the zeroth operation operator are determined not to be associated.
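The overlap test described above can be sketched in a few lines of Python. This is only an illustration: the `AddressInterval` type, the function name, and the choice of closed (inclusive) intervals are assumptions, not details given in the application.

```python
from typing import NamedTuple


class AddressInterval(NamedTuple):
    """Hypothetical closed address interval [start, end]."""
    start: int
    end: int


def has_dependency(first: AddressInterval, zeroth: AddressInterval) -> bool:
    """Return True when the two storage address intervals overlap."""
    return first.start <= zeroth.end and zeroth.start <= first.end


# The first operator reads addresses 8..23 while the zeroth uses 0..15:
# the intervals overlap, so the first operator must wait for the zeroth.
print(has_dependency(AddressInterval(8, 23), AddressInterval(0, 15)))   # True
# Disjoint intervals: the first operator can be issued immediately.
print(has_dependency(AddressInterval(16, 31), AddressInterval(0, 15)))  # False
```

In this sketch an operator for which `has_dependency` returns True would stay buffered until the earlier operator completes.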
In another optional embodiment, the arithmetic unit 12, as shown in Fig. 3, may include a main processing circuit 101 and multiple slave processing circuits 102. In one embodiment, as shown in Fig. 3, the multiple slave processing circuits are arranged in an array; each slave processing circuit is connected to the other adjacent slave processing circuits, and the main processing circuit is connected to k of the multiple slave processing circuits. The k slave processing circuits are: the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1. It should be noted that the k slave processing circuits shown in Fig. 3 include only the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1; that is, the k slave processing circuits are those among the multiple slave processing circuits that are directly connected to the main processing circuit.
The k slave processing circuits are used to forward data (the data may be input data blocks, input state data blocks, intermediate results, state intermediate results, etc.) and operators between the main processing circuit and the multiple slave processing circuits.
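As a rough illustration of which positions in the array are directly connected to the main processing circuit, the following Python sketch enumerates row 1, row m, and column 1 of an m-by-n array. The function name and the 1-based (row, column) coordinates are assumptions made for the example.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """Positions (row, column) of the slave circuits wired to the main circuit."""
    first_row = {(1, c) for c in range(1, n + 1)}
    last_row = {(m, c) for c in range(1, n + 1)}
    first_col = {(r, 1) for r in range(1, m + 1)}
    return first_row | last_row | first_col


edge = directly_connected(4, 4)
print(len(edge))       # 10
print((2, 2) in edge)  # False: interior circuits are reached by forwarding
```

Because the two corner positions in column 1 belong to both a row and the column, the set for a 4-by-4 array holds 10 distinct positions rather than 4 + 4 + 4.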
Optionally, as shown in Fig. 3a, the main processing circuit may also include one of a conversion processing circuit 110, an activation processing circuit 111, and an addition processing circuit 112, or any combination thereof;
The conversion processing circuit 110 is used to perform conversion processing on data, specifically: converting the data received by the main processing circuit (including but not limited to: input data X_t, weight data (the weights of each gate), input state value C_{t-1}, and input result h_{t-1}) between a first data structure and a second data structure (for example, conversion between continuous data and discrete data, or conversion between floating-point data and fixed-point data).
The activation processing circuit 111 is used to perform the activation operation on data in the main processing circuit;
The addition processing circuit 112 is used to perform addition or accumulation operations.
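As one possible illustration of the floating-point/fixed-point conversion mentioned for the conversion processing circuit, the sketch below uses a simple scaled-integer format. The number of fractional bits (8) and both function names are assumptions; the application does not specify a fixed-point format.

```python
def to_fixed(value: float, frac_bits: int = 8) -> int:
    """Convert a float to a fixed-point integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def to_float(fixed: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << frac_bits)


print(to_fixed(1.5))   # 384
print(to_float(384))   # 1.5
```

Values that are not exact multiples of 1/256 are rounded, so a round trip through this format can change a value by up to half of the smallest fixed-point step.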
In another embodiment, the operator is a computation operator such as a matrix-multiply-matrix operator, an accumulation operator, or an activation operator.
In an optional embodiment, as shown in Fig. 4a, the arithmetic unit includes a tree module 40. The tree module includes one root port 401 and multiple ports 404; the root port of the tree module is connected to the main processing circuit, and each of the multiple ports of the tree module is connected to one of the multiple slave processing circuits;
The tree module has transmit and receive functions. For example, Fig. 4a shows the tree module performing the transmit function, and Fig. 4b shows the tree module performing the receive function.
The tree module is used to forward data between the main processing circuit and the multiple slave processing circuits (the data may be input data blocks, input state data blocks, intermediate results, state intermediate results, etc.).
Optionally, the tree module is an optional structure of the computing device and may include at least one layer of nodes. Each node is a wiring structure with a forwarding function and need not have a computing function itself. If the tree module has zero layers of nodes, the tree module is not needed.
Optionally, the tree module may have an n-ary tree structure, for example the binary tree structure shown in Fig. 4c, or it may have a ternary tree structure, where n may be an integer greater than or equal to 2. The specific embodiments of the present application do not limit the specific value of n. The number of layers may also be 2, and the slave processing circuits may be connected to nodes of layers other than the second-to-last layer, for example to the nodes of the last layer as shown in Fig. 4c.
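The forwarding role of the tree module can be pictured with a small Python sketch. It assumes a full n-ary tree in which every node simply passes its payload to n children and does no computation; the function name and the list-based model are illustrative assumptions, not the circuit described in the figures.

```python
from typing import List, TypeVar

T = TypeVar("T")


def broadcast(payload: T, n: int, layers: int) -> List[T]:
    """Forward one payload from the root port down `layers` layers of an n-ary tree."""
    level = [payload]
    for _ in range(layers):
        # Each node forwards its payload unchanged to its n children.
        level = [item for item in level for _ in range(n)]
    return level


leaves = broadcast("weights", n=2, layers=2)
print(len(leaves))  # 4
```

With zero layers the function returns the payload unchanged, which mirrors the remark that a tree module with zero layers of nodes is simply absent.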
Optionally, the arithmetic unit may carry a separate cache, which, as shown in Fig. 2a, may include a neuron cache unit 63 that caches the input neuron vector data and output neuron value data of the slave processing circuits.
As shown in Fig. 2a, the arithmetic unit may also include a weight cache unit 64 for caching the weight data that the slave processing circuits need during computation.
In an optional embodiment, the arithmetic unit 12, as shown in Fig. 5, may include a branch processing circuit 103; the specific connection structure is as shown in Fig. 5, where
The branch processing circuit 103 may include a memory. As shown in Fig. 5, the size of the memory of the branch processing circuit 103 may be between 2 and 2.5 times the maximum data capacity that a single slave processing circuit needs to store. With this arrangement, the slave processing circuits need no memory of their own: for one branch processing circuit, only 2.5*R needs to be provided (where R is the capacity required by a single slave processing circuit), whereas without a branch processing circuit 4*R would need to be provided and the register utilization would be low. This structure can therefore effectively reduce the total memory capacity and reduce cost.
The branch processing circuit is used to forward data between the main processing circuit and the multiple slave processing circuits (the data may be input data blocks, input state data blocks, intermediate results, state intermediate results, etc.).
The way the input data is split is illustrated below with a practical example (the input state data may be split in the same way as the input data). Because the output result and the input data have the same data type, they are split in essentially the same way. Assume the data type is a matrix of size H*W. If H is small (less than a set threshold, for example 100), the matrix H*W is split along the H direction into H vectors (each vector is one row of the matrix H*W). Each vector is one input data block, and the position of the first element of the input data block is marked on the block, i.e. input data block_{h,w}, where h and w are the values of the first element of input data block_{h,w} in the H direction and the W direction respectively; for the first input data block, for example, h=1 and w=1. After receiving input data block_{h,w}, a slave processing circuit multiplies input data block_{h,w} element by element with each column of the weights and accumulates the products to obtain intermediate result_{w,i}, where w is the w value of the input data block and i is the column index of the weight column used in the calculation with that input data block. The main processing circuit determines that the position of this intermediate result in the operation result of the corresponding gate is w, i. For example, input data block_{1,1} and the first weight column yield intermediate result_{1,1}, which the main processing circuit places in the first row, first column of the operation result of the corresponding gate.
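The row-wise split and the placement of intermediate results can be sketched as follows. The function names are hypothetical, and the sketch makes one explicit assumption: each intermediate result is placed in the output row given by the block's row mark h and in the column given by the weight column index i.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_rows(x: Matrix) -> List[Tuple[int, int, List[float]]]:
    """Split an H*W matrix into H row blocks, each tagged with the
    1-based (h, w) position of its first element."""
    return [(h + 1, 1, row) for h, row in enumerate(x)]


def slave_multiply(block: List[float], weight: Matrix) -> List[float]:
    """Slave-side step: multiply the block with every weight column and accumulate."""
    return [sum(block[k] * weight[k][i] for k in range(len(block)))
            for i in range(len(weight[0]))]


def gate_product(x: Matrix, weight: Matrix) -> Matrix:
    """Main-side step: place each intermediate result by block mark and column index."""
    result = [[0.0] * len(weight[0]) for _ in x]
    for h, _w, block in split_rows(x):
        for i, value in enumerate(slave_multiply(block, weight)):
            result[h - 1][i] = value
    return result


print(gate_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The result equals the ordinary matrix product of the input and the weights, which is what the split-and-reassemble scheme is meant to reproduce.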
The present application also provides an LSTM operation method applied to a computing device. The LSTM includes an input gate, a forget gate, an output gate, and an update state gate; the computing device includes an arithmetic unit, a controller unit, and a storage unit. The storage unit stores: the LSTM operation operator, input data X_t, weight data, output data h_t, input state value C_{t-1}, input result h_{t-1}, and output state value C_t. The method includes the following steps:
Step S601: the controller unit obtains the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and sends the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator to the arithmetic unit;
Step S602: the arithmetic unit executes the input-gate operation, the forget-gate operation, the output-gate operation, and the update-state-gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate, and obtains the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
Optionally, the arithmetic unit includes a main processing circuit and slave processing circuits. The arithmetic unit executing the input-gate operation, the forget-gate operation, the output-gate operation, and the update-state-gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate specifically includes:
The controller unit constructs, according to the LSTM operator, multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator;
The main processing circuit reorders the input data X_t, the weight data, and the input state value according to the sort operators, where the weight data includes the weight data of each gate; it then broadcasts the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators, splits the input data and the input state value into multiple input data blocks and multiple input state data blocks, and distributes the multiple input data blocks and multiple input state data blocks to the slave processing circuits;
The slave processing circuits multiply the multiple input data blocks by the weight data of each gate according to the multiplication operator to obtain the intermediate results of each gate, multiply the multiple input state data blocks by the weight data of each gate according to the multiplication operator to obtain the state intermediate results of each gate, and send the intermediate results and the state intermediate results of each gate to the main processing circuit;
The main processing circuit sorts the intermediate results of each gate according to the sort operators to obtain the sorted result of each gate, and applies a bias operation to the sorted result of each gate according to the addition operator to obtain the operation result of each gate; it sorts the state intermediate results of each gate according to the sort operators to obtain the state sorted result of each gate, and applies a bias operation to the state sorted result of each gate according to the addition operator to obtain the state operation result of each gate; it then adds the operation result of each gate to the corresponding state operation result according to the addition operator and performs subsequent processing to obtain the output result of each gate.
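For a single gate, the multiply, bias, add, and post-process sequence above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the inputs are single vectors (so the split and sort steps are trivial and omitted), there is one bias per path, and all names (`gate_output`, `w_x`, `w_state`, `b_x`, `b_state`, `post`) are hypothetical. The `post` argument stands for the subsequent processing, for example a sigmoid or tanh.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def multiply_accumulate(block: Vector, weight: Matrix) -> Vector:
    # Slave-side step: multiply the block with each weight column and accumulate.
    return [sum(block[k] * weight[k][i] for k in range(len(block)))
            for i in range(len(weight[0]))]


def gate_output(x: Vector, state: Vector, w_x: Matrix, w_state: Matrix,
                b_x: Vector, b_state: Vector,
                post: Callable[[float], float]) -> Vector:
    # Main-side steps: add the biases, add the two operation results, post-process.
    op = [v + b for v, b in zip(multiply_accumulate(x, w_x), b_x)]
    st = [v + b for v, b in zip(multiply_accumulate(state, w_state), b_state)]
    return [post(a + c) for a, c in zip(op, st)]


print(gate_output([0.0], [0.0], [[1.0]], [[1.0]], [0.0], [0.0], sigmoid))  # [0.5]
```

Calling the same function with `math.tanh` as `post` gives the variant of the subsequent processing that uses the tanh function.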
Optionally, obtaining the output state value C_t according to the input state value C_{t-1} and the output result of each gate specifically includes:
The main processing circuit multiplies the input state value C_{t-1} by the forget-gate output result f_t according to the multiplication operator to obtain a first result, multiplies the update-state-gate output result g_t by the input-gate output result i_t according to the multiplication operator to obtain a second result, and adds the first result and the second result to obtain the output state value C_t.
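Written as a formula, the state update above reads as follows. The element-wise product (written with \odot) is an assumption; the text says only that the values are multiplied.

```latex
C_t = f_t \odot C_{t-1} + g_t \odot i_t
```

The first term is the first result and the second term is the second result.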
Optionally, obtaining the output data h_t according to the input state value C_{t-1} and the output result of each gate specifically includes:
The main processing circuit performs an activation operation on the output state value C_t according to the activation operator to obtain an activation result, and multiplies the output-gate output result O_t by the activation result to obtain the output result h_t.
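A minimal Python sketch of this last step follows. The text does not name the activation applied to C_t; the default of `math.tanh` below is an assumption borrowed from the standard LSTM formulation, and the function and parameter names are hypothetical. The multiplication is taken to be element-wise.

```python
import math
from typing import Callable, List


def output_data(o_t: List[float], c_t: List[float],
                activation: Callable[[float], float] = math.tanh) -> List[float]:
    """h_t = O_t * activation(C_t), computed element by element."""
    return [o * activation(c) for o, c in zip(o_t, c_t)]


print(output_data([0.5], [0.0]))                             # [0.0]
print(output_data([2.0], [3.0], activation=lambda v: v))     # [6.0]
```

Passing a different callable as `activation` swaps in another activation without changing the rest of the step.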
The present application also discloses an LSTM device that includes one or more of the computing devices mentioned in this application, and that is used to obtain data to be operated on and control information from other processing devices, execute the specified LSTM operation, and pass the execution result to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one computing device is included, the computing devices can be linked and can transmit data through a specific structure, for example by interconnecting and transmitting data over a PCIE bus, to support larger-scale convolutional neural network training. In that case, the computing devices may share one control system or have independent control systems; they may share memory, or each accelerator may have its own memory. In addition, they may be interconnected in any interconnection topology.
The LSTM device has high compatibility and can be connected to various types of servers through a PCIE interface.
The present application also discloses a combined processing device that includes the above LSTM device, a general interconnection interface, and other processing devices. The LSTM operation device interacts with the other processing devices to jointly complete operations specified by the user. Fig. 7 is a schematic diagram of the combined processing device.
The other processing devices include one or more processor types among general-purpose/special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the LSTM operation device and external data and control, including data transfer, and perform basic control of the LSTM operation device such as starting and stopping it; the other processing devices can also cooperate with the LSTM operation device to complete operation tasks jointly.
The general interconnection interface is used to transmit data and control operators between the LSTM device and the other processing devices. The LSTM device obtains the required input data from the other processing devices and writes it to the on-chip storage device of the LSTM device; it can obtain control operators from the other processing devices and write them to the on-chip control cache of the LSTM device; it can also read the data in the storage module of the LSTM device and transmit it to the other processing devices.
Optionally, as shown in Fig. 8, the structure may also include a storage device connected to both the LSTM device and the other processing devices. The storage device is used to store data of the LSTM device and the other processing devices, and is particularly suitable for data required for an operation that cannot be held entirely in the internal storage of the LSTM device or the other processing devices.
The combined processing device can serve as the SOC (system on chip) of equipment such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the equipment, for example cameras, displays, mice, keyboards, network cards, and WiFi interfaces.
In some embodiments, a chip is also claimed, which includes the above LSTM device or combined processing device.
In some embodiments, a chip package structure is claimed, which includes the above chip.
In some embodiments, a board is claimed, which includes the above chip package structure. Referring to Fig. 9, Fig. 9 provides a board that, in addition to the above chip 389, may also include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392;
The memory device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The memory device may include multiple groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR doubles the speed of SDRAM without increasing the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
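The 25600 MB/s figure quoted for DDR4-3200 can be checked with a short calculation: 3200 million transfers per second on the 64 data bits of each 72-bit controller (the 8 ECC bits carry no payload). The function name below is illustrative only.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Peak bandwidth = transfers per second * bytes moved per transfer."""
    return mega_transfers_per_s * data_bits / 8


print(ddr_bandwidth_mb_per_s(3200, 64))  # 25600.0
```

This is the peak figure for one group of storage units behind one controller; sustained bandwidth in practice is lower.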
The interface device is electrically connected to the chip in the chip package structure and is used to transfer data between the chip and an external device (for example, a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the server passes the data to be processed to the chip through the standard PCIE interface, accomplishing the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present application does not limit the specific form of such other interfaces, provided the interface unit can implement the transfer function. In addition, the calculation result of the chip is still sent back to the external device (for example, a server) by the interface device.
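The 16000 MB/s figure corresponds to the raw signalling rate of PCIE 3.0 (8 GT/s per lane) across 16 lanes. The sketch below reproduces it and, as an aside not stated in the text, also shows the slightly lower figure once the 128b/130b line encoding of PCIE 3.0 is taken into account. The function and parameter names are illustrative.

```python
def pcie_bandwidth_mb_per_s(gt_per_s: float, lanes: int,
                            payload_bits: int = 1, line_bits: int = 1) -> float:
    """Per-direction bandwidth in MB/s; payload_bits/line_bits models line encoding."""
    return gt_per_s * 1000 * lanes / 8 * payload_bits / line_bits


print(pcie_bandwidth_mb_per_s(8.0, 16))                  # 16000.0
print(round(pcie_bandwidth_mb_per_s(8.0, 16, 128, 130)))  # 15754
```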
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. The chip can therefore be in different working states such as multi-load and light-load states. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is claimed, which includes the above board.
Electronic devices include data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, mobile phones, dashboard cameras, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.
It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are all optional embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help in understanding the method of the present application and its core ideas. At the same time, those of ordinary skill in the art may, following the ideas of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (29)

1. A computing device, characterized in that the computing device is used to perform an LSTM operation, the LSTM includes an input gate, a forget gate, an output gate, and an update state gate, and the computing device includes: an arithmetic unit, a controller unit, and a storage unit;
The storage unit is used to store the LSTM operation operator, input data X_t, weight data, output data h_t, input state value C_{t-1}, input result h_{t-1}, and output state value C_t;
The controller unit is used to obtain the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator, and to send the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1}, and the LSTM operation operator to the arithmetic unit;
The arithmetic unit is used to execute the input-gate operation, the forget-gate operation, the output-gate operation, and the update-state-gate operation according to the input data X_t, the weight data, the input result h_{t-1}, and the LSTM operation operator to obtain the output result of each gate, and to obtain the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
2. The device according to claim 1, characterized in that the arithmetic unit includes: a main processing circuit and slave processing circuits;
The controller unit is specifically used to construct, according to the LSTM operator, multiple split operators, multiple sort operators, a multiplication operator, an activation operator, and an addition operator;
The main processing circuit is specifically used to reorder the input data X_t, the weight data, and the input state value according to the sort operators, where the weight data includes the weight data of each gate; to then broadcast the weight data of each gate and the multiplication operator to the slave processing circuits according to the split operators; to split the input data and the input state value into multiple input data blocks and multiple input state data blocks; and to distribute the multiple input data blocks and multiple input state data blocks to the slave processing circuits;
The slave processing circuits are used to multiply the multiple input data blocks by the weight data of each gate according to the multiplication operator to obtain the intermediate results of each gate, to multiply the multiple input state data blocks by the weight data of each gate according to the multiplication operator to obtain the state intermediate results of each gate, and to send the intermediate results and the state intermediate results of each gate to the main processing circuit;
The main processing circuit is used to sort the intermediate results of each gate according to the sort operators to obtain the sorted result of each gate, and to apply a bias operation to the sorted result of each gate according to the addition operator to obtain the operation result of each gate; to sort the state intermediate results of each gate according to the sort operators to obtain the state sorted result of each gate, and to apply a bias operation to the state sorted result of each gate according to the addition operator to obtain the state operation result of each gate; and to add the operation result of each gate to the corresponding state operation result according to the addition operator and then perform subsequent processing to obtain the output result of each gate.
3. The device according to claim 2, characterized in that
The main processing circuit is specifically used to multiply the input state value C_{t-1} by the forget-gate output result f_t according to the multiplication operator to obtain a first result, to multiply the update-state-gate output result g_t by the input-gate output result i_t according to the multiplication operator to obtain a second result, and to add the first result and the second result to obtain the output state value C_t.
4. The device according to claim 3, characterized in that
The main processing circuit is specifically used to perform an activation operation on the output state value C_t according to the activation operator to obtain an activation result, and to multiply the output-gate output result O_t by the activation result to obtain the output result h_t.
5. The device according to claim 2, characterized in that the subsequent processing specifically includes:
For the forget gate, the input gate, and the output gate, the subsequent processing is a sigmoid operation;
For the update state gate, the subsequent processing is the activation operation tanh function.
6. The device according to claim 2, characterized in that
The main processing circuit is also used to take the output data h_t as the input result of the next time step and the output state value C_t as the input state value of the next time step.
7. The device according to any one of claims 2-6, characterized in that, if there are multiple slave processing circuits, the arithmetic unit includes a tree module, the tree module includes one root port and multiple ports, the root port of the tree module is connected to the main processing circuit, and each of the multiple ports of the tree module is connected to one of the multiple slave processing circuits;
The tree module is used to forward data and operators between the main processing circuit and the multiple slave processing circuits.
8. The device according to any one of claims 2-6, characterized in that, if there are multiple slave processing circuits, the arithmetic unit further includes one or more branch processing circuits, each branch processing circuit being connected to at least one slave processing circuit,
The branch processing circuit is used to forward data and operators between the main processing circuit and the multiple slave processing circuits.
9. The device according to any one of claims 2-6, characterized in that, if there are multiple slave processing circuits, the multiple slave processing circuits are arranged in an array; each slave processing circuit is connected to the other adjacent slave processing circuits, the main processing circuit is connected to k of the multiple slave processing circuits, and the k slave processing circuits are: the n slave processing circuits in row 1, the n slave processing circuits in row m, and the m slave processing circuits in column 1;
The k slave processing circuits are used to forward data and operators between the main processing circuit and the multiple slave processing circuits.
10. according to device described in claim 2-6 any one, which is characterized in that the main process task circuit includes: conversion Processing circuit;
The conversion processing circuit, for executing conversion process to data, specifically: the received data of main process task circuit are executed Exchange between first data structure and the second data structure.
11. The device according to claims 2-6, wherein the slave processing circuit comprises a multiplication processing circuit and an accumulation processing circuit;
The multiplication processing circuit is configured to multiply the element values in a received input data block by the element values at the corresponding positions in the weight of each gate to obtain the product result of each gate, and to multiply the element values in a received input state data block by the element values at the corresponding positions in the weight of each gate to obtain another product result of each gate;
The accumulation processing circuit is configured to accumulate the product results of each gate to obtain the intermediate result of each gate, and to accumulate the other product results of each gate to obtain the state intermediate result of each gate.
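The multiply-then-accumulate split in claim 11 amounts to a dot product per gate. A minimal sketch follows, assuming a data block and each gate's weight are flat lists of equal length; the function names and the gate labels used as dictionary keys are hypothetical.

```python
from typing import Dict, List, Sequence


def multiply(block: Sequence[float], weight: Sequence[float]) -> List[float]:
    """Multiplication stage: element-wise products at corresponding positions."""
    if len(block) != len(weight):
        raise ValueError("block and weight must have the same length")
    return [x * w for x, w in zip(block, weight)]


def accumulate(products: Sequence[float]) -> float:
    """Accumulation stage: running sum of the product results."""
    total = 0.0
    for product in products:
        total += product
    return total


def gate_intermediates(block: Sequence[float],
                       gate_weights: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Apply both stages once per gate and return one intermediate result per gate."""
    return {gate: accumulate(multiply(block, weight))
            for gate, weight in gate_weights.items()}


if __name__ == "__main__":
    print(gate_intermediates([1.0, 2.0], {"input": [3.0, 4.0], "forget": [0.5, 0.5]}))
```

The same two stages would be run a second time on the input state data block to produce the state intermediate results.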
12. The device according to claim 7, wherein the tree module has an n-ary tree structure, n being an integer greater than or equal to 2.
13. An LSTM operation device, wherein the LSTM operation device comprises one or more computing devices according to any one of claims 1-12, and is configured to obtain data to be operated on and control information from other processing devices, execute a specified LSTM operation, and pass the execution result to the other processing devices through an I/O interface;
When the LSTM operation device comprises multiple computing devices, the multiple computing devices can be connected through a specific structure and transmit data among themselves;
Wherein the multiple computing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale LSTM operations; the multiple computing devices share one control system or have their own control systems; the multiple computing devices share memory or have their own memories; and the multiple computing devices may be interconnected in any interconnection topology.
14. A combined processing device, wherein the combined processing device comprises the LSTM operation device according to claim 13, a universal interconnection interface, and other processing devices;
The LSTM operation device interacts with the other processing devices to jointly complete a computing operation specified by a user.
15. The combined processing device according to claim 14, further comprising a storage device, wherein the storage device is connected to the LSTM operation device and to the other processing devices, and is configured to store data of the LSTM operation device and of the other processing devices.
16. A neural network chip, wherein the neural network chip comprises the computing device according to claim 1, or the LSTM operation device according to claim 13, or the combined processing device according to claim 15.
17. An electronic device, wherein the electronic device comprises the chip according to claim 16.
18. A board card, wherein the board card comprises a memory device, an interface device, a control device, and the neural network chip according to claim 16;
Wherein the neural network chip is connected to the memory device, the control device and the interface device respectively;
The memory device is configured to store data;
The interface device is configured to implement data transmission between the chip and an external device;
The control device is configured to monitor the state of the chip.
19. The board card according to claim 18, wherein
The memory device comprises multiple groups of storage units, each group of storage units being connected to the chip through a bus, and the storage units being DDR SDRAM;
The chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
The interface device is a standard PCIe interface.
20. An LSTM operation method, wherein the method is applied to a computing device; the LSTM comprises an input gate, a forget gate, an output gate and an update state gate; the computing device comprises an arithmetic unit, a controller unit and a storage unit; the storage unit stores an LSTM operation operator, input data X_t, weight data, output data h_t, an input state value C_{t-1}, an input result h_{t-1} and an output state value C_t;
The method comprises the following steps:
The controller unit obtains the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1} and the LSTM operation operator, and sends the input data X_t, the weight data, the input state value C_{t-1}, the input result h_{t-1} and the LSTM operation operator to the arithmetic unit;
The arithmetic unit performs the operation of the input gate, the operation of the forget gate, the operation of the output gate and the operation of the update state gate according to the input data X_t, the weight data, the input result h_{t-1} and the LSTM operation operator to obtain the output result of each gate, and obtains the output data h_t and the output state value C_t according to the input state value C_{t-1} and the output result of each gate.
21. The method according to claim 20, wherein the arithmetic unit comprises a main processing circuit and slave processing circuits; and the arithmetic unit performing the operation of the input gate, the operation of the forget gate, the operation of the output gate and the operation of the update state gate according to the input data X_t, the weight data, the input result h_{t-1} and the LSTM operation operator to obtain the output result of each gate specifically comprises:
The controller unit constructs multiple split operators, multiple sort operators, a multiplication operator, an activation operator and an addition operator according to the LSTM operator;
The main processing circuit reorders the input data X_t, the weight data and the input state value according to the sort operators, the weight data comprising the weight data of each gate; then, according to the split operators, broadcasts the weight data of each gate and the multiplication operator to the slave processing circuits, splits the input data and the input state value into multiple input data blocks and multiple input state data blocks, and distributes the multiple input data blocks and the multiple input state data blocks to the slave processing circuits;
The slave processing circuits perform, according to the multiplication operator, multiplication on the multiple input data blocks and the weight data of each gate to obtain the intermediate result of each gate, perform, according to the multiplication operator, multiplication on the multiple input state data blocks and the weight data of each gate to obtain the state intermediate result of each gate, and send the intermediate result of each gate and the state intermediate result of each gate to the main processing circuit;
The main processing circuit sorts the intermediate result of each gate according to the sort operators to obtain the sorted result of each gate, and performs a bias operation on the sorted result of each gate according to the addition operator to obtain the operation result of each gate; sorts the state intermediate result of each gate according to the sort operators to obtain the state sorted result of each gate, and performs a bias operation on the state sorted result of each gate according to the addition operator to obtain the state operation result of each gate; and, according to the addition operator, adds the operation result of each gate to the corresponding state operation result of each gate and then performs subsequent processing to obtain the output result of each gate.
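The data flow in claim 21 can be sketched for a single gate as follows. This is an interpretation under stated assumptions, not the patented implementation: the input is taken to be a batch of sample vectors that is split by sample, the weights are broadcast whole to every slave, and "sorting" is read as restoring the original sample order after the partial results return. All function and parameter names (`split`, `slave_multiply`, `gate_output`, `w_x`, `w_h`, `b_x`, `b_h`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[Vector]


def split(rows: Sequence[Vector], parts: int) -> List[List[Vector]]:
    """Split a batch into `parts` contiguous blocks (trailing blocks may be empty)."""
    size = math.ceil(len(rows) / parts) if rows else 0
    return [list(rows[i * size:(i + 1) * size]) for i in range(parts)]


def slave_multiply(block: Sequence[Vector], weight: Matrix) -> List[Vector]:
    """One slave circuit: multiply each vector in its block by the broadcast weight."""
    return [[sum(w * x for w, x in zip(w_row, vec)) for w_row in weight]
            for vec in block]


def gate_output(x_batch: Sequence[Vector], h_batch: Sequence[Vector],
                w_x: Matrix, w_h: Matrix, b_x: Vector, b_h: Vector,
                activation: Callable[[float], float], slaves: int = 2) -> List[Vector]:
    """Compute one gate's output for every sample in the batch."""
    # Main circuit: split both operands; the weights go to every slave unchanged.
    x_blocks = split(x_batch, slaves)
    h_blocks = split(h_batch, slaves)
    # Slave circuits: intermediate results and state intermediate results.
    inter = [slave_multiply(block, w_x) for block in x_blocks]
    state_inter = [slave_multiply(block, w_h) for block in h_blocks]
    # Main circuit: put the partial results back into batch order.
    ordered = [row for block in inter for row in block]
    state_ordered = [row for block in state_inter for row in block]
    # Main circuit: bias each part, add the two parts, then apply the activation.
    return [[activation((r + bx) + (s + bh))
             for r, bx, s, bh in zip(row, b_x, state_row, b_h)]
            for row, state_row in zip(ordered, state_ordered)]


if __name__ == "__main__":
    print(gate_output([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[1.0], [0.0], [2.0]],
                      [[1.0, 0.0]], [[2.0]], [0.5], [0.5], lambda v: v))
```

Passing the activation in as a callable keeps the sketch gate-agnostic, so the same routine would serve all four gates with different weights.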
22. The method according to claim 21, wherein obtaining the output state value C_t according to the input state value C_{t-1} and the output result of each gate specifically comprises:
The main processing circuit multiplies, according to the multiplication operator, the input state value C_{t-1} by the output result f_t of the forget gate to obtain a first result; multiplies, according to the multiplication operator, the output result g_t of the update state gate by the output result i_t of the input gate to obtain a second result; and adds the first result and the second result to obtain the output state value C_t.
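Written as a single expression, the state update in claim 22 would read as below. The element-wise product symbol is an assumption drawn from the conventional LSTM formulation; the claim itself only says "multiplies".

```latex
C_t = f_t \odot C_{t-1} + g_t \odot i_t
```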
23. The method according to claim 21, wherein obtaining the output data h_t according to the input state value C_{t-1} and the output result of each gate specifically comprises:
The main processing circuit performs, according to the activation operator, an activation operation on the output state value C_t to obtain an activation result, and multiplies the output result O_t of the output gate by the activation result to obtain the output data h_t.
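In equation form, claim 23 can be summarized as follows. The claim does not name the activation applied to C_t, so it is written here as a generic function; in the conventional LSTM formulation it is tanh, and the product is element-wise, both of which are assumptions relative to the claim text.

```latex
h_t = O_t \odot \operatorname{act}(C_t)
```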
24. The method according to claim 21, wherein the subsequent processing specifically comprises:
For the forget gate, the input gate and the output gate, the subsequent processing is a sigmoid operation;
For the update state gate, the subsequent processing is the tanh activation function.
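For reference, the two functions named in claim 24 have the standard definitions below. The symbol z, standing for a gate's summed operation result before the subsequent processing, is introduced here only for illustration and does not appear in the patent.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
```

The sigmoid maps each gate value into (0, 1), while tanh maps the update state gate value into (-1, 1).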
25. The method according to claim 21, wherein the method further comprises:
The main processing circuit uses the output data h_t as the input result of the next time step, and uses the output state value C_t as the input state value of the next time step.
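The recurrence in claim 25 can be sketched as a loop that feeds each step's h_t and C_t back in as the next step's h_{t-1} and C_{t-1}. The sketch assumes scalar inputs and states and a conventional LSTM step with tanh applied to C_t; the parameter layout `params[gate] = (input weight, state weight, bias)` and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

GateParams = Dict[str, Tuple[float, float, float]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: GateParams) -> Tuple[float, float]:
    """One scalar LSTM step; params[gate] = (input weight, state weight, bias)."""
    def pre(gate: str) -> float:
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))
    f = sigmoid(pre("forget"))
    o = sigmoid(pre("output"))
    g = math.tanh(pre("update"))
    c = f * c_prev + i * g      # output state value C_t
    h = o * math.tanh(c)        # output data h_t
    return h, c


def run_sequence(xs: Sequence[float], params: GateParams,
                 h0: float = 0.0, c0: float = 0.0) -> Tuple[List[float], float]:
    """Run the step over a sequence, carrying h_t and C_t into the next time step."""
    h, c = h0, c0
    outputs: List[float] = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs, c


if __name__ == "__main__":
    zero = {gate: (0.0, 0.0, 0.0) for gate in ("input", "forget", "output", "update")}
    print(run_sequence([1.0, 1.0], zero, c0=2.0))
```

With all parameters set to zero every sigmoid gate evaluates to 0.5 and the update state gate to 0, so the state simply halves at each step, which makes the carried-over state easy to check by hand.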
26. The method according to any one of claims 20-25, wherein, when there are multiple slave processing circuits, the arithmetic unit comprises a tree module; the tree module comprises a root port and multiple branch ports; the root port of the tree module is connected to the main processing circuit, and each of the multiple branch ports of the tree module is connected to one of the multiple slave processing circuits; the method further comprises:
The tree module forwards data and operators between the main processing circuit and the multiple slave processing circuits.
27. The method according to any one of claims 20-25, wherein, when there are multiple slave processing circuits, the arithmetic unit further comprises one or more branch processing circuits, each branch processing circuit being connected to at least one slave processing circuit; the method further comprises:
The branch processing circuit forwards data and operators between the main processing circuit and the multiple slave processing circuits.
28. The method according to any one of claims 20-25, wherein, when there are multiple slave processing circuits, the multiple slave processing circuits are arranged in an array; each slave processing circuit is connected to the other adjacent slave processing circuits; the main processing circuit is connected to k slave processing circuits among the multiple slave processing circuits, the k slave processing circuits being: the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column; the method further comprises:
The k slave processing circuits forward data and operators between the main processing circuit and the multiple slave processing circuits.
29. The method according to claims 20-25, wherein the slave processing circuit comprises a multiplication processing circuit and an accumulation processing circuit; the method specifically comprises:
The multiplication processing circuit multiplies the element values in a received input data block by the element values at the corresponding positions in the weight of each gate to obtain the product result of each gate, and multiplies the element values in a received input state data block by the element values at the corresponding positions in the weight of each gate to obtain another product result of each gate;
The accumulation processing circuit accumulates the product results of each gate to obtain the intermediate result of each gate, and accumulates the other product results of each gate to obtain the state intermediate result of each gate.
CN201811579542.3A 2018-12-20 2018-12-21 Computing device and board card Active CN109670581B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811579542.3A CN109670581B (en) 2018-12-21 2018-12-21 Computing device and board card
PCT/CN2019/105932 WO2020125092A1 (en) 2018-12-20 2019-09-16 Computing device and board card

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811579542.3A CN109670581B (en) 2018-12-21 2018-12-21 Computing device and board card

Publications (2)

Publication Number Publication Date
CN109670581A true CN109670581A (en) 2019-04-23
CN109670581B CN109670581B (en) 2023-05-23

Family

ID=66147138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811579542.3A Active CN109670581B (en) 2018-12-20 2018-12-21 Computing device and board card

Country Status (1)

Country Link
CN (1) CN109670581B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020125092A1 (en) * 2018-12-20 2020-06-25 中科寒武纪科技股份有限公司 Computing device and board card
CN112329926A (en) * 2020-11-30 2021-02-05 珠海采筑电子商务有限公司 Quality improvement method and system for intelligent robot
CN112491555A (en) * 2020-11-20 2021-03-12 重庆无缝拼接智能科技有限公司 Medical electronic signature processing method and electronic equipment
WO2021088404A1 (en) * 2019-11-06 2021-05-14 深圳大普微电子科技有限公司 Data processing method, apparatus and device, and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6654730B1 (en) * 1999-12-28 2003-11-25 Fuji Xerox Co., Ltd. Neural network arithmetic apparatus and neutral network operation method
US20160342891A1 (en) * 2015-05-21 2016-11-24 Google Inc. Neural Network Processor
US20170103305A1 (en) * 2015-10-08 2017-04-13 Via Alliance Semiconductor Co., Ltd. Neural network unit that performs concurrent lstm cell calculations
CN107341542A (en) * 2016-04-29 2017-11-10 北京中科寒武纪科技有限公司 Apparatus and method for performing Recognition with Recurrent Neural Network and LSTM computings
WO2018058452A1 (en) * 2016-09-29 2018-04-05 北京中科寒武纪科技有限公司 Apparatus and method for performing artificial neural network operation
US20180174036A1 (en) * 2016-12-15 2018-06-21 DeePhi Technology Co., Ltd. Hardware Accelerator for Compressed LSTM
WO2018120016A1 (en) * 2016-12-30 2018-07-05 上海寒武纪信息科技有限公司 Apparatus for executing lstm neural network operation, and operational method
CN108268939A (en) * 2016-12-30 2018-07-10 上海寒武纪信息科技有限公司 For performing the device of LSTM neural network computings and operation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Feng et al.: "A two-step mapping method for long short-term memory (LSTM) neuromorphic chip design" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020125092A1 (en) * 2018-12-20 2020-06-25 中科寒武纪科技股份有限公司 Computing device and board card
WO2021088404A1 (en) * 2019-11-06 2021-05-14 深圳大普微电子科技有限公司 Data processing method, apparatus and device, and readable storage medium
CN112491555A (en) * 2020-11-20 2021-03-12 重庆无缝拼接智能科技有限公司 Medical electronic signature processing method and electronic equipment
CN112491555B (en) * 2020-11-20 2022-04-05 山西智杰软件工程有限公司 Medical electronic signature processing method and electronic equipment
CN112329926A (en) * 2020-11-30 2021-02-05 珠海采筑电子商务有限公司 Quality improvement method and system for intelligent robot

Also Published As

Publication number Publication date
CN109670581B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN109543832A (en) A kind of computing device and board
CN109522052A (en) A kind of computing device and board
CN109670581A (en) A kind of computing device and board
CN109740739A (en) Neural computing device, neural computing method and Related product
CN109189474A (en) Processing with Neural Network device and its method for executing vector adduction instruction
CN109657782A (en) Operation method, device and Related product
CN110163357A (en) A kind of computing device and method
CN109032670A (en) Processing with Neural Network device and its method for executing vector duplicate instructions
CN111047022B (en) Computing device and related product
CN110059797A (en) A kind of computing device and Related product
CN110147249A (en) A kind of calculation method and device of network model
CN109739703A (en) Adjust wrong method and Related product
CN110059809A (en) A kind of computing device and Related product
CN109753319A (en) A kind of device and Related product of release dynamics chained library
CN109711540A (en) A kind of computing device and board
CN109726800A (en) Operation method, device and Related product
CN109711538A (en) Operation method, device and Related product
CN111047021B (en) Computing device and related product
CN109740730A (en) Operation method, device and Related product
CN109740729A (en) Operation method, device and Related product
CN110472734A (en) A kind of computing device and Related product
CN111260070B (en) Operation method, device and related product
CN110515586A (en) Multiplier, data processing method, chip and electronic equipment
CN111738429B (en) Computing device and related product
CN111368990A (en) Neural network computing device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant after: Zhongke Cambrian Technology Co.,Ltd.

Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant before: Beijing Zhongke Cambrian Technology Co.,Ltd.

GR01 Patent grant