CN107657316A - Co-design of a general-purpose processor and a neural network processor - Google Patents

Co-design of a general-purpose processor and a neural network processor Download PDF

Info

Publication number
CN107657316A
CN107657316A CN201610695285.4A CN201610695285A CN107657316A
Authority
CN
China
Prior art keywords
cpu
module
processing unit
dma
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610695285.4A
Other languages
Chinese (zh)
Other versions
CN107657316B (en)
Inventor
Yu Jincheng
Yao Song
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xilinx Technology Beijing Ltd
Original Assignee
Beijing Insight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Insight Technology Co Ltd
Publication of CN107657316A
Application granted granted Critical
Publication of CN107657316B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The present invention relates to artificial neural networks (ANNs), such as convolutional neural networks (CNNs), and more particularly to how to implement an artificial neural network based on the co-design of a general-purpose processor and a dedicated neural network processor.

Description

Co-design of a general-purpose processor and a neural network processor
Cross-reference to priority applications
This application claims priority to the earlier-filed Chinese patent application 201610663201.9, "A method for optimizing an artificial neural network", and Chinese patent application 201610663563.8, "A deep processing unit for implementing an ANN".
Technical field
The present invention relates to artificial neural networks (ANNs), such as convolutional neural networks (CNNs), and more particularly to how to implement an artificial neural network based on the co-design of a general-purpose processor and a neural network processor.
Background technology
Convolutional neural networks are widely applied in today's image processing field. Neural networks are easy to train and have a uniform computation structure, but their storage and computation requirements are very large. Many works attempt to build neural network accelerators on FPGAs or to design dedicated chips directly. However, because the flexibility of dedicated neural network acceleration hardware is limited, the tasks such hardware can complete are overly narrow.
The article "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network" (2016.2), co-authored by inventor Yao Song, describes an acceleration system based on an FPGA, in which a general-purpose processor (for example, an ARM CPU) completes the computations that the FPGA cannot perform. For example, the ARM is responsible for transmitting instructions and preparing data.
Summary of the invention
On the basis of the above article, the inventors propose further improvements. The present application proposes to combine a dedicated neural network processor with a general-purpose processor (CPU) so as to provide a flexible system that can be applied to complex neural networks.
According to one aspect of the invention, a deep processing unit (DPU) for running an artificial neural network (ANN) is proposed, including: a CPU, for scheduling a programmable logic module (PL) and a direct memory access unit (DMA); the direct memory access unit (DMA), connected respectively to the CPU, the programmable logic module and an external memory, for communication between the CPU and the programmable logic module; the programmable logic module (PL), including: a controller (Controller), for fetching instructions and for scheduling the computing complex based on the instructions, a computing complex (Computing Complex), including multiple processing elements (PE), for carrying out computation tasks based on instructions and data, and a buffer, for holding the data and instructions used by the programmable logic module; and the external memory (DDR), connected to the CPU and the DMA, for storing the instructions for implementing the ANN and the data to be processed by the ANN. The CPU controls the DMA to transfer instructions and data between the external memory and the programmable logic module.
In addition, the DMA transfers data between the external memory and the programmable logic module through a FIFO, and transfers instructions between the external memory and the programmable logic module through a FIFO.
According to another aspect of the invention, a deep processing unit (DPU) for running an artificial neural network (ANN) is proposed, including: a CPU, for scheduling a programmable logic module (PL) and a direct memory access unit (DMA); the direct memory access unit (DMA), connected respectively to the CPU, the programmable logic module and an external memory, for communication between the CPU and the programmable logic module; the programmable logic module (PL), including: a controller (Controller), for fetching instructions and for scheduling the computing complex based on the instructions, a computing complex (Computing Complex), including multiple processing elements (PE), for carrying out computation tasks based on instructions and data, and a buffer, for holding the data and instructions used by the programmable logic module; and the external memory (DDR), connected respectively to the CPU, the DMA and the programmable logic module, for storing the instructions for implementing the ANN and the data to be processed by the ANN; wherein the CPU controls the DMA to transfer instructions between the external memory and the programmable logic module, and the programmable logic module exchanges data with the external memory directly.
In addition, instructions are transferred between the DMA and the programmable logic module through a FIFO.
In addition, the CPU further includes a state monitoring module, for monitoring the state of the finite state machine (FSM) of the programmable logic module.
In addition, the processing element (PE) includes: a convolver complex (convolver complex), connected to the buffer to receive the weights and input data of the ANN, for carrying out the convolution operations in the ANN; an adder tree (adder tree), connected to the convolver complex, for summing the results of the convolution operations; and a nonlinear module, connected to the adder tree, for applying a nonlinear function to the output of the adder tree.
In addition, the processing element (PE) further includes a pooling module, connected to the nonlinear module, for carrying out the pooling operations in the ANN.
In addition, the buffer includes: an input buffer, for preparing the input data and instructions used by the computation; and an output buffer, for holding and outputting the computation results.
In addition, the buffer further includes a bias shift module (bias shift), for shifting the weights, which are quantized fixed-point numbers, to different quantization ranges, and for outputting the shifted weights to the adder tree.
According to one embodiment of the present invention, the CPU, the programmable logic module and the DMA are implemented on one SoC, and the external memory is implemented on another chip separate from the SoC.
Brief description of the drawings
Figs. 1a and 1b show the common structure of an artificial neural network model.
Fig. 2 shows the flow of deploying an artificial neural network model onto dedicated hardware.
Fig. 3 shows the overall flow of optimizing an artificial neural network.
Fig. 4 shows a hardware architecture according to a first embodiment of the present invention that implements an artificial neural network through the co-design of a CPU and a dedicated accelerator (for example, a DPU).
Fig. 5 shows the FIFO data transmission mechanism used by the hardware architecture shown in Fig. 4.
Fig. 6 shows a hardware architecture according to a second embodiment of the present invention that implements an artificial neural network through the co-design of a CPU and a dedicated accelerator (for example, a DPU).
Fig. 7 shows a further improvement to the first embodiment of the invention.
Fig. 8 shows a further improvement to the second embodiment of the invention.
Fig. 9 shows the similarities and differences between the processing flows of the first and second embodiments.
Embodiment
Part of the content of this application was previously published in the academic article by inventor Yao Song, "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network" (2016.2). The present application makes further improvements on that basis.
In this application, the improvements of the present invention are mainly illustrated using CNNs for image processing as an example. Deep neural networks (DNNs) and recurrent neural networks (RNNs) are similar to CNNs.
Basic concepts of CNNs
CNNs achieve state-of-the-art performance on a wide range of vision-related tasks. To help understand the CNN-based image classification algorithms analyzed in this application, we first describe the basic knowledge of CNNs and introduce the ImageNet data set and existing CNN models.
As shown in Fig. 1(a), a typical CNN consists of a series of layers that run in order.
The parameters of a CNN model are referred to as "weights". The first layer of a CNN reads the input image and outputs a series of feature maps. The following layers read the feature maps produced by the previous layer and output new feature maps. Finally, a classifier outputs the probability of each category that the input image may belong to. CONV layers (convolutional layers) and FC layers (fully connected layers) are the two basic layer types in a CNN; a CONV layer is usually followed by a pooling layer.
For example, for a CNN layer, f_j^in denotes the j-th input feature map, f_i^out denotes the i-th output feature map, and b_i denotes the bias term of the i-th output map.
For CONV layers, n_in and n_out denote the number of input and output feature maps, respectively.
For FC layers, n_in and n_out denote the lengths of the input and output feature vectors, respectively.
Definition of CONV layers (Convolutional layers): a CONV layer takes a series of feature maps as input and convolves them with convolution kernels to obtain the output feature maps.
A nonlinear layer, i.e. a nonlinear activation function, usually attached to a CONV layer, is applied to every element of the output feature maps.
A CONV layer can be expressed by expression 1:

$$f_i^{out} = \sum_{j=1}^{n_{in}} f_j^{in} \otimes g_{i,j} + b_i \quad (1 \le i \le n_{out}) \qquad (1)$$

where g_{i,j} is the convolution kernel applied to the j-th input feature map and the i-th output feature map.
Definition of FC layers (Fully-Connected layers): an FC layer applies a linear transformation to the input feature vector:

$$f^{out} = W f^{in} + b \qquad (2)$$

where W is an n_out × n_in transformation matrix and b is the bias term. Note that for an FC layer the input is not a combination of several two-dimensional feature maps but a single feature vector. Therefore, in expression 2, the parameters n_in and n_out actually correspond to the lengths of the input and output feature vectors.
Pooling layer: usually attached to a CONV layer, it outputs the maximum or average value of each subarea in each feature map. Max pooling can be expressed by expression 3:

$$f_{i,(x,y)}^{out} = \max_{0 \le m,n < p} f_{i,(x \cdot p + m,\; y \cdot p + n)}^{in} \qquad (3)$$

where p is the size of the pooling kernel. This nonlinear "down-sampling" not only reduces the feature map size and the computation for the next layer, but also provides a form of translation invariance.
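For illustration only, the following NumPy sketch shows how expressions 1 to 3 map to code for a single layer; the function names and array layouts are assumptions made for this example and are not part of the patent.

```python
import numpy as np

def conv_layer(f_in, g, b):
    """Expression 1: f_out[i] = sum_j conv2d(f_in[j], g[i, j]) + b[i].
    f_in: (n_in, H, W), g: (n_out, n_in, k, k), b: (n_out,).
    Uses the cross-correlation form of convolution, as CNN frameworks typically do."""
    n_out, n_in, k, _ = g.shape
    H, W = f_in.shape[1] - k + 1, f_in.shape[2] - k + 1
    f_out = np.zeros((n_out, H, W))
    for i in range(n_out):
        for j in range(n_in):
            for y in range(H):
                for x in range(W):
                    f_out[i, y, x] += np.sum(f_in[j, y:y + k, x:x + k] * g[i, j])
        f_out[i] += b[i]
    return f_out

def fc_layer(f_in, W, b):
    """Expression 2: f_out = W @ f_in + b."""
    return W @ f_in + b

def max_pool(f_in, p):
    """Expression 3: maximum over each non-overlapping p x p subarea."""
    n, H, W = f_in.shape
    cropped = f_in[:, :H - H % p, :W - W % p]
    return cropped.reshape(n, H // p, p, W // p, p).max(axis=(2, 4))
```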
A CNN can be used for image classification in the forward inference process. But before a CNN is used for any task, it must first be trained on a data set. It has recently been shown that a CNN model pre-trained on a large data set for a given task can be used for other tasks and achieve high accuracy with only minor adjustment of the network weights; this minor adjustment is called "fine-tuning" (fine-tune). Training of CNNs is mainly performed on large servers. For the embedded FPGA platform, we focus on accelerating the inference process of the CNN.
Image-Net data sets
The ImageNet data set is regarded as the standard benchmark for evaluating the performance of image classification and object detection algorithms. So far, the ImageNet data set has collected more than 14 million images in more than 21,000 categories. For the ILSVRC classification task, ImageNet issues a subset with 1000 categories and 1.2 million images, which has greatly promoted the development of computer vision techniques. In this application, all CNN models are trained on the ILSVRC 2014 training set and evaluated on the ILSVRC 2014 validation set.
Existing CNN models
In ILSVRC 2012, the Supervision team used AlexNet and won first place in the image classification task with a top-5 accuracy of 84.7%. CaffeNet is a replication of AlexNet with minor changes. Both AlexNet and CaffeNet consist of 5 CONV layers and 3 FC layers.
In ILSVRC 2013, the Zeiler-and-Fergus (ZF) network won first place in the image classification task with a top-5 accuracy of 88.8%. The ZF network also has 5 CONV layers and 3 FC layers.
Fig. 1(b) illustrates a typical CNN from the perspective of the input-output data flow.
The CNN shown in Fig. 1(b) includes 5 CONV groups conv1, conv2, conv3, conv4, conv5, three FC layers FC1, FC2, FC3, and a softmax decision function, where each CONV group includes 3 convolutional layers.
Fig. 2 is a schematic diagram of the software optimization and hardware implementation of an artificial neural network.
As shown in Fig. 2, in order to accelerate the CNN, a complete technical solution is proposed from the perspectives of the optimization flow and the hardware architecture.
The lower part of Fig. 2 shows the artificial neural network model. The middle part of Fig. 2 illustrates how the CNN model is compressed to reduce the memory footprint and the number of operations while minimizing the loss of accuracy.
The upper part of Fig. 2 shows the dedicated hardware provided for the compressed CNN.
As shown in the upper part of Fig. 2, the hardware architecture includes two modules: PS and PL.
The general-purpose processing system (Processing System, PS) includes a CPU and external memory (EXTERNAL MEMORY).
The programmable logic module (Programmable Logic, PL) includes the DMA, the computing complex, the input/output buffers, the controller, etc.
As shown in Fig. 2, the PL is provided with a computing complex (Computing Complex), an input buffer, an output buffer, a controller and a direct memory access unit (DMA).
The computing complex includes multiple processing elements (PEs), which are responsible for most of the computation tasks of the CONV layers, pooling layers and FC layers of the artificial neural network.
The on-chip buffers include the input buffer and the output buffer, which prepare the data used by the PEs and store the results.
The controller fetches instructions from the external memory, decodes them (if necessary), and schedules all modules in the PL except the DMA.
The DMAs are used to transfer data and instructions between the external memory (e.g. DDR) and the PL.
The PS includes a general-purpose processor (CPU) 8110 and an external memory 8120.
The external memory stores the model parameters, data and instructions of the whole artificial neural network.
The PS is a hard core: its hardware structure is fixed and it is scheduled by software.
The PL is programmable hardware logic whose hardware structure is variable. For example, the programmable logic module (PL) may be an FPGA.
It should be noted that, according to an embodiment of the invention, although the DMA is on the PL side, it is directly controlled by the CPU and transports data from the EXTERNAL MEMORY into the PL.
Therefore, the hardware architecture shown in Fig. 2 is only a functional division, and the boundary between the PL and the PS is not absolute. For example, in an actual implementation, the PL and the CPU may be realized on one SoC, such as a Xilinx Zynq chip, and the external memory may be realized by another memory chip connected to the CPU in the SoC.
Fig. 3 shows the optimization flow performed before the artificial neural network is deployed onto the hardware chip.
The input of Fig. 3 is the original artificial neural network.
Step 405:Compression
The compression step may include pruning the CNN model. Network pruning has been proven to be an effective method to reduce both the complexity and the overfitting of the network. See, for example, the article by B. Hassibi and D. G. Stork, "Second order derivatives for network pruning: Optimal brain surgeon".
The priority application 201610663201.9 referenced in this application, "A method for optimizing an artificial neural network", proposes a method of compressing a CNN by pruning.
First, an initialization step: the weights of the convolutional layers and FC layers are initialized to random values, generating a fully connected ANN in which each connection has a weight parameter.
Second, a training step: the ANN is trained, and the weights of the ANN are adjusted according to the accuracy of the ANN until the accuracy reaches a predetermined standard.
For example, the training step adjusts the weights of the ANN based on a stochastic gradient descent algorithm, i.e. the weights are adjusted randomly and the adjustment is selected based on the change in the ANN's accuracy. For an introduction to stochastic gradient descent, see the above-mentioned "Learning both weights and connections for efficient neural networks".
The accuracy may be quantified as the difference between the prediction of the ANN and the correct result on a training data set.
Third, a pruning step: based on a predetermined condition, the unimportant connections in the ANN are found and pruned. Specifically, the weight parameters of the pruned connections are no longer saved.
The predetermined condition includes any one of the following: the weight parameter of the connection is 0; or the weight parameter of the connection is smaller than a predetermined value.
Fourth, a fine-tuning step: the pruned connections are reset as connections whose weight parameter value is zero, i.e. the pruned connections are recovered and assigned a weight value of 0.
Finally, it is judged whether the accuracy of the ANN has reached the predetermined standard. If not, the second, third and fourth steps are repeated. A minimal sketch of this iterative prune-and-retrain loop is given below.
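For illustration, the following Python sketch outlines the prune-and-retrain loop described above; the callbacks (train_one_pass, accuracy) are placeholders assumed for this example and are not defined in the patent.

```python
import numpy as np

def prune_and_retrain(weights, train_one_pass, accuracy, threshold, target_acc, max_iters=10):
    """Iterative pruning sketch: train, prune small weights, recover them as zeros, repeat.

    weights        -- dict of layer name -> np.ndarray (randomly initialized, step 1)
    train_one_pass -- assumed callback that adjusts `weights` (e.g. by SGD, step 2)
    accuracy       -- assumed callback returning accuracy on the training set
    threshold      -- prune connections whose |weight| is below this value (step 3)
    target_acc     -- the predetermined accuracy standard
    """
    for _ in range(max_iters):
        train_one_pass(weights)                  # step 2: adjust weights by training
        for name, w in weights.items():          # step 3: find unimportant connections
            mask = np.abs(w) < threshold         # condition: weight is 0 or below a threshold
            w[mask] = 0.0                        # step 4: recover pruned connections as zeros
        if accuracy(weights) >= target_acc:      # final check against the standard
            break
    return weights
```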
Step 410: Fixed-point quantization of data
For a fixed-point number, its value is represented as follows:

$$n = \sum_{i=0}^{bw-1} B_i \cdot 2^{-f_l} \cdot 2^{i} \qquad (4)$$

where bw is the bit width of the number, B_i is the i-th bit, and f_l is the fractional length, which can be negative.
In order to obtain the highest accuracy while converting floating-point numbers to fixed-point numbers, the inventors propose a dynamic-precision data quantization strategy and an automatic workflow.
Unlike previous static-precision quantization strategies, in the data quantization flow proposed here, f_l changes dynamically for different layers and feature map sets while remaining static within one layer, so as to minimize the truncation error of each layer.
The proposed quantization flow mainly consists of two stages.
(1) weight quantization stage:
The purpose of the weight quantization stage is to find the optimal f_l for the weights of one layer, as in expression 5:

$$f_l = \arg\min_{f_l} \sum \left| W_{float} - W(bw, f_l) \right| \qquad (5)$$

where W denotes the weights and W(bw, f_l) denotes the fixed-point representation of W under the given bw and f_l.
In one embodiment, the dynamic range of the weights of each layer is analyzed first, for example by sampling. Then f_l is initialized so as to avoid data overflow. In addition, we search for the optimal f_l in the neighborhood of the initial f_l, as sketched below.
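A minimal sketch of this neighborhood search, assuming the quantizer of expression 4 and the L1 error criterion of expression 5; the function names and the saturation handling are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def quantize(w, bw, fl):
    """Round w onto the (bw, fl) fixed-point grid of expression 4, with saturation."""
    step = 2.0 ** (-fl)
    max_q = (2 ** (bw - 1) - 1) * step            # largest representable signed magnitude
    return np.clip(np.round(w / step) * step, -max_q, max_q)

def find_optimal_fl(w, bw, search_radius=2):
    """Initialize fl from the dynamic range (to avoid overflow), then search its neighborhood."""
    fl_init = bw - 1 - int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-12)))
    candidates = range(fl_init - search_radius, fl_init + search_radius + 1)
    # Expression 5: pick the fl with the smallest total quantization error.
    return min(candidates, key=lambda fl: np.sum(np.abs(w - quantize(w, bw, fl))))
```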
According to another embodiment, in the weight fixed-point quantization step the optimal f_l is found in another way, as in expression 6, where i denotes a bit position among the bw bits and k_i is the weight assigned to that position. In the manner of expression 6, different bit positions are given different weights, and the optimal f_l is then calculated.
(2) Data quantization stage.
The data quantization stage aims to find the optimal f_l for the feature map sets between two layers of the CNN model.
In this stage, the CNN is trained using a training data set (benchmark). The training data set may be data set 0.
According to one embodiment of the present invention, the weight quantization of all CONV layers and FC layers of the CNN is completed first, and data quantization is carried out afterwards. At that point, the training data set is input into the CNN whose weights have been quantized and is processed layer by layer by the CONV layers and FC layers, yielding the input feature maps of each layer.
For the input feature maps of each layer, a greedy algorithm is used to compare, layer by layer, the data between the fixed-point CNN model and the floating-point CNN model, so as to reduce the loss of accuracy. The optimization target for each layer is shown in expression 7:

$$f_l = \arg\min_{f_l} \sum \left| x^{+}_{float} - x^{+}(bw, f_l) \right| \qquad (7)$$

In expression 7, A denotes the computation of one layer (e.g. a certain CONV layer or FC layer), x denotes the input, and when x+ = A·x, x+ denotes the output of that layer. It is worth noting that, for a CONV layer or an FC layer, the direct result x+ has a longer bit width than the given standard, so truncation is needed when selecting the optimal f_l. Finally, the entire data quantization configuration is generated.
According to another embodiment, in the data fixed-point quantization step the optimal f_l is found in another way, as in expression 8, where i denotes a bit position among the bw bits and k_i is the weight of that position. Similarly to expression 6, different bit positions are given different weights, and the optimal f_l is then calculated.
The above data quantization step yields the optimal f_l.
In addition, according to another embodiment, weight quantization and data quantization are not carried out one after the other but alternately, layer by layer.
Regarding the flow order of data processing, the convolutional layers (CONV layers) and fully connected layers (FC layers) of the ANN are in a series relationship, and each feature map set is obtained as the training data set is processed layer by layer by the CONV layers and FC layers of the ANN.
Specifically, the weight quantization step and the data quantization step are carried out alternately following this series relationship: after the weight quantization step completes the fixed-point quantization of a certain layer, the data quantization step is performed on the feature map set output by that layer. A sketch of this alternating flow follows.
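An illustrative sketch of the alternating flow, reusing the hypothetical find_optimal_fl and quantize helpers from the earlier example; the layer objects and their weights/forward members are assumptions made for illustration.

```python
def quantize_network_alternately(layers, calib_data, bw):
    """Alternate weight quantization and data quantization along the series of layers."""
    x = calib_data                                 # feature maps flowing through the network
    fl_config = []                                 # collected (weight fl, data fl) per layer
    for layer in layers:
        # Weight quantization step for this layer (expression 5).
        fl_w = find_optimal_fl(layer.weights, bw)
        layer.weights = quantize(layer.weights, bw, fl_w)
        # Data quantization step on the feature map set output by this layer (expression 7).
        x_float = layer.forward(x)                 # floating-point reference output
        fl_d = find_optimal_fl(x_float, bw)
        x = quantize(x_float, bw, fl_d)            # truncated output feeds the next layer
        fl_config.append((fl_w, fl_d))
    return fl_config
```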
First embodiment
In the priority application, the inventors proposed a co-design using a general-purpose processor and a dedicated accelerator, but did not discuss how to efficiently use the flexibility of the general-purpose processor and the computing power of the dedicated accelerator, for example how to transmit instructions, transfer data and perform computation. In this application, the inventors propose further optimized schemes.
Fig. 4 shows a further improvement over the hardware architecture of Fig. 2.
In Fig. 4, the CPU controls the DMA, and the DMA is responsible for moving data. Specifically, the CPU controls the DMA to transport instructions from the external memory (DDR) into a FIFO; the dedicated accelerator then fetches instructions from the FIFO and executes them.
The CPU also controls the DMA to transport the data required by the dedicated accelerator from the DDR into a FIFO, and the accelerator takes the data from the FIFO when computing. Likewise, the CPU maintains the transport of the accelerator's output data.
At run time, the CPU must constantly monitor the state of the DMA: when the input FIFO is not full, data must be moved from the DDR into the input FIFO; when the output FIFO is not empty, data must be moved from the output FIFO back into the DDR. A sketch of this polling loop is given below.
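A simplified Python sketch of the CPU-side polling loop in the first embodiment; the FIFO, DMA and DDR objects and their methods are placeholders assumed for illustration and do not correspond to a specific driver API.

```python
def cpu_schedule_loop(dma, instr_fifo, in_fifo, out_fifo, ddr, done):
    """First-embodiment scheduling: the CPU keeps all three FIFOs serviced via the DMA."""
    while not done():
        if not instr_fifo.full() and ddr.has_pending_instructions():
            dma.copy(ddr.next_instruction(), instr_fifo)   # feed instructions to the accelerator
        if not in_fifo.full() and ddr.has_pending_input():
            dma.copy(ddr.next_input_block(), in_fifo)      # feed input data when the FIFO has room
        if not out_fifo.empty():
            dma.copy(out_fifo.pop(), ddr.result_buffer())  # drain results back into DDR
        # The CPU stays in this loop, which is why it has little time left for other tasks.
```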
In addition, the dedicated accelerator of Fig. 4 includes a controller, a computing complex (Computing Complex) and a buffer.
The computing complex includes convolvers, an adder tree, a nonlinear module, etc.
The size of the convolution kernel generally has only a few options, such as 3 × 3, 5 × 5 and 7 × 7. For example, the two-dimensional convolver designed for the convolution operation may use a 3 × 3 window.
The adder tree (AD) sums all the results of the convolvers. The nonlinear (NL) module applies a nonlinear activation function to the input data stream; for example, the function may be a ReLU function. In addition, a max-pooling module (not shown) is used for the pooling operation, for example applying a specific 2 × 2 window to the input data stream and outputting the maximum value within it.
The buffer includes an input data buffer, an output data buffer and a bias shift module (bias shift).
The bias shift module supports the conversion between dynamic quantization ranges, for example shifting the weights or, as another example, shifting the data.
The input data buffer may further include an input data buffer and a weight buffer. The input data buffer may be a line buffer (line buffer), which holds the data needed by the computation and releases it continuously, so as to realize data reuse. A minimal illustration of the bias shift idea follows.
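For illustration only, a bias shift between two fixed-point quantization ranges amounts to an arithmetic shift by the difference of the fractional lengths; the helper below is an assumption for this example, not the patent's hardware design.

```python
def bias_shift(value_int, fl_src, fl_dst):
    """Re-align a quantized integer from fractional length fl_src to fl_dst.

    A larger destination fractional length means finer resolution, so the integer
    is shifted left; a smaller one shifts it right (with truncation).
    """
    diff = fl_dst - fl_src
    return value_int << diff if diff >= 0 else value_int >> (-diff)
```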
Fig. 5 shows the FIFO interaction between the CPU and the dedicated accelerator.
In the architecture shown in Fig. 5 there are three classes of FIFOs, and correspondingly the CPU exercises three kinds of control over the DMA.
In the first embodiment, the CPU and the dedicated accelerator communicate entirely through FIFOs, and there are three classes of FIFO buffers between them: instruction, input data and output data. Specifically, under the control of the CPU, the DMA is responsible for the transfer of input data, output data and instructions between the external memory and the dedicated accelerator, with an input data FIFO, an output data FIFO and an instruction FIFO provided between the DMA and the dedicated accelerator, respectively.
For the dedicated accelerator, this design is simple: it only needs to care about computation, not about data. Data operations are controlled entirely by the CPU.
However, in some application scenarios the scheme shown in Fig. 5 also has shortcomings.
First, the scheduling performed by the CPU consumes CPU resources. For example, the CPU must constantly monitor the state of each FIFO and be ready at any time to receive and send data. Monitoring the states and processing the data according to the different states consumes a large amount of CPU time. In some applications, the cost of the CPU monitoring the FIFOs and processing the data can be so large that the CPU is almost fully occupied and has no time to handle other tasks (reading pictures, preprocessing pictures, etc.).
Second, multiple FIFOs need to be provided in the dedicated accelerator, which also occupies PL resources.
Second embodiment
The characteristics of the second embodiment are as follows. First, the dedicated processor shares the external memory with the CPU, and both can read the external memory. Second, the CPU only controls the instruction input of the dedicated accelerator. In this way, the CPU and the dedicated accelerator operate as a cooperative system, in which the CPU undertakes the tasks that the dedicated accelerator cannot complete.
As shown in Fig. 6, in the second embodiment the dedicated accelerator (PL) interacts with the external memory (DDR) directly. Correspondingly, the input FIFO and output FIFO between the DMA and the dedicated accelerator (as shown in Fig. 5) are eliminated; only one FIFO is retained to transfer instructions between the DMA and the dedicated accelerator, which saves resources.
For the CPU, there is no need to perform complicated scheduling of the input and output data, since the dedicated accelerator accesses the data directly from the external memory (DDR). While the artificial neural network is running, the CPU can perform other processing, such as reading the image data to be processed from a camera.
Therefore, the second embodiment solves the problem of the CPU being overloaded: the CPU is freed to handle more tasks. On the other hand, the dedicated accelerator must itself perform and control the data access to the external memory (DDR). A sketch of the simplified CPU-side loop is given below.
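For contrast with the first-embodiment loop above, a sketch of the much lighter CPU-side work in the second embodiment; again, the objects and their methods are placeholders assumed for illustration.

```python
def cpu_loop_second_embodiment(dma, instr_fifo, ddr, camera, screen, done):
    """Second embodiment: the CPU only feeds instructions; the PL reads and writes DDR itself."""
    while not done():
        if not instr_fifo.full() and ddr.has_pending_instructions():
            dma.copy(ddr.next_instruction(), instr_fifo)   # the only DMA scheduling the CPU does
        # The CPU is free for other tasks, e.g. I/O with the outside world.
        frame = camera.read()                              # read the next picture to process
        ddr.write_input(frame)                             # the accelerator fetches it from DDR itself
        if ddr.has_result():
            screen.show(ddr.read_result())                 # output the recognized result
```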
Improvement of the first and second embodiments
In both the first and second embodiments, the CPU controls the accelerator through instructions.
The accelerator may run away during operation (that is, the program enters an endless loop or runs in a meaningless, disordered way). In the schemes described so far, the CPU has no way to determine whether the accelerator has run away.
In an improved embodiment based on the first or second embodiment, the inventors additionally provide a "state peripheral" in the CPU, so that the state of the finite state machine (FSM) in the dedicated accelerator (PL) is passed directly to the CPU.
By checking the state of the finite state machine (FSM), the CPU can understand the running situation of the accelerator. If it finds that the accelerator has run away or is stuck, the CPU can also send a signal to reset the accelerator directly. A watchdog-style sketch of this monitoring is given below.
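A hedged sketch of how such monitoring could look in software; the state-reading callback, the state names and the reset signal are placeholders assumed for this example, not a specification of the patent's hardware interface.

```python
import time

def monitor_accelerator_fsm(read_fsm_state, reset_accelerator, poll_period=0.01, stuck_limit=100):
    """Watch the accelerator's FSM state; reset it if the state stops changing (stuck / run away)."""
    last_state, unchanged = read_fsm_state(), 0
    while True:
        time.sleep(poll_period)
        state = read_fsm_state()                 # current state of the PL's finite state machine
        if state == last_state and state != "IDLE":
            unchanged += 1
            if unchanged >= stuck_limit:         # state frozen for too long: assume run-away
                reset_accelerator()              # send the reset signal to the accelerator
                unchanged = 0
        else:
            unchanged = 0
        last_state = state
```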
Fig. 7 shows an example in which the "state peripheral" is added to the architecture of the first embodiment shown in Fig. 4.
Fig. 8 shows an example in which the "state peripheral" is added to the architecture of the second embodiment shown in Fig. 6.
As shown in Figs. 7 and 8, a finite state machine (FSM) is provided in the controller of the dedicated accelerator, and the state of the finite state machine is passed directly to the state peripheral of the CPU (i.e., the monitoring module), so that the CPU can monitor fault situations such as the program running into a deadlock.
Comparison of the first and second embodiments
The two scheduling strategies of the first and second embodiments each have advantages.
In the embodiment of Fig. 4, the image data must wait for the CPU to schedule the DMA before it is transferred to the dedicated accelerator, so the dedicated accelerator may sit idle for some time. But because the CPU schedules the data movement, the dedicated accelerator is only responsible for computation, its computing capability is fully exploited, and the time needed to process the data is shorter.
In the embodiment of Fig. 6, the dedicated accelerator has the ability to access data on its own, without the CPU scheduling the data movement. Data processing can be carried out independently on the dedicated accelerator.
The CPU only needs to be responsible for data reading and output between the external system and the DPU. A read operation means, for example, that the CPU reads picture data from a camera (not shown) and transfers it to the external memory; an output operation means that the CPU outputs the recognition result from the external memory to a screen (not shown).
With the embodiment of Fig. 6, the tasks can be pipelined, so that multi-task processing is faster. The corresponding cost is that the dedicated accelerator is responsible for computation and data movement at the same time, so it is less efficient and needs a longer time for processing.
Fig. 9 contrasts the similarities and differences between the processing flows of the first and second embodiments.
Application of the second embodiment: face recognition
According to the second embodiment, because the external memory (DDR) is shared, the CPU and the dedicated accelerator can jointly complete one computation task.
For example, in a face recognition task, the CPU can read the camera and detect the faces in the input picture, while the neural network accelerator completes the recognition of the faces.
Using the co-design of the CPU and the dedicated accelerator, the above neural network computation task can be quickly deployed on an embedded device.
Specifically, referring to example 2 of Fig. 9, the reading (for example, from a camera) and preprocessing of pictures run on the CPU, while the processing of the pictures is completed on the dedicated accelerator.
Because the above method separates the tasks of the CPU from the tasks of the accelerator, the CPU and the accelerator can process their tasks fully in parallel. A pipelined sketch of this division of work is given below.
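An illustrative sketch of pipelining the face recognition task across the CPU and the accelerator using two threads; the detect_faces callback and the accelerator object are placeholders assumed for this example.

```python
import queue
import threading

def face_recognition_pipeline(camera, detect_faces, accelerator, results, done):
    """CPU thread: read + preprocess; accelerator thread: neural-network recognition."""
    faces_q = queue.Queue(maxsize=4)                 # hand-off buffer between the two stages

    def cpu_stage():                                 # runs on the CPU
        while not done():
            frame = camera.read()                    # read a picture from the camera
            for face in detect_faces(frame):         # CPU-side face detection / preprocessing
                faces_q.put(face)

    def accelerator_stage():                         # drives the dedicated accelerator
        while not done():
            try:
                face = faces_q.get(timeout=0.1)
            except queue.Empty:
                continue
            results.append(accelerator.recognize(face))  # neural-network inference on the PL

    threads = [threading.Thread(target=cpu_stage), threading.Thread(target=accelerator_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```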
Table 1 illustrates the performance comparison between using only a CPU and the second embodiment (the CPU + dedicated accelerator co-design).
Table 1
The CPU used for comparison is the Tegra K1 produced by NVIDIA. It can be seen that our CPU + dedicated accelerator co-design gives an obvious speed-up for every layer, with an overall speed-up of 7 times.
The advantage of the present invention is that the rich functionality of the CPU (general-purpose processor) is used to make up for the lack of flexibility of the dedicated accelerator (the programmable logic module PL, e.g. an FPGA), while the high computation speed of the dedicated accelerator is used to make up for the fact that the computation speed of the CPU is insufficient for real-time computation.
In addition, the general-purpose processor may be an ARM processor, or any other CPU. The programmable logic module may be an FPGA or another programmable dedicated processor (ASIC).
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts among the embodiments reference may be made to one another.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods may also be realized in other ways. The apparatus embodiments described above are only schematic. For example, the flow charts and block diagrams in the accompanying drawings show the architectures, functions and operations that may be realized by the apparatus, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flow chart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for realizing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It is also noted that each block of the block diagrams and/or flow charts, and combinations of blocks in the block diagrams and/or flow charts, may be realized by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall be included within the scope of protection. It should be noted that similar reference signs and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The foregoing is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all of these should be covered within the scope of protection of the invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (18)

1. A deep processing unit (DPU) for running an artificial neural network (ANN), comprising:
a CPU, for scheduling a programmable logic module (PL) and a direct memory access unit (DMA);
the direct memory access unit (DMA), connected respectively to the CPU, the programmable logic module and an external memory, for communication between the CPU and the programmable logic module;
the programmable logic module (PL), comprising:
a controller (Controller), for fetching instructions, and for scheduling the computing complex based on the instructions;
a computing complex (Computing Complex), comprising multiple processing elements (PE), for carrying out computation tasks based on the instructions and data;
a buffer, for holding the data and instructions used by the programmable logic module;
the external memory (DDR), connected to the CPU and the direct memory access unit (DMA), for storing the instructions for implementing the ANN and the data to be processed by the ANN;
wherein the CPU controls the DMA to transfer instructions and data between the external memory and the programmable logic module.
2. The deep processing unit according to claim 1, wherein:
the DMA transfers data between the external memory and the programmable logic module through a FIFO;
the DMA transfers instructions between the external memory and the programmable logic module through a FIFO.
3. The deep processing unit according to claim 1, wherein the CPU further comprises:
a state monitoring module, for monitoring the state of the finite state machine (FSM) of the programmable logic module.
4. The deep processing unit according to claim 1, wherein the processing element (PE) comprises:
a convolver complex (convolver complex), connected to the buffer to receive the weights and input data of the ANN, for carrying out the convolution operations in the ANN;
an adder tree (adder tree), connected to the convolver complex, for summing the results of the convolution operations;
a nonlinear module, connected to the adder tree, for applying a nonlinear function to the output of the adder tree.
5. The deep processing unit according to claim 4, wherein the processing element (PE) further comprises:
a pooling module, connected to the nonlinear module, for carrying out the pooling operations in the ANN.
6. The deep processing unit according to claim 1, wherein the buffer comprises:
an input buffer, for preparing the input data and instructions used by the computation;
an output buffer, for holding and outputting the computation results.
7. The deep processing unit according to claim 6, wherein the buffer further comprises:
a bias shift module (bias shift), for shifting the weights, which are quantized fixed-point numbers, to different quantization ranges, and for outputting the shifted weights to the adder tree.
8. The deep processing unit according to claim 1, wherein the CPU, the programmable logic module and the DMA are implemented on one SoC.
9. The deep processing unit according to claim 8, wherein the external memory is implemented on another chip separate from the SoC.
10. A deep processing unit (DPU) for running an artificial neural network (ANN), comprising:
a CPU, for scheduling a programmable logic module (PL) and a direct memory access unit (DMA);
the direct memory access unit (DMA), connected respectively to the CPU, the programmable logic module and an external memory, for communication between the CPU and the programmable logic module;
the programmable logic module (PL), comprising:
a controller (Controller), for fetching instructions, and for scheduling the computing complex based on the instructions;
a computing complex (Computing Complex), comprising multiple processing elements (PE), for carrying out computation tasks based on the instructions and data;
a buffer, for holding the data and instructions used by the programmable logic module;
the external memory (DDR), connected respectively to the CPU, the DMA and the programmable logic module, for storing the instructions for implementing the ANN and the data to be processed by the ANN;
wherein the CPU controls the DMA to transfer instructions between the external memory and the programmable logic module;
wherein the programmable logic module exchanges data with the external memory directly.
11. The deep processing unit according to claim 10, wherein instructions are transferred between the DMA and the programmable logic module through a FIFO.
12. The deep processing unit according to claim 10, wherein the CPU further comprises:
a state monitoring module, for monitoring the state of the finite state machine (FSM) of the programmable logic module.
13. The deep processing unit according to claim 10, wherein the processing element (PE) comprises:
a convolver complex (convolver complex), connected to the buffer to receive the weights and input data of the ANN, for carrying out the convolution operations in the ANN;
an adder tree (adder tree), connected to the convolver complex, for summing the results of the convolution operations;
a nonlinear module, connected to the adder tree, for applying a nonlinear function to the output of the adder tree.
14. The deep processing unit according to claim 13, wherein the processing element (PE) further comprises:
a pooling module, connected to the nonlinear module, for carrying out the pooling operations in the ANN.
15. The deep processing unit according to claim 10, wherein the buffer comprises:
an input buffer, for preparing the input data and instructions used by the computation;
an output buffer, for holding and outputting the computation results.
16. The deep processing unit according to claim 15, wherein the buffer further comprises:
a bias shift module (bias shift), for shifting the weights, which are quantized fixed-point numbers, to different quantization ranges, and for outputting the shifted weights to the adder tree.
17. The deep processing unit according to claim 10, wherein the CPU, the programmable logic module and the DMA are implemented on one SoC.
18. The deep processing unit according to claim 17, wherein the external memory is implemented on another chip separate from the SoC.
CN201610695285.4A 2016-08-12 2016-08-19 Design of cooperative system of general processor and neural network processor Active CN107657316B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2016106632019 2016-08-12
CN201610663201 2016-08-12
CN201610663563 2016-08-12
CN2016106635638 2016-08-12

Publications (2)

Publication Number Publication Date
CN107657316A true CN107657316A (en) 2018-02-02
CN107657316B CN107657316B (en) 2020-04-07

Family

ID=61127258

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610695285.4A Active CN107657316B (en) 2016-08-12 2016-08-19 Design of cooperative system of general processor and neural network processor
CN201610698184.2A Active CN107688855B (en) 2016-08-12 2016-08-19 Hierarchical quantization method and device for complex neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610698184.2A Active CN107688855B (en) 2016-08-12 2016-08-19 Hierarchical quantization method and device for complex neural network

Country Status (1)

Country Link
CN (2) CN107657316B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491890A (en) * 2018-04-04 2018-09-04 百度在线网络技术(北京)有限公司 Image method and device
CN108564165A (en) * 2018-03-13 2018-09-21 上海交通大学 The method and system of convolutional neural networks fixed point optimization
CN109034025A (en) * 2018-07-16 2018-12-18 东南大学 A kind of face critical point detection system based on ZYNQ
CN109389120A (en) * 2018-10-29 2019-02-26 济南浪潮高新科技投资发展有限公司 A kind of object detecting device based on zynqMP
CN109711367A (en) * 2018-12-29 2019-05-03 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN109740619A (en) * 2018-12-27 2019-05-10 北京航天飞腾装备技术有限责任公司 Neural network terminal operating method and device for target identification
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN110554913A (en) * 2018-05-30 2019-12-10 三星电子株式会社 Neural network system, operation method thereof and application processor
CN110569713A (en) * 2019-07-22 2019-12-13 北京航天自动控制研究所 Target detection system and method for realizing data serial-parallel two-dimensional transmission by using DMA (direct memory access) controller
WO2019238029A1 (en) * 2018-06-12 2019-12-19 华为技术有限公司 Convolutional neural network system, and method for quantifying convolutional neural network
CN110889497A (en) * 2018-12-29 2020-03-17 中科寒武纪科技股份有限公司 Learning task compiling method of artificial intelligence processor and related product
CN110990060A (en) * 2019-12-06 2020-04-10 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN111344719A (en) * 2019-07-22 2020-06-26 深圳市大疆创新科技有限公司 Data processing method and device based on deep neural network and mobile device
CN111461310A (en) * 2019-01-21 2020-07-28 三星电子株式会社 Neural network device, neural network system and method for processing neural network model
CN111626414A (en) * 2020-07-30 2020-09-04 电子科技大学 Dynamic multi-precision neural network acceleration unit
CN111868754A (en) * 2018-03-23 2020-10-30 索尼公司 Information processing apparatus, information processing method, and computer program
CN112396157A (en) * 2019-08-12 2021-02-23 美光科技公司 System, method and apparatus for communicating with data storage devices in neural network computing
CN112805727A (en) * 2018-10-08 2021-05-14 深爱智能科技有限公司 Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network
CN113240101A (en) * 2021-05-13 2021-08-10 湖南大学 Method for realizing heterogeneous SoC (system on chip) by cooperative acceleration of software and hardware of convolutional neural network
CN113361695A (en) * 2021-06-30 2021-09-07 南方电网数字电网研究院有限公司 Convolutional neural network accelerator

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11620130B2 (en) 2018-02-13 2023-04-04 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
EP3640863B1 (en) 2018-02-13 2021-10-27 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN116991225A (en) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 Control device, method and equipment of processor
CN108510067B (en) * 2018-04-11 2021-11-09 西安电子科技大学 Convolutional neural network quantification method based on engineering realization
EP3624020A4 (en) 2018-05-18 2021-05-05 Shanghai Cambricon Information Technology Co., Ltd Computing method and related product
CN108805265B (en) * 2018-05-21 2021-03-30 Oppo广东移动通信有限公司 Neural network model processing method and device, image processing method and mobile terminal
CN110555450B (en) * 2018-05-31 2022-06-28 赛灵思电子科技(北京)有限公司 Face recognition neural network adjusting method and device
CN110555508B (en) * 2018-05-31 2022-07-12 赛灵思电子科技(北京)有限公司 Artificial neural network adjusting method and device
JP6867518B2 (en) 2018-08-28 2021-04-28 カンブリコン テクノロジーズ コーポレイション リミティド Data preprocessing methods, devices, computer equipment and storage media
KR20200026455A (en) * 2018-09-03 2020-03-11 삼성전자주식회사 Artificial neural network system and method of controlling fixed point in artificial neural network
EP3836032A4 (en) * 2018-09-21 2021-09-08 Huawei Technologies Co., Ltd. Quantization method and apparatus for neural network model in device
US11703939B2 (en) 2018-09-28 2023-07-18 Shanghai Cambricon Information Technology Co., Ltd Signal processing device and related products
CN109523016B (en) * 2018-11-21 2020-09-01 济南大学 Multi-valued quantization depth neural network compression method and system for embedded system
CN111383638A (en) 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
US10592799B1 (en) * 2019-01-23 2020-03-17 StradVision, Inc. Determining FL value by using weighted quantization loss values to thereby quantize CNN parameters and feature values to be used for optimizing hardware applicable to mobile devices or compact networks with high precision
CN110009096A (en) * 2019-03-06 2019-07-12 开易(北京)科技有限公司 Target detection network model optimization method based on embedded device
CN111832739B (en) 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
CN112085189B (en) 2019-06-12 2024-03-29 上海寒武纪信息科技有限公司 Method for determining quantization parameter of neural network and related product
US11676029B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN110348562B (en) * 2019-06-19 2021-10-15 北京迈格威科技有限公司 Neural network quantization strategy determination method, image identification method and device
CN110309877B (en) * 2019-06-28 2021-12-07 北京百度网讯科技有限公司 Feature map data quantization method and device, electronic equipment and storage medium
CN112446460A (en) * 2019-08-28 2021-03-05 上海寒武纪信息科技有限公司 Method, apparatus and related product for processing data
CN110837890A (en) * 2019-10-22 2020-02-25 西安交通大学 Weight value fixed-point quantization method for lightweight convolutional neural network
CN111144511B (en) * 2019-12-31 2020-10-20 上海云从汇临人工智能科技有限公司 Image processing method, system, medium and electronic terminal based on neural network
CN111178522B (en) * 2020-04-13 2020-07-10 杭州雄迈集成电路技术股份有限公司 Software and hardware cooperative acceleration method and system and computer readable storage medium
CN112561933A (en) * 2020-12-15 2021-03-26 深兰人工智能(深圳)有限公司 Image segmentation method and device
CN115705482A (en) * 2021-07-20 2023-02-17 腾讯科技(深圳)有限公司 Model quantization method and device, computer equipment and storage medium
CN114708180B (en) * 2022-04-15 2023-05-30 电子科技大学 Bit depth quantization and enhancement method for predistortion image with dynamic range preservation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1776644A (en) * 2005-12-09 2006-05-24 中兴通讯股份有限公司 Method for monitoring internal memory varible rewrite based on finite-state-machine
CN104794102A (en) * 2015-05-14 2015-07-22 哈尔滨工业大学 Embedded system on chip for accelerating Cholesky decomposition
CN105224482A (en) * 2015-10-16 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of FPGA accelerator card high-speed memory system
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016039651A1 (en) * 2014-09-09 2016-03-17 Intel Corporation Improved fixed point integer implementations for neural networks
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1776644A (en) * 2005-12-09 2006-05-24 中兴通讯股份有限公司 Method for monitoring internal memory varible rewrite based on finite-state-machine
CN104794102A (en) * 2015-05-14 2015-07-22 哈尔滨工业大学 Embedded system on chip for accelerating Cholesky decomposition
CN105224482A (en) * 2015-10-16 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of FPGA accelerator card high-speed memory system
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANTAO QIU ET AL.: "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network", 《FPGA’16 PROCEEDINGS OF THE 2016 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564165B (en) * 2018-03-13 2024-01-23 上海交通大学 Method and system for optimizing convolutional neural network by fixed point
CN108564165A (en) * 2018-03-13 2018-09-21 上海交通大学 The method and system of convolutional neural networks fixed point optimization
CN111868754A (en) * 2018-03-23 2020-10-30 索尼公司 Information processing apparatus, information processing method, and computer program
CN108491890A (en) * 2018-04-04 2018-09-04 百度在线网络技术(北京)有限公司 Image method and device
CN108491890B (en) * 2018-04-04 2022-05-27 百度在线网络技术(北京)有限公司 Image method and device
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN110413255B (en) * 2018-04-28 2022-08-19 赛灵思电子科技(北京)有限公司 Artificial neural network adjusting method and device
CN110554913A (en) * 2018-05-30 2019-12-10 三星电子株式会社 Neural network system, operation method thereof and application processor
WO2019238029A1 (en) * 2018-06-12 2019-12-19 华为技术有限公司 Convolutional neural network system, and method for quantifying convolutional neural network
CN109034025A (en) * 2018-07-16 2018-12-18 东南大学 A kind of face critical point detection system based on ZYNQ
CN112805727A (en) * 2018-10-08 2021-05-14 深爱智能科技有限公司 Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network
CN109389120A (en) * 2018-10-29 2019-02-26 济南浪潮高新科技投资发展有限公司 A kind of object detecting device based on zynqMP
CN109740619A (en) * 2018-12-27 2019-05-10 北京航天飞腾装备技术有限责任公司 Neural network terminal operating method and device for target identification
CN109711367A (en) * 2018-12-29 2019-05-03 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN110889497A (en) * 2018-12-29 2020-03-17 中科寒武纪科技股份有限公司 Learning task compiling method of artificial intelligence processor and related product
CN111461310A (en) * 2019-01-21 2020-07-28 三星电子株式会社 Neural network device, neural network system and method for processing neural network model
CN111344719A (en) * 2019-07-22 2020-06-26 深圳市大疆创新科技有限公司 Data processing method and device based on deep neural network and mobile device
CN110569713B (en) * 2019-07-22 2022-04-08 北京航天自动控制研究所 Target detection system and method for realizing data serial-parallel two-dimensional transmission by using DMA (direct memory access) controller
CN110569713A (en) * 2019-07-22 2019-12-13 北京航天自动控制研究所 Target detection system and method for realizing data serial-parallel two-dimensional transmission by using DMA (direct memory access) controller
CN112396157A (en) * 2019-08-12 2021-02-23 美光科技公司 System, method and apparatus for communicating with data storage devices in neural network computing
CN110990060B (en) * 2019-12-06 2022-03-22 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN110990060A (en) * 2019-12-06 2020-04-10 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip
CN111626414A (en) * 2020-07-30 2020-09-04 电子科技大学 Dynamic multi-precision neural network acceleration unit
CN113240101A (en) * 2021-05-13 2021-08-10 湖南大学 Method for realizing heterogeneous SoC (system on chip) by cooperative acceleration of software and hardware of convolutional neural network
CN113240101B (en) * 2021-05-13 2022-07-05 湖南大学 Method for realizing heterogeneous SoC (system on chip) by cooperative acceleration of software and hardware of convolutional neural network
CN113361695A (en) * 2021-06-30 2021-09-07 南方电网数字电网研究院有限公司 Convolutional neural network accelerator

Also Published As

Publication number Publication date
CN107657316B (en) 2020-04-07
CN107688855A (en) 2018-02-13
CN107688855B (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN107657316A (en) The cooperative system of general processor and neural network processor designs
US10802992B2 (en) Combining CPU and special accelerator for implementing an artificial neural network
WO2022083536A1 (en) Neural network construction method and apparatus
EP3158529B1 (en) Model parallel processing method and apparatus based on multiple graphic processing units
CN107657263A (en) A kind of advanced treatment unit for being used to realize ANN
CN108122032B (en) Neural network model training method, device, chip and system
CN108122027B (en) Training method, device and chip of neural network model
CN107239829A (en) A kind of method of optimized artificial neural network
CN106681826B (en) Resource planning method, system and device for cluster computing architecture
CN106156810A (en) General-purpose machinery learning algorithm model training method, system and calculating node
CN106953862A (en) The cognitive method and device and sensor model training method and device of network safety situation
CN104035751A (en) Graphics processing unit based parallel data processing method and device
CN107766935B (en) Multilayer artificial neural network
EP4209902A1 (en) Memory allocation method, related device, and computer readable storage medium
JP2018165948A (en) Image recognition device, image recognition method, computer program, and product monitoring system
CN113449859A (en) Data processing method and device
CN111104242A (en) Method and device for processing abnormal logs of operating system based on deep learning
US20230185253A1 (en) Graph convolutional reinforcement learning with heterogeneous agent groups
CN116263701A (en) Computing power network task scheduling method and device, computer equipment and storage medium
Chen et al. Knowledge-based support for simulation analysis of manufacturing cells
CN112528108B (en) Model training system, gradient aggregation method and device in model training
CN117501245A (en) Neural network model training method and device, and data processing method and device
CN109325530A (en) Compression method based on the depth convolutional neural networks on a small quantity without label data
CN116436980A (en) Real-time video task end network edge cooperative scheduling method and device
CN116644804A (en) Distributed training system, neural network model training method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20180606

Address after: 100083, 17 floor, 4 Building 4, 1 Wang Zhuang Road, Haidian District, Beijing.

Applicant after: Beijing deep Intelligent Technology Co., Ltd.

Address before: 100084 Wang Zhuang Road, 1, Haidian District, Beijing, Tsinghua Tongfang Technology Plaza, block D, 1705

Applicant before: Beijing insight Technology Co., Ltd.

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200907

Address after: Unit 01-19, 10 / F, 101, 6 / F, building 5, yard 5, Anding Road, Chaoyang District, Beijing 100029

Patentee after: Xilinx Electronic Technology (Beijing) Co., Ltd

Address before: 100083, 17 floor, 4 Building 4, 1 Wang Zhuang Road, Haidian District, Beijing.

Patentee before: BEIJING DEEPHI TECHNOLOGY Co.,Ltd.