CN106951926A - Deep learning method and device with a hybrid architecture - Google Patents

Deep learning method and device with a hybrid architecture Download PDF

Info

Publication number
CN106951926A
CN106951926A CN201710196532.0A
Authority
CN
China
Prior art keywords
module
reasoning
training
deep learning
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710196532.0A
Other languages
Chinese (zh)
Other versions
CN106951926B (en)
Inventor
程归鹏
卢飞
江涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Intelligent Data Technology Co Ltd
Original Assignee
Shandong Intelligent Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Intelligent Data Technology Co Ltd filed Critical Shandong Intelligent Data Technology Co Ltd
Priority to CN201710196532.0A priority Critical patent/CN106951926B/en
Publication of CN106951926A publication Critical patent/CN106951926A/en
Application granted granted Critical
Publication of CN106951926B publication Critical patent/CN106951926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955Hardware or software architectures specially adapted for image or video understanding using specific electronic processors

Abstract

The invention discloses a deep learning method and device with a hybrid architecture, characterized by comprising the following steps: when the training data set is updated, the training module retrains the deep learning network model and stores the weights and bias parameters; a server-side monitoring process detects the change to the parameter file, encapsulates the relevant information in a preset data structure, and notifies the inference module; the inference module suspends its inference service, reads the weight and bias file contents from the server side, and updates its network model; meanwhile, the server-side monitoring process processes input files awaiting inference and notifies the inference module. The system and device comprises a server module, a training module, an inference module, and bus interfaces. By mixing training and inference on a single heterogeneous CPU+GPU+CAPI deep learning system, the invention makes full use of resources, achieves a higher energy-efficiency ratio, lets the CAPI accelerator access server memory directly, and allows inference model parameters such as weights to be updated iteratively online in real time.

Description

Deep learning method and device with a hybrid architecture
Technical field
The present invention relates to the technical fields of circuit design and machine learning, and more particularly to a deep learning method and device with a hybrid architecture.
Background technology
The IT industry has developed rapidly in the 21st century, bringing people enormous benefit and convenience. Deep learning applications divide into two parts, training and inference. Taking the ImageNet evaluation as an example, training an AlexNet model requires 8 million pictures across 1,000 classes: a model such as AlexNet extracts features and computes a loss, then the weight parameters are updated by back-propagation with an optimizer such as SGD, so that the model converges step by step and finally yields a good network model. Inference is the process of running one forward pass of the input through the network model to obtain the final classification accuracy (Top-5 accuracy is typically chosen). The training process of a deep learning application needs a large amount of computing resources and training data; current training platforms generally use high-performance NVIDIA GPUs such as the Tesla P100, Titan X, and GTX 1080 to accelerate training. Once a usable model is obtained, it is deployed on another platform for inference, providing services externally. Because inference performs only a single forward pass, its computational requirements are lower and the emphasis shifts to latency. Platforms currently used for inference include CPU-based cloud service platforms, server clusters based on low-power GPUs, and clusters of FPGAs or dedicated ASICs. From the standpoint of low latency and energy efficiency, FPGAs and dedicated ASICs can do even better; and compared with ASICs, FPGAs offer greater architectural flexibility and are attracting increasing attention. CAPI, the Coherent Accelerator Processor Interface, is a high-speed bus interface protocol introduced by IBM on POWER processors; its physical interface is PCI-E or IBM's BlueLink. A PSL layer implemented inside CAPI guarantees memory-access coherence with the server, so an accelerator can access CPU memory directly through virtual addresses, greatly reducing access latency. The SNAP Framework programming environment released by IBM allows algorithm models to be implemented conveniently in C/C++.
People have developed and researched various deep learning methods and devices for this purpose. For example, Chinese patent publication CN106022472A discloses an embedded deep learning processor. That invention belongs to the technical field of integrated circuits and is specifically an FPGA-based embedded deep learning processor comprising: a central processing unit (CPU), which performs the logical operation, control, and storage work necessary while the processor learns and runs; and a deep learning unit, the hardware implementation unit of the deep learning algorithm and the core component that carries out deep learning processing. The processor combines a conventional CPU with deep learning units, where a deep learning assembly can be composed of multiple deep learning units, giving it scalability so that it can serve as the core processor of artificial intelligence applications at different computing scales. As shown in Fig. 5, Chinese patent publication CN106156851A discloses an accelerator and method oriented to deep learning services, used to perform deep learning computation on pending data in a server, comprising a computation control module arranged at the server's network interface card and connected to the server by a bus, a first memory, and a second memory; the computation control module is a programmable logic device comprising a control unit, a data storage unit, a logic storage unit, and bus interfaces, namely a first communication interface and a second communication interface communicating respectively with the network interface card, the first memory, and the second memory; the logic storage unit stores the deep learning control logic; the first memory stores the weight data and bias data of each network layer. That invention can effectively improve computational efficiency and enhance the performance-per-watt ratio.
The prior art has the following disadvantages: 1) conventional approaches separate training from inference, so two platform environments must be maintained and resources are not fully utilized; 2) doing all deep learning computation on an FPGA/CPLD provides insufficient computing power and is currently unsuited to large-scale training scenarios; 3) the FPGA/CPLD and the server generally communicate via DMA, so the latency of data interaction with the CPU server is large. It is therefore necessary to propose a new deep learning method and device.
The content of the invention
To address the above deficiencies of the prior art, the invention provides a deep learning method and device with a hybrid architecture that plays to the advantages and characteristics of each module, achieves a higher energy-efficiency ratio, and makes full use of resources; CAPI gives direct access to server memory, reducing latency and programming complexity. The technical solution by which the present invention solves its technical problem is:
A deep learning method with a hybrid architecture, for performing deep learning training and inference, comprises the following steps:
S1: when the training data set changes, the training module retrains the deep learning network model; when training ends, the weights and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change to the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3: the inference module suspends its inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: meanwhile, the server-side monitoring process processes input files awaiting inference and notifies the inference module; when inference completes, the inference module returns the result to the server-side monitoring process.
Said step S1 specifically comprises the following sub-steps:
S11: when the training data set changes without changing the network model, retraining is required, yielding updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to a preset file in a format agreed with the inference module;
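Step S12 requires the training side and the inference side to agree on a storage format for the per-layer weights and biases, but the patent does not specify that format. The Python sketch below illustrates one way such an agreement could look; the function names, the little-endian layout, and the field order are all assumptions made for illustration, not the patent's actual format:

```python
import struct

# Hypothetical agreed binary layout (an assumption, not from the patent):
# for each layer: [name_len u32][name bytes][n_weights u32][weights f32...]
#                 [n_biases u32][biases f32...]
def save_params(path, layers):
    """Write layers, a list of (name, weight_list, bias_list), to path."""
    with open(path, "wb") as f:
        for name, weights, biases in layers:
            blob = name.encode("utf-8")
            f.write(struct.pack("<I", len(blob)))
            f.write(blob)
            for vec in (weights, biases):
                f.write(struct.pack("<I", len(vec)))
                f.write(struct.pack("<%df" % len(vec), *vec))

def load_params(path):
    """Parse the file back into (name, weights, biases) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    layers, off = [], 0
    while off < len(data):
        (nlen,) = struct.unpack_from("<I", data, off); off += 4
        name = data[off:off + nlen].decode("utf-8"); off += nlen
        vecs = []
        for _ in range(2):
            (n,) = struct.unpack_from("<I", data, off); off += 4
            vecs.append(list(struct.unpack_from("<%df" % n, data, off)))
            off += 4 * n
        layers.append((name, vecs[0], vecs[1]))
    return layers
```

Explicit lengths before each field let the inference side parse the file without knowing the network topology in advance, which is one plausible reason for agreeing on such a self-describing layout.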
Said step S2 specifically comprises the following sub-steps:
S21: the server side runs a monitoring process that controls the operation, stopping, and parameter updates of the inference module by calling the inference module's kernel library function interface and driver on the server;
S22: the server-side monitoring process constantly monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, it must send a stop command and the updated parameter file information to the inference module;
Said step S3 specifically comprises the following sub-steps:
S31: the inference module reads the corresponding weight and bias information from the server side directly by virtual address into its internal RAM;
S32: when reading completes, the inference module notifies the monitoring process, and the monitoring process sends a run command;
S33: the inference module updates its network model parameters and resumes the inference service.
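The monitoring-process flow described in the sub-steps above can be sketched as a polling loop. The Python below is only an illustration of the control flow, not the patent's implementation: the struct layout, the dummy virtual address, and the `send_*` hooks are assumptions, and the real system would instead call the CAPI kernel library and driver:

```python
import os
import struct

# Hypothetical notification record (an assumption): the virtual address and
# length of the weight/bias storage, as described in step S2.
def pack_notification(virt_addr, length):
    return struct.pack("<QQ", virt_addr, length)

def unpack_notification(blob):
    return struct.unpack("<QQ", blob)

def monitor_once(param_file, last_mtime, send_stop, send_params, send_run):
    """One polling pass of the server-side monitoring process.

    Returns the new mtime. send_stop/send_params/send_run stand in for the
    kernel-library calls that control the inference module (S21-S23, S32-S33).
    """
    mtime = os.stat(param_file).st_mtime
    if mtime != last_mtime:                       # S22: parameters changed
        send_stop()                               # S23: halt inference
        size = os.stat(param_file).st_size
        # The real device sends the virtual address of the in-memory copy;
        # here a dummy address plus the file length is used.
        send_params(pack_notification(0x7f0000000000, size))
        send_run()                                # S32/S33: resume service
    return mtime
```

A real monitoring daemon would call `monitor_once` repeatedly (with a sleep between passes, or an event mechanism such as inotify instead of polling) and hand the packed record to the driver.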
The deep learning network model of said hybrid architecture is a deep learning network model for picture classification.
A deep learning system device with a hybrid architecture, for performing parallelized deep learning training and inference: said device comprises a server module, a training module, an inference module, and bus interfaces; the server module comprises a CPU processor, DDR memory, and a network; the training module and inference module are connected with the server module through the bus interfaces and can communicate.
Said server module provides control, data processing, network interaction, and parameter storage functions for deep learning.
Said CPU processor is a POWER processor; said training module is a GPU-accelerated training module for accelerating the deep learning model training process; said inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and carries out the deep learning inference process.
Compared with the prior art, the beneficial effects of the present invention are as follows. The deep learning method with a hybrid architecture of the present invention comprises the following steps: when the training data set changes, the training module retrains the deep learning network model, and when training ends, the weights and bias parameters of the network model are stored to a preset file; the server-side monitoring process detects the change to the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module; the inference module suspends its inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model; meanwhile, the server-side monitoring process processes input files awaiting inference and notifies the inference module, which returns the result to the monitoring process when inference completes. The deep learning system device with the hybrid architecture comprises a server module, a training module, an inference module, and bus interfaces; the server module comprises a CPU processor, DDR memory, and a network; the training module and inference module are connected with the server module through the bus interfaces and can communicate. By mixing training and inference on a single heterogeneous CPU+GPU+CAPI deep learning platform, the present invention plays to the advantages and characteristics of each module, achieves a higher energy-efficiency ratio, and makes full use of resources; CAPI gives direct access to server memory, reducing latency and programming complexity; inference model parameters such as weights can be updated iteratively online in real time.
Brief description of the drawings
Fig. 1 is a flow diagram of the deep learning method with a hybrid architecture of the present invention.
Fig. 2 is an architecture diagram of the deep learning device with a hybrid architecture of the present invention.
Fig. 3 is an architecture diagram of the deep learning device with a hybrid architecture in an embodiment of the present invention.
Fig. 4 is a working diagram of the present invention, taking the AlexNet deep learning network model as an example.
Fig. 5 is a structural block diagram of a prior-art accelerator oriented to deep learning services.
Embodiment
The present invention is described in further detail below with reference to Figs. 1 to 5, so that the public may better grasp its implementation. A specific embodiment of the present invention is as follows:
As shown in Fig. 1, a deep learning method with a hybrid architecture of the present invention, for performing deep learning training and inference, comprises the following steps:
S1: when the training data set changes, the training module retrains the deep learning network model; when training ends, the weights and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change to the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3: the inference module suspends its inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: meanwhile, the server-side monitoring process processes input files awaiting inference and notifies the inference module; when inference completes, the inference module returns the result to the server-side monitoring process.
Said step S1 specifically comprises the following sub-steps:
S11: when the training data set changes without changing the network model, retraining is required, yielding updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to a preset file in a format agreed with the inference module;
Said step S2 specifically comprises the following sub-steps:
S21: the server side runs a monitoring process that controls the operation, stopping, and parameter updates of the inference module by calling the inference module's kernel library function interface and driver on the server;
S22: the server-side monitoring process constantly monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, it must send a stop command and the updated parameter file information to the inference module;
Said step S3 specifically comprises the following sub-steps:
S31: the inference module reads the corresponding weight and bias information from the server side directly by virtual address into its internal RAM;
S32: when reading completes, the inference module notifies the monitoring process, and the monitoring daemon sends a run command;
S33: the inference module updates its network model parameters and resumes the inference service.
As shown in Fig. 2, the deep learning system device with a hybrid architecture, for performing parallelized deep learning training and inference, is characterized in that: said device comprises a server module, a training module, an inference module, and bus interfaces; the server module comprises a CPU processor, DDR memory, and a network; the training module and inference module are connected with the server module through the bus interfaces and can communicate; said server module provides control, data processing, network interaction, and parameter storage for deep learning; said CPU processor is a POWER processor; said training module is a GPU-accelerated training module for accelerating the deep learning model training process; said inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and carries out the deep learning inference process; the bus interface between the server module and the training module is a PCI-E or NVLink bus; the hardware interface between the server module and the inference module is PCI-E or BlueLink, and the bus protocol is CAPI.
Preferably, as shown in Fig. 4, the deep learning network model of said hybrid architecture is the AlexNet deep learning network model for picture classification. To ease understanding of the present solution, the working principle of the invention is briefly explained below taking the AlexNet deep learning network model as an example. Said AlexNet deep learning network model consists of 5 convolutional layers and 3 fully connected layers; some convolutional layers are followed by ReLU, pooling, and normalization operations, and the last fully connected layer feeds a softmax layer that outputs 1,000 classes. The AlexNet model can be used for large-scale picture classification: depending on the training data set, classification training can be done for different situations, providing a picture classification service.
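The 5-convolution, 3-fully-connected structure just described can be checked with the usual convolution output formula out = (in − k + 2p) / s + 1. The 227×227 input size and the per-layer kernel/stride/padding values below are the commonly published AlexNet settings, assumed here since the patent does not list them:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (n - k + 2p)//s + 1."""
    return (size - kernel + 2 * pad) // stride + 1

def alexnet_feature_size(inp=227):
    """Trace the feature-map side length through AlexNet's conv stack."""
    s = conv_out(inp, 11, stride=4)      # conv1: 11x11, stride 4  -> 55
    s = conv_out(s, 3, stride=2)         # pool1: 3x3, stride 2    -> 27
    s = conv_out(s, 5, pad=2)            # conv2: 5x5, pad 2       -> 27
    s = conv_out(s, 3, stride=2)         # pool2                   -> 13
    s = conv_out(s, 3, pad=1)            # conv3                   -> 13
    s = conv_out(s, 3, pad=1)            # conv4                   -> 13
    s = conv_out(s, 3, pad=1)            # conv5                   -> 13
    s = conv_out(s, 3, stride=2)         # pool5                   -> 6
    return s

# The three fully connected layers then map 6*6*256 = 9216 features
# to 4096, 4096, and finally the 1000-way softmax described above.
```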
Embodiment 1
As shown in Fig. 3, as a preferred implementation, for example solving an AlexNet picture classification problem: the deep learning device with the hybrid architecture, for performing parallelized deep learning training and inference, comprises a server module composed of a POWER8 processor, DDR memory, a network, and so on; a GPU-accelerated training module, a GTX 1080, connected to the server by a bus; and a CAPI inference module, an ADM-PCIE-KU3 accelerator card, connected to the server by a bus. Said GPU training module accelerates the training process of the deep learning model; said inference module is preloaded with the AlexNet network model and carries out the deep learning inference process; said server module handles control, data processing, network interaction, parameter storage, and so on for deep learning; the bus interface between the server module and the training module is a PCI-E or NVLink bus; the hardware interface between the server module and the inference module is PCI-E or BlueLink, and the bus protocol is CAPI.
The deep learning method of this hybrid-architecture device is realized in the following steps:
S1: the 8-layer AlexNet network model is implemented with the SNAP Framework tool (a tool for implementing, in C/C++, algorithm models that run on a CAPI card) and flashed into the CAPI inference module;
S2: based on the TensorFlow deep learning framework, 3 million TFRecords pictures of, for example, 300 labelled bird species are obtained as the training data set and fed to two GTX 1080 GPUs for distributed training;
S3: the monitoring process obtains the latest training result as a pb file, parses the weights and bias parameters from it into file A, and obtains the virtual address and length at which the parameters are stored;
S4: the monitoring program calls the CAPI kernel library function interface and driver to send the data structure encapsulating the parameter information to the ADM-PCIE-KU3 CAPI module;
S5: the CAPI card parses the parameter address out of the structure, obtains the parameter information, and updates the stored weight and bias variables of the corresponding network model;
S6: the CAPI card receives the picture inference requests sent by the monitoring program and returns the Top-5 results output by the network, so a picture recognition service for these categories can be offered externally;
S7: while the CAPI card provides this service, the training network can continue training on newly added categories and synchronize the finished parameters into the CAPI card, thereby achieving synchronized updating and iteration of training and inference.
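Step S6 returns Top-5 results from the network's class scores. Selecting the five highest-scoring classes is straightforward; the sketch below is a minimal illustration (on the CAPI card this selection would be done in C/C++ hardware logic, and the score values in the usage note are made up):

```python
def top_k(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i],
                  reverse=True)[:k]
```

For example, applied to a 7-class score vector `[0.01, 0.40, 0.05, 0.20, 0.10, 0.09, 0.15]`, `top_k` yields the class indices ordered by descending score; for the real 1,000-class softmax output the same call returns the Top-5 class indices.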
The foregoing are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited to these embodiments; any modification, equivalent replacement, improvement, or adaptation made within the spirit and principles of the present invention and within the disclosed technical scope shall be included within the scope of protection of the present invention.

Claims (9)

1. A deep learning method with a hybrid architecture, for performing deep learning training and inference, characterized by comprising the following steps:
S1: when the training data set changes, the training module retrains the deep learning network model; when training ends, the weights and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change to the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3: the inference module suspends its inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: meanwhile, the server-side monitoring process processes input files awaiting inference and notifies the inference module; when inference completes, the inference module returns the result to the server-side monitoring process.
2. The method according to claim 1, characterized in that said step S1 specifically comprises the following sub-steps:
S11: when the training data set changes without changing the network model, retraining is required, yielding updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to a preset file in a format agreed with the inference module.
3. The method according to claim 1, characterized in that said step S2 specifically comprises the following sub-steps:
S21: the server side runs a monitoring process that controls the operation, stopping, and parameter updates of the inference module by calling the inference module's kernel library function interface and driver on the server;
S22: the server-side monitoring process constantly monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, it must send a stop command and the updated parameter file information to the inference module.
4. The method according to claim 1, characterized in that said step S3 specifically comprises the following sub-steps:
S31: the inference module reads the corresponding weight and bias information from the server side directly by virtual address into its internal RAM;
S32: when reading completes, the inference module notifies the monitoring process, and the monitoring process sends a run command;
S33: the inference module updates its network model parameters and resumes the inference service.
5. The method according to claim 1, characterized in that said network model is a deep learning model for picture classification.
6. A device of the deep learning system with a hybrid architecture according to any one of claims 1 to 5, for performing parallelized deep learning training and inference, characterized in that: said device comprises a server module, a training module, an inference module, and bus interfaces; the server module comprises a CPU processor, DDR memory, and a network; the training module and inference module are connected with the server module through the bus interfaces and can communicate.
7. The device according to claim 6, characterized in that said server module provides control, data processing, network interaction, and parameter storage functions for deep learning.
8. The device according to claim 6, characterized in that said CPU processor is a POWER processor, and said training module is a GPU-accelerated training module for accelerating the deep learning model training process.
9. The device according to claim 6, characterized in that said inference module is a CAPI inference module that can be preloaded with a deep learning network model and carries out the deep learning inference process.
CN201710196532.0A 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture Active CN106951926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710196532.0A CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710196532.0A CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Publications (2)

Publication Number Publication Date
CN106951926A true CN106951926A (en) 2017-07-14
CN106951926B CN106951926B (en) 2020-11-24

Family

ID=59474087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710196532.0A Active CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Country Status (1)

Country Link
CN (1) CN106951926B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160098633A1 (en) * 2014-10-02 2016-04-07 Nec Laboratories America, Inc. Deep learning model for structured outputs with high-order interaction
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
US20160267380A1 (en) * 2015-03-13 2016-09-15 Nuance Communications, Inc. Method and System for Training a Neural Network
CN104714852A (en) * 2015-03-17 2015-06-17 华中科技大学 Parameter synchronization optimization method and system suitable for distributed machine learning
CN105825235A * 2016-03-16 2016-08-03 博康智能网络科技股份有限公司 Image recognition method based on deep learning of multiple feature maps

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Zijian et al.: "FPGA-Based Convolutional Neural Network Accelerator", Computer Engineering *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563512A * 2017-08-24 2018-01-09 腾讯科技(上海)有限公司 Data processing method, device and storage medium
CN107563512B * 2017-08-24 2023-10-17 腾讯科技(上海)有限公司 Data processing method, device and storage medium
CN111052155A * 2017-09-04 2020-04-21 华为技术有限公司 Distributed stochastic gradient descent method for asynchronous gradient averaging
CN111052155B * 2017-09-04 2024-04-16 华为技术有限公司 Distributed stochastic gradient descent method for asynchronous gradient averaging
CN107729268A * 2017-09-20 2018-02-23 山东英特力数据技术有限公司 Memory expansion apparatus and method based on the CAPI interface
CN107729268B * 2017-09-20 2019-11-12 山东英特力数据技术有限公司 Memory expansion apparatus and method based on the CAPI interface
TWI658365B (en) * 2017-10-30 2019-05-01 緯創資通股份有限公司 Connecting module
CN109726159A * 2017-10-30 2019-05-07 纬创资通股份有限公司 Connection module
CN109726159B (en) * 2017-10-30 2020-12-04 纬创资通股份有限公司 Connection module
CN109064382A (en) * 2018-06-21 2018-12-21 北京陌上花科技有限公司 Image information processing method and server
CN109064382B (en) * 2018-06-21 2023-06-23 北京陌上花科技有限公司 Image information processing method and server
CN109460826A * 2018-10-31 2019-03-12 北京字节跳动网络技术有限公司 Method, apparatus and model updating system for distributing data
CN109726170A * 2018-12-26 2019-05-07 上海新储集成电路有限公司 Artificial intelligence system-on-chip
CN109886408A * 2019-02-28 2019-06-14 北京百度网讯科技有限公司 Deep learning method and device
CN109947682B * 2019-03-21 2021-03-09 浪潮商用机器有限公司 Server mainboard and server
CN109947682A * 2019-03-21 2019-06-28 浪潮商用机器有限公司 Server mainboard and server
TWI741416B (en) * 2019-04-29 2021-10-01 美商谷歌有限責任公司 Virtualizing external memory as local to a machine learning accelerator
TWI777775B (en) * 2019-04-29 2022-09-11 美商谷歌有限責任公司 Virtualizing external memory as local to a machine learning accelerator
US11176493B2 (en) 2019-04-29 2021-11-16 Google Llc Virtualizing external memory as local to a machine learning accelerator
CN112148470B (en) * 2019-06-28 2022-11-04 富联精密电子(天津)有限公司 Parameter synchronization method, computer device and readable storage medium
CN112148470A (en) * 2019-06-28 2020-12-29 鸿富锦精密电子(天津)有限公司 Parameter synchronization method, computer device and readable storage medium
CN110399234A * 2019-07-10 2019-11-01 苏州浪潮智能科技有限公司 Task acceleration processing method, apparatus, device and readable storage medium
CN110533181A * 2019-07-25 2019-12-03 深圳市康拓普信息技术有限公司 Rapid training method and system for a deep learning model
CN110533181B (en) * 2019-07-25 2023-07-18 南方电网数字平台科技(广东)有限公司 Rapid training method and system for deep learning model
CN112541513A (en) * 2019-09-20 2021-03-23 百度在线网络技术(北京)有限公司 Model training method, device, equipment and storage medium
CN110598855A (en) * 2019-09-23 2019-12-20 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium
CN111147603A (en) * 2019-09-30 2020-05-12 华为技术有限公司 Method and device for networking reasoning service
CN112925533A (en) * 2019-12-05 2021-06-08 新唐科技股份有限公司 Microcontroller update system and method
CN111860260B (en) * 2020-07-10 2024-01-26 逢亿科技(上海)有限公司 High-precision low-calculation target detection network system based on FPGA
CN111860260A (en) * 2020-07-10 2020-10-30 逢亿科技(上海)有限公司 High-precision low-computation target detection network system based on FPGA
CN112465112A (en) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 nGraph-based GPU (graphics processing Unit) rear-end distributed training method and system
CN112465112B (en) * 2020-11-19 2022-06-07 苏州浪潮智能科技有限公司 nGraph-based GPU (graphics processing Unit) rear-end distributed training method and system
CN112581353A (en) * 2020-12-29 2021-03-30 浪潮云信息技术股份公司 End-to-end picture reasoning system facing deep learning model
CN112949427A (en) * 2021-02-09 2021-06-11 北京奇艺世纪科技有限公司 Person identification method, electronic device, storage medium, and apparatus
CN113537284B (en) * 2021-06-04 2023-01-24 中国人民解放军战略支援部队信息工程大学 Deep learning implementation method and system based on mimicry mechanism
CN113537284A (en) * 2021-06-04 2021-10-22 中国人民解放军战略支援部队信息工程大学 Deep learning implementation method and system based on mimicry mechanism

Also Published As

Publication number Publication date
CN106951926B (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN106951926A Deep learning system method and device of a hybrid architecture
CN107103113B Automated design method and device for a neural network processor, and optimization method
CN108460457A Multi-machine multi-card hybrid parallel asynchronous training method for convolutional neural networks
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
CN105681628B Convolutional network arithmetic unit, reconfigurable convolutional neural network processor, and image denoising method
Cheung et al. A large-scale spiking neural network accelerator for FPGA systems
CN108090565A Parallelized training acceleration method for convolutional neural networks
CN107704922A Artificial neural network processing unit
CN109376843A FPGA-based rapid EEG signal classification method, implementation method and device
CN108416436A Method and system for neural network partitioning using multi-core processing modules
CN105718996B Cellular array computing system and communication method therein
CN108829515A Cloud platform computing system and application method thereof
CN108416433A Heterogeneous neural network acceleration method and system based on asynchronous events
US20190138373A1 Multithreaded data flow processing within a reconfigurable fabric
WO2022068663A1 Memory allocation method, related device, and computer readable storage medium
CN110163353A Computing device and method
CN113642734A Distributed training method and device for a deep learning model, and computing device
CN110321997A Highly parallel computing platform, system, and computing implementation method
CN115828831B Multi-core-chip operator placement strategy generation method based on deep reinforcement learning
CN110163350A Computing device and method
CN113449839A Distributed training method, gradient communication device and computing device
CN106776466A FPGA-based heterogeneous accelerated computing apparatus and system
CN109117949A Flexible data stream processor and processing method for artificial intelligence devices
Zhang et al. A parallel strategy for convolutional neural network based on heterogeneous cluster for mobile information system
CN109359542A Neural-network-based vehicle damage level determination method and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant