CN109492753A - Decentralized stochastic gradient descent method - Google Patents

Decentralized stochastic gradient descent method

Info

Publication number
CN109492753A
CN109492753A (application CN201811309202.9A)
Authority
CN
China
Prior art keywords
working node
node
parameter
local
gradient descent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811309202.9A
Other languages
Chinese (zh)
Inventor
蒋帆
蒋一帆
吴维刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201811309202.9A
Publication of CN109492753A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a decentralized stochastic gradient descent method. The centralized parallel stochastic gradient descent used for training in conventional distributed deep learning frameworks is replaced by a decentralized parallel stochastic gradient descent: the central server node is removed, and each remaining working node communicates only with its adjacent working nodes to carry out local model training and parameter updates. Through repeated training on multiple working nodes the parameters are continuously tuned until a locally optimal solution is obtained, thereby completing distributed deep learning.

Description

Decentralized stochastic gradient descent method
Technical field
The present invention relates to the technical field of deep learning, and more particularly to a decentralized stochastic gradient descent method.
Background technique
At present, with the continuous development of artificial intelligence, deep learning has become one of its important fields. Distributed deep learning algorithms are iterative: the model is not updated in a single pass but over many loop iterations. They are fault-tolerant: even if some errors occur in an iteration, the final convergence of the model is unaffected. They also exhibit non-uniform parameter convergence: some parameters in the model stop changing after only a few iterations, while others take a long time to converge. These characteristics mean that the performance of deep learning applied to machine learning does not scale linearly with the number of machines, because a large share of resources is spent on communication, waiting and coordination. To compensate for this defect, the parameter server was proposed as a framework dedicated to large-scale optimization, used for training on large-scale data, for example at TB or even PB scale, and with large numbers of model parameters. Large-scale optimization frameworks often have billions or even hundreds of billions of parameters to estimate. A system designed for this challenge, whose large-scale optimization relies on algorithms such as SGD or L-BFGS, must therefore cope with the enormous bandwidth consumed by frequent accesses and modifications of the model parameters, with increasing the degree of parallelism, with the delay caused by synchronous waiting, and with fault tolerance. For these reasons, distributed deep learning commonly uses a parallel stochastic gradient descent algorithm with a centralized parameter server.
However, a distributed deep learning framework with a parameter server only achieves good results when the network is unobstructed. In practice the network environment is not always optimal; under low-bandwidth, high-latency network conditions performance drops sharply, because the parameter server node must communicate with all working nodes, so when the network is poor it becomes congested and the training rate falls. In addition, as network models become increasingly complex, communication time accounts for an ever larger share of the total. Large amounts of communication put greater pressure on the parameter server, and communication time then becomes the bottleneck.
Therefore, how to reduce the communication time in distributed deep learning training and improve operational efficiency is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the present invention provides a decentralized stochastic gradient descent method that can be applied to data-parallel distributed deep learning frameworks. The central node of the conventional parallel stochastic gradient descent method, i.e. the parameter server node, is removed in order to save communication time and improve network transmission speed.
To achieve the above object, the present invention adopts the following technical solution:
A decentralized stochastic gradient descent method comprises the following specific steps:
Step 1: the data set to be trained on is partitioned into n blocks, and each individual block is assigned to a specific working node;
Step 2: each working node samples training data from its assigned block to train the local model of that working node;
Step 3: all working nodes simultaneously perform the computation of the working-node parameter update, using an iterative procedure and parallel stochastic gradient descent;
The specific steps of the working-node parameter update are as follows:
Step 31: the local working node is first initialized: the initial parameter value x_0 is set; the step length γ is set; the weight matrix W is set; the number of iterations K is set;
Step 32: data for the current iteration are randomly selected from the local data set of the local working node;
Step 33: stochastic gradient descent is applied to these data and to the parameter of the local working node, and the gradient of the iteration u = ∇F(x_i; ξ) is computed, where x_i is the parameter of the local node updated in the previous iteration and ξ denotes the sampled data;
Step 34: the parameters of the local working node and of its adjacent working nodes are obtained, the weights of the adjacent working nodes and of the working node are read from the weight matrix W, and the weighted combination gives the provisional parameter x′;
Step 35: the gradient u obtained in step 33 and the provisional parameter x′ obtained in step 34 are substituted into the stochastic gradient descent formula x = x′ − γu, yielding the updated parameter x of the local working node, which is then applied;
Step 36: the gradients u of the local working node and of the adjacent working nodes are examined, and the weights in the weight matrix W are redistributed according to the ratio between the gradient of the working node and the gradient of each adjacent working node. The specific calculation is as follows: the gradients of the local working node and of the adjacent working node are compared to obtain the ratio of the smaller to the larger; the weight of the adjacent working node is multiplied by this ratio to obtain the adjusted weight of that adjacent working node; the adjusted weight of the local working node is then obtained by subtracting the sum of the adjacent working nodes' weights from 1;
Step 37: it is judged whether K iterations have been completed. If not, the procedure returns to step 32; if so, the parameter update and weight distribution of the working node are complete, finishing the local model training process.
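For illustration only, the following is a minimal Python sketch of steps 31 to 37 as run on a single working node. It is not part of the claimed method: the names (train_worker, grad_f, neighbour_state) are invented for the example, a generic gradient function and a synchronous in-memory exchange with the adjacent nodes are assumed, and gradients are compared by their norms, which the description above leaves unspecified.

import numpy as np

def train_worker(x0, local_data, neighbour_state, W, node_id, grad_f,
                 gamma=0.01, K=1000):
    """Sketch of steps 31-37 for one working node (names are illustrative).

    x0              : initial parameter vector (step 31)
    local_data      : list of training samples assigned to this node (steps 1-2)
    neighbour_state : dict {j: callable -> (x_j, u_j)} returning each adjacent
                      node's current parameter and latest gradient (a
                      synchronous, in-memory exchange is assumed here)
    W               : n x n numpy weight matrix; row node_id holds the weights
    grad_f          : grad_f(x, sample) -> stochastic gradient (assumed)
    """
    x = np.asarray(x0, dtype=float)
    rng = np.random.default_rng()

    for _ in range(K):                                      # step 37: K iterations
        sample = local_data[rng.integers(len(local_data))]  # step 32
        u = grad_f(x, sample)                               # step 33: gradient u

        # Step 34: provisional parameter x' = weighted mix of the local and
        # adjacent parameters, using the weights stored in row node_id of W.
        states = {j: get() for j, get in neighbour_state.items()}
        x_prov = W[node_id, node_id] * x
        for j, (x_j, _) in states.items():
            x_prov = x_prov + W[node_id, j] * x_j

        # Step 35: stochastic gradient descent update x = x' - gamma * u.
        x = x_prov - gamma * u

        # Step 36: redistribute the weights by the ratio of the smaller to the
        # larger gradient (compared here by norm, which is an assumption).
        for j, (_, u_j) in states.items():
            a, b = np.linalg.norm(u), np.linalg.norm(u_j)
            W[node_id, j] *= (min(a, b) / max(a, b)) if max(a, b) > 0 else 1.0
        W[node_id, node_id] = 1.0 - sum(W[node_id, j] for j in states)

    return x

Under these assumptions each node only reads and writes its own row of W and the state of its adjacent nodes, so no central coordination is required.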
Preferably, the weight matrix W is fully initialized, with the weights of the local working node and of its adjacent working nodes each initialized to 1/3.
Preferably, the adjusted working-node weights are saved into the weight matrix W.
It can be seen from the above technical solution that, compared with the prior art, the present invention provides a decentralized stochastic gradient descent method in which the central server node of the conventional parallel stochastic gradient descent method is removed, so that every parameter update of each working node is carried out locally and with its adjacent working nodes, which exchange information with one another. The weights are redistributed according to the ratio between the gradient of the local working node and the gradient of each adjacent working node, and the influence weights between working nodes are stored in a weight matrix. The parameter update is performed iteratively: in each iteration every working node executes one step of the stochastic gradient descent algorithm, and before the gradient is applied to the parameter, the parameters of the local working node and of its adjacent working nodes are first combined with their weights to obtain a provisional parameter; the local working node is then updated using the provisional parameter and the local gradient, finally completing the model training process.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.
Fig. 1 is a schematic flow chart provided by the present invention;
Fig. 2 is a schematic diagram of the working-node communication structure provided by the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
An embodiment of the invention discloses a decentralized stochastic gradient descent method comprising the following specific steps:
S1: the data set to be trained on is partitioned into n blocks, and each individual block is assigned to a specific working node;
S2: each working node samples training data from its assigned block to train the node's local model;
S3: all working nodes simultaneously perform the computation of the working-node parameter update, using an iterative procedure and parallel stochastic gradient descent;
The specific steps of the working-node parameter update are as follows:
S31: the local working node is first initialized: the initial parameter value x_0 is set; the step length γ is set; the weight matrix W is set; the number of iterations K is set;
S32: data for the current iteration are randomly selected from the local data set of the local working node;
S33: the gradient u of the iteration is computed from the data and the parameter of the local working node using stochastic gradient descent, u = ∇F(x_i; ξ), where x_i is the parameter of the local node updated in the previous iteration and ξ denotes the sampled data;
S34: the parameters of the local working node and of its adjacent working nodes are obtained, the weights of the local working node and of the adjacent working nodes are read from the weight matrix W, and the weighted combination gives the provisional parameter x′;
S35: the gradient u obtained in S33 and the provisional parameter x′ obtained in S34 are substituted into the stochastic gradient descent formula x = x′ − γu, yielding the updated parameter x of the local working node, which is then applied;
S36: the gradients u of the local working node and of the adjacent working nodes are examined, and the weights in the weight matrix W are redistributed according to the ratio between the gradient u of the working node and the gradient u of each adjacent working node. The specific calculation is as follows: the gradients u of the local working node and of the adjacent working node are compared to obtain the ratio of the smaller to the larger; the weight of the adjacent working node is multiplied by this ratio to obtain the adjusted weight of that adjacent node; the adjusted weight of the local working node is then obtained by subtracting the sum of the adjacent working nodes' weights from 1;
S37: it is judged whether K iterations have been completed. If not, the procedure returns to S32; if so, the parameter update and weight distribution of the working node are complete and the local model training process is finished. Redistributing the weight matrix adjusts the weights between different working nodes so that adjacent nodes whose gradient u is more similar to that of the local node receive a larger weight, which accelerates convergence during model training.
To further optimize the above technical solution, the weight matrix W is fully initialized, with the weights of the local working node and of its adjacent working nodes each initialized to 1/3.
To further optimize the above technical solution, the updated working-node weights are saved back into the weight matrix W, as illustrated numerically below.
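As a purely numerical illustration of the reweighting rule in S36 (the gradient norms below are invented, and the weights start from the 1/3 initialization just described):

# Suppose the local node's gradient norm is 2.0 and its two adjacent nodes
# have gradient norms 2.5 and 8.0; all three weights in this row of W are 1/3.
w_left  = (1 / 3) * (2.0 / 2.5)        # ratio smaller/larger = 0.8  -> ~0.267
w_right = (1 / 3) * (2.0 / 8.0)        # ratio smaller/larger = 0.25 -> ~0.083
w_self  = 1.0 - (w_left + w_right)     # -> 0.65
# The adjacent node whose gradient is closer to the local one keeps the larger
# weight, so its parameters influence the local update more strongly, which is
# the effect on convergence described above.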
Embodiment
The essence of decentralized stochastic gradient descent model training is to remove the central parameter server node of the conventional centralized stochastic gradient descent method, so that every parameter update of each working node takes place locally and between adjacent working nodes. The training data set is first divided into n blocks and each individual block is assigned to a specific working node, so that every working node can train the model locally. All working nodes carry out local model training simultaneously. First, a weight matrix connecting all working nodes is defined, and the initial parameter, step length and number of iterations of each working node are initialized. During local model training, each working node first computes its local gradient and then exchanges parameters with its adjacent working nodes: the parameters of the local working node and of the acquired adjacent working nodes are weighted with the weights of the local and adjacent working nodes to obtain a provisional parameter. Stochastic gradient descent is then applied to the provisional parameter and the local gradient to obtain the updated parameter of the local working node, which updates the local node's parameter. At the same time, the difference between the gradients of the local working node and of its adjacent working nodes is measured, and the weights are redistributed according to the gradient ratio.
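A minimal sketch of the data partitioning just described, assuming an even random split into n blocks; the helper name partition_dataset and the per-node sampler closures are illustrative only:

import numpy as np

def partition_dataset(dataset, n, seed=0):
    """Split the training set into n blocks, one per working node, and
    return one random-sampling function per node for use in its iterations."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(dataset)), n)

    def make_sampler(block):
        # Each working node draws random samples from its own block only.
        return lambda: dataset[rng.choice(block)]

    return [make_sampler(block) for block in blocks]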
The weight matrix W is an n×n matrix in which each row represents the influence weights between one local working node and all working nodes. When each row is initialized, the weight values of the local working node and of its two adjacent working nodes are set to 1/3 and all other entries are set to 0, meaning that the weight between the local working node and every non-adjacent working node is 0, i.e. they have no weight influence on each other.
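The weight matrix described above can be sketched as follows, assuming the working nodes are arranged in a ring so that each node's two adjacent nodes are its left and right neighbours (the function name is illustrative):

import numpy as np

def init_ring_weight_matrix(n):
    """Build the n x n weight matrix W: in every row the local working node
    and its two adjacent working nodes get weight 1/3, all other entries 0."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0              # local working node
        W[i, (i - 1) % n] = 1.0 / 3.0    # left adjacent working node
        W[i, (i + 1) % n] = 1.0 / 3.0    # right adjacent working node
    return W

# Example: for n = 4, row 0 of W is [1/3, 1/3, 0, 1/3].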
Since the present invention has no central server node, the communication complexity of the busiest working node is determined by the complexity of the graph corresponding to the model being trained. Although the communication time on each individual working node is higher than that of a working node in the centralized stochastic gradient descent method, the number of stochastic-gradient-descent parameter updates per working node is unchanged and the central server node is eliminated, so while the computation time differs very little, the overall time of the present invention is shorter; the communication-time advantage is especially obvious under low-bandwidth, high-latency network conditions.
As for communication requirements, in the conventional centralized stochastic gradient descent method every working node must communicate with the same central server node, which requires that the data differences between all working nodes in asynchronous communication must not be too large. Since the present invention only requires communication with adjacent working nodes, only the data similarity between adjacent working nodes needs to be guaranteed; the present invention therefore has a wider scope of application.
The present invention provides a decentralized stochastic gradient descent method for distributed deep learning. The central server node of the conventional data-parallel stochastic gradient descent method is removed, so that every parameter update of each working node is carried out locally and between the working node and its adjacent working nodes, with information exchanged between working nodes. The gradient value of each working node is computed; the parameters of the adjacent working nodes are first weighted with the weights of the adjacent working nodes and of the local working node, and the weighted parameter value is then allowed to influence the parameter of the local working node; the weights are subsequently redistributed according to the ratio between the gradient of the local working node and the gradients of the adjacent working nodes and stored in the weight matrix, thereby completing the parameter update of the local working node and realizing distributed deep learning training.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A decentralized stochastic gradient descent method, characterized by comprising the following specific steps:
Step 1: partitioning the data set to be trained on into n blocks and assigning each individual block to a specific working node;
Step 2: each working node sampling training data from its assigned block to train the local model of that working node;
Step 3: all working nodes simultaneously performing the computation of the working-node parameter update, using an iterative procedure and stochastic gradient descent;
wherein the specific steps of the working-node parameter update are as follows:
Step 31: first initializing the local working node: setting the initial parameter value x_0; setting the step length γ; setting the weight matrix W; setting the number of iterations K;
Step 32: randomly selecting data for the current iteration from the local data set of the local working node;
Step 33: computing the gradient u of the iteration from the data and the parameter of the local working node using stochastic gradient descent;
Step 34: obtaining the parameters of the adjacent working nodes and of the local working node, reading the weights of the adjacent working nodes and of the local working node from the weight matrix W, and obtaining the provisional parameter x′ after weighting;
Step 35: obtaining the updated parameter of the local working node by stochastic gradient descent from the gradient u obtained in step 33 and the provisional parameter x′ obtained in step 34;
Step 36: examining the gradient u and the gradients of the adjacent working nodes, and redistributing the weights in the weight matrix W according to the ratio between the gradient of the local working node and the gradients of the adjacent working nodes;
Step 37: judging whether K iterations have been completed; if not, returning to step 32; if so, the parameter update and the weight distribution of the working node are complete and the local model training process is finished.
2. The decentralized stochastic gradient descent method according to claim 1, characterized in that the weight matrix W is fully initialized, with the weights of the local working node and of its adjacent working nodes initialized to 1/3.
3. The decentralized stochastic gradient descent method according to claim 1, characterized in that each working node has two adjacent working nodes.
CN201811309202.9A 2018-11-05 2018-11-05 A kind of method of the stochastic gradient descent of decentralization Pending CN109492753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811309202.9A CN109492753A (en) 2018-11-05 2018-11-05 A kind of method of the stochastic gradient descent of decentralization

Publications (1)

Publication Number Publication Date
CN109492753A true CN109492753A (en) 2019-03-19

Family

ID=65693869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811309202.9A Pending CN109492753A (en) 2018-11-05 2018-11-05 A kind of method of the stochastic gradient descent of decentralization

Country Status (1)

Country Link
CN (1) CN109492753A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160180162A1 (en) * 2014-12-22 2016-06-23 Yahoo! Inc. Generating preference indices for image content
CN104714852A (en) * 2015-03-17 2015-06-17 华中科技大学 Parameter synchronization optimization method and system suitable for distributed machine learning
US9659248B1 (en) * 2016-01-19 2017-05-23 International Business Machines Corporation Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations
CN108122027A (en) * 2016-11-29 2018-06-05 华为技术有限公司 A kind of training method of neural network model, device and chip
CN107194396A (en) * 2017-05-08 2017-09-22 武汉大学 Method for early warning is recognized based on the specific architecture against regulations in land resources video monitoring system
CN107578094A (en) * 2017-10-25 2018-01-12 济南浪潮高新科技投资发展有限公司 The method that the distributed training of neutral net is realized based on parameter server and FPGA
CN108287763A (en) * 2018-01-29 2018-07-17 中兴飞流信息科技有限公司 Parameter exchange method, working node and parameter server system

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN110929878B (en) * 2019-10-30 2023-07-04 同济大学 Distributed random gradient descent method
CN110956265A (en) * 2019-12-03 2020-04-03 腾讯科技(深圳)有限公司 Model training method and related device
CN111178503A (en) * 2019-12-16 2020-05-19 北京邮电大学 Mobile terminal-oriented decentralized target detection model training method and system
US11977986B2 (en) 2020-07-09 2024-05-07 International Business Machines Corporation Dynamic computation rates for distributed deep learning
US11886969B2 (en) 2020-07-09 2024-01-30 International Business Machines Corporation Dynamic network bandwidth in distributed deep learning training
US11875256B2 (en) 2020-07-09 2024-01-16 International Business Machines Corporation Dynamic computation in decentralized distributed deep learning training
CN112001501B (en) * 2020-08-14 2022-12-23 苏州浪潮智能科技有限公司 Parameter updating method, device and equipment of AI distributed training system
CN112001501A (en) * 2020-08-14 2020-11-27 苏州浪潮智能科技有限公司 Parameter updating method, device and equipment of AI distributed training system
CN112688809B (en) * 2020-12-21 2023-10-03 声耕智能科技(西安)研究院有限公司 Diffusion self-adaptive network learning method, system, terminal and storage medium
CN112688809A (en) * 2020-12-21 2021-04-20 声耕智能科技(西安)研究院有限公司 Diffusion adaptive network learning method, system, terminal and storage medium
CN112861991A (en) * 2021-03-09 2021-05-28 中山大学 Learning rate adjusting method for neural network asynchronous training
CN113254215A (en) * 2021-06-16 2021-08-13 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic equipment
CN113870588A (en) * 2021-08-20 2021-12-31 深圳市人工智能与机器人研究院 Traffic light control method based on deep Q network, terminal and storage medium
CN114398949A (en) * 2021-12-13 2022-04-26 鹏城实验室 Training method of impulse neural network model, storage medium and computing device

Similar Documents

Publication Publication Date Title
CN109492753A (en) A kind of method of the stochastic gradient descent of decentralization
CN106297774B (en) A kind of the distributed parallel training method and system of neural network acoustic model
CN108460457A (en) A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN104714852B (en) A kind of parameter synchronization optimization method and its system suitable for distributed machines study
CN106156810A (en) General-purpose machinery learning algorithm model training method, system and calculating node
CN108829441A (en) A kind of parameter update optimization system of distribution deep learning
CN110533183B (en) Task placement method for heterogeneous network perception in pipeline distributed deep learning
WO2023240845A1 (en) Distributed computation method, system and device, and storage medium
CN103561055B (en) Web application automatic elastic extended method under conversation-based cloud computing environment
CN107291550B (en) A kind of Spark platform resource dynamic allocation method and system for iterated application
CN110046048B (en) Load balancing method based on workload self-adaptive fast redistribution
CN104881322B (en) A kind of cluster resource dispatching method and device based on vanning model
Zhan et al. Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking
CN105930591A (en) Realization method for register clustering in clock tree synthesis
CN106020944B (en) It is a kind of to configure the method and system for carrying out data downloading based on background data base
CN104639626A (en) Multi-level load forecasting and flexible cloud resource configuring method and monitoring and configuring system
CN108986063A (en) The method, apparatus and computer readable storage medium of gradient fusion
CN106250240A (en) A kind of optimizing and scheduling task method
CN110059829A (en) A kind of asynchronous parameters server efficient parallel framework and method
CN104346214B (en) Asynchronous task managing device and method for distributed environment
CN107016214A (en) A kind of parameter based on finite state machine relies on the generation method of model
CN108958852A (en) A kind of system optimization method based on FPGA heterogeneous platform
CN107436865A (en) A kind of word alignment training method, machine translation method and system
CN104462329A (en) Operation process digging method suitable for diversified environment
Cao et al. SAP-SGD: Accelerating distributed parallel training with high communication efficiency on heterogeneous clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20211029