CN109492753A - A kind of method of the stochastic gradient descent of decentralization - Google Patents
A kind of method of the stochastic gradient descent of decentralization
- Publication number
- CN109492753A CN109492753A CN201811309202.9A CN201811309202A CN109492753A CN 109492753 A CN109492753 A CN 109492753A CN 201811309202 A CN201811309202 A CN 201811309202A CN 109492753 A CN109492753 A CN 109492753A
- Authority
- CN
- China
- Prior art keywords
- working node
- node
- parameter
- local
- gradient descent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 28
- 238000012549 training Methods 0.000 claims abstract description 19
- 238000011478 gradient descent method Methods 0.000 claims abstract description 12
- 239000011159 matrix material Substances 0.000 claims description 20
- 238000009826 distribution Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 3
- 230000011218 segmentation Effects 0.000 claims description 3
- 238000013135 deep learning Methods 0.000 abstract description 11
- 238000004891 communication Methods 0.000 abstract description 8
- 230000007423 decrease Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 1
- 230000004087 circulation Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 238000005303 weighing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a decentralized stochastic gradient descent method. The centralized parallel stochastic gradient descent used in conventional distributed deep learning frameworks is replaced by decentralized parallel stochastic gradient descent for training: the central server node is removed, and each remaining working node communicates only with its adjacent working nodes to perform local model training and parameter updates. Through repeated training on multiple working nodes, the parameters are continuously tuned until a local optimum is obtained, thereby completing distributed deep learning.
Description
Technical field
The present invention relates to the field of deep learning technology, and more particularly to a decentralized stochastic gradient descent method.
Background technique
Today, with artificial intelligence developing continuously, deep learning has become one of its most important fields. Distributed deep learning algorithms are iterative: the model is not updated in a single pass but refined over many loop iterations. They are fault tolerant: even if some errors occur in an iteration, the final convergence of the model is unaffected. Their parameters also converge non-uniformly: some parameters stop changing after a few iterations, while others take a long time to converge. These characteristics mean that deep learning applied to machine learning cannot scale linearly with the number of machines, because a large share of resources is wasted on communication, waiting and coordination. To make up for this shortcoming, the parameter server was proposed as a framework dedicated to large-scale optimization and to training on large-scale data, for example at the TB or even PB level, with large-scale model parameters. In a large-scale optimization framework there are often billions or even hundreds of billions of parameters to estimate; when designing a system that faces this challenge, algorithms that rely on SGD or L-BFGS to optimize large-scale topic models must cope with the enormous bandwidth consumed by frequently accessing and modifying model parameters, and must address problems such as improving parallelism, the delay caused by synchronous waiting, and fault tolerance. For this reason, a parallel stochastic gradient descent algorithm with a centralized parameter server is frequently used for distributed deep learning.
However, a distributed deep learning framework with a parameter server only achieves good results when the network is unobstructed. In practice the network environment is not always optimal: under low-bandwidth, high-latency network conditions performance drops significantly, because the parameter server node has to communicate with all nodes, so that network congestion arises at the server when the network is poor and the operating speed decreases. In addition, as network models become more and more complex, communication takes an ever larger share of the total time; heavy communication puts greater pressure on the parameter server, and communication time becomes the bottleneck.
Therefore, how to reduce the communication time in distributed deep learning training and improve operational efficiency is an urgent problem for those skilled in the art.
Summary of the invention
In view of this, the present invention provides a decentralized stochastic gradient descent method that can be applied in data-parallel distributed deep learning frameworks. The central node of the traditional parallel stochastic gradient descent method, i.e. the parameter server node, is removed, thereby saving communication time and improving network transmission speed.
To achieve the above objects, the present invention adopts the following technical scheme:
A decentralized stochastic gradient descent method comprises the following specific steps:
Step 1: split the data set to be trained on into n blocks and assign each individual block to a specific working node;
Step 2: each working node samples training data from its assigned block to train the local model of the working node;
Step 3: all working nodes simultaneously use an iterative procedure and parallel stochastic gradient descent to compute the parameter update of the working node;
wherein the specific steps of the working node parameter update are as follows:
Step 31: first initialize the local working node: set the initial parameter value x0; set the step size γ; set the weight matrix W; set the number of iterations K;
Step 32: randomly select data for this iteration from the local data set of the local working node;
Step 33: apply stochastic gradient descent to the selected data and the parameters of the local working node, computing the gradient u of this iteration, i.e. the stochastic gradient of the loss evaluated at the parameters xi on the sampled data, where xi is the parameter of the local node updated in the previous iteration;
Step 34: obtain the parameters of the local working node and of its adjacent working nodes, obtain the weights of the adjacent working nodes and of the working node from the weight matrix W, and weight the parameters to obtain the provisional parameter x';
Step 35: substitute the gradient u obtained in step 33 and the provisional parameter x' obtained in step 34 into the stochastic gradient descent formula x = x' - γu to obtain the updated parameter x of the local working node and perform the update;
Step 36: examine the gradient u of the local working node and of the adjacent working nodes, and redistribute the weights in the weight matrix W according to the ratio between the gradient of the working node and the gradients of the adjacent working nodes. The specific calculation is: compare the gradients of the local working node and of an adjacent working node and take the ratio of the smaller to the larger; multiply the weight of that adjacent working node by this ratio to obtain its adjusted weight; then subtract the sum of the adjusted weights of the adjacent working nodes from 1 to obtain the adjusted weight of the local working node;
Step 37: judge whether K iterations have been completed; if not, return to step 32; if so, the parameter update and the weight distribution of the working node are completed, and the local model training process ends.
Preferably, the weight matrix W is fully initialized, with the weights of the local working node and of its adjacent working nodes initialized to 1/3.
Preferably, the adjusted working node weights are saved back into the weight matrix W.
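The following Python sketch illustrates one working node's loop for steps 31 to 37 above. It is a minimal sketch under stated assumptions, not the patented implementation: `grad_fn`, `local_data`, `neighbors` and the `fetch` callables are hypothetical stand-ins for the model's loss gradient, the node's data block and node-to-node communication, and gradients are compared by their norms in step 36, a detail the description leaves open.

```python
import numpy as np

def decentralized_sgd_worker(grad_fn, local_data, neighbors, x0, gamma, K):
    """One working node's update loop, steps 31-37 (illustrative sketch).

    grad_fn(x, sample) -> stochastic gradient of the loss at x on one sample
    neighbors: dict {node_id: fetch} where fetch() returns the neighbor's
               current (parameters, gradient); communication is abstracted away.
    """
    rng = np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    # Step 31: the local node and each adjacent node start with weight 1/3.
    w_local = 1.0 / 3.0
    w = {j: 1.0 / 3.0 for j in neighbors}

    for _ in range(K):
        # Step 32: randomly select data for this iteration from the local block.
        sample = local_data[rng.integers(len(local_data))]
        # Step 33: stochastic gradient u at the current parameters.
        u = grad_fn(x, sample)
        # Step 34: weighted combination of own and neighbors' parameters -> x'.
        x_prime = w_local * x
        neighbor_grads = {}
        for j, fetch in neighbors.items():
            x_j, u_j = fetch()
            x_prime = x_prime + w[j] * x_j
            neighbor_grads[j] = u_j
        # Step 35: SGD step on the provisional parameter: x = x' - gamma * u.
        x = x_prime - gamma * u
        # Step 36: scale each neighbor's weight by the smaller-to-larger
        # gradient ratio; the local node keeps the remainder of the row.
        for j, u_j in neighbor_grads.items():
            ratio = min(np.linalg.norm(u), np.linalg.norm(u_j)) / \
                    max(np.linalg.norm(u), np.linalg.norm(u_j))
            w[j] = w[j] * ratio
        w_local = 1.0 - sum(w.values())
    # Step 37: K iterations completed; local model training is done.
    return x
```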
As can be seen from the above technical scheme, compared with the prior art, the present invention provides a decentralized stochastic gradient descent method in which the central server node of the traditional parallel stochastic gradient descent method is removed, so that every parameter update of a working node is carried out locally and with its adjacent working nodes. Working nodes exchange information with one another; the weights are redistributed according to the ratio between the gradient of the local working node and the gradients of the adjacent working nodes, and the influence weights between working nodes are stored in a weight matrix. The parameter update is iterative: in each iteration every working node performs one step of the stochastic gradient descent algorithm, and before the gradient influences the parameters, the parameters of the local working node and of the adjacent working nodes are first weighted to obtain a provisional parameter; the provisional parameter and the gradient of the local working node are then used to update the parameters of the local working node, finally completing the model training process.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart provided by the present invention;
Fig. 2 is a schematic diagram of the working node communication structure provided by the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a decentralized stochastic gradient descent method comprising the following specific steps:
S1: split the data set to be trained on into n blocks and assign each individual block to a specific working node;
S2: each working node samples training data from its assigned block to train the node's local model;
S3: all working nodes simultaneously use an iterative procedure and parallel stochastic gradient descent to compute the parameter update of the working node;
wherein the specific steps of the working node parameter update are as follows:
S31: first initialize the local working node: set the initial parameter value x0; set the step size γ; set the weight matrix W; set the number of iterations K;
S32: randomly select data for this iteration from the local data set of the local working node;
S33: apply stochastic gradient descent to the data and the parameters of the local working node to compute the gradient u of this iteration, i.e. the stochastic gradient of the loss evaluated at the parameters xi on the sampled data, where xi is the parameter of the local node updated in the previous iteration;
S34: obtain the parameters of the local working node and of its adjacent working nodes, obtain the weights of the local working node and of the adjacent working nodes from the weight matrix W, and weight the parameters to obtain the provisional parameter x';
S35: using the gradient u obtained in S33 and the provisional parameter x' obtained in S34, apply the stochastic gradient descent formula x = x' - γu to obtain the updated parameter x of the local working node and perform the update;
S36: examine the gradient u of the local working node and of the adjacent working nodes, and redistribute the weights in the weight matrix W according to the ratio between the gradient u of the working node and the gradients u of the adjacent working nodes. The specific calculation is: compare the gradients u of the local working node and of an adjacent working node and take the ratio of the smaller to the larger; multiply the weight of that adjacent working node by this ratio to obtain its adjusted weight; then subtract the sum of the adjusted weights of the adjacent working nodes from 1 to obtain the adjusted weight of the local working node;
S37: judge whether K iterations have been completed; if not, return to step 32; if so, the parameter update and the weight distribution of the working node are completed and the local model training process ends. Redistributing the weight matrix adjusts the weights between different working nodes so that adjacent nodes whose gradient u is more similar to that of the local node obtain a larger weight, which accelerates the convergence of model training.
In order to further optimize the above technical scheme, the weight matrix W is fully initialized, with the weights of the local working node and of the adjacent working nodes initialized to 1/3.
In order to further optimize the above technical scheme, the adjusted working node weights are saved back into the weight matrix W.
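As a worked illustration of the weight reallocation in S36, the short snippet below uses hypothetical gradient magnitudes for a local node and its two neighbors; the patent does not specify how two gradient vectors are compared, so scalar magnitudes are assumed here.

```python
# Hypothetical gradient magnitudes for the local node and its two neighbors.
u_local, u_left, u_right = 4.0, 3.0, 8.0
w = {"left": 1 / 3, "right": 1 / 3}      # weights before adjustment

# Ratio of the smaller to the larger gradient, per neighbor (S36).
w["left"] *= min(u_local, u_left) / max(u_local, u_left)      # 1/3 * 3/4 = 0.25
w["right"] *= min(u_local, u_right) / max(u_local, u_right)   # 1/3 * 4/8 ~= 0.167

# The local node keeps the remainder, so the row of W still sums to 1.
w_local = 1.0 - (w["left"] + w["right"])                       # ~= 0.583
```

The neighbor whose gradient is closer to the local gradient (ratio 3/4) keeps a larger weight than the dissimilar one (ratio 1/2), which is exactly the behaviour described in S37: more similar adjacent nodes obtain larger weights.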
Embodiment
During decentralized stochastic gradient descent model training, the essence is to remove the central parameter server node of the conventional centralized stochastic gradient descent method, so that each working node's parameter updates are carried out locally and between adjacent working nodes. The training data set is first divided into n blocks and each individual block is assigned to a specific working node, so that every working node can train the model locally. All working nodes perform local model training simultaneously. A weight matrix connecting all working nodes is defined first, and then the initial parameters, step size and number of iterations of each working node are initialized. During local model training, each working node first computes its local gradient and then exchanges parameters with its adjacent working nodes, i.e. the parameters of the local working node and of the adjacent working nodes are weighted with the weights of the local and adjacent working nodes to obtain a provisional parameter; stochastic gradient descent is then applied to the provisional parameter and the gradient of the local working node to obtain the updated parameter of the local working node and perform the local parameter update. At the same time, the difference between the gradients of the local working node and of the adjacent working nodes is examined, and the weights are redistributed according to the gradient ratio.
The weight matrix W is an n*n matrix; each row represents the weight influence relationship between one local working node and all working nodes. When each row is initialized, the weights of the local working node and of its two adjacent working nodes are set to 1/3 and the remaining entries are set to 0, meaning that the local working node has weight 0 with respect to all non-adjacent working nodes and there is no weight influence between them.
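A minimal sketch of this initialization is given below, assuming the n working nodes form a ring so that each node has exactly two adjacent working nodes; the function name and the ring layout are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def initial_weight_matrix(n):
    """Initial n*n weight matrix W for n working nodes on a ring (n >= 3).

    Row i holds node i's influence weights: 1/3 for itself and for each of
    its two adjacent working nodes, 0 for every non-adjacent working node.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0   # left neighbor
        W[i, (i + 1) % n] = 1.0 / 3.0   # right neighbor
    return W

# Example: 5 working nodes; each row sums to 1 and has exactly three
# non-zero entries (the node itself and its two neighbors).
print(initial_weight_matrix(5))
```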
Since the present invention has no central server node, the communication complexity of the busiest working node is determined by the complexity of the graph corresponding to the model being trained. Although the communication time of each working node is higher than that of a working node in the centralized stochastic gradient descent method, the number of stochastic gradient descent parameter updates per working node is unchanged and the central server node is eliminated; therefore, while the computation time differs very little, the overall time of the present invention is shorter, and under low-bandwidth, high-latency network conditions the advantage in communication time is even more obvious.
As for communication requirements, in the conventional centralized stochastic gradient descent method every working node must communicate with the same central server node, so in asynchronous communication the data difference between all working nodes must not be too large. The present invention only requires communication with adjacent working nodes, so only the data similarity between adjacent working nodes needs to be guaranteed; the scope of application of the present invention is therefore wider.
The present invention provides a decentralized stochastic gradient descent method for distributed deep learning. The central server node in the traditional data-parallel stochastic gradient descent method is removed, so that every parameter update of a working node is carried out locally between the node and its adjacent working nodes, and working nodes exchange information with one another. The gradient value of each working node is computed; the parameters of the adjacent working nodes are first weighted with the weights of the adjacent working nodes and of the local working node, and the weighted parameter value then influences the parameters of the local working node; the weights are redistributed according to the ratio between the local working node gradient and the adjacent working node gradients and saved into the weight matrix, thereby completing the parameter update of the local working node and realizing distributed deep learning training.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may refer to each other. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple, and for relevant details reference may be made to the description of the method.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (3)
1. A decentralized stochastic gradient descent method, characterized by comprising the following specific steps:
Step 1: splitting the data set to be trained on into n blocks and assigning each individual block to a specific working node;
Step 2: each working node sampling training data from its assigned block to train the local model of the working node;
Step 3: all working nodes simultaneously using an iterative procedure and stochastic gradient descent to compute the parameter update of the working node;
wherein the specific steps of the working node parameter update are as follows:
Step 31: first initializing the local working node: setting the initial parameter value x0; setting the step size γ; setting the weight matrix W; setting the number of iterations K;
Step 32: randomly selecting data for this iteration from the local data set of the local working node;
Step 33: applying stochastic gradient descent to the selected data and the parameters of the local working node to compute the gradient u of this iteration;
Step 34: obtaining the parameters of the adjacent working nodes and of the local working node, obtaining the weights of the adjacent working nodes and of the local working node from the weight matrix W, and weighting them to obtain the provisional parameter x';
Step 35: obtaining the updated parameter of the local working node by stochastic gradient descent from the gradient u obtained in step 33 and the provisional parameter x' obtained in step 34;
Step 36: examining the gradient u and the gradients of the adjacent working nodes, and redistributing the weights in the weight matrix W according to the ratio between the gradient of the local working node and the gradients of the adjacent working nodes;
Step 37: judging whether K iterations have been completed; if not, returning to step 32; if so, completing the parameter update and the weight distribution of the working node and completing the process of the local model training.
2. The decentralized stochastic gradient descent method according to claim 1, characterized in that the weight matrix W is fully initialized, with the weights of the local working node and of the adjacent working nodes initialized to 1/3.
3. The decentralized stochastic gradient descent method according to claim 1, characterized in that each of the working nodes has two of the adjacent working nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811309202.9A CN109492753A (en) | 2018-11-05 | 2018-11-05 | A kind of method of the stochastic gradient descent of decentralization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811309202.9A CN109492753A (en) | 2018-11-05 | 2018-11-05 | A kind of method of the stochastic gradient descent of decentralization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109492753A true CN109492753A (en) | 2019-03-19 |
Family
ID=65693869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811309202.9A Pending CN109492753A (en) | 2018-11-05 | 2018-11-05 | A kind of method of the stochastic gradient descent of decentralization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109492753A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929878A (en) * | 2019-10-30 | 2020-03-27 | 同济大学 | Distributed random gradient descent method |
CN110956265A (en) * | 2019-12-03 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Model training method and related device |
CN111178503A (en) * | 2019-12-16 | 2020-05-19 | 北京邮电大学 | Mobile terminal-oriented decentralized target detection model training method and system |
CN112001501A (en) * | 2020-08-14 | 2020-11-27 | 苏州浪潮智能科技有限公司 | Parameter updating method, device and equipment of AI distributed training system |
CN112688809A (en) * | 2020-12-21 | 2021-04-20 | 声耕智能科技(西安)研究院有限公司 | Diffusion adaptive network learning method, system, terminal and storage medium |
CN112861991A (en) * | 2021-03-09 | 2021-05-28 | 中山大学 | Learning rate adjusting method for neural network asynchronous training |
CN113254215A (en) * | 2021-06-16 | 2021-08-13 | 腾讯科技(深圳)有限公司 | Data processing method and device, storage medium and electronic equipment |
CN113870588A (en) * | 2021-08-20 | 2021-12-31 | 深圳市人工智能与机器人研究院 | Traffic light control method based on deep Q network, terminal and storage medium |
CN114398949A (en) * | 2021-12-13 | 2022-04-26 | 鹏城实验室 | Training method of impulse neural network model, storage medium and computing device |
US11875256B2 (en) | 2020-07-09 | 2024-01-16 | International Business Machines Corporation | Dynamic computation in decentralized distributed deep learning training |
US11886969B2 (en) | 2020-07-09 | 2024-01-30 | International Business Machines Corporation | Dynamic network bandwidth in distributed deep learning training |
US11977986B2 (en) | 2020-07-09 | 2024-05-07 | International Business Machines Corporation | Dynamic computation rates for distributed deep learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104714852A (en) * | 2015-03-17 | 2015-06-17 | 华中科技大学 | Parameter synchronization optimization method and system suitable for distributed machine learning |
US20160180162A1 (en) * | 2014-12-22 | 2016-06-23 | Yahoo! Inc. | Generating preference indices for image content |
US9659248B1 (en) * | 2016-01-19 | 2017-05-23 | International Business Machines Corporation | Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations |
CN107194396A (en) * | 2017-05-08 | 2017-09-22 | 武汉大学 | Method for early warning is recognized based on the specific architecture against regulations in land resources video monitoring system |
CN107578094A (en) * | 2017-10-25 | 2018-01-12 | 济南浪潮高新科技投资发展有限公司 | The method that the distributed training of neutral net is realized based on parameter server and FPGA |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN108287763A (en) * | 2018-01-29 | 2018-07-17 | 中兴飞流信息科技有限公司 | Parameter exchange method, working node and parameter server system |
- 2018
  - 2018-11-05 CN CN201811309202.9A patent/CN109492753A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160180162A1 (en) * | 2014-12-22 | 2016-06-23 | Yahoo! Inc. | Generating preference indices for image content |
CN104714852A (en) * | 2015-03-17 | 2015-06-17 | 华中科技大学 | Parameter synchronization optimization method and system suitable for distributed machine learning |
US9659248B1 (en) * | 2016-01-19 | 2017-05-23 | International Business Machines Corporation | Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations |
CN108122027A (en) * | 2016-11-29 | 2018-06-05 | 华为技术有限公司 | A kind of training method of neural network model, device and chip |
CN107194396A (en) * | 2017-05-08 | 2017-09-22 | 武汉大学 | Method for early warning is recognized based on the specific architecture against regulations in land resources video monitoring system |
CN107578094A (en) * | 2017-10-25 | 2018-01-12 | 济南浪潮高新科技投资发展有限公司 | The method that the distributed training of neutral net is realized based on parameter server and FPGA |
CN108287763A (en) * | 2018-01-29 | 2018-07-17 | 中兴飞流信息科技有限公司 | Parameter exchange method, working node and parameter server system |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929878A (en) * | 2019-10-30 | 2020-03-27 | 同济大学 | Distributed random gradient descent method |
CN110929878B (en) * | 2019-10-30 | 2023-07-04 | 同济大学 | Distributed random gradient descent method |
CN110956265A (en) * | 2019-12-03 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Model training method and related device |
CN111178503A (en) * | 2019-12-16 | 2020-05-19 | 北京邮电大学 | Mobile terminal-oriented decentralized target detection model training method and system |
US11977986B2 (en) | 2020-07-09 | 2024-05-07 | International Business Machines Corporation | Dynamic computation rates for distributed deep learning |
US11886969B2 (en) | 2020-07-09 | 2024-01-30 | International Business Machines Corporation | Dynamic network bandwidth in distributed deep learning training |
US11875256B2 (en) | 2020-07-09 | 2024-01-16 | International Business Machines Corporation | Dynamic computation in decentralized distributed deep learning training |
CN112001501B (en) * | 2020-08-14 | 2022-12-23 | 苏州浪潮智能科技有限公司 | Parameter updating method, device and equipment of AI distributed training system |
CN112001501A (en) * | 2020-08-14 | 2020-11-27 | 苏州浪潮智能科技有限公司 | Parameter updating method, device and equipment of AI distributed training system |
CN112688809B (en) * | 2020-12-21 | 2023-10-03 | 声耕智能科技(西安)研究院有限公司 | Diffusion self-adaptive network learning method, system, terminal and storage medium |
CN112688809A (en) * | 2020-12-21 | 2021-04-20 | 声耕智能科技(西安)研究院有限公司 | Diffusion adaptive network learning method, system, terminal and storage medium |
CN112861991A (en) * | 2021-03-09 | 2021-05-28 | 中山大学 | Learning rate adjusting method for neural network asynchronous training |
CN113254215A (en) * | 2021-06-16 | 2021-08-13 | 腾讯科技(深圳)有限公司 | Data processing method and device, storage medium and electronic equipment |
CN113870588A (en) * | 2021-08-20 | 2021-12-31 | 深圳市人工智能与机器人研究院 | Traffic light control method based on deep Q network, terminal and storage medium |
CN114398949A (en) * | 2021-12-13 | 2022-04-26 | 鹏城实验室 | Training method of impulse neural network model, storage medium and computing device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492753A (en) | A kind of method of the stochastic gradient descent of decentralization | |
CN106297774B (en) | A kind of the distributed parallel training method and system of neural network acoustic model | |
CN108460457A (en) | A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks | |
CN104714852B (en) | A kind of parameter synchronization optimization method and its system suitable for distributed machines study | |
CN106156810A (en) | General-purpose machinery learning algorithm model training method, system and calculating node | |
CN108829441A (en) | A kind of parameter update optimization system of distribution deep learning | |
CN110533183B (en) | Task placement method for heterogeneous network perception in pipeline distributed deep learning | |
WO2023240845A1 (en) | Distributed computation method, system and device, and storage medium | |
CN103561055B (en) | Web application automatic elastic extended method under conversation-based cloud computing environment | |
CN107291550B (en) | A kind of Spark platform resource dynamic allocation method and system for iterated application | |
CN110046048B (en) | Load balancing method based on workload self-adaptive fast redistribution | |
CN104881322B (en) | A kind of cluster resource dispatching method and device based on vanning model | |
Zhan et al. | Pipe-torch: Pipeline-based distributed deep learning in a gpu cluster with heterogeneous networking | |
CN105930591A (en) | Realization method for register clustering in clock tree synthesis | |
CN106020944B (en) | It is a kind of to configure the method and system for carrying out data downloading based on background data base | |
CN104639626A (en) | Multi-level load forecasting and flexible cloud resource configuring method and monitoring and configuring system | |
CN108986063A (en) | The method, apparatus and computer readable storage medium of gradient fusion | |
CN106250240A (en) | A kind of optimizing and scheduling task method | |
CN110059829A (en) | A kind of asynchronous parameters server efficient parallel framework and method | |
CN104346214B (en) | Asynchronous task managing device and method for distributed environment | |
CN107016214A (en) | A kind of parameter based on finite state machine relies on the generation method of model | |
CN108958852A (en) | A kind of system optimization method based on FPGA heterogeneous platform | |
CN107436865A (en) | A kind of word alignment training method, machine translation method and system | |
CN104462329A (en) | Operation process digging method suitable for diversified environment | |
Cao et al. | SAP-SGD: Accelerating distributed parallel training with high communication efficiency on heterogeneous clusters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
Effective date of abandoning: 20211029 |