CN114118381B - Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication - Google Patents
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N20/00—Machine learning
Abstract
The invention relates to the field of distributed learning and discloses a learning method, device, equipment, and medium based on adaptive aggregation sparse communication. The method acquires an adaptive aggregation rule and determines target nodes according to it; performs sparse processing on the information corresponding to the target nodes; calculates a convergence result from a preset sequence combined with a Lyapunov function; and trains a deep neural network model to obtain the learning method. The adaptive selection rule skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. To handle the bias of the top-k sparsification operator, the algorithm uses an error-feedback scheme, thereby achieving the technical effect of fully utilizing the computing capacity of a distributed cluster.
Description
Technical Field
The present disclosure relates to the field of distributed learning, and in particular, to a learning method, apparatus, device, and medium based on adaptive aggregation sparse communication.
Background
Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to address large-scale machine learning problems. One key bottleneck in such systems is the communication overhead of exchanging information, such as stochastic gradients, between different nodes. Memory-preserving sparse communication methods and adaptive aggregation methods are among the frameworks proposed to address this problem. Intuitively, having multiple processors co-train on a task should speed up the training process and reduce training time. However, the cost of communication between processors often hinders the scalability of a distributed system. Worse yet, when the ratio of computation to communication is low, multiple processors may perform worse than a single processor.
Therefore, how to fully utilize the computing power of the distributed clusters is a technical problem to be solved.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main object of the present invention is to provide a learning method, device, equipment, and medium based on adaptive aggregation sparse communication, aiming to solve the technical problem that the prior art cannot fully utilize the computing capacity of distributed clusters.
In order to achieve the above object, the present invention provides a learning method based on adaptive aggregation sparse communication, the method comprising:
acquiring an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule;
performing sparse processing on target information corresponding to the target node;
calculating a convergence result according to a preset sequence and a Lyapunov function;
the deep neural network model is trained to obtain a learning method.
Optionally, the step of acquiring the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule includes:
acquiring a preset self-adaptive aggregation rule;
dividing all nodes into two disjoint sets, M_t and its complement, according to whether they communicate with the server under the adaptive aggregation rule; and
at the t-th iteration, using the new gradient information of the nodes in M_t while reusing the old compressed gradient information of the remaining nodes, thereby reducing the communication rounds per iteration from M to |M_t| and determining the target node.
Optionally, the step of performing sparse processing on the target information corresponding to the target node includes:
selecting the top-k gradient components (by absolute value) of the target information corresponding to the target node at each iteration and setting the remaining gradient components to zero, so that zero-valued elements need not be communicated.
Optionally, the step of selecting the top-k gradient components of the target information corresponding to the target node at each iteration and zeroing the remaining components, so that zero-valued elements need not be communicated, further includes:
using an error-feedback technique that incorporates the error produced by sparsification into the next iteration to ensure convergence; and
defining a helper sequence v_t in terms of the errors e_t^m, where e_t^m is the error at the t-th iteration on node m.
Optionally, the step of calculating the convergence result from the preset sequence in combination with the Lyapunov function includes:
denoting the corresponding Lyapunov function and selecting the learning rate in terms of a constant c_γ > 0, thereby obtaining the convergence result.
Optionally, the step of training the deep neural network model to obtain a learning method includes:
performing the training using the iterative format described above, applied to the sparsified, error-corrected gradients.
optionally, after the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule, the method further includes:
in combination with the adaptive aggregation rule, iterating with an update that aggregates fresh gradients from the communicating nodes and stale gradients from the others,
where M_t and its complement are the working sets of nodes that do and do not communicate with the server at the t-th iteration, respectively.
In addition, in order to achieve the above object, the present invention also proposes a learning device based on adaptive aggregation sparse communication, which is characterized in that the device includes:
the node determining module is used for acquiring the self-adaptive aggregation rule and determining a target node according to the self-adaptive aggregation rule;
the sparse processing module is used for carrying out sparse processing on the target information corresponding to the target node;
the result acquisition module is used for calculating a convergence result according to a preset sequence and combining with the Lyapunov function;
and the model training module is used for training the deep neural network model to obtain a learning method.
In addition, to achieve the above object, the present invention also proposes a computer apparatus including: the system comprises a memory, a processor and an adaptive aggregate sparse communication-based learning program stored on the memory and executable on the processor, the adaptive aggregate sparse communication-based learning program configured to implement the adaptive aggregate sparse communication-based learning method as described above.
In addition, in order to achieve the above object, the present invention also proposes a medium having stored thereon a learning program based on adaptive aggregation sparse communication, which when executed by a processor, implements the steps of the learning method based on adaptive aggregation sparse communication as described above.
The method comprises the steps of obtaining an adaptive aggregation rule and determining target nodes according to it; performing sparse processing on the information corresponding to the target nodes; calculating a convergence result from a preset sequence combined with a Lyapunov function; and training a deep neural network model to obtain the learning method. The adaptive selection rule skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. To handle the bias of the top-k sparsification operator, the algorithm uses an error-feedback scheme, thereby achieving the technical effect of fully utilizing the computing capacity of the distributed cluster.
Drawings
FIG. 1 is a schematic structural diagram of a learning device based on adaptive aggregation sparse communication in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the learning method based on adaptive aggregation sparse communication according to the present invention;
fig. 3 is a comparison diagram of four algorithms based on adaptive aggregated sparse communications according to an embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a learning device based on adaptive aggregation sparse communication in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the learning device based on adaptive aggregation sparse communication may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (WI-FI) interface). The Memory 1005 may be a high-speed random access Memory (Random Access Memory, RAM) Memory or a stable nonvolatile Memory (NVM), such as a disk Memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the structure shown in fig. 1 does not constitute a limitation of the learning device based on adaptive aggregated sparse communications, and may include more or fewer components than illustrated, or may combine certain components, or may be a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a learning program based on adaptive aggregation sparse communication.
In the learning device based on adaptive aggregation sparse communication shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the learning device based on the adaptive aggregation sparse communication according to the present invention may be disposed in the learning device based on the adaptive aggregation sparse communication, where the learning device based on the adaptive aggregation sparse communication invokes the learning program based on the adaptive aggregation sparse communication stored in the memory 1005 through the processor 1001, and executes the learning method based on the adaptive aggregation sparse communication provided by the embodiment of the present invention.
The embodiment of the invention provides a learning method based on self-adaptive aggregation sparse communication, and referring to fig. 2, fig. 2 is a flow diagram of a first embodiment of the learning method based on self-adaptive aggregation sparse communication.
In this embodiment, the learning method based on adaptive aggregation sparse communication includes the following steps:
step S10: and acquiring an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule. A step of
It should be noted that the size and complexity of machine learning (ML) models and datasets have increased significantly over the last decades, leading to higher computational intensity and thus more time-consuming training processes. With the development of distributed training, multiple processors are used to accelerate it. A large number of distributed machine learning tasks can be written as the minimization problem
min_ω f(ω) := (1/M) Σ_{m=1}^{M} E[f_m(ω; ξ_m)],
where ω ∈ R^d is the parameter vector to be learned, d is the dimension of the parameter, M := {1, ..., M} represents the set of distributed nodes, f_m is a smooth (not necessarily convex) loss function on node m, and the ξ_m are independent random data samples following the probability distribution associated with node m.
It will be appreciated that, for simplicity, shorthand notation is defined for the averaged loss and the stochastic gradients.
In a specific implementation, the stochastic gradient descent (SGD) algorithm is the workhorse for solving this problem, with the iterative format
ω_{t+1} = ω_t − (γ/M) Σ_{m=1}^{M} ∇f_m(ω_t; ξ_t^m),
where γ is the learning rate and ξ_t^m is the mini-batch of data that node m selects at the t-th iteration.
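The synchronous distributed SGD update just described can be sketched in a few lines; the function name and the toy dimensions below are illustrative, not taken from the patent:

```python
import numpy as np

def sgd_step(omega, grads, gamma):
    """One synchronous distributed SGD step: the server averages the
    stochastic gradients sent by the M nodes and updates the parameters."""
    return omega - gamma * np.mean(grads, axis=0)

# toy example: d = 3 parameters, M = 4 nodes
rng = np.random.default_rng(0)
omega = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(4)]
omega = sgd_step(omega, grads, gamma=0.1)
```

In this baseline, every node communicates its full gradient every iteration — exactly the cost that the adaptive aggregation and sparsification below are designed to cut.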
Further, the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule includes: acquiring a preset adaptive aggregation rule; dividing all nodes into two disjoint sets, M_t and its complement, according to whether they communicate with the server under the adaptive aggregation rule; and, at the t-th iteration, using the new gradient information of the nodes in M_t while reusing the old compressed gradient information of the remaining nodes, thereby reducing the communication rounds per iteration from M to |M_t| and determining the target node.
Further, after the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule, the method further includes:
in combination with the adaptive aggregation rule, iterating with an update that aggregates fresh gradients from the communicating nodes and stale gradients from the others,
where M_t and its complement are the working sets of nodes that do and do not communicate with the server at the t-th iteration, respectively.
It should be noted that reducing the number of communication rounds is important for improving communication efficiency. Higher-order information (Newton-type methods) can replace conventional gradient information to reduce the number of communication rounds, and a distributed preconditioned accelerated gradient method has been proposed for the same purpose. Many new aggregation techniques, such as periodic aggregation and adaptive aggregation, have also been developed to skip certain communications: each node is allowed to perform local model updates independently, and the resulting models are averaged periodically. The lazy aggregated gradient (LAG) method updates the model on the server side, and nodes adaptively upload only information with sufficient informational content. Unfortunately, while LAG performs well in deterministic settings (i.e., with full gradients), its performance degrades significantly in stochastic settings. More recent efforts address adaptive aggregation in stochastic settings: the communication-censored distributed stochastic gradient descent algorithm (CSGD) increases the batch size to mitigate the effect of stochastic gradient noise, and the lazy aggregated stochastic gradient (LASG) algorithm designs a family of new adaptive communication rules tailored to stochastic gradients, achieving good experimental results.
In a specific implementation, a communication-efficient algorithm, SASG, which combines sparse communication with adaptively aggregated stochastic gradients, is presented herein. The SASG method saves both communication bits and communication rounds without sacrificing the required convergence properties. Considering that in a distributed learning system not all communication rounds between the server and the nodes are equally important, the frequency of communication between a node and the server can be adjusted according to the importance of the information that node transmits. More specifically, to reduce the number of communication rounds, an adaptive selection rule is formulated that divides the set of nodes M into two disjoint sets, M_t and its complement. At the t-th iteration, only the new gradient information of the nodes selected into M_t is used, while the old compressed gradient information of the remaining nodes is reused, reducing the communication rounds per iteration from M to |M_t|. On the other hand, since quantization methods can achieve at most a 32-fold compression ratio on common single-precision floating-point values, the algorithm adopts the more effective approach of sparsification. In particular, the top-k gradient components (by absolute value) are selected at each iteration and the remaining components are set to zero, so that zero-valued elements need not be communicated, significantly reducing the number of communication bits.
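The top-k operator described above can be sketched as follows (a minimal illustration; transmitting only the surviving index–value pairs is an assumption about how the bit savings are realized):

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude components of g; zero the rest.
    Only the k surviving (index, value) pairs need to be transmitted."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]  # indices of the top-k components by |.|
    out[idx] = g[idx]
    return out

g = np.array([0.1, -2.0, 0.3, 1.5, -0.2])
sparse_g = top_k(g, 2)  # only -2.0 and 1.5 survive; the rest are zeroed
```

With k set to 1% of the dimension d (as in the experiments below), the payload shrinks from d values to roughly d/100 index–value pairs.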
In a specific implementation, note that in a distributed learning system the communication rounds between the server and the nodes do not all contribute equally, so an adaptive aggregation method is employed to develop an aggregation rule that can skip inefficient communication rounds. This adaptive aggregation method derives from the lazy aggregated gradient (LAG) method, which designs an adaptive criterion to detect nodes whose gradients change little and to reuse their old gradients. Combining such an adaptive aggregation rule yields an iterative format that aggregates fresh gradients from the communicating nodes and stale gradients from the others,
where M_t and its complement are the working sets of nodes that do and do not communicate with the server at the t-th iteration, respectively.
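As a rough sketch of such a LAG-style selection rule — the threshold form and the constant `alpha` here are illustrative assumptions, not the patent's exact condition:

```python
import numpy as np

def select_nodes(new_grads, old_grads, omega_diffs, alpha=1.0):
    """Split nodes into M_t (upload fresh gradients) and its complement
    (server reuses stale gradients). A node communicates only when its
    gradient has changed enough relative to the recent parameter movement
    -- an illustrative LAG-style criterion."""
    threshold = alpha * sum(np.sum(d ** 2) for d in omega_diffs)
    M_t, skipped = [], []
    for m, (g_new, g_old) in enumerate(zip(new_grads, old_grads)):
        if np.sum((g_new - g_old) ** 2) > threshold:
            M_t.append(m)      # informative change: communicate this round
        else:
            skipped.append(m)  # reuse the stale gradient at the server
    return M_t, skipped
```

The server then aggregates fresh gradients from `M_t` and the cached stale gradients from `skipped`, so each iteration costs only |M_t| communication rounds instead of M.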
Step S20: and carrying out sparse processing on the target information corresponding to the target node.
It should be noted that such studies mainly develop around the ideas of quantization and sparsification. Quantization methods compress information by transmitting lower-bit representations instead of the original 32-bit data. The quantized stochastic gradient descent algorithm (QSGD) uses adjustable quantization levels, providing additional flexibility to trade off per-iteration communication cost against convergence speed. Ternary gradients reduce the communication data size by reducing each component of the gradient to its sign bit. Sparsification methods instead aim to reduce the number of elements transmitted per iteration, and fall into two main categories: random sparsification and deterministic sparsification. Random sparsification selects components for communication at random; this method is named random-k, where k denotes the number of selected components. Such random selection is typically an unbiased estimate of the original gradient, which makes it friendly to theoretical analysis. Unlike random sparsification, deterministic sparsification considers the magnitude of each component and retains only the k largest components of the stochastic gradient; this method is known as top-k. Compared with the unbiased approach, top-k must use an error-feedback or accumulation procedure to ensure that all gradient information is eventually, if with some delay, incorporated into the model.
In particular implementations, after the adaptive selection process, the selected node sends sparse information derived by the top-k operator to the parameter server.
Further, the step of performing sparse processing on the target information corresponding to the target node includes: selecting the top-k gradient components (by absolute value) of the target information corresponding to the target node at each iteration and setting the remaining gradient components to zero, so that zero-valued elements need not be communicated.
Further, the step of selecting the top-k gradient components of the target information corresponding to the target node and zeroing the remaining components, so that zero-valued elements need not be communicated, further includes: using an error-feedback technique that incorporates the error produced by sparsification into the next iteration to ensure convergence; and defining a helper sequence v_t in terms of the errors e_t^m, where e_t^m is the error at the t-th iteration on node m.
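A minimal sketch of the error-feedback step on a single node follows; variable names are illustrative, and the exact update behind the patent's helper sequence is not reproduced here:

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude components of g; zero the rest."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def ef_compress(g, e, k):
    """Error-feedback step on one node: fold the residual e carried over
    from the previous iteration into the fresh gradient, compress, and
    store the new residual, so gradient information is delayed but never
    permanently lost."""
    corrected = g + e            # re-inject the old sparsification error
    sent = top_k(corrected, k)   # what is actually transmitted
    e_new = corrected - sent     # residual kept locally for the next round
    return sent, e_new
```

Over successive iterations the residuals telescope — `sent + e_new` always equals `g + e` — which is exactly the property the helper sequence in the convergence analysis exploits.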
Step S30: and calculating a convergence result according to a preset sequence and combining the Lyapunov function.
It should be noted that this embodiment applies a biased top-k sparsification operator, and the introduced compression error makes the convergence analysis more complicated. We define an auxiliary sequence {v_t}_{t=0,1,...}, which can be regarded as an error-corrected approximation of {ω_t}_{t=0,1,...}. By analyzing this sequence we obtain the convergence result of the SASG algorithm, whose convergence rate matches that of the original SGD.
Further, the step of calculating the convergence result from the preset sequence in combination with the Lyapunov function includes: denoting the corresponding Lyapunov function and selecting the learning rate in terms of a constant c_γ > 0, which yields the convergence result.
It will be appreciated that, despite skipping many communication rounds and performing communication compression, the algorithm guarantees convergence and achieves a sublinear convergence rate. In other words, with well-designed adaptive aggregation rules and sparse communication techniques, the SASG algorithm still attains a convergence rate of the same order of magnitude as the SGD method.
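For nonconvex smooth objectives, sublinear guarantees of this kind typically take the following form (a sketch of the standard rate; the patent's exact Lyapunov function and constants are not reproduced here):

```latex
\min_{t=0,\dots,T-1} \mathbb{E}\,\|\nabla f(\omega_t)\|^2
  \;\le\; \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right),
\qquad \text{with } \gamma = \frac{c_\gamma}{\sqrt{T}},\; c_\gamma > 0,
```

which matches the O(1/√T) rate of vanilla SGD up to constants, consistent with the claim that skipped rounds and compression do not change the order of convergence.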
Step S40: the deep neural network model is trained to obtain a learning method.
Further, the training the deep neural network model to obtain the learning method includes the steps of:
performing the training using the iterative format described above.
in a specific implementation, the SASG algorithm is benchmarked using an inert-polymeric random gradient (LASG) method, a sparsification method, and a distributed SGD. Experience shows that up to 99% gradient information is not necessary in each iteration, so we use top-1% sparsification operators in the SASG algorithm and the sparsification method. In all experiments, the training data was distributed among m=10 nodes, each node performing one training iteration using 10 samples. We completed the following three set-up evaluations, each experiment being repeated five times. MNIST the MNIST dataset contains 70,000 handwritten digits in 10 categories, with 60,000 examples in the training set and 10,000 examples in the test set. We consider a two-layer Fully Connected (FC) neural network model, with 512 neurons in the second layer for class 10 classification on MNIST. For all algorithms we choose a learning rate γ=0.005. For the adaptive aggregation algorithms SASG and last, we set d=10, αd=1/2γ, d=1, 2. CIFAR-10[39 ]]The dataset consisted of 60,000 color images in 10 categories, each category having 6,000 images. We tested the res net18 model on the CIFAR-10 dataset using all algorithms described above. This experiment performed common data enhancement techniques such as random clipping, random flipping, and normalization. The basic learning rate was set to γ=0.01, and the learning rate was attenuated to 0.001 at the 20 th lot. For SASG and LASG, we set d=10, αd=1/γ, d=1, 2. CIFAR-100 the CIFAR-100 dataset contains 60,000 color images of 100 categories of 600 images each. There are 500 training images and 100 test images for each category. We tested VGG16 model on CIFAR-100 dataset [41]. The experiment performed a similar data enhancement technique. The basic learning rate was set to γ=0.01, and the learning rate was attenuated to 0.001 at the 30 th lot. For SASG and LASG we set d=10, αd=4/D/γ 2 D=1, 2,..10. 
Our experimental results are based on the PyTorch implementation of all methods run on a Ubuntu 20.04 machine equipped with a Nvidia RTX-2080Ti GPU.
By computing the parameter counts of the different models, the number of communication bits each algorithm needs to reach the same baseline can be obtained immediately. The last column of fig. 3 shows that the SASG algorithm, combining the adaptive aggregation technique with sparse communication, significantly reduces the number of communication bits required for the model to achieve the same performance, far outperforming the LASG and sparsification algorithms.
This embodiment obtains an adaptive aggregation rule and determines target nodes according to it; performs sparse processing on the information corresponding to the target nodes; calculates a convergence result from a preset sequence combined with a Lyapunov function; and trains a deep neural network model to obtain the learning method. The adaptive selection rule skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. To handle the bias of the top-k sparsification operator, the algorithm uses an error-feedback scheme, thereby achieving the technical effect of fully utilizing the computing capacity of the distributed cluster.
In addition, the embodiment of the invention also provides a medium, wherein the medium is stored with a learning program based on the adaptive aggregation sparse communication, and the learning program based on the adaptive aggregation sparse communication realizes the steps of the learning method based on the adaptive aggregation sparse communication when being executed by a processor.
Other embodiments or specific implementation manners of the learning device based on adaptive aggregation sparse communication according to the present invention may refer to the above method embodiments, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (4)
1. A learning method based on adaptive aggregation sparse communication, the method comprising:
acquiring an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule;
performing sparse processing on target information corresponding to the target node;
calculating a convergence result according to a preset sequence and a Lyapunov function;
training the deep neural network model to obtain a learning method;
the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule comprises the following steps:
acquiring a preset self-adaptive aggregation rule;
dividing, according to the adaptive aggregation rule, all nodes into two disjoint sets: the set M_t of nodes that communicate with the server, and its complement;
at the t-th iteration, using the new gradient information of the nodes selected in M_t while reusing the old compressed gradient information of the remaining nodes, thereby reducing the number of nodes communicating per iteration from M to |M_t|, to determine the target node;
the step of performing sparse processing on the target information corresponding to the target node includes:
selecting, at each iteration, the top-k gradient components of the target information corresponding to the target node and setting the remaining gradient components to zero, so that zero-valued elements need not be communicated;
the step of selecting the top-k gradient components of the target information corresponding to the target node and setting the remaining gradient components to zero, so that zero-valued elements need not be communicated, is followed by:
using an error feedback technique to incorporate the error produced by sparsification into the next iteration, ensuring convergence;
defining an auxiliary sequence that incorporates the error at the t-th iteration on node m;
the step of calculating the convergence result according to the preset sequence and combining with the Lyapunov function comprises the following steps:
denoting the preset sequence, the learning rate is selected as follows:
wherein c_γ > 0 is a constant, giving:
a convergent calculation result;
the training of the deep neural network model to obtain the learning method comprises the following steps:
training is performed using the following iterative format:
wherein,
after the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule, the method further comprises the following steps:
in combination with the adaptive aggregation rule, iterating by using the following iteration format:
wherein M_t and its complement are, respectively, the working sets of nodes that do and do not communicate with the server at the t-th iteration; M := {1, ..., M} denotes the set of distributed nodes; and the remaining symbol denotes the mini-batch of data selected by node m at the t-th iteration.
2. A learning apparatus based on adaptive aggregation sparse communication, employing the learning method based on adaptive aggregation sparse communication according to claim 1, the apparatus comprising:
the node determining module is used for acquiring the self-adaptive aggregation rule and determining a target node according to the self-adaptive aggregation rule;
the sparse processing module is used for carrying out sparse processing on the target information corresponding to the target node;
the result acquisition module is used for calculating a convergence result according to a preset sequence and combining with the Lyapunov function;
and the model training module is used for training the deep neural network model to obtain a learning method.
3. A learning device based on adaptive aggregated sparse communications, the device comprising: a memory, a processor, and an adaptive aggregated sparse communication based learning program stored on the memory and executable on the processor, the adaptive aggregated sparse communication based learning program configured to implement the steps of the adaptive aggregated sparse communication based learning method of claim 1.
4. A medium having stored thereon a learning program based on adaptive aggregated sparse communication, which when executed by a processor, implements the steps of the learning method based on adaptive aggregated sparse communication of claim 1.
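For illustration only, the mechanism the claims describe can be sketched numerically as follows. This is a minimal sketch, not the claimed implementation: the LAG-style change threshold in `adaptive_working_set` is an assumed instantiation of the adaptive aggregation rule (the claims do not spell one out), and all function names, variable names, and parameters here are assumptions.

```python
import numpy as np

def topk_sparsify(g, k):
    """Keep the k largest-magnitude components of g and zero the rest,
    so that zero-valued entries need not be communicated."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]   # indices of the k largest magnitudes
    out[idx] = g[idx]
    return out

def adaptive_working_set(grads, last_sent, threshold):
    """Assumed LAG-style rule: node m joins M_t only if its gradient has
    changed enough since the message it last transmitted."""
    return {m for m, (g, prev) in enumerate(zip(grads, last_sent))
            if np.linalg.norm(g - prev) ** 2 > threshold}

def train_step(x, grads, last_sent, errors, lr, k, threshold):
    """One iteration: nodes in M_t send fresh top-k gradients with error
    feedback; the server reuses the stale compressed gradients of all
    other nodes, cutting communication from M nodes to |M_t| nodes."""
    M = len(grads)
    M_t = adaptive_working_set(grads, last_sent, threshold)
    agg = np.zeros_like(x)
    for m in range(M):
        if m in M_t:                          # communicates this round
            corrected = grads[m] + errors[m]  # error feedback
            sent = topk_sparsify(corrected, k)
            errors[m] = corrected - sent      # residual carried forward
            last_sent[m] = sent               # cached for stale reuse
        agg += last_sent[m]                   # fresh or stale message
    return x - lr * agg / M, M_t
```

In a real system the sparsified message would be transmitted as index–value pairs, and nodes outside M_t would transmit nothing at all in that round, which is where the communication saving comes from.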
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111470644.3A CN114118381B (en) | 2021-12-03 | 2021-12-03 | Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114118381A CN114118381A (en) | 2022-03-01 |
CN114118381B true CN114118381B (en) | 2024-02-02 |
Family
ID=80366670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111470644.3A Active CN114118381B (en) | 2021-12-03 | 2021-12-03 | Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114118381B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116341628B (en) * | 2023-02-24 | 2024-02-13 | 北京大学长沙计算与数字经济研究院 | Gradient sparsification method, system, equipment and storage medium for distributed training |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784002A (en) * | 2020-09-07 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Distributed data processing method, device, computer equipment and storage medium |
CN112424797A (en) * | 2018-05-17 | 2021-02-26 | 弗劳恩霍夫应用研究促进协会 | Concept for the transmission of distributed learning of neural networks and/or parametric updates thereof |
CN112766502A (en) * | 2021-02-27 | 2021-05-07 | 上海商汤智能科技有限公司 | Neural network training method and device based on distributed communication and storage medium |
CN113159287A (en) * | 2021-04-16 | 2021-07-23 | 中山大学 | Distributed deep learning method based on gradient sparsity |
CN113315604A (en) * | 2021-05-25 | 2021-08-27 | 电子科技大学 | Adaptive gradient quantization method for federated learning |
CN113467949A (en) * | 2021-07-07 | 2021-10-01 | 河海大学 | Gradient compression method for distributed DNN training in edge computing environment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11501160B2 (en) * | 2019-03-28 | 2022-11-15 | International Business Machines Corporation | Cloud computing data compression for allreduce in deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN114118381A (en) | 2022-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lei et al. | GCN-GAN: A non-linear temporal link prediction model for weighted dynamic networks | |
US11726769B2 (en) | Training user-level differentially private machine-learned models | |
Ma et al. | Layer-wised model aggregation for personalized federated learning | |
US20230376856A1 (en) | Communication Efficient Federated Learning | |
EP3362918A1 (en) | Systems and methods of distributed optimization | |
CN113191484A (en) | Federal learning client intelligent selection method and system based on deep reinforcement learning | |
CN109597965B (en) | Data processing method, system, terminal and medium based on deep neural network | |
CN111158912A (en) | Task unloading decision method based on deep learning in cloud and mist collaborative computing environment | |
CN113850272A (en) | Local differential privacy-based federal learning image classification method | |
CN111737743A (en) | Deep learning differential privacy protection method | |
CN113778691B (en) | Task migration decision method, device and system | |
CN114118381B (en) | Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication | |
US20230342606A1 (en) | Training method and apparatus for graph neural network | |
CN112417500A (en) | Data stream statistical publishing method with privacy protection function | |
US20210326757A1 (en) | Federated Learning with Only Positive Labels | |
CN112884513A (en) | Marketing activity prediction model structure and prediction method based on depth factorization machine | |
US7617172B2 (en) | Using percentile data in business analysis of time series data | |
WO2024066143A1 (en) | Molecular collision cross section prediction method and apparatus, device, and storage medium | |
CN114830137A (en) | Method and system for generating a predictive model | |
CN110768825A (en) | Service flow prediction method based on network big data analysis | |
Chen et al. | Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | |
Kushwaha et al. | Optimal device selection in federated learning for resource-constrained edge networks | |
CN113760407A (en) | Information processing method, device, equipment and storage medium | |
Zhang et al. | FedMPT: Federated Learning for Multiple Personalized Tasks Over Mobile Computing | |
CN113034343A (en) | Parameter-adaptive hyperspectral image classification GPU parallel method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||