CN114118381A - Learning method, device, equipment and medium based on adaptive aggregation sparse communication - Google Patents
Learning method, device, equipment and medium based on adaptive aggregation sparse communication
- Publication number
- CN114118381A (application number CN202111470644.3A)
- Authority
- CN
- China
- Prior art keywords
- adaptive
- communication
- sparse
- target node
- adaptive aggregation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to the field of distributed learning and discloses a learning method, apparatus, device, and medium based on adaptive aggregation sparse communication: an adaptive aggregation rule is obtained and a target node is determined according to the adaptive aggregation rule; the target information corresponding to the target node is sparsified; a convergence result is calculated by combining a preset sequence with a Lyapunov function; and a deep neural network model is trained to obtain the learning method. The adaptive selection rule adaptively skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. For the bias of the top-k sparsification operator, an error feedback scheme is used in the algorithm, achieving the technical effect of fully utilizing the computing power of the distributed cluster.
Description
Technical Field
The present application relates to the field of distributed learning, and in particular, to a learning method, apparatus, device, and medium based on adaptive aggregation sparse communication.
Background
Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to handle large-scale machine learning problems. A key bottleneck in such systems is the communication overhead of exchanging information, such as stochastic gradients, between nodes. Memory-preserving sparse communication methods and adaptive aggregation methods are two families of techniques proposed to address this problem. Intuitively, having multiple processors collaborate on a training task can accelerate training and reduce training time. However, the cost of communication between processors often hinders the scalability of distributed systems. Worse, when the ratio of computation to communication is low, multiple processors may perform worse than a single processor.
Therefore, how to fully utilize the computing power of the distributed cluster becomes a technical problem to be solved urgently.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a learning method, apparatus, device, and medium based on adaptive aggregation sparse communication, so as to solve the problem in the prior art that the computing power of a distributed cluster cannot be fully utilized.
In order to achieve the above object, the present invention provides a learning method based on adaptive aggregation sparse communication, including:
obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule;
performing sparse processing on target information corresponding to the target node;
calculating a convergence result by combining a preset sequence with a Lyapunov function;
the deep neural network model is trained to obtain a learning method.
Optionally, the step of obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule includes:
acquiring a preset adaptive aggregation rule;

dividing all the nodes communicating with the server into two disjoint sets M_t and M_t^c according to the adaptive aggregation rule; and

when the t-th iteration is performed, using the new gradient information of the nodes in M_t while reusing the old compressed gradient information of the nodes in M_t^c, so that the number of communication rounds per iteration is reduced from M to |M_t|, to determine the target node.
Optionally, the step of performing sparse processing on the target information corresponding to the target node includes:
and selecting the top-k gradient components of the target information corresponding to the target node at each iteration and setting the remaining gradient components to zero, so that the zero elements need not be communicated.
Optionally, after the step of selecting the top-k gradient components and setting the remaining gradient components to zero so that the zero elements need not be communicated, the method further includes:

using an error feedback technique to carry the error produced by sparsification into the next iteration, so as to ensure convergence.
Optionally, the step of calculating a convergence result according to a preset sequence and a Lyapunov function includes:

where c_γ > 0 is a constant, which gives:

thereby obtaining the convergence result.
Optionally, the step of training the deep neural network model to obtain a learning method includes:
the following iterative format is used for training,
wherein,
optionally, after the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule, the method further includes:
performing iteration in combination with the adaptive aggregation rule, using the following iterative format

where M_t and M_t^c are the working sets that do and do not communicate with the server in the t-th iteration, respectively.
In addition, to achieve the above object, the present invention further provides an adaptive aggregation sparse communication based learning apparatus, including:
the node determining module is used for obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule;
the sparse processing module is used for carrying out sparse processing on the target information corresponding to the target node;
the result acquisition module is used for calculating a convergence result according to a preset sequence and the Lyapunov function;
and the model training module is used for training the deep neural network model to obtain a learning method.
In addition, to achieve the above object, the present invention further provides a computer device, including: a memory, a processor, and a learning program based on adaptive aggregation sparse communication stored on the memory and operable on the processor, the learning program being configured to implement the learning method based on adaptive aggregation sparse communication as described above.
Furthermore, to achieve the above object, the present invention further proposes a medium having stored thereon an adaptive aggregated sparse communication based learning program, which when executed by a processor implements the steps of the adaptive aggregated sparse communication based learning method as described above.
The method comprises the steps of obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule; performing sparse processing on the target information corresponding to the target node; calculating a convergence result by combining a preset sequence with a Lyapunov function; and training the deep neural network model to obtain a learning method. The adaptive selection rule adaptively skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. For the bias of the top-k sparsification operator, an error feedback scheme is used in the algorithm, achieving the technical effect of fully utilizing the computing power of the distributed cluster.
Drawings
FIG. 1 is a schematic structural diagram of a learning device based on adaptive aggregation sparse communication in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of the learning method based on adaptive aggregation sparse communication of the present invention;
fig. 3 is a comparison diagram of four algorithms based on adaptive aggregation sparse communication according to an embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a learning device based on adaptive aggregation sparse communication in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the learning device based on adaptive aggregation sparse communication may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (WI-FI) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of an adaptive aggregate sparse communication based learning device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and a learning program based on adaptive aggregation sparse communication.
In the learning apparatus based on adaptive aggregation sparse communication shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 of the learning apparatus based on adaptive aggregation sparse communication according to the present invention may be disposed in the learning apparatus based on adaptive aggregation sparse communication, and the learning apparatus based on adaptive aggregation sparse communication calls the learning program based on adaptive aggregation sparse communication stored in the memory 1005 through the processor 1001 and executes the learning method based on adaptive aggregation sparse communication according to the present invention.
The embodiment of the invention provides a learning method based on adaptive aggregation sparse communication, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the learning method based on adaptive aggregation sparse communication.
In this embodiment, the learning method based on adaptive aggregation sparse communication includes the following steps:
step S10: and acquiring a self-adaptive aggregation rule and determining a target node according to the self-adaptive aggregation rule. A
It should be noted that, over the past decades, machine learning (ML) models and datasets have grown significantly in size and complexity, leading to higher computational intensity and therefore a more time-consuming training process. This has driven the development of distributed training, which uses multiple processors for acceleration. A large number of distributed machine learning tasks can be described as
Where ω is the parameter to be learned, d is the dimension of the parameter, M = {1, ..., M} represents the set of distributed nodes, f_m is the smooth loss function (not necessarily convex) at node m, and ξ_m denotes independent random data samples drawn from the probability distribution at node m.
In a specific implementation, the stochastic gradient descent algorithm (SGD) is the workhorse for solving this problem, with the iteration format

ω_{t+1} = ω_t − (γ/M) Σ_{m∈M} ∇f_m(ω_t; ξ_m^t),

where γ is the learning rate and ξ_m^t is the mini-batch of data that node m selects at the t-th iteration.
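The distributed SGD iteration just described can be sketched as follows (a hedged illustration with our own toy least-squares setup, not the patent's code): each of the M nodes computes a stochastic gradient on its local batch, and the server averages the M gradients and takes a step of size γ.

```python
# A hedged sketch (our own toy setup, not the patent's code) of the distributed
# SGD iteration described above: each of the M nodes computes a stochastic
# gradient on its local batch, and the server averages them and steps by gamma.
import numpy as np

def distributed_sgd_step(omega, node_batches, grad_fn, gamma):
    """omega_{t+1} = omega_t - (gamma / M) * sum_m grad_fn(omega_t, batch_m) / M."""
    grads = [grad_fn(omega, batch) for batch in node_batches]  # one per node
    return omega - gamma * np.mean(grads, axis=0)

# Toy loss per node: least squares, f_m(w) = 0.5 * ||X_m w - y_m||^2.
def lsq_grad(w, batch):
    X, y = batch
    return X.T @ (X @ w - y)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
node_batches = []
for _ in range(4):  # M = 4 nodes, 8 samples each
    X = rng.normal(size=(8, 2))
    node_batches.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(200):
    w = distributed_sgd_step(w, node_batches, lsq_grad, gamma=0.05)
```

With this noiseless toy data the iterate converges to w_true; every server round costs one gradient upload per node, which is the overhead the adaptive aggregation below tries to reduce.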
Further, the step of obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule includes: acquiring a preset adaptive aggregation rule; dividing all the nodes communicating with the server into two disjoint sets M_t and M_t^c according to the adaptive aggregation rule; and, when the t-th iteration is performed, using the new gradient information of the nodes in M_t while reusing the old compressed gradient information of the nodes in M_t^c, so that the number of communication rounds per iteration is reduced from M to |M_t|, to determine the target node.
Further, after the step of obtaining the adaptive aggregation rule and determining the target node according to the adaptive aggregation rule, the method further includes:
performing iteration in combination with the adaptive aggregation rule, using the following iterative format

where M_t and M_t^c are the working sets that do and do not communicate with the server in the t-th iteration, respectively.
It should be noted that reducing the number of communication rounds is important for improving communication efficiency. One line of work replaces traditional gradient information with higher-order information (Newton-type methods) to reduce the number of communication rounds. A distributed preconditioned accelerated gradient method has also been proposed for the same purpose. In addition, many new aggregation techniques, such as periodic aggregation and adaptive aggregation, have been developed to skip certain communications. Periodic aggregation allows each node to perform local model updates independently and averages the resulting models periodically. The lazy aggregation gradient (LAG) method updates the model at the server side, and the nodes adaptively upload only sufficiently informative messages. Unfortunately, while LAG performs well in the deterministic setting (i.e., with full gradients), its performance drops significantly in the stochastic setting. More recent efforts have adapted the aggregation algorithm to the stochastic setting. The communication-censored distributed stochastic gradient descent algorithm (CSGD) increases the batch size to mitigate the effect of stochastic gradient noise. The lazy aggregation stochastic gradient algorithm (LASG) designs a set of new adaptive communication rules tailored to stochastic gradients and achieves good experimental results.
In a particular implementation, an efficient communication algorithm combining sparse communication with adaptively aggregated stochastic gradients, SASG, is presented herein. The SASG approach saves both communication bits and communication rounds without sacrificing the required convergence properties. Considering that, in a distributed learning system, not all communication rounds between the server and the nodes are equally important, the communication frequency between a node and the server can be adjusted according to the importance of the information the node transmits. More specifically, to reduce the number of communication rounds, an adaptive selection rule is established to divide the node set M into two disjoint sets M_t and M_t^c. At the t-th iteration, only the new gradient information of the nodes selected into M_t is used, while the compressed gradient information of the nodes in M_t^c is reused, so that the number of communication rounds per iteration can be reduced from M to |M_t|. On the other hand, quantization methods can achieve at most a 32x compression ratio for common single-precision floating-point data, so a more effective sparsification method is adopted in the algorithm. Specifically, the top-k gradient components (in absolute value) are selected at each iteration and the remaining gradient components are set to zero, so that zero-valued elements need not be communicated, significantly reducing the number of communicated bits.
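The adaptive selection of M_t can be illustrated as follows. The trigger condition here (gradient changed by more than a fixed threshold since the node's last upload) is a simplified LAG-style stand-in and not the patent's exact rule:

```python
# A hedged sketch of the adaptive selection rule described above. The trigger
# condition (gradient changed by more than a fixed threshold) is a simplified
# LAG-style stand-in, not the patent's exact rule.
import numpy as np

def select_and_aggregate(fresh_grads, stored_grads, threshold):
    """Return (aggregate, M_t): averaged gradient and the communicating node set."""
    M_t = []
    for m, g in enumerate(fresh_grads):
        # Communicate only if the gradient changed enough since the last upload.
        if np.linalg.norm(g - stored_grads[m]) ** 2 > threshold:
            stored_grads[m] = g.copy()  # fresh upload: one communication round
            M_t.append(m)
        # Nodes outside M_t skip communication; the server reuses its old copy.
    return np.mean(stored_grads, axis=0), M_t

stored = [np.zeros(2) for _ in range(3)]  # server's cached gradients
fresh = [np.array([1.0, 0.0]),            # changed a lot -> upload
         np.array([0.05, 0.0]),           # barely changed -> skip
         np.array([0.0, 1.0])]            # changed a lot -> upload
agg, M_t = select_and_aggregate(fresh, stored, threshold=0.1)
```

In this round only two of the three nodes communicate, so the round count drops from M to |M_t| exactly as the text describes, while the server still aggregates a gradient for every node.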
In a particular implementation, it is noted that the communication rounds between the server and the nodes do not all contribute equally in a distributed learning system, so an adaptive aggregation method is used to develop aggregation rules that can skip inefficient communication rounds. This adaptive aggregation method, derived from the lazy aggregation gradient (LAG) method, uses an adaptive selection to detect nodes with small gradient changes and reuse their old gradients. In combination with such adaptive aggregation rules, the following iterative format can be obtained

where M_t and M_t^c are the working sets that do and do not communicate with the server in the t-th iteration, respectively.
Step S20: and performing sparse processing on the target information corresponding to the target node.
It should be noted that research in this direction mainly develops around the ideas of quantization and sparsification. Quantization methods compress information by transmitting fewer bits instead of the original 32-bit data. The quantized stochastic gradient descent algorithm (QSGD) provides adjustable quantization levels, giving additional flexibility to control the trade-off between the communication cost per iteration and the convergence speed. Ternary gradients are used to reduce the communication data size, reducing each component of the gradient to its sign bit (one bit). Sparsification methods aim to reduce the number of elements transmitted per iteration. These methods fall into two broad categories: stochastic sparsification and deterministic sparsification. Stochastic sparsification randomly selects some components for communication; this method is named random-k, where k denotes the number of selected components. Such random selection usually yields an unbiased estimate of the original gradient, which makes it very friendly to theoretical analysis. Unlike stochastic sparsification, deterministic sparsification considers the magnitude of each component and retains only the k components of the stochastic gradient with the largest magnitudes; this method is called top-k. Compared with the unbiased scheme, top-k is biased, so it should use an error feedback or accumulation procedure to ensure that all gradient information is eventually added to the model, despite some delay.
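The two sparsification families above can be sketched as follows (illustrative implementations of the standard operators, not the patent's code): random-k keeps k randomly chosen components, rescaled so the result is unbiased, while top-k keeps the k largest-magnitude components and is therefore biased.

```python
# Illustrative implementations (ours, not the patent's) of the two standard
# sparsification operators discussed above: random-k is unbiased after
# rescaling, while top-k keeps the k largest-magnitude components (biased).
import numpy as np

def random_k(g, k, rng):
    out = np.zeros_like(g)
    idx = rng.choice(g.size, size=k, replace=False)  # uniform random support
    out[idx] = g[idx] * (g.size / k)                 # rescale so E[out] = g
    return out

def top_k(g, k):
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]  # indices of the k largest magnitudes
    out[idx] = g[idx]                 # all other components stay zero
    return out

g = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
sparse = top_k(g, 2)  # keeps -3.0 and 2.0; the three zeros need no communication
```

Only the k kept values (and their indices) need to be transmitted, which is the bit saving the text refers to; the bias of top-k is what motivates the error-feedback mechanism below.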
In a specific implementation, after the adaptive selection process, the selected nodes send sparse information derived by the top-k operator to the parameter server.
Further, the step of performing sparse processing on the target information corresponding to the target node includes: selecting the top-k gradient components of the target information corresponding to the target node at each iteration and setting the remaining gradient components to zero, so that the zero elements need not be communicated.

Further, after the step of selecting the top-k gradient components and setting the remaining gradient components to zero so that the zero elements need not be communicated, the method further includes: using an error feedback technique to carry the error produced by sparsification into the next iteration, so as to ensure convergence; and defining an auxiliary sequence, where e_m^t is the error at the t-th iteration on node m.
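The error-feedback step can be sketched as follows. This assumes the standard error-feedback form, which the text's wording matches but whose exact formulas are omitted from this translation: the components zeroed out by top-k are stored as an error term e_m^t and added back to the gradient at the next iteration, so no gradient information is permanently lost.

```python
# A hedged sketch of error feedback (assumption: the standard EF form; the
# patent's exact formulas are omitted from this translation). Components
# zeroed out by top-k are kept as an error and added back next iteration.
import numpy as np

def top_k(g, k):
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def ef_compress(grad, error, k):
    """Compress grad plus the carried error; return (message, new_error)."""
    corrected = grad + error             # bring the previous error into this step
    message = top_k(corrected, k)        # the sparse vector actually communicated
    return message, corrected - message  # residual carried to the next iteration

g = np.array([1.0, 0.2, 0.1, 0.05])
e = np.zeros_like(g)
msgs = []
for _ in range(3):  # three iterations with the same local gradient
    msg, e = ef_compress(g, e, k=1)
    msgs.append(msg)
# Nothing is lost: everything sent so far plus the residual equals 3 * g.
```

Each message is 1-sparse, yet the sum of all messages plus the final residual equals the sum of the true gradients, which is the "error is brought into the next step" property used to ensure convergence.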
Step S30: and calculating a convergence result according to a preset sequence and the Lyapunov function.
It should be noted that, in this embodiment, a biased top-k sparsification operator is applied, and the compression error it introduces makes the convergence analysis more complicated. We define an auxiliary sequence {v_t}_{t=0,1,...}, which can be viewed as an approximation of {ω_t}_{t=0,1,...}. By analyzing this sequence, we obtain the convergence result of the SASG algorithm, and the convergence rate matches that of the original SGD.
Further, the step of calculating the convergence result according to the preset sequence and the Lyapunov function includes: denoting the corresponding Lyapunov function and choosing the learning rate as

where c_γ > 0 is a constant, which gives:

thereby obtaining the convergence result.
It will be appreciated that the algorithm guarantees convergence and achieves a sublinear convergence rate despite skipping many communication rounds and compressing the communicated information. In other words, the SASG algorithm, using well-designed adaptive aggregation rules and sparse communication techniques, still achieves the same order of convergence speed as the SGD method.
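For reference, the sublinear, SGD-matching rate claimed above typically takes the following form in the nonconvex setting. This is the standard SGD bound, shown here only as a point of comparison; the patent's own bound is omitted from this translation:

```latex
\min_{0 \le t < T} \mathbb{E}\big[\|\nabla f(\omega_t)\|^2\big]
\;\le\; \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(\omega_t)\|^2\big]
\;=\; \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right)
```

Matching this order means that, asymptotically, the communication skipped by adaptive aggregation and the bits removed by top-k sparsification come at no cost in iteration complexity.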
Step S40: the deep neural network model is trained to obtain a learning method.
Further, the step of training the deep neural network model to obtain a learning method includes:
the following iterative format is used for training,
wherein,
in a specific implementation, the SASG algorithm is benchmark tested using an inert aggregated random gradient (LASG) method, a sparsification method, and a distributed SGD. Experience has shown that up to 99% of the gradient information is not necessary in each iteration, so we use the top-1% sparsification operator in the SASG algorithm and sparsification method. In all experiments, the training data was distributed among 10 nodes, each node using 10 samples for one training iteration. We completed the evaluation under the following three settings, and each experiment was repeated five times. MNIST data set contains 70,000 handwritten digits in 10 categories, with 60,000 examples in the training set and 10,000 examples in the test set. We consider a two-layer fully-connected (FC) neural network model, the second layer having 512 neurons for class 10 classification on MNIST. For all algorithms, we choose the learning rate γ to be 0.005. For the adaptive aggregation algorithms SASG and LASG, we set D-10, α D-1/2 γ, D-1, 2. CIFAR-10[39 ]]The data set consisted of 60,000 color images in 10 categories, each with 6,000 images. We tested the ResNet18 model using all of the algorithms described above on the CIFAR-10 dataset. The experiment performed common data enhancement techniques such as random cropping, random flipping, and normalization. The basic learning rate was set to γ of 0.01, and the learning rate was attenuated to 0.001 at the 20 th batch. For SASG and LASG, we set D to 10, α D to 1/γ, D to 1,2,...,10. CIFAR-100 data set contains 60,000 color images in 100 categories, 600 images in each category. Each category has 500 training images and 100 test images. We tested the VGG16 model [41 ] on the CIFAR-100 dataset]. This experiment performed similar data enhancement techniques. The basic learning rate was set to γ of 0.01, and the learning rate was attenuated to 0.001 at the 30 th batch. For SASG and LASG, we set D10, α D4/D/γ 21,2, 10. 
All methods were implemented in PyTorch, and the experiments were run on an Ubuntu 20.04 machine equipped with an Nvidia RTX 2080 Ti GPU.
Furthermore, the number of communication bits required by the different algorithms to reach the same baseline can be obtained by counting the number of parameters of the different models. The last column of fig. 3 shows that the SASG algorithm, combining the adaptive aggregation technique with sparse communication, significantly reduces the number of communication bits the model needs to achieve the same performance, far outperforming the LASG and sparsification algorithms.
In this embodiment, an adaptive aggregation rule is obtained and a target node is determined according to the adaptive aggregation rule; the target information corresponding to the target node is sparsified; a convergence result is calculated by combining a preset sequence with a Lyapunov function; and the deep neural network model is trained to obtain a learning method. The adaptive selection rule adaptively skips some communication rounds, and sparsifying the transmitted information further reduces the number of communication bits. For the bias of the top-k sparsification operator, an error feedback scheme is used in the algorithm, achieving the technical effect of fully utilizing the computing power of the distributed cluster.
Furthermore, an embodiment of the present invention further provides a medium, where the medium stores an adaptive aggregation sparse communication based learning program, and the adaptive aggregation sparse communication based learning program, when executed by a processor, implements the steps of the adaptive aggregation sparse communication based learning method as described above.
Other embodiments or specific implementation manners of the learning device based on adaptive aggregation sparse communication according to the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., a rom/ram, a magnetic disk, an optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A learning method based on adaptive aggregation sparse communication, characterized by comprising the following steps:
obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule;
performing sparse processing on target information corresponding to the target node;
calculating a convergence result by combining a preset sequence with a Lyapunov function;
training a deep neural network model to obtain the learning method.
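The convergence step of claim 1 combines a preset sequence with a Lyapunov function; this excerpt does not reproduce the patent's actual function or sequence, but a Lyapunov argument for error-feedback sparse training typically has the following shape (illustrative only — the function $V_t$, constant $c$, and step size $\eta$ below are assumptions, not the patent's definitions):

```latex
% Illustrative Lyapunov function for error-feedback sparse SGD;
% the patent's own sequence and function are not reproduced here.
V_t \;=\; \mathbb{E}\big[f(x_t) - f^{\star}\big] \;+\; c\,\mathbb{E}\big[\lVert e_t\rVert^2\big],
\qquad
V_{t+1} \;\le\; V_t \;-\; \tfrac{\eta}{2}\,\mathbb{E}\big[\lVert \nabla f(x_t)\rVert^2\big] \;+\; \mathcal{O}(\eta^2)
```

Summing the descent inequality over $t = 0,\dots,T-1$ and telescoping bounds $\min_t \mathbb{E}\lVert \nabla f(x_t)\rVert^2$, which is the usual form of the "convergence result" obtained from such a sequence.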
2. The method of claim 1, wherein the step of obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule comprises:
acquiring a preset adaptive aggregation rule;
dividing all nodes that communicate with the server into two disjoint sets, M_t and its complement, according to the adaptive aggregation rule.
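This excerpt does not state the concrete rule that decides which nodes fall into M_t. As a hedged illustration only, a lazy-aggregation-style rule — a node communicates only when its fresh local gradient has drifted enough from the last gradient it uploaded — could partition the nodes as follows; `partition_nodes`, the threshold, and the drift criterion are all assumptions for illustration, not the patent's rule:

```python
import numpy as np

def partition_nodes(new_grads, last_sent, threshold):
    """Split nodes into M_t (must communicate) and its complement (skip).

    Hypothetical rule: a node enters M_t only when its fresh gradient
    has drifted far enough from the last gradient it uploaded.
    """
    M_t, skipped = [], []
    for node_id, g in new_grads.items():
        drift = np.linalg.norm(g - last_sent[node_id])
        (M_t if drift > threshold else skipped).append(node_id)
    return M_t, skipped

new_grads = {0: np.array([1.0, 0.0]), 1: np.array([0.1, 0.1])}
last_sent = {0: np.array([0.0, 0.0]), 1: np.array([0.1, 0.1])}
M_t, skipped = partition_nodes(new_grads, last_sent, threshold=0.5)
# Node 0 drifted (norm 1.0 > 0.5) and communicates; node 1 did not and is skipped.
```

Nodes outside M_t reuse their previously uploaded gradient on the server side, which is what makes skipping their uploads safe.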
3. The method of claim 1, wherein the step of sparsifying target information corresponding to the target node comprises:
selecting, at each iteration, the top-k gradient components of the target information corresponding to the target node, and setting the remaining gradient components to zero, so that the zero elements need not be communicated.
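The top-k sparsification in claim 3 can be sketched in a few lines: keep the k largest-magnitude components, zero the rest, and transmit only the surviving (index, value) pairs. This is a minimal sketch of the standard technique; the function name and payload format are illustrative, not the patent's implementation:

```python
import numpy as np

def top_k_sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude gradient components, zero the rest.

    Returns the sparse gradient and the (index, value) pairs that would
    actually be communicated; zero entries are never transmitted.
    """
    flat = grad.ravel()
    # Indices of the k components with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    payload = list(zip(idx.tolist(), flat[idx].tolist()))  # what gets sent
    return sparse.reshape(grad.shape), payload

grad = np.array([0.1, -3.0, 0.02, 2.5, -0.4])
sparse, payload = top_k_sparsify(grad, k=2)
# Only the two largest-magnitude components survive: -3.0 and 2.5.
```

Communication cost drops from the full dimension to k index/value pairs per node, which is the source of the bandwidth saving the claim describes.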
4. The method of claim 3, wherein, after the step of selecting, at each iteration, the top-k gradient components of the target information corresponding to the target node and setting the remaining gradient components to zero so that the zero elements need not be communicated, the method further comprises:
using an error feedback technique to carry the error introduced by sparsification into the next iteration, so as to ensure convergence;
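The error feedback technique of claim 4 keeps a residual of everything sparsification discarded and adds it back before the next compression, so no gradient mass is permanently lost. A minimal sketch, assuming the top-k compressor above (class and method names are illustrative):

```python
import numpy as np

def top_k(v, k):
    """Zero all but the k largest-magnitude components of a vector."""
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

class ErrorFeedbackCompressor:
    """Accumulate what sparsification discarded and re-inject it next step."""
    def __init__(self, dim):
        self.residual = np.zeros(dim)  # error e_t carried between iterations

    def compress(self, grad, k):
        corrected = grad + self.residual   # add back last step's error
        sent = top_k(corrected, k)         # transmit only k components
        self.residual = corrected - sent   # remember what was dropped
        return sent

ef = ErrorFeedbackCompressor(dim=4)
g = np.array([1.0, 0.5, -0.2, 0.1])
sent1 = ef.compress(g, k=1)  # only the largest component (1.0) is transmitted
sent2 = ef.compress(g, k=1)  # residual boosts previously dropped components
```

The invariant is that everything sent plus the remaining residual equals the sum of all gradients fed in, which is exactly why the discarded error eventually reaches the server and convergence is preserved.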
7. The method of any one of claims 1 to 6, wherein, after the step of obtaining an adaptive aggregation rule and determining a target node according to the adaptive aggregation rule, the method further comprises:
performing iteration, in combination with the adaptive aggregation rule, using the following iteration format:
8. A learning apparatus based on adaptive aggregation sparse communication, the apparatus comprising:
a node determining module, configured to obtain an adaptive aggregation rule and determine a target node according to the adaptive aggregation rule;
a sparse processing module, configured to perform sparse processing on the target information corresponding to the target node;
a result acquisition module, configured to calculate a convergence result by combining a preset sequence with the Lyapunov function;
a model training module, configured to train the deep neural network model to obtain the learning method.
9. A learning device based on adaptive aggregation sparse communication, the device comprising: a memory, a processor, and a learning program based on adaptive aggregation sparse communication stored on the memory and executable on the processor, the learning program being configured to implement the steps of the learning method based on adaptive aggregation sparse communication according to any one of claims 1 to 7.
10. A medium having stored thereon a learning program based on adaptive aggregation sparse communication, which, when executed by a processor, implements the steps of the learning method based on adaptive aggregation sparse communication according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111470644.3A CN114118381B (en) | 2021-12-03 | 2021-12-03 | Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114118381A true CN114118381A (en) | 2022-03-01 |
CN114118381B CN114118381B (en) | 2024-02-02 |
Family
ID=80366670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111470644.3A Active CN114118381B (en) | 2021-12-03 | 2021-12-03 | Learning method, device, equipment and medium based on self-adaptive aggregation sparse communication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114118381B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116341628A (en) * | 2023-02-24 | 2023-06-27 | 北京大学长沙计算与数字经济研究院 | Gradient sparsification method, system, equipment and storage medium for distributed training |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200311539A1 (en) * | 2019-03-28 | 2020-10-01 | International Business Machines Corporation | Cloud computing data compression for allreduce in deep learning |
CN111784002A (en) * | 2020-09-07 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Distributed data processing method, device, computer equipment and storage medium |
CN112424797A (en) * | 2018-05-17 | 2021-02-26 | 弗劳恩霍夫应用研究促进协会 | Concept for the transmission of distributed learning of neural networks and/or parametric updates thereof |
CN112766502A (en) * | 2021-02-27 | 2021-05-07 | 上海商汤智能科技有限公司 | Neural network training method and device based on distributed communication and storage medium |
CN113159287A (en) * | 2021-04-16 | 2021-07-23 | 中山大学 | Distributed deep learning method based on gradient sparsity |
CN113315604A (en) * | 2021-05-25 | 2021-08-27 | 电子科技大学 | Adaptive gradient quantization method for federated learning |
CN113467949A (en) * | 2021-07-07 | 2021-10-01 | 河海大学 | Gradient compression method for distributed DNN training in edge computing environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lei et al. | GCN-GAN: A non-linear temporal link prediction model for weighted dynamic networks | |
Joseph et al. | Impact of regularization on spectral clustering | |
WO2017066509A1 (en) | Systems and methods of distributed optimization | |
CN112529071B (en) | Text classification method, system, computer equipment and storage medium | |
CN109032630B (en) | Method for updating global parameters in parameter server | |
CN111209930B (en) | Method and device for generating trust policy and electronic equipment | |
CN114461386A (en) | Task allocation method and task allocation device | |
CN114461929A (en) | Recommendation method based on collaborative relationship graph and related device | |
CN114118381A (en) | Learning method, device, equipment and medium based on adaptive aggregation sparse communication | |
US20210326757A1 (en) | Federated Learning with Only Positive Labels | |
Yu et al. | Heterogeneous federated learning using dynamic model pruning and adaptive gradient | |
CN114358216A (en) | Quantum clustering method based on machine learning framework and related device | |
TW202001611A (en) | Reliability evaluating method for multi-state flow network and system thereof | |
US20210232895A1 (en) | Flexible Parameter Sharing for Multi-Task Learning | |
CN111563598A (en) | Method and system for predicting quantum computation simulation time | |
CN113034343B (en) | Parameter-adaptive hyperspectral image classification GPU parallel method | |
CN115220833A (en) | Method for optimizing neural network model and method for providing graphic user interface | |
CN113760407A (en) | Information processing method, device, equipment and storage medium | |
CN114819163A (en) | Quantum generation countermeasure network training method, device, medium, and electronic device | |
Chen et al. | Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | |
CN114580649A (en) | Method and device for eliminating quantum Pagli noise, electronic equipment and medium | |
CN113313253A (en) | Neural network compression method, data processing device and computer equipment | |
Shokrzade et al. | ELM-NET, a closer to practice approach for classifying the big data using multiple independent ELMs | |
CN113159297A (en) | Neural network compression method and device, computer equipment and storage medium | |
Campobello et al. | LBGS: a smart approach for very large data sets vector quantization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||