CN112688809A - Diffusion adaptive network learning method, system, terminal and storage medium - Google Patents

Diffusion adaptive network learning method, system, terminal and storage medium

Info

Publication number
CN112688809A
Authority
CN
China
Prior art keywords
diffusion
variance
gradient descent
strategy
random
Prior art date
Legal status
Granted
Application number
CN202011521741.6A
Other languages
Chinese (zh)
Other versions
CN112688809B (en)
Inventor
张萌飞
靳丹琦
陈捷
雷攀
Current Assignee
Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Original Assignee
Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Shenggeng Intelligent Technology Xi'an Research Institute Co ltd filed Critical Shenggeng Intelligent Technology Xi'an Research Institute Co ltd
Priority to CN202011521741.6A priority Critical patent/CN112688809B/en
Publication of CN112688809A publication Critical patent/CN112688809A/en
Application granted granted Critical
Publication of CN112688809B publication Critical patent/CN112688809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

A diffusion adaptive network learning method, system, terminal and storage medium. The method comprises a stochastic gradient descent process and a time-averaged variance-reduced stochastic gradient descent process. In the stochastic gradient descent process, each node of a distributed network runs the least-mean-square strategy P_k times and collects the input data received during these P_k runs. In the time-averaged variance-reduced stochastic gradient descent process, the previously collected data are used to average the stochastic gradients over a time window of length P_k, yielding an estimate of the mean gradient; this estimate is then used to update the variance-reduced weight equation in the next m iterations. A diffusion adaptive network learning system, a terminal and a storage medium are also provided. The invention overcomes the drawback that conventional variance-reduced stochastic gradient descent algorithms cannot be used in an online learning environment, applies the technique to the adaptive diffusion network algorithm, and thereby improves the online estimation performance of the distributed diffusion network.

Description

Diffusion adaptive network learning method, system, terminal and storage medium
Technical Field
The invention belongs to the field of adaptive signal processing and relates to a diffusion adaptive network learning method, system, terminal and storage medium.
Background
In a multi-node network, the physical dispersion of the nodes, the limited communication capacity between them, and requirements such as security and robustness mean that the network cannot adopt a centralized strategy in which large volumes of data are transmitted and gathered at a central node for analysis; this creates the need for distributed processing. Moreover, in the context of big data, data usually arrive as a stream over time, and the system model or its parameters must be re-estimated at every time instant.
Adaptive algorithms in distributed networks meet exactly these needs. Over the last decade this field has devoted a great deal of research to adaptive algorithms in distributed networks and explored their applications. According to the cooperation mode and the information flow among nodes, cooperation strategies in distributed networks fall into three main classes: incremental strategies, consensus strategies, and diffusion strategies. The incremental strategy forms a Hamiltonian cycle in the network and visits each node in turn for information exchange. Although the traffic required by the incremental strategy is theoretically small, constructing a Hamiltonian cycle in an arbitrary network is itself an NP-hard problem. Furthermore, such a cycle is very sensitive to the failure of any node or link, so the incremental strategy is not well suited for distributed online adaptive signal processing. In the consensus strategy and the diffusion strategy, each node communicates with its neighbor nodes in real time, and the global target parameters of the network are estimated cooperatively through this information exchange. Because each node must obtain the information of all of its neighbors at every time instant, these two strategies require more communication resources than the incremental strategy, but they can fully exploit the cooperation of the nodes in the distributed network structure. In addition, the diffusion strategy endows the nodes with continuous adaptation and learning capabilities; because its scalability is a great advantage and it has been shown to have better stability and a larger dynamic range than the consensus strategy, it is a key strategy in distributed adaptive signal processing. The distributed adaptive diffusion least-mean-square algorithm is a stochastic gradient descent algorithm, and the gradient noise of the stochastic gradient greatly hinders fast convergence. Research on how to reduce the influence of gradient noise is therefore of great significance for improving diffusion-based distributed online learning algorithms. Among the many options, the most direct is to apply a variance-reduced stochastic gradient algorithm to the distributed adaptive network. Variance-reduced stochastic gradient algorithms are designed to minimize a loss function defined over all samples of a batch. Typical algorithms include the stochastic variance reduced gradient (SVRG) algorithm and the SAGA algorithm. The SVRG algorithm uses two loops: the true gradient is computed in the outer loop, and the variance-reduced stochastic gradient is computed in the inner loop. The SAGA algorithm uses only one loop but requires more memory to estimate the true gradient. Both greatly improve on the original stochastic gradient descent algorithm in performance; however, their design is based on samples collected in batches rather than on learning online from the streaming data considered in this problem.
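As a point of reference for the two-loop SVRG structure described above, the following is a minimal sketch for a generic per-sample gradient over a fixed batch; the function names and the batch interface are illustrative assumptions, not taken from the patent or from any cited document.

import numpy as np

def svrg(grad, X, w0, step=0.01, outer_iters=20, inner_iters=None):
    # Minimal SVRG sketch: the outer loop computes the full (batch) gradient at a
    # snapshot of the weights, the inner loop applies variance-reduced stochastic steps.
    # grad(w, x) returns the gradient of the per-sample loss at w for sample x.
    n = len(X)
    m = inner_iters or n                     # inner-loop length, commonly on the order of n
    w = np.array(w0, dtype=float)
    for _ in range(outer_iters):
        w_snap = w.copy()                    # snapshot weights
        full_grad = np.mean([grad(w_snap, x) for x in X], axis=0)   # true gradient over the batch
        for _ in range(m):
            j = np.random.randint(n)         # draw one sample
            # variance-reduced gradient: stochastic gradient corrected via the snapshot
            g = grad(w, X[j]) - grad(w_snap, X[j]) + full_grad
            w -= step * g
    return w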
Disclosure of Invention
The object of the invention is to provide a diffusion adaptive network learning method, system, terminal and storage medium that address the problem that prior-art variance-reduced stochastic gradient descent algorithms cannot be used in an online learning environment, so that variance-reduced stochastic gradient descent can be applied to the online learning of a diffusion adaptive network and the online estimation performance of the distributed diffusion network is improved.
In order to achieve this object, the invention adopts the following technical scheme:
a diffusion self-adaptive network learning method comprises a random gradient descent process and a time-averaged variance reduction random gradient descent process, wherein in the random gradient descent process, each node of a distributed network runs PkThe sub-least mean square strategy and collects this PkInput received in secondary operationData; using previously collected data during the time-averaged decreasing variance stochastic gradient descent for a length of PkAveraging the random gradients in the time window to obtain an estimated value of an average gradient, and updating a weight equation for reducing the variance by using the estimated value in the next m-time iterative computations; at the very beginning of PkAt each moment, executing a random gradient descent process, and when the iteration number is more than PkAnd then, calculating the average gradient under the window function, and further realizing the reduction of the variance of the random gradient, thereby accelerating the convergence speed of the self-adaptive algorithm of the whole diffusion network.
As a preferred scheme of the diffusion adaptive network learning method of the invention:
A stochastic gradient descent strategy is executed by the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is repeated until i exceeds the window-function length P_k, and then stops.
As a preferred scheme of the diffusion adaptive network learning method of the invention:
the network of the random gradient descent strategy has a global cost function of the form
Figure BDA0002849218740000031
Wherein
Figure BDA0002849218740000032
N represents the total number of nodes in the network, and the symbol E (-) represents the data xk,iThe distribution of (a) is desired.
As a preferred scheme of the diffusion adaptive network learning method of the invention:
the first and second convergence step sizes of the node k of the random gradient descent strategy satisfy
Figure BDA0002849218740000033
Wherein, deltakRepresenting a cost function JkThe gradient vector of (a) satisfies δk-Lipschitz continuous stripAnd (3) a component.
As a preferred scheme of the diffusion adaptive network learning method of the invention:
From i > P_k onward, the distributed network implements the variance-reduced stochastic gradient descent strategy. First, w_{k,i-1} is assigned to the inner-loop variable, i.e. the inner-loop variable is initialized with w_{k,i-1}. The average gradient is then estimated using a window function
Figure BDA0002849218740000036
where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k with respect to w_{k,i-1} when the input is the signal x_{k,i}.
As a preferred scheme of the diffusion adaptive network learning method of the invention:
the number m of the inner loop and the length P of the window functionkThere is a set relationship:
Figure BDA0002849218740000038
As a preferred scheme of the diffusion adaptive network learning method of the invention: in the next m iterations, the inner-loop variable obtained above and the average gradient are used to compute the variance-reduced stochastic gradient
Figure BDA00028492187400000311
and node k then obtains the estimate w_{k,i} of the diffusion strategy at time i. After the m inner-loop iterations have been executed, the inner-loop variable is updated again, and this is repeated until the algorithm converges.
In the variance-reduced stochastic gradient descent, the first-order convergence step size of node k satisfies
Figure BDA00028492187400000313
and the second-order convergence step size satisfies
Figure BDA00028492187400000314
where ν_k indicates that the cost function J_k(w) is ν_k-strongly convex.
The invention also provides a diffusion adaptive network learning system, which comprises:
a stochastic gradient descent execution module, configured to make each node of the distributed network run the least-mean-square strategy P_k times and collect the input data received during these P_k runs;
a time-averaged variance-reduced stochastic gradient descent execution module, configured to use the data collected by the stochastic gradient descent execution module to average the stochastic gradients over a time window of length P_k, obtain an estimate of the average gradient, and use this estimate to update the variance-reduced weight equation in the next m iterations;
a timing control module, configured to control execution of the stochastic gradient descent process during the first P_k time instants and, once the iteration index exceeds P_k, to control computation of the average gradient under the window function, thereby reducing the variance of the stochastic gradient.
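Read together, the three modules amount to a per-node procedure in which the timing control module decides, at every instant, whether the stochastic gradient descent module or the variance-reduced module runs. The following is a minimal sketch of one node under a least-mean-square cost; the SVRG-style form of the variance-reduced gradient, the use of the first P_k collected samples as the window, and all class and method names are assumptions for illustration only.

import numpy as np

class DiffusionVRNode:
    # Sketch of one node: SGD module, time-averaged variance-reduction module, timing control.
    def __init__(self, L, P_k, m, mu):
        self.w = np.zeros(L)          # current weight estimate w_{k,i}
        self.P_k, self.m, self.mu = P_k, m, mu
        self.buffer = []              # the P_k collected (x, d) samples
        self.i = 0                    # iteration index
        self.w_snap = None            # inner-loop variable
        self.avg_grad = None          # window-averaged gradient estimate

    def _grad(self, w, x, d):
        # gradient of the LMS cost (d - w^T x)^2 with respect to w
        return -2.0 * (d - w @ x) * x

    def sgd_module(self, x, d):
        # stochastic gradient (LMS) step, run during the first P_k instants
        self.w = self.w - self.mu * self._grad(self.w, x, d)
        self.buffer.append((x, d))

    def vr_module(self, x, d):
        # variance-reduced step (the SVRG-style correction is an assumption here)
        g = self._grad(self.w, x, d) - self._grad(self.w_snap, x, d) + self.avg_grad
        self.w = self.w - self.mu * g

    def step(self, x, d):
        # timing control: SGD phase for i <= P_k, variance-reduced phase afterwards
        self.i += 1
        if self.i <= self.P_k:
            self.sgd_module(x, d)
            return self.w
        if (self.i - self.P_k - 1) % self.m == 0:
            self.w_snap = self.w.copy()       # refresh the inner-loop variable every m iterations
            self.avg_grad = np.mean(
                [self._grad(self.w_snap, xb, db) for xb, db in self.buffer], axis=0)
        self.vr_module(x, d)
        return self.w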
The invention also provides a terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of the diffusion adaptive network learning method when executing the computer program.
The present invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the diffusion adaptive network learning method.
Compared with the prior art, the invention has the following beneficial effects: the diffusion adaptive network learning method accelerates the convergence of stochastic gradients in a streaming data processing environment; it overcomes the drawback that conventional variance-reduced stochastic gradient descent algorithms cannot be used in an online learning environment and applies the technique to the adaptive diffusion network algorithm, thereby improving the online estimation performance of the distributed diffusion network. The invention effectively reduces the gradient noise in the online estimation of the distributed diffusion network, which accelerates the convergence of the algorithm and improves its performance. The method is also extensible: it is not limited to the diffusion strategy and can equally be applied to other distributed strategies, such as the incremental strategy and the consensus strategy.
Drawings
FIG. 1 is a schematic diagram of an implementation of the diffusion adaptive network learning method of the present invention;
FIG. 2 is a flow chart of the design of the diffusion adaptive network learning method of the present invention;
FIG. 3 shows, for L = 50, N = 16 network nodes and the loss-function model J_k(w; x_{k,i}) = (d_{k,i} - w^T x_{k,i})^2, the performance of the variance-reduced diffusion strategy of the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The signal model and the related quantities of the problem studied by the invention are introduced as follows:
consider a distributed network of N nodes. At each node k, an unknown parameter vector of length lx 1 needs to be estimated
Figure BDA0002849218740000051
An input vector of length Lx 1 can be observed at node kxk,i
The invention provides a diffusion adaptive network learning method, which comprises the following steps:
S1: the distributed network executes the stochastic gradient descent strategy; node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is repeated until i exceeds the window-function length P_k, and then stops.
In the stochastic gradient descent strategy, the global cost function of the network has the form
J^{glob}(w) = \sum_{k=1}^{N} J_k(w), where J_k(w) = E[J_k(w; x_{k,i})],
N denotes the total number of nodes in the network, and the symbol E(·) denotes expectation with respect to the distribution of the data x_{k,i}.
In the stochastic gradient descent strategy, the first-order and second-order convergence step sizes of node k satisfy
Figure BDA0002849218740000054
where δ_k indicates that the gradient vector of the cost function J_k satisfies a δ_k-Lipschitz continuity condition.
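As a concrete illustration of the stochastic gradient descent strategy of step S1 under the quadratic cost used in the embodiment below, one network-level diffusion LMS iteration can be sketched as follows; the adapt-then-combine ordering, the function name and the data layout are assumptions for illustration and are not prescribed by the patent.

import numpy as np

def diffusion_lms_step(W, X, d, A, mu):
    # One adapt-then-combine diffusion LMS iteration over the whole network (sketch).
    # W: N x L current estimates w_{k,i-1};  X: N x L inputs x_{k,i};
    # d: length-N observations d_{k,i};  A: N x N combination matrix whose columns sum to 1.
    err = d - np.einsum('kl,kl->k', W, X)   # instantaneous errors d_{k,i} - w_k^T x_{k,i}
    psi = W + mu * err[:, None] * X         # adaptation (stochastic gradient) step at every node
    return A.T @ psi                        # combination: w_{k,i} = sum_l a_{lk} psi_{l,i}

Setting A to the identity matrix recovers the non-cooperative baseline used in the embodiment, while the uniform combination matrix described there gives the cooperative case.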
S2: from i > P_k onward, the distributed network implements the variance-reduced stochastic gradient descent strategy.
First, w_{k,i-1} is assigned to the inner-loop variable, i.e. the inner-loop variable is initialized with w_{k,i-1}.
The number m of inner-loop iterations and the window-function length P_k of step S1 are close in size and are generally set as
Figure BDA0002849218740000057
S3: the average gradient is estimated using the window function
Figure BDA0002849218740000058
where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k with respect to w_{k,i-1} when the input is the signal x_{k,i}.
S4: in the next m iterations (i.e., the inner loop), the inner-loop variable obtained in step S2 and the average gradient obtained in step S3 are used to compute the variance-reduced stochastic gradient:
Figure BDA0002849218740000061
node k then obtains the estimate w_{k,i} of the diffusion strategy at time i.
In the variance-reduced stochastic gradient descent strategy, the first-order convergence step size of node k satisfies
Figure BDA0002849218740000062
and the second-order convergence step size satisfies
Figure BDA0002849218740000063
where ν_k indicates that the cost function J_k(w) is ν_k-strongly convex.
S5: after the m inner-loop iterations, step S2 is executed again and the inner-loop variable is updated;
S6: steps S2 to S5 are repeated until the algorithm converges.
Examples
The experimental setup is as follows: d_{k,i} is generated from the linear model d_{k,i} = (w°)^T x_{k,i} + z_{k,i}, where z_{k,i} is white Gaussian noise with node-dependent variance. For convenience, the embodiment of the invention assumes that the optimal quantities of all nodes are identical, w°_1 = … = w°_N = w°, with w° sampled from the standard normal distribution. In the diffusion strategy the fusion matrix is set to C = I_16; for the non-cooperative strategy A = I_16, where the matrix I denotes the identity matrix; for the cooperative strategy A is set to the standard uniform combination matrix, whose elements are a_{lk} = 1/n_k for every neighbor l of node k (and 0 otherwise), where n_k denotes the number of neighbor nodes of node k. In the comparison experiments, the step size of the non-cooperative diffusion strategy is set to μ_1 = … = μ_N = 0.0012, and the step size of the variance-reduced diffusion strategy and of the least-mean-square diffusion strategy is set to μ_1 = … = μ_N = 0.0015.
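The setup above can be mirrored, up to the quantities whose exact values appear only as images in the original, by the sketch below; the random topology, the noise level and the reading L = 50 taken from the caption of FIG. 3 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N, L = 16, 50                                  # 16 nodes, unknown vector of length 50 (assumed)

# random symmetric topology with self-loops (an illustrative assumption)
adj = (rng.random((N, N)) < 0.3).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 1.0)

# uniform combination matrix: a_{lk} = 1/n_k for neighbors l of node k, 0 otherwise
n_k = adj.sum(axis=0)                          # number of neighbor nodes of each node k
A = adj / n_k                                  # column k divided by n_k, so columns sum to 1

C = np.eye(N)                                  # fusion matrix C = I_16
w_o = rng.standard_normal(L)                   # common optimum w°, drawn from a standard normal
mu = 0.0015                                    # step size of the variance-reduced diffusion strategy

def sample():
    # one streaming sample from the linear model d = w°^T x + z (noise level assumed)
    x = rng.standard_normal(L)
    z = 0.1 * rng.standard_normal()            # illustrative noise standard deviation
    return x, w_o @ x + z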
As shown in fig. 1 and fig. 2, a diffusion adaptive network learning method includes the following steps:
S1: the distributed network executes the stochastic gradient descent strategy; node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is repeated until i exceeds the window-function length P_k and then stops. Here x_{k,i} is a Gaussian random vector, P_k is set to 50 and 150 respectively in the comparison experiments, the step size is set to μ_1 = … = μ_N = 0.0015, and the initialization w_{k,0} is an arbitrary value. The global cost function of the stochastic gradient descent strategy network has the form
J^{glob}(w) = \sum_{k=1}^{N} J_k(w), where J_k(w) = E[J_k(w; x_{k,i})],
N denotes the total number of nodes in the network, and the symbol E(·) denotes expectation with respect to the distribution of the data x_{k,i}. The first-order and second-order convergence step sizes of node k in the stochastic gradient descent strategy satisfy
Figure BDA00028492187400000612
where δ_k indicates that the gradient vector of the cost function J_k satisfies a δ_k-Lipschitz continuity condition.
S2: from i > P_k onward, the distributed network implements the variance-reduced stochastic gradient descent strategy; first, w_{k,i-1} is assigned to the inner-loop variable, i.e. the inner-loop variable is initialized with w_{k,i-1}; the number of inner-loop iterations is set as
Figure BDA0002849218740000073
S3: the average gradient is estimated using the window function
Figure BDA0002849218740000074
where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k with respect to w_{k,i-1} when the input is the signal x_{k,i};
S4: in the next m iterations (i.e., the inner loop), the inner-loop variable and the average gradient obtained above are used to compute the variance-reduced stochastic gradient
Figure BDA0002849218740000078
and node k then obtains the estimate w_{k,i} of the diffusion strategy at time i. In the variance-reduced stochastic gradient descent strategy, the first-order convergence step size of node k satisfies
Figure BDA0002849218740000079
and the second-order convergence step size satisfies
Figure BDA00028492187400000710
where ν_k indicates that the cost function J_k(w) is ν_k-strongly convex.
S5: after the m inner-loop iterations, S2 is executed again and the inner-loop variable is updated;
S6: steps S2 to S5 are repeated until the algorithm converges.
As can be seen from FIG. 3, the variance-reduced diffusion adaptive network online learning method provided by the invention performs better than the standard least-mean-square diffusion strategy, which verifies the effectiveness of the variance-reduction technique. In addition, a larger window P_k accelerates the convergence of the algorithm compared with a smaller window, because a large P_k allows the average gradient to be estimated more accurately.
The invention also provides a diffusion adaptive network learning system, which comprises:
the stochastic gradient descent execution module, which makes each node of the distributed network run the least-mean-square strategy P_k times and collects the input data received during these P_k runs;
the time-averaged variance-reduced stochastic gradient descent execution module, which uses the data collected by the stochastic gradient descent execution module to average the stochastic gradients over a time window of length P_k, obtains an estimate of the average gradient, and uses this estimate to update the variance-reduced weight equation in the next m iterations;
and the timing control module, which controls execution of the stochastic gradient descent process during the first P_k time instants and, once the iteration index exceeds P_k, controls computation of the average gradient under the window function, thereby reducing the variance of the stochastic gradient.
The invention further provides a terminal device, which includes a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of the diffusion adaptive network learning method when executing the computer program.
The present invention also proposes a computer readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the steps of the above-mentioned diffusion adaptive network learning method according to the present invention.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to perform the method of the invention.
The terminal can be a desktop computer, a notebook computer, a palmtop computer, a cloud server or other computing equipment, or may simply comprise a processor and a memory. The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the diffusion adaptive network learning system by running or executing the computer program and/or modules stored in the memory and by invoking data stored in the memory.
The above-mentioned embodiments are only preferred embodiments of the present invention and are not intended to limit its technical solution. It should be understood by those skilled in the art that various simple modifications and substitutions can be made without departing from the spirit and principle of the present invention, and such modifications and substitutions also fall within the protection scope of the claims.

Claims (10)

1. A diffusion adaptive network learning method, characterized in that: the method comprises a stochastic gradient descent process and a time-averaged variance-reduced stochastic gradient descent process; in the stochastic gradient descent process, each node of the distributed network runs the least-mean-square strategy P_k times and collects the input data received during these P_k runs; in the time-averaged variance-reduced stochastic gradient descent process, the previously collected data are used to average the stochastic gradients over a time window of length P_k, obtaining an estimate of the average gradient, and this estimate is used to update the variance-reduced weight equation in the next m iterations; during the first P_k time instants the stochastic gradient descent process is executed, and once the iteration index exceeds P_k the average gradient under the window function is computed, thereby reducing the variance of the stochastic gradient and accelerating the convergence of the adaptive algorithm of the whole diffusion network.
2. The diffusion adaptive network learning method of claim 1, wherein:
a stochastic gradient descent strategy is executed by the distributed network: node k obtains the estimate w_{k,i} of the diffusion strategy at time i and collects the input signal stream data x_{k,i} at time i; this strategy is repeated until i exceeds the window-function length P_k, and then stops.
3. The diffusion adaptive network learning method of claim 2, wherein:
the network of the stochastic gradient descent strategy has a global cost function of the form
J^{glob}(w) = \sum_{k=1}^{N} J_k(w), where J_k(w) = E[J_k(w; x_{k,i})],
N denotes the total number of nodes in the network, and the symbol E(·) denotes expectation with respect to the distribution of the data x_{k,i}.
4. The diffusion adaptive network learning method of claim 2, wherein:
the first-order and second-order convergence step sizes of node k in the stochastic gradient descent strategy satisfy
Figure FDA0002849218730000013
where δ_k indicates that the gradient vector of the cost function J_k satisfies a δ_k-Lipschitz continuity condition.
5. The diffusion adaptive network learning method of claim 2, wherein:
from i > P_k onward, the distributed network implements the variance-reduced stochastic gradient descent strategy: first, w_{k,i-1} is assigned to the inner-loop variable, i.e. the inner-loop variable is initialized with w_{k,i-1}; the average gradient is then estimated using a window function
Figure FDA0002849218730000016
where ∇J_k(w_{k,i-1}; x_{k,i}) denotes the gradient of the cost function J_k with respect to w_{k,i-1} when the input is the signal x_{k,i}.
6. The diffusion adaptive network learning method of claim 5, wherein:
the number m of inner-loop iterations and the window-function length P_k satisfy a set relationship:
Figure FDA0002849218730000021
7. The diffusion adaptive network learning method of claim 5, wherein: in the next m iterations, the inner-loop variable obtained above and the average gradient are used to compute the variance-reduced stochastic gradient
Figure FDA0002849218730000024
and node k then obtains the estimate w_{k,i} of the diffusion strategy at time i; after the m inner-loop iterations have been executed, the inner-loop variable is updated again, until the algorithm converges;
in the variance-reduced stochastic gradient descent, the first-order convergence step size of node k satisfies
Figure FDA0002849218730000026
and the second-order convergence step size satisfies
Figure FDA0002849218730000027
where ν_k indicates that the cost function J_k(w) is ν_k-strongly convex.
8. A diffusion adaptive web learning system, comprising:
a stochastic gradient descent execution module, configured to make each node of the distributed network run the least-mean-square strategy P_k times and collect the input data received during these P_k runs;
a time-averaged variance-reduced stochastic gradient descent execution module, configured to use the data collected by the stochastic gradient descent execution module to average the stochastic gradients over a time window of length P_k, obtain an estimate of the average gradient, and use this estimate to update the variance-reduced weight equation in the next m iterations;
a timing control module, configured to control execution of the stochastic gradient descent process during the first P_k time instants and, once the iteration index exceeds P_k, to control computation of the average gradient under the window function, thereby reducing the variance of the stochastic gradient.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that: the processor, when executing the computer program, performs the steps of the diffusion adaptive network learning method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when being executed by a processor, carries out the steps of the diffusion adaptive network learning method according to any one of claims 1 to 7.
CN202011521741.6A 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium Active CN112688809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011521741.6A CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011521741.6A CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112688809A true CN112688809A (en) 2021-04-20
CN112688809B CN112688809B (en) 2023-10-03

Family

ID=75450048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011521741.6A Active CN112688809B (en) 2020-12-21 2020-12-21 Diffusion self-adaptive network learning method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112688809B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050727A1 (en) * 2018-01-12 2019-02-14 Timothy Anderson Neural network training
CN109492753A (en) * 2018-11-05 2019-03-19 中山大学 A kind of method of the stochastic gradient descent of decentralization
WO2019235551A1 (en) * 2018-06-05 2019-12-12 Okinawa Institute Of Science And Technology School Corporation Total stochastic gradient estimation method, device and computer program
CN110809772A (en) * 2017-10-27 2020-02-18 谷歌有限责任公司 System and method for improving optimization of machine learning models
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809772A (en) * 2017-10-27 2020-02-18 谷歌有限责任公司 System and method for improving optimization of machine learning models
US20190050727A1 (en) * 2018-01-12 2019-02-14 Timothy Anderson Neural network training
WO2019235551A1 (en) * 2018-06-05 2019-12-12 Okinawa Institute Of Science And Technology School Corporation Total stochastic gradient estimation method, device and computer program
CN109492753A (en) * 2018-11-05 2019-03-19 中山大学 A kind of method of the stochastic gradient descent of decentralization
CN110929878A (en) * 2019-10-30 2020-03-27 同济大学 Distributed random gradient descent method
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘张虎; 程春玲: "A variance-reduced stochastic variational inference algorithm for topic modeling of large-scale data", 计算机应用 (Journal of Computer Applications), no. 06
陈振宏; 兰艳艳; 郭嘉丰; 程学旗: "A distributed stochastic gradient descent algorithm based on difference merging", 计算机学报 (Chinese Journal of Computers), no. 10

Also Published As

Publication number Publication date
CN112688809B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
Wang et al. Stability of recurrent neural networks with time-varying delay via flexible terminal method
Xie et al. Event‐triggered consensus control for second‐order multi‐agent systems
CN105677489B (en) The dynamic of batch gap size sets system and method under discrete stream process model
CN110456681B (en) Output feedback controller of neutral stable saturation system based on event trigger
CN114851198B (en) Consistent tracking fixed time stable control method for multiple single-link mechanical arms
CN106502100B (en) The single controller for time delay design method of the distribution of multiple mobile robot
CN109032630B (en) Method for updating global parameters in parameter server
CN110703667A (en) Design method of network control system controller with time delay and data packet loss
CN113359463A (en) Cyclic switching scheme for leadership following consistency problem of multi-agent system
Ma et al. Finite‐time average consensus based approach for distributed convex optimization
Wang et al. Fixed‐time event‐triggered consensus tracking control for uncertain nonlinear multiagent systems with dead‐zone constraint
CN112269318B (en) Finite time remote safety state estimation method for time delay uncertain system
CN112688809A (en) Diffusion adaptive network learning method, system, terminal and storage medium
Ramadevi et al. Chaotic Sandpiper Optimization Based Virtual Machine Scheduling for Cyber-Physical Systems.
Tiberi et al. A simple self-triggered sampler for nonlinear systems
CN113515066B (en) Nonlinear multi-intelligent system dynamic event trigger control method
CN115268275A (en) Multi-agent system consistency tracking method and system based on state observer
Zhang et al. Adaptive event‐based finite‐time consensus of nonholonomic multi‐agent systems subject to unknown disturbances
Zhou et al. Piecewise adaptive sliding mode control for aeroengine networked control systems with resource constraints
CN113459083A (en) Self-adaptive fixed time control method and system for mechanical arm under event trigger
Takao et al. Approximate Fault-Tolerant Data Stream Aggregation for Edge Computing
Zeng et al. Distributed moving horizon estimation subject to communication delays and losses
Hou et al. Distributed consensus of multi-agent via aperiodic intermittent sampled-data control
Tang et al. Finite‐Time Simultaneous Stabilization for Stochastic Port‐Controlled Hamiltonian Systems over Delayed and Fading Channels
Zhuang et al. Event-triggered output feedback control for a class of discrete-time nonlinear systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant