CN117113274A - Heterogeneous network data-free fusion method and system based on federal distillation - Google Patents

Heterogeneous network data-free fusion method and system based on federal distillation

Info

Publication number
CN117113274A
Authority
CN
China
Prior art keywords
model
data
cgan
local
federal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311092271.XA
Other languages
Chinese (zh)
Inventor
段昕汝
陈桂茸
陈爱网
陈晨
姬伟峰
闫家栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Air Force Engineering University of PLA filed Critical Air Force Engineering University of PLA
Priority to CN202311092271.XA
Publication of CN117113274A
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a heterogeneous network data-free fusion method and system based on federal distillation, belonging to the technical field of information processing. A CGAN model is trained by a federal learning method and used to optimize local data, generating training sets with independent and identically distributed (IID) characteristics and thereby improving the training efficiency and accuracy of the model. A transfer set for distillation is generated with the CGAN network instead of migrating a small number of samples from the source data, meeting the confidentiality requirements of the data and migrating local model knowledge in a data-free manner. The local models are aggregated by the federal distillation method, which relaxes the requirement of traditional federal learning algorithms that the local and global models be isomorphic, allows an edge server holding data to design its local model specifically for the local data structure, and alleviates the information loss that standardized data preprocessing may cause.

Description

Heterogeneous network data-free fusion method and system based on federal distillation
Technical Field
The invention relates to the technical field of information processing, in particular to a heterogeneous network data-free fusion method and system based on federal distillation.
Background
With the wide application of modern information technologies represented by cloud computing, big data, the Internet of Things and unmanned systems in the military field, equipment systems are developing rapidly and combat equipment is continuously upgraded, further accelerating the transformation of war forms and combat techniques; future warfare shows an informatized, intelligent and collaborative development trend. Joint operations are a necessary requirement for winning modern and future wars. In a joint combat system, data serves as a strategic resource supporting efficient command decisions and plays an important underlying supporting role; the proper management and efficient utilization of data have become an important driving force for the overall transformation of combat capability and the evolution of combat styles. Realizing secure data interconnection among different combat systems, further strengthening the supporting role of data resources in command decisions, building an intelligent data fusion system for high-speed computation, storage and retrieval, and constructing intelligent models driven by big data are of great significance for accelerating the construction of intelligent complex network information systems and assisting the intelligent development of the military.
Because early systems were built in separate stages, independently and for specific strategic purposes, different systems are highly isolated from one another, and data islands have become a key bottleneck for military data construction. Meanwhile, owing to the special strategic status of military data, the application of big data in the military field is a double-edged sword: while the modernization of national defense and the armed forces is accelerated, the risk of information leakage hidden in the informatization process must be fully considered.
Against the background of rapidly developing intelligent technology, a safe and reliable method is needed to comprehensively integrate existing information resources, so as to make better use of existing data resources, break down data barriers and avoid the various risks caused by information leakage. Traditional machine learning techniques are centralized in nature and can present serious data security problems. Federal learning keeps data local: each edge server trains a local model on its local data, and only the local model is shared globally, avoiding the communication overhead and privacy risks caused by transmitting large amounts of data and providing a new approach to integrating data resources.
However, in the prior art, the aggregation algorithm requires all device models participating in federal training to be completely isomorphic, and the problem of data heterogeneity is not fully considered. In practice, heterogeneity is ubiquitous; in particular, the data collected by local devices varies greatly across parties, and two types of heterogeneity, namely distribution heterogeneity and structural heterogeneity, often coexist, which may cause slow convergence or global model drift. Secondly, traditional federal learning algorithms such as FedAvg generally use the same model structure and training strategy, yet different types of sample data often require very different feature-extraction structures, so unified model training is not conducive to improving accuracy. Meanwhile, traditional federal learning exchanges model gradients in every training round, which generates a large amount of communication overhead, and training on skewed local data distributions can introduce large deviations that affect the convergence rate and accuracy of the model.
Disclosure of Invention
In view of the above problems, the invention aims to provide a heterogeneous network data-free fusion method and system based on federal distillation, which fuse the local private data information of a plurality of scattered nodes through intelligent technology, design network models in a targeted manner, jointly train a neural network model without migrating source data, effectively improve the data-resource utilization rate of the federal learning system, and provide technical support for constructing intelligent information systems.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the heterogeneous network data-free fusion method based on federal distillation is characterized by comprising the following steps of,
s1: establishing a centralized federation architecture, deploying federation learning schemes on corresponding server nodes in a combined combat system, and coordinating all edge server nodes through the central server nodes;
s2: the central server initializes a global neural network model and a CGAN model;
s3: training a CGAN network model by using a federal learning method;
s4: the edge server uses the CGAN to optimize the data distribution so that it satisfies the independent and identically distributed (IID) characteristic, builds a local neural network model, trains it independently to obtain a local parameterized model, and sends the local parameterized model to the central server;
s5: the central server generates a transfer set by using the sample labels and the CGAN network, takes the local parameterized models trained by the plurality of edge servers as teacher models, and trains a global neural network model by generating the transfer set.
Further, the specific operation of step S1 includes the steps of,
s101: determining a training task and broadcasting by a central server;
s102: the edge server evaluates the training task based on the local data, determines whether to participate in the training task and sends a response to the central server;
s103: the central server selects edge servers participating in the training task from the response set and deploys the federal learning scheme.
Further, the specific operation of step S3 includes the steps of,
s301: the central server sends the CGAN parameterized model to an edge server participating in model training;
s302: the edge server trains the CGAN network based on the local private data;
s303: the edge server calculates the model gradient and transmits the gradient encryption to the central server;
s304: the central server receives the gradient parameters, calculates the global loss of the model update by a weighted average method, and updates the CGAN network;
s305: the central server sends the updated global CGAN model to all edge servers for the next iteration until the model converges;
s306: and the central server sends the trained CGAN parameterized model and the global model to the edge server.
Further, the specific operation of training the CGAN network in step S302 includes updating the edge server's CGAN parameterized model, wherein θ_G denotes the generator parameters, θ_D the discriminator parameters, and α the learning rate.
Further, in step S304 the global loss of the model update is calculated by a weighted average method, the per-node term being the generative model loss on edge server k.
Further, the specific operation of step S4 includes the steps of,
s401: correcting local data distribution by each edge server through the CGAN, and generating a training set so that the training set and other edge server node training sets meet IID characteristics;
s402: each edge server pertinently designs a neural network model according to the training set sample structure to finish initialization;
s403: each edge server independently trains a local parameterized model until the model converges;
s404: each edge server sends the local parameterized model to the central server.
Further, the specific operation of step S5 includes the steps of,
s501: the central server generates a transfer set by using the sample tag and the CGAN;
s502: the central server takes the local models trained by the plurality of edge servers as a teacher model, and trains a global neural network model by generating a transfer set;
wherein the knowledge distillation loss of the global model combines the student model's prediction loss on the real labels with a term that minimizes the difference between the logit outputs of the teacher model and the student model, and p_global denotes the soft decision vectors of the teacher and student models.
Furthermore, a data-free information fusion system for implementing the above heterogeneous network data-free fusion method based on federal distillation comprises a central server and a plurality of edge servers, wherein a federal learning scheme is deployed on the server nodes, and the edge server nodes are coordinated through the central server node.
The beneficial effects of the invention are as follows:
1. The invention provides a heterogeneous network data-free fusion method based on federal distillation, which improves on the traditional federal optimization algorithm to address the problems of system security, data heterogeneity and target diversity in traditional federal learning. Knowledge distillation is introduced to solve the data heterogeneity commonly present across units, and aggregated knowledge is refined into the server model instead of directly aggregating model parameters; this strengthens the security of the federal learning system, since the aggregation server remains unaware of the local model structures, reducing security risks and protecting proxy data. The CGAN network is used to integrate local information and data into distributed knowledge that regulates global model training, realizing knowledge distillation independent of any external data model. The effectiveness of the method is verified on 4 experimental data sets, and the results show that, compared with three other federal learning algorithms, the method achieves better results with fewer aggregation rounds, outperforms existing federal learning algorithms in convergence speed and model accuracy, and can effectively reduce communication between the edge servers and the central server.
2. The federal data-free fusion method decouples the training of the local model from the global so as to adjust the training algorithm and the network model structure according to the local target and allow a plurality of data sources to train the local model in a targeted manner; extracting knowledge by using a teacher-global model architecture instead of directly performing an entry weighted average on local model parameters, allowing to keep a certain unknowability on the local training algorithm and model structure; the CGAN is utilized to realize data enhancement, the model training efficiency and the convergence speed are improved, the communication overhead is reduced, a method without data fusion is used, the generated data is used for replacing local small-batch samples as a transfer set, and the safety of a local data source is ensured.
Drawings
FIG. 1 is a flow chart of a fusion method in the invention;
FIG. 2 is a schematic view of the overall framework of the present invention;
fig. 3 is a schematic diagram of the CGAN architecture according to the present invention;
FIG. 4 is a schematic diagram of CGAN data enhancement according to the present invention;
FIG. 5 is a schematic diagram of federal data-free distillation aggregation in accordance with the present invention;
FIG. 6 is a graph showing the comparison of the accuracy of each algorithm model on different data sets in the simulation experiment of the present invention;
FIG. 7 is a graph showing the results of model loss comparisons for various algorithms on different data sets in a simulation experiment according to the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Embodiment one:
as shown in fig. 1 and 2, the heterogeneous network data-free fusion method based on federal distillation comprises the following steps,
s1: establishing a centralized federation architecture, deploying federation learning schemes on corresponding server nodes in a combined combat system, and coordinating the server nodes holding data, namely the edge server nodes, through the central server node to jointly train a CGAN model;
The architecture of the conditional generative adversarial network (CGAN) is shown in fig. 3. It comprises a generator G and a discriminator D; during training the generator is conditioned on a class label y and generates a sample (x*, y), while the discriminator learns to distinguish the true sample (x, y) from the generated sample (x*|y, y).
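For reference, the standard conditional GAN objective underlying such a generator-discriminator pair conditions both networks on the label y; the exact loss used by the invention may differ:

$$\min_{G}\max_{D}\; V(D,G)=\mathbb{E}_{(x,y)\sim p_{\mathrm{data}}}\bigl[\log D(x\mid y)\bigr]+\mathbb{E}_{z\sim p_{z}}\bigl[\log\bigl(1-D(G(z\mid y)\mid y)\bigr)\bigr]$$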
The specific operation of step S1 includes the following steps,
s101: determining a training task and broadcasting by a central server;
s102: the edge server evaluates the training task based on the local data, determines whether to participate in the training task and sends a response to the central server;
s103: the central server selects an edge server participating in a training task from the response set and deploys a federal learning scheme;
further, step S2: the central server initializes a global neural network model and a CGAN model;
the neural network model in the invention adopts a Convolutional Neural Network (CNN): CNN is a deep neural network with convolution structure, which adopts a mode of local connection and weight sharing to reduce the number of weights, reduce the complexity of the model and simultaneously alleviate the problem of model overfitting. Convolutional neural networks are often used for image processing, images are directly used as network input, feature extraction is achieved through a convolutional layer and a pooling layer in an implicit layer, a loss function is minimized through a gradient descent method, weights in the network are reversely adjusted layer by layer, and the accuracy of the network is improved through multiple rounds of iterative training.
The hidden layers of a convolutional neural network generally comprise low hidden layers and high hidden layers. The low hidden layers consist of alternating convolutional and pooling layers; the high hidden layer is a fully connected layer which, together with the logistic regression classifier, corresponds to the hidden layer and logistic regression classifier of a traditional multi-layer perceptron, and the input of the first fully connected layer is the feature image obtained after feature extraction by the convolutional and pooling layers. The output layer is a classifier, typically logistic regression or Softmax regression, used to classify the input image. The convolutional layer convolves the input image with a trainable filter f(x) and superimposes a bias b_x, which can enhance certain characteristics of the original signal while reducing noise, yielding the convolutional layer output C_x. The pooling layer typically performs down-sampling, reducing the data space with pooling functions such as max pooling (Maxpooling), preserving feature invariance while preventing overfitting. The fully connected layer uses softmax full connection, taking the image features extracted by the low hidden layers as activation values for subsequent operations.
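As a concrete illustration of this structure, a minimal PyTorch sketch is given below; the channel counts and layer sizes are illustrative assumptions for a 28x28 grayscale input (e.g. MNIST), not the patent's actual local model.

```python
import torch
import torch.nn as nn

class LocalCNN(nn.Module):
    """Low hidden layers (conv + pooling) followed by fully connected high hidden layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # trainable filter plus bias
            nn.ReLU(),
            nn.MaxPool2d(2),                             # max pooling down-sampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128),                  # first fully connected layer
            nn.ReLU(),
            nn.Linear(128, num_classes),                 # logits; softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```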
Further, step S3: training a CGAN network model by using a federal learning method;
specifically, S301: the central server sends the CGAN parameterized model to an edge server participating in model training;
s302: the edge server trains the CGAN network based on the local private data;
wherein θ_G denotes the generator parameters and θ_D the discriminator parameters of the edge server's CGAN parameterized model, and α is the learning rate.
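A standard alternating stochastic-gradient form consistent with these symbols (minibatch size m is an added assumption) would be the following sketch, not necessarily the invention's exact update rule:

$$\theta_{D}\leftarrow\theta_{D}+\alpha\,\nabla_{\theta_{D}}\frac{1}{m}\sum_{i=1}^{m}\Bigl[\log D(x_{i}\mid y_{i})+\log\bigl(1-D(G(z_{i}\mid y_{i})\mid y_{i})\bigr)\Bigr]$$
$$\theta_{G}\leftarrow\theta_{G}-\alpha\,\nabla_{\theta_{G}}\frac{1}{m}\sum_{i=1}^{m}\log\bigl(1-D(G(z_{i}\mid y_{i})\mid y_{i})\bigr)$$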
S303: the edge server calculates the model gradient and transmits the gradient encryption to the central server;
s304: the central server receives the gradient parameters, calculates the global loss of the model update by a weighted average method, and updates the CGAN network;
The global loss is calculated as a weighted average over the edge servers, the per-node term being the generative model loss on edge server k.
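Assuming the usual weighting by the local sample count n_k over K participating edge servers (the exact weights are an assumption), the weighted average can be written as:

$$\mathcal{L}_{\mathrm{global}}=\sum_{k=1}^{K}\frac{n_{k}}{\sum_{j=1}^{K}n_{j}}\,\mathcal{L}_{k}$$

where L_k is the generative model loss on edge server k.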
S305: the central server sends the updated global CGAN model to all edge servers for the next iteration until the model converges;
s306: and the central server sends the trained CGAN parameterized model and the global model to the edge server.
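Putting steps S301 to S306 together, the following Python sketch outlines one plausible form of the federated CGAN training loop; gradient encryption (S303), the convergence test and the actual CGAN loss computation are stubbed out, the sample-count weighting is an assumption, and the function names are illustrative.

```python
import numpy as np
from typing import Callable, List

def federated_cgan_training(
    global_params: np.ndarray,                                  # initialized CGAN parameters (S2)
    local_grad_fns: List[Callable[[np.ndarray], np.ndarray]],   # one gradient oracle per edge server
    sample_counts: List[int],                                    # local data sizes, used as weights
    rounds: int = 200,
    lr: float = 0.01,
) -> np.ndarray:
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    for _ in range(rounds):
        # S301: the central server sends the current CGAN parameters to the edge servers.
        # S302-S303: each edge server computes a gradient on its private data (encryption omitted).
        grads = [grad_fn(global_params) for grad_fn in local_grad_fns]
        # S304: weighted average of the received gradients, then update the global CGAN.
        avg_grad = sum(w * g for w, g in zip(weights, grads))
        global_params = global_params - lr * avg_grad
        # S305: the updated global CGAN model is redistributed for the next round.
    # S306: the trained CGAN parameterized model is returned and sent to the edge servers.
    return global_params
```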
Further, step S4: the edge server uses the CGAN to optimize the data distribution so that it satisfies the independent and identically distributed (IID) characteristic, builds a local neural network model, trains it independently to obtain a local parameterized model, and sends the local parameterized model to the central server;
specifically, S401: correcting local data distribution by each edge server through the CGAN, and generating a training set so that the training set and other edge server node training sets meet IID characteristics;
s402: each edge server pertinently designs a neural network model according to the training set sample structure to finish initialization;
s403: each edge server independently trains a local parameterized model until the model converges;
s404: each edge server sends the local parameterized model to the central server.
Further, step S5: the central server generates a transfer set by using the sample labels and the CGAN network, takes the local parameterized models trained by the plurality of edge servers as teacher models, and trains a global neural network model by generating the transfer set.
Specifically, S501: the central server generates a transfer set by using the sample tag and the CGAN, so as to realize data enhancement;
the local device utilizes the generator aggregation knowledge to identify the target labels which are lack in the local data sample, and generates high-quality samples similar to the global data distribution based on the target labels so as to realize sample enhancement until IID characteristics are met, so that the global data distribution P joint With local data distribution P k Satisfy P k =P joint As shown in fig. 4.
S502: the central server takes the local models trained by a plurality of edge servers as teacher models, and trains a global neural network model by generating a transfer set, as shown in figure 5;
wherein the knowledge distillation loss of the global model combines the student model's prediction loss on the real labels with a term that minimizes the difference between the logit outputs of the teacher model and the student model; p_global denotes the soft decision vectors of the teacher and student models, and the difference between the two soft decision vectors is measured using the Kullback-Leibler divergence.
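A standard distillation objective consistent with this description is sketched below; the balancing weight λ and the exact combination of terms are assumptions:

$$\mathcal{L}_{\mathrm{KD}}=\mathcal{L}_{\mathrm{CE}}\bigl(\sigma(z_{s}),\,y\bigr)+\lambda\,T^{2}\,\mathrm{KL}\bigl(\sigma(z_{t}/T)\,\|\,\sigma(z_{s}/T)\bigr)$$

where z_t and z_s are the logits of the teacher models (the aggregated local models) and the student (global) model, σ is the softmax, and T is the distillation temperature (set to 10 in the experiments below).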
simulation experiment:
in the simulation experiment, the source data in the edge server is considered to have higher confidentiality, so that a federal learning method is adopted for CGAN network training to protect data safety. And secondly, an edge server holding the data has certain computing capacity, and can train a CGAN parameterized model by using local private data and carry out encryption computation on model parameters. In each round of training, the edge server trains based on the global model generated by the previous round of training and counts updates.
In the simulation experiment, different data sets are selected to verify the effectiveness of the method, including the MNIST, EMNIST and CelebA data sets widely used in machine learning research and evaluation, as well as a real FOQA data set. The MNIST data set comprises 70000 grayscale image samples, each 28x28 pixels, with sample labels in 10 classes; the EMNIST data set extends MNIST and comprises 6 categories of samples such as uppercase letters, lowercase letters, digits and symbols; the CelebA data set contains 202599 pictures of 10177 celebrities, all feature-tagged, with 40 different attribute labels attached to each image. The FOQA data set is a real data set open-sourced by a NASA research team, comprising 99837 samples from different airlines corresponding to 4 classes of labels, each sample being of dimension 160 x 20.
The experiment divides the MNIST data set into 20 independent and uniformly distributed subsets which are used as local private data of the edge server, wherein each subset comprises a local training set, a local verification set and a local test set. Furthermore, there are no identical samples between all subsets.
The experiment is developed on a centralized system architecture comprising 1 aggregation server and 20 edge training nodes; the training and test sets are divided into 20 groups and distributed to different simulated edge training nodes, so as to reproduce the mutual data isolation among different nodes in a practical application scenario. The experiment is set to 200 iterations, and all edge nodes use the same hyperparameters: the batch size is 32, the learning rate is 0.01, the optimizer is Adam, and the distillation temperature parameter is 10.
In order to demonstrate the effectiveness and usability of the method, the experiment combines a convolutional neural network to compare the performance of the proposed FedND algorithm with that of the FedAvg, FedProx and FedDistill algorithms, and the experimental results are evaluated with the following indexes:
(1) Accuracy Accuracy: the correctly classified samples account for the proportion of all test samples;
(2) Model Loss: measures the degree of difference between the global model's predictions and the real labels; the trend of the global model's loss function is recorded.
In order to verify the effectiveness of the algorithm on heterogeneous samples, comparison experiments are carried out on several data sets. The MNIST and EMNIST data sets are divided into 20 groups using a Dirichlet function, and the number of samples generated for each group's distribution is controlled by adjusting the Dirichlet distribution parameter, so that the sample distributions of the data subsets differ, satisfying the data-heterogeneity setting, and the subsets are used to train the local models; for CelebA, pictures belonging to different celebrities are randomly gathered into disjoint groups to increase the heterogeneity of the data; for the FOQA data set, the sample data is randomly partitioned to represent the heterogeneity of the data subsets.
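A minimal sketch of such a Dirichlet-based split with numpy is given below; the function and parameter names are illustrative, and alpha controls how skewed each node's label distribution is (smaller alpha, stronger heterogeneity).

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_nodes: int = 20,
                        alpha: float = 0.5, seed: int = 0):
    """Split sample indices across nodes, class by class, with Dir(alpha) proportions."""
    rng = np.random.default_rng(seed)
    node_indices = [[] for _ in range(num_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        # Cut points for distributing this class's samples among the nodes.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cuts)):
            node_indices[node].extend(part.tolist())
    return node_indices
```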
The simulation results of the different algorithms on the 4 data sets are compared in fig. 6, which shows the model training convergence processes under the different algorithms; the abscissa is the training round number and the ordinate is model accuracy. The experimental results show that within 200 iteration rounds the algorithm learns faster and brings the global model to convergence, and the method is slightly superior to the other 3 comparison groups in model accuracy. The results in fig. 6 show that under the same conditions the method achieves a better training effect with fewer communication rounds and reduces the number of interactions required for model convergence, thereby reducing communication overhead and information exposure in practical applications.
In addition, fig. 7 shows the loss comparison of the model training processes under the different algorithms; the abscissa is the training round number and the ordinate reflects the change of the global model's loss function. It can be seen that as the number of iteration rounds increases, FedND achieves lower losses on the experimental data sets.
To further investigate the effect of data heterogeneity on model accuracy under this approach, further validation was carried out with the MNIST and EMNIST data sets. The degree of heterogeneity of the data subsets is quantified with the Dirichlet distribution function in the numpy library, and the shape of the distribution is controlled by setting the hyperparameter α, in order to verify the relationship between the degree of distribution heterogeneity and model accuracy under the method. The experimental results are shown in Table 1 below: a larger value of α indicates a more concentrated probability distribution over the samples and weaker distribution heterogeneity among the data subsets, while a smaller value of α gives a more dispersed probability distribution and stronger distribution heterogeneity among the data subsets. First, the experimental results show that the proposed method achieves slightly higher model accuracy than the other control groups under the same hyperparameter conditions. Secondly, the results reflect the effect of data heterogeneity on model performance: FedND is robust to different degrees of heterogeneity, and when the data distribution is highly heterogeneous, the gain of the method on the global model is even more remarkable.
TABLE 1 model accuracy on MNIST and EMNIST datasets
Embodiment two:
a second embodiment provides an information and data-free fusion system for implementing the federal distillation-based heterogeneous network data-free fusion method described in the first embodiment, which is characterized by comprising a central server and a plurality of edge servers, wherein a federal learning scheme is deployed on the server nodes, and the edge server nodes are coordinated through the central server nodes.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. The heterogeneous network data-free fusion method based on federal distillation is characterized by comprising the following steps of,
s1: establishing a centralized federation architecture, deploying federation learning schemes on corresponding server nodes in a combined combat system, and coordinating all edge server nodes through the central server nodes;
s2: the central server initializes a global neural network model and a CGAN model;
s3: training a CGAN network model by using a federal learning method;
s4: the edge server uses the CGAN to optimize the data distribution so that it satisfies the independent and identically distributed (IID) characteristic, builds a local neural network model, trains it independently to obtain a local parameterized model, and sends the local parameterized model to the central server;
s5: the central server generates a transfer set by using the sample labels and the CGAN network, takes the local parameterized models trained by the plurality of edge servers as teacher models, and trains a global neural network model by generating the transfer set.
2. The heterogeneous network data-free fusion method based on federal distillation according to claim 1, wherein the specific operation of step S1 comprises the steps of,
s101: determining a training task and broadcasting by a central server;
s102: the edge server evaluates the training task based on the local data, determines whether to participate in the training task and sends a response to the central server;
s103: the central server selects edge servers participating in the training task from the response set and deploys the federal learning scheme.
3. The heterogeneous network data-free fusion method based on federal distillation according to claim 2, wherein the specific operation of step S3 comprises the steps of,
s301: the central server sends the CGAN parameterized model to an edge server participating in model training;
s302: the edge server trains the CGAN network based on the local private data;
s303: the edge server calculates the model gradient and transmits the gradient encryption to the central server;
s304: the central server receives the gradient parameters, calculates the global loss of the model update by a weighted average method, and updates the CGAN network;
s305: the central server sends the updated global CGAN model to all edge servers for the next iteration until the model converges;
s306: and the central server sends the trained CGAN parameterized model and the global model to the edge server.
4. The heterogeneous network data-free fusion method based on federal distillation according to claim 3, wherein the specific operation of training the CGAN network in step S302 comprises updating the edge server's CGAN parameterized model, wherein θ_G denotes the generator parameters, θ_D the discriminator parameters, and α the learning rate.
5. The heterogeneous network data-free fusion method based on federal distillation according to claim 4, wherein in step S304 the global loss of the model update is calculated by a weighted average method, the per-node term being the generative model loss on edge server k.
6. The heterogeneous network data-free fusion method based on federal distillation according to claim 3, wherein the specific operation of step S4 comprises the steps of,
s401: correcting local data distribution by each edge server through the CGAN, and generating a training set so that the training set and other edge server node training sets meet IID characteristics;
s402: each edge server pertinently designs a neural network model according to the training set sample structure to finish initialization;
s403: each edge server independently trains a local parameterized model until the model converges;
s404: each edge server sends the local parameterized model to the central server.
7. The heterogeneous network data-free fusion method based on federal distillation according to claim 6, wherein the specific operation of step S5 comprises the steps of,
s501: the central server generates a transfer set by using the sample tag and the CGAN;
s502: the central server takes the local models trained by the plurality of edge servers as a teacher model, and trains a global neural network model by generating a transfer set;
wherein the knowledge distillation loss of the global model combines the student model's prediction loss on the real labels with a term that minimizes the difference between the logit outputs of the teacher model and the student model, and p_global denotes the soft decision vectors of the teacher and student models.
8. A data-free information fusion system implementing the federal distillation-based heterogeneous network data-free fusion method of any of claims 1-7, comprising a central server and a plurality of edge servers, server nodes having federal learning schemes deployed thereon, and coordinating the edge server nodes through the central server node.
CN202311092271.XA 2023-08-29 2023-08-29 Heterogeneous network data-free fusion method and system based on federal distillation Pending CN117113274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311092271.XA CN117113274A (en) 2023-08-29 2023-08-29 Heterogeneous network data-free fusion method and system based on federal distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311092271.XA CN117113274A (en) 2023-08-29 2023-08-29 Heterogeneous network data-free fusion method and system based on federal distillation

Publications (1)

Publication Number Publication Date
CN117113274A true CN117113274A (en) 2023-11-24

Family

ID=88794409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311092271.XA Pending CN117113274A (en) 2023-08-29 2023-08-29 Heterogeneous network data-free fusion method and system based on federal distillation

Country Status (1)

Country Link
CN (1) CN117113274A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117668622A (en) * 2024-02-01 2024-03-08 山东能源数智云科技有限公司 Training method of equipment fault diagnosis model, fault diagnosis method and device
CN117668622B (en) * 2024-02-01 2024-05-10 山东能源数智云科技有限公司 Training method of equipment fault diagnosis model, fault diagnosis method and device

Similar Documents

Publication Publication Date Title
US9390373B2 (en) Neural network and method of neural network training
CN106953862B (en) Sensing method and device for network security situation and sensing model training method and device
US20180137395A1 (en) Recognition and training method and apparatus
CN113408743A (en) Federal model generation method and device, electronic equipment and storage medium
CN109983480A (en) Use cluster loss training neural network
CN117113274A (en) Heterogeneous network data-free fusion method and system based on federal distillation
KR20180096474A (en) Incremental Training Based Knowledge Transfer Method for Training Large Deep Neural Networks and Apparatus Therefor
Suzuki et al. Adversarial example generation using evolutionary multi-objective optimization
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN108009635A (en) A kind of depth convolutional calculation model for supporting incremental update
CN111340179A (en) Convolutional neural network topology method
CN116841317A (en) Unmanned aerial vehicle cluster collaborative countermeasure method based on graph attention reinforcement learning
Lin et al. Takagi-sugeno fuzzy model identification using coevolution particle swarm optimization with multi-strategy
CN113469261B (en) Source identification method and system based on infection map convolution network
CN112926739A (en) Network countermeasure effectiveness evaluation method based on neural network model
JP2021093144A (en) Sensor-specific image recognition device and method
Kumar et al. Adversarial Attacks on Graph Neural Network: Techniques and Countermeasures
Wei et al. GNN-ensemble: Towards random decision graph neural networks
CN115600083A (en) Intelligent migration countermeasure method and system for electromagnetic signal identification under incomplete information
CN115470520A (en) Differential privacy and denoising data protection method under vertical federal framework
Barman et al. To predict possible profit/loss of a movie to be launched using MLP with back-propagation learning
Aristodemou et al. Bayesian optimisation-driven adversarial poisoning attacks against distributed learning
Akcin et al. Data games: A game-theoretic approach to swarm robotic data collection
Sarathkumar et al. Enhancing intrusion detection using coati optimization algorithm with deep learning on vehicular Adhoc networks
Danilenka et al. Using adversarial images to improve outcomes of federated learning for non-iid data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination