CN115271033B - Medical image processing model construction and processing method based on federal knowledge distillation - Google Patents

Medical image processing model construction and processing method based on federal knowledge distillation

Info

Publication number
CN115271033B
CN115271033B (application number CN202210783921.4A)
Authority
CN
China
Prior art keywords
pulse
tensor
training
distillation
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210783921.4A
Other languages
Chinese (zh)
Other versions
CN115271033A (en)
Inventor
刘贵松
刘哲通
解修蕊
黄鹂
蒋太翔
杨新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kashgar Electronic Information Industry Technology Research Institute
Southwestern University Of Finance And Economics
Original Assignee
Kashgar Electronic Information Industry Technology Research Institute
Southwestern University Of Finance And Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kashgar Electronic Information Industry Technology Research Institute and Southwestern University Of Finance And Economics
Priority to CN202210783921.4A
Publication of CN115271033A
Application granted
Publication of CN115271033B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03 Recognition of patterns in medical or anatomical images

Abstract

The invention relates to the field of medical image processing and provides a method for constructing a medical image processing model based on federated knowledge distillation. The method comprises: training each child-node network on its private data set; propagating the trained child-node network forward on a public data set to obtain a first pulse tensor, which is uploaded to a central node; after receiving the tensors, the central node performs distillation training on the public data set to obtain a distillation product for each child node; the distillation products of all child nodes are aggregated into global parameters, the central-node network is updated with the global parameters and propagated forward on the public data set, and the resulting second pulse tensor is distributed to all child nodes; each child node receives the second pulse tensor, performs distillation training on the public data set, synchronously updates its own network parameters, and the training loop repeats until a preset number of rounds or a preset metric is reached. The invention also provides a processing method that applies the constructed model to medical images to be processed.

Description

Medical image processing model construction and processing method based on federal knowledge distillation
Technical Field
The invention belongs to the field of medical image processing, and particularly relates to a method for constructing a medical image processing model based on federated knowledge distillation and a corresponding processing method.
Background
With the development of medical imaging and deep learning technology, medical image processing based on deep neural networks has become an important technique in medical research and clinical diagnosis. In recent years, federated learning (FL) has drawn the attention of researchers in medical image processing: it enables aggregate learning over scattered medical image data while preserving privacy, making full use of diverse patient data. In medical research, researchers often need detailed information about internal tissues and organs, obtained through medical imaging, in order to make the most accurate possible treatment decisions when performing quantitative analysis, real-time monitoring, or treatment planning of those tissues and organs. Biomedical images therefore play an extremely important role in treatment, and medical images of patients accumulate steadily. However, because institutions such as hospitals are decentralized and medical images are sensitive, these data are extremely scattered and subject to strict privacy regulations, so it is very difficult to gather them at a central site for neural network training. Training in a scattered manner, on the other hand, faces problems such as insufficient data and insufficient labels. Studying how to train neural networks for medical image processing on scattered data while guaranteeing privacy therefore has great value and significance.
Federated learning (FL) is a distributed training paradigm that allows devices to participate in the joint training of a deep neural network model without exchanging local private data with other devices or a central node. In traditional distributed learning, the data used for network training typically must be transmitted to a central node or a cloud data node; such a procedure risks data leakage and is hard to apply in fields with strict privacy requirements such as medical image processing. Federated learning instead relies on communication that never involves the raw data, together with encrypted transmission, providing a solution to the privacy-disclosure problem.
The spiking neural network (SNN) is a new generation of neural network model, distinct from traditional artificial neural networks. By simulating the membrane-potential dynamics and nerve pulses of biological neurons, and replacing the real-valued outputs of traditional artificial neural networks with discrete binary sequences, it reduces training power consumption. The leaky integrate-and-fire (LIF) model is a classical spiking neuron model, widely used in SNN research. Researchers have proposed an explicit, discrete formulation of the LIF model that can be implemented on computing devices, namely:

u_t = τ·u_{t−1}·(1 − s_{t−1}) + x_t,  s_t = H(u_t − V_th),

where u_t is the membrane potential at time step t, τ is the leak constant, x_t is the synaptic input, H is the unit step function describing pulse emission, s_t is the output pulse, and V_th is the discharge threshold.
Approximating the gradient function. In a spiking neural network, the derivative of the unit step function that describes pulse emission tends to infinity at zero, so gradient descent cannot be applied directly as in a conventional neural network, and an approximating function must be found instead. Some SNN training schemes employ a rectangular function for this role. The rectangular function is defined as:

h(u) = (1/a) · sign(|u − V_th| < a/2),

where a is the rectangular shape parameter, sign denotes the truth (indicator) function, and V_th is the discharge threshold.
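By way of illustration, the discrete LIF update and the rectangular surrogate gradient can be written as the following minimal NumPy sketch; the leak constant TAU and shape parameter A take assumed illustrative values, while V_TH = 0.1 matches the embodiment described later.

    import numpy as np

    V_TH = 0.1   # discharge threshold V_th (value used in the embodiment below)
    TAU = 0.5    # leak constant tau; illustrative assumption, not fixed by the text
    A = 1.0      # rectangular shape parameter a; illustrative assumption

    def lif_step(u_prev, s_prev, x_t):
        # Leak the potential, reset positions that fired at the previous step,
        # integrate the new input, then pulse wherever the threshold is crossed.
        u = TAU * u_prev * (1.0 - s_prev) + x_t
        s = (u >= V_TH).astype(u.dtype)   # unit step function H(u - V_th)
        return u, s

    def rect_surrogate(u):
        # Rectangular surrogate gradient: (1/a) * 1(|u - V_th| < a/2); used in
        # back-propagation in place of the step function's infinite derivative.
        return (np.abs(u - V_TH) < A / 2).astype(u.dtype) / A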
The federated learning framework for spiking neural networks (Federated Learning on Spiking Neural Network, FLSNN) is a distributed training mode for SNNs that allows devices to participate in joint training of an SNN model without exchanging local private data with other devices or a central node. How to train a deep neural network at low power consumption on scattered private data is the problem that federated learning of spiking neural networks aims to solve.
Knowledge distillation (KD) is a network-knowledge extraction scheme that transfers the knowledge of a trained larger-scale neural network to a smaller-scale network, so that the small network performs very close to the large one. In knowledge distillation, the network that receives knowledge is called the student network, and the network that transfers knowledge is called the teacher network. Knowledge distillation reflects the fact that a network's knowledge resides not only in its parameters but can also be expressed through its outputs. According to the knowledge source used, distillation falls into three categories:
response-based distillation
Feature-based distillation
Relationship-based distillation
For traditional artificial neural networks, researchers have proposed a federated distillation (FD) framework based on output communication, optimizing the communication cost of federated learning.
Distillation loss function. In knowledge distillation, the teacher network and the student network exchange information through specific forms of knowledge to assist training. During training, the matching of knowledge is expressed through the loss function, so selecting and defining an appropriate loss function plays a decisive role in the effect of distillation. In general, the distillation loss is defined as:

L = L_hard + λ·L_soft,

where L is the final loss of the network; L_hard is the hard-label loss, determined by the student network's output and the original labels of the training data set; L_soft is the soft-label loss, determined by the student network's form of knowledge and the teacher network's knowledge, typically computed with a cross-entropy loss; and λ is a weight parameter coordinating how strongly the teacher network participates in training. When designing a knowledge distillation scheme, defining the soft-label loss function is often the core problem.
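As an illustration of this generic loss, the following minimal sketch combines a hard cross-entropy term against the data-set label with a soft cross-entropy term against the teacher's output distribution; the softmax normalization is an implementation convenience assumed here, and the SNN-specific losses of the invention appear later.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_out, teacher_out, onehot_label, lam=1.0):
        # L = L_hard + lambda * L_soft: hard loss against the data-set label,
        # soft (cross-entropy) loss against the teacher's output distribution.
        p_student = softmax(student_out)
        p_teacher = softmax(teacher_out)
        l_hard = -(onehot_label * np.log(p_student + 1e-12)).sum(axis=-1).mean()
        l_soft = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean()
        return l_hard + lam * l_soft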
Computed tomography (CT) is a cross-sectional scanning technique that uses precisely collimated X-ray beams, gamma rays, ultrasonic waves, and the like, together with highly sensitive detectors, to scan sections of the human body one after another. It features fast scan times and clear images, and can be used to examine a variety of diseases. In CT, an X-ray beam scans a body slice of given thickness; the detector receives the X-rays transmitted through the slice, converts them to visible light, then to an electrical signal by photoelectric conversion, and finally to digital form by an analog-to-digital converter for processing in a computer.
Disclosure of Invention
To address the strict privacy requirements and the reduced model reliability caused by distillation loss in federated distillation, and to balance cost and effectiveness in computed tomography medical image recognition, the invention provides a medical image processing model construction method based on federated knowledge distillation and a corresponding image processing method.
The invention solves these technical problems with the following technical scheme: a medical image processing model construction method based on federated knowledge distillation, comprising the following steps:

Step 1: collect training data and construct the training sets, including: a public data set required for distillation, obtained by preprocessing and organizing open medical image data; and, for each medical institution participating in training, a private data set obtained by coordinated preprocessing of that institution's private CT image data according to the public data set. The private data sets correspond one-to-one with the participating medical institutions and are mutually independent;

Step 2: construct the spiking neural networks of the child nodes and the central node, the child nodes corresponding one-to-one with the private data sets;

Step 3: train each child node's spiking neural network on its private data set; using each trained child-node network, obtain the child node's first pulse tensor by forward propagation on the public data set and upload it to the central node;

Step 4: according to the received first pulse tensor of each child node, the central node performs distillation training on the public data set to obtain a distillation product for each child node;

Step 5: aggregate all distillation products into global parameters, update the central node's spiking neural network with the global parameters, obtain a second pulse tensor by forward propagation of the updated central-node network on the public data set, and distribute it to all child nodes;

Step 6: each child node receives the second pulse tensor, performs distillation training on the public data set, and updates its spiking neural network parameters;

Step 7: judge whether the preset number of training rounds has been reached or the model has reached a preset metric; if so, stop training, and the trained spiking neural network of the central node is the medical image processing model based on federated knowledge distillation; otherwise, return to step 3.
Further, in step 3, training each child node's spiking neural network on its private data set comprises: the child-node network computes the hard-label loss function L_hard and its gradient in the forward pass on the corresponding private data set, then trains for a preset number of rounds with the back-propagation algorithm and updates its parameters;

wherein the hard-label loss function is:

L_hard = −Σ_i ŷ_i · log(v_i),

where L_hard is the hard-label loss, v is the frequency vector computed from the output pulses, v_i = (1/T)·Σ_{t=1}^{T} s_it, and ŷ is the true label vector, computed as:

ŷ = onehot(tar),

where tar is the category of the true label and onehot denotes one-hot encoding:

ŷ_i = 1 if i = tar, and ŷ_i = 0 otherwise,

where i is the element index of the label vector.
Further, in step 3, the first pulse tensor is binary-compressed before being uploaded to the central node; in step 4, after receiving a child node's compressed tensor, the central node decompresses it to recover the first pulse tensor.

Specifically, binary compression of the first pulse tensor comprises the following steps:

Step 31: zero-initialize a tensor sc for storing the compression result;

Step 32: for each element value sc of sc, iterate over the time windows in order and compute:

sc′ = 2·sc + s_t,

where s_t is the pulse value of the t-th time window at the corresponding position of the pulse tensor to be compressed, sc is the current element value, iterated over t, and sc′ is the element value of sc after the update;

Step 33: when all computations are complete, the binary compression is finished;

in step 4, decompressing the compressed tensor to obtain the first pulse tensor comprises the following steps:

Step 41: zero-initialize a decompression tensor for storing the decompression result;

Step 42: perform the restoration operation on each element value of the compressed tensor to obtain an ordered pulse sequence, and store the result at the corresponding position of the decompression tensor in inverted order;

Step 43: when all computations are complete, the decompression tensor is the first pulse tensor.
Further, in step 4, the central node performing distillation training on the public data set according to the received first pulse tensor of each child node, to obtain the distillation product for each child node, comprises the following steps:

define the distillation loss function, train the spiking neural network with the back-propagation algorithm for a preset number of rounds, then stop; the distillation loss function is:

L_soft = L_T + λ·L_F,

where L_soft is the distillation loss, L_T is the mean-square-error-like loss over pulses, L_F is the relaxed class cross-entropy loss, and λ is the relaxation variable, a preset parameter;

the mean-square-error-like loss over pulses is:

L_T = (1/(C·T)) · Σ_{c=1}^{C} Σ_{t=1}^{T} (s_ct − ŝ_ct)²,

and the relaxed class cross-entropy loss is:

L_F = −Σ_{c=1}^{C} p̂_c · log(p_c),

where C is the number of classes of the training data and T is the size of the SNN time window; s_ct and ŝ_ct are the element values of a batch's predicted pulse matrix and target pulse matrix, respectively; p_c and p̂_c are the predicted and target frequency vectors, computed from the pulse matrices as:

p_c = (1/T)·Σ_{t=1}^{T} s_ct,  p̂_c = (1/T)·Σ_{t=1}^{T} ŝ_ct.
Further, in step 5, aggregating all distillation products into the global parameters comprises:

the central node reserves an aggregation-network buffer and aggregates as follows:

Step 51: generate a copy of the current central-node spiking neural network and place it in the central node's aggregation buffer; select any first pulse tensor awaiting aggregation, perform distillation training of the copy on the public data set, and update it to obtain the first network parameters;

Step 52: judge whether all child nodes have been aggregated; if so, take the first network parameters as the global parameters; otherwise, go to step 53;

Step 53: update the central-node network copy with the first network parameters and take it as the first copy; duplicate the first copy in the aggregation buffer to create a new copy;

Step 54: select any first pulse tensor awaiting aggregation and perform distillation training of the new copy on the public data set, updating it to obtain the second copy and the second network parameters;

Step 55: randomly select labeled data from part of the public data set to generate a temporary test data set;

Step 56: test the first copy and the second copy on the temporary test data set to obtain their test accuracies, generate aggregation weights from the two accuracies, and use the aggregation weights to compute updated first network parameters;

Step 57: return to step 52.
Further, in step 56, generating the aggregation weights from the two test accuracies and using them to update the first network parameters comprises:

Step 561: generate the aggregation weights (α, α′) from the test accuracies a and a′ with a normalized exponential (softmax) function, where δ is a preset retention factor and τ is a preset difference factor;

Step 562: compute the updated first network parameters with the aggregation weights:

w̃ = α·w + α′·w′,

where w̃ is the new first network parameter, w is the first network parameter corresponding to the first copy, and w′ is the second network parameter corresponding to the second copy.
The invention also provides a medical image processing method based on federated knowledge distillation: a medical image processing model is constructed according to the construction method above, and the model is used to process the medical images to be processed.
The invention designs a new federated aggregation scheme in which the central node of the spiking-neural-network federated distillation integrates the information uploaded by the child nodes so as to aggregate features and outputs appropriately, addressing the reduced model reliability caused by distillation loss. With the designed distillation loss function, the central node and the child nodes can train their own models using the other party's spiking-network outputs, extracting the information contained in those outputs in a targeted way. When training the spiking neural networks, the invention updates the network parameters with the back-propagation algorithm, improving the classification accuracy and training speed of the SNN model. Meanwhile, when exchanging outputs, the invention losslessly compresses the pulse output tensors, reducing federated communication overhead and allowing the trade-off between model accuracy and communication cost to be tuned.
Drawings
Fig. 1 is a flow chart of model construction in embodiment 1 of the present invention.
Fig. 2 is a flowchart of training a impulse neural network using a back propagation algorithm in embodiment 1 of the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps. For a better understanding of the present invention, reference is made to the following description of the invention, taken in conjunction with the accompanying drawings and the following examples.
Example 1
This example provides a medical image processing model construction method based on federated knowledge distillation, as shown in Fig. 1, comprising the following steps:
s101, training data are collected, and a training set is constructed.
In this example, the training data comprise two kinds of sets. The first is the public data set required for distillation, obtained by preprocessing and organizing open medical image data, including image flipping, cropping, translation, normalization, and so on, and fixing the image specification. The second: the private CT image data of each participating medical institution are coordinated and preprocessed according to the public data set to obtain that institution's private data set. Coordinated preprocessing means adjusting the institution's image data to match the format of the public data set, such as image specification and number of channels.

A private data set is set up for each medical institution participating in training; the private data sets are mutually independent and are never propagated to one another, guaranteeing the privacy requirement.
S102: construct the spiking neural networks of the child nodes and the central node, using the classical deep neural network model VGGNet; the central node reserves an aggregation-network buffer; each participating medical institution corresponds to one child node, in one-to-one correspondence with its private data set.
In the federated aggregation scheme of the invention, a child node is set up for each participating medical institution, so that each institution trains its own independent spiking neural network; the central node in the federated distillation only integrates the information uploaded by the child nodes and never participates in training on the private data sets held by the child nodes, satisfying the strict privacy requirement.
Constructing the spiking neural networks of the child nodes and the central node comprises initializing the parameters of each network layer and setting the training hyperparameters, as follows:

define the SNN structure: the total number of layers, the convolutional layers, and the fully connected layers; define the parameters of each convolutional layer, the activation function used, whether there is a pooling layer, and so on; define the parameters of the fully connected layers, whether a Dropout layer is used, and so on;

define the SNN hyperparameters, including the network discharge threshold V_th and the time window length T;

define the federated learning hyperparameters, including the global total round number epoch_g and the local training round number epoch_l;

define the training hyperparameters, including the learning rate, batch size, optimizer, and so on.

The scheme of the invention does not depend on a specific SNN model; any common spiking neural network model can be embedded here. For example, the network structure can adopt a VGG network from traditional artificial neural networks, with LIF nodes added to form a spiking neural network. The discharge threshold is set to V_th = 0.1 and the time window to T = 8; the global and local round numbers are set to epoch_g = 50 and epoch_l = 8; cross entropy is selected as the training loss function and Adam as the optimization algorithm, with hyperparameters: learning rate 0.001 and batch size 64.
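For reference, the hyperparameters enumerated in this example can be collected in a configuration sketch such as the following; the dictionary layout itself is illustrative, not prescribed by the patent.

    # Hyperparameters as enumerated above; the dictionary layout is illustrative.
    CONFIG = {
        "v_th": 0.1,          # discharge threshold V_th
        "time_window": 8,     # time window length T
        "epoch_g": 50,        # global federated rounds
        "epoch_l": 8,         # local training rounds per federated round
        "loss": "cross_entropy",
        "optimizer": "adam",
        "learning_rate": 0.001,
        "batch_size": 64,
    }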
S103: train each child node's SNN on its private data set; using each trained child-node SNN, obtain the child node's first pulse tensor by forward propagation on the public data set, binary-compress it, and upload it to the central node. The first pulse tensor is the output obtained by forward propagation of the trained child-node SNN over the public data set.
Training each child node's SNN on its private data set comprises: the child-node SNN computes the hard-label loss function L_hard and its gradient in the forward pass on the corresponding private data set, then trains for the preset number of rounds epoch_l with the back-propagation algorithm and updates its parameters.

The hard-label loss function is:

L_hard = −Σ_i ŷ_i · log(v_i),

where L_hard is the hard-label loss, v is the frequency vector computed from the output pulses, v_i = (1/T)·Σ_{t=1}^{T} s_it, and ŷ is the true label vector:

ŷ = onehot(tar),

where tar is the category of the true label and onehot denotes one-hot encoding, i.e.:

ŷ_i = 1 if i = tar, and ŷ_i = 0 otherwise,

where i is the element index of the label vector.
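A minimal sketch of this hard-label loss follows; the cross-entropy form mirrors the reconstruction above, and the softmax normalization of the frequency vector is an added assumption for numerical stability, not stated in the patent text.

    import numpy as np

    def onehot(tar, num_classes):
        # One-hot encoding of the true label category tar.
        y = np.zeros(num_classes)
        y[tar] = 1.0
        return y

    def hard_label_loss(spike_out, tar):
        # spike_out: binary pulse matrix of shape (num_classes, T).
        v = spike_out.mean(axis=1)            # firing-frequency vector v
        p = np.exp(v) / np.exp(v).sum()       # assumed softmax normalization
        y = onehot(tar, spike_out.shape[0])
        return -(y * np.log(p + 1e-12)).sum()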
Binary compression of the first pulse tensor comprises the following steps:

Step 1: zero-initialize a tensor sc for storing the compression result;

Step 2: iterate over sc in time-window order, computing sc′ = 2·sc + s_t, where s_t is the pulse value of the t-th time window at the corresponding position of the pulse tensor to be compressed, sc is the current element value in sc, each element value being computed by iterating over t, and sc′ is the updated element value;

Step 3: when all computations have been performed, sc is the required binary compression tensor.

This example thus gives a lossless compression algorithm for the pulse output tensor of a spiking neural network. The algorithm compresses the pulses without loss and reduces federated communication overhead.
The corresponding decompression method is as follows:

Step 1: zero-initialize a decompression tensor for storing the decompression result;

Step 2: perform the restoration operation on each element value of the compressed tensor to obtain an ordered pulse sequence, and store the result at the corresponding position of the decompression tensor in inverted order;

Step 3: when all computations are complete, the decompression tensor is the first pulse tensor.

The element restoration operation proceeds as follows:

copy the element value into a buffer variable sr and initialize the pulse sequence array [s_i], i = 1, 2, …, T;

initialize i = 1 and compute in a loop: s_i = sr mod 2, sr = sr div 2, i = i + 1, where mod is the remainder operation and div is integer division; loop until i = T + 1;

when the computation is complete, the sequence array [s_i] is the required pulse sequence.
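The compression and decompression steps above amount to reading each pulse train as a T-bit binary number. A minimal NumPy sketch of the round trip:

    import numpy as np

    def compress_pulses(s):
        # Losslessly pack a binary pulse tensor of shape (..., T) into integers:
        # sc <- 2 * sc + s_t over the time window.
        T = s.shape[-1]
        sc = np.zeros(s.shape[:-1], dtype=np.int64)
        for t in range(T):
            sc = 2 * sc + s[..., t].astype(np.int64)
        return sc

    def decompress_pulses(sc, T):
        # Invert the packing: repeated mod-2 / integer division recovers the
        # pulses in reverse order, so they are written back back-to-front.
        s = np.zeros(sc.shape + (T,), dtype=np.int64)
        sr = sc.copy()
        for i in range(T):
            s[..., T - 1 - i] = sr % 2   # store in inverted order
            sr = sr // 2
        return s

A quick check such as (decompress_pulses(compress_pulses(s), T) == s).all() confirms that the round trip is lossless.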
S104: the central node decompresses the received compressed tensors to obtain each child node's first pulse tensor, and performs distillation training on the public data set according to these tensors to obtain the distillation product for each child node.

That is, the central node receives the first pulse tensors output by all child nodes, computes the soft-label loss function L_soft and its gradient in order in the forward pass on the public data set, and trains the central-node SNN with the back-propagation algorithm for the preset number of rounds epoch_l, obtaining the distillation product for each child node.
Specifically, the method comprises the following steps.

First, define the distillation loss function, i.e. the soft-label loss function L_soft:

L_soft = L_T + λ·L_F,

where L_soft is the distillation loss, L_T is the mean-square-error-like loss over pulses, and L_F is the relaxed class cross-entropy loss. λ is the relaxation variable, a preset parameter, generally 0.1 to 10; the lower its value, the more strictly the output pulse tensor is fitted to the target pulse tensor, though overfitting may then occur.

The mean-square-error-like loss over pulses is:

L_T = (1/(C·T)) · Σ_{c=1}^{C} Σ_{t=1}^{T} (s_ct − ŝ_ct)²,

and the relaxed class cross-entropy loss is:

L_F = −Σ_{c=1}^{C} p̂_c · log(p_c),

where C is the number of classes of the training data and T is the size of the SNN time window; s_ct and ŝ_ct are the element values of a batch's predicted and target pulse matrices, respectively; p_c and p̂_c are the predicted and target frequency vectors, computed from the pulse matrices as:

p_c = (1/T)·Σ_{t=1}^{T} s_ct,  p̂_c = (1/T)·Σ_{t=1}^{T} ŝ_ct.
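A minimal sketch of this soft-label loss, under the reconstructed forms of L_T and L_F given above:

    import numpy as np

    def soft_label_loss(s_pred, s_tgt, lam=1.0):
        # s_pred, s_tgt: pulse matrices of shape (C, T), prediction and target.
        # L_T matches the two pulse matrices element-wise; L_F matches the
        # firing frequencies, relaxed by lambda.
        l_t = ((s_pred - s_tgt) ** 2).mean()       # pulse-wise MSE-like term
        p = s_pred.mean(axis=1)                    # predicted frequency vector
        p_hat = s_tgt.mean(axis=1)                 # target frequency vector
        l_f = -(p_hat * np.log(p + 1e-12)).sum()   # relaxed cross-entropy term
        return l_t + lam * l_f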
based on the defined loss function, the central node impulse neural network is iteratively trained by using a back propagation algorithm to achieve the local training round number epoch in the super-parameters l And stopping training when the training is stopped.
In steps S103 and S104 above, the back-propagation training procedure for the child-node and central-node SNNs is shown in Fig. 2 and comprises the following steps:

S201: set the current training round counter to 0;

S202: select a batch of training data from the data set as this round's training data;

S203: obtain the output prediction by forward propagation;

S204: compute the loss function value;

S205: optimize via the back-propagation algorithm and update the neural network parameters;

S206: add 1 to the training round counter;

S207: judge whether the preset number of local training rounds has been reached; if so, go to S208, otherwise go to S202;

S208: end training.
Based on the BP back-propagation algorithm, the network parameters can be updated during training, improving the classification accuracy and training speed of the SNN model.
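The loop S201-S208 can be sketched as follows; net, dataset, loss_fn and optimizer are placeholder interfaces assumed for illustration, standing in for the node's SNN, its data, the hard- or soft-label loss, and an Adam-style optimizer.

    def local_training(net, dataset, loss_fn, optimizer, epochs_local, batch_size=64):
        # Back-propagation training loop following S201-S208; the interfaces
        # below are assumptions for illustration, not a fixed API.
        for epoch in range(epochs_local):                  # S201/S206/S207: round counter
            for x, target in dataset.batches(batch_size):  # S202: pick a batch
                pred = net.forward(x)                      # S203: forward propagation
                loss = loss_fn(pred, target)               # S204: loss value
                grads = net.backward(loss)                 # S205: back-propagate
                optimizer.step(net, grads)                 # S205: update parameters
        return net                                         # S208: training ends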
Next, the distillation products of all child nodes are aggregated into the global parameters, as follows:

S301: divide the aggregation-network buffer reserved by the central node into a first aggregation buffer and a second aggregation buffer;

S302: generate a copy of the current central-node SNN and place it in the first aggregation buffer;

S303: select any first pulse tensor awaiting aggregation, substitute it into L_soft, perform distillation training of the copy on the public data set, and update the copy's network parameters as the first network parameters;

S304: judge whether all child nodes have been aggregated; if so, take the first network parameters as the global parameters; otherwise, go to S305;

S305: update the central-node SNN copy in the first aggregation buffer with the first network parameters and take it as the first copy; duplicate the first copy and place the duplicate in the second aggregation buffer;

S306: select any first pulse tensor awaiting aggregation and perform distillation training of the duplicate on the public data set, updating it to obtain the second copy and the second network parameters;

S307: randomly select labeled data from part of the public data set to generate a temporary test data set;

S308: test the first copy and the second copy on the temporary test data set to obtain their test accuracies a and a′, generate the aggregation weights from the two accuracies, and use them to compute updated first network parameters;

S309: return to step S304.
The aggregation weights (α, α′) are generated from the test accuracies a and a′ by a softmax (normalized exponential) function used for weight normalization, where δ is a preset retention factor, usually greater than 0.5, and τ is a preset difference factor.

The new network parameters are then computed with the aggregation weights as:

w̃ = α·w + α′·w′,

where w̃ is the new network parameter, w is the first network parameter corresponding to the first copy, and w′ is the second network parameter corresponding to the second copy.
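A sketch of the pairwise aggregation follows. The weighted update w̃ = α·w + α′·w′ follows the description above, while the exact softmax arguments, with a retention bonus δ on the kept copy and τ as a temperature, are an assumption, since the patent gives the weight formula only as an image.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def aggregate_pair(w, w_prime, acc, acc_prime, delta=0.6, tau=1.0):
        # ASSUMED weight form: retention bonus delta on the kept copy, tau as a
        # temperature; only the weighted sum below is stated in the text.
        alpha, alpha_p = softmax(np.array([(acc + delta) / tau, acc_prime / tau]))
        # w_new = alpha * w + alpha' * w', applied parameter tensor by tensor.
        return {k: alpha * w[k] + alpha_p * w_prime[k] for k in w}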
Finally, the central-node SNN is updated with the global parameters and propagated forward on the public data set to obtain the second pulse tensor, which is distributed to all child nodes; each child node receives the second pulse tensor, performs distillation training on the public data set, synchronously updates its SNN parameters, and re-enters step S103 as a trained child-node SNN for cyclic training. Training loops until the global total round number epoch_g is reached or the model attains a preset metric, then stops; the spiking neural network of the central node is then the medical image processing model based on federated knowledge distillation.
Example 2
This example proposes a medical image processing method based on federated knowledge distillation, comprising constructing the medical image processing model of Example 1 and applying it to medical image processing.
Following the above procedure, after the algorithm finishes, the accuracy of medical image classification and recognition improves by about 5% over each institution training independently, while the federated-learning communication cost falls by 90%, a substantial optimization.

Claims (6)

1. A medical image processing model construction method based on federated knowledge distillation, characterized by comprising the following steps:

step 1: collect training data and construct the training sets, including: a public data set required for distillation, obtained by preprocessing and organizing open medical image data; and, for each medical institution participating in training, a private data set obtained by coordinated preprocessing of that institution's private CT image data according to the public data set;

the private data sets correspond one-to-one with the participating medical institutions and are mutually independent;

step 2: construct the spiking neural networks of the child nodes and the central node, the child nodes corresponding one-to-one with the private data sets;

step 3: train each child node's spiking neural network on its private data set; using each trained child-node network, obtain the child node's first pulse tensor by forward propagation on the public data set and upload it to the central node;

step 4: according to the received first pulse tensor of each child node, the central node performs distillation training on the public data set to obtain a distillation product for each child node;

step 5: aggregate all distillation products into global parameters, update the central node's spiking neural network with the global parameters, obtain a second pulse tensor by forward propagation of the updated central-node network on the public data set, and distribute it to all child nodes;

step 6: each child node receives the second pulse tensor, performs distillation training on the public data set, and updates its spiking neural network parameters;

step 7: judge whether the preset number of training rounds has been reached or the model has reached a preset metric; if so, stop training, the trained spiking neural network of the central node being the medical image processing model based on federated knowledge distillation; otherwise, return to step 3;
in step 5, aggregating all distillation products into the global parameters comprises:

the central node reserves an aggregation-network buffer and aggregates as follows:

step 51: generate a copy of the current central-node spiking neural network and place it in the central node's aggregation buffer; select any first pulse tensor awaiting aggregation, perform distillation training of the copy on the public data set, and update it to obtain the first network parameters;

step 52: judge whether all child nodes have been aggregated; if so, take the first network parameters as the global parameters; otherwise, go to step 53;

step 53: update the central-node network copy with the first network parameters and take it as the first copy; duplicate the first copy in the aggregation buffer to create a new copy;

step 54: select any first pulse tensor awaiting aggregation and perform distillation training of the new copy on the public data set, updating it to obtain the second copy and the second network parameters;

step 55: randomly select labeled data from part of the public data set to generate a temporary test data set;

step 56: test the first copy and the second copy on the temporary test data set to obtain their test accuracies, generate aggregation weights from the two accuracies, and use the aggregation weights to compute updated first network parameters;

step 57: return to step 52;
in step 56, generating the aggregation weights from the two test accuracies and using them to update the first network parameters comprises:

step 561: generate the aggregation weights (α, α′) from the test accuracies a and a′ with a normalized exponential (softmax) function, where δ is a preset retention factor and τ is a preset difference factor;

step 562: compute the updated first network parameters with the aggregation weights:

w̃ = α·w + α′·w′,

where w̃ is the new first network parameter, w is the first network parameter corresponding to the first copy, and w′ is the second network parameter corresponding to the second copy.
2. The medical image processing model construction method based on federated knowledge distillation according to claim 1, characterized in that in step 3, training each child node's spiking neural network on its private data set comprises: the child-node network computes the hard-label loss function L_hard and its gradient in the forward pass on the corresponding private data set, trains for a preset number of rounds with the back-propagation algorithm, and updates its parameters;

wherein the hard-label loss function is:

L_hard = −Σ_i ŷ_i · log(v_i),

where L_hard is the hard-label loss, v is the frequency vector computed from the output pulses, and ŷ is the true label vector, computed as:

ŷ = onehot(tar),

where tar is the category of the true label and onehot denotes one-hot encoding:

ŷ_i = 1 if i = tar, and ŷ_i = 0 otherwise,

where i is the element index of the label vector.
3. The medical image processing model construction method based on federated knowledge distillation according to claim 1, characterized in that in step 3 the first pulse tensor is binary-compressed before being uploaded to the central node, and in step 4 the central node, after receiving a child node's compressed tensor, decompresses it to obtain the first pulse tensor.
4. The medical image processing model construction method based on federated knowledge distillation according to claim 3, characterized in that

binary compression of the first pulse tensor comprises the following steps:

step 31: zero-initialize a tensor sc for storing the compression result;

step 32: for each element value sc of sc, iterate over the time windows in order and compute:

sc′ = 2·sc + s_t,

where s_t is the pulse value of the t-th time window at the corresponding position of the pulse tensor to be compressed, sc is the current element value, iterated over t, and sc′ is the element value of sc after the update;

step 33: when all computations are complete, the binary compression is finished;

in step 4, decompressing the compressed tensor to obtain the first pulse tensor comprises the following steps:

step 41: zero-initialize a decompression tensor for storing the decompression result;

step 42: perform the restoration operation on each element value of the compressed tensor to obtain an ordered pulse sequence, and store the result at the corresponding position of the decompression tensor in inverted order;

step 43: when all computations are complete, the decompression tensor is the first pulse tensor.
5. The medical image processing model construction method based on federated knowledge distillation according to any one of claims 1, 3 and 4, characterized in that in step 4, the central node performing distillation training on the public data set according to the received first pulse tensor of each child node, to obtain the distillation product for each child node, comprises the following steps:

define the distillation loss function, train the spiking neural network with the back-propagation algorithm for a preset number of rounds, then stop; the distillation loss function is:

L_soft = L_T + λ·L_F,

where L_soft is the distillation loss, L_T is the mean-square-error-like loss over pulses, L_F is the relaxed class cross-entropy loss, and λ is the relaxation variable, a preset parameter;

the mean-square-error-like loss over pulses is:

L_T = (1/(C·T)) · Σ_{c=1}^{C} Σ_{t=1}^{T} (s_ct − ŝ_ct)²,

and the relaxed class cross-entropy loss is:

L_F = −Σ_{c=1}^{C} p̂_c · log(p_c),

where C is the number of classes of the training data and T is the size of the spiking-neural-network time window; s_ct and ŝ_ct are the element values of a batch's predicted and target pulse matrices, respectively; p_c and p̂_c are the predicted and target frequency vectors, computed from the pulse matrices as:

p_c = (1/T)·Σ_{t=1}^{T} s_ct,  p̂_c = (1/T)·Σ_{t=1}^{T} ŝ_ct.
6. A medical image processing method based on federated knowledge distillation, characterized in that: a medical image processing model based on federated knowledge distillation is constructed according to the medical image processing model construction method of any one of claims 1 to 5, and the model is used to perform image processing on the medical image to be processed.
CN202210783921.4A 2022-07-05 2022-07-05 Medical image processing model construction and processing method based on federal knowledge distillation Active CN115271033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210783921.4A CN115271033B (en) 2022-07-05 2022-07-05 Medical image processing model construction and processing method based on federal knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210783921.4A CN115271033B (en) 2022-07-05 2022-07-05 Medical image processing model construction and processing method based on federal knowledge distillation

Publications (2)

Publication Number Publication Date
CN115271033A CN115271033A (en) 2022-11-01
CN115271033B true CN115271033B (en) 2023-11-21

Family

ID=83762766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210783921.4A Active CN115271033B (en) 2022-07-05 2022-07-05 Medical image processing model construction and processing method based on federal knowledge distillation

Country Status (1)

Country Link
CN (1) CN115271033B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704296B (en) * 2023-08-04 2023-11-03 浪潮电子信息产业股份有限公司 Image processing method, device, system, equipment and computer storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4245310B2 (en) * 2001-08-30 2009-03-25 忠正 藤村 Diamond suspension aqueous solution excellent in dispersion stability, metal film containing this diamond, and product thereof
WO2019222401A2 (en) * 2018-05-17 2019-11-21 Magic Leap, Inc. Gradient adversarial training of neural networks
US11188799B2 (en) * 2018-11-12 2021-11-30 Sony Corporation Semantic segmentation with soft cross-entropy loss
US20210406782A1 (en) * 2020-06-30 2021-12-30 TieSet, Inc. System and method for decentralized federated learning

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1342773A (en) * 1991-03-18 2002-04-03 佛罗里达大学研究基金会 Producing ethanol by recombination host
CN105243649A (en) * 2015-11-09 2016-01-13 天津大学 Image denoising method based on secondary noise point detection
CN112703457A (en) * 2018-05-07 2021-04-23 强力物联网投资组合2016有限公司 Method and system for data collection, learning and machine signal streaming for analysis and maintenance using industrial internet of things
CN113330292A (en) * 2018-07-31 2021-08-31 科罗拉多大学评议会法人团体 System and method for applying machine learning to analyze microscopic images in high throughput systems
CN114269344A (en) * 2019-06-25 2022-04-01 微生物公司 Compositions and methods for treating or preventing ocular infections with felodivir
WO2021223873A1 (en) * 2020-05-08 2021-11-11 Ecole Polytechnique Federale De Lausanne (Epfl) System and method for privacy-preserving distributed training of machine learning models on distributed datasets
CN113705823A (en) * 2020-05-22 2021-11-26 华为技术有限公司 Model training method based on federal learning and electronic equipment
CN111369576A (en) * 2020-05-28 2020-07-03 腾讯科技(深圳)有限公司 Training method of image segmentation model, image segmentation method, device and equipment
WO2021257893A1 (en) * 2020-06-19 2021-12-23 Cleerly, Inc. Systems, methods, and devices for medical image analysis, diagnosis, risk stratification, decision making and/or disease tracking
WO2022060264A1 (en) * 2020-09-18 2022-03-24 Telefonaktiebolaget Lm Ericsson (Publ) Methods and systems for updating machine learning models
WO2022126706A1 (en) * 2020-12-19 2022-06-23 中国科学院深圳先进技术研究院 Method and device for accelerating personalized federated learning
CN112947500A (en) * 2021-02-10 2021-06-11 复旦大学 Underwater vehicle water flow monitoring system
CN113205863A (en) * 2021-06-04 2021-08-03 广西师范大学 Training method of individualized model based on distillation semi-supervised federal learning
CN113361606A (en) * 2021-06-07 2021-09-07 齐鲁工业大学 Deep map attention confrontation variational automatic encoder training method and system
CN113408743A (en) * 2021-06-29 2021-09-17 北京百度网讯科技有限公司 Federal model generation method and device, electronic equipment and storage medium
CN113553918A (en) * 2021-06-30 2021-10-26 电子科技大学 Machine-made invoice character recognition method based on pulse active learning
CN113518007A (en) * 2021-07-06 2021-10-19 华东师范大学 Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
CN113989561A (en) * 2021-10-29 2022-01-28 河海大学 Parameter aggregation updating method, equipment and system based on asynchronous federal learning
CN114154643A (en) * 2021-11-09 2022-03-08 浙江师范大学 Federal distillation-based federal learning model training method, system and medium
CN114429219A (en) * 2021-12-09 2022-05-03 之江实验室 Long-tail heterogeneous data-oriented federal learning method
CN114492745A (en) * 2022-01-18 2022-05-13 天津大学 Knowledge distillation mechanism-based incremental radiation source individual identification method
CN114692732A (en) * 2022-03-11 2022-07-01 华南理工大学 Method, system, device and storage medium for updating online label
CN114626550A (en) * 2022-03-18 2022-06-14 支付宝(杭州)信息技术有限公司 Distributed model collaborative training method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A Federated Learning Aggregation Algorithm for Pervasive Computing:Evaluation and ComParision";Sannara EK等;《IEEE International Conference on Pervasive Computing and Communications》;第1-10页 *
"Decentralized Federated Learning:A Segmented Gossip Approach";Chenghao Hu等;《arxiv》;第1-7页 *
"Federal SNN Distillation: A Low-Communication-Cost Federated Learning Framework for Spiking Neural Networks";Zhetong Liu等;《Journal of Physics: Conference Series》;第2216卷(第1期);第1-8页摘要和第1-3节 *

Also Published As

Publication number Publication date
CN115271033A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2021120936A1 (en) Chronic disease prediction system based on multi-task learning model
JP7305656B2 (en) Systems and methods for modeling probability distributions
CN109036553A (en) A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
WO2016192612A1 (en) Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN106778014A (en) A kind of risk Forecasting Methodology based on Recognition with Recurrent Neural Network
Cai et al. Improved deep convolutional neural networks using chimp optimization algorithm for Covid19 diagnosis from the X-ray images
CN115271033B (en) Medical image processing model construction and processing method based on federal knowledge distillation
CN112819831B (en) Segmentation model generation method and device based on convolution Lstm and multi-model fusion
Purnama et al. Disease classification based on dermoscopic skin images using convolutional neural network in teledermatology system
CN115471716A (en) Chest radiographic image disease classification model lightweight method based on knowledge distillation
Sarp et al. Simultaneous wound border segmentation and tissue classification using a conditional generative adversarial network
CN116110597A (en) Digital twinning-based intelligent analysis method and device for patient disease categories
Zhu et al. An automatic classification of the early osteonecrosis of femoral head with deep learning
CN111477337A (en) Infectious disease early warning method, system and medium based on individual self-adaptive transmission network
CN114820450A (en) CT angiography image classification method suitable for Li's artificial liver treatment
CN110335160A (en) A kind of medical treatment migratory behaviour prediction technique and system for improving Bi-GRU based on grouping and attention
JP7365747B1 (en) Disease treatment process abnormality identification system based on hierarchical neural network
CN117038096A (en) Chronic disease prediction method based on low-resource medical data and knowledge mining
Chen et al. Gingivitis identification via GLCM and artificial neural network
CN116011559A (en) Zero sample distillation system and method for case classification based on pseudo word sequence generation
CN116309754A (en) Brain medical image registration method and system based on local-global information collaboration
US20230087494A1 (en) Determining image similarity by analysing registrations
CN115171896A (en) System and method for predicting long-term death risk of critically ill patient
KR20220111215A (en) Apparatus and method for predicting drug-target interaction using deep neural network model based on self-attention
CN109119159B (en) Deep learning medical diagnosis system based on rapid weight mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant