CN110782042A - Method, device, equipment and medium for combining horizontal federation and vertical federation - Google Patents

Method, device, equipment and medium for combining horizontal federation and vertical federation

Info

Publication number
CN110782042A
CN110782042A (application CN201911035368.0A)
Authority
CN
China
Prior art keywords
preset
model
federal
reinforcement learning
information
Prior art date
Legal status
Granted
Application number
CN201911035368.0A
Other languages
Chinese (zh)
Other versions
CN110782042B (en)
Inventor
梁新乐
刘洋
陈天健
董苗波
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201911035368.0A priority Critical patent/CN110782042B/en
Publication of CN110782042A publication Critical patent/CN110782042A/en
Priority to PCT/CN2020/124846 priority patent/WO2021083276A1/en
Application granted granted Critical
Publication of CN110782042B publication Critical patent/CN110782042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device, equipment and a medium for combining a horizontal federation and a vertical federation. The method comprises: obtaining available public information and inputting it into a preset vertical federal service party to obtain vector information; training the vertical federal model of the preset vertical federal service party based on the vector information and updating the network weight of each preset reinforcement learning model; and periodically inputting each updated preset reinforcement learning model into a preset horizontal federal server and iteratively updating each updated preset reinforcement learning model. The technical problem in the prior art that a reinforcement learning model consumes a large amount of computing system resources is thereby solved.

Description

Method, device, equipment and medium for combining horizontal federation and vertical federation
Technical Field
The invention relates to the technical field of machine learning in financial technology (Fintech), and in particular to a horizontal federal and vertical federal combined method, device, equipment and medium.
Background
With the continuous development of financial technologies, especially internet technology and finance, more and more technologies (such as distributed computing, blockchain, artificial intelligence and the like) are applied to the financial field, but the financial industry also places higher requirements on these technologies, for example higher requirements on the distribution of the industry's backlog of processing tasks.
With the gradual development of artificial intelligence, reinforcement learning has been extensively studied in industry as a means of optimization control. In the prior art, a reinforcement learning model generally performs learning, optimization and control using only the data it collects itself, but such data is often limited. For example, the radar of an unmanned vehicle cannot see through an obstruction, and the limited mounting height of its image sensor prevents the vehicle from obtaining more comprehensive data (such as the distribution and running states of surrounding vehicles). As a result, the sample processing efficiency of the reinforcement learning model is low and its control performance is poor. Furthermore, in order to obtain a better optimization control result when each reinforcement learning model learns, optimizes and controls alone, a large amount of computing system resources must be consumed. The prior art therefore has the technical problem that the computing system resources consumed by a reinforcement learning model are high.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a medium for combining a horizontal federation and a vertical federation, and aims to solve the technical problem of high resource consumption of a computing system of a reinforcement learning model in the prior art.
In order to achieve the above object, an embodiment of the present invention provides a horizontal federal and vertical federal combined method, which is applied to a horizontal federal and vertical federal combined device, and includes:
acquiring available public information, and inputting the available public information into a preset longitudinal federal service side to acquire vector information;
training a longitudinal federal model of the preset longitudinal federal service party based on the vector information, and updating the network weight of each preset reinforcement learning model;
and inputting each updated preset reinforcement learning model into a preset horizontal federated server at regular intervals, and performing iterative updating on each updated preset reinforcement learning model.
Optionally, the step of training the longitudinal federal model of the preset longitudinal federal service provider based on the vector information and updating the network weight of each preset reinforcement learning model includes:
receiving sensor data sent by each preset reinforcement learning model, and generating control information through the longitudinal federal model based on the sensor data and the vector information;
training the longitudinal federated model in a training environment corresponding to the control information to obtain reward information and state information of the next time step;
and storing the reward information, the next time step state information and the control information as sample information, and updating the network weight of each preset reinforcement learning model based on the sample information.
Optionally, the step of updating the network weight of each preset reinforcement learning model based on the sample information includes:
inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model to obtain a training output value;
comparing the training output value with a real output value corresponding to the training data to obtain a model error value;
comparing the model error value with a preset error threshold value, and finishing the training of the preset reinforcement learning model if the model error value is smaller than the preset error threshold value;
and if the model error value is larger than or equal to the preset error threshold value, updating the network weight of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
Optionally, the step of periodically inputting each updated preset reinforcement learning model into a preset horizontal federal server, and iteratively updating each updated preset reinforcement learning model includes:
regularly inputting the updated preset reinforcement learning models into the preset transverse federal server so as to carry out transverse federal on the updated preset reinforcement learning models based on preset federal rules to obtain a transverse federal model;
and iteratively updating each updated preset reinforcement learning model based on the transverse federal model.
Optionally, each of the updated preset reinforcement learning models includes updated model parameters,
the step of regularly inputting each updated preset reinforcement learning model into the preset transverse federal server so as to carry out transverse federal on each updated preset reinforcement learning model based on preset federal rules, and the step of obtaining the transverse federal model comprises the following steps:
periodically inputting each updated model parameter into the preset horizontal federated server to fuse each updated model parameter to obtain a global model parameter;
and distributing the global model parameters to each updated preset reinforcement learning model, and training the updated preset reinforcement learning model based on the global model parameters to obtain the transverse federated model.
Optionally, the preset longitudinal federal service side comprises a longitudinal federal model, the longitudinal federal model comprises a current weight value,
the step of inputting the available public information into a preset longitudinal federal service side to obtain vector information comprises the following steps:
inputting the available public information as a current input value into the longitudinal federal model to obtain a current output value;
comparing the current output value with a preset current real value to obtain a current error value;
and calculating a partial derivative of a preset loss function based on the current weight value and the current error value to obtain vector information corresponding to the current weight value and the current error value together.
Optionally, the step of acquiring the available public information includes:
receiving a message request of a preset reinforcement learning model, and acquiring identification information in the message request through a preset longitudinal federal party;
and matching the available public information corresponding to the identification information in a preset public data source through the preset longitudinal federal party on the basis of the identification information.
The invention also provides a transverse federal and longitudinal federal combined apparatus, which is applied to transverse federal and longitudinal federal combined equipment, and comprises:
the input module is used for acquiring the available public information and inputting the available public information into a preset longitudinal federal service side to acquire vector information;
the first updating module is used for training the longitudinal federal model of the preset longitudinal federal service party based on the vector information and updating the network weight of each preset reinforcement learning model;
and the second updating module is used for inputting the updated preset reinforcement learning models into a preset horizontal federal server periodically and carrying out iterative updating on the updated preset reinforcement learning models.
Optionally, the first updating module includes:
the acquisition unit is used for receiving the sensor data sent by each preset reinforcement learning model and generating control information through the longitudinal federal model based on the sensor data and the vector information;
the first training unit is used for training the longitudinal federated model under the training environment corresponding to the control information to obtain reward information and state information of the next time step;
and the first updating unit is used for storing the reward information, the next time step state information and the control information as sample information and updating the network weight of each preset reinforcement learning model based on the sample information.
Optionally, the first updating unit includes:
the first training subunit is used for inputting the sample information as training data into the preset reinforcement learning model so as to train the preset reinforcement learning model and obtain a training output value;
a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data to obtain a model error value;
the first judging subunit is configured to compare the model error value with a preset error threshold, and complete training of the preset reinforcement learning model if the model error value is smaller than the preset error threshold;
and the second judging subunit is configured to, if the model error value is greater than or equal to the preset error threshold value, update the network weight of the preset reinforcement learning model based on the model error value, and train the preset reinforcement learning model again.
Optionally, the second updating module includes:
the regular sending unit is used for inputting the updated preset reinforcement learning models into the preset transverse federal server regularly so as to carry out transverse federal on the updated preset reinforcement learning models based on preset federal rules and obtain transverse federal models;
and the second updating unit is used for performing iterative updating on each updated preset reinforcement learning model based on the transverse federated model.
Optionally, the periodic transmission unit includes:
the fusion subunit is used for periodically inputting the parameters of each updated model into the preset horizontal federated server so as to fuse the parameters of each updated model and obtain global model parameters;
and the second training subunit is configured to distribute the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning model based on the global model parameters, and obtain the horizontal federal model.
Optionally, the input module comprises:
the input unit is used for inputting the available public information as a current input value into the longitudinal federal model to obtain a current output value;
the comparison unit is used for comparing the current output value with a preset current real value to obtain a current error value;
and the partial derivative unit is used for calculating a partial derivative of a preset loss function based on the current weight value and the current error value to obtain vector information corresponding to the current weight value and the current error value together.
Optionally, the input module comprises:
the receiving unit is used for receiving a message request of a preset reinforcement learning model and acquiring identification information in the message request through a preset longitudinal federal party;
and the matching unit is used for matching the available public information corresponding to the identification information in a preset public data source through the preset longitudinal federal party on the basis of the identification information.
The invention also provides horizontal federal and vertical federal combined equipment, comprising: a memory, a processor, and a horizontal federal and vertical federal combined program stored on the memory and executable on the processor, wherein the horizontal federal and vertical federal combined program, when executed by the processor, implements the steps of the horizontal federal and vertical federal combined method described above.
The present invention also provides a medium, which is a computer-readable storage medium on which a program implementing the horizontal federal and vertical federal combined method is stored, and the program, when executed by a processor, implements the steps of the horizontal federal and vertical federal combined method described above.
According to the present application, available public information is obtained and input into a preset longitudinal federal service party to obtain vector information; the longitudinal federal model of the preset longitudinal federal service party is then trained based on the vector information and the network weight of each preset reinforcement learning model is updated; and each updated preset reinforcement learning model is periodically input into a preset transverse federal server and iteratively updated. Because the available public information is input into the preset longitudinal federal model and longitudinal federal learning is performed on it before each preset reinforcement learning model is updated, the training data used for model training in the present application is more comprehensive and extensive, so the control performance of the model is improved, the model is more robust, and training a model only on a single party's local data is avoided. Furthermore, by periodically inputting each updated preset reinforcement learning model into the preset transverse federal server, performing transverse federal learning on each preset reinforcement learning model and iteratively updating them, the effective training data of each preset reinforcement learning model is increased and training with a low training effect is reduced, which reduces the consumption of computing system resources of a single preset reinforcement learning model. The technical problem in the prior art that the computing system resources consumed by a reinforcement learning model are high is thereby solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a first embodiment of the horizontal federal and vertical federal combined method of the present invention;
FIG. 2 is a schematic diagram of a tree interface complete logic model for analyzing an application software interface in the horizontal federation and vertical federation joint method of the present invention;
FIG. 3 is a flow chart diagram of the method for building the interface complete logic model according to the horizontal federation and the vertical federation;
FIG. 4 is a schematic flow chart of a second embodiment of the horizontal federal and vertical federal combined method of the present invention;
fig. 5 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a transverse federal and longitudinal federal combined method, which is applied to a transverse federal and longitudinal federal combined device. In a first embodiment of the transverse federal and longitudinal federal combined method of the present application, referring to fig. 1, the method comprises the following steps:
step S10, acquiring available public information, inputting the available public information into a preset longitudinal federal service side, and acquiring vector information;
In this embodiment, it should be noted that the vector information refers to gradient information generated in the training process of a preset reinforcement learning model. The gradient is a vector obtained by taking partial derivatives of a preset loss function; the negative direction of the gradient is the direction in which the current function value approaches its minimum, that is, the direction in which the loss function value decreases fastest, and the magnitude of the gradient is the maximum rate of change of the loss function value. The preset longitudinal federated server is a preset server that can combine different preset reinforcement learning models to perform longitudinal federated learning. Longitudinal federated learning takes the users that the participants have in common, together with the differing user data features of those participants, and performs joint machine learning training on them in the case where the participants' data features overlap little while their users overlap much. For example, suppose there are two participants A and B in the same region, where participant A is a bank and participant B is an e-commerce platform. A and B share many of the same users in that region, but because their businesses differ, the user data features they record differ and may in fact be complementary. In such a scenario, vertical federated learning can be used to help A and B build a joint machine learning prediction model and provide better service to their customers.
Available public information is acquired and input into the preset longitudinal federal service party to obtain vector information. Specifically, a message request is sent to the preset longitudinal federal service party by a preset reinforcement learning model, the message request comprising identification information. Based on the identification information, the public information federal party can acquire the available public information corresponding to the identification information from a preset public data source, and then input the available public information into the longitudinal federal model of the public information federal party to obtain the vector information. For example, if the longitudinal federal model is trained by batch gradient descent, the available public information is input into the longitudinal federal model as a batch of training values, and the output value of the longitudinal federal model is obtained. The degree of difference between the output value of the longitudinal federal model and the true value corresponding to the training values is then calculated, that is, the current error value of the current training round is obtained. Partial derivatives of a preset loss function are then taken with respect to the model error and the model weight of the longitudinal federal model, where the loss function is a quadratic function of the model weight and the model error, giving the partial derivative value that corresponds jointly to the current weight value and the current error value, that is, the gradient vector value, which is the vector information.
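As an illustration only, the following minimal sketch (assuming a simple linear model, a squared-error loss and NumPy as the numerical library, none of which are prescribed by the patent) shows how a batch of public information could be pushed through a model and turned into the gradient vector described above:

```python
import numpy as np

def batch_gradient(weights, batch_inputs, batch_targets):
    """Compute loss-function gradients for one batch (illustrative only).

    Assumes a linear model y = X @ w and a mean squared error loss;
    the patent itself does not fix the model form or loss function.
    """
    predictions = batch_inputs @ weights          # current output values
    errors = predictions - batch_targets          # current error values
    # Partial derivative of the quadratic loss with respect to the weights:
    gradient = 2.0 * batch_inputs.T @ errors / len(batch_targets)
    return gradient                               # the "vector information"

# Example usage with random stand-in data
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))       # a batch of available public information
y = rng.normal(size=8)            # corresponding true values
w = np.zeros(3)                   # current weight values
print(batch_gradient(w, X, y))
```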
Wherein, in step S10, the step of acquiring the available public information includes:
step S11, receiving a message request of a preset reinforcement learning model, and acquiring identification information in the message request through a preset longitudinal federal party;
In this embodiment, a message request of a preset reinforcement learning model is received, and the identification information in the message request is acquired through the preset longitudinal federal party. Specifically, the message request of each preset reinforcement learning model is sent to the public information federal party, and the identification information in the message request is then extracted by the public information federal party. The message request includes identification information such as geographic position coordinates, license plate numbers and the like, and the identification information can be extracted by methods such as tag matching and keyword matching.
And step S12, based on the identification information, matching the available public information corresponding to the identification information in a preset public data source through the preset longitudinal federal party.
In this embodiment, it should be noted that the common data source includes model training information of a plurality of reinforcement learning models, where the model training information includes available common information and unavailable common information.
Based on the identification information, the preset longitudinal federal party matches the available public information corresponding to the identification information in a preset public data source. Specifically, the identification information comprises identification labels, identification keywords, identification character strings and the like; the preset longitudinal federal party compares the model training information in the public data source one by one, selects the model training information that contains the identification information, and thereby obtains the available public information.
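Purely as an illustration, the sketch below (the data-source layout and field names are assumptions, not taken from the patent) shows one way such tag or keyword matching against a public data source could look:

```python
def match_available_public_info(identification, public_data_source):
    """Select entries whose tags or keywords contain the identification info.

    `public_data_source` is assumed to be a list of dicts such as
    {"tags": [...], "keywords": [...], "payload": ...}; the patent does not
    specify the concrete storage format.
    """
    matched = []
    for record in public_data_source:
        if (identification in record.get("tags", [])
                or identification in record.get("keywords", [])):
            matched.append(record["payload"])   # available public information
    return matched

# Example usage with a hypothetical license-plate identifier
source = [
    {"tags": ["PLATE-001"], "keywords": ["vehicle"], "payload": {"speed": 42}},
    {"tags": ["PLATE-002"], "keywords": [], "payload": {"speed": 10}},
]
print(match_available_public_info("PLATE-001", source))
```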
Wherein, in step S10, the preset longitudinal federal service side includes a longitudinal federal model including a current weight value,
the step of inputting the available public information into a preset longitudinal federal service side to obtain vector information comprises the following steps:
step S13, inputting the available public information as the current input value into the longitudinal federal model to obtain the current output value;
in this embodiment, it should be noted that the longitudinal federal model includes a neural network model, and one of the current input values corresponds to one of the current output values.
The available public information is input into the longitudinal federal model as the current input value to obtain the current output value. Specifically, the available public information is input into the longitudinal federal model as the current input value and processed by a preset data processing method, where the preset data processing method includes convolution processing, pooling processing, full-connection processing and the like. If the current input value is an image, convolution refers to the process of element-by-element multiplication and summation of the image matrix corresponding to the image and a convolution kernel to obtain image feature values, where the convolution kernel is a weight matrix corresponding to an image feature; pooling refers to the process of integrating the image feature values obtained by convolution to obtain new feature values; and full connection can be regarded as a special convolution whose result is a one-dimensional vector corresponding to the image. The current output value is thus obtained, where the current output value may be an image, a vector, a judgment result, a feature value and the like.
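As a minimal sketch only (using PyTorch, which the patent does not mandate, and an arbitrary toy layer configuration), the following shows the kind of convolution, pooling and fully-connected processing described above, producing a one-dimensional output vector for an image input:

```python
import torch
import torch.nn as nn

class TinyVerticalModel(nn.Module):
    """Illustrative convolution + pooling + fully-connected pipeline."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # convolution
        self.pool = nn.MaxPool2d(2)                            # pooling
        self.fc = nn.Linear(4 * 14 * 14, 10)                   # full connection

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = x.flatten(1)              # one-dimensional vector per image
        return self.fc(x)             # current output value

# Example usage on a random 28x28 single-channel "image"
model = TinyVerticalModel()
out = model(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10])
```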
Step S14, comparing the current output value with a preset current real value to obtain a current error value;
In this embodiment, it should be noted that each current input value corresponds to a current true value, and the current true value is the theoretical output value of the model for that input.
The current output value is compared with the preset current true value to obtain the current error value. For example, if the current output value is X and the preset current true value is Y, the difference between the current output value and the current true value is X-Y, and the current error value is (X-Y)/X.
Step S15, based on the current weight value and the current error value, a bias derivative is calculated for a preset loss function, and vector information corresponding to both the current weight value and the current error value is obtained.
In this embodiment, it should be noted that the preset loss function refers to a quadratic function with respect to the model weight and the model error.
Based on the current weight value and the current error value, a partial derivative of the preset loss function is calculated to obtain the vector information corresponding jointly to the current weight value and the current error value. Specifically, partial derivatives of the preset loss function are taken with respect to the model weight and the model error; the current weight value and the current error value form a specific point of the preset loss function, so the partial derivative values at that point are obtained, which are the vector information corresponding jointly to the current weight value and the current error value. For example, if the preset loss function is f(x, y), the model weight is x and the model error is y, then the gradient vector, that is, the vector of partial derivatives, is

∇f(x, y) = (∂f/∂x, ∂f/∂y).

If the current weight value is 0.5 and the current error value is 0.1, the vector information is the gradient vector value at x = 0.5 and y = 0.1.
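For illustration only, assuming a concrete quadratic loss f(x, y) = x² + 2y² (the patent only requires the loss to be quadratic in the model weight and model error, and does not fix this form), the gradient vector at the example point above could be evaluated like this:

```python
def gradient_of_loss(x, y):
    """Gradient of the illustrative loss f(x, y) = x**2 + 2*y**2.

    x is the current weight value, y the current error value; the concrete
    loss function here is an assumption for demonstration purposes.
    """
    df_dx = 2 * x      # partial derivative with respect to the weight
    df_dy = 4 * y      # partial derivative with respect to the error
    return (df_dx, df_dy)

# Vector information at the example point from the text
print(gradient_of_loss(0.5, 0.1))   # (1.0, 0.4)
```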
Step S20, training the longitudinal federal model of the preset longitudinal federal service side based on the vector information, and updating the network weight of each preset reinforcement learning model;
in this embodiment, it should be noted that the vector information includes a gradient vector.
The longitudinal federal model of the preset longitudinal federal service party is trained based on the vector information, and the network weight of each preset reinforcement learning model is updated. Specifically, the longitudinal federal model of the preset longitudinal federal service party is trained based on the vector information to obtain sample information, each preset reinforcement learning model is then trained based on the sample information, and the network weight of each preset reinforcement learning model is updated.
And step S30, inputting each updated preset reinforcement learning model into a preset horizontal federal server periodically, and performing iterative updating on each updated preset reinforcement learning model.
In this embodiment, it should be noted that the preset horizontal federal server is a preset server that can combine different preset reinforcement learning models to perform horizontal federated learning. Horizontal federated learning extracts the part of the data in which the participants' data features are the same but their users are not identical and performs joint machine learning on it, in the case where the participants' data features overlap much while their users overlap little. For example, if the two participants are two banks in different regions, their user groups come from their respective regions and intersect only slightly, but their businesses are very similar and the user data features they record are largely the same; horizontal federated learning can then be used to help the two banks build a joint model to predict their customers' behavior. In addition, all information interaction in this embodiment can optionally be encrypted, and whether encryption is used can be chosen by the user.
Each updated preset reinforcement learning model is periodically input into the preset horizontal federated server, and each updated preset reinforcement learning model is iteratively updated. Specifically, the updated model parameters of each preset reinforcement learning model are periodically input into the preset horizontal federal server, where the model parameters (including gradient information, weight information and the like) are fused to obtain global model parameters. The global model parameters are then distributed to each preset reinforcement learning model, and each preset reinforcement learning model uses the received global model parameters as the starting point of local model training, or as the latest parameters of the local model, to start or continue training. As shown in fig. 2, the reinforcement learning Agent1 and the reinforcement learning Agent2 are different reinforcement learning models, the data store is a data repository for storing sample information, the data source is used for receiving the sensor data sent by each preset reinforcement learning model, and the controller is used for carrying out the operation corresponding to the control information.
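As an illustrative sketch only (the communication layer, timing mechanism and parameter format are all assumptions, not specified by the patent), one round of this periodic horizontal-federation exchange could be organized as follows:

```python
import numpy as np

def horizontal_federation_round(local_parameters, fuse_fn):
    """One illustrative round: collect each agent's parameters, fuse them,
    and hand the global parameters back to every agent.

    `local_parameters` maps an agent id to its current weight vector;
    `fuse_fn` is the server-side fusion rule (e.g. plain averaging).
    """
    global_parameters = fuse_fn(list(local_parameters.values()))
    # Each agent continues training from the fused global parameters.
    return {agent_id: global_parameters.copy() for agent_id in local_parameters}

# Example usage with two agents and simple averaging as the fusion rule
agents = {"Agent1": np.array([1.0, 2.0]), "Agent2": np.array([3.0, 4.0])}
updated = horizontal_federation_round(agents, lambda ps: np.mean(ps, axis=0))
print(updated)   # both agents receive [2.0, 3.0]
```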
In this embodiment, available public information is obtained and input into a preset longitudinal federal service party to obtain vector information; the longitudinal federal model of the preset longitudinal federal service party is then trained based on the vector information and the network weight of each preset reinforcement learning model is updated; and each updated preset reinforcement learning model is periodically input into a preset transverse federal server and iteratively updated. Because the available public information is input into the preset longitudinal federal model and longitudinal federal learning is performed on it before each preset reinforcement learning model is updated, the training data used for model training in this embodiment is more comprehensive and extensive, so the control performance of the model is improved and the model is more robust. Further, by periodically inputting each updated preset reinforcement learning model into the preset transverse federal server, performing transverse federal learning on each preset reinforcement learning model and iteratively updating them, the control performance and robustness of the model are further improved, the effective training data of each preset reinforcement learning model is increased, and training with a low training effect is reduced, which reduces the consumption of computing system resources of a single preset reinforcement learning model. The technical problem in the prior art that the computing system resources consumed by a reinforcement learning model are high is thereby solved.
Further, referring to fig. 3, in another embodiment of the horizontal federal and vertical federal combined method based on the first embodiment of the present application, the step of training the longitudinal federal model of the preset longitudinal federal service party based on the vector information and updating the network weight of each preset reinforcement learning model includes:
step S21, receiving sensor data sent by each preset reinforcement learning model, and generating control information through the longitudinal federal model based on the sensor data and the vector information;
in this embodiment, it should be noted that, based on the control information, a preset reinforcement learning model may be controlled by a preset controller, for example, if the longitudinal federal model is an unmanned vehicle, the traveling speed and the traveling direction of the unmanned vehicle may be controlled by the control information.
Sensor data sent by each preset reinforcement learning model is received, and control information is generated by the longitudinal federal model based on the sensor data and the vector information. Specifically, the sensor data is acquired from the local data source corresponding to each preset reinforcement learning model and sent to the preset public federal party, where the sensor data includes distance sensor data, pressure sensor data, speed sensor data and the like; that is, the sensor data indicates the state information of the longitudinal federal model at the current time step. The control information is then generated by the longitudinal federal model based on the sensor data and the vector information. Since the direction of the gradient vector corresponding to the vector information is the direction in which the longitudinal federal model needs to be trained so that it trains towards the state information of the next time step, the control information can control the longitudinal federal model to train towards the next time step state information.
Step S22, training the longitudinal federal model under the training environment corresponding to the control information to obtain reward information and state information of the next time step;
In this embodiment, it should be noted that the reward information is calculated by a preset reward function, which is used to add a non-linear factor to the longitudinal federal model. The next time step state information is the model state information of the longitudinal federal model after its network weight has been updated by training. Before the longitudinal federal model is updated, that is, before the next time step state information is obtained, it is determined whether the update would reduce the model error; the update is performed if it reduces the model error, and is not performed otherwise.
The longitudinal federal model is trained in the training environment corresponding to the control information to obtain the reward information and the next time step state information. Specifically, in the training environment corresponding to the control information, the longitudinal federal model is trained to obtain the reward information and the network weight of each neuron of the neural network in the longitudinal federal model, that is, to obtain the reward information and the next time step state information, where the network layers include convolutional layers, pooling layers, fully connected layers and the like.
Step S23, storing the reward information, the next time step status information, and the control information as sample information, and updating the network weight of each of the preset reinforcement learning models based on the sample information.
In this embodiment, the reward information, the next time step state information and the control information are stored as sample information, and the network weight of each preset reinforcement learning model is updated based on the sample information. Specifically, the reward information, the next time step state information and the control information are combined into sample information and stored in the data repository corresponding to each preset reinforcement learning model; each preset reinforcement learning model can then extract sample information from its corresponding data repository for training, and its network weight is updated according to the training result.
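As a minimal illustration (the buffer structure and field names are assumptions; the patent only requires that reward, next-time-step state and control information be stored together as sample information), a per-agent data repository could look like this:

```python
import random
from collections import deque

class SampleRepository:
    """Illustrative per-agent store for (reward, next_state, control) samples."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, reward, next_state, control):
        # Reward info, next-time-step state info and control info form one sample.
        self.buffer.append((reward, next_state, control))

    def sample_batch(self, batch_size):
        # Draw training data for the preset reinforcement learning model.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Example usage
repo = SampleRepository()
repo.store(reward=1.0, next_state=[0.2, 0.4], control=[0.1])
print(repo.sample_batch(1))
```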
In step S23, the step of updating the network weight of each of the preset reinforcement learning models based on the sample information includes:
step S231, inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model to obtain a training output value;
In this embodiment, the sample information is input into the preset reinforcement learning model as training data to train the preset reinforcement learning model and obtain a training output value. Specifically, the sample information is input into the preset reinforcement learning model as training data and the training data is processed, where the data processing includes convolution, pooling, full connection and the like, so as to obtain the training output value, and the training output value may be an image, a vector, a numerical value and the like.
Step S232, comparing the training output value with a real output value corresponding to the training data to obtain a model error value;
In this embodiment, the training output value is compared with the real output value corresponding to the training data to obtain a model error value. For example, if the training output value is X and the real output value is Y, the difference between the training output value and the real output value is X-Y, and the model error value is (X-Y)/X.
Step S233, comparing the model error value with a preset error threshold, and if the model error value is smaller than the preset error threshold, completing training of the preset reinforcement learning model;
In this embodiment, it should be noted that the condition that the model error value is smaller than the preset error threshold is one optional training completion condition for the preset reinforcement learning model; the training completion conditions also include loss function convergence, model parameter convergence, reaching a maximum number of iterations, reaching a maximum training time and the like, where the model parameters include the model error value.
In step S234, if the model error value is greater than or equal to the preset error threshold, the network weight of the preset reinforcement learning model is updated based on the model error value, and the preset reinforcement learning model is retrained.
In this embodiment, it should be noted that the network weight is a convolution kernel or a weight matrix.
If the model error value is greater than or equal to the preset error threshold, the network weight of the preset reinforcement learning model is updated based on the model error value and the preset reinforcement learning model is retrained. Specifically, if the model error value is greater than or equal to the preset error threshold, the corresponding gradient vector value is obtained based on the model error value, the network weight of the preset reinforcement learning model is updated based on the gradient vector value, and the preset reinforcement learning model is retrained until a preset training completion condition is reached.
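Purely as a sketch (the model, loss and update rule are placeholders, and the patent does not prescribe them), the error-threshold training loop described in steps S231 to S234 could be organized like this:

```python
def train_until_threshold(model_step, error_threshold, max_iterations=1000):
    """Illustrative loop: keep updating the network weights until the model
    error falls below the preset error threshold, or a maximum iteration
    count (one of the other optional completion conditions) is reached.

    `model_step` is assumed to run one training pass, update the weights
    from the resulting error, and return the current model error value.
    """
    for iteration in range(max_iterations):
        model_error = model_step()
        if model_error < error_threshold:
            return iteration, model_error      # training completed
    return max_iterations, model_error         # stopped on iteration limit

# Example usage with a toy step whose error halves on each pass
state = {"error": 1.0}
def toy_step():
    state["error"] *= 0.5
    return state["error"]

print(train_until_threshold(toy_step, error_threshold=0.01))
```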
In this embodiment, the sensor data sent by each preset reinforcement learning model is received, control information is generated by the longitudinal federal model based on the sensor data and the vector information, the longitudinal federal model is then trained in the training environment corresponding to the control information to obtain reward information and next time step state information, and the reward information, the next time step state information and the control information are stored as sample information, based on which the network weight of each preset reinforcement learning model is updated. That is, in this embodiment the available public information corresponding to each preset reinforcement learning model is converted into sample information, so that each preset reinforcement learning model is trained and updated by combining the data of a plurality of preset reinforcement learning models. The control performance and robustness of each preset reinforcement learning model are thereby greatly enhanced, the model training time and training amount of a single preset reinforcement learning model are reduced, and the resource consumption of the computing system of a single preset reinforcement learning model is reduced.
Further, referring to fig. 4, in another embodiment of the horizontal federal and vertical federal combined method based on the first embodiment and the second embodiment of the present application, the step of periodically inputting each updated preset reinforcement learning model into a preset horizontal federal server and iteratively updating each updated preset reinforcement learning model includes:
step S31, regularly inputting each updated preset reinforcement learning model into the preset horizontal federal server, so as to perform horizontal federal on each updated preset reinforcement learning model based on preset federal rules, and obtain a horizontal federal model;
In this embodiment, it should be noted that the preset horizontal federal server is a preset server that can be used for horizontal federated learning, and the regular period may be set by a user; for example, if the regular period is set to 10 minutes, the updated preset reinforcement learning models are sent to the preset horizontal federal server every 10 minutes.
Each updated preset reinforcement learning model is periodically input into the preset transverse federal server so that transverse federation is carried out on the updated preset reinforcement learning models based on preset federal rules to obtain a transverse federal model. Specifically, each updated preset reinforcement learning model is periodically input into the preset transverse federal server, that is, the model parameters of each preset reinforcement learning model are sent to the transverse federal server, the model parameters are fused to obtain global model parameters, and each preset reinforcement learning model is updated based on the global model parameters to obtain the transverse federal model.
Wherein each of the updated preset reinforcement learning models comprises updated model parameters,
the step of regularly inputting each updated preset reinforcement learning model into the preset transverse federal server so as to carry out transverse federal on each updated preset reinforcement learning model based on preset federal rules, and the step of obtaining the transverse federal model comprises the following steps:
step S311, regularly inputting each updated model parameter into the preset horizontal federated server to fuse each updated model parameter to obtain a global model parameter;
In this embodiment, each updated model parameter is periodically input into the preset horizontal federal server so that the updated model parameters are fused to obtain global model parameters. Specifically, each updated model parameter is input into the preset horizontal federal server and processed according to a preset rule, where the preset-rule processing includes averaging, weighted averaging and the like, so as to obtain the global model parameters; the weight ratio corresponding to each updated model parameter participating in the weighted averaging is set by the user.
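As an illustration only (parameter shapes and the specific weights are assumed; the patent allows plain averaging, weighted averaging and similar rules), the server-side fusion of the updated model parameters into global model parameters could be:

```python
import numpy as np

def fuse_model_parameters(update_parameters, weights=None):
    """Fuse each agent's updated parameters into global model parameters.

    With `weights=None` this is a plain average; otherwise a weighted
    average with user-set weight ratios, as described in step S311.
    """
    stacked = np.stack(update_parameters)
    if weights is None:
        return stacked.mean(axis=0)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, stacked, axes=1)

# Example usage: two agents, the second weighted twice as heavily
params = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
print(fuse_model_parameters(params))            # [2.0, 4.0]
print(fuse_model_parameters(params, [1, 2]))    # [2.333..., 4.666...]
```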
Step S312, distributing the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning model based on the global model parameters, and obtain the horizontal federal model.
In this embodiment, the global model parameters are distributed to each updated preset reinforcement learning model so that the updated preset reinforcement learning model is trained based on the global model parameters to obtain the horizontal federated model. Specifically, the global model parameters are distributed to each updated preset reinforcement learning model and used as the starting point of model training for each preset reinforcement learning model, or directly replace the local model parameters of each preset reinforcement learning model; the updated preset reinforcement learning model is then trained to obtain the horizontal federated model.
And step S32, performing iterative updating on each updated preset reinforcement learning model based on the transverse federal model.
In this embodiment, each updated preset reinforcement learning model is iteratively updated based on the horizontal federal model. Specifically, the global model parameters in the horizontal federal model are used as the starting point of model training for each preset reinforcement learning model, or directly replace the local model parameters of each preset reinforcement learning model, and the updated preset reinforcement learning model is trained. It is then determined whether the trained preset reinforcement learning model has reached a training completion condition: if so, the training of the preset reinforcement learning model is finished; if not, the network weight of the preset reinforcement learning model is updated and the preset reinforcement learning model is retrained until a training completion condition is reached. The training completion conditions include loss function convergence, model parameter convergence, reaching a maximum number of iterations, reaching a maximum training time and the like.
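As a minimal sketch only (the tolerances and time budget below are assumed values, not taken from the patent), the several optional training-completion conditions listed above could be checked together as follows:

```python
import time

def training_complete(loss_delta, param_delta, iteration, start_time,
                      loss_tol=1e-6, param_tol=1e-6,
                      max_iterations=10_000, max_seconds=3600.0):
    """Return True when any of the optional completion conditions holds:
    loss convergence, model-parameter convergence, maximum iteration count,
    or maximum training time (all thresholds here are illustrative)."""
    return (abs(loss_delta) < loss_tol
            or abs(param_delta) < param_tol
            or iteration >= max_iterations
            or (time.time() - start_time) >= max_seconds)

# Example usage
start = time.time()
print(training_complete(loss_delta=0.5, param_delta=0.2,
                        iteration=3, start_time=start))   # False
```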
In this embodiment, each updated preset reinforcement learning model is periodically input into the preset horizontal federal server so that transverse federation is carried out on the updated preset reinforcement learning models based on preset federal rules to obtain a horizontal federal model, and each updated preset reinforcement learning model is then iteratively updated based on the horizontal federal model. That is, this embodiment provides a method of carrying out horizontal federation: the updated preset reinforcement learning models are periodically input into the preset horizontal federal server and combined for learning, the horizontal federal model corresponding to each updated preset reinforcement learning model is obtained, and each updated preset reinforcement learning model is then iteratively updated based on the horizontal federal model. The control performance and robustness of the model are thereby further improved, the model training time and training amount of a single preset reinforcement learning model are reduced, and the resource consumption of the computing system of a single preset reinforcement learning model is reduced, which lays a foundation for solving the technical problems of poor control performance and low robustness of the reinforcement learning model in the prior art.
Referring to fig. 5, fig. 5 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 5, the horizontal federal and vertical federal combined facility may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the horizontal and vertical federal combined devices may further include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuits, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
It will be understood by those skilled in the art that the configuration of the horizontal and vertical federal combined plant illustrated in fig. 5 is not intended to be limiting of the horizontal and vertical federal combined plant and may include more or fewer components than those illustrated, or some components in combination, or a different arrangement of components.
As shown in fig. 5, a memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, and a horizontal federal and vertical federal combined program. The operating system is a program that manages and controls the hardware and software resources of the horizontal federal and vertical federal combined equipment, and supports the operation of the horizontal federal and vertical federal combined program and other software and/or programs. The network communication module is used to implement communication between the components within the memory 1005, as well as communication with other hardware and software in the horizontal federal and vertical federal combined system.
In the horizontal federal and vertical federal combined facility shown in fig. 5, the processor 1001 is configured to execute a horizontal federal and vertical federal combined program stored in the memory 1005, and implement the steps of any one of the horizontal federal and vertical federal combined methods described above.
The specific implementation of the horizontal federal and vertical federal combined device of the present invention is basically the same as the embodiments of the horizontal federal and vertical federal combined method, and is not described herein again.
The invention also provides a lateral federal and longitudinal federal combined device, which comprises:
the input module is used for acquiring the available public information and inputting the available public information into a preset longitudinal federal service side to acquire vector information;
the first updating module is used for training the longitudinal federal model of the preset longitudinal federal service party based on the vector information and updating the network weight of each preset reinforcement learning model;
and the second updating module is used for inputting the updated preset reinforcement learning models into a preset horizontal federal server periodically and carrying out iterative updating on the updated preset reinforcement learning models.
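Before each module is detailed below, the following is a minimal, hypothetical sketch of how the three modules above could cooperate: the input module obtains available public information and turns it into vector information through the vertical federal service party, the first updating module uses that vector information to update each local reinforcement learning model, and the second updating module periodically performs horizontal federation. All class and method names are editorial assumptions, not the patented implementation.

```python
class HorizontalVerticalFederationDevice:
    """Illustrative orchestration of the input module and the two updating modules."""

    def __init__(self, vertical_server, rl_models, horizontal_server, period=10):
        self.vertical_server = vertical_server      # preset longitudinal federal service party
        self.rl_models = rl_models                  # preset reinforcement learning models
        self.horizontal_server = horizontal_server  # preset horizontal federal server
        self.period = period                        # rounds between horizontal federations

    def run_round(self, round_idx, message_request):
        # Input module: obtain available public information and derive vector information.
        public_info = self.vertical_server.match_public_info(message_request)
        vector_info = self.vertical_server.to_vector(public_info)

        # First updating module: train the longitudinal federal model and
        # update the network weight of each preset reinforcement learning model.
        for model in self.rl_models:
            model.update_weights(vector_info)

        # Second updating module: periodic horizontal federation and iterative update.
        if round_idx % self.period == 0:
            global_weights = self.horizontal_server.fuse(
                [m.get_weights() for m in self.rl_models])
            for model in self.rl_models:
                model.set_weights(global_weights)
```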
Optionally, the first updating module includes:
the acquisition unit is used for receiving the sensor data sent by each preset reinforcement learning model and generating control information through the longitudinal federal model based on the sensor data and the vector information;
the first training unit is used for training the longitudinal federated model under the training environment corresponding to the control information to obtain reward information and state information of the next time step;
and the first updating unit is used for storing the reward information, the state information of the next time step and the control information as sample information, and updating the network weight of each preset reinforcement learning model based on the sample information.
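The first updating unit above stores the reward, the state of the next time step and the control information as samples and uses them to update the local network weights. Below is a minimal sketch of such a sample store under assumed names (`Sample`, `SampleBuffer`); the transition format and batching strategy are illustrative only.

```python
from collections import deque, namedtuple

# One sample as described above: reward information, state information of the
# next time step, and the control information produced by the longitudinal federal model.
Sample = namedtuple("Sample", ["reward", "next_state", "control_info"])

class SampleBuffer:
    """Stores sample information and hands out batches for weight updates."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, reward, next_state, control_info):
        self.buffer.append(Sample(reward, next_state, control_info))

    def batch(self, size=32):
        # Return the most recent `size` samples; a real agent might sample randomly.
        return list(self.buffer)[-size:]

# Usage: after each interaction with the training environment, call
# buffer.store(reward, next_state, control_info); buffer.batch() is then fed
# to each preset reinforcement learning model as training data.
```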
Optionally, the first updating unit includes:
the first training subunit is used for inputting the sample information as training data into the preset reinforcement learning model so as to train the preset reinforcement learning model and obtain a training output value;
a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data to obtain a model error value;
the first judging subunit is configured to compare the model error value with a preset error threshold, and complete training of the preset reinforcement learning model if the model error value is smaller than the preset error threshold;
and the second judging subunit is configured to, if the model error value is greater than or equal to the preset error threshold value, update the network weight of the preset reinforcement learning model based on the model error value, and train the preset reinforcement learning model again.
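The first and second judging subunits describe a train-compare-retrain loop driven by a preset error threshold. The following is a hedged sketch of that loop for a generic model object; `model.predict`, `model.train_step` and the mean-squared error used here are assumptions, not the patented procedure.

```python
def train_until_threshold(model, training_data, real_outputs,
                          error_threshold=1e-3, max_rounds=1000):
    """Train repeatedly until the model error value falls below the preset error threshold."""
    model_error = float("inf")
    for _ in range(max_rounds):
        training_output = model.predict(training_data)
        # Model error value: a mean-squared error between the training output
        # value and the real output value (one possible choice of error measure).
        model_error = ((training_output - real_outputs) ** 2).mean()
        if model_error < error_threshold:
            break                      # training of the preset model is complete
        # Otherwise update the network weight based on the error and train again.
        model.train_step(training_data, real_outputs)
    return model_error
```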
Optionally, the second updating module includes:
the periodic sending unit is used for periodically inputting the updated preset reinforcement learning models into the preset horizontal federal server, so as to perform horizontal federation on the updated preset reinforcement learning models based on preset federal rules and obtain a horizontal federal model;
and the second updating unit is used for iteratively updating each updated preset reinforcement learning model based on the horizontal federal model.
Optionally, the periodic sending unit includes:
the fusion subunit is used for periodically inputting the parameters of each updated model into the preset horizontal federal server, so as to fuse the parameters of each updated model and obtain global model parameters;
and the second training subunit is used for distributing the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federal model.
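As a complement to the weight-averaging sketch given earlier, the snippet below illustrates the second half of the periodic step: the fused global model parameters are distributed back to each updated preset reinforcement learning model, which then continues training from them. The method names are again hypothetical assumptions.

```python
def broadcast_and_train(global_params, rl_models, local_batches):
    """Distribute fused global model parameters and let each model train from them."""
    for model, batch in zip(rl_models, local_batches):
        model.set_weights(global_params)  # load the fused global model parameters
        model.train_step(batch)           # continue local training from the global point
    return rl_models                      # together these realize the horizontal federal model
```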
Optionally, the input module comprises:
the input unit is used for inputting the available public information as a current input value into the longitudinal federal model to obtain a current output value;
the comparison unit is used for comparing the current output value with a preset current real value to obtain a current error value;
and the partial derivative unit is used for calculating a partial derivative of a preset loss function based on the current weight value and the current error value, so as to obtain vector information jointly corresponding to the current weight value and the current error value.
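The input module above runs a forward pass through the longitudinal federal model, compares the current output value with the preset current real value, and takes a partial derivative of a preset loss function with respect to the current weight value. A minimal worked sketch for a single linear layer with a squared-error loss is given below; the choice of model and loss is an editorial assumption for illustration only.

```python
import numpy as np

def vector_info(public_info, current_weight, current_real_value):
    """Vector information as the partial derivative of a squared-error loss
    with respect to the current weight value (illustrative choice of model and loss)."""
    x = np.asarray(public_info, dtype=float)
    current_output = x @ current_weight                   # current output value
    current_error = current_output - current_real_value   # current error value
    # Loss L = 0.5 * current_error ** 2, so dL/dw = current_error * x:
    # the result is determined jointly by the current weight and the current error.
    return current_error * x

# Example: public info [1.0, 2.0], weight [0.5, -0.25], preset real value 0.3.
v = vector_info([1.0, 2.0], np.array([0.5, -0.25]), 0.3)  # -> array([-0.3, -0.6])
```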
Optionally, the input module comprises:
the receiving unit is used for receiving a message request of a preset reinforcement learning model and acquiring identification information in the message request through a preset longitudinal federal party;
and the matching unit is used for matching the available public information corresponding to the identification information in a preset public data source through the preset longitudinal federal party on the basis of the identification information.
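The receiving and matching units above extract identification information from a message request and look up the corresponding available public information in a preset public data source. A minimal sketch, assuming the data source behaves like a key-value mapping and the request carries an `id` field (both assumptions), is shown below.

```python
def match_public_info(message_request, public_data_source):
    """Return the available public information matching the request's identification information."""
    identification = message_request.get("id")  # identification information in the request
    if identification is None:
        raise ValueError("message request carries no identification information")
    # The preset public data source is modelled as a mapping from
    # identification information to available public information.
    return public_data_source.get(identification)

# Usage:
# public_data_source = {"user-001": [0.2, 1.5, 3.0]}
# info = match_public_info({"id": "user-001"}, public_data_source)
```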
The specific implementation of the horizontal federal and vertical federal combined device of the present invention is basically the same as the above-mentioned embodiments of the horizontal federal and vertical federal combined method, and is not described herein again.
The present invention further provides a medium, which is a computer-readable storage medium storing one or more programs, and the one or more programs may be executed by one or more processors to implement the steps of any one of the horizontal federal and vertical federal combined methods described above.
The specific implementation of the medium of the present invention is basically the same as the embodiments of the above-mentioned horizontal federal and vertical federal combined method, and is not described herein again.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method of horizontal federation and vertical federation combination, the method comprising:
acquiring available public information, and inputting the available public information into a preset longitudinal federal service party to obtain vector information;
training a longitudinal federal model of the preset longitudinal federal service party based on the vector information, and updating the network weight of each preset reinforcement learning model;
and periodically inputting each updated preset reinforcement learning model into a preset horizontal federal server, and iteratively updating each updated preset reinforcement learning model.
2. The method according to claim 1, wherein the step of training the longitudinal federal model of the preset longitudinal federal service party based on the vector information and updating the network weight of each preset reinforcement learning model comprises:
receiving sensor data sent by each preset reinforcement learning model, and generating control information through the longitudinal federal model based on the sensor data and the vector information;
training the longitudinal federal model in a training environment corresponding to the control information to obtain reward information and state information of the next time step;
and storing the reward information, the state information of the next time step and the control information as sample information, and updating the network weight of each preset reinforcement learning model based on the sample information.
3. The method of claim 2, wherein the step of updating the network weights of the pre-set reinforcement learning models based on the sample information comprises:
inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model to obtain a training output value;
comparing the training output value with a real output value corresponding to the training data to obtain a model error value;
comparing the model error value with a preset error threshold value, and finishing the training of the preset reinforcement learning model if the model error value is smaller than the preset error threshold value;
and if the model error value is larger than or equal to the preset error threshold value, updating the network weight of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
4. The method of claim 1, wherein the step of periodically inputting each updated preset reinforcement learning model to a preset horizontal federal server and iteratively updating each updated preset reinforcement learning model comprises:
periodically inputting the updated preset reinforcement learning models into the preset horizontal federal server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federal rules to obtain a horizontal federal model;
and iteratively updating each updated preset reinforcement learning model based on the horizontal federal model.
5. The method of claim 4, wherein each updated preset reinforcement learning model comprises updated model parameters, and
the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federal server to perform horizontal federation on each updated preset reinforcement learning model based on the preset federal rules to obtain the horizontal federal model comprises:
periodically inputting each updated model parameter into the preset horizontal federal server to fuse each updated model parameter and obtain global model parameters;
and distributing the global model parameters to each updated preset reinforcement learning model, and training the updated preset reinforcement learning models based on the global model parameters to obtain the horizontal federal model.
6. The method of claim 1, wherein the preset longitudinal federal service party includes a longitudinal federal model, the longitudinal federal model including a current weight value,
and the step of inputting the available public information into the preset longitudinal federal service party to obtain vector information comprises:
inputting the available public information as a current input value into the longitudinal federal model to obtain a current output value;
comparing the current output value with a preset current real value to obtain a current error value;
and calculating a partial derivative of a preset loss function based on the current weight value and the current error value to obtain vector information corresponding to the current weight value and the current error value together.
7. The method of claim 1, wherein the step of obtaining available public information comprises:
receiving a message request of a preset reinforcement learning model, and acquiring identification information in the message request through a preset longitudinal federal party;
and matching the available public information corresponding to the identification information in a preset public data source through the preset longitudinal federal party on the basis of the identification information.
8. A horizontal federal and vertical federal combined device, applied to horizontal federal and vertical federal combined equipment, the horizontal federal and vertical federal combined device comprising:
the input module is used for acquiring available public information and inputting the available public information into a preset longitudinal federal service party to obtain vector information;
the first updating module is used for training the longitudinal federal model of the preset longitudinal federal service party based on the vector information and updating the network weight of each preset reinforcement learning model;
and the second updating module is used for inputting the updated preset reinforcement learning models into a preset horizontal federal server periodically and carrying out iterative updating on the updated preset reinforcement learning models.
9. Horizontal federal and vertical federal combined equipment, comprising: a memory, a processor, and a program stored on the memory for implementing the horizontal federal and vertical federal combined method,
the memory being used for storing the program for implementing the horizontal federal and vertical federal combined method;
the processor being configured to execute the program for implementing the horizontal federal and vertical federal combined method, so as to implement the steps of the horizontal federal and vertical federal combined method according to any one of claims 1 to 7.
10. A medium having stored thereon a program for implementing a horizontal federal and vertical federal combined method, the program being executed by a processor to implement the steps of the horizontal federal and vertical federal combined method according to any one of claims 1 to 7.
CN201911035368.0A 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation Active CN110782042B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911035368.0A CN110782042B (en) 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation
PCT/CN2020/124846 WO2021083276A1 (en) 2019-10-29 2020-10-29 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911035368.0A CN110782042B (en) 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation

Publications (2)

Publication Number Publication Date
CN110782042A true CN110782042A (en) 2020-02-11
CN110782042B CN110782042B (en) 2022-02-11

Family

ID=69387208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911035368.0A Active CN110782042B (en) 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation

Country Status (2)

Country Link
CN (1) CN110782042B (en)
WO (1) WO2021083276A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113490184B (en) * 2021-05-10 2023-05-26 北京科技大学 Random access resource optimization method and device for intelligent factory
CN113238867B (en) * 2021-05-19 2024-01-19 浙江凡双科技股份有限公司 Federal learning method based on network unloading
CN113515890B (en) * 2021-05-21 2024-03-08 华北电力大学 Renewable energy day-ahead scene generation method based on federal learning
CN113435604B (en) * 2021-06-16 2024-05-07 清华大学 Federal learning optimization method and device
CN113536667B (en) * 2021-06-22 2024-03-01 同盾科技有限公司 Federal model training method, federal model training device, readable storage medium and federal model training device
CN113467952B (en) * 2021-07-15 2024-07-02 北京邮电大学 Distributed federal learning collaborative computing method and system
CN114363176B (en) * 2021-12-20 2023-08-08 中山大学 Network identification method, device, terminal and medium based on federal learning
CN114916015A (en) * 2022-05-12 2022-08-16 东南大学 Cooperative caching method based on quantitative federal reinforcement learning in fog wireless access network
CN117273119A (en) * 2023-08-24 2023-12-22 北京邮电大学 Dynamic fairness federal learning method and device based on reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782042B (en) * 2019-10-29 2022-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180316502A1 (en) * 2017-04-27 2018-11-01 Factom Data Reproducibility Using Blockchains
CN109167695A (en) * 2018-10-26 2019-01-08 深圳前海微众银行股份有限公司 Alliance Network construction method, equipment and readable storage medium storing program for executing based on federation's study
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Laterally federation's learning method, device, equipment and computer storage medium
CN110245510A (en) * 2019-06-19 2019-09-17 北京百度网讯科技有限公司 Method and apparatus for predictive information

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021083276A1 (en) * 2019-10-29 2021-05-06 深圳前海微众银行股份有限公司 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium
CN111353167A (en) * 2020-02-26 2020-06-30 深圳前海微众银行股份有限公司 Data discrimination method, device, equipment and storage medium based on multiple providers
CN111369042A (en) * 2020-02-27 2020-07-03 山东大学 Wireless service flow prediction method based on weighted federal learning
CN111383094A (en) * 2020-03-06 2020-07-07 深圳前海微众银行股份有限公司 Product service full-chain driving method, equipment and readable storage medium
CN111383094B (en) * 2020-03-06 2024-06-07 深圳前海微众银行股份有限公司 Product service full-chain driving method, device and readable storage medium
CN111401552A (en) * 2020-03-11 2020-07-10 浙江大学 Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN111401552B (en) * 2020-03-11 2023-04-07 浙江大学 Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN113392101A (en) * 2020-03-13 2021-09-14 京东城市(北京)数字科技有限公司 Method, main server, service platform and system for constructing horizontal federated tree
CN113554476B (en) * 2020-04-23 2024-04-19 京东科技控股股份有限公司 Training method and system of credit prediction model, electronic equipment and storage medium
CN113554476A (en) * 2020-04-23 2021-10-26 京东数字科技控股有限公司 Training method and system of credit prediction model, electronic device and storage medium
CN112001500A (en) * 2020-08-13 2020-11-27 星环信息科技(上海)有限公司 Model training method, device and storage medium based on longitudinal federated learning system
CN112307331B (en) * 2020-10-14 2023-11-24 湖南天河国云科技有限公司 Intelligent recruitment information pushing method, system and terminal equipment for college graduates based on blockchain
CN112307331A (en) * 2020-10-14 2021-02-02 湖南天河国云科技有限公司 Block chain-based college graduate intelligent recruitment information pushing method and system and terminal equipment
CN112381428A (en) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 Business allocation method, device, equipment and storage medium based on reinforcement learning
CN112381428B (en) * 2020-11-19 2023-09-19 平安科技(深圳)有限公司 Service distribution method, device, equipment and storage medium based on reinforcement learning
CN112486180A (en) * 2020-12-10 2021-03-12 深圳前海微众银行股份有限公司 Vehicle control method, device, equipment, storage medium and program product
CN112560059A (en) * 2020-12-17 2021-03-26 浙江工业大学 Vertical federal model stealing defense method based on neural pathway feature extraction
CN112560059B (en) * 2020-12-17 2022-04-29 浙江工业大学 Vertical federal model stealing defense method based on neural pathway feature extraction
CN112738035A (en) * 2020-12-17 2021-04-30 杭州趣链科技有限公司 Block chain technology-based vertical federal model stealing defense method
CN112560752A (en) * 2020-12-23 2021-03-26 杭州趣链科技有限公司 License plate recognition training method and device based on federal learning and related equipment
CN112560752B (en) * 2020-12-23 2024-03-26 杭州趣链科技有限公司 License plate recognition training method and device based on federal learning and related equipment
WO2022144001A1 (en) * 2020-12-31 2022-07-07 京东科技控股股份有限公司 Federated learning model training method and apparatus, and electronic device
CN113112026B (en) * 2021-04-02 2024-05-28 佳讯飞鸿(北京)智能科技研究院有限公司 Federal learning model optimization method and device
CN113112026A (en) * 2021-04-02 2021-07-13 佳讯飞鸿(北京)智能科技研究院有限公司 Optimization method and device for federated learning model
WO2022226903A1 (en) * 2021-04-29 2022-11-03 浙江大学 Federated learning method for k-means clustering algorithm
CN113516250A (en) * 2021-07-13 2021-10-19 北京百度网讯科技有限公司 Method, device and equipment for federated learning and storage medium
CN113516250B (en) * 2021-07-13 2023-11-03 北京百度网讯科技有限公司 Federal learning method, device, equipment and storage medium
CN113673696B (en) * 2021-08-20 2024-03-22 山东鲁软数字科技有限公司 Power industry hoisting operation violation detection method based on reinforcement federal learning
CN113673696A (en) * 2021-08-20 2021-11-19 山东鲁软数字科技有限公司 Electric power industry hoisting operation violation detection method based on reinforced federal learning
CN114037521A (en) * 2021-11-25 2022-02-11 工银科技有限公司 Financing pre-credit granting method, device, equipment and medium
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN114548426A (en) * 2022-02-17 2022-05-27 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN115169576B (en) * 2022-06-24 2024-02-09 上海富数科技有限公司 Model training method and device based on federal learning and electronic equipment
CN115169576A (en) * 2022-06-24 2022-10-11 上海富数科技有限公司广州分公司 Model training method and device based on federal learning and electronic equipment
WO2024060410A1 (en) * 2022-09-20 2024-03-28 天翼电子商务有限公司 Horizontal and vertical federated learning combined algorithm
CN115238065A (en) * 2022-09-22 2022-10-25 太极计算机股份有限公司 Intelligent document recommendation method based on federal learning
CN115759248B (en) * 2022-11-07 2023-06-13 吉林大学 Financial system analysis method and storage medium based on decentralised hybrid federal learning
CN115759248A (en) * 2022-11-07 2023-03-07 吉林大学 Financial system analysis method and storage medium based on mixed federal learning

Also Published As

Publication number Publication date
CN110782042B (en) 2022-02-11
WO2021083276A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
CN110782042B (en) Method, device, equipment and medium for combining horizontal federation and vertical federation
CN109983480B (en) Training neural networks using cluster loss
CN113627085B (en) Transverse federal learning modeling optimization method, equipment and medium
CN113516255A (en) Federal learning modeling optimization method, apparatus, readable storage medium, and program product
CN113869293B (en) Lane line recognition method and device, electronic equipment and computer readable medium
CN113408743A (en) Federal model generation method and device, electronic equipment and storage medium
CN110020022B (en) Data processing method, device, equipment and readable storage medium
CN113095512A (en) Federal learning modeling optimization method, apparatus, medium, and computer program product
US20200125942A1 (en) Synthesizing a singular ensemble machine learning model from an ensemble of models
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN110825589A (en) Anomaly detection method and device for micro-service system and electronic equipment
CN113792892A (en) Federal learning modeling optimization method, apparatus, readable storage medium, and program product
CN111710153B (en) Traffic flow prediction method, device, equipment and computer storage medium
Zhou et al. Enhancing quality of service through federated learning in edge-cloud architecture
CN113516254A (en) Method, apparatus, medium, and program product for optimizing horizontal federated learning modeling
CN112541556A (en) Model construction optimization method, device, medium, and computer program product
CN116781788A (en) Service decision method and service decision device
CN115577797A (en) Local noise perception-based federated learning optimization method and system
CN113435502B (en) Site flow determination method, device, equipment and storage medium
CN113762972A (en) Data storage control method and device, electronic equipment and storage medium
CN114528893A (en) Machine learning model training method, electronic device and storage medium
CN111882415A (en) Training method and related device of quality detection model
CN114841355A (en) Joint learning method and system based on attention mechanism
CN111445030A (en) Federal modeling method, device and readable storage medium based on stepwise regression method
CN114692888A (en) System parameter processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant