WO2023024378A1 - Training method and apparatus for a multi-agent model, electronic device, storage medium and program product - Google Patents

Training method and apparatus for a multi-agent model, electronic device, storage medium and program product

Info

Publication number
WO2023024378A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter value
parameter
unpredictable
parameters
agent model
Prior art date
Application number
PCT/CN2021/142157
Other languages
English (en)
French (fr)
Inventor
何元钦
康焱
刘洋
陈天健
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023024378A1 publication Critical patent/WO2023024378A1/zh

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/50 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/80 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a multi-agent model training method, device, electronic equipment, computer readable storage medium and computer program product.
  • Artificial intelligence (AI) refers to theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results.
  • In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the nature of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Research in artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines are capable of perception, reasoning and decision-making.
  • Horizontal federated learning in the related art usually has machine learning models trained jointly by multiple participants and a collaborator. Its goal is to use the limited data of all parties to jointly train a global model under the premise of ensuring data security. Because the global model is trained on the data of every participant, its effect can approach that of training on all participants' data pooled together, which is significantly better than the effect of a model each participant obtains from its own data alone.
  • However, the use of multi-agent models differs greatly from traditional machine learning, and federated learning cannot be applied to multi-agent model verification by simply following the traditional federated machine-learning training procedure.
  • Embodiments of the present application provide a multi-agent model training method, device, electronic device, computer-readable storage medium, and computer program product, which can improve model prediction accuracy while ensuring local data security.
  • An embodiment of the present application provides a multi-agent model training method based on a federated learning system, where the system includes a collaborator device and at least two participant devices and the method is executed by a participant device; the method includes:
  • the participant device inputs the training parameter values of the predictable parameters into the local multi-agent model, and, with the training parameter values fixed, respectively inputs multiple parameter value groups into the multi-agent model for prediction to obtain multiple prediction results;
  • wherein each parameter value group includes the parameter value of at least one unpredictable parameter;
  • the parameter values of each of the unpredictable parameters are aggregated to obtain intermediate parameter values corresponding to each of the unpredictable parameters;
  • the embodiment of the present application also provides a multi-agent model training device, the device comprising:
  • the acquisition module is configured to have the participant device input the training parameter values of the predictable parameters into the local multi-agent model and, with the training parameter values fixed, respectively input multiple parameter value groups into the multi-agent model for prediction to obtain multiple prediction results; wherein each parameter value group includes the parameter value of at least one unpredictable parameter;
  • a comparison module configured to determine an impact factor for each of the parameter value groups based on the plurality of prediction results and actual results corresponding to each of the prediction results;
  • An aggregation module configured to aggregate the parameter values of each of the unpredictable parameters based on each of the parameter value groups and the corresponding impact factors, to obtain intermediate parameter values corresponding to each of the unpredictable parameters;
  • a sending module configured to send the obtained intermediate parameter value to a cooperating device, where the intermediate parameter value is used to trigger the cooperating device to aggregate the intermediate parameter values sent by multiple participant devices , to obtain the target parameter value corresponding to each of the unpredictable parameters;
  • the update module is configured to receive target parameter values corresponding to each of the unpredictable parameters returned by the coordinating device, and update the multi-agent model based on the target parameter values.
  • An embodiment of the present application provides an electronic device, including:
  • a memory configured to store executable instructions; and
  • a processor configured to implement the multi-agent model training method provided in the embodiments of the present application when executing the executable instructions stored in the memory.
  • An embodiment of the present application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to implement the multi-agent model training method provided in the embodiments of the present application.
  • An embodiment of the present application provides a computer program product, including a computer program, and when the computer program is executed by a processor, the multi-agent model training method provided in the embodiment of the present application is implemented.
  • With the multi-agent model training method, device, electronic equipment, computer-readable storage medium and computer program product based on the horizontal federated learning architecture, each participant obtains intermediate parameter values by locally aggregating its unpredictable parameters and sends them to the collaborator, and the multi-agent model is updated based on the target parameter values obtained by the collaborator through a second aggregation of the received intermediate parameter values.
  • In this way, the security of local data is ensured, the problem of data islands in the field of multi-agent models is solved, and joint modeling among multiple parties is realized, thereby improving the accuracy of model prediction.
  • FIG. 1 is a schematic diagram of the implementation scenario of the multi-agent model training method provided by the embodiments of the present application;
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
  • FIG. 3 is a diagram comparing the verification process of the multi-agent model provided by the embodiments of the present application with the training process of a machine learning model;
  • FIG. 4 is a schematic flowchart of the multi-agent model training method provided by the embodiments of the present application;
  • FIG. 5 is an optional schematic flowchart of the multi-agent model training method provided by the embodiments of the present application;
  • FIG. 6A is an optional schematic diagram of the aggregation of unpredictable parameters of the multi-agent model provided by an embodiment of the present application;
  • FIG. 6B is an optional schematic diagram of the aggregation of unpredictable parameters of the multi-agent model provided by an embodiment of the present application;
  • FIG. 7A is an optional flowchart of the multi-agent model training method provided by an embodiment of the present application;
  • FIG. 7B is an optional flowchart of the multi-agent model training method provided by an embodiment of the present application;
  • FIG. 8 is a schematic flowchart of the multi-agent model prediction method provided by an embodiment of the present application;
  • FIG. 9 is a schematic flowchart of the multi-agent model training method provided by an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a horizontal federated learning method for a multi-agent model provided by an embodiment of the present application;
  • FIG. 11 is an optional schematic diagram of the aggregation of unpredictable parameters of the multi-agent model provided by an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of the multi-agent model training device provided by an embodiment of the present application;
  • FIG. 13 is a schematic structural diagram of the multi-agent model prediction device provided by an embodiment of the present application.
  • The terms "first", "second" and "third" are only used to distinguish similar objects and do not denote a specific order of objects. It can be understood that, where permitted, the specific order or sequence of "first", "second" and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
  • Federated learning refers to machine learning carried out jointly by different participants (also called parties, data owners, or clients). In federated learning, participants do not need to expose their own data to other participants or to the coordinator (also known as the parameter server or aggregation server), so federated learning can effectively protect user privacy and ensure data security.
  • Horizontal federated learning applies when the data features of the participants overlap substantially but their users overlap little: the portions of data that share the same features but belong to different users are taken out for joint machine learning. For example, two banks in different regions have user groups drawn from their respective regions, with very little intersection, but their businesses are very similar and most of the recorded user data features are the same. Horizontal federated learning can be used to help the two banks build a joint model to predict customer behavior.
  • Multi-agent simulation (agent-based simulation, ABS) is a computational model used to simulate the actions and interactions of agents (independent individuals or groups, such as organizations and teams).
  • A multi-agent model is a microscopic model that reproduces and predicts complex phenomena by simulating the simultaneous actions and interactions of multiple agents; this process is an emergence from a low (micro) level to a high (macro) level.
  • ABS can be used to simulate urban traffic conditions and disease transmission.
  • For example, ABS can be used to simulate the spread of the novel coronavirus (COVID-19) to help predict the development of the epidemic and to analyze how different intervention measures suppress it.
  • Homomorphic encryption is an encryption algorithm.
  • The purpose of homomorphic encryption is to find an encryption algorithm that supports addition and multiplication operations on ciphertexts, such that the result obtained by performing an operation on the ciphertexts is exactly the ciphertext that would be obtained by performing the corresponding operation on the plaintexts before encryption and then encrypting the result.
  • Homomorphic encryption effectively ensures that a data processor can operate directly on ciphertexts without learning the plaintext of the data it processes. This property allows users' data and privacy to be protected, so homomorphic encryption is applied in many real-world scenarios to ensure data security.
  • If an encryption function satisfies both additive homomorphism and multiplicative homomorphism, it is called fully homomorphic encryption.
  • With such an encryption function, various operations on encrypted data can be completed: addition, subtraction, multiplication, division, polynomial evaluation, exponentiation, logarithms, trigonometric functions, and so on.
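  • As a rough illustration of the homomorphic property described above (not taken from the patent), the sketch below assumes the third-party python-paillier (phe) package is available. Paillier is only additively homomorphic, so it demonstrates the additive case rather than fully homomorphic encryption; all values are made up.

```python
from phe import paillier  # assumption: the python-paillier (phe) package is installed

public_key, private_key = paillier.generate_paillier_keypair()

a, b = 3.5, 4.25
enc_a, enc_b = public_key.encrypt(a), public_key.encrypt(b)

# Ciphertext + ciphertext and ciphertext * plaintext-scalar are supported,
# and decrypting the result equals operating on the plaintexts directly.
enc_sum = enc_a + enc_b
enc_scaled = enc_a * 2

assert abs(private_key.decrypt(enc_sum) - (a + b)) < 1e-9
assert abs(private_key.decrypt(enc_scaled) - 2 * a) < 1e-9
```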
  • A well-built ABS model can be applied to different regions: its predictable parameters (such as population age and sex ratio) only need to be adjusted according to the situation of the target region and the values of its unpredictable parameters verified, after which the model can be used to predict and analyze the subsequent development of an outbreak in the target region.
  • The larger the area covered by the simulation and the more agents used to build the model, the better the model performs and the more accurately it reflects the real situation of the system.
  • The embodiments of the present application provide a multi-agent model training method, device, electronic equipment, computer-readable storage medium and computer program product, so that multiple participant devices can jointly train a multi-agent model under the coordination of the collaborator device while ensuring the security of local data, thereby solving the problem of data islands in the field of multi-agent models.
  • The implementation scenario of the multi-agent model training method provided by the embodiments of the present application is described below; see FIG. 1, a schematic diagram of this implementation scenario.
  • The participant devices 200-1, 200-2, ..., 200-n are connected to the collaborator device 400 through the network 300, where the participant devices 200-1, 200-2, ..., 200-n may be institutions that store predictable parameters, unpredictable parameters and real values of the prediction targets, such as hospitals, and the collaborator device 400 may be a trusted institution.
  • The participant devices 200-1, 200-2, ..., 200-n and the collaborator device 400 assist each other in federated learning so that the participant devices 200-1, 200-2, ..., 200-n can obtain a trained multi-agent model.
  • the network 300 may be a wide area network or a local area network, or a combination of the two, using wireless or wired links for data transmission.
  • The participant devices are used to input the training parameter values of the predictable parameters into the local multi-agent model and, with the training parameter values fixed, respectively input multiple parameter value groups into the multi-agent model for prediction to obtain multiple prediction results, where each parameter value group includes the parameter value of at least one unpredictable parameter; determine the impact factor of each parameter value group based on the multiple prediction results and the actual results corresponding to each prediction result; aggregate the parameter values of each unpredictable parameter based on each parameter value group and the corresponding impact factor to obtain the intermediate parameter value corresponding to each unpredictable parameter; and send the obtained intermediate parameter values to the collaborator device.
  • The collaborator device (for example, the collaborator device 400) is configured to aggregate the intermediate parameter values sent by the multiple participant devices to obtain the target parameter value corresponding to each unpredictable parameter, and to send the target parameter values to the participant devices.
  • The participant devices are also used to receive the target parameter values corresponding to the unpredictable parameters returned by the collaborator device, and to update the local multi-agent model based on the target parameter values.
  • Here, the trained multi-agent model can be applied to modeling the COVID-19 epidemic that has recently spread around the world, realizing joint modeling among multiple cities, regions and countries, improving the prediction accuracy of the model, and providing more accurate data for the public and policymakers.
  • The participant devices 200-1, 200-2, ..., 200-n and the collaborator device 400 may be independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data and artificial intelligence platforms.
  • The participant devices 200-1, 200-2, ..., 200-n and the collaborator device 400 can also be smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches and the like, but are not limited thereto.
  • the participant devices 200-1, 200-2, . . . , 200-n and the cooperating device 400 may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 200 shown in FIG. 2 includes: at least one processor 210 , a memory 250 , at least one network interface 220 and a user interface 230 .
  • Various components in the electronic device 200 are coupled together through the bus system 240 .
  • the bus system 240 is used to realize connection and communication between these components.
  • the bus system 240 also includes a power bus, a control bus and a status signal bus.
  • For the sake of clarity, the various buses are all labeled as the bus system 240 in FIG. 2.
  • The processor 210 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
  • User interface 230 includes one or more output devices 231 that enable presentation of media content, including one or more speakers and/or one or more visual displays.
  • the user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
  • Memory 250 may be removable, non-removable or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 250 optionally includes one or more storage devices located physically remote from processor 210 .
  • Memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory.
  • the non-volatile memory can be read-only memory (Read Only Memory, ROM), and the volatile memory can be random access memory (Random Access Memory, RAM).
  • memory 250 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
  • Operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • Network communication module 252, for reaching other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wireless Fidelity (WiFi) and Universal Serial Bus (USB);
  • the input processing module 253 is configured to detect one or more user inputs or interactions from one or more of the input devices 232 and translate the detected inputs or interactions.
  • The multi-agent model training device provided by the embodiments of the present application can be realized in software.
  • FIG. 2 shows a multi-agent model training device 254 stored in the memory 250, which may be software in the form of a program, a plug-in or the like, and includes the following software modules: an acquisition module 2541, a comparison module 2542, an aggregation module 2543, a sending module 2544 and an update module 2545. These modules are logical and can therefore be arbitrarily combined or further split according to the functions to be realized; the functions of each module are explained below.
  • the multi-agent model training device provided by the embodiment of the present application can be realized by combining software and hardware.
  • The multi-agent model training device provided by the embodiments of the present application can also be implemented in hardware.
  • As an example, it may be a processor in the form of a hardware decoding processor that is programmed to execute the multi-agent model training method provided by the embodiments of the present application.
  • For example, a processor in the form of a hardware decoding processor may use one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA) or other electronic components.
  • The process of obtaining an updated multi-agent model specifically includes building an initial multi-agent model (model construction), verifying the multi-agent model (verification process) and testing the multi-agent model (testing process).
  • The construction of the initial multi-agent model refers to initializing the model parameters, presetting the loss function (used for updating the multi-agent model) and so on; the verification process refers to optimizing the values of the unpredictable parameters; the testing process refers to testing the correctness of the multi-agent model by examining the output results of the model.
  • Correspondingly, the process of obtaining a converged machine learning model specifically includes building an initial machine learning model, training the machine learning model through iterative updates, and testing the machine learning model.
  • As shown in FIG. 3, the verification process of the multi-agent model on real data is similar to the training process in machine learning, that is, the values of the unpredictable parameters are optimized so that the results predicted by the model are as close as possible to the real data.
  • Fig. 4 is a schematic flow chart of the training method of the multi-agent model provided by the embodiment of the present application
  • the training method of the multi-agent model provided by the embodiment of the present application includes:
  • Step 101, the participant device inputs the training parameter values of the predictable parameters into the local multi-agent model, and in the case of fixing the training parameter values, respectively inputs multiple parameter value groups into the multi-agent model for prediction to obtain multiple prediction results; wherein each parameter value group includes the parameter value of at least one unpredictable parameter.
  • The values of the predictable parameters here are determined according to the local conditions of each participant; for example, they may be the age, occupation, gender and daily travel trajectories of local residents, or the gender, age, occupation and number of persons infected with the target disease and the movement trajectories of the infected persons. The training parameter values of the predictable parameters depend on the training purpose of the local multi-agent model, that is, different training purposes yield different predictable parameters, and during the training and optimization of a multi-agent model the values of the predictable parameters are fixed. As an example, if the multi-agent model is used to predict the number of local disease deaths, the total number of local residents and the residents' gender, age and so on can be used as the fixed predictable parameters.
  • As another example, the fixed predictable parameter can be the number of contacts between healthy users and sick users; correspondingly, the new disease transmission probability can be determined by changing this predictable parameter, that is, the number of contacts between healthy users and sick users.
  • the parameter value group includes at least one parameter value of an unpredictable parameter.
  • The value of an unpredictable parameter cannot be deduced from existing data or experience; it is obtained by comparing the predicted value produced when the unpredictable parameter is brought into the model with the corresponding real value. That is, by adjusting the value of the unpredictable parameter so that the model result is consistent with the actual prediction target, the optimal value is determined, and the accuracy of the simulation result is verified on test data; in other words, an appropriate value of the unpredictable parameter is selected so that the simulation results of the model conform to the real data (distribution) as much as possible.
  • Referring to FIG. 5, an optional schematic flowchart of the multi-agent model training method provided by the embodiments of the present application, step 101 of FIG. 4 can also be implemented in the following manner:
  • Step 1011 acquire the number of unpredictable parameters, and determine the number of parameter value groups based on the number of unpredictable parameters.
  • the number of unpredictable parameters that need to be optimized is determined, so that the number of parameter value groups is determined based on the number of unpredictable parameters.
  • Exemplarily, when the number of unpredictable parameters to be optimized is n, the number of parameter value groups may be n+1.
  • Step 1012 based on the number of parameter value groups, determine the parameter values of the unpredictable parameters in each parameter value group.
  • parameter values of unpredictable parameters corresponding to the number of parameter value groups are selected.
  • For example, when the number of parameter value groups is n+1, n+1 sets of parameter values are selected as the parameter values of the unpredictable parameters in the respective parameter value groups.
  • Exemplarily, the parameter value groups are four groups A, B, C and D, where the parameter values of the unpredictable parameters are A(a1, b1, c1, d1), B(a2, b2, c2, d2), C(a3, b3, c3, d3) and D(a4, b4, c4, d4).
  • Here, selecting the parameter values of the unpredictable parameters includes obtaining the parameter type of each unpredictable parameter in the parameter value group, determining the corresponding parameter value range according to the parameter type of each unpredictable parameter, and then determining the parameter value of each unpredictable parameter according to its parameter value range.
  • the unpredictable parameter can be the transmission coefficient of the disease, or it can be the influence of weather, age, gender, etc. on the transmission of the disease.
  • For example, the value range of the unpredictable parameter is determined to be 0 to K, and the parameter value of the unpredictable parameter is then randomly selected from the range 0 to K.
  • If a is an unpredictable parameter to be optimized with a value range of 0 to K, then a1, a2, a3 and a4 are all parameter values in (0, K).
  • Step 1013 respectively input the parameter values of the unpredictable parameters in each parameter value group to the multi-agent model for prediction, and obtain multiple prediction results corresponding to multiple parameter value groups.
  • A(a1, b1, c1, d1), B(a2, b2, c2, d2), C(a3, b3, c3, d3) and D(a4, b4, c4, d4) are respectively input into the multi-agent model for prediction, and the prediction result corresponding to group A, the prediction result corresponding to group B, the prediction result corresponding to group C and the prediction result corresponding to group D are obtained.
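  • The following is a minimal, non-normative Python sketch of steps 1011 to 1013. The function simulate_abs, the parameter names and ranges, and the fixed predictable parameter are illustrative assumptions rather than content of the patent; a real participant would run its actual agent-based simulation here.

```python
import random

# Hypothetical value ranges for the unpredictable parameters (illustrative only).
PARAM_RANGES = {"a": (0.0, 1.0), "b": (0.0, 5.0), "c": (0.0, 1.0), "d": (0.0, 2.0)}

def simulate_abs(fixed_predictable, unpredictable_values):
    """Placeholder for the local multi-agent (ABS) simulation.
    Returns a scalar prediction (e.g. a predicted death toll)."""
    # A real ABS model would run an agent-based simulation here.
    return sum(unpredictable_values.values()) * fixed_predictable["population"] * 1e-4

def make_parameter_groups(param_ranges, num_unpredictable):
    """Steps 1011/1012: build n+1 parameter value groups, each holding one
    randomly sampled value per unpredictable parameter."""
    num_groups = num_unpredictable + 1
    return [{name: random.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
            for _ in range(num_groups)]

def predict_groups(fixed_predictable, groups):
    """Step 1013: run the model once per parameter value group."""
    return [simulate_abs(fixed_predictable, g) for g in groups]

if __name__ == "__main__":
    fixed = {"population": 100_000}  # training value of a predictable parameter
    groups = make_parameter_groups(PARAM_RANGES, num_unpredictable=len(PARAM_RANGES))
    predictions = predict_groups(fixed, groups)
    print(len(groups), predictions)
```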
  • Step 102 based on multiple prediction results and actual results corresponding to each prediction result, determine the impact factor of each parameter value group.
  • the impact factor can be used to characterize the degree of influence of unpredictable parameters in each parameter value group, that is, to characterize the degree of influence of each parameter value.
  • In some embodiments, determining the impact factor of each parameter value group based on the multiple prediction results and the actual results corresponding to each prediction result includes: determining the prediction accuracy corresponding to each parameter value group, and using the prediction accuracy corresponding to each parameter value group as the corresponding impact factor.
  • Here, the prediction accuracy may be the weight corresponding to each parameter value group.
  • In some embodiments, determining the impact factor of each parameter value group based on the multiple prediction results and the actual results corresponding to each prediction result includes: determining the loss value corresponding to each parameter value group, and determining the impact factor of the corresponding parameter value group based on the loss value corresponding to each parameter value group.
  • Here, the reciprocal of the loss value can be used as the impact factor of the corresponding parameter value group: the larger the loss value, the smaller its reciprocal and the smaller the impact factor. Alternatively, the loss value itself can be used as the impact factor of the corresponding parameter value group: the larger the loss value, the larger the impact factor.
  • the embodiment of the present application does not limit the method of determining the impact factor of the corresponding parameter value group through the loss value.
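  • As an illustration of step 102, the sketch below (an assumption-laden example, not the patent's own code) computes a mean-squared-error loss per parameter value group and converts the losses into impact factors using the reciprocal-of-loss option described above; normalizing the factors lets them double as aggregation weights.

```python
def mse(predicted, actual):
    """Mean squared error between a predicted series and the actual series."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def impact_factors_from_losses(losses, use_reciprocal=True):
    """Turn per-group loss values into impact factors.

    use_reciprocal=True : larger loss -> smaller factor (reciprocal option);
    use_reciprocal=False: the loss itself is used as the factor (the other option
    mentioned in the text).
    """
    eps = 1e-12  # guard against division by zero
    raw = [1.0 / (loss + eps) for loss in losses] if use_reciprocal else list(losses)
    total = sum(raw)
    return [r / total for r in raw]  # normalized, so they can serve as weights

# Example: losses of groups A, B, C, D against the actual result (made-up numbers).
losses = [mse([90.0], [100.0]), mse([105.0], [100.0]), mse([120.0], [100.0]), mse([60.0], [100.0])]
weights = impact_factors_from_losses(losses)
```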
  • Step 103 based on each parameter value group and the corresponding impact factor, the parameter values of each unpredictable parameter are aggregated to obtain the intermediate parameter value corresponding to each unpredictable parameter.
  • In some embodiments, the weight corresponding to each parameter value group is multiplied by the parameter values of the unpredictable parameters to obtain the product result corresponding to each parameter value group; the product results corresponding to each parameter value group are then accumulated to obtain an accumulation result, and the accumulation result is finally used as the intermediate parameter value of the unpredictable parameters.
  • Exemplarily, if the parameter value groups are A(a1, b1, c1, d1), B(a2, b2, c2, d2), C(a3, b3, c3, d3) and D(a4, b4, c4, d4), and the corresponding weights are x, y, z and k, then the intermediate parameter value P of the unpredictable parameters is (a1*x + a2*y + a3*z + a4*k, b1*x + b2*y + b3*z + b4*k, c1*x + c2*y + c3*z + c4*k, d1*x + d2*y + d3*z + d4*k).
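  • A minimal sketch of this weighted aggregation is given below; the group values and weights are made-up numbers for illustration only, and the weights are assumed to already sum to 1 (as with the normalized impact factors above).

```python
def weighted_aggregate(groups, weights):
    """Weighted aggregation of parameter value groups:
    per unpredictable parameter, P = sum_i weight_i * value_i."""
    names = groups[0].keys()
    return {name: sum(w * g[name] for g, w in zip(groups, weights)) for name in names}

# Groups A, B, C, D with weights x, y, z, k (illustrative values).
A = {"a": 0.2, "b": 1.0, "c": 0.5, "d": 0.1}
B = {"a": 0.4, "b": 2.0, "c": 0.6, "d": 0.3}
C = {"a": 0.1, "b": 1.5, "c": 0.4, "d": 0.2}
D = {"a": 0.3, "b": 0.5, "c": 0.7, "d": 0.4}
P = weighted_aggregate([A, B, C, D], weights=[0.4, 0.3, 0.2, 0.1])
print(P)  # one intermediate parameter value per unpredictable parameter
```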
  • In some embodiments, the multiple parameter value groups are sorted based on the impact factor of each parameter value group to obtain a sorting result; based on the sorting result, a target number of parameter value groups is selected from the multiple parameter value groups, where the target number is smaller than the number of the multiple parameter value groups; the average of the parameter values of the unpredictable parameters in the target number of parameter value groups is obtained, and this average serves as the intermediate parameter value of the unpredictable parameters.
  • Here, when the impact factor is the reciprocal of the loss value, the multiple parameter value groups are sorted in descending or ascending order of the loss values, and a target number of parameter value groups is then selected from the sorted parameter value groups, where the target number is smaller than the number of parameter value groups.
  • Exemplarily, if the parameter value groups are A(a1, b1, c1, d1), B(a2, b2, c2, d2), C(a3, b3, c3, d3) and D(a4, b4, c4, d4), the optimal model parameter value group A, the worst model parameter value group D and the other model parameter value groups B and C are determined based on the magnitude of the loss values.
  • Here, the process of aggregating the parameter values of the unpredictable parameters in the selected target number of parameter value groups includes obtaining the average of the parameter values of the unpredictable parameters in the target number of parameter value groups, and then using the average as the intermediate parameter value of the unpredictable parameters.
  • The process of using the average as the intermediate parameter value of the unpredictable parameters is as follows: exemplarily, when n unpredictable parameters are to be optimized, n parameter value groups are selected from the n+1 parameter value groups, and the parameter values of the corresponding unpredictable parameters in the n parameter value groups are averaged to serve as the intermediate parameter values of the unpredictable parameters.
  • In some embodiments, the average value can also be used to update the multi-agent model, and the average value and the selected target number of parameter value groups are then aggregated again; that is, a target number of parameter value groups is selected anew, and the parameter values of the unpredictable parameters in the newly selected parameter value groups are averaged.
  • The above process of updating the multi-agent model and aggregating again is iterated, and the average value obtained in the last aggregation is used as the intermediate parameter value of the unpredictable parameters. In this way, each participant iteratively optimizes its unpredictable parameters locally for a preset number of rounds and obtains its final average value, which is the intermediate parameter value.
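  • The sketch below illustrates one round of this sort-and-average aggregation (select the target number of best groups by impact factor and average them); it is an illustrative assumption with made-up values, not the patent's reference implementation.

```python
def top_k_average(groups, impact_factors, k):
    """Sort the parameter value groups by impact factor (largest first), keep the
    k best groups (k must be smaller than the total number of groups) and average
    their values per unpredictable parameter."""
    assert k < len(groups), "the target number must be smaller than the group count"
    ranked = sorted(zip(groups, impact_factors), key=lambda pair: pair[1], reverse=True)
    best = [group for group, _ in ranked[:k]]
    names = best[0].keys()
    return {name: sum(group[name] for group in best) / k for name in names}

# Illustrative values: four groups with their impact factors, keep the best three.
groups = [{"a": 0.2, "b": 1.0}, {"a": 0.4, "b": 2.0}, {"a": 0.1, "b": 1.5}, {"a": 0.3, "b": 0.5}]
factors = [0.4, 0.3, 0.2, 0.1]
intermediate = top_k_average(groups, factors, k=3)
```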
  • FIG. 6B is an optional schematic diagram of the aggregation of unpredictable parameters of a multi-agent model provided by the embodiment of the present application.
  • In other embodiments, the multiple parameter value groups can also be sorted based on the weight of each parameter value group, and a target number of parameter value groups can be selected from the multiple parameter value groups based on the sorting result.
  • Alternatively, the multiple parameter value groups can be sorted based on the loss values; based on the sorting result, a target number of parameter value groups is selected from the multiple parameter value groups, where the target number is smaller than the number of the multiple parameter value groups; the weights corresponding to the selected parameter value groups are then multiplied by the parameter values of the unpredictable parameters to obtain the product result corresponding to each parameter value group; the product results corresponding to each parameter value group are accumulated to obtain an accumulation result, and the accumulation result is finally used as the intermediate parameter value of the unpredictable parameters. The embodiments of the present application do not limit the manner in which the parameter values of the unpredictable parameters are aggregated based on each parameter value group and the corresponding impact factor.
  • Step 104, send the obtained intermediate parameter values to the collaborator device, where the intermediate parameter values are used to trigger the collaborator device to aggregate the intermediate parameter values sent by multiple participant devices to obtain the target parameter value corresponding to each unpredictable parameter.
  • In some embodiments, after the intermediate parameter values are obtained, privacy protection is performed on the intermediate parameter value of each unpredictable parameter to obtain privacy-protected intermediate parameter values.
  • Here, the privacy protection can be obfuscation of the intermediate parameter values, for example by adding noise or applying differential privacy; what the collaborator device obtains are the parameter values produced by at least two participant devices after privacy processing of their intermediate parameter values, and when these parameter values are aggregated the noise in them cancels out without affecting the aggregation result of the intermediate parameter values.
  • the processing method of privacy protection can also be to perform homomorphic encryption on intermediate parameter values.
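  • As a rough illustration of the noise-based option (homomorphic encryption is the other option mentioned above and is not shown here), the sketch below has the participants agree on additive masks that sum to zero, so that the collaborator's aggregation of the masked uploads is unchanged. How the masks would be exchanged securely (for example via pairwise secret sharing) is outside this sketch and is an assumption.

```python
import random

def zero_sum_masks(num_participants, param_names, scale=1.0):
    """One additive mask per participant; per parameter, the masks of all
    participants sum to zero, so averaging the masked uploads is unaffected."""
    masks = [{name: random.gauss(0.0, scale) for name in param_names}
             for _ in range(num_participants - 1)]
    masks.append({name: -sum(m[name] for m in masks) for name in param_names})
    return masks

def mask_intermediate(intermediate, mask):
    """Obfuscate one participant's intermediate parameter values before upload."""
    return {name: value + mask[name] for name, value in intermediate.items()}

# Three participants, two unpredictable parameters (illustrative values).
uploads = [{"a": 0.30, "b": 0.55}, {"a": 0.28, "b": 0.60}, {"a": 0.33, "b": 0.52}]
masks = zero_sum_masks(len(uploads), ["a", "b"])
masked = [mask_intermediate(u, m) for u, m in zip(uploads, masks)]
# The collaborator's average over the masked values equals the true average.
avg = {k: sum(m[k] for m in masked) / len(masked) for k in ["a", "b"]}
```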
  • There are many ways for the collaborator to aggregate the intermediate parameter values sent by the multiple participant devices.
  • For example, the center points uploaded by the participants are simply averaged; or, in addition to uploading its geometric center point, each participant also uploads the loss value of its optimal model parameter value group, of its worst model parameter value group, or of the groups other than the worst model parameter value group, the participants are sorted according to these loss values, and several better center points are selected and averaged to obtain a new center point.
  • the embodiment of the present application does not limit the process of the parameter aggregation operation performed by the coordinating party.
  • Step 105 receiving target parameter values corresponding to unpredictable parameters returned by the coordinating device, and updating the multi-agent model based on the target parameter values.
  • FIG. 7A is an optional flow chart of the multi-agent model training method provided by the embodiment of the present application.
  • As shown in FIG. 7A, the entire model training process is divided into two stages. In the first stage, the local multi-agent model is trained until the model reaches the convergence condition, and the intermediate parameter values at convergence are uploaded to the collaborator device (parameter aggregation device), where the intermediate parameter values are used to trigger the collaborator device to perform the second-stage parameter aggregation operation. To adapt to preliminary or rapid modeling scenarios, the parameter aggregation in the second stage can be performed only once, after which the entire model converges.
  • FIG. 7B is an optional flow chart of the multi-agent model training method provided by the embodiment of the present application.
  • As shown in FIG. 7B, the participants can also perform parameter aggregation only once during local multi-agent model training, that is, upload each intermediate parameter value to the collaborator device, where the intermediate parameter values are used to trigger the collaborator device to perform the second-stage parameter aggregation operation once and then return the aggregated target parameter values to each participant device; each participant device updates its local model, continues to simulate the local multi-agent model based on the updated model, uploads the new intermediate parameter values to the collaborator device, and the above process continues until the local multi-agent model converges.
  • Here, the participant device updates the local multi-agent model based on the target parameter values, then inputs the target parameter values together with the target number of parameter value groups selected before the model update into the updated local multi-agent model, and aggregates the target parameter values and the target number of parameter value groups selected before the model update; that is, a target number of parameter value groups is selected again, the average of the parameter values of the unpredictable parameters in the newly selected parameter value groups is calculated and sent to the collaborator device as intermediate parameter values, and the above process continues.
  • After the training of the multi-agent model is completed, other uses of the multi-agent model can be realized by changing the predictable parameters to actual parameter values, where the actual parameter values are different from the training parameter values of the predictable parameters. As an example, if the predictable parameters include the gender, age, occupation and number of persons infected with the target disease, the actual parameter values may be the gender, age, occupation and number of persons infected with the target disease in the target area; the actual parameter values are then input into the updated multi-agent model for prediction, so that the number of deaths caused by the target disease in the target area can be obtained.
  • In this way, the multi-agent model is used to predict disease-related data with improved prediction accuracy, so that the disease-related situation can be controlled in time, medical resources can be dispatched quickly, and disease prevention and control can be carried out in a timely manner.
  • In this way, the multi-agent model is updated, and the values of the unpredictable parameters are jointly optimized so as to obtain a multi-agent model whose simulation results conform better to real data, while the security of local data is ensured, the problem of data islands in the field of multi-agent models is solved, and joint modeling among multiple participants is realized, thereby improving the prediction accuracy of the model.
  • FIG. 8 is a schematic flowchart of the multi-agent model prediction method provided by the embodiments of the present application; the prediction method based on the multi-agent model provided by the embodiments of the present application includes:
  • step 201 the participant device acquires an actual parameter value of a predictable parameter, wherein the actual parameter value is different from a training parameter value of the predictable parameter.
  • obtaining the actual parameter values of the predictable parameters includes obtaining the total number of residents in the target area, the sex, age, and occupation of the residents, and the sex, age, occupation of the target disease infected person, and the activity track of the infected person.
  • the target area can be a certain city or a certain country
  • the target disease can be a new type of disease with strong transmission
  • the target disease infected person can be at least one foreign disease infected person who flows into the target area from an area outside the target area , or it could be a free-moving local spreader not subject to disease control in the target area.
  • Step 202 input actual parameter values into the updated multi-agent model for prediction, and obtain corresponding prediction results.
  • Here, the acquired total number of residents in the target area, the residents' gender, age and occupation, the gender, age and occupation of persons infected with the target disease, and the activity trajectories of the infected persons are input into the updated multi-agent model, which can predict the impact of the infected persons on the residents of the target area, that is, obtain the number of new infections in the target area caused by the infected persons.
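  • A highly simplified sketch of using the updated model with actual (deployment-time) parameter values is given below; the stand-in model, the parameter names and the numbers are assumptions for illustration only and do not represent a real ABS simulation.

```python
def predict_with_actual_values(model, actual_predictable, target_values):
    """Run the updated multi-agent model with the actual predictable parameter
    values of the target area and the optimized (target) unpredictable parameters."""
    return model(actual_predictable, target_values)

# Illustrative stand-in model and inputs for a target area.
stand_in_model = lambda pred, unpred: pred["residents"] * unpred["transmission"] * 0.01
new_infections = predict_with_actual_values(
    stand_in_model,
    actual_predictable={"residents": 500_000},   # e.g. total residents of the target area
    target_values={"transmission": 0.12},        # optimized unpredictable parameter
)
```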
  • In this way, the updated multi-agent model can accurately predict the impact of persons infected with the target disease on the target area, that is, the number of infections, so that medical resources can be fully prepared, infected persons can be treated in time, and the problem of rising mortality due to insufficient medical resources can be avoided.
  • In some embodiments, the updated multi-agent model can also be used to predict urban traffic conditions, that is, to predict the number of vehicles congested on a target road segment in a target area within a target time period in the future. This specifically includes obtaining the actual parameter values of the predictable parameters for the target area, such as population travel trajectories, office area distribution and holiday times; here, the target area can be different central areas of a city.
  • The acquired population travel trajectories, office area distribution, holiday times and so on are input into the updated multi-agent model, which can predict the number of congested vehicles on the target road segment in the target area within the target time period.
  • In this way, the updated multi-agent model can accurately predict the congestion situation of the target road segment in the target area within the target time period, so that traffic control can be implemented in a timely manner.
  • In this way, the multi-agent model is updated, and the values of the unpredictable parameters are jointly optimized so as to obtain a multi-agent model whose simulation results conform better to real data, while the security of local data is ensured, the problem of data islands in the field of multi-agent models is solved, and joint modeling among multiple participants is realized, thereby improving the prediction accuracy of the model.
  • FIG. 9 is a schematic flowchart of a training method for a multi-agent model provided in an embodiment of the present application, including:
  • Step 301 each participant device initializes a local multi-agent model.
  • Here, each participant is a data holder; the data sets owned by the participants have relatively little user overlap and relatively large overlap in user features, and each participant holds the labels of its corresponding users. For example, the participants can be hospitals in different regions, whose users are residents of different regions (that is, different samples) but whose business is the same (that is, the features are the same); correspondingly, the collaborator device can be a trusted institution.
  • Fig. 10 is a horizontal federated learning method of a multi-agent model provided by the embodiment of the present application.
  • Each participant device has the same multi-agent model, with its own private predictable parameters X_{1,E}, ..., X_{N,E}, its own unpredictable parameters X_{1,V}, ..., X_{N,V}, and the target variables Y_{1,gt}, ..., Y_{N,gt} that the local multi-agent model of each party simulates.
  • Here, the local multi-agent model is initialized by determining the values of the predictable parameters X_E, the structure of the multi-agent model and the prediction target Y_gt, and selecting the unpredictable parameters X_V.
  • Step 302 input the parameter values of the predictable parameters into the local multi-agent model.
  • Here, the private predictable parameters X_{1,E}, ..., X_{N,E} are input into the local ABS model.
  • Step 303 in the case of fixing the parameter value of the predictable parameter, input multiple parameter value groups into the multi-agent model for prediction respectively, and obtain multiple prediction results.
  • Exemplarily, assuming there are two unpredictable parameters, each participant initializes three sets of values (each set can be regarded as a point), and each set includes a value for each of the two parameters. The three sets of parameters are brought into the model for simulation, and the model prediction results corresponding to the three sets of parameters are obtained.
  • Step 304 respectively comparing multiple predicted results with corresponding actual results.
  • For example, if the purpose of the multi-agent model is to predict the number of local deaths, then the actual number of local deaths within a certain period is the actual result, and comparing the multiple predicted results with the corresponding actual results means comparing the predicted death tolls corresponding to [a1, b1], [a2, b2] and [a3, b3] with the local actual death toll.
  • Step 305 based on the comparison result, determine the loss value corresponding to each parameter value group.
  • the mean square error (MSE) is usually used as the loss function to calculate the loss value corresponding to each parameter value group.
  • Step 306 sort the multiple loss values to obtain the optimal model parameter value group, the worst model parameter value group and other model parameter value groups.
  • Step 307 aggregate parameter values of unpredictable parameters of all model parameter value groups except the worst model parameter value group to obtain intermediate parameter values corresponding to each unpredictable parameter.
  • the aggregation of parameter values of unpredictable parameters can be to obtain the geometric center point of the optimal model parameter value group and other model parameter value groups.
  • Exemplarily, the model parameters are updated based on the model parameter value group [(a1+a3)/2, (b1+b3)/2] corresponding to the geometric center point C, and [a1, b1], [a3, b3] and [(a1+a3)/2, (b1+b3)/2] continue to be brought into the updated model for simulation to obtain the prediction results corresponding to these three model parameter value groups; the process of step 304 to step 307 then continues, so that each participant iteratively optimizes its own unpredictable parameters locally for N_L rounds and obtains its final geometric center point C_{i,V}^{t+1}, that is, the intermediate parameter value.
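  • A compact Python sketch of this local iteration (steps 304 to 307) is given below for the two-parameter example. The stand-in simulation function, the actual data and the number of rounds N_L are illustrative assumptions; a real participant would call its ABS model instead.

```python
def local_optimize(simulate, actual, points, num_rounds):
    """Simplified local loop of steps 304-307 for two unpredictable parameters:
    score each point with MSE, replace the worst point with the geometric center
    of the remaining points, repeat for num_rounds, and return the final center
    point (the intermediate parameter value C_{i,V}^{t+1})."""
    def loss(point):
        predicted = simulate(point)  # e.g. predicted daily death toll
        return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

    for _ in range(num_rounds):
        points.sort(key=loss)                    # best first, worst last
        kept = points[:-1]                       # all groups except the worst
        center = [sum(coord) / len(kept) for coord in zip(*kept)]
        points[-1] = center                      # the center replaces the worst group
    points.sort(key=loss)
    kept = points[:-1]
    return [sum(coord) / len(kept) for coord in zip(*kept)]

# Toy stand-in for the ABS simulation: prediction over three days from (a, b).
toy_simulate = lambda p: [p[0] * 10 + p[1] * 5 + day for day in range(3)]
center_point = local_optimize(toy_simulate, actual=[7.0, 8.0, 9.0],
                              points=[[0.1, 0.2], [0.5, 0.9], [0.9, 0.4]], num_rounds=10)
```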
  • Step 308 sending the intermediate parameter value to the partner device.
  • Here, the n participant devices send their respective final geometric center points C_{i,V}^{t+1} to the collaborator device.
  • step 309 the coordinating device aggregates the received intermediate parameter values to obtain target parameter values corresponding to each unpredictable parameter.
  • Step 310 sending the target parameter value to each participant device.
  • Here, the collaborator device sends the target parameter values C_{Server,V}^{t+1} corresponding to each unpredictable parameter obtained through aggregation to the n participant devices.
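  • A minimal sketch of this second-stage aggregation on the collaborator side (steps 309 and 310) is shown below; simple averaging of the uploaded center points is assumed, which is only one of the aggregation options mentioned earlier, and the values are made up.

```python
def server_aggregate(center_points):
    """Average the geometric center points uploaded by the n participants, per
    unpredictable parameter, to obtain the target parameter values C_{Server,V}^{t+1}."""
    n = len(center_points)
    dims = len(center_points[0])
    return [sum(point[i] for point in center_points) / n for i in range(dims)]

# e.g. three participants upload their C_{i,V}^{t+1} points for two parameters.
target_values = server_aggregate([[0.30, 0.55], [0.28, 0.60], [0.33, 0.52]])
# target_values is then sent back to every participant device (step 310).
```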
  • Step 311 update the multi-agent model based on the target parameter value.
  • After obtaining the target parameter values, that is, the optimized unpredictable parameters, the participant device optimizes the local multi-agent model according to these unpredictable parameters.
  • In this way, the multi-agent model is updated, and the values of the unpredictable parameters are jointly optimized so as to obtain a multi-agent model whose simulation results conform better to real data, while the security of local data is ensured, the problem of data islands in the field of multi-agent models is solved, and joint modeling among multiple participants is realized, thereby improving the prediction accuracy of the model.
  • Referring to FIG. 12, a schematic structural diagram of the multi-agent model training device 254 provided by the embodiments of the present application, the multi-agent model training device 254 comprises:
  • the obtaining module 2541, configured to have the participant device input the training parameter values of the predictable parameters into the local multi-agent model and, with the training parameter values fixed, respectively input multiple parameter value groups into the multi-agent model for prediction to obtain multiple prediction results; wherein each parameter value group includes the parameter value of at least one unpredictable parameter;
  • the comparison module 2542 is configured to determine the impact factor of each of the parameter value groups based on the plurality of prediction results and the actual results corresponding to each of the prediction results;
  • the aggregation module 2543 is configured to aggregate the parameter values of each of the unpredictable parameters based on each of the parameter value groups and the corresponding impact factors, to obtain intermediate parameter values corresponding to each of the unpredictable parameters;
  • the sending module 2544 is configured to send the obtained intermediate parameter value to the cooperating device, where the intermediate parameter value is used to trigger the cooperating device to aggregate the intermediate parameter values sent by multiple participant devices Processing to obtain target parameter values corresponding to each of the unpredictable parameters;
  • the updating module 2545 is configured to receive target parameter values corresponding to the unpredictable parameters returned by the cooperating device, and update the multi-agent model based on the target parameter values.
  • In some embodiments, the acquisition module 2541 is further configured to acquire the number of unpredictable parameters, and determine the number of parameter value groups based on the number of unpredictable parameters; determine the parameter values of the unpredictable parameters in each parameter value group based on the number of parameter value groups; and respectively input the parameter values of the unpredictable parameters in the parameter value groups into the multi-agent model for prediction to obtain multiple prediction results corresponding to the multiple parameter value groups.
  • the obtaining module 2541 is further configured to obtain the parameter type of each unpredictable parameter in the parameter value group; determine the corresponding parameter value range according to the parameter type corresponding to each unpredictable parameter; The parameter value range of each unpredictable parameter determines the parameter value of each unpredictable parameter.
  • the comparison module 2542 is further configured to determine the prediction accuracy corresponding to each parameter value group based on the prediction result corresponding to each parameter value group and the corresponding actual result; The prediction accuracy corresponding to each of the parameter value groups is used as the corresponding impact factor.
  • the aggregation module 2543 is further configured to multiply the prediction accuracy corresponding to each parameter value group by the parameter value of the unpredictable parameter in that group to obtain a product result for each parameter value group; accumulate the product results of all parameter value groups to obtain an accumulation result; and use the accumulation result as the intermediate parameter value of the unpredictable parameter.
  • the comparison module 2542 is further configured to determine the loss value corresponding to each parameter value group based on the prediction result of that group and the corresponding actual result, and to determine the impact factor of each parameter value group based on its loss value.
  • the aggregation module 2543 is further configured to sort the multiple parameter value groups based on the impact factor of each group to obtain a sorting result; select a target number of parameter value groups from the multiple parameter value groups based on the sorting result, where the target number is smaller than the number of parameter value groups; and aggregate the parameter values of each unpredictable parameter based on the selected target number of parameter value groups, to obtain intermediate parameter values corresponding to each of the unpredictable parameters.
  • the aggregation module 2543 is further configured to obtain the average of the parameter values of the unpredictable parameter in the target number of parameter value groups, and to use the average as the intermediate parameter value of the unpredictable parameter.
  • the sending module 2544 is further configured to apply privacy protection to the intermediate parameter value of each unpredictable parameter to obtain privacy-protected intermediate parameter values, and to send the privacy-protected intermediate parameter values to the cooperating device, where the intermediate parameter values are used to trigger the cooperating device to aggregate the privacy-protected intermediate parameter values sent by multiple participant devices, so as to obtain target parameter values corresponding to each of the unpredictable parameters.
  • the device further includes a second acquisition module 1210 and a prediction module 1220; the second acquisition module 1210 is configured to acquire an actual parameter value of the predictable parameter, where the actual parameter value is different from the training parameter value of the predictable parameter; the prediction module 1220 is configured to input the actual parameter value into the updated multi-agent model for prediction and obtain a corresponding prediction result.
  • The multi-agent model is thus updated based on the target parameter values. When multiple participants train multi-agent models that serve the same purpose, the values of the unpredictable parameters are optimized jointly, so that the simulation results agree better with the real data, local data remains secure, the data-island problem in the field of multi-agent models is resolved, and joint modeling among multiple participants improves the prediction accuracy of the model.
  • FIG. 13 is a schematic structural diagram of the prediction device 1200 based on the multi-agent model; the prediction device 1200 includes:
  • the second acquiring module 1210 is configured to acquire an actual parameter value of the predictable parameter, where the actual parameter value is different from the training parameter value of the predictable parameter;
  • the prediction module 1220 is configured to input the actual parameter values into the updated multi-agent model for prediction, and obtain corresponding prediction results.
  • The multi-agent model is likewise updated based on the target parameter values, with the same effects: jointly optimized unpredictable parameters, simulation results that agree better with the real data, secure local data, resolution of the data-island problem in the field of multi-agent models, joint modeling among multiple participants, and improved prediction accuracy of the model.
  • the embodiment of the present application also provides an electronic device, and the electronic device includes:
  • a memory configured to store executable instructions;
  • a processor configured to implement the multi-agent model training method provided in the embodiment of the present application when executing the executable instructions stored in the memory.
  • the embodiment of the present application also provides a computer program product, including a computer program, and when the computer program is executed by a processor, the multi-agent model training method provided in the embodiment of the present application is implemented.
  • the embodiment of the present application also provides a computer-readable storage medium storing executable instructions; when the executable instructions are executed by a processor, the processor is caused to execute the multi-agent model training method provided in the embodiment of the present application.
  • the computer-readable storage medium can be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or a CD-ROM; it can also be any device that includes one of, or any combination of, the above memories.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example in one or more scripts within a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or sections of code).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • In summary, through the embodiments of the present application, when multiple participants train multi-agent models with the same purpose, the values of the unpredictable parameters are jointly optimized, so as to obtain a multi-agent model whose simulation results agree better with the real data, while ensuring the security of local data, solving the data-island problem in the field of multi-agent models, and realizing joint modeling among multiple participants, thereby improving the prediction accuracy of the model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本申请提供一种多智能体模型的训练方法、装置、电子设备、存储介质及程序产品,包括:参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定训练参数值的情况下,将多个参数值组分别输入至多智能体模型进行预测,得到多个预测结果,以与各预测结果对应的实际结果进行比较,从而确定每个参数值组的影响因子,进而对各不可预测参数的参数值进行聚合,得到对应各不可预测参数的中间参数值并发送至协作方设备,其中,中间参数值用于触发协作方设备对接收的中间参数值进行聚合处理,得到对应各不可预测参数的目标参数值;接收协作方设备返回的对应各不可预测参数的目标参数值,并基于目标参数值对多智能体模型进行更新。

Description

多智能体模型的训练方法、装置、电子设备、存储介质及程序产品
相关申请的交叉引用
本申请基于申请号为202110981895.1、申请日为2021年08月25日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种多智能体模型的训练方法、装置、电子设备、计算机可读存储介质及计算机程序产品。
背景技术
人工智能（Artificial Intelligence，AI）是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能，感知环境、获取知识并使用知识获得最佳结果的理论、方法和技术及应用系统。换句话说，人工智能是计算机科学的一个综合技术，它企图了解智能的实质，并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法，使机器具有感知、推理与决策的功能。
相关技术中的横向联邦学习通常由不同的参与方和一个协作方训练机器学习模型,其目标是利用各方有限的数据,在保障数据安全的前提下,共同训练一个全局模型。该全局模型因为利用了各参与方的数据进行训练,所以模型效果能够逼近将各参与方数据放在一起训练的情况,显著优于各参与方只基于自有数据得到的模型的效果。然而,多智能体的模型的使用与传统的机器学习十分不同,无法按照传统的联邦机器学习模型的训练方式来应用联邦学习解决多方智能体模型的验证。
发明内容
本申请实施例提供一种多智能体模型的训练方法、装置、电子设备、计算机可读存储介质及计算机程序产品,能够在保障本地数据安全的同时,提升模型预测准确度。
本申请实施例提供一种多智能体模型的训练方法，基于联邦学习系统，所述系统包括协作方设备及至少两个参与方设备，所述方法由参与方设备执行，包括：
参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定所述训练参数值的情况下,将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结 果;
其中,所述参数值组包括至少一个不可预测参数的参数值;
基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子;
基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值;
将得到的所述中间参数值发送至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值;
接收所述协作方设备返回的对应各所述不可预测参数的目标参数值,并基于所述目标参数值对所述多智能体模型进行更新。
本申请实施例还提供一种多智能体模型的训练装置,所述装置包括:
获取模块,配置为参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定所述训练参数值的情况下,将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结果;其中,所述参数值组包括至少一个不可预测参数的参数值;
对比模块,配置为基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子;
聚合模块,配置为基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值;
发送模块,配置为将得到的所述中间参数值发送至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值;
更新模块,配置为接收所述协作方设备返回的对应各所述不可预测参数的目标参数值,并基于所述目标参数值对所述多智能体模型进行更新。
本申请实施例提供一种电子设备,包括:
存储器,用于存储可执行指令;
处理器,用于执行所述存储器中存储的可执行指令时,实现本申请实施例提供的多智能体模型的训练方法。
本申请实施例提供一种计算机可读存储介质,存储有可执行指令,用于引起处理器执行时,实现本申请实施例提供的多智能体模型的训练方法。
本申请实施例提供一种计算机程序产品,包括计算机程序,该计算机程序被处理器执行 时实现本申请实施例提供的多智能体模型的训练方法。
本申请实施例具有以下有益效果:
相较于相关技术中多智能体的模型只能由数据拥有方单独训练的方式,应用本申请实施例提供的基于横向联邦学习架构的多智能体模型的训练方法、装置、电子设备、计算机可读存储介质及计算机程序产品,通过参与方在本地对不可预测参数进行聚合后得到中间参数值并发送至协作方,并基于协作方对接收的中间参数值进行二次聚合后得到的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
附图说明
图1是本申请实施例提供的多智能体模型的训练方法的实施场景示意图;
图2是本申请实施例提供的电子设备的结构示意图;
图3是本申请实施例提供的多智能体模型的验证过程和机器学习模型的训练过程的对比图;
图4是本申请实施例提供的多智能体模型的训练方法的流程示意图;
图5是本申请实施例提供的多智能体模型的训练方法的一个可选的流程示意图;
图6A是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图;
图6B是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图;
图7A是本申请实施例提供的多智能体模型训练方法的一个可选的流程示意图;
图7B是本申请实施例提供的多智能体模型训练方法的一个可选的流程示意图;
图8是本申请实施例提供的多智能体模型的预测方法的流程示意图;
图9是本申请实施例提供的多智能体模型的训练方法的流程示意图;
图10是本申请实施例提供的一个多智能体模型的横向联邦学习方法;
图11是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图;
图12是本申请实施例提供的多智能体模型的训练装置的结构示意图;
图13是本申请实施例提供的多智能体模型的预测装置的结构示意图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请作进一步地 详细描述,所描述的实施例不应视为对本申请的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
对本申请实施例进行进一步详细说明之前,对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释。
1)联邦学习(federated learning)是指通过联合不同的参与者(participant,或者party,也称为数据拥有者(data owner),或者客户(client))进行机器学习的方法。在联邦学习中,参与者并不需要向其它参与者和协调者(coordinator,也称为参数服务器(parameter server),或者聚合服务器(aggregation server))暴露自己拥有的数据,因而联邦学习可以很好的保护用户隐私和保障数据安全。
其中,横向联邦学习是在各个参与者的数据特征重叠较多,而用户重叠较少的情况下,取出参与者数据特征相同而用户不完全相同的那部分数据进行联合机器学习。比如有两家不同地区的银行,它们的用户群体分别来自各自所在的地区,相互的交集很小。但是它们的业务很相似,记录的用户数据特征很大部分是相同的。可以使用横向联邦学习来帮助两家银行构建联合模型来预测他们的客户行为。
2)多智能体模型的模拟方法(agent based simulation or agent based modeling,ABS或ABM),是一种用来模拟智能体(独立个体或共同群体,例如组织、团队)的行动和相互作用的计算模型。多智能体模型是一个微观模型,通过模拟多个智能体的同时行动和相互作用以再现和预测复杂现象。这个过程是从低(微观)层次到高(宏观)层次的涌现,通过ABS可以模拟城市交通情况和疾病传播等现象,例如,可以通过ABS模拟新冠病毒的传播,帮助预测新冠病毒疫情的发展情况和分析不同干预手段对疫情的抑制效果。这种场景下,通常涉及到3个部分,1)贴近真实分布的人群模型;2)人群之间的社交网络模型;3)疾病的传播模型;基于以上三部分模型和对应的参数,可以模拟在给定初始感染人数的情况下,疫 情的发展趋势。其中,除了模型中通过数据得到的参数和经验参数(称为可预测参数),还有部分参数的取值无法确定(称为不可预测参数),这部分参数取值就需要通过在真实数据上进行验证(validation)来得到,这里在真实数据上的验证步骤类似于机器学习中的训练步骤,即优化不可预测参数的值,让模型模拟的结果与真实数据尽量接近。一种常用的确定这些参数的方法是基于优化的方法,比如Nelder-Mead Optimization优化方法。
3)同态加密(Homomorphic Encryption,HE)是一种对称加密算法,同态加密的目的是找到一种加密算法,这种加密算法能够在密文上执行加法、乘法运算,使得对加密后的密文进行某种操作所得到的结果,恰好等于对加密前的明文进行预期操作后再加密得到的密文。同态加密有效保证了数据处理方可以直接对数据的密文进行相应的处理,而无法获知其所处理的数据明文信息。同态加密的这一特性使用户的数据和隐私可以得到相应的安全保障,因此,同态加密被应用于许多现实场景来保证数据的安全。
如果一个加密函数同时满足加法同态和乘法同态,称为全同态加密。使用这个加密函数可以完成各种加密后的运算(加减乘除、多项式求值、指数、对数、三角函数等)。
申请人发现，一个构建好的多智能体模型的模拟ABS模型，可以适用于不同的地区，只需要根据目标地区相应的情况调整其可预测参数（如人口的年龄，性别比例等），然后验证得出不可预测参数的值，即可使用该模型在目标地区预测和分析疫情的后续发展情况。通常，参与模拟的区域越大，构建模型使用的智能体越多，模型的效果越好，越能准确反映系统的真实情况。然而由于各地区的人口分布、人口流动情况以及疫情情况数据可能涉及隐私或安全问题，比较敏感，因此这些数据通常只有当地的具有公信力的机构有权限查看，无法汇总到一处用于训练/验证，所以各机构只能基于自有的有限的数据进行验证的模拟，得到的不可预测参数的值往往不是最优结果，模型效果会受到影响，可能导致预测的偏差。
基于此,本申请实施例提供一种多智能体模型训练方法、装置、电子设备、计算机可读存储介质及计算机程序产品,使得多参与方设备在协作方设备的协调下可以共同训练一个多智能体的模型,并保障本地数据的安全,解决多智能体的模型领域的数据孤岛问题。
基于上述对本申请实施例中涉及的名词和术语的解释,下面说明本申请实施例提供的多智能体模型的训练方法的实施场景,参见图1,图1是本申请实施例提供的多智能体模型的训练方法的实施场景示意图,为实现支撑一个示例性应用,参与方设备200-1、200-2、……、200-n通过网络300连接协作方设备400,其中,参与方设备200-1、200-2、……、200-n可以是存储有可预测参数、不可预测参数以及预测目标的真实值的机构,例如可以是医院,协作方设备400可以是具有公信力的机构,参与方设备200-1、200-2、……、200-n和协作方 设备400互相协助进行联邦学习以使参与方设备200-1、200-2、……、200-n得到多智能体模型,网络300可以是广域网或者局域网,又或者是二者的组合,使用无线或有线链路实现数据传输。
参与方设备(包括参与方设备200-1、200-2、……、200-n),用于可预测参数的训练参数值输入至本地的多智能体模型,并在固定训练参数值的情况下,将多个参数值组分别输入至多智能体模型进行预测,得到多个预测结果;其中,参数值组包括至少一个不可预测参数的参数值;基于多个预测结果与各预测结果对应的实际结果,确定每个参数值组的影响因子;基于各参数值组以及相应的影响因子,对各不可预测参数的参数值进行聚合,得到对应各不可预测参数的中间参数值;将得到的中间参数值发送至协作方设备。
协作方设备(包括协作方设备400),用于对多个参与方设备发送的中间参数值进行聚合处理,得到对应各不可预测参数的目标参数值;将目标参数值发送至参与方设备。
参与方设备(包括参与方设备200-1、200-2、……、200-n),还用于接收协作方设备返回的对应各不可预测参数的目标参数值,并基于目标参数值对多智能体模型进行更新。
在实际应用中,训练得到的多智能体模型可以应用于近期在世界蔓延的新冠疫情的建模,实现多城市、多地区、多国家之间共同建模,提升模型预测准确度,为民众和政策制定者提供更为准确的数据。
在实际应用中，参与方设备200-1、200-2、……、200-n和协作方设备400可以是独立的物理服务器，也可以是多个物理服务器构成的服务器集群或者分布式系统，还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络（Content Deliver Network，CDN）、以及大数据和人工智能平台等基础云计算服务的云服务器。参与方设备200-1、200-2、……、200-n和协作方设备400同样可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等，但并不局限于此。参与方设备200-1、200-2、……、200-n和协作方设备400可以通过有线或无线通信方式进行直接或间接地连接，本申请在此不做限制。
下面对本申请实施例提供的实施多智能体模型的训练方法的电子设备的硬件结构做详细说明，电子设备包括但不限于服务器或终端。参见图2，图2是本申请实施例提供的电子设备的结构示意图，图2所示的电子设备200包括：至少一个处理器210、存储器250、至少一个网络接口220和用户接口230。电子设备200中的各个组件通过总线系统240耦合在一起。可以理解的是，总线系统240用于实现这些组件之间的连接通信。总线系统240除包括数据总线之外，还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见，在图2中将各种总线都标为总线系统240。
处理器210可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(Digital Signal Processor,DSP),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
用户接口230包括使得能够呈现媒体内容的一个或多个输出装置231,包括一个或多个扬声器和/或一个或多个视觉显示屏。用户接口230还包括一个或多个输入装置232,包括有助于用户输入的用户接口部件,比如键盘、鼠标、麦克风、触屏显示屏、摄像头、其他输入按钮和控件。
存储器250可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存储器,硬盘驱动器,光盘驱动器等。存储器250可选地包括在物理位置上远离处理器210的一个或多个存储设备。
存储器250包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(Read Only Memory,ROM),易失性存储器可以是随机存取存储器(Random Access Memory,RAM)。本申请实施例描述的存储器250旨在包括任意适合类型的存储器。
在一些实施例中,存储器250能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。
操作系统251，包括用于处理各种基本系统服务和执行硬件相关任务的系统程序，例如框架层、核心库层、驱动层等，用于实现各种基础业务以及处理基于硬件的任务；
网络通信模块252,用于经由一个或多个(有线或无线)网络接口220到达其他计算设备,示例性的网络接口220包括:蓝牙、无线相容性认证(Wireless Fidelity,WiFi)、和通用串行总线(Universal Serial Bus,USB)等;
输入处理模块253,用于对一个或多个来自一个或多个输入装置232之一的一个或多个用户输入或互动进行检测以及翻译所检测的输入或互动。
在一些实施例中,本申请实施例提供的多智能体模型的训练装置可以采用软件方式实现,图2示出了存储在存储器250中多智能体模型的训练装置254,其可以是程序和插件等形式的软件,包括以下软件模块:获取模块2541、对比模块2542、聚合模块2543,发送模块2544,以及更新模块2545,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或进一步拆分,将在下文中说明各个模块的功能。
在另一些实施例中,本申请实施例提供的多智能体模型的训练装置可以采用软硬件结合的方式实现,作为示例,本申请实施例提供的多智能体模型的训练装置可以是采用硬件译码 处理器形式的处理器,其被编程以执行本申请实施例提供的多智能体模型的训练方法,例如,硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(Application Specific Integrated Circuit,ASIC)、DSP、可编程逻辑器件(Programmable Logic Device,PLD)、复杂可编程逻辑器件(Complex Programmable Logic Device,CPLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或其他电子元件。
基于上述对本申请实施例的多智能体模型的训练方法的实施场景及电子设备的说明,下面说明本申请实施例提供的多智能体模型的训练方法。需要说明的是,本申请实施例中的多智能体模型的训练过程与传统机器学习模型的训练过程存在显著差异,参见图3,图3是本申请实施例提供的多智能体模型的验证过程和机器学习模型的训练过程的对比图,基于图3,得到一个更新完成的多智能体模型的过程具体包括构建初始多智能体模型(构建模型)、验证多智能体模型(验证过程)以及测试多智能体模型(测试过程)。其中,构建初始多智能体模型是指对模型参数进行初始化、预设损失函数(用于对多智能体模型进行更新)等;验证过程是指通过预设轮次的迭代来更新模型中的不可预测参数;测试过程是指通过修改模型的输出结果对多智能体模型的正确性进行测试。而得到一个已收敛的机器学习模型的过程具体包括构建初始机器学习模型、训练机器学习模型以及测试机器学习模型,其中,机器学习模型的训练阶段是通过训练样本数据对机器学习模型进行预测轮次的迭代更新。需要说明的是,多智能体模型的在真实数据上的验证过程类似于机器学习中的训练过程,即优化不可预测参数的值,让模型预测的结果与真实数据尽量接近。
参见图4,图4是本申请实施例提供的多智能体模型的训练方法的流程示意图,本申请实施例提供的多智能体模型的训练方法包括:
步骤101,参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定训练参数值的情况下,将多个参数值组分别输入至多智能体模型进行预测,得到多个预测结果;其中,参数值组包括至少一个不可预测参数的参数值。
在实际实施时,这里的可预测参数的取值根据各方本地的情况确定,示例性地,可以是当地居民的年龄、职业、性别以及每天出行轨迹,又或者目标疾病的感染者的性别、年龄、职业,感染人数,以及目标疾病感染者的行动轨迹等;这里,可预测参数的训练参数值是基于本地的多智能体模型的训练目的的差异,所获取的不同的可预测参数,即在对一个多智能体模型进行训练优化的过程中,可预测参数的取值是固定的,作为一个示例,如果该多智能体模型用于预测当地疾病死亡人数,则当地居民的总人数、居民的性别、年龄等是在对该多智能体模型进行训练优化的过程中固定的可预测参数;相应地,在改变该多智能体模型的用 途时,只需调整可预测参数即可实现模型的其他用途,示例性地,当该模型用于预测另一地区的死亡人数,则将可预测参数调整为另一地区居民的总人数、居民的性别、年龄等;又或者该多智能体模型是用于预测疾病的传播概率,则此时固定的可预测参数可以是健康用户与患病用户的接触次数;相应地,可通过改变可预测参数即健康用户与患病用户的接触次数,来确定新的疾病传播概率。
在本申请实施例中,参数值组包括至少一个不可预测参数的参数值,不可预测参数的取值无法从已有数据或经验中推出,需要通过对将不可预测参数带入模型得到的预测值与相应的真实值进行比较从而得到,即通过调整不可预测参数的取值,使得模型结果与实际预测目标相符合,确定其最优值,并在测试数据上验证模拟结果准确性,也就是说,选取合适的不可预测参数的取值,使得模型的模拟结果尽可能符合真实数据(的分布)。
在一些实施例中,针对将多个参数值组分别输入至多智能体模型进行预测,得到多个预测结果的处理过程参见图5,图5是本申请实施例提供的多智能体模型的训练方法的一个可选的流程示意图,基于图4,步骤101还可以通过如下方式实现:
步骤1011,获取不可预测参数的数量,并基于不可预测参数的数量确定参数值组的数量。
在实际实施时,确定需要进行优化的不可预测参数的个数,从而基于不可预测参数的个数确定参数值组的个数。作为一个示例,当需要进行优化的不可预测参数的个数为n个时,参数值组的个数可以为n+1个。
步骤1012,基于参数值组的数量,确定各参数值组中不可预测参数的参数值。
在实际实施时，当确定参数值组的个数后，基于参数值组的个数，选取与参数值组个数对应的不可预测参数的参数值。接上述示例，当参数值组的个数为n+1个时，选取n+1组参数值作为各参数值组中不可预测参数的参数值；当n为3时，参数值组即为4组，分别为A、B、C以及D，这里的不可预测参数的参数值包括A(a1,b1,c1,d1)、B(a2,b2,c2,d2)、C(a3,b3,c3,d3)以及D(a4,b4,c4,d4)。
需要说明的是,这里选取不可预测参数的参数值包括获取参数值组中各不可预测参数的参数类型,然后根据各不可预测参数对应的参数类型,确定相应的参数值范围,再根据各不可预测参数的参数值范围,确定各不可预测参数的参数值。这里,不可预测参数可以为疾病的传播系数,或者可以是天气、年龄、性别等对疾病传播造成的影响,示例性地,当待优化的不可预测参数之一为疾病的传播系数时,确定该不可预测参数的取值范围为0-K,然后从0-K的范围内随机选取不可预测参数的参数值。接上述示例,例如这里的a是取值范围为0-K的待优化的不可预测参数,则a 1、a 2、a 3以及a 4均为(0,K)之间的参数值。
步骤1013,分别将各参数值组中不可预测参数的参数值输入至多智能体模型进行预测, 得到对应多个参数值组的多个预测结果。
接上述示例,将A(a 1,b 1,c 1,d 1)、B(a 2,b 2,c 2,d 2)、C(a 3,b 3,c 3,d 3)以及D(a 4,b 4,c 4,d 4)分别输入至多智能体模型进行预测,得到对应A组的预测结果,对应B组的预测结果,对应C组的预测结果以及对应D组的预测结果。
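As an illustrative sketch only (the parameter names, value ranges, and the commented `simulate` call are assumptions, not taken from the application), steps 1011 to 1013 above could be prototyped as follows: sample n+1 parameter value groups from the value range of each unpredictable parameter, then run the multi-agent model once per group:

```python
import random

def init_param_groups(value_ranges):
    """value_ranges: dict mapping each unpredictable parameter name to (low, high).
    Returns n + 1 parameter value groups for n unpredictable parameters."""
    n = len(value_ranges)
    return [{name: random.uniform(low, high)
             for name, (low, high) in value_ranges.items()}
            for _ in range(n + 1)]

# Example: two unpredictable parameters a and b, each sampled from (0, K) with K = 1.0.
groups = init_param_groups({"a": (0.0, 1.0), "b": (0.0, 1.0)})
# predictions = [simulate(predictable_values, g) for g in groups]  # one model run per group
```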
步骤102,基于多个预测结果与各预测结果对应的实际结果,确定每个参数值组的影响因子。
这里，影响因子可以用于表征每个参数值组中不可预测参数的影响程度，即用于表征每个参数值组的影响程度。
在一些实施例中,基于多个预测结果与各预测结果对应的实际结果,确定每个参数值组的影响因子包括分别基于每个参数值组对应的预测结果与相应的实际结果,确定每个参数值组对应的预测准确度;将每个参数值组对应的预测准确度作为相应的影响因子。这里,预测准确度可以为每个参数值组对应的权重。
在另一些实施例中,基于多个预测结果与各预测结果对应的实际结果,确定每个参数值组的影响因子包括分别基于每个参数值组对应的预测结果与相应的实际结果,确定每个参数值组对应的损失值;基于每个参数值组对应的损失值,确定相应参数值组的影响因子。在实际实施时,可以将损失值的倒数作为相应参数值组的影响因子,损失值越大,则损失值的倒数越小即影响因子越小,又或者将损失值作为相应参数值组的影响因子,损失值越大,则影响因子越大,这里,对于通过损失值确定相应参数值组的影响因子的方式,本申请实施例对此不做限制。
步骤103,基于各参数值组以及相应的影响因子,对各不可预测参数的参数值进行聚合,得到对应各不可预测参数的中间参数值。
在一些实施例中，当相应参数值组的影响因子为权重时，分别将各参数值组对应的权重与不可预测参数的参数值进行相乘，得到对应各参数值组的乘积结果，然后对各参数值组对应的乘积结果进行累加，得到累加结果，最后将累加结果作为不可预测参数的中间参数值。接上述示例，这里的参数组为A(a1,b1,c1,d1)、B(a2,b2,c2,d2)、C(a3,b3,c3,d3)以及D(a4,b4,c4,d4)，相应的权重分别为x、y、z以及k，则不可预测参数的中间参数值P为(a1*x+a2*y+a3*z+a4*k, b1*x+b2*y+b3*z+b4*k, c1*x+c2*y+c3*z+c4*k, d1*x+d2*y+d3*z+d4*k)。
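A short sketch of this weight-based aggregation; the weights are assumed to be already normalized impact factors (for example, prediction accuracies), and all names are illustrative:

```python
def weighted_intermediate(groups, weights):
    """groups:  list of parameter value groups, e.g. [{'a': a1, 'b': b1, ...}, ...]
    weights: one weight per group (e.g. normalized prediction accuracies).
    Returns the intermediate value of every unpredictable parameter."""
    return {name: sum(w * g[name] for w, g in zip(weights, groups))
            for name in groups[0]}

# With groups A, B, C, D and weights x, y, z, k this reproduces
# P = (a1*x + a2*y + a3*z + a4*k, b1*x + ..., c1*x + ..., d1*x + ...).
```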
在一些实施例中,当相应参数值组的影响因子与每个参数值组对应的损失值相关时,基于各参数值组的影响因子,对多个参数值组进行排序,得到排序结果;基于排序结果,从多个参数值组中选取目标数量的参数值组;其中,目标数量小于多个参数值组的数量;获取目标数量的参数值组中不可预测参数的参数值的平均值;将平均值作为不可预测参数的中间参 数值。
在实际实施时,当影响因子为损失值的倒数时,基于损失值的大小,从大到小或者从小到大的对多个参数值组进行排序,然后从排序后的参数值组中选取目标数量的参数值组,这里,目标数量为小于多个参数值组的数量。
接上述示例,这里的参数组为A(a 1,b 1,c 1,d 1)、B(a 2,b 2,c 2,d 2)、C(a 3,b 3,c 3,d 3)以及D(a 4,b 4,c 4,d 4),基于损失值的大小,确定最优模型参数值组A,最差模型参数值组D以及其它模型参数值组B和C。然后对选取出的目标数量的参数值组中不可预测参数的参数值进行聚合,即将a 1、a 2、a 3、a 4进行聚合,将b 1、b 2、b 3、b 4进行聚合,将c 1、c 2、c 3、c 4进行聚合以及将d 1、d 2、d 3、d 4进行聚合。
这里,对选取出的目标数量的参数值组中不可预测参数的参数值进行聚合的过程包括获取目标数量的参数值组中不可预测参数的参数值的平均值,然后将平均值作为不可预测参数的中间参数值,作为一个示例,对获取目标数量的参数值组中不可预测参数的参数值的平均值,将平均值作为不可预测参数的中间参数值的过程进行说明,示例性地,优化n个参数,从n+1个参数组中选取n个参数组,对n个参数组中相应的不可预测参数的参数值求平均值,以作为该不可预测参数的参数值的中间参数值。
需要说明的是,在得到目标数量的参数值组中不可预测参数的参数值的平均值后,还可以利用该平均值对多智能体模型进行更新,再对该平均值以及选取的目标数量的参数值组进行聚合,即再一次选取目标数量的参数值组,对再一次所选取的目标数量的参数值组中不可预测参数的参数值求取平均值,然后继续上述更新多智能体模型的过程并再一次聚合的过程,以此进行迭代,将最后一次所聚合得到的平均值作为不可预测参数的中间参数值。如此,各参与方本地迭代优化各自不可预测参数预设轮次,得到各自的最终平均值即中间参数值。
接上述示例,这里的参数组为A(a 1,b 1,c 1,d 1)、B(a 2,b 2,c 2,d 2)、C(a 3,b 3,c 3,d 3)以及D(a 4,b 4,c 4,d 4),基于损失值的大小,确定最优模型参数值组A,最差模型参数值组D以及其它模型参数值组B和C,接着求取最优模型参数值组和其它模型参数组的几何平均点,这里,参照图6A,图6A是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图,此处求取A、B、C三组参数值组的几何平均点P,这里的P=[(a 1+a 2+a 3)/3,(b 1+b 2+b 3)/3,(c 1+c 2+c 3)/3,(d 1+d 2+d 3)/3]。在得到几何中心点P后,基于P对应的模型参数值组[(a 1+a 2+a 3)/3,(b 1+b 2+b 3)/3,(c 1+c 2+c 3)/3,(d 1+d 2+d 3)/3]对模型参数进行更新,这里,并将A、B、C、P继续带入更新后的模型进行模拟,得到分别对应四组模型参数值组的预测结果,这里,参见图6B,图6B是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图,依据损失值的大小,从A、B、C、P四 组模型参数值组中继续确定最优模型参数值组,最差模型参数值组以及其它模型参数值组,接着求取最优模型参数值组和其它模型参数组的几何平均点,继续上述过程,如此,各参与方本地迭代优化各自不可预测参数预设轮次,得到各自的最终几何中心点即中间参数值。
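The local iteration described above can be sketched as follows, assuming a mean-squared-error style loss against the real data and a simple element-wise mean as the geometric centroid; the `simulate` callable and all names are illustrative assumptions:

```python
def centroid(groups):
    """Element-wise mean of a list of parameter value groups."""
    return {k: sum(g[k] for g in groups) / len(groups) for k in groups[0]}

def local_optimize(simulate, groups, actual, rounds):
    """Iteratively drop the worst group and replace it with the centroid point P."""
    p = None
    for _ in range(rounds):
        losses = [(simulate(g) - actual) ** 2 for g in groups]       # loss per group
        order = sorted(range(len(groups)), key=lambda i: losses[i])  # best ... worst
        kept = [groups[i] for i in order[:-1]]                       # drop the worst group
        p = centroid(kept)                                           # geometric mean point P
        groups = kept + [p]                                          # next round simulates kept groups and P
    return p  # the last centroid is reported as the intermediate parameter value
```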
如此,通过上述对参数值组的不可预测参数的参数值进行聚合的方式,不会产生额外的模拟量即不产生新的全局不可预测参数取值,从而各参与方无需对新值进行模拟,可以较单方本地优化更快更稳定的找到最优的不可预测参数值,减少了模拟次数和模型计算量。
在一些实施例中,当相应参数值组的影响因子为权重时,还可以基于各参数值组的权重,对多个参数值组进行排序,基于排序结果,从多个参数值组中选取目标数量的参数值组,其中,目标数量小于多个参数值组的数量,然后分别将所选取的各参数值组对应的权重与不可预测参数的参数值进行相乘,得到对应各参数值组的乘积结果,再对各参数值组对应的乘积结果进行累加,得到累加结果,最后将累加结果作为不可预测参数的中间参数值。
需要说明的是,对于基于各参数值组以及相应的影响因子,对各不可预测参数的参数值进行聚合的方式,还可以基于损失值对多个参数值组进行排序,基于排序结果,从多个参数值组中选取目标数量的参数值组,其中,目标数量小于多个参数值组的数量,然后分别将所选取的各参数值组对应的权重与不可预测参数的参数值进行相乘,得到对应各参数值组的乘积结果,再对各参数值组对应的乘积结果进行累加,得到累加结果,最后将累加结果作为不可预测参数的中间参数值,本申请实施例对基于各参数值组以及相应的影响因子,对各不可预测参数的参数值进行聚合的方式不做限制。
步骤104,将得到的中间参数值发送至协作方设备,其中,中间参数值用于触发协作方设备对多个参与方设备发送的中间参数值进行聚合处理,得到对应各不可预测参数的目标参数值。
在实际实施时,得到中间参数值后对各不可预测参数的中间参数值分别进行隐私保护,得到隐私保护后的中间参数值;这里隐私保护的方式可以为对中间参数值进行模糊处理,例如添加噪声、差分隐私处理等,协作方设备获得的即为至少两个参与方设备对中间参数值进行隐私处理后的参数值,应当理解的是,协作方设备在统计至少两个参与方设备的中间参数值时,其中的噪声将会互相抵消,不影响对中间参数值的聚合结果。此外,隐私保护的处理方式还可以为对中间参数值进行同态加密。
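As one toy illustration of the fuzzing option mentioned here (not the scheme prescribed by the application), participants could add zero-sum random masks to their intermediate parameter values, so that the noise cancels out when the cooperating device aggregates across participants:

```python
import random

def make_zero_sum_masks(num_participants, param_names, scale=1.0):
    """One mask per participant; for every parameter the masks sum to zero."""
    masks = [{k: random.gauss(0.0, scale) for k in param_names}
             for _ in range(num_participants - 1)]
    last = {k: -sum(m[k] for m in masks) for k in param_names}
    return masks + [last]

def mask_intermediate(intermediate, mask):
    """Participant-side fuzzing of its intermediate parameter values."""
    return {k: intermediate[k] + mask[k] for k in intermediate}
```

Averaging the masked values on the cooperating device then recovers the same aggregate as averaging the unmasked values, because the masks sum to zero.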
在实际实施时,协作方对多个参与方设备发送的中间参数值进行聚合处理的过程可以有多种方式,示例性地,对各参与方发送的中间参数值求几何平均,或者随机选取部分参与方上传的中心点进行平均,又或者在参与方除了上传几何中心点,同时上传最优模型参数值组或最差模型参数值组的损失值,或除最差模型参数值组之外其它所有模型参数值组的平均损 失值的基础上,根据损失值对参与方进行排序,选取较好的多个中心点进行平均,得到新的中心点。对于协作方进行参数聚合操作的过程本申请实施例对此不做限制。
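The three aggregation options listed above might be sketched on the cooperating device as follows; the function signature and the loss-based ranking details are assumptions for illustration:

```python
import random

def server_aggregate(centers, losses=None, k=None):
    """Aggregate the centroid points uploaded by the participants.

    centers: list of dicts (one intermediate point per participant).
    losses:  optional list of loss values, one per participant.
    k:       optional number of participants to keep (k < len(centers)).
    """
    if k is not None:
        if losses is not None:
            order = sorted(range(len(centers)), key=lambda i: losses[i])  # best first
            centers = [centers[i] for i in order[:k]]                     # strategy (c)
        else:
            centers = random.sample(centers, k)                           # strategy (b)
    # strategy (a): plain centroid of whatever is left
    return {name: sum(c[name] for c in centers) / len(centers) for name in centers[0]}
```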
步骤105,接收协作方设备返回的对应各不可预测参数的目标参数值,并基于目标参数值对多智能体模型进行更新。
需要说明的是,参与方基于目标参数值对多智能体模型进行更新有两种实现方式。
在一些实施例中,参见图7A,图7A是本申请实施例提供的多智能体模型训练方法的一个可选的流程示意图,这里,整个模型训练过程分成两个阶段完成,第一阶段是本地的多智能体模型训练,直至模型达到收敛条件后,将收敛时的各中间参数值上传至协作方设备(参数聚合设备),其中,中间参数值用于触发协作方设备进行第二阶段的参数聚合操作,为了适应初步建模或快速建模场景,第二阶段的参数聚合可只进行一次,整个模型就收敛。
在另一些实施例中,参见图7B,图7B是本申请实施例提供的多智能体模型训练方法的一个可选的流程示意图,这里,参与方还可以仅进行一次本地的多智能体模型的参数聚合,即将各中间参数值上传至协作方设备,其中,中间参数值用于触发协作方设备进行仅一次的第二阶段的参数聚合操作,然后将聚合后的目标参数值返回至各参与方设备,以供各参与方设备进行本地的模型更新,然后基于更新后的模型,继续进行本地的多智能体模型的模拟,再将各中间参数值上传至协作方设备,继续上述过程,直至本地的多智能体模型收敛。
需要说明的是,在上述第二种更新方式中,各参与方设备在得到目标参数值后,参与方设备基于目标参数值更新本地多智能体模型,再将目标参数值与模型更新前所选取的目标数量的参数值组输入至更新后的本地多智能体模型,对该目标参数值以及模型更新前所选取的目标数量的参数值组进行聚合,即再一次选取目标数量的参数值组,对再一次所选取的目标数量的参数值组中不可预测参数的参数值求取平均值,以作为中间参数值发送至协作方设备,然后继续上述过程。
在一些实施例中,在多智能体模型训练完成后,可以通过改变可预测参数的实际参数值来实现多智能体模型的其他用途,这里的实际参数值不同于所述可预测参数的训练参数值;作为一个示例,可预测参数包括目标疾病的感染者的性别、年龄、职业,以及感染人数,实际参数值可以是目标区域内目标疾病的感染者的性别、年龄、职业,以及感染人数,然后将实际参数值输入更新后的多智能体模型进行预测,从而可以得到目标区域内目标疾病导致的死亡人数。
如此,通过该多智能体模型进行与疾病相关的数据的预测,提升了模型预测准确度,进而及时掌控与疾病相关的情况,以快速调度医疗资源并及时进行疾病防治与管控。
应用本申请上述实施例,相较于相关技术中多智能体的模型只能由数据拥有方单独训练 的方式,通过参与方在本地对不可预测参数进行聚合后得到的中间参数值并发送至协作方,并基于协作方对接收到中间参数值进行二次聚合返回的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
在对本申请实施例提供的多智能体模型的训练方法进行说明之后,接下来对训练得到的多智能体模型的应用进行说明,这里,以疾病的传播预测的实际场景为例,对本申请实施例提供的多智能体模型的预测方法进行介绍,参见图8,图8是本申请实施例提供的多智能体模型的预测方法的流程示意图,本申请实施例提供的基于多智能体模型的预测方法包括:
步骤201,参与方设备获取可预测参数的实际参数值,其中,实际参数值不同于可预测参数的训练参数值。
在实际实施时,获取可预测参数的实际参数值包括获取目标区域内居民的总人数,居民的性别、年龄、职业,和目标疾病感染者的性别、年龄、职业,以及感染者的活动轨迹。这里,目标区域可以是某一城市或者某一国家,目标疾病可以是一种传播性强的新型疾病,目标疾病感染者可以是从目标区域以外的区域流入目标区域内的至少一个外来疾病感染者,或者也可以是在目标区域内没有接受疾病管控的自由行动的本地传播者。
步骤202,将实际参数值输入更新后的多智能体模型进行预测,得到相应的预测结果。
在实际实施时,将获取到的目标区域内居民的总人数,居民的性别、年龄、职业,和目标疾病感染者的性别、年龄、职业,以及感染者的活动轨迹输入至更新后的多智能体模型,可以预测目标疾病感染者对目标区域内居民的影响,即得到目标疾病感染者导致目标区域内的新增感染人数。
如此,在获取到具体的可预测参数值后,相较于之前的多智能体模型,通过更新后的多智能体模型可以准确的预测出目标疾病感染者对目标区域的影响即传染人数,这样,可以充分准备医疗资源,对疾病感染者进行及时治疗,避免由于医疗资源不足导致疾病死亡率上升的问题。
在一些实施例中,更新完成的多智能体模型还可以用于城市交通情况预测,即预测未来一段时间内,针对目标区域的目标路段在目标时间段内拥堵车辆数,具体包括获取可预测参数的实际参数值即目标区域的人口出行轨迹、办公区域分布、节假日时间等;这里,目标区域可以是城市的不同中心区域,在实际实施时,将获取到的目标区域的人口出行轨迹、办公 区域分布、节假日时间等输入至更新后的多智能体模型,可以预测目标区域的目标路段在目标时间段内拥堵车辆数。如此,在获取到具体的可预测参数值后,相较于之前的多智能体模型,通过更新后的多智能体模型可以准确的预测出目标区域的目标路段在目标时间段内的拥堵情况,从而及时做出交通管控。
应用本申请上述实施例,相较于相关技术中多智能体的模型只能由数据拥有方单独训练的方式,通过参与方在本地对不可预测参数进行聚合后得到的中间参数值并发送至协作方,并基于协作方对接收到中间参数值进行二次聚合返回的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
接下来以横向联邦学习的应用场景为例,对本申请实施例提供的多智能体模型的训练进行说明。在横向联邦学习的场景下,通常有一个协作方与至少两个参与方,也即对于模型的训练由一个协作方设备和至少两个参与方设备共同实施。参与方设备与协作方设备均可以是服务器,也可以是终端。参见图9,图9是本申请实施例提供的多智能体模型的训练方法的流程示意图,包括:
步骤301,各参与方设备初始化本地多智能体模型。
这里,在横向联邦学习的应用场景下,各参与方作为数据持有方,所拥有的数据集中用户重叠相对少而用户特征重叠相对较多,各参与方拥有对应用户的标签;比如各参与方可以为不同地区的医院,他们触达的用户为不同地区的居民(即样本不同),但是业务相同(即特征相同);相应地,协作方设备可以是具有公信力的机构。
参见图10,图10是本申请实施例提供的一个多智能体模型的横向联邦学习方法,这里展示了一个协作方设备和n个参与方设备,各参与方的结构与工作方式均相同。在本实施例中,各参与方设备都有一个相同的多智能体模型,有各自私有的可预测参数X 1,E,…,X N, E,各自的不可预测参数X 1,V,…,X N,V,以及各方本地多智能体模型模拟的目标变量Y 1, gt,…,Y N,gt。在具体实施时,通过确定可预测参数取值X E、多智能体模型结构、预测目标Y gt以及选取不可预测参数X V来初始化本地多智能体模型。
步骤302,将可预测参数的参数值输入至本地的多智能体模型。
继续参见图10,将各自私有的可预测参数X 1,E,…,X N,E输入至本地的ABS模型。
步骤303,在固定可预测参数的参数值的情况下,将多个参数值组分别输入至多智能体 模型进行预测,得到多个预测结果。
作为一个示例,这里以优化2个参数为例(a,b),各参与方初始化3组取值(可看作一个点),每组包含这2个参数的一种取值。将这3组参数分别带入模型进行模拟,得到对应三组参数的模型预测结果。这里继续参见图10,将各自的不可预测参数X 1,V,…,X N,V输入至本地的ABS模型,结合上述示例,这里的X 1,V对应参数a,X 2,V对应参数b,则各参与方初始化3组取值(可看作一个点)即为[a 1,b 1],[a 2,b 2]和[a 3,b 3],将这3组参数分别带入模型进行模拟,得到对应三组参数的模型预测结果也就是将[a1,b1],[a2,b2]和[a3,b3]带入模型进行模拟,得到分别对应三组参数的模型预测结果。
步骤304,分别将多个预测结果与相应的实际结果进行比较。
接上述示例,如果该多智能体模型的用途是预测当地死亡人数,则在某一时段内,当地实际死亡人数即是实际结果,将多个预测结果与相应的实际结果进行比较即是将[a 1,b 1],[a 2,b 2]和[a 3,b 3]分别对应的预测死亡人数与当地实际死亡人数进行比较。
步骤305,基于比较结果,确定每个参数值组对应的损失值。
在实际实施时,通常可用均方误差(MSE)作为损失函数来计算得到每个参数值组对应的损失值。
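For instance, a per-group MSE over a predicted series could look like this illustrative snippet:

```python
def mse(predicted, actual):
    """Mean squared error between a predicted and an observed series."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```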
步骤306,对多个损失值进行排序,得到最优模型参数值组,最差模型参数值组以及其它模型参数值组。
接上述示例,确定[a 1,b 1],[a 2,b 2]和[a 3,b 3]分别对应的预测结果的损失值,对三个损失值进行排序,得到最优模型参数值组[a 1,b 1],最差模型参数值组[a 2,b 2]以及其它模型参数值组[a 3,b 3]。
步骤307,对除最差模型参数值组之外所有模型参数值组的不可预测参数的参数值进行聚合,得到对应各不可预测参数的中间参数值。
作为一个示例，这里对不可预测参数的参数值进行聚合可以是求取最优模型参数值组和其它模型参数值组的几何中心点，参照图11，图11是本申请实施例提供的一个多智能体模型的不可预测参数聚合的一个可选示意图，接上述示例，此处求最优模型参数值组[a1,b1]和其它模型参数值组[a3,b3]的几何中心点C，这里C=[(a1+a3)/2,(b1+b3)/2]。
需要说明的是,在得到几何中心点C后,基于C对应的模型参数值组[(a 1+a 3)/2,(b 1+b 3)/2]对模型参数进行更新,并将[a 1,b 1],[a 3,b 3]和[(a 1+a 3)/2,(b 1+b 3)/2]继续带入更新后的模型进行模拟,得到分别对应三组模型参数值组的预测结果,然后继续步骤304-步骤307的过程,如此,各参与方本地迭代优化各自不可预测参数N L轮,得到各自的最终几何中心点C i,V t+1即中间参数值。
步骤308,将中间参数值发送至协作方设备。
继续参见图10,n个参与方设备将各自的最终几何中心点C i,V t+1各发送至协作方设备。
步骤309,协作方设备对接收到的中间参数值进行聚合处理,得到对应各不可预测参数的目标参数值。
作为一个示例，列举三种具体的聚合方法对协作方对接收到的中间参数值进行聚合处理的过程进行详细说明，具体包括：a)一种典型的聚合方式为求几何平均，即C_Server,V^(t+1)=centroid(C_1,V^(t+1),…,C_N,V^(t+1))；b)随机选取部分参与方上传的中心点进行平均，如随机选取K方，K<N，C_Server,V^(t+1)=centroid(C_1,V^(t+1),…,C_K,V^(t+1))；c)参与方除了上传几何中心点，同时上传最优点或最差点的损失值，或除最差点之外其它所有点的平均损失值；根据损失值对参与方进行排序，选取最好的K个中心点进行平均，得到新的中心点，K<N，C_Server,V^(t+1)=centroid(C_1,V^(t+1),…,C_K,V^(t+1))。
示例性地，协作方设备对接收到的几何中心点进行聚合处理，即对C_1,…,C_n求几何平均，这里，若C_1=[x1,y1]，C_n=[xn,yn]，则C_Server,V^(t+1)=[(x1+…+xn)/n,(y1+…+yn)/n]。
步骤310,将目标参数值发送至各参与方设备。
继续参见图10,协作方设备将通过聚合得到的对应各不可预测参数的目标参数值C Server, V t+1发送至n个参与方设备。
步骤311,基于目标参数值对多智能体模型进行更新。
在实际实施时,参与方设备在得到目标参数值即优化后的不可预测参数后,根据该不可预测参数对本地的多智能体模型进行优化。
应用本申请上述实施例,相较于相关技术中多智能体的模型只能由数据拥有方单独训练的方式,通过参与方在本地对不可预测参数进行聚合后得到的中间参数值并发送至协作方,并基于协作方对接收到中间参数值进行二次聚合返回的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
下面继续说明本申请实施例提供的多智能体模型的训练装置254,参见图12,图12是本申请实施例提供的多智能体模型的训练装置254的结构示意图,本申请实施例提供的多智能体模型的训练装置254包括:
获取模块2541,配置为参与方设备将可预测参数的训练参数值输入至本地的多智能体模 型,并在固定所述训练参数值的情况下,将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结果;其中,所述参数值组包括至少一个不可预测参数的参数值;
对比模块2542,配置为基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子;
聚合模块2543,配置为基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值;
发送模块2544,配置为将得到的所述中间参数值发送至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值;
更新模块2545,配置为接收所述协作方设备返回的对应各所述不可预测参数的目标参数值,并基于所述目标参数值对所述多智能体模型进行更新。
在一些实施例中,所述获取模块2541,还配置为获取所述不可预测参数的数量,并基于所述不可预测参数的数量确定所述参数值组的数量;基于所述参数值组的数量,确定各参数值组中不可预测参数的参数值;分别将所述各参数值组中不可预测参数的参数值输入至所述多智能体模型进行预测,得到对应所述多个参数值组的多个预测结果。
在一些实施例中,所述获取模块2541,还配置为获取所述参数值组中各不可预测参数的参数类型;根据所述各不可预测参数对应的参数类型,确定相应的参数值范围;根据所述各不可预测参数的参数值范围,确定所述各不可预测参数的参数值。
在一些实施例中,所述对比模块2542,还配置为分别基于每个所述参数值组对应的预测结果与相应的实际结果,确定每个所述参数值组对应的预测准确度;将每个所述参数值组对应的预测准确度作为相应的影响因子。
在一些实施例中,所述聚合模块2543,还配置为分别将各所述参数值组对应的预测准确度与所述不可预测参数的参数值进行相乘,得到对应各所述参数值组的乘积结果;对各所述参数值组对应的乘积结果进行累加,得到累加结果;将所述累加结果作为所述不可预测参数的中间参数值。
在一些实施例中,所述对比模块2542,还配置为分别基于每个所述参数值组对应的预测结果与相应的实际结果,确定每个所述参数值组对应的损失值;基于每个所述参数值组对应的损失值,确定相应参数值组的影响因子。
在一些实施例中,所述聚合模块2543,还配置为基于各所述参数值组的影响因子,对所述多个参数值组进行排序,得到排序结果;基于所述排序结果,从所述多个参数值组中选取目标数量的参数值组;其中,所述目标数量小于所述多个参数值组的数量;基于选取的目标 数量的参数值组,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值。
在一些实施例中,所述聚合模块2543,还配置为获取所述目标数量的参数值组中所述不可预测参数的参数值的平均值;将所述平均值作为所述不可预测参数的中间参数值。
在一些实施例中,所述发送模块2544,还配置为对各所述不可预测参数的中间参数值分别进行隐私保护,得到隐私保护后的中间参数值;发送隐私保护后的中间参数值至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的、隐私保护后的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值。
在一些实施例中，所述装置还包括第二获取模块1210和预测模块1220，所述第二获取模块1210，配置为获取所述可预测参数的实际参数值，所述实际参数值不同于所述可预测参数的训练参数值；所述预测模块1220，配置为将所述实际参数值输入更新后的所述多智能体模型进行预测，得到相应的预测结果。
在一些实施例中,所述可预测参数包括目标疾病的感染者的性别、年龄、职业,以及感染人数;所述第二获取模块1210,还配置为获取目标区域内目标疾病的感染者的性别、年龄、职业,以及感染人数;所述预测模块1220,还配置为将所述目标区域内目标疾病的感染者的性别、年龄、职业,以及感染人数输入至更新后的所述多智能体模型,预测得到所述目标区域内所述目标疾病导致的死亡人数。
应用本申请上述实施例,相较于相关技术中多智能体的模型只能由数据拥有方单独训练的方式,通过参与方在本地对不可预测参数进行聚合后得到的中间参数值并发送至协作方,并基于协作方对接收到中间参数值进行二次聚合返回的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
下面说明本申请实施例提供的基于多智能体模型的预测装置1200,参见图13,图13是本申请实施例提供的基于多智能体模型的预测装置1200的结构示意图,本申请实施例提供的基于多智能体模型的预测装置1200包括:
第二获取模块1210,配置为获取所述可预测参数的实际参数值,所述实际参数值不同于所述可预测参数的训练参数值;
预测模块1220,配置为将所述实际参数值输入更新后的所述多智能体模型进行预测,得到相应的预测结果。
应用本申请上述实施例,相较于相关技术中多智能体的模型只能由数据拥有方单独训练的方式,通过参与方在本地对不可预测参数进行聚合后得到的中间参数值并发送至协作方,并基于协作方对接收到中间参数值进行二次聚合返回的目标参数值,以对多智能体模型进行更新,如此,当多个参与方对用途相同的多智能体模型进行训练时,联合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
本申请实施例还提供一种电子设备,所述电子设备包括:
存储器,用于存储可执行指令;
处理器,用于执行所述存储器中存储的可执行指令时,实现本申请实施例提供的多智能体模型的训练方法。
本申请实施例还提供了一种计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现本申请实施例提供的多智能体模型的训练方法。
本申请实施例还提供一种存储有可执行指令的计算机可读存储介质,其中存储有可执行指令,当可执行指令被处理器执行时,将引起处理器执行本申请实施例提供的多智能体模型的训练方法。
在一些实施例中,计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、闪存、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
作为示例，可执行指令可以但不一定对应于文件系统中的文件，可以被存储在保存其它程序或数据的文件的一部分，例如，存储在超文本标记语言（Hyper Text Markup Language，HTML）文档中的一个或多个脚本中，存储在专用于所讨论的程序的单个文件中，或者，存储在多个协同文件（例如，存储一个或多个模块、子程序或代码部分的文件）中。
作为示例,可执行指令可被部署为在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执行。
综上所述,通过本申请实施例当多个参与方对用途相同的多智能体模型进行训练时,联 合优化不可预测参数的取值,从而获得模拟结果与真实数据符合更好的多智能体模型,并保障了本地数据的安全,解决多智能体的模型领域的数据孤岛问题,实现多参与方之间共同建模,从而提升了模型预测准确度。
以上所述,仅为本申请的实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。

Claims (15)

  1. 一种多智能体模型的训练方法，基于联邦学习系统，所述系统包括协作方设备及至少两个参与方设备，所述方法由参与方设备执行，所述方法包括：
    参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定所述训练参数值的情况下,将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结果;
    其中,所述参数值组包括至少一个不可预测参数的参数值;
    基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子;
    基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值;
    将得到的所述中间参数值发送至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值;
    接收所述协作方设备返回的对应各所述不可预测参数的目标参数值,并基于所述目标参数值对所述多智能体模型进行更新。
  2. 根据权利要求1所述的方法,其中,所述将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结果,包括:
    获取所述不可预测参数的数量,并基于所述不可预测参数的数量确定所述参数值组的数量;
    基于所述参数值组的数量,确定各参数值组中不可预测参数的参数值;
    分别将所述各参数值组中不可预测参数的参数值输入至所述多智能体模型进行预测,得到对应所述多个参数值组的多个预测结果。
  3. 根据权利要求2所述的方法,其中,所述确定各参数值组中不可预测参数的参数值,包括:
    获取所述参数值组中各不可预测参数的参数类型;
    根据所述各不可预测参数对应的参数类型,确定相应的参数值范围;
    根据所述各不可预测参数的参数值范围,确定所述各不可预测参数的参数值。
  4. 根据权利要求1所述的方法,其中,所述基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子,包括:
    分别基于每个所述参数值组对应的预测结果与相应的实际结果,确定每个所述参数值组 对应的预测准确度;
    将每个所述参数值组对应的预测准确度作为相应的影响因子。
  5. 根据权利要求4所述的方法,其中,所述基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值,包括:
    针对所述参数值组中任一所述不可预测参数执行以下操作:
    分别将各所述参数值组对应的预测准确度与所述不可预测参数的参数值进行相乘,得到对应各所述参数值组的乘积结果;
    对各所述参数值组对应的乘积结果进行累加,得到累加结果;
    将所述累加结果作为所述不可预测参数的中间参数值。
  6. 根据权利要求1所述的方法,其中,所述基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子,包括:
    分别基于每个所述参数值组对应的预测结果与相应的实际结果,确定每个所述参数值组对应的损失值;
    基于每个所述参数值组对应的损失值,确定相应参数值组的影响因子。
  7. 根据权利要求1所述的方法,其中,所述基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值,包括:
    基于各所述参数值组的影响因子,对所述多个参数值组进行排序,得到排序结果;
    基于所述排序结果,从所述多个参数值组中选取目标数量的参数值组;其中,所述目标数量小于所述多个参数值组的数量;
    基于选取的目标数量的参数值组,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值。
  8. 根据权利要求7所述的方法,其中,所述基于选取的目标数量的参数值组,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值,包括:
    针对所述参数值组中任一所述不可预测参数执行以下操作:
    获取所述目标数量的参数值组中所述不可预测参数的参数值的平均值;
    将所述平均值作为所述不可预测参数的中间参数值。
  9. 根据权利要求1所述的方法,其中,所述将得到的所述中间参数值发送至协作方设备,包括:
    对各所述不可预测参数的中间参数值分别进行隐私保护,得到隐私保护后的中间参数值;
    发送隐私保护后的中间参数值至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的、隐私保护后的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值。
  10. 根据权利要求1所述的方法,其中,所述方法还包括:
    获取所述可预测参数的实际参数值,所述实际参数值不同于所述可预测参数的训练参数值;
    将所述实际参数值输入更新后的所述多智能体模型进行预测,得到相应的预测结果。
  11. 根据权利要求10所述的方法,其中,所述可预测参数包括目标疾病的感染者的性别、年龄、职业,以及感染人数;
    所述获取所述可预测参数的实际参数值,包括:
    获取目标区域内目标疾病的感染者的性别、年龄、职业,以及感染人数;
    所述将所述实际参数值输入更新后的所述多智能体模型进行预测,得到相应的预测结果,包括:
    将所述目标区域内目标疾病的感染者的性别、年龄、职业,以及感染人数输入至更新后的所述多智能体模型,预测得到所述目标区域内所述目标疾病导致的死亡人数。
  12. 一种多智能体模型的训练装置,所述装置包括:
    获取模块,配置为参与方设备将可预测参数的训练参数值输入至本地的多智能体模型,并在固定所述训练参数值的情况下,将多个参数值组分别输入至所述多智能体模型进行预测,得到多个预测结果;其中,所述参数值组包括至少一个不可预测参数的参数值;
    对比模块,配置为基于所述多个预测结果与各所述预测结果对应的实际结果,确定每个所述参数值组的影响因子;
    聚合模块,配置为基于各所述参数值组以及相应的影响因子,对各所述不可预测参数的参数值进行聚合,得到对应各所述不可预测参数的中间参数值;
    发送模块,配置为将得到的所述中间参数值发送至协作方设备,其中,所述中间参数值用于触发所述协作方设备对多个参与方设备发送的所述中间参数值进行聚合处理,得到对应各所述不可预测参数的目标参数值;
    更新模块,配置为接收所述协作方设备返回的对应各所述不可预测参数的目标参数值,并基于所述目标参数值对所述多智能体模型进行更新。
  13. 一种电子设备,所述电子设备包括:
    存储器,用于存储可执行指令;
    处理器,用于执行所述存储器中存储的可执行指令时,实现权利要求1至11任一项所 述的方法。
  14. 一种计算机可读存储介质,存储有可执行指令,用于被处理器执行时,实现权利要求1至11任一项所述的方法。
  15. 一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现权利要求1至11任一项所述的方法。
PCT/CN2021/142157 2021-08-25 2021-12-28 多智能体模型的训练方法、装置、电子设备、存储介质及程序产品 WO2023024378A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110981895.1A CN113658689A (zh) 2021-08-25 2021-08-25 多智能体模型的训练方法、装置、电子设备及存储介质
CN202110981895.1 2021-08-25

Publications (1)

Publication Number Publication Date
WO2023024378A1 true WO2023024378A1 (zh) 2023-03-02

Family

ID=78492853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142157 WO2023024378A1 (zh) 2021-08-25 2021-12-28 多智能体模型的训练方法、装置、电子设备、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN113658689A (zh)
WO (1) WO2023024378A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935136A (zh) * 2023-08-02 2023-10-24 深圳大学 处理类别不平衡医学图像分类问题的联邦学习方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658689A (zh) * 2021-08-25 2021-11-16 深圳前海微众银行股份有限公司 多智能体模型的训练方法、装置、电子设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239845A (zh) * 2016-03-29 2017-10-10 中国石油化工股份有限公司 油藏开发效果预测模型的构建方法
CN109871702A (zh) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 联邦模型训练方法、***、设备及计算机可读存储介质
CN110797124A (zh) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 一种模型多端协同训练方法、医疗风险预测方法和装置
EP3742229A1 (en) * 2019-05-21 2020-11-25 ASML Netherlands B.V. Systems and methods for adjusting prediction models between facility locations
CN112584347A (zh) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Uav异构网络多维资源动态管理方法
CN113095512A (zh) * 2021-04-23 2021-07-09 深圳前海微众银行股份有限公司 联邦学习建模优化方法、设备、介质及计算机程序产品
CN113658689A (zh) * 2021-08-25 2021-11-16 深圳前海微众银行股份有限公司 多智能体模型的训练方法、装置、电子设备及存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733515B1 (en) * 2017-02-21 2020-08-04 Amazon Technologies, Inc. Imputing missing values in machine learning models
CN109118013A (zh) * 2018-08-29 2019-01-01 黑龙江工业学院 一种基于神经网络的经营数据预测方法、可读存储介质和预测***
CN110263936B (zh) * 2019-06-14 2023-04-07 深圳前海微众银行股份有限公司 横向联邦学习方法、装置、设备及计算机存储介质
CN110826725B (zh) * 2019-11-07 2022-10-04 深圳大学 基于认知的智能体强化学习方法、装置及***
CN111737749A (zh) * 2020-06-28 2020-10-02 南方电网科学研究院有限责任公司 基于联邦学习的计量装置告警预测方法及设备
CN112132277A (zh) * 2020-09-21 2020-12-25 平安科技(深圳)有限公司 联邦学习模型训练方法、装置、终端设备及存储介质
CN112329940A (zh) * 2020-11-02 2021-02-05 北京邮电大学 一种结合联邦学习与用户画像的个性化模型训练方法及***
CN112289448A (zh) * 2020-11-06 2021-01-29 新智数字科技有限公司 一种基于联合学习的健康风险预测方法和装置
CN112257873A (zh) * 2020-11-11 2021-01-22 深圳前海微众银行股份有限公司 机器学习模型的训练方法、装置、***、设备及存储介质
CN112447299A (zh) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 医护资源预测模型训练方法、装置、设备及存储介质
CN112700010A (zh) * 2020-12-30 2021-04-23 深圳前海微众银行股份有限公司 基于联邦学习的特征补全方法、装置、设备及存储介质
US11017322B1 (en) * 2021-01-28 2021-05-25 Alipay Labs (singapore) Pte. Ltd. Method and system for federated learning
CN113112321A (zh) * 2021-03-10 2021-07-13 深兰科技(上海)有限公司 智能量体方法、装置、电子设备及存储介质
CN113095508A (zh) * 2021-04-23 2021-07-09 深圳前海微众银行股份有限公司 回归模型构建优化方法、设备、介质及计算机程序产品

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239845A (zh) * 2016-03-29 2017-10-10 中国石油化工股份有限公司 油藏开发效果预测模型的构建方法
CN109871702A (zh) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 联邦模型训练方法、***、设备及计算机可读存储介质
EP3742229A1 (en) * 2019-05-21 2020-11-25 ASML Netherlands B.V. Systems and methods for adjusting prediction models between facility locations
CN110797124A (zh) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 一种模型多端协同训练方法、医疗风险预测方法和装置
CN112584347A (zh) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Uav异构网络多维资源动态管理方法
CN113095512A (zh) * 2021-04-23 2021-07-09 深圳前海微众银行股份有限公司 联邦学习建模优化方法、设备、介质及计算机程序产品
CN113658689A (zh) * 2021-08-25 2021-11-16 深圳前海微众银行股份有限公司 多智能体模型的训练方法、装置、电子设备及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935136A (zh) * 2023-08-02 2023-10-24 深圳大学 处理类别不平衡医学图像分类问题的联邦学习方法

Also Published As

Publication number Publication date
CN113658689A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
Nguyen et al. Federated learning for internet of things: A comprehensive survey
Gao et al. STAN: spatio-temporal attention network for pandemic prediction using real-world evidence
Lin et al. ELECTRE II method to deal with probabilistic linguistic term sets and its application to edge computing
WO2023024378A1 (zh) 多智能体模型的训练方法、装置、电子设备、存储介质及程序产品
CN113159327B (zh) 基于联邦学习***的模型训练方法、装置、电子设备
Viana et al. Combining discrete-event simulation and system dynamics in a healthcare setting: A composite model for Chlamydia infection
Frias-Martinez et al. An agent-based model of epidemic spread using human mobility and social network information
Kishore et al. Lockdowns result in changes in human mobility which may impact the epidemiologic dynamics of SARS-CoV-2
Horton et al. Integrating evidence, politics and society: a methodology for the science–policy interface
CN110874648A (zh) 联邦模型的训练方法、***和电子设备
CN112749749B (zh) 基于分类决策树模型的分类方法、装置及电子设备
WO2022237194A1 (zh) 联邦学习***中账户的异常检测方法、装置及电子设备
CN112712182A (zh) 一种基于联邦学习的模型训练方法、装置及存储介质
Schneider et al. Social network analysis via multi-state reliability and conditional influence models
Miao et al. Federated deep reinforcement learning based secure data sharing for Internet of Things
Martín et al. Leveraging social networks for understanding the evolution of epidemics
Lin et al. DRL-based adaptive sharding for blockchain-based federated learning
Dum et al. Global systems science and policy
CN112308238A (zh) 解析模型的训练方法、装置、电子设备及存储介质
Xia et al. Synthesis of a high resolution social contact network for Delhi with application to pandemic planning
Wu et al. Development path based on the equalization of public services under the management mode of the Internet of Things
Agate et al. A framework for parallel assessment of reputation management systems
Kafsi et al. Mitigating epidemics through mobile micro-measures
Šuvakov et al. Agent-based simulations of emotion spreading in online social networks
CN112949866A (zh) 泊松回归模型的训练方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21954893

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE