CN113222169B

CN113222169B - Federal machine combination service method and system combining big data analysis feedback

Info

Publication number: CN113222169B
Application number: CN202110289138.8A
Authority: CN
Inventors: 邢廷炎; 周长兵; 刘思民
Original assignee: China University of Geosciences Beijing
Current assignee: China University of Geosciences Beijing
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2023-06-23
Anticipated expiration: 2041-03-18
Also published as: CN113222169A

Abstract

The invention discloses a federal machine combination service system combining big data analysis feedback, comprising a plurality of devices (1) distributed at different addresses and a big data analysis and scheduling module (2). Each device (1) comprises a data preprocessing module (3), a data acquisition module (6), and a data reading module (11). The big data analysis and scheduling module (2) analyzes and schedules all the devices and working modules participating in the federated machine learning, together with the progress data, and is in data communication connection with the data reading module (11), the data training fusion sub-module (4), and the federal data training module (5). The data acquisition module (6) is in data communication connection with the stand-alone storage module (7), which is also in data communication connection with the data preprocessing module (3) and the data reading module (11), respectively. Before training on the data records, the system cleans them to remove abnormal records, which safeguards the accuracy of the data records and hence the accuracy of the data model.

Description

Federal machine combination service method and system combining big data analysis feedback

Technical Field

The invention relates to the technical field of intelligent manufacturing, in particular to a federal machine combination service method and system combining big data analysis feedback.

Background

Intelligent production and intelligent manufacturing characterize the present era. Equipment today is not only intelligent and automated; it has also moved from operating independently to operating collaboratively. Such cross-domain, cross-device operation necessarily involves collaboration among different devices or fields. On the one hand, the processing capacity of processors needs to be improved; on the other hand, the demand for computing capacity needs to be reduced, that is, the computation required by artificial intelligence needs to be optimized. A great deal of research is therefore being conducted on algorithms that improve the computing model and thereby reduce the computational demand.

Furthermore, for intelligent devices to achieve their intended effects, the data of each sensing device or organization must be combined, so that integrated logical calculation and judgment can be performed on the aggregated data. Integrating data distributed across many locations, however, involves great difficulty and economic cost. Technologies now exist for jointly accessing and processing such distributed data, such as federated machine learning. Federated machine learning, also known as federated learning, joint learning, or alliance learning, is a machine learning framework that can effectively help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.

For example, patent CN110263936A discloses a horizontal federated learning method, comprising: a community coordinator acquires global model parameters sent by a central coordinator and sends them to each participant; it obtains the model parameter updates that each participant produces by training on the global model parameters, fuses those updates into a community model parameter update, and determines whether the community model parameter update needs to be sent to the central coordinator; if so, it sends the community model parameter update to the central coordinator, obtains the global model parameter update returned by the central coordinator, and sends that update to each participant, so that each participant trains its model on the global model parameter update. That patent also discloses a horizontal federated learning apparatus, device, and computer storage medium, and is said to improve the learning efficiency of horizontal federated learning.

Another prior-art document discloses a method, apparatus, device, and readable storage medium for executing an automated process, relating to the field of financial technology. The method comprises: acquiring multi-modal data corresponding to a terminal and using the multi-modal data as input to a deep learning model to obtain an intention analysis model; determining, through the intention analysis model, the behavior intention corresponding to the automated process in the terminal; and executing, according to the behavior intention, a target operation instruction corresponding to the automated process so as to execute the process. Because the behavior intention is analyzed through the intention analysis model and the automated process is executed according to that intention, execution failures caused by changes in the execution environment are avoided, which improves the adaptability of the automated process to different execution environments and raises its execution success rate.

Patent CN111882308A discloses a blockchain secure transaction method, comprising: collecting transaction request information from each node participating in the secure transaction; recording the transaction request information and verifying its legality; generating a transaction block and packaging it into a block of the blockchain; acquiring and verifying the accounting right forwarded by the accounting-right node; broadcasting the ownership of the verified accounting right to each node so that the nodes can reach consensus according to a preset consensus algorithm; and verifying the transaction request information through the accounting right and, after successful verification, notifying each node to record the transaction request information and synchronously update its account information. That method is said to solve prior-art problems such as long transaction confirmation delays, the inability to guard against tampering by illegal information, poor security, and poor privacy.

Patent CN112257876A discloses a federated learning method, apparatus, computer device, and medium, belonging to the field of computer technology. In the method, a first computer device obtains sample label information corresponding to a sample identifier and obtains first fusion information corresponding to the sample identifier; a second computer device obtains second fusion information corresponding to the sample identifier and sends it to the first computer device; the first computer device derives a gradient operator corresponding to the sample identifier from the first fusion information, the second fusion information, and the sample label information, and sends the gradient operator to the second computer device; and the first and second computer devices adjust, based on the gradient operator, the model parameters of a first sub-model and of a second sub-model of the machine learning model, respectively. The method is said to speed up model training while protecting user privacy, to enrich the feature information of the samples, and to improve model accuracy.

Patent CN112217706A discloses a data processing method, apparatus, and device. On the one hand, the devices in the data processing system are connected in a ring structure, so that each device has two communication links to other devices; even if one link is temporarily interrupted, the device can still communicate over the other link, which gives the data processing system good stability and robustness. On the other hand, when the system processes data, the model parameters determined by each device are passed along the communication links in sequence, and each device fuses the parameters it receives with its own before passing them on. The data transmitted between devices is therefore small, and the model parameters do not need to be sent to a single device, which avoids overload and communication congestion, improves data processing speed and efficiency, and keeps data processing stable.

Patent CN112330048A discloses a method and apparatus for training a scorecard model, a storage medium, and an electronic device. The method comprises: binning the continuous variables in a wide data table to obtain discrete variables; and inputting the variables into a constrained logistic regression model, converting the logistic regression model into a scorecard model, and calculating the offset and scale of the scorecard model, where the constraint on the logistic regression model is that the variable coefficients are bounded below at zero (non-negative). In the related art, when a scorecard model is trained with logistic regression, multicollinearity among the independent variables can make individual coefficients negative, so that the model loses its original interpretability. Constraining the coefficients to be non-negative is said to solve this problem, avoiding repeated model iterations and reducing the time and cost of model training.

As can be seen, current federated-learning-based knowledge migration technology on the market has the following drawbacks:

1. In the prior art, big data is applied mainly to the management of massive data, to economic forecasting, or to business applications; it is seldom applied to industrial applications or used to guide them.

2. The prior art offers no technique or teaching on how data should be grouped. If grouping is done by experience alone, it is clearly unscientific, and unscientific grouping makes the model trained on the data inaccurate.

3. In the prior art, the size and quantity of the data are not considered when training on the data records. Training directly on all the data to obtain a model easily leads to an excessive data volume, which makes the computation heavy and difficult and also makes the trained model inaccurate.

4. In the prior art, abnormal data records that may be present are not subjected to preliminary cleaning, so abnormal data can easily distort the model obtained through training.

In view of the foregoing, it is desirable to provide a federated machine learning service method that can train on data quickly while lowering the capability required of the data processing system, so that data can be processed rapidly into a data model. The prior art offers no effective solution to the above technical problems.

Accordingly, a federal machine combination service method and system combining big data analysis feedback is needed to solve these technical problems.

Disclosure of Invention

In view of the above technical problems, the present invention is directed to providing a federal machine combination service method and system that combines big data analysis feedback to solve the problems set forth in the background art.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A federal machine combination service system combining big data analysis feedback comprises a plurality of devices distributed at different addresses and a big data analysis and scheduling module. Each device comprises a data preprocessing module, a data acquisition module, and a data reading module.

A data training fusion sub-module is arranged on some of the devices, and a federal data training module is arranged on one of the devices. Every device comprises a stand-alone storage module; a local data storage module is arranged on each device provided with a data training fusion sub-module, and a global data storage module is arranged on the device provided with the federal data training module.

The big data analysis and scheduling module analyzes and schedules all the devices and working modules participating in the federated machine learning, together with the progress data. It is in data communication connection with the data reading module, the data training fusion sub-module, and the federal data training module. The data acquisition module is in data communication connection with the stand-alone storage module, which is in turn in data communication connection with the data preprocessing module and the data reading module, respectively.

While a device is operating, the data acquisition module installed on it acquires operation data and state data to form data records, which are stored in the device's stand-alone storage module. The data preprocessing module reads the data records stored in the stand-alone storage module and analyzes each record using mathematical statistics and preset requirements; when a record is found to be obviously unreasonable, it is deleted.

The big data analysis and scheduling module uses the past operating characteristics of the devices and the characteristics of the data record volumes they generate to formulate device grouping rules. Under these rules it divides all the devices into a plurality of groups, ensuring that each group contains at least one data training fusion sub-module. It then sends the grouping information to the data reading module, the data training fusion sub-module, and the federal data training module, and modifies their read permissions for the data records accordingly.

According to the read permissions assigned by the big data analysis and scheduling module, the data training fusion sub-module establishes a data communication connection with the data reading modules of its group. Through those data reading modules it reads the data records stored in the stand-alone storage modules and performs learning and training on them to obtain a data federation sub-model. It also randomly extracts a certain number of the data records used to obtain the data federation sub-model and sends them to the local data storage module.

The federal data training module reads the data federation sub-models and the extracted data records stored in the local data storage module. It combines all the federation sub-models by parameter weighting into a total data federation model, trains on the extracted data records to obtain the corresponding parameters, and sends the resulting total data federation model to the global data storage module for storage.

The big data analysis and scheduling module randomly extracts a certain number of data records from the stand-alone storage modules on all the devices and uses them to verify the total data federation model. If the model's output agrees with the data in those records to within the model precision requirement, construction of the total data federation model is complete. Otherwise, data records are randomly extracted again from those stored in the local data storage module, and the federal data training module repeats the process of building the total data federation model.
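By way of illustration only, the read-permission and random-extraction behaviour described above might be sketched as follows. The sketch assumes that groups and devices are identified by strings and that the stand-alone storage is a plain dictionary; the names `Scheduler`, `grant`, `read_records`, and `sample_for_local_store` are hypothetical and do not come from the patent.

```python
import random


class Scheduler:
    """Toy stand-in for the big data analysis and scheduling module."""

    def __init__(self):
        # Maps a group id to the set of device ids its trainer may read.
        self._permissions = {}

    def grant(self, group_id, device_ids):
        """Record which devices the given group is allowed to read."""
        self._permissions[group_id] = set(device_ids)

    def may_read(self, group_id, device_id):
        return device_id in self._permissions.get(group_id, set())


def read_records(scheduler, group_id, device_id, standalone_storage):
    """Return a device's records only if the group holds read permission."""
    if not scheduler.may_read(group_id, device_id):
        raise PermissionError(f"group {group_id!r} may not read {device_id!r}")
    return list(standalone_storage[device_id])


def sample_for_local_store(records, k, seed=None):
    """Randomly extract up to k of the records that were used for training."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))
```

In this sketch a training sub-module that asks for records outside its group receives a `PermissionError`, which is one simple way to model the modified read permissions; the patent does not state how permissions are enforced.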

Preferably, when the total data federation model is obtained from all the federation sub-models by parameter weighting, the initial parameter values are taken either from the parameters of the previous total data federation model or from the data-record-volume characteristics that the big data analysis and scheduling module derives through big data analysis. Training on the extracted data records then starts from these initial values to obtain the final parameters and hence the total data federation model, which speeds up the convergence of training.
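The effect of starting from a previous model's parameters can be seen in a small example. The following is a minimal sketch, assuming a one-parameter model y ≈ w·x trained by plain gradient descent; the function `fit_slope`, the learning rate, and the data are illustrative assumptions rather than anything specified in the patent.

```python
def fit_slope(xs, ys, w0, lr=0.01, tol=1e-6, max_steps=10_000):
    """Fit y ~ w * x by gradient descent; return (w, steps used)."""
    w = w0
    n = len(xs)
    for step in range(max_steps):
        # Gradient of the mean squared error with respect to w.
        grad = 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / n
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with slope 2

# Cold start: no prior knowledge of the parameter.
w_cold, cold_steps = fit_slope(xs, ys, w0=0.0)
# Warm start: initial value taken from a (hypothetical) previous model.
w_warm, warm_steps = fit_slope(xs, ys, w0=1.9)
```

Both runs reach the same slope, but the warm-started run needs fewer iterations because it begins closer to the optimum.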

Preferably, when each group uses its data training fusion sub-module to train a data federation sub-model, some groups adopt longitudinal (vertical) federated learning in order to increase the dimensionality of the training data, and the remaining groups adopt federated transfer learning; alternatively, some groups adopt longitudinal federated learning, some adopt transverse (horizontal) federated learning, and the rest adopt federated transfer learning.
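For orientation, in the usual terminology horizontal federated learning has the parties hold different samples over the same features, whereas vertical federated learning has them hold the same samples over different features, which is why the vertical form increases the feature dimension. The sketch below only illustrates these two ways of partitioning records; the field names and helper functions are invented for the example and the patent does not specify how records are partitioned.

```python
# Hypothetical device records; field names are illustrative only.
records = [
    {"id": 1, "temperature": 71.0, "vibration": 0.12},
    {"id": 2, "temperature": 69.5, "vibration": 0.10},
    {"id": 3, "temperature": 74.2, "vibration": 0.31},
    {"id": 4, "temperature": 70.8, "vibration": 0.11},
]


def horizontal_split(rows, n_parts):
    """Each party holds different records with the same feature columns."""
    return [rows[i::n_parts] for i in range(n_parts)]


def vertical_split(rows, key, feature_sets):
    """Each party holds the same records (matched on `key`) but different features."""
    return [
        [{key: r[key], **{f: r[f] for f in feats}} for r in rows]
        for feats in feature_sets
    ]


by_sample = horizontal_split(records, 2)
by_feature = vertical_split(records, "id", [["temperature"], ["vibration"]])
```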

Preferably, when the data preprocessing module cleans the data records, the big data analysis and scheduling module performs a preliminary cleaning using existing historical data records or the data federation sub-model: it analyzes each data record and rejects any record that deviates beyond a certain extent, which makes the cleaning more accurate.
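One way to picture cleaning against historical records is a robust range check. The sketch below is an assumption-laden illustration, not the patented procedure: it treats each record as a single number, derives bounds from the median and the median absolute deviation (MAD) of past values, and rejects records outside those bounds. The function names and the threshold `k` are hypothetical.

```python
from statistics import median


def robust_bounds(history, k=3.0):
    """Return (low, high) as median +/- k * scaled MAD of past values."""
    med = median(history)
    mad = median(abs(v - med) for v in history)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return med - k * scale, med + k * scale


def clean(records, history, k=3.0):
    """Split records into (kept, rejected) using bounds derived from history."""
    low, high = robust_bounds(history, k)
    kept = [r for r in records if low <= r <= high]
    rejected = [r for r in records if not (low <= r <= high)]
    return kept, rejected
```

The rejected list is returned rather than discarded so that the abnormal records remain available for later analysis. Note that if the historical values are all identical the MAD is zero, and any record differing from the median is rejected.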

Preferably, during cleaning, the big data analysis and scheduling module combines the rejected unreasonable data records with the past operating characteristics of the devices to analyze why the records are abnormal, so that the data records can be corrected or further parameters can be added to them.

Preferably, when the big data analysis and scheduling module groups the devices, it first estimates the data record volume of each device using a big data estimation method. Devices with large data record volumes are grouped together, and devices with small data record volumes are grouped together, so that large volumes do not swamp small volumes during training and the accuracy of the total data federation model is preserved.

Preferably, when the big data analysis and scheduling module groups the devices, it first estimates the data record volume of each device using a big data estimation method. Groups of devices with large data record volumes contain few devices, and groups of devices with small data record volumes contain many, so that every group holds a moderate number of data records and the computational load of each data training fusion sub-module is appropriate.
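The two grouping preferences above can be combined in a simple greedy scheme. The following is a minimal sketch, assuming the estimated record volume of each device is already available as a number; the function `group_devices` and the `target_per_group` parameter are hypothetical, and the patent does not prescribe this particular algorithm.

```python
def group_devices(estimated_volumes, target_per_group):
    """Greedily pack devices, largest first, into groups of similar total size.

    estimated_volumes: dict of device id -> estimated number of records.
    Returns a list of groups, each a list of device ids.
    """
    ordered = sorted(estimated_volumes, key=estimated_volumes.get, reverse=True)
    groups, current, total = [], [], 0
    for device in ordered:
        current.append(device)
        total += estimated_volumes[device]
        if total >= target_per_group:
            groups.append(current)
            current, total = [], 0
    if current:  # leftover low-volume devices form a final group
        groups.append(current)
    return groups


volumes = {"a": 1000, "b": 900, "c": 300, "d": 250, "e": 200, "f": 150}
groups = group_devices(volumes, target_per_group=900)
```

Because the devices are visited in descending order of volume, neighbours in a group have similar volumes, high-volume groups contain few devices, and low-volume groups contain many. The sketch does not enforce the requirement that each group contain at least one data training fusion sub-module.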

Preferably, where data training fusion sub-modules or federal data training modules are present on several devices, the big data analysis and scheduling module designates one of the federal data training modules as the federal data training module of the service migration system, and designates one of the data training fusion sub-modules in each group as that group's data training fusion sub-module, which carries out the data training for the group. A data training task assigned to one or more data training fusion sub-modules may also be handed over to the big data analysis and scheduling module for execution.

In another aspect, the present application further provides a federal machine combination service method combining big data analysis feedback, which uses the federal machine combination service system combining big data analysis feedback described above and comprises the following steps:

Step S1, the federal machine combination service system combining big data analysis feedback is initialized, and the big data analysis and scheduling module uses the past operating characteristics of the devices and the characteristics of the data record volumes they generate to formulate device grouping rules.

The big data analysis and scheduling module divides all the devices into a plurality of groups according to these rules. Specifically, it first estimates the data record volume of each device using a big data estimation method. When the devices are grouped, devices with large data record volumes are grouped together and devices with small data record volumes are grouped together, so that large volumes do not swamp small volumes during training; groups of large-volume devices contain few devices and groups of small-volume devices contain many, so that every group holds a moderate number of data records. The big data analysis and scheduling module then sends the grouping information to the data reading module, the data training fusion sub-module, and the federal data training module, and modifies their read permissions for the data records;

Step S2, while a device is operating, the data acquisition module installed on it acquires operation data and state data to form data records, and the data records are stored in the device's stand-alone storage module;

Step S3, the data preprocessing module reads the data records stored in the stand-alone storage module, analyzes each data record using mathematical statistics and preset requirements, and deletes any data record found to be obviously unreasonable;

Step S4, according to the read permissions assigned by the big data analysis and scheduling module, the data training fusion sub-module establishes a data communication connection with the data reading modules of its group, reads through them the data records stored in the stand-alone storage modules, and performs learning and training on those records to obtain a data federation sub-model;

Step S5, the data federation sub-model, together with a certain number of data records randomly extracted from the records used to obtain it, is sent to the local data storage module;

Step S6, the federal data training module reads the data federation sub-models and the data records stored in the local data storage module, combines all the federation sub-models by parameter weighting into a total data federation model, and trains on the extracted data records to obtain the total data federation model;

Step S7, the big data analysis and scheduling module randomly extracts a certain number of data records from the stand-alone storage modules on all the devices and uses them to verify the total data federation model; if the model's output agrees with the data in those records to within the model precision requirement, construction of the total data federation model is complete;

Step S8, otherwise, data records are randomly extracted again from those stored in the local data storage module, and the federal data training module again builds the total data federation model.
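Steps S6 to S8 amount to a train, verify, and retry loop, which might be sketched as follows. This is an illustration under stated assumptions: records are (x, y) pairs, precision is measured as mean absolute error, and a `max_rounds` limit is added so the loop always terminates; none of these choices, nor the names `build_total_model` and `train_slope`, come from the patent.

```python
import random


def build_total_model(train, sampled_pool, validation, tolerance,
                      sample_size, max_rounds=10, seed=0):
    """Train on a random sample, verify, and retry on failure.

    train(records) -> predict function; records are (x, y) pairs.
    Returns (predict, rounds_used) or raises RuntimeError.
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        # S6 / S8: draw a fresh random sample from the locally stored records.
        sample = rng.sample(sampled_pool, min(sample_size, len(sampled_pool)))
        predict = train(sample)
        # S7: compare the model output with independently drawn records.
        error = sum(abs(predict(x) - y) for x, y in validation) / len(validation)
        if error <= tolerance:
            return predict, round_no
    raise RuntimeError("precision requirement not met within max_rounds")


def train_slope(records):
    """Hypothetical trainer: least-squares slope through the origin."""
    w = sum(x * y for x, y in records) / sum(x * x for x, _ in records)
    return lambda x: w * x
```

A caller would pass the records held in local data storage as `sampled_pool` and the records drawn from the stand-alone storage modules as `validation`.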

Compared with the prior art, the invention has the beneficial effects that:

1. The federal machine combination service system combining big data analysis feedback departs from the traditional approach of training on all the data globally to form a data federation model and instead adopts distributed federated training. This increases the usable data sample size, making training more accurate, and reduces the overall data cost.

2. The federal machine combination service system combining big data analysis feedback uses big data analysis on the data records acquired by all the devices to estimate the data volume of each device, and sends the estimates to the federation machine learning scheduling module as the basis for grouping. Grouping is therefore more accurate and effective, and the technical problem of grouping is effectively solved.

3. When the federal machine combination service system combining big data analysis feedback groups devices for federated training, devices with large data record volumes are grouped together and devices with small data record volumes are grouped together, so that large volumes do not swamp small volumes during training and the accuracy of the total data federation model is preserved. At the same time, groups of large-volume devices contain few devices and groups of small-volume devices contain many, so that every group holds a moderate number of data records and the computational load of each data training fusion sub-module is appropriate.

4. The federal machine combination service system combining big data analysis feedback cleans the data records to remove abnormal records and also analyzes the abnormal records to find the causes of the abnormality.

Drawings

FIG. 1 is a schematic diagram of the overall structure of the present invention;

FIG. 2 is a schematic diagram of the data flow among the modules provided on a device of the present invention.

In the figures: 1. device; 2. big data analysis and scheduling module; 3. data preprocessing module; 4. data training fusion sub-module; 5. federal data training module; 6. data acquisition module; 7. stand-alone storage module; 8. local data storage module; 9. global data storage module; 10. group; 11. data reading module.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

First embodiment:

A federal machine combination service system combining big data analysis feedback comprises a plurality of devices 1 distributed at different addresses and a big data analysis and scheduling module 2. Each device 1 comprises a data preprocessing module 3, a data acquisition module 6, and a data reading module 11.

A data training fusion sub-module 4 is arranged on some of the devices 1, and a federal data training module 5 is arranged on one of the devices 1. Every device 1 comprises a stand-alone storage module 7; a local data storage module 8 is arranged on each device provided with a data training fusion sub-module 4, and a global data storage module 9 is arranged on the device 1 provided with the federal data training module 5.

The big data analysis and scheduling module 2 analyzes and schedules all the devices and working modules participating in the federated machine learning, together with the progress data. It is in data communication connection with the data reading module 11, the data training fusion sub-module 4, and the federal data training module 5. The data acquisition module 6 is in data communication connection with the stand-alone storage module 7, which is in turn in data communication connection with the data preprocessing module 3 and the data reading module 11, respectively.

While a device 1 is operating, the data acquisition module 6 installed on it acquires operation data and state data to form data records, which are stored in the stand-alone storage module 7 of that device 1. The data preprocessing module 3 reads the data records stored in the stand-alone storage module 7 and analyzes each record using mathematical statistics and preset requirements; when a record is found to be obviously unreasonable, it is deleted.

The big data analysis and scheduling module 2 uses the past operating characteristics of the devices and the characteristics of the data record volumes they generate to formulate device grouping rules. Under these rules it divides all the devices 1 into a plurality of groups 10, ensuring that each group 10 contains at least one data training fusion sub-module 4. It then sends the grouping information to the data reading module 11, the data training fusion sub-module 4, and the federal data training module 5, and modifies their read permissions for the data records accordingly.

The data training fusion sub-module 4 establishes data communication connection with the data reading module 11 of the corresponding group according to the read permission distributed by the big data analysis and scheduling module 2, so that the data training fusion sub-module 4 reads the data records stored in the single machine storage module 7 through the data reading module 11 to perform data learning training to obtain a data federation sub-model, and extracts a certain amount of data records from the data records applied to the obtained data federation sub-model at random and sends the data records to the local data storage module 8;

the federation data training module 5 reads the data federation sub-model and the data records stored in the local data storage module 8, adopts a parameter weighting mode for all the federation sub-models to obtain a total data federation model, performs data training by using the read extracted data records to obtain corresponding parameters, thereby obtaining the total data federation model, and sends the total data federation model to the global data storage module 9 for storage;

the big data analysis and scheduling module 2 randomly extracts a certain number of data records from the single-machine storage modules 7 on all the devices 1 to verify the total data federation model. When the data records are checked with the total data federation model and the model output agrees with the data in the data records to within the model precision requirement, construction of the total data federation model is complete; otherwise, data records are randomly extracted again from the randomly extracted data records stored in the local data storage module 8, and the federal data training module 5 repeats the process of building the total data federation model.
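By way of non-limiting illustration, the verify-and-rebuild cycle may be sketched as a loop. The callables `train` and `evaluate` are hypothetical placeholders for the federal data training module and for the check performed by the big data analysis and scheduling module. The `max_rounds` limit is an added safeguard that does not appear in the description.

```python
import random


def build_until_valid(train, evaluate, stored_records, sample_size,
                      required_accuracy, max_rounds=10, seed=0):
    """Resample stored records and rebuild the model until validation passes.

    train(records) -> model and evaluate(model) -> accuracy are supplied by
    the caller. Returns the accepted model and the number of rounds used.
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        # Draw a fresh random subset of the locally stored records.
        size = min(sample_size, len(stored_records))
        sample = rng.sample(stored_records, size)
        model = train(sample)
        if evaluate(model) >= required_accuracy:
            return model, round_no
    raise RuntimeError("model did not reach the required accuracy")
```

Bounding the number of rounds prevents an endless loop when the stored records cannot support the required precision; in that case the failure is surfaced to the caller.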

Preferably, when the total data federation model is obtained from all the federation sub-models by parameter weighting, the big data analysis and scheduling module 2 sets the initial parameter values either from the parameters adopted by the previous total data federation model or from the characteristics of the data record quantity obtained by big data analysis. Data training with the extracted data records then proceeds from these initial values to obtain the final corresponding parameters and thus the total data federation model, which increases the convergence speed of the data record training.
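By way of non-limiting illustration, the effect of the initial parameter values on convergence can be seen in a toy one-parameter example. The model y = w * x, the gradient descent procedure, the learning rate and the function name `fit_slope` are all assumptions chosen for brevity and are not the training procedure of the invention.

```python
def fit_slope(xs, ys, w0, lr=0.05, tol=1e-6, max_steps=10_000):
    """Gradient descent on mean squared error for y = w * x.

    Returns the fitted slope and the number of update steps taken.
    """
    w = w0
    n = len(xs)
    for step in range(max_steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / n
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_cold, steps_cold = fit_slope(xs, ys, w0=0.0)   # arbitrary starting point
w_warm, steps_warm = fit_slope(xs, ys, w0=1.9)   # start near a previous model
```

Both runs reach the same slope, but the run started near the previous parameters needs fewer update steps, which is the effect the preferred embodiment relies on.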

Preferably, when each group uses the data training fusion sub-module 4 to perform data training to generate the data federation sub-model, in order to increase the dimension of the training data, longitudinal (vertical) federal learning is adopted for part of the groups and federal transfer learning is adopted for the rest of the groups; or part of the groups adopt longitudinal (vertical) federal learning, part of the groups adopt transverse (horizontal) federal learning, and the remaining groups adopt federal transfer learning.

Preferably, when the data preprocessing module 3 cleans the data records, the big data analysis and scheduling module 2 performs a preliminary cleaning of the data using the existing historical data records or the data federation sub-model: it analyzes each data record and rejects any data record that deviates beyond a certain extent, so that the data record cleaning is more accurate.
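By way of non-limiting illustration, cleaning against an existing model may be sketched as follows. The callable `predict` stands in for a model derived from historical data records or for a data federation sub-model, and `max_abs_error` is a hypothetical tolerance for "deviates beyond a certain extent"; neither is specified in the description.

```python
def reject_deviating(records, predict, max_abs_error):
    """Split records into kept and rejected by deviation from a prediction.

    records: list of (features, observed) pairs.
    predict: callable mapping features to the expected observed value.
    max_abs_error: largest absolute deviation that is still accepted.
    """
    kept, rejected = [], []
    for features, observed in records:
        if abs(predict(features) - observed) <= max_abs_error:
            kept.append((features, observed))
        else:
            rejected.append((features, observed))
    return kept, rejected
```

The rejected records are returned rather than discarded so that they remain available for later analysis of why they were abnormal.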

Preferably, when cleaning the data records, the big data analysis and scheduling module 2 analyzes the causes of the abnormal data records by combining the rejected unreasonable data records with the previous operation characteristics of the equipment, so as to correct the data records or add parameters to the data records.

Preferably, when the big data analysis and scheduling module 2 groups all the devices 1, the data record quantity of each device 1 is estimated in advance by a big data estimation method. Devices with large data record quantities are placed in the same group and devices with small data record quantities are placed in another group, which prevents the large data record quantities from swamping the small ones during data training and so protects the accuracy of the total data federation model.

Preferably, when the big data analysis and scheduling module 2 groups all the devices 1, the data record quantity of each device 1 is estimated in advance by a big data estimation method. A group of devices 1 with large data record quantities contains few devices and a group of devices 1 with small data record quantities contains many devices, so that the number of data records in each group is moderate and the computational load on each data training fusion sub-module 4 is suitable.
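By way of non-limiting illustration, the two grouping preferences above may be sketched together as a greedy packing over devices sorted by estimated data record quantity. The function name `group_devices` and the per-group budget `target_per_group` are hypothetical, and the requirement that each group contain at least one data training fusion sub-module is not checked in this sketch.

```python
def group_devices(estimated_counts, target_per_group):
    """Group devices of similar record volume and cap each group's total.

    estimated_counts: dict mapping device id -> estimated number of records.
    target_per_group: soft budget of records per group; a single device
                      larger than the budget still forms its own group.
    """
    # Largest devices first, so neighbours in the order have similar volume.
    ordered = sorted(estimated_counts.items(), key=lambda kv: kv[1], reverse=True)
    groups, current, total = [], [], 0
    for device, count in ordered:
        # Close the current group once this device would exceed the budget.
        if current and total + count > target_per_group:
            groups.append(current)
            current, total = [], 0
        current.append(device)
        total += count
    if current:
        groups.append(current)
    return groups
```

With the estimates `{"a": 950, "b": 920, "c": 100, "d": 90, "e": 80, "f": 70}` and a budget of 1000, the result is `[["a"], ["b"], ["c", "d", "e", "f"]]`: the two high-volume devices each form a small group and the four low-volume devices share one. A greedy pass of this kind can still place a small device next to a large one at a group boundary when the budget allows it.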

Preferably, for the data training fusion sub-modules 4 and the federal data training modules 5 on multiple devices in the same group, the big data analysis and scheduling module 2 designates one of the federal data training modules 5 as the federal data training module 5 of the federal machine combination service system and one of the data training fusion sub-modules 4 in the group as the data training fusion sub-module of that group, so as to implement data training for the group; the data training tasks executed by one or several of the data training fusion sub-modules 4 can also be submitted to the big data analysis and scheduling module 2 for execution.

Specific embodiment II:

A federal machine combination service method combining big data analysis feedback uses the federal machine combination service system combining big data analysis feedback and comprises the following steps:

Step S1, initializing the federal machine combination service system combining big data analysis feedback, wherein the big data analysis and scheduling module 2 formulates the grouping rules of the equipment from the past operation characteristics of the equipment and the characteristics of the generated data record quantity;

the big data analysis and scheduling module 2 groups all the devices 1, dividing them into a plurality of groups 10 according to a certain rule. Specifically, the big data analysis and scheduling module 2 estimates the data record quantity of each device 1 in advance by a big data estimation method; when the devices 1 are grouped, devices with large data record quantities are placed in the same group and devices with small data record quantities in another group, so that the large data record quantities do not swamp the small ones during data training; a group of devices 1 with large data record quantities contains few devices and a group of devices 1 with small data record quantities contains many devices, so that the data record quantity of each group is moderate. The big data analysis and scheduling module 2 also ensures that at least one data training fusion sub-module 4 exists in each group 10, sends the grouping information to the data reading module 11, the data training fusion sub-module 4 and the federal data training module 5, and modifies the data record read permissions of the data reading module 11, the data training fusion sub-module 4 and the federal data training module 5;

Step S2, when the equipment 1 is in operation, the data acquisition module 6 acquires operation data and state data of the equipment 1 to form data records, and the data records are stored in the single-machine storage module 7 of the equipment 1;

Step S3, the data preprocessing module 3 reads the data records stored in the single-machine storage module 7, analyzes each data record using a mathematical statistics method and the set requirements, and deletes any data record found to be obviously unreasonable;

Step S4, the data training fusion sub-module 4 establishes a data communication connection with the data reading module 11 of the corresponding group according to the read permission allocated by the big data analysis and scheduling module 2, so that the data training fusion sub-module 4 reads the data records stored in the single-machine storage module 7 through the data reading module 11 and performs data learning training to obtain a data federation sub-model;

Step S5, the data federation sub-model, together with a certain number of data records randomly extracted from the data records used to obtain it, is sent to the local data storage module 8;

Step S6, the federal data training module 5 reads the data federation sub-models and the data records stored in the local data storage module 8, combines all the data federation sub-models by parameter weighting to form a total data federation model, and performs data training with the extracted data records, thereby obtaining the total data federation model;

Step S7, the big data analysis and scheduling module 2 randomly extracts a certain number of data records from the single-machine storage modules 7 on all the devices 1 to check the total data federation model; when the data records are checked with the total data federation model and the model output agrees with the data in the data records to within the model accuracy requirement, construction of the total data federation model is complete;

Step S8, otherwise, data records are randomly extracted again from the randomly extracted data records stored in the local data storage module 8, and the federal data training module 5 repeats the process of building the total data federation model.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A federal machine combination service system combining big data analysis feedback comprises a plurality of devices (1) distributed at different addresses, and a big data analysis and scheduling module (2); each device (1) comprises a data preprocessing module (3), a data acquisition module (6) and a data reading module (11);

the system further comprises a data training fusion sub-module (4), which is arranged on part of the devices (1), and a federal data training module (5), which is arranged on one of the devices (1); every device (1) comprises a single-machine storage module (7), a local data storage module (8) is arranged on each device provided with the data training fusion sub-module (4), and a global data storage module (9) is arranged on the device (1) provided with the federal data training module (5);

the big data analysis and scheduling module (2) analyzes and schedules all the equipment, the working module and the data progress data which participate in the federal machine learning; the big data analysis and scheduling module (2) is in data communication connection with the data reading module (11), the data training fusion sub-module (4) and the federal data training module (5); the data acquisition module (6) is in data communication connection with the single machine storage module (7), and the single machine storage module (7) is also in data communication connection with the data preprocessing module (3) and the data reading module (11) respectively;

the system is characterized in that:

when the equipment (1) operates, the data acquisition module (6) acquires operation data and state data of the equipment (1) to form data records, the data records are stored in the single-machine storage module (7) of the equipment (1), the data preprocessing module (3) reads the data records stored in the single-machine storage module (7), each data record is analyzed using a mathematical statistics method and the set requirements, and any data record found to be obviously unreasonable is deleted;

The big data analysis and scheduling module (2) makes grouping rules of the equipment by utilizing the past operation characteristics of the equipment and the generated data record quantity characteristics, so that the big data analysis and scheduling module (2) groups all the equipment (1), divides all the equipment (1) into a plurality of groups (10) according to a certain rule, ensures that at least one data training fusion sub-module (4) exists in each group (10), sends grouped information to the data reading module (11), the data training fusion sub-module (4) and the federal data training module (5), and modifies the reading authority of the data records of the data reading module (11), the data training fusion sub-module (4) and the federal data training module (5);

the data training fusion sub-module (4) establishes a data communication connection with the data reading module (11) of the corresponding group according to the read permission allocated by the big data analysis and scheduling module (2), so that the data training fusion sub-module (4) reads the data records stored in the single-machine storage module (7) through the data reading module (11) and performs data learning training to obtain a data federation sub-model, randomly extracts a certain number of the data records used to obtain the data federation sub-model, and sends the extracted data records to the local data storage module (8);

The federation data training module (5) reads the data federation sub-model and the data records stored in the local data storage module (8), adopts a parameter weighting mode for all the federation sub-models to obtain a total data federation model, performs data training by using the read extracted data records to obtain corresponding parameters, and further obtains the total data federation model which is sent to the global data storage module (9) for storage;

the big data analysis and scheduling module (2) randomly extracts a certain number of data records from the single machine storage module (7) on all the devices (1) and is used for checking the total data federation model, when the data records are checked by using the total data federation model, the total data federation model is built when data output and data in the data records meet the model precision requirement, otherwise, randomly extracted data records stored in the local data storage module (8) are randomly extracted again, and the federation data training module (5) is used for building the total data federation model again;

wherein the manner of dividing all the devices (1) into a plurality of groups (10) according to a certain rule is as follows:

when all the devices (1) are grouped by the big data analysis and scheduling module (2), the data record quantity of each device (1) is estimated in advance by a big data estimation method; devices with large data record quantities are placed in the same group and devices with small data record quantities in another group, so that the large data record quantities do not swamp the small ones during data training; and a group of devices (1) with large data record quantities contains few devices, while a group of devices (1) with small data record quantities contains many devices.

2. A federal machine combination service system combining big data analysis feedback in accordance with claim 1, wherein: when the total data federation model is obtained from all the federation sub-models by parameter weighting, the big data analysis and scheduling module (2) sets the initial parameter values either from the parameters adopted by the previous total data federation model or from the characteristics of the data record quantity obtained by big data analysis; data training with the extracted data records is then performed on this basis to obtain the final corresponding parameters and thus the total data federation model, which increases the convergence speed of the data record training.

3. A federal machine combination service system combining big data analysis feedback in accordance with claim 1, wherein: when each group uses the data training fusion sub-module (4) to perform data training to generate the data federation sub-model, in order to increase the dimension of the training data, longitudinal (vertical) federal learning is adopted for part of the groups and federal transfer learning is adopted for the rest of the groups; or part of the groups adopt longitudinal (vertical) federal learning, part of the groups adopt transverse (horizontal) federal learning, and the remaining groups adopt federal transfer learning.

4. A federal machine combination service system combining big data analysis feedback in accordance with claim 1, wherein: when the data preprocessing module (3) cleans the data records, the big data analysis and scheduling module (2) performs a preliminary cleaning of the data using the existing historical data records or the data federation sub-model, analyzes each data record, and rejects any data record that deviates beyond a certain extent, so that the data record cleaning is more accurate.

5. A federal machine combination service system combining big data analysis feedback in accordance with claim 4, wherein: when cleaning the data records, the big data analysis and scheduling module (2) analyzes the causes of the abnormal data records by combining the rejected unreasonable data records with the previous operation characteristics of the equipment, so as to correct the data records or add parameters to the data records.

6. A federal machine combination service system combining big data analysis feedback in accordance with claim 1, wherein: for the data training fusion sub-modules (4) and the federal data training modules (5) on a plurality of devices in the same group, the big data analysis and scheduling module (2) designates one federal data training module (5) as the federal data training module (5) of the federal machine combination service system and one data training fusion sub-module (4) in the same group as the data training fusion sub-module of that group, thereby implementing data training for the group; and data training tasks executed by one or more of the data training fusion sub-modules (4) can be submitted to the big data analysis and scheduling module (2) for execution.

7. A federal machine combination service method combining big data analysis feedback, using the federal machine combination service system combining big data analysis feedback as claimed in any one of claims 1 to 6, comprising the steps of:

Step S1, initializing the federal machine combination service system combining big data analysis feedback, wherein the big data analysis and scheduling module (2) formulates the grouping rules of the equipment from the past operation characteristics of the equipment and the characteristics of the generated data record quantity;

the big data analysis and scheduling module (2) groups all the devices (1), dividing them into a plurality of groups (10) according to a certain rule; specifically, the big data analysis and scheduling module (2) estimates the data record quantity of each device (1) in advance by a big data estimation method; when all the devices (1) are grouped, devices with large data record quantities are placed in the same group and devices with small data record quantities in another group, so that the large data record quantities do not swamp the small ones during data training; a group of devices (1) with large data record quantities contains few devices and a group of devices (1) with small data record quantities contains many devices, so that the data record quantity of each group is moderate; the big data analysis and scheduling module (2) also ensures that at least one data training fusion sub-module (4) exists in each group (10), sends the grouping information to the data reading module (11), the data training fusion sub-module (4) and the federal data training module (5), and modifies the data record read permissions of the data reading module (11), the data training fusion sub-module (4) and the federal data training module (5);

Step S2, when the equipment (1) is in operation, the data acquisition module (6) acquires operation data and state data of the equipment (1) to form data records, and the data records are stored in the single-machine storage module (7) of the equipment (1);

step S3, the data preprocessing module (3) reads the data records stored in the single machine storage module (7), analyzes each data record by utilizing a mathematical statistics method and a set requirement, and deletes a certain data record when the data record is found to be obviously unreasonable;

step S4, the data training fusion sub-module (4) establishes data communication connection with the data reading module (11) of the corresponding group according to the reading permission distributed by the big data analysis and scheduling module (2), so that the data training fusion sub-module (4) reads the data record stored in the single machine storage module (7) through the data reading module (11) to perform data learning training to obtain a data federation sub-model;

Step S5, the data federation sub-model, together with a certain number of data records randomly extracted from the data records used to obtain it, is sent to the local data storage module (8);

Step S6, the federation data training module (5) reads the data federation sub-model and the data records stored in the local data storage module (8), adopts a parameter weighting mode for all the federation sub-models to obtain a total data federation model, and performs data training by using the read extracted data records so as to obtain the total data federation model;

Step S7, the big data analysis and scheduling module (2) randomly extracts a certain number of data records from the single-machine storage modules (7) on all the devices (1) to check the total data federation model; when the data records are checked with the total data federation model and the model output agrees with the data in the data records to within the model precision requirement, construction of the total data federation model is complete;

Step S8, otherwise, data records are randomly extracted again from the randomly extracted data records stored in the local data storage module (8), and the federal data training module (5) is used again to build the total data federation model.

8. A federal machine combination service method combining big data analysis feedback in accordance with claim 7, wherein: for the data training fusion sub-modules (4) and the federal data training modules (5) on a plurality of devices in the same group, the big data analysis and scheduling module (2) designates one federal data training module (5) as the federal data training module (5) of the federal machine combination service system and one data training fusion sub-module (4) in the same group as the data training fusion sub-module of that group, thereby implementing data training for the group; and data training tasks executed by one or more of the data training fusion sub-modules (4) can be submitted to the big data analysis and scheduling module (2) for execution.