CN114925072A - Data management method, apparatus, system, device, medium, and program product - Google Patents

Data management method, apparatus, system, device, medium, and program product

Info

Publication number
CN114925072A
Authority
CN
China
Prior art keywords
data
index
row
data table
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210667189.4A
Other languages
Chinese (zh)
Other versions
CN114925072B (en)
Inventor
高晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhixing Technology Co Ltd
Original Assignee
Shenzhen Zhixing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhixing Technology Co Ltd
Priority to CN202210667189.4A
Publication of CN114925072A
Application granted
Publication of CN114925072B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data management method, apparatus, system, electronic device, non-transitory computer-readable storage medium, and computer program product. It relates to the field of computer technologies, in particular to federated learning and privacy computing, and may be used to manage private data. The implementation scheme is as follows: in response to receiving an import instruction, acquiring data information corresponding to each of at least one first data row in a first data set, wherein the data information comprises at least one ID value, a backtracking time, and feature data corresponding to an attribute column; determining an index corresponding to each of the at least one first data row based on the ID value and the backtracking time corresponding to that first data row, wherein the index corresponds to the index column; importing the index and the feature data corresponding to each of the at least one first data row into a data table; and in response to receiving a first export instruction, exporting a second data set associated with the federated learning task from the data table based on an index corresponding to the index column in the data table.

Description

Data management method, apparatus, system, device, medium, and program product
Technical Field
The present disclosure relates to the field of computer technologies, in particular to federated learning and privacy computing, and more particularly to a method, an apparatus, a system, an electronic device, a non-transitory computer-readable storage medium, and a computer program product for managing private data.
Background
Federated machine learning, also known as federated learning, is a machine learning framework that helps multiple participants use data and build machine learning models while satisfying user privacy protection and data security requirements. As a distributed machine learning paradigm, federated learning can effectively address the data silo problem: participants can complete a joint learning task without sharing data, which technically breaks down data silos and enables AI (Artificial Intelligence) collaboration.
Federated learning defines a machine learning framework in which different data owners can collaborate without exchanging data by designing a virtual model. The virtual model is the optimal model that would result if all parties aggregated their data. The goal of federated learning is for the virtual model to approximate as closely as possible the model obtained by traditional modeling, that is, by gathering the data of multiple data owners in one place. Under the federated mechanism, all participants (that is, the data owners) have equal identity and status, and a shared data policy can be established. Because the private data of each participant is never transferred, user privacy is not leaked and compliance with data regulations is not affected. It should be noted that the federated learning task is not limited to federated modeling, and may be, for example, a federated query task, a federated statistics task, or the like.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides a data management method, apparatus, system, electronic device, non-transitory computer readable storage medium, and computer program product.
According to one aspect of the present disclosure, a data management method is provided. The method is applied to a data management system communicatively connected to any one of a plurality of participants executing the same federated learning task, wherein a data table is built into the data management system and the data table comprises an index column and a plurality of attribute columns. The method comprises: in response to receiving an import instruction from the participant, acquiring data information corresponding to each of at least one first data row in a first data set to be imported, wherein the data information comprises at least one ID value, a backtracking time, and feature data corresponding to at least one attribute column; determining an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time corresponding to that first data row, wherein the index corresponds to the index column; importing the index and the feature data corresponding to each of the at least one first data row into the data table; and in response to receiving a first export instruction from the participant, exporting a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table.
According to another aspect of the present disclosure, a federated learning method is provided. The method is applied to any one participant in a federated learning task, the participant being communicatively connected to a plurality of target participants in the federated learning task, wherein a data table is built into a data management system of the participant, the data table comprises an index column and at least one attribute column and further comprises at least one first data row, each first data row comprises an index corresponding to the index column and feature data corresponding to the at least one attribute column, and the index is related to at least one ID value and a backtracking time of the corresponding first data row. The method comprises: sending a first export instruction to the data management system to obtain a second data set associated with the federated learning task, wherein the second data set is exported from the data table based on at least one index corresponding to the index column in the data table; and performing subsequent subtasks of the federated learning task with other participants among the plurality of target participants based on the second data set.
According to another aspect of the present disclosure, there is provided a data management system applied to any one of a plurality of participants that perform the same federated learning task, wherein a data table is built into the data management system and the data table comprises an index column and a plurality of attribute columns. The data management system comprises: an import unit configured to, in response to receiving an import instruction from the participant, acquire data information corresponding to each of at least one first data row in a first data set to be imported, wherein the data information comprises at least one ID value, a backtracking time, and feature data corresponding to at least one attribute column; determine an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time corresponding to that first data row, wherein the index corresponds to the index column; and import the index and the feature data corresponding to each of the at least one first data row into the data table; and an export unit configured to, in response to receiving a first export instruction from the participant, export a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table.
According to another aspect of the present disclosure, there is provided a federated learning apparatus applied to any one participant in a federated learning task, the participant being communicatively connected to a plurality of target participants in the federated learning task, wherein a data table is built into a data management system of the participant, the data table comprises an index column and at least one attribute column and further comprises at least one first data row, each first data row comprises an index corresponding to the index column and feature data corresponding to the at least one attribute column, and the index is related to at least one ID value and a backtracking time of the corresponding first data row. The apparatus comprises: a sending unit configured to send a first export instruction to the data management system to obtain a second data set associated with the federated learning task, wherein the second data set is exported from the data table based on at least one index corresponding to the index column in the data table; and an execution unit configured to perform subsequent subtasks of the federated learning task with other participants among the plurality of target participants based on the second data set.
According to another aspect of the present disclosure, there is provided a federated learning system comprising the federated learning apparatus described above.
The federated learning system described above may further comprise the data management system described above.
According to another aspect of the present disclosure, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform any one of the methods described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements any one of the methods described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 shows a flow diagram of a data management method according to an example embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of exporting a second data set associated with a federated learning task from a data table in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a federated learning method in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a federated learning method in accordance with an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a federated learning method in accordance with an exemplary embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a federated learning method in accordance with an exemplary embodiment of the present disclosure;
FIG. 7 shows a block diagram of a data management system according to an example embodiment of the present disclosure;
FIG. 8 illustrates a block diagram of a federated learning device in accordance with an exemplary embodiment of the present disclosure;
FIG. 9 shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to define a positional relationship, a temporal relationship, or an importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in some cases they may refer to different instances based on the context of the description.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the element may be one or a plurality of. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Federated learning is a distributed machine learning framework with privacy protection and security encryption technology. It aims to let dispersed participants collaborate on a federated learning task without disclosing private data to other participants. The federated learning task may be, for example, the training of a machine learning model, a federated query task, a federated statistics task, or the like.
The backtracking time (or backtracking for short) represents the time at which the attributes of an individual data subject were recorded, which makes it convenient to record and query the historical attributes of that individual. In federated learning modeling, handling the backtracking time associated with each individual by every participant involved in the modeling is an important process. In existing federated learning scenarios that involve backtracking services, storing data for different backtracking times in files of a local file system makes data processing complicated, and conventional databases typically do not support backtracking scenarios.
The present disclosure provides a data management method, apparatus, system, electronic device, non-transitory computer-readable storage medium, and computer program product. A data table comprising index columns related to the ID and the backtracking time is established in the data management system of a participant of a federated learning task, so that data of the same individual at different backtracking times can be stored in the same data table, and data with specified indexes can be exported to obtain a data set for a downstream federated learning task. This increases the flexibility and convenience of data use and improves the overall efficiency of the federated learning task.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 shows a flow diagram of a data management method 100 according to an exemplary embodiment of the present disclosure. The data management method 100 is applied to a data management system that is communicatively connected to any one of a plurality of participants performing the same federated learning task. A data table is built into the data management system, and the data table comprises an index column and a plurality of attribute columns. As shown in FIG. 1, the data management method 100 includes: step S102, in response to receiving an import instruction from the participant, acquiring data information corresponding to each of at least one first data row in a first data set to be imported, wherein the data information comprises at least one ID value, a backtracking time, and feature data corresponding to at least one attribute column; step S104, determining an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time of that first data row, wherein the index corresponds to the index column; step S106, importing the index and the at least one feature data corresponding to each of the at least one first data row into the data table; and step S108, in response to receiving a first export instruction from the participant, exporting a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table. It is understood that steps S102 to S106 may be performed sequentially in response to receiving the import instruction from the current participant.
Therefore, by acquiring data information such as the at least one ID value, the backtracking time, and the feature data of each first data row to be imported, determining the index of each first data row based on its at least one ID value and backtracking time, and importing the index and the feature data of the at least one first data row into a data table, multiple sets of feature data of the same individual at different backtracking times can be stored in the data table. Index-based export then makes it flexible and convenient to export, from the data table, data sets that meet specific ID and/or backtracking time requirements. Compared with related approaches that manage data with different backtracking times separately, the method of the present disclosure makes backtracking data easier to store, manage, and use in federated learning scenarios involving backtracking services, simplifies the workflow of executing the federated learning task, and improves the overall efficiency of executing it.
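Steps S102 to S108 can be illustrated with a minimal sketch. It assumes an in-memory SQLite table, a joint index built by joining the ID value and the backtracking time with an underscore, and hypothetical names (`features`, `import_rows`, `export_rows`, `age`, `balance`) that do not come from the disclosure.

```python
import sqlite3

def joint_index(id_value, backtrack_time):
    # Assumed format: ID value and backtracking time joined by "_".
    return f"{id_value}_{backtrack_time}"

def import_rows(conn, rows):
    """S102 to S106: read data information, determine the index, write the table."""
    for row in rows:
        idx = joint_index(row["id"], row["backtrack_time"])  # S104
        conn.execute(
            "INSERT INTO features (idx, id_value, backtrack_time, age, balance) "
            "VALUES (?, ?, ?, ?, ?)",
            (idx, row["id"], row["backtrack_time"], row["age"], row["balance"]),
        )  # S106
    conn.commit()

def export_rows(conn, backtrack_time):
    """S108: export the rows whose index matches the requested backtracking time."""
    cur = conn.execute(
        "SELECT id_value, age, balance FROM features "
        "WHERE backtrack_time = ? ORDER BY id_value",
        (backtrack_time,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (idx TEXT PRIMARY KEY, id_value TEXT, "
    "backtrack_time TEXT, age INTEGER, balance REAL)"
)

# Hypothetical first data set: the same individual u1 appears at two backtracking times.
first_data_set = [
    {"id": "u1", "backtrack_time": "2022-01", "age": 30, "balance": 100.0},
    {"id": "u1", "backtrack_time": "2022-02", "age": 30, "balance": 150.0},
    {"id": "u2", "backtrack_time": "2022-01", "age": 41, "balance": 80.0},
]
import_rows(conn, first_data_set)

second_data_set = export_rows(conn, "2022-01")
print(second_data_set)  # [('u1', 30, 100.0), ('u2', 41, 80.0)]
```

Both rows for `u1` live in the same table, and the export picks only the rows for the requested backtracking time.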
The ID in the embodiments of the present disclosure refers to an identifier capable of uniquely identifying an object/individual, and may be, for example, a mobile phone number, an identity card number, or a user account; it is not limited herein and may be set according to the actual application scenario. In the embodiments of the present disclosure, each individual may have one ID value or a plurality of ID values, which is not limited herein. In embodiments where an individual has multiple ID values, each ID value may be stored in the data table in the data management system, and a specified ID may be used on its own when the data is used. For example, a second data set associated with the federated learning task may be exported based on an index related to the specified ID (the ID value or a joint index, as described below), so that the federated learning system performs the federated learning task based on the specified ID. It will be appreciated that the participants may agree on a specified ID in advance, one participant may specify the ID and send it to the others, or the specified ID may be made consistent in some other way, to ensure that the federated learning task executes successfully. In some embodiments, an ID set may refer to the set of different ID values for the same ID (the same column in the data table).
The backtracking time in the embodiments of the present disclosure refers to an identifier that represents the time at which the attribute/feature data of an object/individual was recorded, which makes it convenient to record and query the historical attribute/feature data of that object/individual. For example, a plurality of features/attributes of the same object may be recorded at a plurality of different backtracking times to obtain a plurality of sets of feature data, one for each backtracking time, where the different sets correspond to the same plurality of features/attributes. In a federated learning scenario involving a backtracking service, the data that each participant contributes to the federated learning task needs to meet a specific backtracking time requirement, for example, the local feature data of each participant for the common samples at a specified backtracking time. It is understood that the backtracking time may be a time point or a time period, and is not limited herein.
In some embodiments, one or more of a document database, a graph database, a key-value database, and the like are built into the data management system to store the data sets of the participants.
According to some embodiments, the data management system has a relational database built-in, wherein the first data set is stored in the relational database in the form of relational data. Further, the relational database may include a plurality of relational data tables, each of which may store data in a plurality of rows and columns.
Thus, by storing the data set in the form of relational data, operations such as retrieval, extraction, and concatenation of specific rows and columns of the data set can be performed. In this embodiment, the relational database built into the data management system makes it possible to export, from a participant's data set, feature data that meets specific ID and/or backtracking time requirements.
In the present disclosure, operations such as data import, export, and query are described using SQL statements on a relational database as an example, but such examples are not intended to limit the type of database. It is understood that the data management method of the present disclosure can also be performed on other relational or non-relational databases with similar functions, and such implementations fall within the scope of the present disclosure.
In some embodiments, the index columns may include an index column corresponding to the backtracking time and an index column corresponding to the ID value, so that the correspondence between each data row and a backtracking time and the correspondence between each data row and an individual are established through these two index columns respectively. In some embodiments, the index column may include a joint index column corresponding to a joint index related to both the backtracking time and the ID value, so that a joint index uniquely pointing to each data row is set in the data table, and the correspondence between each data row and a backtracking time and the correspondence between each data row and an individual are both established through the joint index. In some embodiments, the index columns may include a joint index column combined with at least one of the index column corresponding to the backtracking time and the index column corresponding to the ID value, or may include one or more other index columns capable of reflecting the ID value and the backtracking time of the data row, which is not limited herein.
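The two main index layouts described above can be compared in a small sketch. It assumes SQLite as the relational database, and the table names `layout_a` and `layout_b`, the column names, and the underscore-joined joint index are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout A: separate index columns for the ID value and the backtracking time.
conn.execute("CREATE TABLE layout_a (id_value TEXT, backtrack_time TEXT, feature_x REAL)")
conn.execute("CREATE INDEX ix_a_id ON layout_a (id_value)")
conn.execute("CREATE INDEX ix_a_time ON layout_a (backtrack_time)")

# Layout B: a single joint index column that uniquely points to each data row.
conn.execute("CREATE TABLE layout_b (joint_idx TEXT PRIMARY KEY, feature_x REAL)")

conn.execute("INSERT INTO layout_a VALUES ('u1', '2022-01', 0.5)")
conn.execute("INSERT INTO layout_a VALUES ('u1', '2022-02', 0.7)")
conn.execute("INSERT INTO layout_b VALUES ('u1_2022-01', 0.5)")
conn.execute("INSERT INTO layout_b VALUES ('u1_2022-02', 0.7)")

# Layout A answers "all history of one individual" directly from the ID column.
history = conn.execute(
    "SELECT backtrack_time, feature_x FROM layout_a "
    "WHERE id_value = 'u1' ORDER BY backtrack_time"
).fetchall()

# Layout B answers "one individual at one backtracking time" with a single key lookup.
single = conn.execute(
    "SELECT feature_x FROM layout_b WHERE joint_idx = 'u1_2022-02'"
).fetchone()

print(history)  # [('2022-01', 0.5), ('2022-02', 0.7)]
print(single)   # (0.7,)
```

A table may also carry all three columns at once, which corresponds to the combined layout mentioned in the text.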
In some embodiments, the joint index for a data row may be obtained by concatenating the ID value and the backtracking time corresponding to that data row. It is to be understood that a joint index related to both the ID value and the backtracking time may be determined in other ways, which is not limited herein. In some embodiments, different participants may generate joint indexes in the same manner, so that joint indexes with the same pattern can be used when the federated learning task is performed among the participants. In some embodiments, each individual has multiple ID values, and multiple joint indexes, one for each of the ID values, may be set for each data row.
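As an illustration of concatenation-based joint indexes, the following sketch builds one joint index per ID type for a row. The separator `|`, the function names, and the sample ID values are assumptions chosen for the example; the disclosure does not fix a format.

```python
from typing import Dict

def joint_index(id_value: str, backtrack_time: str, sep: str = "|") -> str:
    # Concatenate the ID value and the backtracking time.
    # All participants must use the same separator and time format.
    return f"{id_value}{sep}{backtrack_time}"

def joint_indexes(id_values: Dict[str, str], backtrack_time: str) -> Dict[str, str]:
    # One joint index per ID type when an individual has several ID values.
    return {
        f"{name}_idx": joint_index(value, backtrack_time)
        for name, value in id_values.items()
    }

row_ids = {"phone": "13800000000", "account": "alice01"}
indexes = joint_indexes(row_ids, "2022-06-01")
print(indexes)
# {'phone_idx': '13800000000|2022-06-01', 'account_idx': 'alice01|2022-06-01'}
```

A separator that cannot occur in the ID values avoids ambiguous indexes such as `ab` + `c` versus `a` + `bc`.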
In some embodiments, the import instruction may be an instruction indicating that backtracking data is to be added to the data table built into the data management system. The import instruction may indicate the import of a single data row, or the batch import of multiple data rows or data sets. The import instruction may directly include the data to be imported (for example, the data information of an individual, which may include ID values, feature data, and the corresponding backtracking time), or may indicate a data source of the data to be imported, such as a local file (for example, CSV or TXT), a remote file address (for example, HTTP or FTP), or a database table (for example, MySQL, Oracle, or Hive).
In some embodiments, for these data sources (or the first data sets corresponding to them), step S102 of acquiring the data information corresponding to each of the at least one first data row in the first data set to be imported may include, for example: for a local file, directly reading the data information of each first data row; for a remote file address, downloading the first data set locally and then reading the corresponding data information row by row; and for a database table, querying the corresponding data information. It is to be understood that the above are only examples of several types of data sources and of ways of acquiring data information, and those skilled in the art may also acquire data from various types of data sources in other ways, which is not limited herein. In one exemplary embodiment, the import instruction may be implemented using SQL CREATE statements.
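The source-dependent reading in step S102 can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the source is a CSV file with a header row, the function names are hypothetical, and the database-table case is omitted because it depends on the specific database driver.

```python
import csv
import os
import tempfile
import urllib.request

def read_local_file(path):
    # Local file: read the data information of each row directly.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def read_data_information(source):
    # Remote file address: download to a local temporary file first, then read row by row.
    if source.startswith(("http://", "https://", "ftp://")):
        local_path, _headers = urllib.request.urlretrieve(source)
        return read_local_file(local_path)
    return read_local_file(source)

# Usage with a hypothetical local CSV file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("id,backtrack_time,age\nu1,2022-01,30\nu2,2022-01,41\n")

rows = read_data_information(tmp.name)
os.remove(tmp.name)
print(rows[0])  # {'id': 'u1', 'backtrack_time': '2022-01', 'age': '30'}
```

Values read from CSV are strings, so a real import would also convert them to the types of the attribute columns.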
In some embodiments, the data source itself has a backtracking time attribute (for example, a backtracking column), and the backtracking time recorded for each individual can be imported directly as that individual's backtracking time. In other embodiments, the data source itself does not have a backtracking time attribute, and the backtracking time of every individual in the data source can be considered the same, so the backtracking time can be specified through a parameter during import.
According to some embodiments, the import instruction may include a backtracking time parameter, and the backtracking time corresponding to each of the at least one first data row may be consistent with the backtracking time parameter. Therefore, by setting a backtracking time for a data source that has no backtracking time attribute, each data row imported into the data table is guaranteed to correspond to one backtracking time.
In some embodiments, the backtracking time corresponding to a first data row may also be adjusted when data is imported. For example, if the index column related to the backtracking time in the data table supports only specific discrete time points or time periods while the backtracking times of the first data rows are continuous time points, each continuous time point may be approximated at import time to the nearest supported discrete time point, which is then used as the backtracking time of the first data row; alternatively, the time period containing the time point may be used as the backtracking time of the first data row. The corresponding index is then determined from the adjusted backtracking time.
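A minimal sketch of the backtracking time parameter follows. It assumes rows are plain dictionaries and that a row's own `backtrack_time` value, when present, takes precedence over the parameter; the function name and field names are hypothetical.

```python
def attach_backtrack_time(rows, backtrack_time_param=None):
    """Ensure every row to be imported carries exactly one backtracking time."""
    result = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if not row.get("backtrack_time"):
            if backtrack_time_param is None:
                raise ValueError("row has no backtracking time and no parameter was given")
            # Source without a backtracking column: use the import parameter.
            row["backtrack_time"] = backtrack_time_param
        result.append(row)
    return result

rows = [
    {"id": "u1", "age": 30},                               # no backtracking time in the source
    {"id": "u2", "age": 41, "backtrack_time": "2021-12"},  # source already provides one
]
prepared = attach_backtrack_time(rows, "2022-01")
print([r["backtrack_time"] for r in prepared])  # ['2022-01', '2021-12']
```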
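The two adjustments described above can be sketched as follows, assuming the supported discrete time points are the first day of each month and the supported time periods are calendar months. Both choices and the function names are illustrative assumptions.

```python
from datetime import datetime

def nearest_time_point(t, allowed_points):
    # Approximate a continuous time point to the closest supported discrete point.
    return min(allowed_points, key=lambda p: abs(p - t))

def containing_period(t):
    # Use the time period (here: the calendar month) that contains the time point.
    return t.strftime("%Y-%m")

allowed = [datetime(2022, 1, 1), datetime(2022, 2, 1), datetime(2022, 3, 1)]
recorded = datetime(2022, 1, 20, 15, 30)

snapped = nearest_time_point(recorded, allowed)
period = containing_period(recorded)
print(snapped)  # 2022-02-01 00:00:00
print(period)   # 2022-01
```

The two strategies can give different results for the same record, as here, so all participants in a task would need to apply the same one.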
In some embodiments, in step S104, the index corresponding to each of the at least one first data row may be determined by referring to the index column actually included in the data table. As described above, when the index column includes an index column corresponding to the trace-back time and an index column corresponding to the ID value, the ID value and the trace-back time corresponding to each of the at least one first data row may be determined as the index of the first data row; when the index column includes a joint index column corresponding to a joint index relating to both the traceback time and the ID value, a joint index of each first data row determined based on the respective ID value and the traceback time of the data row may be determined as an index of the first data row. In some embodiments, the ID value, the traceback time, and the joint index of each first data line may also be used as the index of the first data line. It is to be understood that the index related to the ID value and the trace-back time and the corresponding index column may be set by other means, and are not limited herein.
In some embodiments, as described above, the joint index of each of the at least one first data row may be obtained by concatenating the at least one ID value of that first data row with its trace-back time. In this way, a joint index relating to both the ID value and the trace-back time can be generated with little effort, and different participants can readily generate joint indexes with the same pattern for use in federated learning tasks among them.
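As a minimal sketch of such concatenation, the following builds joint indexes in the "01-20220601" pattern used in the examples of this description. The hyphen separator, the `YYYYMMDD` date encoding, and the function and ID-type names are assumptions made for illustration.

```python
from datetime import date


def make_joint_index(id_value: str, traceback: date, sep: str = "-") -> str:
    """Concatenate one ID value with the trace-back time."""
    return f"{id_value}{sep}{traceback.strftime('%Y%m%d')}"


def make_joint_indexes(id_values: dict, traceback: date) -> dict:
    """Build one joint index per ID type when a row carries several ID values."""
    return {
        id_type: make_joint_index(value, traceback)
        for id_type, value in id_values.items()
    }


print(make_joint_index("01", date(2022, 6, 1)))  # 01-20220601
print(make_joint_indexes({"user_id": "01", "device_id": "A7"}, date(2022, 6, 1)))
```

Because every participant applies the same rule, rows describing the same individual at the same trace-back time receive the same joint index on each side.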
In some embodiments, in step S106, importing the index and the at least one feature data corresponding to each of the at least one first data row into the data table may take any of the following forms: importing the at least one ID value, the trace-back time, and the at least one feature data of each first data row; importing the joint index and the at least one feature data of each first data row; importing the at least one ID value, the trace-back time, the joint index, and the at least one feature data of each first data row; or importing one or more other indexes that reflect the at least one ID value and the trace-back time of each first data row, together with the at least one feature data. In each case, every index and every feature data is imported into its corresponding index column or attribute column.
In some embodiments, preset conditions and rules may be set for the data import process to ensure smooth execution of the data import process and avoid errors. For example, it may be confirmed before importing whether the format of the data to be imported meets requirements, whether the attribute included in the data to be imported is consistent with the attribute column in the data table, whether the data to be imported conflicts with the data in the data table, and the like. It is understood that the person skilled in the art can set the corresponding preset conditions and rules according to the requirements, and the invention is not limited herein.
In some embodiments, problems arising from data conflicts in the data to be imported and the data table may be avoided as follows. Step S106 of importing the index corresponding to each of the at least one first data row and the at least one feature data into the data table may include: for each of the at least one first data row, in response to determining that the individual corresponding to the first data row is not present in the data table (e.g., the ID value corresponding to the individual is not found in the index column corresponding to one of the ID values in the data table), importing the first data row if other preset conditions and rules are satisfied; in response to determining that the individual corresponding to the first data row appears in the data table, but the backtracking time corresponding to the first data row is inconsistent with the backtracking time of the data row corresponding to the individual in the data table, importing the first data row under the condition that other preset conditions and rules are met; in response to determining that a third data row having the exact same index as the first data row is included in the data table (i.e., that the ID value and the corresponding backtracking time of the individual corresponding to the first data row and the third data row in the data table are both the same), the import of the first data row is discarded. It is understood that the indexes of the first data line and the third data line are identical, for example, the joint indexes corresponding to the first data line and the third data line are identical.
Therefore, by performing conflict detection on the index before a first data row is imported, conflicting data with the same ID and trace-back time is never written into the data table. Each data row in the data table is thus guaranteed a unique index, and subsequent federated learning tasks can be carried out correctly.
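The three import cases can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table name `features`, its columns, and the helper `import_row` are hypothetical; the disclosure does not prescribe a particular storage engine or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " id_value TEXT NOT NULL,"
    " traceback TEXT NOT NULL,"
    " age INTEGER,"
    " PRIMARY KEY (id_value, traceback))"
)


def import_row(conn, id_value, traceback, age):
    """Import one row unless a row with the same (ID, trace-back time) exists."""
    exists = conn.execute(
        "SELECT 1 FROM features WHERE id_value = ? AND traceback = ?",
        (id_value, traceback),
    ).fetchone()
    if exists:
        return False  # identical index already present: abandon the import
    conn.execute(
        "INSERT INTO features (id_value, traceback, age) VALUES (?, ?, ?)",
        (id_value, traceback, age),
    )
    return True


results = [
    import_row(conn, "01", "20220601", 30),  # individual not yet in the table
    import_row(conn, "01", "20220701", 31),  # same individual, new trace-back time
    import_row(conn, "01", "20220601", 99),  # identical index: not imported
]
print(results)  # [True, True, False]
```

The primary key on `(id_value, traceback)` is an additional safeguard chosen for this sketch, so that uniqueness is enforced by the table even if the explicit check is bypassed.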
According to some embodiments, the export instruction may be an instruction to export data for a federated learning task from a data table built into the data management system. The export instruction may instruct the export of some or all of the feature data of the data rows corresponding to some or all of the indexes; it may also export only the indexes, for tasks such as sample alignment, or only the feature data, for tasks such as data analysis, as will be described later. The export instruction may include a specified trace-back time and ID set to indicate the conditions that the data rows to be exported must satisfy. The export instruction may also include specified attributes to indicate which feature data in those data rows is to be exported. In one exemplary embodiment, the export instruction may be implemented using an SQL SELECT statement with a WHERE condition.
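A minimal sketch of such a SELECT-based export is given below, again using `sqlite3`. The table `features`, the attribute columns `age` and `income`, and the helper `export_rows` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id_value TEXT, traceback TEXT, age INTEGER, income REAL)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        ("01", "20220601", 30, 5000.0),
        ("02", "20220601", 41, 7200.0),
        ("03", "20220601", 25, 3100.0),
        ("01", "20220701", 30, 5100.0),
    ],
)


def export_rows(conn, traceback, id_set, attributes):
    """Export the chosen attributes of rows matching a trace-back time and ID set."""
    # Column names cannot be bound as SQL parameters, so check them against a whitelist.
    allowed = {"age", "income"}
    if not set(attributes) <= allowed:
        raise ValueError("unknown attribute requested")
    cols = ", ".join(["id_value", "traceback", *attributes])
    ids = sorted(id_set)
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        f"SELECT {cols} FROM features "
        f"WHERE traceback = ? AND id_value IN ({placeholders}) "
        "ORDER BY id_value"
    )
    return conn.execute(sql, [traceback, *ids]).fetchall()


exported = export_rows(conn, "20220601", {"01", "03"}, ["age"])
print(exported)  # [('01', '20220601', 30), ('03', '20220601', 25)]
```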
The data management system may export the data to a file, or may transfer the data to the corresponding federated learning system, for example via an interface. The second data set exported from the data table of the data management system can be used for horizontal federated learning modeling tasks, longitudinal (vertical) federated learning modeling tasks, and various other federated learning tasks. It can be used in a federated learning framework that includes a collaborator as well as in one that does not, which is not limited herein. It will be appreciated that where this disclosure describes multiple participants executing a federated learning task, the task may be executed jointly by the multiple participants and the collaborator.
In the following, how data is exported from the data table based on the first export instruction is described, with reference to various embodiments, for the different kinds of index that a first data row may include.
According to some embodiments, the index of the first data row may include an ID value and a trace-back time, and the first export instruction may include a specified trace-back time and/or ID set. Step S108, exporting a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table, may include: in response to receiving the first export instruction of the participant, exporting the second data set associated with the federated learning task from the data table based on the trace-back time and/or ID set included in the first export instruction. In this way, the feature data of the data rows in the data table that match the specified trace-back time and/or ID set can be exported.
According to some embodiments, the index of the first data row may comprise a joint index that relates to both the at least one ID value and the trace-back time of the corresponding first data row. Step S108, exporting a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table, may include: in response to receiving the first export instruction of the participant, exporting the second data set associated with the federated learning task from the data table based on at least one joint index corresponding to the index column in the data table. By using the joint index, a single index captures both the correspondence between a data row and its trace-back time and the correspondence between the data row and its individual, yielding an index that uniquely identifies each data row, so that the required data rows (for example, the data rows corresponding to the trace-back time, the ID set, and/or the joint index indicated by the first export instruction) can be found quickly in the data table.
In some embodiments, the first export instruction may include at least one preset joint index. Exporting a second data set associated with the federated learning task from the data table based on at least one joint index corresponding to the index column in the data table may include: screening out from the data table the second data set corresponding to the at least one preset joint index. By filtering on an index that uniquely identifies each data row, the data rows indicated by the first export instruction can be found quickly in the data table for export.
In some embodiments, as described above, the respective joint index of each of the at least one first data row may include at least one joint index associated with each of the respective at least one ID value. Thus, by determining a respective joint index for each ID value, data derivation based on the joint index corresponding to the respective ID is enabled with more flexibility.
In some embodiments, as described above, the first export instruction may indicate a target ID, and the second data set may include, for each of at least one second data row in the data table, the joint index associated with the target ID and/or the ID value corresponding to the target ID among the at least one ID value of that second data row. By exporting the joint index and/or ID value associated with the target ID (i.e., the ID jointly specified by the participants, as described above) together with the feature data, the participants can use the joint index and/or ID value as the identifier of each data row when performing the federated learning task together.
According to some embodiments, the index of the first data line may include a traceback time, at least one ID value, and a joint index, and the first export instruction may include a first target traceback time and/or a first set of IDs. As shown in fig. 2, deriving the second data set associated with the federal learning task from the data table based on at least one index corresponding to an index column in the data table at step S108 may further include: step S202, in response to receiving a second derivation instruction of the participant, deriving a third data set associated with the federal learning task from the data table based on the first target backtracking time and/or the first ID set; and step S204, in response to receiving the first derivation instruction of the participant, deriving a second data set associated with the federated learning task from the data table based on at least one federated index corresponding to the index column in the data table.
Thus, the multiple participants may first send a second export instruction to their data management systems, so that each data management system performs a preliminary screening according to the specified first target trace-back time and/or the specified first ID set to obtain the third data set of the corresponding participant. The multiple participants may then send a first export instruction to their data management systems, so that each data management system further exports the corresponding second data set according to the at least one preset joint index indicated in the first export instruction and transmits the second data set to the local federated learning system to execute the federated learning task.
In an exemplary embodiment, the federated learning task is a task related to longitudinal (vertical) federated learning, and sample intersection is required before the federated modeling task is performed. In such embodiments, the second export instruction may instruct that the specified ID value and/or the joint index of each second data row satisfying the specified trace-back time be exported, yielding a third data set consisting of the specified ID values and/or joint indexes of the second data rows at the specified trace-back time. For example, the third data set exported by participant A may include the three joint indexes "01-20220601", "02-20220601", and "03-20220601", whose trace-back time is June 1, 2022, or the data rows corresponding to these three joint indexes, and the third data set exported by participant B may include the three joint indexes "01-20220601", "03-20220601", and "04-20220601", or the data rows corresponding to these three joint indexes. The federated learning systems of the participants can then intersect the indexes (ID values and/or joint indexes) included in their third data sets to obtain the indexes of the data rows that the participants have in common, i.e., the at least one preset joint index. For example, intersecting the joint indexes in the third data sets of participant A and participant B yields the two preset joint indexes "01-20220601" and "03-20220601". The data management system of each participant may then export a second data set based on these preset joint indexes and transmit it to the federated learning system for performing the subsequent tasks related to longitudinal federated learning.
For example, the second data set derived by participant A may include data rows with joint indices of "01-20220601" and data rows with joint indices of "03-20220601" in a data table in the data management system of participant A, and the second data set derived by participant B may include data rows with joint indices of "01-20220601" and data rows with joint indices of "03-20220601" in a data table in the data management system of participant B.
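The two-stage export in this example can be sketched as follows, with each data table modeled as a dictionary keyed by joint index. The feature names, the helper functions, and the use of a plain set intersection are assumptions for illustration; the text does not specify how the intersection is computed, and a deployment might use a privacy-preserving protocol instead.

```python
# Hypothetical data tables of participants A and B, keyed by joint index.
table_a = {
    "01-20220601": {"age": 30},
    "02-20220601": {"age": 41},
    "03-20220601": {"age": 25},
    "01-20220701": {"age": 30},
}
table_b = {
    "01-20220601": {"balance": 120.0},
    "03-20220601": {"balance": 75.5},
    "04-20220601": {"balance": 10.0},
}


def export_joint_indexes(table, traceback):
    """Second export instruction: joint indexes at the specified trace-back time."""
    return {idx for idx in table if idx.endswith("-" + traceback)}


def export_rows(table, preset_indexes):
    """First export instruction: rows whose joint index is one of the preset ones."""
    return {idx: table[idx] for idx in sorted(preset_indexes) if idx in table}


third_a = export_joint_indexes(table_a, "20220601")
third_b = export_joint_indexes(table_b, "20220601")
preset = third_a & third_b  # stand-in for the sample intersection step
second_a = export_rows(table_a, preset)
second_b = export_rows(table_b, preset)

print(sorted(preset))  # ['01-20220601', '03-20220601']
print(second_a)        # {'01-20220601': {'age': 30}, '03-20220601': {'age': 25}}
print(second_b)        # {'01-20220601': {'balance': 120.0}, '03-20220601': {'balance': 75.5}}
```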
In another exemplary embodiment, the federated learning task is a task related to longitudinal (vertical) federated learning that is insensitive to the span of trace-back times but requires the trace-back times of the participants' data rows to be aligned during modeling. In this case, the second export instruction may instruct that the joint index of every second data row be exported to obtain the third data set. The federated learning systems of the participants can then intersect the joint indexes included in their third data sets to obtain the data rows that the participants have in common. Different data rows among these may have different trace-back times, but the joint index of each of these data rows is common to all participants.
It is to be understood that the second export instruction may also instruct to export any first target traceback time and/or first ID set to obtain a corresponding third data set, which is not limited herein.
In some embodiments, the at least one preset joint index may be obtained by the participant, in cooperation with the other participants of the plurality of participants, computing the intersection of the joint indexes of the participants' respective third data sets. Each participant thus exports the joint indexes that satisfy a specific condition (such as a specified trace-back time), the participants intersect these joint indexes, and the corresponding feature data is exported based on the at least one preset joint index obtained from the intersection and transmitted to the federated learning system for the subsequent federated learning tasks. This reduces both the amount of data in the second data set transmitted to the federated learning system and the amount of data the federated learning system has to process, so that the time and computing resources required for data preprocessing are reduced without affecting the quality of the preprocessing result, and the efficiency of the overall federated learning task is improved.
It is to be understood that the at least one preset joint index may also be determined by other means, and is not limited herein.
In some embodiments, the participant sends the first data set and/or the at least one preset joint index to the data management system, and receives the second data set and/or the third data set from the data management system, in an asynchronous manner.
In particular, each participant may perform the acquisition and transmission of data in an asynchronous manner with the data management system. For example, the participant may obtain the second data set or the third data set from the data management system based on a predetermined obtaining time window, and send the first data set or at least one preset joint index to the data management system based on a predetermined sending time window different from the obtaining time window, so as to avoid problems of interface timeout, insufficient bandwidth, data collision, and the like caused by synchronous data obtaining and sending by the participant.
In some embodiments, any data acquisition and transmission by each participant may be performed asynchronously, e.g., with the collaborators or with other participants, to avoid interface timeouts, bandwidth starvation, data collisions, etc.
According to some embodiments, the delete, update and query functions of the data table may also be implemented using delete instructions, update instructions and query instructions. The data management method may further include at least one of: in response to receiving a deletion instruction of the participant, deleting a data row in the data table specified by the deletion instruction, wherein the deletion instruction comprises an index corresponding to the data row to be deleted; in response to receiving an update instruction of the participant, updating the feature data of the data row specified by the update instruction in the data table, wherein the update instruction comprises the index corresponding to the data row to be updated and the update feature data corresponding to the attribute column to be updated; and responding to a received query instruction of the participant, returning the characteristic data of the data row specified by the query instruction in the data table, wherein the query instruction comprises an index corresponding to the data row to be acquired.
In some embodiments, the delete instruction can delete at least one data row based on the index. In some exemplary embodiments, the delete instruction may delete the data row of a specific individual at a specific trace-back time (e.g., based on the index corresponding to the ID value, the index corresponding to the trace-back time, and/or the joint index), all data rows of a specific individual across all trace-back times, or all data rows at a specific trace-back time; at least one data row may also be deleted flexibly according to other conditions, which is not limited herein. In an exemplary embodiment, the delete instruction may be implemented using the SQL DELETE statement with a WHERE condition.
In some embodiments, the update instruction can update, based on the index, the data row of an existing individual at a specified trace-back time (including the at least one ID value, the trace-back time, and/or the feature data). In an exemplary embodiment, the update instruction may be implemented using the SQL UPDATE statement with a WHERE condition.
In some embodiments, the query instruction is capable of querying at least one data row based on the index. In some embodiments, the characteristic data of at least one data row may be queried, and the data rows may also be queried for indices such as at least one ID value, traceback time, and/or joint index. In some embodiments, one or more characteristic data of a data row of a specific individual at a specific backtracking time (e.g., an index corresponding to an ID value, an index corresponding to a backtracking time, and/or a joint index) may be queried, or one or more characteristic data of at least one data row may be queried flexibly according to conditions in other ways, which are not limited herein. In one exemplary embodiment, the query instruction may be implemented using a SELECT statement of SQL and adding a WHERE condition.
In this way, functions such as deleting, modifying, and querying data rows in the data table are implemented.
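The delete, update, and query instructions might be realized as in the following sketch, which uses `sqlite3` and a hypothetical table `features` carrying a joint index column alongside separate ID and trace-back time columns. The schema and values are illustrative assumptions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " joint_index TEXT PRIMARY KEY,"
    " id_value TEXT, traceback TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        ("01-20220601", "01", "20220601", 30),
        ("02-20220601", "02", "20220601", 41),
        ("01-20220701", "01", "20220701", 30),
    ],
)

# Update instruction: change the feature data of one row, located by its joint index.
conn.execute("UPDATE features SET age = ? WHERE joint_index = ?", (31, "01-20220701"))

# Delete instruction: remove every row at a specific trace-back time.
conn.execute("DELETE FROM features WHERE traceback = ?", ("20220601",))

# Query instruction: return the index and feature data of the remaining rows.
remaining = conn.execute(
    "SELECT joint_index, age FROM features ORDER BY joint_index"
).fetchall()
print(remaining)  # [('01-20220701', 31)]
```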
Fig. 3 illustrates a flow chart of a federal learning method 300 in accordance with an exemplary embodiment of the present disclosure. The method 300 is applied to any participant in a federated learning task, the participant being in communication with a plurality of target participants in the federated learning task, wherein a data table is built in a data management system of the participant, the data table including an index column and at least one attribute column. The data table further includes at least one first data row, each first data row including an index corresponding to the index column and characteristic data corresponding to the at least one attribute column, and the index being associated with at least one ID value and a traceback time of the corresponding first data row. Federal learning method 300 includes: step S302, a first derivation instruction is sent to a data management system to obtain a second data set associated with the federal learning task, wherein the second data set is derived from a data table based on at least one index corresponding to an index column in the data table; and step S304, performing subsequent subtasks of the federated learning task with other participants of the plurality of target participants based on the second data set. The data export operation of the data management system in step S302 may refer to step S108 of the method 100, which is not described herein.
Therefore, each participant of the federated learning task exports its own second data set from a data table that includes an index column corresponding to an index related to the ID and the trace-back time, and executes the subsequent subtasks of the federated learning task with the other participants based on that second data set. In federated learning scenarios involving backtracking services, each participant can thereby obtain the data set for a given trace-back time flexibly and conveniently, which improves the overall efficiency of the federated learning task.
It is understood that, with respect to the data management system, the data table, the index column, and the attribute column of each participant, reference may be made to the corresponding description above, and details are not described herein.
In some embodiments, before each participant exports the second data set from the respective data management system, a backtracking time corresponding to a subsequent subtask of the federated learning task may be determined among the plurality of participants such that the backtracking times of the rows of data in the second data set that are each exported by the plurality of participants are all the same. The backtracking time of the subsequent subtask of the federal learning task may be predetermined, may be determined by a certain participant and then synchronized to other participants, may be determined by negotiation among multiple participants, or determined by other means, and is not limited herein.
Fig. 4 shows a schematic diagram of a federal learning method in accordance with an exemplary embodiment of the present disclosure. Among them, the participant 401 is communicatively connected to the data management system 402 corresponding to the participant and the other participants 403, and the other participants 403 are communicatively connected to the data management systems 404 corresponding to the other participants. As shown in fig. 4, the method includes:
a participant 401 imports a first data set 411 corresponding to the participant into a corresponding data management system 402;
the other party 403 imports a first data set 412 corresponding to the other party into the corresponding data management system 404;
the participant 401 determines a backtracking time 413 corresponding to a subsequent subtask of the federated learning task and sends the backtracking time to the other participants 403;
participant 401 obtains a second data set 414 corresponding to backtrack time 413 from the corresponding data management system 402;
the other participant 403 obtains a second data set 415 corresponding to the backtracking time 413 from the corresponding data management system 404;
at this point, each participant has acquired a data set corresponding to the backtracking time 413 for executing a subsequent subtask of the same federated learning task, and the participants may jointly execute the subsequent subtask 416 based on their respective data sets.
It can be understood that, in the above method, the step of importing data into the data management system and the step of acquiring data from the data management system may refer to the description of the corresponding steps in the foregoing data management method 100, which is not described herein again. In addition, some steps in the above method may be performed synchronously. For example, party 401 and other party 403 may import data simultaneously and may export data simultaneously.
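The flow of fig. 4 can be sketched with a hypothetical in-memory stand-in for each participant's data management system. The class `DataManagementSystem`, its methods, and the sample rows are assumptions for illustration; the variable suffixes merely echo the reference numerals of fig. 4.

```python
class DataManagementSystem:
    """Hypothetical in-memory stand-in for a participant's data management system."""

    def __init__(self):
        self.rows = {}  # (id_value, traceback) -> feature dict

    def import_rows(self, rows):
        for id_value, traceback, features in rows:
            # Keep the first row seen for an index; later duplicates are not imported.
            self.rows.setdefault((id_value, traceback), features)

    def export_by_traceback(self, traceback):
        """Return the feature data of all rows at the given trace-back time."""
        return {
            id_value: features
            for (id_value, tb), features in sorted(self.rows.items())
            if tb == traceback
        }


dms_402, dms_404 = DataManagementSystem(), DataManagementSystem()
dms_402.import_rows([("01", "20220601", {"age": 30}), ("01", "20220701", {"age": 30})])
dms_404.import_rows([("01", "20220601", {"balance": 120.0}), ("02", "20220601", {"balance": 8.0})])

# Participant 401 decides the trace-back time and shares it with participant 403.
traceback_413 = "20220601"
second_414 = dms_402.export_by_traceback(traceback_413)
second_415 = dms_404.export_by_traceback(traceback_413)
print(second_414)  # {'01': {'age': 30}}
print(second_415)  # {'01': {'balance': 120.0}, '02': {'balance': 8.0}}
```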
In some embodiments, the method of fig. 4 may also be applied to a federated learning framework that includes collaborators, and then some of the data involved in the method may be transmitted to and/or received from the collaborators, and some of the steps therein may be performed by the collaborators. Such a solution is also within the scope of the present disclosure.
It can be understood that the first export instruction in step S302 is similar to the first export instruction described in the foregoing data management method 100, and is not repeated herein.
According to some embodiments, the index may include a joint index that is related to both the at least one ID value and the trace-back time of the corresponding first data row. The second data set may be exported from the data table based on the joint index. By using the joint index, a single index captures both the correspondence between a data row and its trace-back time and the correspondence between the data row and its individual, yielding an index that uniquely identifies each data row, so that the required data rows (for example, the data rows corresponding to the trace-back time, the ID set, and/or the joint index indicated by the first export instruction) can be obtained quickly from the data table.
In some embodiments, the first derivation instruction may include at least one preset joint index. The second data set may be filtered from the data table according to at least one predetermined joint index. Thus, by performing the filtering using the index uniquely corresponding to each data line, the data line indicated by the first derivation instruction can be quickly acquired from the data table.
In some embodiments, the respective joint index of each of the at least one first data row may include at least one joint index associated with each of the respective at least one ID value. The first export instruction may indicate a target ID, and the second data set may include, for each of the at least one first data row, the joint index associated with the target ID and/or the ID value corresponding to the target ID among the at least one ID value of that first data row. By determining a joint index for each ID value, data can be exported more flexibly based on the joint index corresponding to the respective ID. In addition, by exporting the joint index and/or the ID value related to the target ID (i.e., the ID jointly specified by the participants, as described above) together with the feature data, the participants can use the joint index and/or the ID value as the identifier of each data row when jointly performing the federated learning task.
According to some embodiments, the index may include a trace-back time, at least one ID value, and a joint index, and the first export instruction may include a first target trace-back time and/or a first ID set. As shown in fig. 5, federated learning method 500 may include: step S502, sending a second export instruction to the data management system to obtain a third data set associated with the federated learning task, wherein the third data set is exported from the data table based on the first target trace-back time and/or the first ID set. The operations and effects of steps S506-S508 of federated learning method 500 are similar to those of steps S302-S304 of federated learning method 300 and are not repeated herein. Step S506, sending the first export instruction to the data management system to obtain the second data set associated with the federated learning task, may include: sending the first export instruction to the data management system to obtain a second data set screened from the third data set, wherein the first export instruction comprises at least one preset joint index, and the joint index corresponding to each second data row in the second data set belongs to the at least one preset joint index.
Thus, the multiple participants may first send a second export instruction to their data management systems, so that each data management system performs a preliminary screening according to the specified first target trace-back time and/or the specified first ID set to obtain the third data set of the corresponding participant. The multiple participants may then send a first export instruction to their data management systems, so that each data management system further exports the corresponding second data set according to the at least one preset joint index indicated in the first export instruction and transmits the second data set to the local federated learning system to execute the subsequent subtasks of the federated learning task.
According to some embodiments, as shown in fig. 5, federated learning method 500 may include: step S504, computing, in cooperation with the other participants of the multiple participants, the intersection of the joint indexes of the participants' respective third data sets to obtain the at least one preset joint index. Each participant thus exports the joint indexes that satisfy a specific condition (such as a specified trace-back time), the participants intersect these joint indexes, and the corresponding feature data is obtained based on the at least one preset joint index resulting from the intersection in order to carry out the subsequent subtasks of the federated learning task. This reduces both the amount of data in the second data set transmitted to the federated learning system and the amount of data the federated learning system has to process, so that the time and computing resources required for data preprocessing are reduced without affecting the quality of the preprocessing result, and the efficiency of the overall federated learning task is improved.
FIG. 6 shows a schematic diagram of a federated learning method in accordance with another exemplary embodiment of the present disclosure. Among them, the participant 601 is communicatively connected to the data management system 602 corresponding to the participant and to the other participants 603, and the other participants 603 are communicatively connected to the data management systems 604 corresponding to the other participants. As shown in fig. 6, the method includes:
a participant 601 imports a first data set 611 corresponding to the participant into a corresponding data management system 602;
the other participant 603 imports a first data set 612 corresponding to the other participant into the corresponding data management system 604;
the participant 601 determines a backtracking time 613 corresponding to a subsequent subtask of the federal learning task and sends the backtracking time to other participants 603;
participant 601 obtains a federated index set 614 corresponding to the backtracking time 613 from the corresponding data management system 602;
the other participants 603 obtain a federated index set 615 corresponding to the traceback time 613 from the corresponding data management system 604;
the participant 601 and the other participants 603 perform sample intersection based on the respective joint index set 614 and joint index set 615 to obtain at least one preset joint index 616;
the participant 601 obtains a second data set 617 corresponding to at least one preset joint index 616 from the corresponding data management system 602;
the other participant 603 obtains a second data set 618 corresponding to the at least one preset joint index 616 from the corresponding data management system 604;
thus, each participant obtains a data set that corresponds to the at least one preset joint index 616 obtained from the sample intersection and that is used for executing the subsequent subtask of the same federated learning task, and the participants can jointly execute the subsequent subtask 619 based on their respective data sets.
It is to be understood that, in the above method, the step of importing data into the data management system, the step of obtaining the joint index set from the data management system, and the step of obtaining data from the data management system based on at least one preset joint index may refer to the corresponding steps of the foregoing data management method 100 and descriptions of step S202 to step S204 in fig. 2, which are not described herein again. In addition, some steps in the above method may be performed synchronously. For example, participant 601 and other participants 603 may import data simultaneously and may export a joint index set/data set simultaneously.
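The sequence of FIG. 6 can be traced with a short Python sketch. The class below is a hypothetical in-memory stand-in for the data management systems 602 and 604; its method names, the "<ID value>_<backtracking time>" joint index format, and the sample rows are assumptions made for illustration rather than details of the disclosure. Variable names carry the reference numerals of FIG. 6 for orientation.

```python
class InMemoryDataManagementSystem:
    """Hypothetical stand-in for data management systems 602 / 604."""

    def __init__(self):
        self.table = {}  # joint index -> stored row

    def import_rows(self, rows):
        # Each row is (ID value, backtracking time, feature dict).
        for id_value, backtracking_time, features in rows:
            joint_index = f"{id_value}_{backtracking_time}"
            # Keep the first row seen for a given joint index.
            self.table.setdefault(joint_index, {
                "id": id_value,
                "backtracking_time": backtracking_time,
                "features": features,
            })

    def export_joint_indexes(self, backtracking_time):
        # Joint index set for one backtracking time (614 / 615).
        return {k for k, row in self.table.items()
                if row["backtracking_time"] == backtracking_time}

    def export_rows(self, preset_joint_indexes):
        # Second data set restricted to the preset joint indexes (617 / 618).
        return {k: self.table[k]["features"]
                for k in sorted(preset_joint_indexes) if k in self.table}


dms_602 = InMemoryDataManagementSystem()
dms_604 = InMemoryDataManagementSystem()

# Import the first data sets 611 and 612.
dms_602.import_rows([("u1", "20220601", {"age": 30}),
                     ("u2", "20220601", {"age": 41}),
                     ("u2", "20220501", {"age": 40})])
dms_604.import_rows([("u2", "20220601", {"balance": 10.0}),
                     ("u3", "20220601", {"balance": 7.5})])

backtracking_time_613 = "20220601"
index_set_614 = dms_602.export_joint_indexes(backtracking_time_613)
index_set_615 = dms_604.export_joint_indexes(backtracking_time_613)

# Sample intersection yields the preset joint indexes 616.
preset_616 = index_set_614 & index_set_615

data_617 = dms_602.export_rows(preset_616)
data_618 = dms_604.export_rows(preset_616)
print(preset_616)  # {'u2_20220601'}
```

In this sketch the row for "u2" at the earlier backtracking time never leaves data management system 602, because it is filtered out before the index sets are exchanged.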
In some embodiments, the method of FIG. 6 may also be applied to a federated learning framework that includes a collaborator. In that case, some of the data involved in the method may be transmitted to and/or received from the collaborator, and some of the steps may be performed by the collaborator. For example, the participant 601 and the other participants 603 may transmit their respective joint index sets to the collaborator, which performs the sample intersection over the multiple joint index sets. Such a solution is also within the scope of the present disclosure.
According to some embodiments, the subsequent subtasks of the federated learning task in step S304 may include, for example, a federated modeling task (e.g., training of a federated machine learning model). A federated modeling task involves multiple rounds of training of each participant's submodel. Applying the federated learning method described above reduces the time and computing resources required in each round for pre-processing such as data transmission and ID intersection calculation, which improves the efficiency of acquiring training data for the federated modeling task and therefore the overall efficiency of the task.
In some embodiments, subsequent subtasks of the federated learning task may also include federated query tasks, federated statistics tasks, and the like, involving multiple participants.
Fig. 7 shows a block diagram of a data management system 700 according to an example embodiment of the present disclosure. The data management system 700 is communicatively connected to any one of a plurality of participants performing the same federated learning task, and has a data table built therein, the data table including an index column and a plurality of attribute columns. The data management system 700 includes: an importing unit 702 configured to, in response to receiving an import instruction of the participant, obtain data information corresponding to each of at least one first data row in a first data set to be imported, where the data information includes at least one ID value, a backtracking time, and feature data corresponding to at least one attribute column; determine an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time corresponding to each of the at least one first data row, where the index corresponds to the index column; and import the index and the feature data corresponding to each of the at least one first data row into the data table; and a derivation unit 704 configured to, in response to receiving a first derivation instruction of the participant, derive a second data set associated with the federated learning task from the data table based on at least one index corresponding to the index column in the data table.
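As a rough illustration of how the importing unit 702 and the derivation unit 704 could sit on top of a relational table, the following Python sketch uses the standard-library sqlite3 module. The table name, the column names, the two feature columns, and the "<ID value>_<backtracking time>" concatenation used for the joint index are all assumptions chosen for the example. Rows whose index already exists in the table are skipped on import.

```python
import sqlite3


def create_table(conn):
    # Index column (joint_index) plus hypothetical attribute columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_table ("
        " joint_index TEXT PRIMARY KEY,"
        " id_value TEXT NOT NULL,"
        " backtracking_time TEXT NOT NULL,"
        " feature_a REAL,"
        " feature_b REAL)"
    )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM data_table").fetchone()[0]


def import_rows(conn, rows):
    """Import unit sketch: build the index for each row and insert it.

    Rows whose joint index is already present are ignored.
    Returns the number of rows actually added.
    """
    prepared = [(f"{i}_{t}", i, t, a, b) for (i, t, a, b) in rows]
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO data_table VALUES (?, ?, ?, ?, ?)", prepared
    )
    conn.commit()
    return count_rows(conn) - before


def derive_rows(conn, preset_joint_indexes):
    """Derivation unit sketch: export the rows matching the given indexes."""
    keys = sorted(preset_joint_indexes)
    if not keys:
        return []
    placeholders = ", ".join("?" for _ in keys)
    cursor = conn.execute(
        "SELECT joint_index, feature_a, feature_b FROM data_table "
        f"WHERE joint_index IN ({placeholders}) ORDER BY joint_index",
        keys,
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
create_table(conn)
first_import = import_rows(conn, [("u1", "20220601", 1.0, 2.0),
                                  ("u2", "20220601", 3.0, 4.0)])
# The second batch repeats u2 at the same backtracking time, so only u3 is added.
second_import = import_rows(conn, [("u2", "20220601", 9.0, 9.0),
                                   ("u3", "20220601", 5.0, 6.0)])
second_data_set = derive_rows(conn, {"u2_20220601", "u3_20220601"})
print(first_import, second_import, second_data_set)
```

Making the joint index the primary key lets the database itself reject duplicate rows and answer index-based exports with a keyed lookup; other storage layouts would serve equally well.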
The data management system 700 may be adapted to perform operations similar to those of the data management method 100 described above, and will not be described herein.
Fig. 8 illustrates a block diagram of a federated learning device 800 in accordance with an exemplary embodiment of the present disclosure. The federated learning device 800 is applied to any one of a plurality of participants who perform the same federated learning task, where the participant's data management system has a built-in data table that includes an index column and at least one attribute column. The data table further includes at least one first data row, each first data row including an index corresponding to the index column and feature data corresponding to the at least one attribute column, and the index being associated with at least one ID value and a backtracking time of the corresponding first data row. The federated learning device 800 includes: a sending unit 802 configured to send a first derivation instruction to the data management system to obtain a second data set associated with the federated learning task, wherein the second data set is derived from the data table based on at least one index corresponding to the index column in the data table; and an execution unit 804 configured to execute subsequent subtasks of the federated learning task with other ones of the plurality of target participants based on the second data set.
The federated learning device 800 may be adapted to perform operations similar to those of the federated learning method 300 described above, which are not described again here.
According to another aspect of the present disclosure, a federated learning system is provided that includes a federated learning device 800 as shown in FIG. 8.
According to some embodiments, the federated learning system described above may also include a data management system 700 as shown in FIG. 7. It is understood that the federated learning system may include a federated learning device and a data management system for each of a plurality of participants.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program when executed by a processor implements the method described above.
Referring to FIG. 9, an electronic device 900, which is an example of a hardware device (electronic device) that can be applied to aspects of the present disclosure, will now be described. The electronic device 900 may be any machine configured to perform processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a robot, a smart phone, an on-board computer, or any combination thereof. The above-described data management methods and federated learning methods may be implemented in whole or at least in part by electronic device 900 or a similar device or system.
Electronic device 900 may include components connected to bus 902 (possibly via one or more interfaces) or in communication with bus 902. For example, electronic device 900 may include a bus 902, one or more processors 904, one or more input devices 906, and one or more output devices 908. The one or more processors 904 can be any type of processor and can include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., special processing chips). Input device 906 may be any type of device capable of inputting information to electronic device 900 and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. Output device(s) 908 can be any type of device capable of presenting information and can include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The electronic device 900 may also include a non-transitory storage device 910, which may be any storage device that is non-transitory and that may enable data storage, including but not limited to a magnetic disk drive, an optical storage device, solid state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. The non-transitory storage device 910 may be removable from the interface. The non-transitory storage device 910 may have data/programs (including instructions)/code for implementing the above-described methods and steps. Electronic device 900 may also include a communications device 912. 
The communication device 912 may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a bluetooth (TM) device, an 802.11 device, a Wi-Fi device, a WiMAX device, a cellular communication device, and/or the like.
Electronic device 900 may also include a working memory 914, which may be any type of working memory that can store programs (including instructions) and/or data useful for the operation of processor 904, and which may include, but is not limited to, random access memory and/or read only memory devices.
Software elements (programs) may reside in the working memory 914 and include, but are not limited to, an operating system 916, one or more application programs 918, drivers, and/or other data and code. Instructions for performing the above-described methods and steps may be included in one or more application programs 918, and the above-described methods may be implemented by the processor 904 reading and executing the instructions of the one or more application programs 918. More specifically, the above-described data management method 100, federated learning method 300, and federated learning method 500 may be implemented, for example, by processor 904 executing application 918 having instructions for steps S102-step S108, step S202-step S204, step S302-step S304, and steps S502-S508, respectively. Further, other steps in the data management methods or the federated learning methods described above may be implemented, for example, by processor 904 executing an application 918 having instructions to perform the respective steps. Executable code or source code of instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium, such as the storage device 910 described above, and may be stored in the working memory 914 (possibly compiled and/or installed) upon execution. Executable code or source code for the instructions of the software elements (programs) may also be downloaded from a remote location.
It will also be appreciated that various modifications may be made in accordance with specific requirements. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and apparatus may be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or hardware programming language such as VERILOG, VHDL, C++, using logic and algorithms according to the present disclosure.
It should also be understood that the foregoing method may be implemented in a server-client mode. For example, a client may receive data input by a user and send the data to a server. The client may also receive data input by the user, perform part of the processing in the foregoing method, and transmit the data obtained by the processing to the server. The server may receive data from the client and perform the aforementioned method or another part of the aforementioned method and return the results of the execution to the client. The client may receive the results of the execution of the method from the server and may present them to the user, for example, through an output device.
It should also be understood that the components of the electronic device 900 may be distributed across a network. For example, some processes may be performed using one processor while other processes may be performed by another processor that is remote from the one processor. Other components of electronic device 900 may also be similarly distributed. As such, electronic device 900 may be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (25)

1. A data management method applied to a data management system communicatively connected to any one of a plurality of participants who execute the same federal learning task, wherein a data table is built in the data management system, the data table including an index column and a plurality of attribute columns, the method comprising:
in response to receiving an import instruction of the participant,
acquiring data information corresponding to at least one first data row in a first data set to be imported, wherein the data information comprises at least one ID value, backtracking time and feature data corresponding to at least one attribute column;
determining an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time corresponding to each of the at least one first data row, the index corresponding to the index column; and
importing the index and the at least one characteristic data corresponding to each of the at least one first data row into the data table; and
in response to receiving a first derivation instruction of the participant, deriving a second data set associated with the federal learning task from the data table based on at least one index corresponding to an index column in the data table.
2. The method of claim 1, wherein the index comprises a joint index that relates to both at least one ID value and a traceback time of a corresponding first data row,
wherein deriving a second data set associated with the federated learning task from the data table based on at least one index corresponding to an index column in the data table comprises:
in response to receiving the first derivation instruction of the participant, deriving a second data set associated with the federated learning task from the data table based on at least one federated index corresponding to an index column in the data table.
3. The method of claim 2, wherein the index further comprises a backtrack time and at least one ID value, wherein the first derivation instruction comprises a first target backtrack time and/or a first set of IDs, and wherein deriving a second set of data associated with the federated learning task from the data table based on at least one index corresponding to an index column in the data table further comprises:
in response to receiving a second derivation instruction for the participant, deriving a third data set associated with the federated learning task from the data table based on the first target backtracking time and/or the first set of IDs,
and wherein deriving a second data set associated with the federated learning task from the data table based on at least one federated index corresponding to an index column in the data table comprises:
in response to receiving the first derivation instruction of the participant, screening the second data set from the third data set based on the joint index corresponding to each data row in the third data set, wherein the first derivation instruction comprises at least one preset joint index, and the joint index corresponding to each data row in the second data set belongs to the at least one preset joint index.
4. The method of claim 2, wherein the first derivation instruction comprises at least one preset joint index, and wherein deriving the second data set associated with the federal learning task from the data table based on the at least one joint index corresponding to the index column in the data table comprises:
screening the second data set corresponding to the at least one preset joint index from the data table.
5. The method of any of claims 2-4, wherein the respective joint index for each of the at least one first data row comprises at least one joint index associated with each respective at least one ID value.
6. The method according to claim 3 or 4, wherein the at least one predetermined joint index is obtained by computing an intersection of joint indexes of the third data sets of the respective multiple participants for the participant in cooperation with other participants of the multiple participants.
7. The method of claim 5, wherein the first derivation instruction indicates a target ID, and wherein the second data set comprises a joint index associated with the target ID for each of at least one second data row in the data table and/or an ID value corresponding to the target ID in at least one ID value for each of the at least one second data row.
8. The method according to claim 5, wherein the respective joint index of the at least one first data row is obtained by concatenating the at least one ID value of the first data row with the backtracking time.
9. The method according to any of claims 1-8, wherein importing the respective index and the at least one characteristic data for the at least one first data row into the data table comprises:
for each of the at least one first data row, in response to determining that a third data row having the same index as the first data row is included in the data table, forgoing importing the first data row.
10. The method according to any one of claims 1-8, wherein the import instruction comprises a backtracking time parameter, and the backtracking time corresponding to each of the at least one first data row is consistent with the backtracking time parameter.
11. The method according to any one of claims 1-8, further comprising at least one of:
in response to receiving a deletion instruction of the participant, deleting a data row in the data table specified by the deletion instruction, wherein the deletion instruction comprises an index corresponding to the data row to be deleted;
in response to receiving an update instruction of the participant, updating the feature data of the data row specified by the update instruction in the data table, wherein the update instruction comprises an index corresponding to the data row to be updated and update feature data corresponding to the attribute column to be updated; and
in response to receiving a query instruction of the participant, returning the feature data of the data row specified by the query instruction in the data table, wherein the query instruction comprises an index corresponding to the data row to be acquired.
12. The method according to any one of claims 1-11, wherein a relational database is built into the data management system, and the data tables are stored in the relational database in the form of relational data.
13. A federated learning method applied to any participant in a federated learning task, wherein the participant is communicatively connected with a plurality of target participants in the federated learning task, and a data table is built in a data management system of the participant, the data table comprising an index column and at least one attribute column, wherein the data table further comprises at least one first data row, each first data row comprises an index corresponding to the index column and feature data corresponding to the at least one attribute column, and the index is related to at least one ID value and a backtracking time of the corresponding first data row, the method comprising:
sending a first export instruction to the data management system to obtain a second data set associated with the federated learning task, wherein the second data set is exported from the data table based on at least one index corresponding to an index column in the data table; and
performing subsequent subtasks of the federated learning task with other participants in the plurality of target participants based on the second dataset.
14. The method of claim 13, wherein the index comprises a joint index that relates to both the at least one ID value and a traceback time of the corresponding first data row, and wherein the second data set is derived from the data table based on the joint index.
15. The method according to claim 14, wherein the index further comprises a traceback time and at least one ID value, wherein the first derived instruction comprises a first target traceback time and/or a first set of IDs, and wherein the method further comprises:
sending a second export instruction to the data management system to obtain a third data set associated with the federated learning task, wherein the third data set is derived from the data table based on the first target backtracking time and/or first ID set,
wherein sending a first export instruction to the data management system to obtain a second data set associated with the federal learning task comprises:
sending a first derivation instruction to the data management system to obtain the second data set screened from the third data set, where the second derivation instruction includes at least one preset joint index, and a joint index corresponding to each second data row in the second data set belongs to the at least one preset joint index.
16. The method of claim 14, wherein the first derivation instruction comprises at least one predetermined joint index, and wherein the second data set is filtered from the data table according to the at least one predetermined joint index.
17. The method according to any of claims 14-16, wherein the respective joint index of the at least one first data row comprises at least one joint index associated with the respective at least one ID value, wherein the first derivation instruction indicates a target ID, wherein the second data set comprises the respective joint index of the at least one first data row associated with the target ID and/or the ID value corresponding to the target ID of the at least one first data row.
18. The method according to claim 15 or 16, further comprising:
computing, in collaboration with other ones of the plurality of participants, an intersection of the joint indices of the respective third data sets of the plurality of participants to obtain the at least one preset joint index.
19. A data management system for use with any one of a plurality of parties performing the same federal learning task, the data management system having a data table built therein, the data table including an index column and a plurality of attribute columns, the data management system comprising:
an import unit configured to, in response to receiving an import instruction of the party,
acquiring respective corresponding data information of at least one first data row in a first data set to be imported, wherein the data information comprises at least one ID value, backtracking time and feature data corresponding to at least one attribute column;
determining an index corresponding to each of the at least one first data row based on the at least one ID value and the backtracking time corresponding to each of the at least one first data row, the index corresponding to the index column; and
importing the index and the at least one characteristic data corresponding to each of the at least one first data row into the data table; and
a derivation unit configured to, in response to receiving the first derivation instruction of the party, derive a second data set associated with the federated learning task from the data table based on at least one index corresponding to an index column in the data table.
20. A federal learning device applied to any one participant in a federal learning task, wherein the participant is in communication connection with a plurality of target participants in the federal learning task, the data management system of the participant is internally provided with a data table, the data table comprises an index column and at least one attribute column, wherein the data table further comprises at least one first data row, each first data row comprises an index corresponding to the index column and characteristic data corresponding to at least one attribute column, and the index is related to at least one ID value and backtracking time of the corresponding first data row, the device comprises:
a sending unit configured to send a first derivation instruction to the data management system to obtain a second data set associated with the federal learning task, wherein the second data set is derived from the data table based on at least one index corresponding to an index column in the data table; and
an execution unit configured to execute subsequent subtasks of the federated learning task with other ones of the plurality of target participants based on the second data set.
21. A federated learning system, comprising:
the federal learning device as in claim 20.
22. The system of claim 21, further comprising:
the data management system of claim 19.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-18.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-18.
25. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the method according to any one of claims 1-18.
CN202210667189.4A 2022-06-13 2022-06-13 Data management method, device, system, equipment and medium Active CN114925072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210667189.4A CN114925072B (en) 2022-06-13 2022-06-13 Data management method, device, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210667189.4A CN114925072B (en) 2022-06-13 2022-06-13 Data management method, device, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN114925072A true CN114925072A (en) 2022-08-19
CN114925072B CN114925072B (en) 2023-07-21

Family

ID=82814287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210667189.4A Active CN114925072B (en) 2022-06-13 2022-06-13 Data management method, device, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114925072B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202908A (en) * 2022-09-09 2022-10-18 杭州海康威视数字技术股份有限公司 Privacy computation request response method and device based on dynamic arrangement
CN115329032A (en) * 2022-10-14 2022-11-11 杭州海康威视数字技术股份有限公司 Federal dictionary based learning data transmission method, device, equipment and storage medium
WO2024092927A1 (en) * 2022-10-31 2024-05-10 蚂蚁区块链科技(上海)有限公司 Method and apparatus for generating data table

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100082671A1 (en) * 2008-09-26 2010-04-01 International Business Machines Corporation Joining Tables in Multiple Heterogeneous Distributed Databases
CN109241155A (en) * 2018-07-27 2019-01-18 天津大学 A kind of the Federal query processing system and method for RDF flow data and relation data
CN113505520A (en) * 2021-05-17 2021-10-15 京东科技控股股份有限公司 Method, device and system for supporting heterogeneous federated learning
CN113537508A (en) * 2021-06-18 2021-10-22 百度在线网络技术(北京)有限公司 Federal calculation processing method and device, electronic equipment and storage medium
CN114186213A (en) * 2022-02-16 2022-03-15 深圳致星科技有限公司 Data transmission method, device, equipment and medium based on federal learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202908A (en) * 2022-09-09 2022-10-18 杭州海康威视数字技术股份有限公司 Privacy computation request response method and device based on dynamic arrangement
CN115202908B (en) * 2022-09-09 2023-01-03 杭州海康威视数字技术股份有限公司 Privacy computation request response method and device based on dynamic arrangement
CN115329032A (en) * 2022-10-14 2022-11-11 杭州海康威视数字技术股份有限公司 Federal dictionary based learning data transmission method, device, equipment and storage medium
CN115329032B (en) * 2022-10-14 2023-03-24 杭州海康威视数字技术股份有限公司 Learning data transmission method, device, equipment and storage medium based on federated dictionary
WO2024092927A1 (en) * 2022-10-31 2024-05-10 蚂蚁区块链科技(上海)有限公司 Method and apparatus for generating data table

Also Published As

Publication number Publication date
CN114925072B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN114925072B (en) Data management method, device, system, equipment and medium
US10467192B2 (en) Method and apparatus for updating data table in keyvalue database
CN111666326B (en) ETL scheduling method and device
US20230319001A1 (en) Snippet(s) of content associated with a communication platform
US9842116B2 (en) Method and system for synchronizing data between a database system and its client applications
US20220091891A1 (en) Method, device, apparatus of federated computing, and storage medium
CN103488655B (en) Method and system for processing composite model data
CN113505520A (en) Method, device and system for supporting heterogeneous federated learning
CN112150280A (en) Federated learning method and device for improving matching efficiency, electronic device and medium
WO2021057064A1 (en) Data interaction conversion method and apparatus based on artificial intelligence, device, and medium
CN110019916A (en) Event-handling method, device, equipment and storage medium based on user's portrait
CN111784318A (en) Data processing method and device, electronic equipment and storage medium
CN112801301A (en) Asynchronous calculation method, device, equipment, storage medium and program product
CN107463391A (en) Task processing method, device and equipment
KR101614890B1 (en) Method of creating multi tenancy history, server performing the same and storage media storing the same
CN109636127A (en) A kind of network based on positioning puts mutual assistance platform and its application method on someone's head
CN111784297A (en) Method, apparatus, and computer-readable storage medium for processing task
CN112242909B (en) Method and device for generating management template, electronic equipment and storage medium
WO2023142349A1 (en) Behavior sequence generation method and apparatus, storage medium and electronic device
CN111027093A (en) Access right control method and device, electronic equipment and storage medium
EP4109366A1 (en) Method and device for managing project by using data merging
CN110300222A (en) Short message display method, system and terminal device
CN114547184A (en) Personnel information synchronization method, terminal device and storage medium
CN112836767A (en) Federated modeling method, apparatus, device, storage medium, and program product
CN112685557A (en) Visualized information resource management method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant