WO2022147908A1 - Table association-based lost data recovery method and apparatus, device, and medium - Google Patents

Publication number
WO2022147908A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
extracted
slave
incremental
association
Application number
PCT/CN2021/083104
Other languages
French (fr)
Chinese (zh)
Inventor
陈伟
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022147908A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • the present application relates to the technical field of big data, and in particular, to a method, apparatus, device and medium for recovering lost data based on table association.
  • ETL: Extract-Transform-Load
  • the usual data processing strategy is to incrementally extract data from the source system to the data warehouse system, and then transform and load the data in the data warehouse system.
  • incremental synchronization is usually preferred, that is, the source system synchronizes data to the data warehouse according to the incremental timestamp.
  • However, the data warehouse extracts each table separately. If two tables have a master-slave relationship in the source system (such as the customer table and the account table) but are not extracted at exactly the same time, or the transaction management strategy of the source system makes the commit times inconsistent during extraction, or for any other reason the incremental timestamp cannot guarantee business-logic consistency, the incremental data of the master table and the slave table will not match. When the tables are then loaded on the data warehouse side, the dependent key of the slave table cannot be found in the master table, and data is lost in the subsequent data transformation on the data warehouse side.
  • a first aspect of the present application provides a method for recovering lost data based on table association, and the method for recovering lost data based on table association includes:
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table, and the master table is constructed according to the extracted data in the incremental table;
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the master table;
  • a second aspect of the present application provides an electronic device comprising a processor and a memory, the processor being configured to execute at least one computer-readable instruction stored in the memory to implement the following steps:
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table, and the master table is constructed according to the extracted data in the incremental table;
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the master table;
  • a third aspect of the present application provides a computer-readable storage medium on which at least one computer-readable instruction is stored, and the at least one computer-readable instruction is executed by a processor to implement the following steps:
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table, and the master table is constructed according to the extracted data in the incremental table;
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the master table;
  • a fourth aspect of the present application provides an apparatus for recovering lost data based on table association, wherein the apparatus for recovering lost data based on table association includes:
  • an acquisition unit configured to acquire the data table to be extracted from the source system according to the first data extraction instruction in response to the first data extraction instruction;
  • a determining unit for determining an associated data table associated with the to-be-extracted data table
  • the construction unit is further configured to obtain the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data;
  • an association unit used for associating the data in the slave table with the data in the master table, and obtaining the data for which the association fails;
  • a writing unit used to write the data for which the association fails into the recovery table;
  • an update unit configured to, in response to a second data extraction instruction for the to-be-extracted data table, incrementally synchronize the data extracted from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and update the extracted data to the master table;
  • the update unit is also used to obtain the current slave table from the incremental table, and calculate the union of the current slave table and the recovery table as the updated slave table;
  • the associating unit is further configured to associate the updated slave table with the updated master table, and remove the successfully associated data from the recovery table.
  • The present application can, in response to the first data extraction instruction, acquire the to-be-extracted data table from the source system according to the first data extraction instruction, incrementally synchronize the data extracted from the to-be-extracted data table to the incremental table, and construct the master table according to the extracted data in the incremental table; determine the associated data table associated with the to-be-extracted data table, acquire the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data; associate the data in the slave table with the data in the master table, acquire the data for which the association fails, and write that data into the recovery table, so that all data lost in association is recovered. In response to the second data extraction instruction for the to-be-extracted data table, the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the master table; the current slave table is acquired from the incremental table, and the union of the current slave table and the recovery table is calculated as the updated slave table; the updated slave table is associated with the updated master table, and the successfully associated data is removed from the recovery table. This solves the loss of associated data caused by associated tables being out of sync for various reasons, reduces the cost of manual problem analysis and of supplementing and correcting data, and enhances the data integrity of the data warehouse.
  • FIG. 1 is a flowchart of a preferred embodiment of the method for recovering lost data based on table association in the present application.
  • FIG. 2 is a functional block diagram of a preferred embodiment of the apparatus for recovering lost data based on table association in the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the method for recovering lost data based on table association in the present application.
  • FIG. 1 is a flowchart of a preferred embodiment of the method for recovering lost data based on table association in the present application. According to different requirements, the order of the steps in this flowchart can be changed, and some steps can be omitted.
  • The method for recovering lost data based on table association is applied to one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
  • The electronic device can be any electronic product that can interact with the user, such as a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), a smart wearable device, etc.
  • the electronic equipment may also include network equipment and/or user equipment.
  • the network device includes, but is not limited to, a single network server, a server group formed by multiple network servers, or a cloud formed by a large number of hosts or network servers based on cloud computing (Cloud Computing).
  • the network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
  • Data warehouse technology (Extract-Transform-Load, ETL) describes the process of extracting, transforming, and loading data from a source to a destination.
  • the first data extraction instruction may be configured to be triggered periodically, for example, periodically triggered every day.
  • the source system refers to a source-end system that stores data, and the data in the source system is extracted to a data warehouse for subsequent use.
  • a data warehouse pulls incremental data from source systems on a daily basis.
  • Acquiring the to-be-extracted data table from the source system according to the first data extraction instruction includes:
  • the data table with the target table name is acquired from the source system as the to-be-extracted data table.
  • The first data extraction instruction is essentially a piece of code; by coding convention, the content between the braces {} in the first data extraction instruction is called the method body.
  • the information carried by the first data extraction instruction may be a specific address or various specific data to be processed, and the content of the information mainly depends on the code composition of the first data extraction instruction.
  • the preset label can be custom configured.
  • the preset label and the table name have a one-to-one correspondence, for example, the preset label may be configured as NAME.
  • Data can be obtained directly from the instruction, which improves processing efficiency; and because the data is obtained by label and each label is uniquely configured, the accuracy of data acquisition is also improved.
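  • The label lookup described above could be sketched as follows. The source does not give the syntax of the extraction instruction, so the `extract() { NAME=...; }` layout, the `RANGE` entry, and the helper `parse_table_name` are all assumptions made for illustration; only the idea of a method body between braces and a preset label such as NAME comes from the text.

```python
import re

def parse_table_name(instruction, label="NAME"):
    """Return the value bound to `label` inside the method body, or None.

    Assumes (hypothetically) that the method body is the text between the
    outermost braces and that entries look like LABEL=value.
    """
    body = re.search(r"\{(.*)\}", instruction, re.DOTALL)
    if body is None:
        return None  # no method body found
    match = re.search(rf"\b{re.escape(label)}\s*=\s*(\w+)", body.group(1))
    return match.group(1) if match else None

# Hypothetical instruction text; the real format is not specified in the source.
instruction = "extract() { NAME=customer; RANGE=daily }"
table_name = parse_table_name(instruction)
```

  • Because the label is assumed to be unique within the method body, the first match is taken as the target table name.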
  • The data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table, and a master table is constructed in the incremental table according to the extracted data.
  • Incrementally synchronizing the data extracted from the to-be-extracted data table to the incremental table includes:
  • Parsing the first data extraction instruction to obtain the first timestamp range for data extraction includes:
  • the information carried by the first data extraction instruction is searched for the data with the configuration, and the found data is determined as the first timestamp range.
  • The first timestamp range covers the data records changed between the last synchronization and the current synchronization; records that fall outside this time interval are judged not to meet the extraction conditions.
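  • A minimal sketch of extraction by timestamp range, under stated assumptions: the column name `updated_at`, the sample rows, and the half-open interval [last synchronization, current synchronization) are illustrative choices, not details given in the source.

```python
from datetime import datetime

def extract_increment(source_rows, start, end):
    """Return the rows whose last-modified time falls in [start, end)."""
    return [row for row in source_rows if start <= row["updated_at"] < end]

# Hypothetical source rows; "updated_at" is an assumed column name.
source_rows = [
    {"id": "C001", "updated_at": datetime(2021, 3, 1, 9, 0)},
    {"id": "C002", "updated_at": datetime(2021, 3, 1, 23, 30)},
    {"id": "C003", "updated_at": datetime(2021, 3, 2, 0, 5)},
]

# First timestamp range: from the last synchronization to the current one.
last_sync = datetime(2021, 3, 1, 0, 0)
current_sync = datetime(2021, 3, 2, 0, 0)

increment = extract_increment(source_rows, last_sync, current_sync)
extracted_ids = [row["id"] for row in increment]
```

  • Here C003 changed after the current synchronization point, so it does not meet the extraction conditions for this cycle and would only be picked up by the next one.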
  • Determining the associated data table associated with the to-be-extracted data table includes:
  • the detected data table is determined as the associated data table.
  • The associated data table detected in the above manner has a table association relationship with the to-be-extracted data table; that is, the two data tables have a master-slave relationship in the source system, such as the customer table and the account table.
  • The two tables are often not extracted at the same time, or the transaction management strategy of the source system makes the commit times inconsistent during extraction, or for some other reason the incremental timestamp cannot guarantee business-logic consistency. In each case the incremental data of the master and slave tables will not match; when the tables are then loaded on the data warehouse side, the dependent key of the slave table cannot be found in the master table, and data is lost in the subsequent data transformation on the data warehouse side.
  • For example, if the master and slave tables of the source system are not updated in the same transaction, their update times differ, so the data warehouse may extract the data of the master table but fail to extract the corresponding data of the slave table.
  • For the above situation, this embodiment detects the data table that has a table association relationship with the to-be-extracted data table, so as to perform targeted processing and avoid data loss.
  • using the join operation to perform table association includes:
  • ticket.id is equal to job.t_id
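  • To illustrate the join condition above, the following sketch joins two in-memory SQLite tables on `ticket.id = job.t_id`. Only the table names and the two key columns come from the text; the remaining columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, t_id INTEGER, name TEXT);
    INSERT INTO ticket VALUES (1, 'ticket-1'), (2, 'ticket-2');
    INSERT INTO job VALUES (10, 1, 'job-a'), (11, 3, 'job-b');
""")

# Inner join: keeps only the job rows whose t_id matches a ticket id.
matched = conn.execute(
    "SELECT job.id, ticket.id FROM job JOIN ticket ON ticket.id = job.t_id"
).fetchall()

# Left join with a NULL check: job rows that have no matching ticket.
unmatched = conn.execute(
    "SELECT job.id FROM job LEFT JOIN ticket ON ticket.id = job.t_id "
    "WHERE ticket.id IS NULL"
).fetchall()
conn.close()
```

  • The second query shows the kind of row that a plain inner join silently drops, which is the data loss the recovery table is meant to catch.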
  • S13 Acquire the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data.
  • The extraction time of the data in the associated data table is not necessarily the same as that of the data in the to-be-extracted data table. Because each table in the data warehouse is extracted separately, the extraction times are often inconsistent, which easily causes data loss.
  • Associating the data in the slave table with the data in the master table includes:
  • the mapping table stores the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
  • when it is found in the mapping table that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, it is determined that the first data is associated with the second data and that the association of the first data succeeds; or
  • For example, if the mapping table stores the correspondence between a customer ID and an account ID, the data corresponding to the customer ID is associated with the data corresponding to the account ID, and the association of the data corresponding to the customer ID is determined to be successful. If the account ID corresponding to the customer ID cannot be found in the mapping table, the master table contains no data associated with the data corresponding to the customer ID, and the association of the data corresponding to the customer ID is determined to have failed.
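  • The mapping-table lookup could be sketched as below. It assumes the mapping table is held as a dict from slave identifier (account ID) to master identifier (customer ID), and that a correspondence only counts when the master identifier is actually present in the master table; both points are assumptions, and the function name `associate` is hypothetical.

```python
def associate(slave_ids, master_ids, mapping):
    """Split slave identifiers into associated pairs and failures.

    mapping: slave identifier -> master identifier (assumed layout).
    """
    master_set = set(master_ids)
    associated, failed = [], []
    for slave_id in slave_ids:
        master_id = mapping.get(slave_id)
        if master_id is not None and master_id in master_set:
            associated.append((slave_id, master_id))
        else:
            failed.append(slave_id)  # candidate for the recovery table
    return associated, failed

# Hypothetical identifiers in the style of the customer/account example.
mapping = {"A001": "C001", "A002": "C002", "A004": "C004"}
master_ids = ["C001", "C002"]          # C004 has not been extracted yet
slave_ids = ["A001", "A002", "A004"]

associated, failed = associate(slave_ids, master_ids, mapping)
```

  • The `failed` list is the data for which the association fails, which the method then writes into the recovery table.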
  • Before the data for which the association fails is written into the recovery table, the method further includes:
  • the created isomorphic table is determined as the recovery table.
  • An isomorphic table of the incremental table is created as the recovery table. Because the structures of the two tables are identical, the data that fails to be associated can be written into the recovery table completely, which avoids further data loss, gives subsequent data recovery a more complete data basis, and reduces the error rate.
  • The recovery table is a dynamically updated, cyclically reused data table. It ensures that all data lost in association is collected and that the association is attempted again in the next cycle to repair the data.
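  • One way to create a table with the same column layout as the incremental table is sketched below using SQLite. The table names `account_incr` and `account_recovery` and the columns are hypothetical, and SQLite itself is an assumption; the source does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_incr (account_id TEXT, customer_id TEXT, balance REAL)"
)

# CREATE TABLE ... AS SELECT with a false predicate copies the column
# layout but no rows, giving an empty table with the same columns.
conn.execute("CREATE TABLE account_recovery AS SELECT * FROM account_incr WHERE 0")

def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

incr_columns = column_names("account_incr")
recovery_columns = column_names("account_recovery")
recovery_rows = conn.execute("SELECT COUNT(*) FROM account_recovery").fetchone()[0]
conn.close()
```

  • Note that in SQLite this form copies column names but not constraints or indexes, so a stricter notion of "isomorphic" would need the original table definition to be replayed instead.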
  • the second data extraction instruction may also be configured to be triggered periodically, for example, the second data extraction instruction may be triggered the day after the first data extraction instruction is triggered.
  • Incrementally synchronizing the data extracted from the to-be-extracted data table to the incremental table according to the second data extraction instruction includes:
  • S17 Acquire the current slave table from the incremental table, and calculate the union of the current slave table and the recovery table as the updated slave table.
  • the current slave table is also an updated slave table.
  • the current slave table is also incrementally synchronized according to the timestamp range, which is not described here.
  • the union of the current slave table and the recovery table is used as the updated slave table, so as to perform the association again in the current cycle, which effectively avoids data loss.
  • For example, suppose C004 of the customer table does not meet the extraction conditions and is not extracted to the data warehouse. A004 of the account table needs to be associated with customer C004 of the master table for related calculations; normally, records that cannot be associated with the master table are discarded, so the data of A004 would be lost. In this embodiment, the unassociated data is instead written into the recovery table for subsequent use. When data is extracted again in the early morning of the next day, C004 of the customer table is extracted into the data warehouse.
  • For the account table, the data set extracted on the next day is combined with the data in the previous day's recovery table to form a new incremental table.
  • The recovery table thus completes the data, and the data can be associated.
  • Data that fails to be associated is continuously written into the recovery table, and in the next incremental cycle the recovery table is merged into the incremental table and the association is attempted again. If the association still fails, the data enters the recovery table again, and the cycle continues until the association succeeds and the data flows into the next stage.
  • This cyclic method effectively reduces the probability of data loss.
  • The successfully associated data is removed from the recovery table to avoid data redundancy in the recovery table.
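  • The two-day C004/A004 example can be traced with a small in-memory sketch. The dict-based table layout, the function `run_cycle`, and the extra identifiers (A001, A005, C001 to C003) are assumptions for illustration; if the same account appears in both the recovery table and the new increment, this sketch lets the newer row win.

```python
def run_cycle(master, slave_increment, recovery):
    """One incremental cycle: union, associate, refresh the recovery table.

    master: set of customer ids currently in the master table
    slave_increment: dict account_id -> customer_id extracted this cycle
    recovery: dict account_id -> customer_id left over from earlier cycles
    Returns (associated, new_recovery).
    """
    # Union of the current slave table and the recovery table.
    updated_slave = {**recovery, **slave_increment}
    associated = {a: c for a, c in updated_slave.items() if c in master}
    # Rows that still fail stay in (or enter) the recovery table; rows that
    # associated successfully are thereby removed from it.
    new_recovery = {a: c for a, c in updated_slave.items() if c not in master}
    return associated, new_recovery

# Day 1: customer C004 was not extracted, so account A004 cannot associate.
master = {"C001", "C002", "C003"}
day1_assoc, recovery = run_cycle(
    master, {"A001": "C001", "A004": "C004"}, recovery={}
)

# Day 2: C004 arrives in the master table; A004 is retried from recovery.
master |= {"C004"}
day2_assoc, recovery_after = run_cycle(master, {"A005": "C002"}, recovery)
```

  • After day 1 the recovery table holds A004; after day 2 it is empty again, because A004 associated successfully once C004 reached the master table.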
  • This embodiment can solve the problem of loss of associated data caused by asynchronous data in associated tables caused by various reasons, reduce the cost of manual data problem analysis and data supplementation and correction, and enhance the data integrity of the data warehouse.
  • the method further includes:
  • the detected data is determined as the data to be verified
  • the master table, slave table and recovery table can also be deployed on the blockchain to prevent malicious tampering of data.
  • The present application can, in response to the first data extraction instruction, acquire the to-be-extracted data table from the source system according to the first data extraction instruction, incrementally synchronize the data extracted from the to-be-extracted data table to the incremental table, and construct the master table according to the extracted data in the incremental table; determine the associated data table associated with the to-be-extracted data table, acquire the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data; associate the data in the slave table with the data in the master table, acquire the data for which the association fails, and write that data into the recovery table, so that all data lost in association is recovered. In response to the second data extraction instruction for the to-be-extracted data table, the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the master table; the current slave table is acquired from the incremental table, and the union of the current slave table and the recovery table is calculated as the updated slave table; the updated slave table is associated with the updated master table, and the successfully associated data is removed from the recovery table. This solves the loss of associated data caused by associated tables being out of sync for various reasons, reduces the cost of manual problem analysis and of supplementing and correcting data, and enhances the data integrity of the data warehouse.
  • FIG. 2 is a functional block diagram of a preferred embodiment of the apparatus for recovering lost data based on table association in the present application.
  • The apparatus 11 for recovering lost data based on table association includes an acquisition unit 110, a construction unit 111, a determination unit 112, an association unit 113, a writing unit 114, and an updating unit 115.
  • the modules/units referred to in this application refer to a series of computer program segments that can be executed by the processor 13 and can perform fixed functions, and are stored in the memory 12 . In this embodiment, the functions of each module/unit will be described in detail in subsequent embodiments.
  • The acquisition unit 110 acquires the to-be-extracted data table from the source system according to the first data extraction instruction.
  • Data warehouse technology (Extract-Transform-Load, ETL) describes the process of extracting, transforming, and loading data from a source to a destination.
  • the first data extraction instruction may be configured to be triggered periodically, for example, periodically triggered every day.
  • the source system refers to a source-end system that stores data, and the data in the source system is extracted to a data warehouse for subsequent use.
  • a data warehouse pulls incremental data from source systems on a daily basis.
  • The acquisition unit 110 acquiring the to-be-extracted data table from the source system according to the first data extraction instruction includes:
  • the data table with the target table name is acquired from the source system as the to-be-extracted data table.
  • The first data extraction instruction is essentially a piece of code; by coding convention, the content between the braces {} in the first data extraction instruction is called the method body.
  • the information carried by the first data extraction instruction may be a specific address or various specific data to be processed, and the content of the information mainly depends on the code composition of the first data extraction instruction.
  • the preset label can be custom configured.
  • the preset label and the table name have a one-to-one correspondence, for example, the preset label may be configured as NAME.
  • Data can be obtained directly from the instruction, which improves processing efficiency; and because the data is obtained by label and each label is uniquely configured, the accuracy of data acquisition is also improved.
  • The construction unit 111 incrementally synchronizes the data extracted from the to-be-extracted data table to the incremental table, and constructs a master table in the incremental table according to the extracted data.
  • The construction unit 111 incrementally synchronizing the data extracted from the to-be-extracted data table to the incremental table includes:
  • The construction unit 111 parsing the first data extraction instruction to obtain the first timestamp range for data extraction includes:
  • the information carried by the first data extraction instruction is searched for the data with the configuration, and the found data is determined as the first timestamp range.
  • The first timestamp range covers the data records changed between the last synchronization and the current synchronization; records that fall outside this time interval are judged not to meet the extraction conditions.
  • the determining unit 112 determines the associated data table associated with the to-be-extracted data table.
  • The determining unit 112 determining the associated data table associated with the to-be-extracted data table includes:
  • the detected data table is determined as the associated data table.
  • The associated data table detected in the above manner has a table association relationship with the to-be-extracted data table; that is, the two data tables have a master-slave relationship in the source system, such as the customer table and the account table.
  • The two tables are often not extracted at the same time, or the transaction management strategy of the source system makes the commit times inconsistent during extraction, or for some other reason the incremental timestamp cannot guarantee business-logic consistency. In each case the incremental data of the master and slave tables will not match; when the tables are then loaded on the data warehouse side, the dependent key of the slave table cannot be found in the master table, and data is lost in the subsequent data transformation on the data warehouse side.
  • For example, if the master and slave tables of the source system are not updated in the same transaction, their update times differ, so the data warehouse may extract the data of the master table but fail to extract the corresponding data of the slave table.
  • For the above situation, this embodiment detects the data table that has a table association relationship with the to-be-extracted data table, so as to perform targeted processing and avoid data loss.
  • using the join operation to perform table association includes:
  • ticket.id is equal to job.t_id
  • The construction unit 111 acquires the extracted data of the associated data table from the incremental table, and constructs a slave table in the incremental table according to the extracted data.
  • The extraction time of the data in the associated data table is not necessarily the same as that of the data in the to-be-extracted data table. Because each table in the data warehouse is extracted separately, the extraction times are often inconsistent, which easily causes data loss.
  • the associating unit 113 associates the data in the slave table with the data in the master table, and acquires the data for which the association fails.
  • The associating unit 113 associating the data in the slave table with the data in the master table includes:
  • the mapping table stores the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
  • when it is found in the mapping table that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, it is determined that the first data is associated with the second data and that the association of the first data succeeds; or
  • For example, if the mapping table stores the correspondence between a customer ID and an account ID, the data corresponding to the customer ID is associated with the data corresponding to the account ID, and the association of the data corresponding to the customer ID is determined to be successful. If the account ID corresponding to the customer ID cannot be found in the mapping table, the master table contains no data associated with the data corresponding to the customer ID, and the association of the data corresponding to the customer ID is determined to have failed.
  • The writing unit 114 writes the data for which the association fails into the recovery table.
  • the created isomorphic table is determined as the recovery table.
  • An isomorphic table of the incremental table is created as the recovery table. Because the structures of the two tables are identical, the data that fails to be associated can be written into the recovery table completely, which avoids further data loss, gives subsequent data recovery a more complete data basis, and reduces the error rate.
  • The recovery table is a dynamically updated, cyclically reused data table. It ensures that all data lost in association is collected and that the association is attempted again in the next cycle to repair the data.
  • The update unit 115 incrementally synchronizes the data extracted from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updates the extracted data to the master table.
  • the second data extraction instruction may also be configured to be triggered periodically, for example, the second data extraction instruction may be triggered the day after the first data extraction instruction is triggered.
  • The update unit 115 incrementally synchronizing the data extracted from the to-be-extracted data table to the incremental table according to the second data extraction instruction includes:
  • The update unit 115 acquires the current slave table from the incremental table, and calculates the union of the current slave table and the recovery table as the updated slave table.
  • the current slave table is also an updated slave table.
  • the current slave table is also incrementally synchronized according to the timestamp range, which is not described here.
  • the union of the current slave table and the recovery table is used as the updated slave table, so as to perform the association again in the current cycle, which effectively avoids data loss.
  • the associating unit 113 associates the updated slave table with the updated master table, and removes successfully associated data from the recovery table.
  • For example, customer C004 in the customer table does not meet the extraction conditions and is not extracted to the data warehouse. Account A004 in the account table must be associated with customer C004 in the master table for the related calculations; ordinarily, the A004 record would be discarded, because records that cannot be associated with the master table are dropped. In this embodiment, the unassociated data is instead written into the recovery table for subsequent use. Data is extracted again in the early morning of the next day, and C004 of the customer table is extracted into the data warehouse.
  • the account table combines the data set extracted on the next day with the data in the previous day's recovery table to form a new incremental table.
  • the recovery table supplies the missing data, and the data can then be associated.
  • data that fails to be associated is continuously written to the recovery table; in the next incremental cycle the recovery table is merged into the incremental table and association is attempted again. Data that still cannot be associated enters the recovery table again, and the cycle continues until the association succeeds and the data flows into the next stage.
  • the above cyclic approach effectively reduces the probability of data loss.
  • the successfully associated data is removed from the recovery table to avoid data redundancy in the recovery table.
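  • The two-day customer/account example above can be traced with a small in-memory sketch. The function `run_cycle`, the row layout, and the extra identifiers (A001, A005, C001 to C003) are hypothetical; only C004 and A004 come from the example. Tables are modeled as Python sets and lists rather than warehouse tables.

```python
def run_cycle(master_ids, new_master_ids, new_slave_rows, recovery_rows):
    """Run one incremental cycle; return (associated rows, new recovery table)."""
    # Update the master table with the keys extracted in this cycle.
    master_ids.update(new_master_ids)
    # Updated slave table = union of the recovery table and the current slave rows.
    merged = {row["account_id"]: row for row in recovery_rows}
    merged.update({row["account_id"]: row for row in new_slave_rows})
    associated, failed = [], []
    for row in merged.values():
        if row["customer_id"] in master_ids:
            associated.append(row)  # leaves the recovery table
        else:
            failed.append(row)  # written (or re-written) to the recovery table
    return associated, failed


master = set()

# Day 1: customer C004 is not extracted, so account A004 cannot be associated.
day1_ok, recovery_day1 = run_cycle(
    master,
    {"C001", "C002", "C003"},
    [
        {"account_id": "A001", "customer_id": "C001"},
        {"account_id": "A004", "customer_id": "C004"},
    ],
    [],
)

# Day 2: C004 arrives; A004 is retried from the recovery table and now associates.
day2_ok, recovery_day2 = run_cycle(
    master,
    {"C004"},
    [{"account_id": "A005", "customer_id": "C002"}],
    recovery_day1,
)
```

  • Returning the failed rows as the new recovery table covers both halves of the step in one pass: rows that associate are no longer held for recovery, and rows that still fail are carried into the next cycle.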
  • This embodiment can solve the loss of associated data that occurs when the data of associated tables is out of sync for any reason, reduce the cost of manual data problem analysis and data supplementation and correction, and enhance the data integrity of the data warehouse.
  • the detected data is determined as the data to be verified
  • the master table, slave table and recovery table can also be deployed on the blockchain to prevent malicious tampering of data.
  • the present application can, in response to the first data extraction instruction, obtain the to-be-extracted data table from the source system according to the first data extraction instruction; incrementally extract and synchronize data from the to-be-extracted data table to the incremental table, and construct the master table in the incremental table according to the extracted data; determine the associated data table associated with the to-be-extracted data table; obtain the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to that data; associate the data in the slave table with the data in the master table, and obtain the data that fails to be associated; and write the data that fails to be associated into the recovery table, so that all data lost during association is recovered.
  • in response to the second data extraction instruction for the to-be-extracted data table, the present application incrementally extracts and synchronizes data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updates the extracted data to the master table; obtains the current slave table from the incremental table, and calculates the union of the current slave table and the recovery table as the updated slave table; and associates the updated slave table with the updated master table, and removes the successfully associated data from the recovery table. This solves the loss of associated data that occurs when associated table data is out of sync for any reason, reduces the cost of manual data problem analysis and data supplementation and correction, and enhances the data integrity of the data warehouse.
  • FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the table association-based lost data recovery method of the present application.
  • the electronic device 1 may include a memory 12, a processor 13 and a bus, and may also include a computer program stored in the memory 12 and executable on the processor 13, such as a table association-based lost data recovery program.
  • the electronic device 1 can be either a bus-type structure or a star-shaped structure.
  • the electronic device 1 may also include more or fewer hardware or software components than shown, or a different arrangement of components; for example, the electronic device 1 may also include input and output devices, network access devices, and the like.
  • the electronic device 1 is only an example. Other existing or future electronic products that can be adapted to this application should also be included in the protection scope of this application, and are incorporated herein by reference.
  • the memory 12 includes at least one type of computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may include flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 12 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1.
  • the memory 12 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the electronic device 1.
  • the memory 12 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 12 can not only be used to store application software installed in the electronic device 1 and various types of data, such as the codes of the lost data recovery program based on table association, etc., but also can be used to temporarily store data that has been output or will be output.
  • the processor 13 may be composed of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU).
  • the processor 13 is the control core (Control Unit) of the electronic device 1. It connects the components of the entire electronic device 1 through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 12 (for example, executing the table association-based lost data recovery program) and calling data stored in the memory 12.
  • the processor 13 executes the operating system of the electronic device 1 and various installed application programs.
  • the processor 13 executes the application program to implement the steps in each of the foregoing embodiments of the method for recovering lost data based on table association, for example, the steps shown in FIG. 1 .
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete the present invention.
  • the one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program in the electronic device 1 .
  • the computer program may be divided into an acquisition unit 110 , a construction unit 111 , a determination unit 112 , an association unit 113 , a writing unit 114 , and an updating unit 115 .
  • the above-mentioned integrated units implemented in the form of software functional modules may be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include several instructions that enable a computer device (which may be a personal computer, a computer device, or a network device, etc.) or a processor to execute part of the table association-based lost data recovery method described in the embodiments of the present application.
  • If the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when it is executed by the processor, the steps of the above method embodiments can be implemented.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , random access memory, etc.
  • the computer-readable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like; the stored data area may store data created during use, and the like.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (EISA for short) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one arrow is shown in FIG. 3, but it does not mean that there is only one bus or one type of bus.
  • the bus is arranged to enable connection communication between the memory 12 and at least one processor 13 and the like.
  • the electronic device 1 may also include a power source (such as a battery) for supplying power to the various components. Preferably, the power source may be logically connected to the at least one processor 13 through a power management device, so that charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface, which may be a display (Display) or an input unit (such as a keyboard (Keyboard)). Optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • FIG. 3 only shows the electronic device 1 with components 12-13. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the memory 12 in the electronic device 1 stores multiple instructions to implement a method for recovering lost data based on table association, and the processor 13 can execute the multiple instructions to implement:
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table, and the main table is constructed according to the extracted data in the incremental table;
  • the data extracted from the to-be-extracted data table is incrementally synchronized to the incremental table according to the second data extraction instruction, and the extracted data is updated to the main table;
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a table association-based lost data recovery method and apparatus, a device, and a medium, relating to the field of big data. The method comprises: obtaining a data table to be extracted; incrementally extracting and synchronizing data to an incremental table, and constructing a master table according to the extracted data; determining an associated data table; obtaining the extracted data of the associated data table, and constructing a slave table according to the extracted data; associating data in the slave table with data in the master table, and obtaining the data that fails to be associated (S14); writing the data that fails to be associated into a recovery table (S15); incrementally extracting data from the data table to be extracted and synchronizing it to the incremental table, and updating the extracted data to the master table; obtaining the current slave table from the incremental table, and calculating the union of the current slave table and the recovery table as an updated slave table (S17); and associating the updated slave table with the updated master table, and removing successfully associated data from the recovery table (S18). The data integrity of a data warehouse is enhanced. The table association-based lost data recovery method further relates to blockchain technology: the master table, the slave table, and the recovery table can be stored in a blockchain.

Description

Table association-based lost data recovery method and apparatus, device, and medium
This application claims priority to the Chinese patent application filed with the China Patent Office on January 5, 2021, with application number 202110005207.8 and the invention title "Table association-based lost data recovery method and apparatus, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of big data, and in particular to a table association-based lost data recovery method, apparatus, device, and medium.
Background
In the field of data warehousing (Extract-Transform-Load, ETL), the usual data processing strategy is to incrementally extract data from the source system into the data warehouse system, and then transform and load the data in the data warehouse system. To improve efficiency and reduce extraction overhead, incremental synchronization is usually preferred; that is, the source system synchronizes data to the data warehouse according to incremental timestamps.
However, the inventor realized that this synchronization approach has drawbacks. The data warehouse extracts each table separately. If two tables have a master-slave relationship in the source system (for example, a customer table and an account table), the incremental data of the master and slave tables may fail to match because the extraction times of the two tables are not exactly the same, because the source system's transaction management strategy causes commit times to differ at extraction, or because the incremental timestamps for any other reason cannot guarantee consistent business logic. When the tables are then loaded on the data warehouse side, a dependent key in the slave table may not be found in the master table, so data is lost in the subsequent data transformation on the data warehouse side.
Summary of the Invention
In view of the above, it is necessary to provide a table association-based lost data recovery method, apparatus, device, and medium that can solve the loss of associated data caused by associated table data being out of sync for any reason, reduce the cost of manual data problem analysis and data supplementation and correction, and enhance the data integrity of the data warehouse.
A first aspect of the present application provides a table association-based lost data recovery method, the method including:
in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
incrementally extracting and synchronizing data from the to-be-extracted data table to an incremental table, and constructing a master table in the incremental table according to the extracted data;
determining an associated data table associated with the to-be-extracted data table;
obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table according to the extracted data;
associating the data in the slave table with the data in the master table, and obtaining the data that fails to be associated;
writing the data that fails to be associated into a recovery table;
in response to a second data extraction instruction for the to-be-extracted data table, incrementally extracting and synchronizing data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updating the extracted data to the master table;
obtaining the current slave table from the incremental table, and calculating the union of the current slave table and the recovery table as the updated slave table;
associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
A second aspect of the present application provides an electronic device, the electronic device including a processor and a memory, the processor being configured to execute at least one computer-readable instruction stored in the memory to implement the following steps:
in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
incrementally extracting and synchronizing data from the to-be-extracted data table to an incremental table, and constructing a master table in the incremental table according to the extracted data;
determining an associated data table associated with the to-be-extracted data table;
obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table according to the extracted data;
associating the data in the slave table with the data in the master table, and obtaining the data that fails to be associated;
writing the data that fails to be associated into a recovery table;
in response to a second data extraction instruction for the to-be-extracted data table, incrementally extracting and synchronizing data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updating the extracted data to the master table;
obtaining the current slave table from the incremental table, and calculating the union of the current slave table and the recovery table as the updated slave table;
associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
A third aspect of the present application provides a computer-readable storage medium storing at least one computer-readable instruction, the at least one computer-readable instruction being executed by a processor to implement the following steps:
in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
incrementally extracting and synchronizing data from the to-be-extracted data table to an incremental table, and constructing a master table in the incremental table according to the extracted data;
determining an associated data table associated with the to-be-extracted data table;
obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table according to the extracted data;
associating the data in the slave table with the data in the master table, and obtaining the data that fails to be associated;
writing the data that fails to be associated into a recovery table;
in response to a second data extraction instruction for the to-be-extracted data table, incrementally extracting and synchronizing data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updating the extracted data to the master table;
obtaining the current slave table from the incremental table, and calculating the union of the current slave table and the recovery table as the updated slave table;
associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
A fourth aspect of the present application provides a table association-based lost data recovery apparatus, the apparatus including:
an acquisition unit, configured to, in response to a first data extraction instruction, obtain a to-be-extracted data table from a source system according to the first data extraction instruction;
a construction unit, configured to incrementally extract and synchronize data from the to-be-extracted data table to an incremental table, and construct a master table in the incremental table according to the extracted data;
a determining unit, configured to determine an associated data table associated with the to-be-extracted data table;
the construction unit being further configured to obtain the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data;
an association unit, configured to associate the data in the slave table with the data in the master table, and obtain the data that fails to be associated;
a writing unit, configured to write the data that fails to be associated into a recovery table;
an updating unit, configured to, in response to a second data extraction instruction for the to-be-extracted data table, incrementally extract and synchronize data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and update the extracted data to the master table;
the updating unit being further configured to obtain the current slave table from the incremental table, and calculate the union of the current slave table and the recovery table as the updated slave table;
the association unit being further configured to associate the updated slave table with the updated master table, and remove the successfully associated data from the recovery table.
It can be seen from the above technical solutions that the present application can, in response to the first data extraction instruction, obtain the to-be-extracted data table from the source system according to the first data extraction instruction; incrementally extract and synchronize data from the to-be-extracted data table to the incremental table, and construct the master table in the incremental table according to the extracted data; determine the associated data table associated with the to-be-extracted data table; obtain the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to that data; associate the data in the slave table with the data in the master table, and obtain the data that fails to be associated; and write the data that fails to be associated into the recovery table, so that all data lost during association is recovered. In response to the second data extraction instruction for the to-be-extracted data table, the present application incrementally extracts and synchronizes data from the to-be-extracted data table to the incremental table according to the second data extraction instruction, and updates the extracted data to the master table; obtains the current slave table from the incremental table, and calculates the union of the current slave table and the recovery table as the updated slave table; and associates the updated slave table with the updated master table, and removes the successfully associated data from the recovery table. This solves the loss of associated data that occurs when associated table data is out of sync for any reason, reduces the cost of manual data problem analysis and data supplementation and correction, and enhances the data integrity of the data warehouse.
Description of Drawings
FIG. 1 is a flowchart of a preferred embodiment of the table association-based lost data recovery method of the present application.
FIG. 2 is a functional block diagram of a preferred embodiment of the table association-based lost data recovery apparatus of the present application.
FIG. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the table association-based lost data recovery method of the present application.
具体实施方式Detailed ways
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 shows a flowchart of a preferred embodiment of the table association-based lost data recovery method of the present application. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
The table association-based lost data recovery method is applied to one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of human-computer interaction with a user, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol television (IPTV), a smart wearable device, and the like.
The electronic device may also include a network device and/or user equipment. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers.
The network in which the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
S10, in response to a first data extraction instruction, acquire a to-be-extracted data table from a source system according to the first data extraction instruction.
Extract-Transform-Load (ETL) describes the process of extracting data from a source end, transforming it, and loading it into a destination end such as a data warehouse.
The first data extraction instruction may be configured to be triggered periodically, for example at a fixed time every day.
The source system is the source-end system that stores the data; the data in the source system is extracted into the data warehouse for subsequent use.
Typically, the data warehouse extracts incremental data from the source system every day.
In this embodiment, acquiring the to-be-extracted data table from the source system according to the first data extraction instruction includes:
parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
acquiring a preset tag;
searching the information carried by the first data extraction instruction for the data having the preset tag, and determining the found data as a target table name;
acquiring, from the source system, the data table having the target table name as the to-be-extracted data table.
Specifically, the first data extraction instruction is essentially a piece of code; following common coding conventions, the content between { and } in the first data extraction instruction is called the method body.
The information carried by the first data extraction instruction may be a specific address or various specific data to be processed; the content of the information mainly depends on the code that makes up the first data extraction instruction.
The preset tag can be custom configured.
The preset tag has a one-to-one correspondence with the table name; for example, the preset tag may be configured as NAME.
With the above implementation, the data can be obtained directly from the instruction, which improves processing efficiency. In addition, because the data is obtained by tag and the tag configuration is unique, the accuracy of data acquisition is also improved.
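The tag lookup described above can be sketched as follows. The source does not specify the instruction syntax, so the `extract { TAG=value; ... }` format, the `RANGE` tag, and the function names are all hypothetical and chosen only for illustration.

```python
import re


def parse_method_body(instruction: str) -> dict[str, str]:
    """Return the tag/value pairs found between the outermost braces.

    Assumes a hypothetical instruction format: ``extract { TAG=value; ... }``.
    """
    match = re.search(r"\{(.*)\}", instruction, flags=re.DOTALL)
    if match is None:
        raise ValueError("instruction has no method body")
    info = {}
    for part in match.group(1).split(";"):
        if "=" in part:
            tag, value = part.split("=", 1)
            info[tag.strip()] = value.strip()
    return info


def target_table_name(instruction: str, preset_tag: str = "NAME") -> str:
    """Look up the value carried under the preset tag (NAME in the text)."""
    info = parse_method_body(instruction)
    if preset_tag not in info:
        raise KeyError(f"tag {preset_tag!r} not found in instruction")
    return info[preset_tag]


# Hypothetical instruction; only the NAME tag is taken from the text.
instruction = "extract { NAME=customer; RANGE=2021-03-01T00:00:00/2021-03-02T00:00:00 }"
table_name = target_table_name(instruction)
```

Here `table_name` would then be used to fetch the to-be-extracted data table from the source system.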
S11, extract data from the to-be-extracted data table and incrementally synchronize it to an incremental table, and construct a master table in the incremental table according to the extracted data.
Specifically, extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table includes:
parsing the first data extraction instruction to obtain a first timestamp range for data extraction;
acquiring, from the to-be-extracted data table, the data that satisfies the first timestamp range as candidate data;
detecting the changed data in the candidate data;
synchronizing the changed data to the incremental table.
Parsing the first data extraction instruction to obtain the first timestamp range for data extraction includes:
parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
acquiring a configuration tag;
searching the information carried by the first data extraction instruction for the data having the configuration tag, and determining the found data as the first timestamp range.
For example, based on the timestamps of data changes, the data records that changed between the previous synchronization and the current synchronization are synchronized; a record whose change falls outside this time interval is judged not to satisfy the extraction condition.
With the above implementation, incremental synchronization of the data is achieved first, which improves the efficiency of data synchronization and reduces the overhead of data extraction.
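The incremental synchronization steps can be illustrated with a minimal sketch. It assumes each record carries an `id` and an `updated_at` change timestamp, treats the timestamp range as half-open, and models the incremental table as a dictionary keyed by `id`; none of these details are stated in the source.

```python
from datetime import datetime


def incremental_sync(source_rows, incremental, start, end, key="id"):
    """Copy rows changed within [start, end) into the incremental table.

    `incremental` is a dict keyed by record ID (an assumed representation).
    Returns the rows that were actually synchronized.
    """
    # Candidate data: rows whose change timestamp falls in the range.
    candidates = [r for r in source_rows if start <= r["updated_at"] < end]
    # Changed data: candidates that differ from the copy already synchronized.
    changed = [r for r in candidates if incremental.get(r[key]) != r]
    for row in changed:
        incremental[row[key]] = dict(row)
    return changed


customer_table = [
    {"id": "C001", "name": "Ann", "updated_at": datetime(2021, 3, 1, 10, 0)},
    {"id": "C002", "name": "Bob", "updated_at": datetime(2021, 2, 28, 9, 0)},
    {"id": "C004", "name": "Dan", "updated_at": datetime(2021, 3, 2, 0, 30)},
]
incremental_table = {}
synced = incremental_sync(
    customer_table, incremental_table,
    start=datetime(2021, 3, 1), end=datetime(2021, 3, 2),
)
```

In this sketch C004 changes just after the window closes, so it is not extracted in this cycle; that is the situation the recovery table is meant to handle.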
S12, determine an associated data table that is associated with the to-be-extracted data table.
Specifically, determining the associated data table that is associated with the to-be-extracted data table includes:
detecting a data table that has a join operation with the to-be-extracted data table;
determining the detected data table as the associated data table.
It can be understood that a join operation implements table association between different data tables.
The associated data table detected in this way has a table association relationship with the to-be-extracted data table; that is, the two data tables have a master-slave relationship in the source system, such as a customer table and an account table.
For tables with a master-slave relationship, the incremental timestamps often cannot guarantee the consistency of the business logic: the extraction times of the two tables may not be exactly the same, the transaction management strategy of the source system may cause the commit times at extraction to differ, or there may be some other cause. The incremental data of the master and slave tables then does not match, so that when the tables are loaded on the data warehouse side, a dependent key of the slave table cannot be found in the master table, and data is lost in the subsequent data transformation processing on the data warehouse side.
For example, if the master and slave tables of the source system are not updated within the same transaction, their update times differ, so the data warehouse extracts the data of the master table but fails to extract the data of the slave table.
Therefore, to address the above situation, this embodiment detects the data table that has a table association relationship with the to-be-extracted data table, so that targeted processing can be performed and data loss avoided.
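The source does not say how join operations are detected. One possible approach, shown here purely as an assumption, is to scan the SQL text of the warehouse's transformation queries and collect the tables that appear in the same join query as the to-be-extracted table. The regular expression below handles only simple `from`/`join` clauses and is not a SQL parser.

```python
import re

# Matches the table name that follows a FROM or JOIN keyword (simple cases only).
TABLE_AFTER_KEYWORD = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)
JOIN_KEYWORD = re.compile(r"\bjoin\b", re.IGNORECASE)


def associated_tables(target: str, queries: list[str]) -> set[str]:
    """Return the tables joined with `target` in any of the given queries."""
    found = set()
    for query in queries:
        if not JOIN_KEYWORD.search(query):
            continue  # no join operation in this query
        tables = {name.lower() for name in TABLE_AFTER_KEYWORD.findall(query)}
        if target.lower() in tables:
            found |= tables - {target.lower()}
    return found


# Hypothetical transformation queries.
queries = [
    "select * from account left join customer on account.c_id = customer.id",
    "select * from ticket inner join job on ticket.id = job.t_id",
    "select * from customer",
]
related = associated_tables("customer", queries)
```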
In this embodiment, performing table association using the join operation includes:
(1) inner join
Returns a row only when there is at least one match, that is, only the rows whose join fields are equal in both tables.
For example: select * from ticket
inner join job
on ticket.id = job.t_id
Only the data satisfying ticket.id = job.t_id is returned.
(2) left join
Returns all rows from the left table, even if there is no match in the right table.
For example: select * from ticket
left join job
on ticket.id = job.t_id
All rows of ticket are returned regardless of whether ticket.id equals job.t_id; where ticket.id = job.t_id, the corresponding job data is returned; where no job row matches, the corresponding job data is null.
(3) right join
Returns all rows from the right table, even if there is no match in the left table.
For example: select * from ticket
right join job
on ticket.id = job.t_id
All rows of job are returned regardless of whether ticket.id equals job.t_id; where ticket.id = job.t_id, the corresponding ticket data is returned; where no ticket row matches, the corresponding ticket data is null.
(4) full join (full outer join)
Returns all rows from both tables, pairing them wherever a match exists.
For example: select * from ticket
full join job
on ticket.id = job.t_id
All rows of ticket and job are returned regardless of whether ticket.id equals job.t_id; where ticket.id = job.t_id, the job data is shown alongside the corresponding ticket data; where there is no match, the ticket data and the job data are shown on separate rows, each with null in place of the other table's data.
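The four join types can be compared on a small data set. The sketch below uses Python's built-in sqlite3 module with made-up ticket and job rows. Because native RIGHT JOIN and FULL JOIN are only available in newer SQLite releases (3.39 and later), the right join is emulated here by swapping the operands of a left join, and the full join by taking the union of the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table ticket (id integer, title text);
    create table job (t_id integer, worker text);
    insert into ticket values (1, 'T1'), (2, 'T2');
    insert into job values (1, 'J1'), (3, 'J3');
    """
)

# Inner join: only rows where ticket.id = job.t_id.
inner = conn.execute(
    "select ticket.id, job.t_id from ticket "
    "inner join job on ticket.id = job.t_id order by ticket.id"
).fetchall()

# Left join: every ticket row; unmatched job columns come back as NULL (None).
left = conn.execute(
    "select ticket.id, job.t_id from ticket "
    "left join job on ticket.id = job.t_id order by ticket.id"
).fetchall()

# Right join, emulated: every job row; unmatched ticket columns are NULL.
right = conn.execute(
    "select ticket.id, job.t_id from job "
    "left join ticket on ticket.id = job.t_id order by job.t_id"
).fetchall()

# Full join, emulated: all rows from both sides.
full = set(left) | set(right)
```

With this data, ticket 2 has no job and job 3 has no ticket, so both appear only in the outer joins, padded with None.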
S13, acquire the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data.
It should be noted that the extraction time of the data in the associated data table is not necessarily the same as the extraction time of the data in the to-be-extracted data table. Because each table in the data warehouse is extracted separately, the extraction times are often inconsistent, which easily causes data loss.
S14, associate the data in the slave table with the data in the master table, and acquire the data for which the association fails.
Specifically, associating the data in the slave table with the data in the master table includes:
acquiring the data identifier of each piece of slave data in the slave table, and acquiring the data identifier of each piece of master data in the master table;
calling a preconfigured mapping table, where the mapping table stores the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
when the mapping table shows that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, determining that the first data is associated with the second data, and determining that the association of the first data succeeds; or
when no master data corresponding to the data identifier of the first data is found in the mapping table, determining that the association of the first data fails.
For example, when association is performed by customer ID and account ID, if the mapping table stores a correspondence between the customer ID and an account ID, the data corresponding to the customer ID is associated with the data corresponding to the account ID, and the association of the data corresponding to the customer ID succeeds; if no account ID corresponding to the customer ID can be found in the mapping table, the master table contains no data associated with the data corresponding to the customer ID, and the association of the data corresponding to the customer ID is determined to have failed.
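The association step can be sketched as a lookup through the mapping table. The sketch assumes the mapping table is a dictionary from slave data identifier (account ID) to master data identifier (customer ID) and that rows are dictionaries with an `id` field; these representations are illustrative only.

```python
def associate(slave_rows, master_rows, mapping):
    """Split slave rows into (associated pairs, rows whose association failed).

    `mapping` maps a slave data identifier to a master data identifier
    (an assumed representation of the preconfigured mapping table).
    """
    master_ids = {row["id"] for row in master_rows}
    matched, failed = [], []
    for row in slave_rows:
        master_id = mapping.get(row["id"])
        if master_id in master_ids:
            matched.append((row["id"], master_id))
        else:
            # No corresponding master data in the master table.
            failed.append(row)
    return matched, failed


master_table = [{"id": "C001"}, {"id": "C002"}, {"id": "C003"}]
slave_table = [{"id": "A001"}, {"id": "A002"}, {"id": "A004"}]
mapping_table = {"A001": "C001", "A002": "C002", "A004": "C004"}

matched, failed = associate(slave_table, master_table, mapping_table)
```

Here A004 maps to C004, which has not been extracted into the master table yet, so A004 ends up in `failed` and would be written to the recovery table in the next step.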
S15, write the data for which the association fails into a recovery table.
It should be noted that the recovery table needs to be created before the data for which the association fails is written into it.
Specifically, before writing the data for which the association fails into the recovery table, the method further includes:
identifying the table structure of the incremental table;
creating an isomorphic table of the incremental table according to the table structure of the incremental table;
determining the created isomorphic table as the recovery table.
With the above implementation, an isomorphic table of the incremental table is created as the recovery table. Because the two tables have exactly the same structure, the data for which the association fails can be written into the recovery table in full, which avoids further data loss, gives the subsequent data recovery a more complete data basis, and reduces the error rate.
In other words, the recovery table is a dynamically updated, cyclically reused data table, which ensures that all data whose association was lost is recovered, so that association can be attempted again in the next cycle to repair the data.
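Creating the isomorphic table can be sketched with sqlite3 as a stand-in for the warehouse database. The table and column names are hypothetical, and this sketch copies only column names and declared types; constraints and indexes would need separate handling.

```python
import sqlite3


def table_structure(conn, table):
    """Return [(column name, declared type), ...] for a table."""
    rows = conn.execute(f'pragma table_info("{table}")').fetchall()
    return [(name, col_type) for cid, name, col_type, *rest in rows]


def create_recovery_table(conn, incremental="incremental", recovery="recovery"):
    """Create an empty table with the same columns as the incremental table."""
    column_defs = ", ".join(
        f'"{name}" {col_type}' for name, col_type in table_structure(conn, incremental)
    )
    conn.execute(f'create table if not exists "{recovery}" ({column_defs})')


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table incremental "
    "(id text, customer_id text, balance real, updated_at text)"
)
create_recovery_table(conn)

# A row that failed association can now be written without losing any column.
conn.execute(
    "insert into recovery values (?, ?, ?, ?)",
    ("A004", "C004", 12.5, "2021-03-01T23:59:00"),
)
```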
S16, in response to a second data extraction instruction for the to-be-extracted data table, extract data from the to-be-extracted data table and incrementally synchronize it to the incremental table according to the second data extraction instruction, and update the extracted data to the master table.
The second data extraction instruction may also be configured to be triggered on a schedule; for example, the second data extraction instruction may be triggered on the day after the first data extraction instruction is triggered.
In this embodiment, extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table according to the second data extraction instruction includes:
parsing the second data extraction instruction to obtain a second timestamp range for data extraction;
acquiring, from the to-be-extracted data table, the data that satisfies the second timestamp range as second candidate data;
detecting the changed data in the second candidate data;
synchronizing the changed data to the incremental table.
S17, acquire the current slave table from the incremental table, and calculate the union of the current slave table and the recovery table as the updated slave table.
It can be understood that the current slave table has itself also been updated.
Likewise, the current slave table is also incrementally synchronized according to a timestamp range, which is not repeated here.
In the above implementation, the union of the current slave table and the recovery table is used as the updated slave table, so that association is attempted once more within the current cycle, which effectively avoids data loss.
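The union can be sketched as a keyed merge. The sketch assumes rows are dictionaries with an `id` field and that, if the same identifier appears in both tables, the newly extracted row replaces the recovered one; the source does not specify how such duplicates are resolved.

```python
def union_slave_with_recovery(current_slave, recovery, key="id"):
    """Return the union of the current slave table and the recovery table."""
    merged = {row[key]: row for row in recovery}
    # Assumption: on an ID clash the newer (current) row wins.
    merged.update({row[key]: row for row in current_slave})
    return list(merged.values())


recovery_table = [{"id": "A004", "customer_id": "C004"}]
current_slave_table = [
    {"id": "A005", "customer_id": "C001"},
    {"id": "A006", "customer_id": "C002"},
]
updated_slave_table = union_slave_with_recovery(current_slave_table, recovery_table)
```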
S18, associate the updated slave table with the updated master table, and remove the successfully associated data from the recovery table.
For example, when data is extracted in the early morning, record C004 of the customer table does not satisfy the extraction condition and is not extracted to the data warehouse side. Because record A004 of the account table needs to be associated with customer C004 of the master table for the related calculations, A004 would normally be discarded because it cannot be associated with a record in the master table. The present solution instead writes the data that cannot be associated into the recovery table for later use. When data is extracted again in the early morning of the next day, C004 of the customer table is extracted and enters the data warehouse. The account table merges the data set extracted on the next day with the data in the recovery table from the previous day to form a new incremental table; the recovery table supplies the missing data, and the data can then be associated.
It should be noted that this embodiment continuously writes the data for which the association fails into the recovery table. In the next incremental cycle, the recovery table is merged into the incremental table and association is attempted again. If the association succeeds, the data flows into the next stage; if it does not, the data enters the recovery table again. This loop continues until the association succeeds and the data flows into the next stage, which effectively reduces the probability of data loss.
At the same time, the successfully associated data is removed from the recovery table to avoid redundant data in the recovery table.
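The C004/A004 example can be traced over two cycles with a small sketch. It assumes the master table is kept as a cumulative set of extracted customer IDs and that tables are dictionaries keyed by account ID; the function name and data are illustrative.

```python
def run_cycle(master_ids, slave_increment, recovery, mapping):
    """Run one incremental cycle.

    Returns (associated rows, new recovery table), both keyed by account ID.
    Rows that associate successfully are absent from the new recovery table.
    """
    candidates = {**recovery, **slave_increment}  # union of both tables
    associated, still_failed = {}, {}
    for account_id, row in candidates.items():
        if mapping.get(account_id) in master_ids:
            associated[account_id] = row
        else:
            still_failed[account_id] = row
    return associated, still_failed


mapping = {"A003": "C003", "A004": "C004", "A005": "C001"}

# Day 1: C004 missed the extraction window, so A004 cannot be associated.
master_ids = {"C001", "C002", "C003"}
day1_ok, recovery = run_cycle(
    master_ids, {"A003": {"balance": 10}, "A004": {"balance": 20}}, {}, mapping
)

# Day 2: C004 is extracted; A004 is retried from the recovery table.
master_ids |= {"C004"}
day2_ok, recovery_after = run_cycle(
    master_ids, {"A005": {"balance": 30}}, recovery, mapping
)
```

After the second cycle A004 has been associated and no longer appears in the recovery table.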
This embodiment can solve the problem of associated data being lost because the data of associated tables is out of sync for any of various reasons, reduces the cost of manual data problem analysis and of supplementing and correcting data, and enhances the data integrity of the data warehouse.
However, the recovery table may also contain erroneous data that can never be associated. Therefore, an error discovery mechanism needs to be started periodically so that erroneous data is eliminated in time.
Specifically, the method further includes:
detecting how long each piece of data in the recovery table has been held there;
when data is detected that has been held for a time greater than or equal to a preset duration, determining the detected data as data to be verified;
performing reconciliation in the source system according to the data to be verified;
acquiring, from the data to be verified, the data that meets the reconciliation standard, and retaining the data that meets the reconciliation standard in the recovery table;
acquiring, from the data to be verified, the data that does not meet the reconciliation standard, and removing the data that does not meet the reconciliation standard from the recovery table.
With the above implementation, the data in the recovery table can be updated regularly, preventing redundant data from placing an operating burden on the system.
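The error discovery mechanism can be sketched as an age check followed by a reconciliation callback. The source does not define the reconciliation standard, so the check used below (the record still exists in the source system) and the `recovered_at` field are assumptions.

```python
from datetime import datetime, timedelta


def purge_stale(recovery, now, max_age, passes_reconciliation):
    """Drop rows that are old enough to verify and fail reconciliation."""
    kept = []
    for row in recovery:
        needs_verification = now - row["recovered_at"] >= max_age
        if needs_verification and not passes_reconciliation(row):
            continue  # erroneous data: remove from the recovery table
        kept.append(row)
    return kept


recovery_table = [
    {"id": "A004", "recovered_at": datetime(2021, 3, 9)},  # recent, not checked
    {"id": "A007", "recovered_at": datetime(2021, 3, 1)},  # old, reconciles
    {"id": "A999", "recovered_at": datetime(2021, 3, 1)},  # old, fails
]
source_system_ids = {"A004", "A007"}  # hypothetical reconciliation data

cleaned = purge_stale(
    recovery_table,
    now=datetime(2021, 3, 10),
    max_age=timedelta(days=7),
    passes_reconciliation=lambda row: row["id"] in source_system_ids,
)
```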
It should be noted that, to further ensure data security, the master table, the slave table, and the recovery table may also be deployed on a blockchain to prevent the data from being maliciously tampered with.
It can be seen from the above technical solutions that the present application can: in response to a first data extraction instruction, acquire a to-be-extracted data table from a source system according to the first data extraction instruction; extract data from the to-be-extracted data table and incrementally synchronize it to an incremental table, and construct a master table in the incremental table according to the extracted data; determine an associated data table that is associated with the to-be-extracted data table; acquire the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table according to the extracted data; associate the data in the slave table with the data in the master table, and acquire the data for which the association fails; write the data for which the association fails into a recovery table, to ensure that all data whose association was lost is recovered; in response to a second data extraction instruction for the to-be-extracted data table, extract data from the to-be-extracted data table and incrementally synchronize it to the incremental table according to the second data extraction instruction, and update the extracted data to the master table; acquire the current slave table from the incremental table, and calculate the union of the current slave table and the recovery table as the updated slave table; and associate the updated slave table with the updated master table, and remove the successfully associated data from the recovery table. This solves the problem of associated data being lost because the data of associated tables is out of sync for any of various reasons, reduces the cost of manual data problem analysis and of supplementing and correcting data, and enhances the data integrity of the data warehouse.
FIG. 2 shows a functional block diagram of a preferred embodiment of the table association-based lost data recovery apparatus of the present application. The table association-based lost data recovery apparatus 11 includes an acquisition unit 110, a construction unit 111, a determination unit 112, an association unit 113, a writing unit 114, and an updating unit 115. A module/unit referred to in the present application is a series of computer program segments that can be executed by the processor 13 and can perform a fixed function, and that is stored in the memory 12. In this embodiment, the functions of each module/unit will be described in detail in subsequent embodiments.
In response to a first data extraction instruction, the acquisition unit 110 acquires a to-be-extracted data table from a source system according to the first data extraction instruction.
Extract-Transform-Load (ETL) describes the process of extracting data from a source end, transforming it, and loading it into a destination end such as a data warehouse.
The first data extraction instruction may be configured to be triggered periodically, for example at a fixed time every day.
The source system is the source-end system that stores the data; the data in the source system is extracted into the data warehouse for subsequent use.
Typically, the data warehouse extracts incremental data from the source system every day.
In this embodiment, the acquisition unit 110 acquiring the to-be-extracted data table from the source system according to the first data extraction instruction includes:
parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
acquiring a preset tag;
searching the information carried by the first data extraction instruction for the data having the preset tag, and determining the found data as a target table name;
acquiring, from the source system, the data table having the target table name as the to-be-extracted data table.
Specifically, the first data extraction instruction is essentially a piece of code; following common coding conventions, the content between { and } in the first data extraction instruction is called the method body.
The information carried by the first data extraction instruction may be a specific address or various specific data to be processed; the content of the information mainly depends on the code that makes up the first data extraction instruction.
The preset tag can be custom configured.
The preset tag has a one-to-one correspondence with the table name; for example, the preset tag may be configured as NAME.
With the above implementation, the data can be obtained directly from the instruction, which improves processing efficiency. In addition, because the data is obtained by tag and the tag configuration is unique, the accuracy of data acquisition is also improved.
The construction unit 111 extracts data from the to-be-extracted data table and incrementally synchronizes it to an incremental table, and constructs a master table in the incremental table according to the extracted data.
Specifically, the construction unit 111 extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table includes:
parsing the first data extraction instruction to obtain a first timestamp range for data extraction;
acquiring, from the to-be-extracted data table, the data that satisfies the first timestamp range as candidate data;
detecting the changed data in the candidate data;
synchronizing the changed data to the incremental table.
The construction unit 111 parsing the first data extraction instruction to obtain the first timestamp range for data extraction includes:
parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
acquiring a configuration tag;
searching the information carried by the first data extraction instruction for the data having the configuration tag, and determining the found data as the first timestamp range.
For example, based on the timestamps of data changes, the data records that changed between the previous synchronization and the current synchronization are synchronized; a record whose change falls outside this time interval is judged not to satisfy the extraction condition.
With the above implementation, incremental synchronization of the data is achieved first, which improves the efficiency of data synchronization and reduces the overhead of data extraction.
The determination unit 112 determines an associated data table that is associated with the to-be-extracted data table.
Specifically, the determination unit 112 determining the associated data table that is associated with the to-be-extracted data table includes:
detecting a data table that has a join operation with the to-be-extracted data table;
determining the detected data table as the associated data table.
It can be understood that a join operation implements table association between different data tables.
The associated data table detected in this way has a table association relationship with the to-be-extracted data table; that is, the two data tables have a master-slave relationship in the source system, such as a customer table and an account table.
For tables with a master-slave relationship, the incremental timestamps often cannot guarantee the consistency of the business logic: the extraction times of the two tables may not be exactly the same, the transaction management strategy of the source system may cause the commit times at extraction to differ, or there may be some other cause. The incremental data of the master and slave tables then does not match, so that when the tables are loaded on the data warehouse side, a dependent key of the slave table cannot be found in the master table, and data is lost in the subsequent data transformation processing on the data warehouse side.
For example, if the master and slave tables of the source system are not updated within the same transaction, their update times differ, so the data warehouse extracts the data of the master table but fails to extract the data of the slave table.
Therefore, to address the above situation, this embodiment detects the data table that has a table association relationship with the to-be-extracted data table, so that targeted processing can be performed and data loss avoided.
In this embodiment, performing table association by using the join operation includes:
(1) inner join
Returns rows when there is at least one match; only the rows whose join fields are equal in both tables are returned.
For example: select * from ticket
inner join job
on ticket.id = job.t_id
Only the rows for which ticket.id = job.t_id are returned.
(2) left join
Returns all rows from the left table, even if there is no match in the right table.
For example: select * from ticket
left join job
on ticket.id = job.t_id
All rows in ticket are returned regardless of whether ticket.id equals job.t_id. Where ticket.id = job.t_id, the corresponding job data is returned; where no job row matches, the job columns are null.
(3) right join
Returns all rows from the right table, even if there is no match in the left table.
For example: select * from ticket
right join job
on ticket.id = job.t_id
All rows in job are returned regardless of whether ticket.id equals job.t_id. Where ticket.id = job.t_id, the corresponding ticket data is returned; where no ticket row matches, the ticket columns are null.
(4) full join (full outer join)
Returns rows whenever there is a match in either table; that is, rows from both tables are returned.
For example: select * from ticket
full join job
on ticket.id = job.t_id
All rows of ticket and job are returned regardless of whether ticket.id equals job.t_id. Where ticket.id = job.t_id, the job data is shown alongside the corresponding ticket data; where there is no match, the ticket data and the job data appear on separate rows, each with null in the columns of the other table.
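The join semantics above can be tried out with a small sketch. It assumes an in-memory SQLite database accessed through Python's standard `sqlite3` module; the `title` and `name` columns and the sample rows are hypothetical and are not taken from the application. Only the inner join and the left join are shown, together with a left join filtered on null, which lists the job rows that have no matching ticket.

```python
import sqlite3

# In-memory database with the ticket/job tables used in the examples above.
# The extra columns and the sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, t_id INTEGER, name TEXT);
    INSERT INTO ticket VALUES (1, 'T1'), (2, 'T2');
    INSERT INTO job VALUES (10, 1, 'J10'), (11, 3, 'J11');
""")

# Inner join: only pairs with ticket.id = job.t_id.
inner_rows = conn.execute(
    "SELECT ticket.id, job.id FROM ticket "
    "INNER JOIN job ON ticket.id = job.t_id ORDER BY ticket.id"
).fetchall()

# Left join: every ticket, with NULL (None) where no job matches.
left_rows = conn.execute(
    "SELECT ticket.id, job.id FROM ticket "
    "LEFT JOIN job ON ticket.id = job.t_id ORDER BY ticket.id"
).fetchall()

# Left join from job, keeping only rows with no ticket: these are the
# rows that would fail to associate.
orphan_jobs = conn.execute(
    "SELECT job.id FROM job "
    "LEFT JOIN ticket ON ticket.id = job.t_id "
    "WHERE ticket.id IS NULL"
).fetchall()

print(inner_rows, left_rows, orphan_jobs)
```

The last query corresponds to the kind of unmatched slave data that the later steps write to the recovery table.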
The construction unit 111 acquires the extracted data of the associated data table from the incremental table, and constructs a slave table in the incremental table according to the extracted data.
It should be noted that the extraction time of the data in the associated data table is not necessarily the same as that of the data in the to-be-extracted data table. Because each table in the data warehouse is extracted separately, extraction times are often inconsistent, which easily causes data loss.
The associating unit 113 associates the data in the slave table with the data in the master table, and acquires the data for which the association fails.
Specifically, the associating unit 113 associating the data in the slave table with the data in the master table includes:
acquiring the data identifier of each piece of slave data in the slave table, and acquiring the data identifier of each piece of master data in the master table;
calling a preconfigured mapping table, in which the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data is stored;
when it is found in the mapping table that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, determining that the first data is associated with the second data and that the association of the first data succeeds; or
when no master data corresponding to the data identifier of the first data is found in the mapping table, determining that the association of the first data fails.
For example, when association is performed by customer ID and account ID: if the mapping table stores a correspondence between the customer ID and the account ID, the data corresponding to the customer ID is associated with the data corresponding to the account ID, and the association of the data corresponding to the customer ID succeeds; if no account ID corresponding to the customer ID can be found in the mapping table, the master table contains no data associated with the data corresponding to the customer ID, and the association of the data corresponding to the customer ID is determined to have failed.
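The lookup described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `associate`, the representation of the mapping table as a dictionary from slave identifier to master identifier, and the sample identifiers are all assumptions. The sketch also assumes that an association only succeeds if the mapped master identifier is actually present in the master table.

```python
from typing import Dict, List, Set, Tuple


def associate(
    slave_ids: List[str],
    master_ids: Set[str],
    mapping: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split slave identifiers into associated pairs and failures.

    mapping is a hypothetical stand-in for the preconfigured mapping
    table: slave data identifier -> master data identifier.
    """
    matched: List[Tuple[str, str]] = []
    failed: List[str] = []
    for slave_id in slave_ids:
        master_id = mapping.get(slave_id)
        if master_id is not None and master_id in master_ids:
            matched.append((slave_id, master_id))
        else:
            # No corresponding master data was found: association fails.
            failed.append(slave_id)
    return matched, failed


# Accounts (slave) mapped to customers (master); C004 was not extracted.
mapping = {"A001": "C001", "A004": "C004"}
matched, failed = associate(["A001", "A004", "A009"], {"C001"}, mapping)
print(matched, failed)
```

The `failed` list is what the next step would write to the recovery table.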
The writing unit 114 writes the data for which the association fails into a recovery table.
It should be noted that the recovery table needs to be created before the data for which the association fails is written into it.
Specifically, before the data for which the association fails is written into the recovery table, the table structure of the incremental table is identified;
an isomorphic table of the incremental table, that is, a table with the same structure, is created according to the table structure of the incremental table;
the created isomorphic table is determined to be the recovery table.
In this implementation, an isomorphic table of the incremental table is created as the recovery table. Because the two tables have exactly the same structure, the data for which the association fails can be written into the recovery table completely, which avoids further data loss, gives subsequent data recovery a more complete data basis, and reduces the error rate.
In other words, the recovery table is a dynamically updated, cyclically recycled data table that ensures all data lost through failed association is collected, so that association can be attempted again in the next cycle and the data repaired.
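One possible way to create such a same-structure table is sketched below, assuming SQLite through Python's `sqlite3` module. The table names `account_incr` and `account_recovery` and their columns are hypothetical. Note that `CREATE TABLE ... AS SELECT` in SQLite copies the column layout but not constraints or indexes, so a production system might instead replay the original table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical incremental table for the account (slave) data.
conn.execute(
    "CREATE TABLE account_incr ("
    "account_id TEXT, customer_id TEXT, balance REAL, updated_at TEXT)"
)
conn.execute(
    "INSERT INTO account_incr VALUES ('A004', 'C004', 12.5, '2021-03-01')"
)

# Copy the column layout only: WHERE 0 selects no rows.
conn.execute(
    "CREATE TABLE account_recovery AS SELECT * FROM account_incr WHERE 0"
)


def column_names(table: str) -> list:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Because the layouts match, a failed row can be copied over unchanged.
conn.execute(
    "INSERT INTO account_recovery "
    "SELECT * FROM account_incr WHERE account_id = ?",
    ("A004",),
)
recovered = conn.execute("SELECT * FROM account_recovery").fetchall()
print(column_names("account_recovery"), recovered)
```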
In response to a second data extraction instruction for the to-be-extracted data table, the update unit 115 extracts data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizes it to the incremental table, and updates the master table with the extracted data.
The second data extraction instruction may also be configured to be triggered on a schedule; for example, it may be triggered on the day after the first data extraction instruction is triggered.
In this embodiment, the update unit 115 extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table according to the second data extraction instruction includes:
parsing the second data extraction instruction to obtain a second timestamp range for data extraction;
acquiring, from the to-be-extracted data table, the data that falls within the second timestamp range as second candidate data;
detecting the data that has changed in the second candidate data;
synchronizing the changed data to the incremental table.
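The four steps above can be sketched as a single function. The application does not specify how changed data is detected, so this sketch assumes a row counts as changed when it differs from the copy already held in the incremental table; the half-open timestamp range, the `updated_at` field, and the function name `extract_increment` are likewise assumptions made for illustration.

```python
from datetime import datetime
from typing import Dict, List


def extract_increment(
    source_rows: List[dict],
    incremental: Dict[str, dict],
    start: datetime,
    end: datetime,
) -> List[dict]:
    """Sync rows changed within [start, end) into the incremental table."""
    # Candidate data: rows whose timestamp falls within the range.
    candidates = [r for r in source_rows if start <= r["updated_at"] < end]
    # Changed data: candidates that differ from the incremental copy.
    changed = [r for r in candidates if incremental.get(r["id"]) != r]
    for row in changed:
        incremental[row["id"]] = dict(row)
    return changed


source = [
    {"id": "C001", "updated_at": datetime(2021, 3, 1, 1), "name": "a"},
    {"id": "C002", "updated_at": datetime(2021, 3, 2, 1), "name": "b"},
    {"id": "C003", "updated_at": datetime(2021, 3, 2, 2), "name": "c"},
]
# C003 was already synchronized and has not changed since.
incremental = {"C003": dict(source[2])}

changed = extract_increment(
    source, incremental, datetime(2021, 3, 2), datetime(2021, 3, 3)
)
changed_ids = [row["id"] for row in changed]
print(changed_ids, sorted(incremental))
```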
The update unit 115 acquires the current slave table from the incremental table, and calculates the union of the current slave table and the recovery table as the updated slave table.
It can be understood that the current slave table has itself been updated.
Likewise, the current slave table is incrementally synchronized according to a timestamp range, which is not repeated here.
In the above implementation, the union of the current slave table and the recovery table is used as the updated slave table so that association is performed once more in the current cycle, which effectively avoids data loss.
The associating unit 113 associates the updated slave table with the updated master table, and removes the successfully associated data from the recovery table.
For example, when data is extracted in the early morning, record C004 of the customer table does not meet the extraction condition and is not extracted to the data warehouse side. Record A004 of the account table needs to be associated with customer C004 in the master table for related calculations, so A004 would normally be discarded because it cannot be associated with a record in the master table. In the present application, the data that cannot be associated is instead written into the recovery table for later use. When data is extracted again in the early morning of the next day, C004 of the customer table is extracted and enters the data warehouse. The account table merges the data set extracted on the next day with the data in the recovery table from the previous day to form a new incremental table; with the data completed from the recovery table, the association can now succeed.
It should be noted that this embodiment continuously writes the data for which association fails into the recovery table. In the next incremental cycle, the recovery table is merged into the incremental table and association is attempted again. Data that is associated flows into the next stage; data that is not associated enters the recovery table again. The cycle repeats until the association succeeds and the data flows into the next stage, which effectively reduces the probability of data loss.
Meanwhile, the successfully associated data is removed from the recovery table to avoid data redundancy in the recovery table.
This embodiment can solve the problem of associated data being lost because the data of associated tables is out of sync for any of various reasons, reduces the cost of manually analyzing data problems and supplementing or correcting data, and enhances the data integrity of the data warehouse.
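The C004/A004 example can be traced with a short simulation of two extraction cycles. This is a sketch under simplifying assumptions: tables are modeled as Python dictionaries and sets, each account row is reduced to an account ID mapped to a customer ID, and when the same account appears in both the recovery table and the new increment the newer row is assumed to win. The function name `run_cycle` is hypothetical.

```python
from typing import Dict, Set


def run_cycle(
    master_ids: Set[str],
    slave_increment: Dict[str, str],
    recovery: Dict[str, str],
) -> Dict[str, str]:
    """Run one association cycle and update the recovery table in place.

    slave_increment and recovery map account ID -> customer ID.
    """
    # Updated slave table: union of the recovery table and the increment.
    updated_slave = {**recovery, **slave_increment}
    associated: Dict[str, str] = {}
    for account_id, customer_id in updated_slave.items():
        if customer_id in master_ids:
            associated[account_id] = customer_id
            # Successful association: remove from the recovery table.
            recovery.pop(account_id, None)
        else:
            # Failed association: keep (or put) the row in the recovery table.
            recovery[account_id] = customer_id
    return associated


# Day 1: customer C004 has not been extracted yet, so A004 fails.
master_ids = {"C001"}
recovery: Dict[str, str] = {}
day1 = run_cycle(master_ids, {"A001": "C001", "A004": "C004"}, recovery)
recovery_after_day1 = dict(recovery)

# Day 2: C004 arrives in the master table; A004 is retried and succeeds.
master_ids.add("C004")
day2 = run_cycle(master_ids, {"A005": "C001"}, recovery)
print(day1, recovery_after_day1, day2, recovery)
```

After the second cycle the recovery table is empty, which matches the removal of successfully associated data described above.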
However, the recovery table may also contain erroneous data that can never be associated. An error discovery mechanism therefore needs to be started periodically so that erroneous data is eliminated in time.
Specifically, the time for which each piece of data in the recovery table has been held is detected;
when the time for which a piece of data has been held is greater than or equal to a preset duration, the detected data is determined to be to-be-verified data;
reconciliation is performed in the source system according to the to-be-verified data;
the data that meets the reconciliation standard is obtained from the to-be-verified data and retained in the recovery table;
the data that does not meet the reconciliation standard is obtained from the to-be-verified data and removed from the recovery table.
Through the above implementation, the data in the recovery table can be updated periodically, which prevents redundant data from placing an operating burden on the system.
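The periodic error discovery step can be sketched as follows. The application does not define the reconciliation standard, so the sketch takes it as a caller-supplied predicate; here it is assumed, purely for illustration, to be "the account still exists in the source system". The `recovered_at` field and the function name `purge_stale` are also hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def purge_stale(
    recovery: Dict[str, dict],
    now: datetime,
    max_age: timedelta,
    passes_reconciliation: Callable[[str, dict], bool],
) -> List[str]:
    """Remove rows held for at least max_age that fail reconciliation."""
    removed: List[str] = []
    # Iterate over a snapshot because rows are deleted during the loop.
    for row_id, row in list(recovery.items()):
        if now - row["recovered_at"] >= max_age:
            # To-be-verified data: check it against the source system.
            if not passes_reconciliation(row_id, row):
                del recovery[row_id]
                removed.append(row_id)
    return removed


recovery = {
    "A004": {"recovered_at": datetime(2021, 3, 1)},
    "A007": {"recovered_at": datetime(2021, 3, 1)},
    "A008": {"recovered_at": datetime(2021, 3, 9)},
}
source_account_ids = {"A004"}  # assumed state of the source system

removed = purge_stale(
    recovery,
    now=datetime(2021, 3, 10),
    max_age=timedelta(days=7),
    passes_reconciliation=lambda row_id, row: row_id in source_account_ids,
)
print(removed, sorted(recovery))
```

A004 is old enough to be verified but passes, so it stays; A008 has not yet reached the preset duration and is not checked.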
It should be noted that, to further ensure data security, the master table, the slave table, and the recovery table may also be deployed on a blockchain to prevent the data from being maliciously tampered with.
As can be seen from the above technical solutions, the present application can, in response to a first data extraction instruction, obtain a to-be-extracted data table from the source system according to the first data extraction instruction; extract data from the to-be-extracted data table, incrementally synchronize it to an incremental table, and construct a master table in the incremental table according to the extracted data; determine an associated data table associated with the to-be-extracted data table; acquire the extracted data of the associated data table from the incremental table and construct a slave table in the incremental table according to that data; associate the data in the slave table with the data in the master table and acquire the data for which the association fails; and write that data into a recovery table, so that all data lost through failed association is collected. In response to a second data extraction instruction for the to-be-extracted data table, the present application extracts data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizes it to the incremental table, and updates the master table with the extracted data; acquires the current slave table from the incremental table and calculates the union of the current slave table and the recovery table as the updated slave table; and associates the updated slave table with the updated master table and removes the successfully associated data from the recovery table. This solves the problem of associated data being lost because the data of associated tables is out of sync for any of various reasons, reduces the cost of manually analyzing data problems and supplementing or correcting data, and enhances the data integrity of the data warehouse.
FIG. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present application that implements the method for recovering lost data based on table association.
The electronic device 1 may include a memory 12, a processor 13, and a bus, and may further include a computer program that is stored in the memory 12 and executable on the processor 13, such as a program for recovering lost data based on table association.
Those skilled in the art can understand that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation on the electronic device 1. The electronic device 1 may have either a bus structure or a star structure, and may include more or fewer hardware or software components than shown, or a different arrangement of components; for example, the electronic device 1 may further include input and output devices, network access devices, and the like.
It should be noted that the electronic device 1 is only an example. Other existing or future electronic products that can be adapted to the present application shall also fall within the protection scope of the present application and are incorporated herein by reference.
The memory 12 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium may include flash memory, a removable hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 12 may be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. In other embodiments, the memory 12 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 12 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 can be used not only to store application software installed on the electronic device 1 and various types of data, such as the code of the program for recovering lost data based on table association, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 13 may be composed of integrated circuits, for example a single packaged integrated circuit or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 13 is the control core (Control Unit) of the electronic device 1. It connects the components of the entire electronic device 1 through various interfaces and lines, and performs the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 12 (for example, the program for recovering lost data based on table association) and calling the data stored in the memory 12.
The processor 13 executes the operating system of the electronic device 1 and the various installed application programs. The processor 13 executes the application programs to implement the steps in the foregoing embodiments of the method for recovering lost data based on table association, for example the steps shown in FIG. 1.
Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program in the electronic device 1. For example, the computer program may be divided into an acquisition unit 110, a construction unit 111, a determination unit 112, an associating unit 113, a writing unit 114, and an update unit 115.
The above integrated units, when implemented in the form of software functional modules, may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a computer device, a network device, or the like) or a processor to execute part of the method for recovering lost data based on table association described in the embodiments of the present application.
If the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware devices. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments.
The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM, Read-Only Memory), random access memory, and the like.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of a blockchain node, and the like.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one arrow is shown in FIG. 3, but this does not mean that there is only one bus or one type of bus. The bus is configured to enable connection and communication between the memory 12, the at least one processor 13, and other components.
Although not shown, the electronic device 1 may further include a power supply (such as a battery) that supplies power to the components. Preferably, the power supply is logically connected to the at least one processor 13 through a power management apparatus, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management apparatus. The power supply may further include any components such as one or more DC or AC power sources, a recharging apparatus, a power failure detection circuit, a power converter or inverter, and a power status indicator. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described in detail here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (such as a keyboard); optionally, it may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display the information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the described embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
FIG. 3 shows only the electronic device 1 with components 12 and 13. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, combine certain components, or have a different arrangement of components.
With reference to FIG. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a method for recovering lost data based on table association, and the processor 13 can execute the plurality of instructions to implement:
in response to a first data extraction instruction, obtaining a to-be-extracted data table from the source system according to the first data extraction instruction;
extracting data from the to-be-extracted data table, incrementally synchronizing it to an incremental table, and constructing a master table in the incremental table according to the extracted data;
determining an associated data table associated with the to-be-extracted data table;
acquiring the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table according to the extracted data;
associating the data in the slave table with the data in the master table, and acquiring the data for which the association fails;
writing the data for which the association fails into a recovery table;
in response to a second data extraction instruction for the to-be-extracted data table, extracting data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizing it to the incremental table, and updating the master table with the extracted data;
acquiring the current slave table from the incremental table, and calculating the union of the current slave table and the recovery table as the updated slave table;
associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
Specifically, for the implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is apparent to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments are to be regarded in all respects as exemplary and non-restrictive. The scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are therefore intended to be embraced in the present application. Any reference signs in the claims shall not be construed as limiting the claims concerned.
Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses stated in the present application may also be implemented by one unit or apparatus through software or hardware. Words such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present application.

Claims (20)

  1. A method for recovering lost data based on table association, wherein the method comprises:
    in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
    extracting data from the to-be-extracted data table, incrementally synchronizing it to an incremental table, and constructing a master table in the incremental table from the extracted data;
    determining an associated data table that is associated with the to-be-extracted data table;
    obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table from that extracted data;
    associating the data in the slave table with the data in the master table, and obtaining the data for which association fails;
    writing the data for which association fails into a recovery table;
    in response to a second data extraction instruction for the to-be-extracted data table, extracting data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizing it to the incremental table, and updating the master table with the extracted data;
    obtaining the current slave table from the incremental table, and computing the union of the current slave table and the recovery table as an updated slave table; and
    associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
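The flow of claim 1 can be illustrated with a minimal in-memory sketch. It is not the applicant's implementation: the dictionary-based tables, the `master_id` field, and the function names `associate` and `run_round` are all assumptions made for illustration. The sketch also assumes that rows which fail association in a later round are kept in the recovery table, which the claim does not state explicitly.

```python
from typing import Any, Dict, Set, Tuple

Row = Dict[str, Any]
Table = Dict[str, Row]  # hypothetical layout: row id -> row


def associate(slave: Table, master_keys: Set[str]) -> Tuple[Table, Table]:
    """Split slave rows into (matched, failed) by whether their master key exists."""
    matched: Table = {}
    failed: Table = {}
    for row_id, row in slave.items():
        target = matched if row["master_id"] in master_keys else failed
        target[row_id] = row
    return matched, failed


def run_round(master_keys: Set[str], current_slave: Table, recovery: Table) -> Tuple[Table, Table]:
    """One extraction round: union, associate, then refresh the recovery table."""
    # Union of the current slave table and the recovery table, keyed by row id.
    updated_slave = {**recovery, **current_slave}
    matched, failed = associate(updated_slave, master_keys)
    # Matched rows leave the recovery table; rows that still fail stay in it (assumption).
    return matched, dict(failed)
```

In the first round the recovery table is empty, so the function reduces to "associate, then write the failures". In later rounds a late-arriving master row lets a previously lost slave row associate and drop out of the recovery table.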
  2. The method for recovering lost data based on table association according to claim 1, wherein obtaining the to-be-extracted data table from the source system according to the first data extraction instruction comprises:
    parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
    obtaining a preset tag;
    searching the information carried by the first data extraction instruction for data having the preset tag, and determining the data found as a target table name; and
    obtaining, from the source system, the data table having the target table name as the to-be-extracted data table.
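As a rough sketch of the lookup in claim 2, assume the method body of the instruction is a JSON object and the preset tag is a key in it. Both are assumptions: the claims do not specify the instruction format, and the tag name `table_name` and the function `resolve_target_table` are hypothetical.

```python
import json
from typing import Optional

PRESET_TAG = "table_name"  # hypothetical preset tag


def resolve_target_table(instruction_body: str, tag: str = PRESET_TAG) -> Optional[str]:
    """Parse the instruction body and return the value carried under the preset tag."""
    info = json.loads(instruction_body)
    if not isinstance(info, dict):
        return None
    value = info.get(tag)
    # Only a non-empty string is accepted as a table name.
    return value if isinstance(value, str) and value else None
```

The returned name would then be used to fetch the table from the source system; that step depends on the source system's own interface and is not sketched here.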
  3. The method for recovering lost data based on table association according to claim 1, wherein extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table comprises:
    parsing the first data extraction instruction to obtain a first timestamp range for data extraction;
    obtaining, from the to-be-extracted data table, the data that falls within the first timestamp range as candidate data;
    detecting the data that has changed among the candidate data; and
    synchronizing the changed data to the incremental table.
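The two filters in claim 3 (timestamp range, then change detection) might look like the following sketch. The row fields `id`, `updated_at`, and `payload`, and the idea of comparing against a snapshot of previously synchronized payloads, are assumptions; the claims do not say how changes are detected.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def select_changed_rows(
    rows: List[Row],
    ts_range: Tuple[datetime, datetime],
    synced: Dict[str, Any],
) -> List[Row]:
    """Return rows inside the timestamp range whose payload differs from the synced copy."""
    start, end = ts_range
    # Step 1: candidate data is everything inside the (inclusive) timestamp range.
    candidates = [r for r in rows if start <= r["updated_at"] <= end]
    # Step 2: keep only rows that are new or whose payload has changed.
    return [r for r in candidates if synced.get(r["id"]) != r["payload"]]
```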
  4. The method for recovering lost data based on table association according to claim 1, wherein determining the associated data table that is associated with the to-be-extracted data table comprises:
    detecting a data table that has a join operation with the to-be-extracted data table; and
    determining the detected data table as the associated data table.
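One simple way to approximate the join detection in claim 4 is to scan known SQL statements for tables that appear alongside the target table in a statement containing `JOIN`. This is a regex heuristic under stated assumptions, not a SQL parser and not the method described in the application; the function name `associated_tables` and the query list are hypothetical.

```python
import re
from typing import Iterable, Set

# Captures the identifier that follows FROM or JOIN (no subquery or quoting support).
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
_HAS_JOIN = re.compile(r"\bjoin\b", re.IGNORECASE)


def associated_tables(target: str, queries: Iterable[str]) -> Set[str]:
    """Return the tables that share a JOIN statement with the target table."""
    target = target.lower()
    found: Set[str] = set()
    for sql in queries:
        if not _HAS_JOIN.search(sql):
            continue
        tables = {name.lower() for name in _TABLE_REF.findall(sql)}
        if target in tables:
            found |= tables - {target}
    return found
```

A production system would more likely read join relationships from job metadata or a real SQL parser; the heuristic only shows the shape of the check.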
  5. The method for recovering lost data based on table association according to claim 1, wherein associating the data in the slave table with the data in the master table comprises:
    obtaining the data identifier of each piece of slave data in the slave table, and obtaining the data identifier of each piece of master data in the master table;
    invoking a preconfigured mapping table, the mapping table storing the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
    when the mapping table shows that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, determining that the first data is associated with the second data and that the association of the first data succeeds; or
    when no master data corresponding to the data identifier of the first data is found in the mapping table, determining that the association of the first data fails.
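The mapping-table lookup of claim 5 can be sketched as follows, assuming the mapping table is a dictionary from slave identifier to master identifier. The function name and the data shapes are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def associate_by_mapping(
    slave_ids: Iterable[str],
    master_ids: Iterable[str],
    mapping: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (slave id -> master id for successes, list of slave ids that failed)."""
    master_set = set(master_ids)
    matched: Dict[str, str] = {}
    failed: List[str] = []
    for sid in slave_ids:
        mid = mapping.get(sid)
        # Success requires a mapping entry AND the mapped master row being present.
        if mid is not None and mid in master_set:
            matched[sid] = mid
        else:
            failed.append(sid)
    return matched, failed
```

The slave identifiers in `failed` are the rows that would be written to the recovery table.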
  6. The method for recovering lost data based on table association according to claim 1, wherein, before the data for which association fails is written into the recovery table, the method further comprises:
    identifying the table structure of the incremental table;
    creating an isomorphic table of the incremental table according to the table structure of the incremental table; and
    determining the created isomorphic table as the recovery table.
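Creating a table with the same structure as the incremental table can be sketched with SQLite, chosen here only because it ships with Python; the application does not name a database. The table names `incremental` and `recovery` are assumptions, and `CREATE TABLE ... AS SELECT` copies column names but not constraints or indexes.

```python
import sqlite3
from typing import List


def create_recovery_table(
    conn: sqlite3.Connection,
    incremental: str = "incremental",
    recovery: str = "recovery",
) -> None:
    """Create an empty table with the same columns as the incremental table."""
    # WHERE 0 selects no rows, so only the column layout is copied.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{recovery}" AS SELECT * FROM "{incremental}" WHERE 0'
    )


def column_names(conn: sqlite3.Connection, table: str) -> List[str]:
    """Read column names from SQLite's table_info pragma (name is the second field)."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
```

Because the two tables share a column layout, rows that fail association can be copied into the recovery table unchanged and later unioned back with the slave table.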
  7. The method for recovering lost data based on table association according to claim 1, wherein the method further comprises:
    periodically checking how long each piece of data in the recovery table has been recovered;
    when data is detected whose recovered time is greater than or equal to a preset duration, determining the detected data as data to be verified;
    performing reconciliation in the source system according to the data to be verified;
    obtaining, from the data to be verified, the data that meets the reconciliation standard, and retaining that data in the recovery table; and
    obtaining, from the data to be verified, the data that does not meet the reconciliation standard, and removing that data from the recovery table.
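The periodic clean-up in claim 7 can be sketched as below. The reconciliation standard is not defined in the claims; this sketch assumes, purely for illustration, that a row passes reconciliation if its identifier still exists in the source system. The names `reconcile`, `source_ids`, and `max_age` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def reconcile(
    recovery: Dict[str, datetime],
    source_ids: Set[str],
    now: datetime,
    max_age: timedelta,
) -> Dict[str, datetime]:
    """Return the recovery table after dropping overdue rows that fail reconciliation."""
    kept: Dict[str, datetime] = {}
    for row_id, recovered_at in recovery.items():
        overdue = now - recovered_at >= max_age
        # Assumed standard: the row must still exist in the source system.
        if overdue and row_id not in source_ids:
            continue  # fails reconciliation, so it is removed
        kept[row_id] = recovered_at
    return kept
```

Rows that have not yet reached the preset duration are left untouched, so they keep their chance to associate in a later extraction round.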
  8. An apparatus for recovering lost data based on table association, wherein the apparatus comprises:
    an acquisition unit, configured to, in response to a first data extraction instruction, obtain a to-be-extracted data table from a source system according to the first data extraction instruction;
    a construction unit, configured to extract data from the to-be-extracted data table, incrementally synchronize it to an incremental table, and construct a master table in the incremental table from the extracted data;
    a determination unit, configured to determine an associated data table that is associated with the to-be-extracted data table;
    the construction unit being further configured to obtain the extracted data of the associated data table from the incremental table, and construct a slave table in the incremental table from that extracted data;
    an association unit, configured to associate the data in the slave table with the data in the master table, and obtain the data for which association fails;
    a writing unit, configured to write the data for which association fails into a recovery table;
    an update unit, configured to, in response to a second data extraction instruction for the to-be-extracted data table, extract data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronize it to the incremental table, and update the master table with the extracted data;
    the update unit being further configured to obtain the current slave table from the incremental table, and compute the union of the current slave table and the recovery table as an updated slave table; and
    the association unit being further configured to associate the updated slave table with the updated master table, and remove the successfully associated data from the recovery table.
  9. An electronic device, wherein the electronic device comprises a processor and a memory, and the processor is configured to execute at least one computer-readable instruction stored in the memory to implement the following steps:
    in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
    extracting data from the to-be-extracted data table, incrementally synchronizing it to an incremental table, and constructing a master table in the incremental table from the extracted data;
    determining an associated data table that is associated with the to-be-extracted data table;
    obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table from that extracted data;
    associating the data in the slave table with the data in the master table, and obtaining the data for which association fails;
    writing the data for which association fails into a recovery table;
    in response to a second data extraction instruction for the to-be-extracted data table, extracting data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizing it to the incremental table, and updating the master table with the extracted data;
    obtaining the current slave table from the incremental table, and computing the union of the current slave table and the recovery table as an updated slave table; and
    associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
  10. The electronic device according to claim 9, wherein, when obtaining the to-be-extracted data table from the source system according to the first data extraction instruction, the processor executes the at least one computer-readable instruction to implement the following steps:
    parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
    obtaining a preset tag;
    searching the information carried by the first data extraction instruction for data having the preset tag, and determining the data found as a target table name; and
    obtaining, from the source system, the data table having the target table name as the to-be-extracted data table.
  11. The electronic device according to claim 9, wherein, when extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table, the processor executes the at least one computer-readable instruction to implement the following steps:
    parsing the first data extraction instruction to obtain a first timestamp range for data extraction;
    obtaining, from the to-be-extracted data table, the data that falls within the first timestamp range as candidate data;
    detecting the data that has changed among the candidate data; and
    synchronizing the changed data to the incremental table.
  12. The electronic device according to claim 9, wherein, when determining the associated data table that is associated with the to-be-extracted data table, the processor executes the at least one computer-readable instruction to implement the following steps:
    detecting a data table that has a join operation with the to-be-extracted data table; and
    determining the detected data table as the associated data table.
  13. The electronic device according to claim 9, wherein, when associating the data in the slave table with the data in the master table, the processor executes the at least one computer-readable instruction to implement the following steps:
    obtaining the data identifier of each piece of slave data in the slave table, and obtaining the data identifier of each piece of master data in the master table;
    invoking a preconfigured mapping table, the mapping table storing the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
    when the mapping table shows that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, determining that the first data is associated with the second data and that the association of the first data succeeds; or
    when no master data corresponding to the data identifier of the first data is found in the mapping table, determining that the association of the first data fails.
  14. The electronic device according to claim 9, wherein, before the data for which association fails is written into the recovery table, the processor executes the at least one computer-readable instruction to further implement the following steps:
    identifying the table structure of the incremental table;
    creating an isomorphic table of the incremental table according to the table structure of the incremental table; and
    determining the created isomorphic table as the recovery table.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one computer-readable instruction, and the at least one computer-readable instruction, when executed by a processor, implements the following steps:
    in response to a first data extraction instruction, obtaining a to-be-extracted data table from a source system according to the first data extraction instruction;
    extracting data from the to-be-extracted data table, incrementally synchronizing it to an incremental table, and constructing a master table in the incremental table from the extracted data;
    determining an associated data table that is associated with the to-be-extracted data table;
    obtaining the extracted data of the associated data table from the incremental table, and constructing a slave table in the incremental table from that extracted data;
    associating the data in the slave table with the data in the master table, and obtaining the data for which association fails;
    writing the data for which association fails into a recovery table;
    in response to a second data extraction instruction for the to-be-extracted data table, extracting data from the to-be-extracted data table according to the second data extraction instruction, incrementally synchronizing it to the incremental table, and updating the master table with the extracted data;
    obtaining the current slave table from the incremental table, and computing the union of the current slave table and the recovery table as an updated slave table; and
    associating the updated slave table with the updated master table, and removing the successfully associated data from the recovery table.
  16. The storage medium according to claim 15, wherein, when obtaining the to-be-extracted data table from the source system according to the first data extraction instruction, the at least one computer-readable instruction is executed by the processor to implement the following steps:
    parsing the method body of the first data extraction instruction to obtain the information carried by the first data extraction instruction;
    obtaining a preset tag;
    searching the information carried by the first data extraction instruction for data having the preset tag, and determining the data found as a target table name; and
    obtaining, from the source system, the data table having the target table name as the to-be-extracted data table.
  17. The storage medium according to claim 15, wherein, when extracting data from the to-be-extracted data table and incrementally synchronizing it to the incremental table, the at least one computer-readable instruction is executed by the processor to implement the following steps:
    parsing the first data extraction instruction to obtain a first timestamp range for data extraction;
    obtaining, from the to-be-extracted data table, the data that falls within the first timestamp range as candidate data;
    detecting the data that has changed among the candidate data; and
    synchronizing the changed data to the incremental table.
  18. The storage medium according to claim 15, wherein, when determining the associated data table that is associated with the to-be-extracted data table, the at least one computer-readable instruction is executed by the processor to implement the following steps:
    detecting a data table that has a join operation with the to-be-extracted data table; and
    determining the detected data table as the associated data table.
  19. The storage medium according to claim 15, wherein, when associating the data in the slave table with the data in the master table, the at least one computer-readable instruction is executed by the processor to implement the following steps:
    obtaining the data identifier of each piece of slave data in the slave table, and obtaining the data identifier of each piece of master data in the master table;
    invoking a preconfigured mapping table, the mapping table storing the correspondence between the data identifier of each piece of slave data and the data identifier of each piece of master data;
    when the mapping table shows that the data identifier of first data in the slave table corresponds to the data identifier of second data in the master table, determining that the first data is associated with the second data and that the association of the first data succeeds; or
    when no master data corresponding to the data identifier of the first data is found in the mapping table, determining that the association of the first data fails.
  20. The storage medium according to claim 15, wherein, before the data for which association fails is written into the recovery table, the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    identifying the table structure of the incremental table;
    creating an isomorphic table of the incremental table according to the table structure of the incremental table; and
    determining the created isomorphic table as the recovery table.
PCT/CN2021/083104 2021-01-05 2021-03-25 Table association-based lost data recovery method and apparatus, device, and medium WO2022147908A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110005207.8 2021-01-05
CN202110005207.8A CN112328677B (en) 2021-01-05 2021-01-05 Lost data recovery method, device, equipment and medium based on table association

Publications (1)

Publication Number Publication Date
WO2022147908A1 true WO2022147908A1 (en) 2022-07-14

Family

ID=74302154

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083104 WO2022147908A1 (en) 2021-01-05 2021-03-25 Table association-based lost data recovery method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN112328677B (en)
WO (1) WO2022147908A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328677B (en) * 2021-01-05 2021-04-02 平安科技(深圳)有限公司 Lost data recovery method, device, equipment and medium based on table association
CN113420057A (en) * 2021-06-29 2021-09-21 未鲲(上海)科技服务有限公司 Account checking data processing method and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799634A (en) * 2012-06-26 2012-11-28 中国农业银行股份有限公司 Data storage method and device
CN105320680A (en) * 2014-07-15 2016-02-10 ***通信集团公司 Data synchronization method and device
CN106407360A (en) * 2016-09-07 2017-02-15 广州视源电子科技股份有限公司 Data processing method and device
CN106933823A (en) * 2015-12-29 2017-07-07 北京国双科技有限公司 Method of data synchronization and device
CN109408565A (en) * 2018-10-19 2019-03-01 浪潮软件集团有限公司 Data synchronous interaction method, system and data interaction platform
CN110347672A (en) * 2019-05-27 2019-10-18 深圳壹账通智能科技有限公司 Verification method and device, the electronic equipment and storage medium of tables of data related update
US20190347345A1 (en) * 2018-05-14 2019-11-14 Sap Se Database independent detection of data changes
CN112328677A (en) * 2021-01-05 2021-02-05 平安科技(深圳)有限公司 Lost data recovery method, device, equipment and medium based on table association

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012081071A1 (en) * 2010-12-13 2012-06-21 株式会社 日立製作所 Storage device and method of detecting power failure in storage device
US8874505B2 (en) * 2011-01-11 2014-10-28 Hitachi, Ltd. Data replication and failure recovery method for distributed key-value store
JP6499958B2 (en) * 2015-12-22 2019-04-10 日立オートモティブシステムズ株式会社 Vehicle fault diagnosis device
CN107169003B (en) * 2017-03-31 2020-05-22 北京奇艺世纪科技有限公司 Data association method and device
CN110908995B (en) * 2018-09-17 2023-04-11 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN112015790A (en) * 2019-05-30 2020-12-01 北京沃东天骏信息技术有限公司 Data processing method and device
CN112035463B (en) * 2020-07-22 2023-07-21 武汉达梦数据库股份有限公司 Bidirectional synchronization method and synchronization device of heterogeneous database based on log analysis

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251448A (en) * 2023-09-18 2023-12-19 北京数方科技有限公司 Method and device for processing data of wide-table zipper table
CN117251448B (en) * 2023-09-18 2024-04-30 北京数方科技有限公司 Method and device for processing data of wide-table zipper table
CN117349377A (en) * 2023-10-08 2024-01-05 中电云计算技术有限公司 Main external key table data synchronization method and system
CN117349377B (en) * 2023-10-08 2024-05-10 中电云计算技术有限公司 Main external key table data synchronization method and system

Also Published As

Publication number Publication date
CN112328677B (en) 2021-04-02
CN112328677A (en) 2021-02-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21916959

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21916959

Country of ref document: EP

Kind code of ref document: A1