CN117251460B - Data consistency check system for graph database and relational database - Google Patents

Publication number
CN117251460B
Authority
CN
China
Prior art keywords
information
node
checking
verification
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311002776.2A
Other languages
Chinese (zh)
Other versions
CN117251460A (en)
Inventor
郝磊
郭志扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhanlue Data Technology Co ltd
Original Assignee
Shanghai Zhanlue Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhanlue Data Technology Co ltd
Priority to CN202311002776.2A
Publication of CN117251460A
Application granted
Publication of CN117251460B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of databases, and in particular to a data consistency check system for a graph database and a relational database. A node acquisition module selects an initial node in the graph database and locates the source table of the initial node in the relational database. An analysis module analyzes the relation information of the plurality of edges connected with the initial node. A first verification module verifies the initial node and the corresponding relation information. A second checking module acquires the number of second nodes connected with the initial node, calculates the check complexity of any second node, and determines a corresponding check strategy according to the check complexity. A central control module matches the data entities in the source table with the information of the second node, calls the two-source table corresponding to each matched data entity in the relational database, and checks the relation information of the second node against the information in the corresponding two-source table.

Description

Data consistency check system for graph database and relational database
Technical Field
The invention relates to the technical field of databases, in particular to a data consistency check system of a graph database and a relational database.
Background
Building a data warehouse always involves moving data between physical environments, whether across different platforms or within the same platform. Because a data warehouse holds a huge volume of data, the move cannot be completed in a short period, and it involves data synchronization, verification, batch catch-up runs, and re-checking.
Data comparison and verification has always been a difficult problem in data migration. Only through data comparison and verification can the accuracy and completeness of a migration be ensured: it confirms whether all updates on the production database have been applied to the target end, that is, it verifies that the data on the target end is consistent with the production database, and so avoids the risk caused by data differences. Migrating from Oracle to MySQL raises the problem of comparing and verifying data across heterogeneous databases, so the question is how to verify the accuracy of the data quickly. Oracle provides the Oracle GoldenGate Veridata tool, software for comparing the result of data synchronization between databases. It supports comparison of large data volumes and can compare data without stopping synchronization, but unfortunately it does not currently support MySQL. For data comparison between heterogeneous databases, if no effective verification tool exists, the principle and method of data comparison and verification must be understood.
Chinese patent publication No.: CN108280159B discloses a method for converting a graph database into a relational database, firstly, building a structure model of attribute data in the graph database based on generalized directed hypergraph, building a two-dimensional data table for each node in the structure model, and then building the data table according to directed edges and additional directed edges; describing nodes, directed edges, labels, a graph database, node attributes and directed edge attributes in a graph database by using a generalized directed hypergraph, establishing a data storage characteristic description model, establishing a two-dimensional data table for each node in the description model, and constructing the data table according to the directed edges; and then, the constructed data tables are arranged, a database and the data tables are constructed in a relational database management system, the data in the graph database is traversed, and relevant data information is filled in the two-dimensional data tables of the relational database. The invention can accurately realize the conversion from the graph database to the relational database, and the obtained relational database has reasonable structure.
It can be seen that data consistency in the converted database needs to be checked after data conversion; however, in the prior art the accuracy of data consistency checks between a graph database and a relational database still needs to be improved.
Disclosure of Invention
Therefore, the invention provides a data consistency verification system for a graph database and a relational database, which is used for solving the problem of poor accuracy of data consistency verification for the graph database and the relational database in the prior art.
In order to achieve the above object, the present invention provides a data consistency check system for a graph database and a relational database, including:
the node acquisition module is used for selecting an initial node in the graph database and positioning a source table of the initial node in the relational database;
the analysis module is connected with the node acquisition module and used for analyzing the relation information of a plurality of edges connected with the initial node, wherein the edges comprise directed edges, undirected edges and additional directed edges;
the first verification module is respectively connected with the node acquisition module and the analysis module and is used for verifying the initial node and the corresponding relation information with the information in the source table through the time stamp and the MD5 value;
the second verification module is respectively connected with the first verification module and the analysis module, and is used for acquiring the number of second nodes connected with the initial node, calculating the verification complexity of any second node under the condition that the number of the second nodes is larger than a number preset value, correcting the verification complexity through the number of the second nodes and the number of edges connected with the second nodes, and determining a corresponding verification strategy according to the comparison result of the corrected verification complexity and the verification complexity comparison parameter;
the central control module is respectively connected with the first verification module and the second verification module, and is used for calling a two-source table corresponding to the data entity in the source table matched with the information of the second node, verifying the relation information of the second node and the information in the corresponding two-source table by adopting a corresponding verification strategy, and storing the information of the unmatched second node and the data entity into a database to be verified;
and the database to be checked is connected with the second checking module and the central control module and is used for storing unmatched data in the graph database and the relational database.
Further, the first verification module sorts the analyzed relation information according to the time stamp, sorts the information in the source table according to the time stamp, matches the sorted relation information with the information in the sorted source table, extracts the information with the same time stamp, calculates the character string MD5 value of each piece of information with the same time stamp,
if the relation information with the same time stamp is the same as the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the data consistency of the corresponding information accords with a standard;
if the relation information with the same time stamp is different from the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the corresponding relation information needs to be further verified.
Further, the second checking module extracts second nodes connected with the initial nodes, counts the number of the second nodes,
if the number of the second nodes is larger than the number preset value, the second checking module calculates checking complexity of any second node, and determines a corresponding checking strategy according to the checking complexity;
and if the number of the second nodes is smaller than or equal to the number preset value, checking according to the checking mode of the initial node related information.
Further, when the number of the second nodes is larger than the number preset value, the second checking module calculates the check complexity F of any second node according to the following formula;
[formula not reproduced in this text]
wherein N is the number of nodes connected to the second node, M is the number of directed edges connected to the second node, J is the number of undirected edges connected to the second node, and K is the number of additional directed edges connected to the second node.
Further, the second checking module calculates the sum A1 of the number N of nodes connected with the second node and the number J of undirected edges, setting A1 = N + J; calculates the sum A2 of the number M of directed edges connected with the second node and the number K of additional directed edges, setting A2 = M + K; calculates the ratio σ of A1 to A2, setting σ = A1/A2; and compares the ratio σ with the standard ratio σ0. If σ > σ0, the second checking module determines to correct the check complexity.
Further, the second checking module calculates the difference between the ratio σ and the standard ratio σ0, and several modes for correcting the check complexity according to the difference are provided in the second checking module;
wherein each correction mode corrects the check complexity by a different amount.
Further, a check complexity comparison parameter F0 is arranged in the second check module, the second check module compares the corrected check complexity F' with the check complexity comparison parameter F0 to determine the check complexity level of any second node, and a corresponding check strategy is determined according to the check complexity level,
if F' is less than or equal to F0, the second verification module judges that the verification complexity of the corresponding second node is at a first verification complexity level, and controls the central control module to verify the consistency of the data by adopting a first verification strategy;
if F' > F0, the second verification module judges that the verification complexity of the corresponding second node is at a second verification complexity level, and controls the central control module to verify the consistency of the data by adopting a second verification strategy.
Further, the central control module obtains the data entity in the source table and matches the data entity with the information of the second node,
if the data entity matched with the information of the second node exists, a corresponding two-source table of the data entity in the relational database is called, and the relational information of the second node is checked with the information in the corresponding two-source table;
if the unmatched information of the second node and the data entity exist, the relation information between the second node and the initial node and the time stamp of the data entity are recorded, hash values of the relation information and the data entity are calculated, and the hash values are stored in a database to be verified.
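The handling of unmatched information described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the SQLite table `pending`, the helper names `open_pending_db` and `record_unmatched`, and the choice of SHA-256 as the hash function are all assumptions, since the text does not say which hash or storage engine is used.

```python
import hashlib
import sqlite3


def open_pending_db(path=":memory:"):
    """Open a hypothetical 'database to be verified' with one table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        "node_id TEXT, entity_ts TEXT, relation_hash TEXT, entity_hash TEXT)"
    )
    return conn


def record_unmatched(conn, node_id, relation_info, entity, entity_ts):
    """Store the timestamp and the hashes of an unmatched node/entity pair."""
    def digest(text):
        # SHA-256 is an assumption; the source only says "hash values".
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    conn.execute(
        "INSERT INTO pending VALUES (?, ?, ?, ?)",
        (node_id, entity_ts, digest(relation_info), digest(entity)),
    )
    conn.commit()
```

Storing only the hashes and the timestamp keeps the pending store small while still allowing a later pass to re-check whether the two sides have converged.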
Further, under the first verification policy, the central control module calculates the count value of the relationship information of the second node and the two source tables of the corresponding data entities respectively through a count function, and if the error rate of the count value is less than or equal to 0.1, the central control module judges that the data consistency of the corresponding information meets the standard;
if the error rate of the count value is greater than 0.1, the central control module judges that the data consistency of the corresponding information does not accord with the standard.
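A rough sketch of the first verification policy follows. The text does not define how the error rate of the count value is computed, so the relative difference used here (absolute difference divided by the table count) is an assumption, as are the function names; only the 0.1 threshold comes from the text.

```python
def count_error_rate(graph_count, table_count):
    """Relative difference between two counts (assumed definition)."""
    if graph_count == table_count:
        return 0.0
    if table_count == 0:
        # Nothing on the relational side but something in the graph.
        return float("inf")
    return abs(graph_count - table_count) / table_count


def first_strategy_consistent(relations, table_rows, tolerance=0.1):
    """Count both sides and accept when the error rate is within 0.1."""
    rate = count_error_rate(len(relations), len(table_rows))
    return rate <= tolerance
```

Because only counts are compared, this policy is cheap, which fits its use for second nodes with low check complexity.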
Further, under the second checking strategy, the central control module performs slicing processing on the relation information of the second node and on the information in the two-source table of the corresponding data entity. For any relation information, it calculates the MD5 value of each slice of the relation information, calculates the average value of the MD5 values of the slices, and takes the average value as the MD5 value of the relation information. It likewise performs slicing processing on the information in any two-source table and takes the average MD5 value of the slices as the MD5 value of the corresponding information in the two-source table,
if the MD5 value of the relation information is the same as the MD5 value of the information in the two-source table, the central control module judges that the data consistency of the corresponding information accords with the standard;
if the MD5 value of the relation information is different from the MD5 value of the information in the two-source table, the central control module judges that the corresponding information needs to be further checked.
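The second checking strategy can be illustrated with the sketch below. The text does not say how slices are formed or how MD5 values are averaged, so two assumptions are made here: slices are fixed-size character chunks, and each digest is read as a 128-bit integer whose integer mean stands in for the "average MD5 value". The function names are hypothetical.

```python
import hashlib


def slice_text(text, size=4):
    """Split text into fixed-size chunks (assumed slicing rule)."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]


def averaged_md5(text, size=4):
    """Integer mean of the per-slice MD5 digests (assumed averaging rule)."""
    digests = [
        int(hashlib.md5(chunk.encode("utf-8")).hexdigest(), 16)
        for chunk in slice_text(text, size)
    ]
    return sum(digests) // len(digests)


def second_strategy_consistent(relation_info, table_info, size=4):
    """Compare the averaged MD5 values of the two sides."""
    return averaged_md5(relation_info, size) == averaged_md5(table_info, size)
```

Both sides have to be sliced with the same rule for the comparison to be meaningful. Note that an average is insensitive to slice order, so under this reading two inputs whose slices are merely reordered would compare as equal.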
Compared with the prior art, the invention has the beneficial effect that, by introducing the check complexity, different check strategies are adopted in the data consistency check, which improves both check efficiency and check accuracy.
Further, since the initial node in the graph database corresponds to the source table in the relational database, if the relation information corresponding to the initial node has the same timestamp as the information in the source table and the MD5 values of the corresponding character strings are the same, the data of the corresponding information can be judged to be consistent.
Further, after the data consistency verification of the information of the initial node and the relational database is completed, the second node connected with the initial node is verified, and the number of the second nodes reflects the data verification complexity.
Further, the invention introduces the check complexity F, which is a characteristic parameter of the data check complexity and is related to the number of nodes and the number of edges connected with the nodes, and the invention reflects the difficulty of checking the second nodes by introducing the check complexity, thereby pertinently selecting a proper check strategy and further improving the accuracy of data consistency check.
Further, the complexity and the information content of the relation information corresponding to the nodes and the undirected edges are generally higher than those of the relation information corresponding to the directed edges and those of the relation information corresponding to the attached directed edges, so that the method and the device calculate the sum A1 of the number N of the nodes and the number J of the undirected edges, the sum A2 of the number M of the directed edges connected with the second nodes and the number K of the attached directed edges, calculate the ratio sigma of the A1 and the A2, and evaluate the check complexity of the second nodes again through the ratio sigma to correct the calculated check complexity F, so that the check complexity of the second nodes is more objectively and accurately reflected, the adopted check strategy is more targeted, and the accuracy of data consistency check is improved.
Further, the check complexity comparison parameter F0 is set so that a suitable check strategy is selected for each second node to be checked; when the corrected check complexity is less than or equal to the check complexity comparison parameter, the first check strategy is adopted, which improves check efficiency.
Furthermore, nodes with greater check complexity use the second check strategy, which performs slicing processing on the information. Data can be sliced along different dimensions, attributes, time, and so on, yielding finer-grained and more accurate data and improving the accuracy of the data check; slicing also makes processing and analysis more convenient, reduces load, and improves the efficiency of the data check.
Drawings
FIG. 1 is a block diagram illustrating a data consistency check system for a graph database and a relational database according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to fig. 1, which is a block diagram of a data consistency check system for a graph database and a relational database according to an embodiment of the present invention, the data consistency check system for a graph database and a relational database according to the present invention includes:
the node acquisition module is used for selecting an initial node in the graph database and positioning a source table of the initial node in the relational database;
the analysis module is connected with the node acquisition module and used for analyzing the relation information of a plurality of edges connected with the initial node, wherein the edges comprise directed edges, undirected edges and additional directed edges;
the first verification module is respectively connected with the node acquisition module and the analysis module and is used for verifying the initial node and the corresponding relation information with the information in the source table through the time stamp and the MD5 value;
the second verification module is respectively connected with the first verification module and the analysis module, and is used for acquiring the number of second nodes connected with the initial node, calculating the verification complexity of any second node under the condition that the number of the second nodes is larger than a number preset value, correcting the verification complexity through the number of the second nodes and the number of edges connected with the second nodes, and determining a corresponding verification strategy according to the comparison result of the corrected verification complexity and the verification complexity comparison parameter;
the central control module is respectively connected with the first verification module and the second verification module, and is used for calling a two-source table corresponding to the data entity in the source table matched with the information of the second node, verifying the relation information of the second node and the information in the corresponding two-source table by adopting a corresponding verification strategy, and storing the information of the unmatched second node and the data entity into a database to be verified;
and the database to be checked is connected with the second checking module and the central control module and is used for storing unmatched data in the graph database and the relational database.
Specifically, the node acquisition module randomly selects any node in the graph database as an initial node.
Specifically, the first verification module sorts the analyzed relation information according to the time stamp, sorts the information in the source table according to the time stamp, matches the sorted relation information with the information in the sorted source table, extracts the information with the same time stamp, calculates the character string MD5 value of each piece of information with the same time stamp,
if the relation information with the same time stamp is the same as the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the data consistency of the corresponding information accords with a standard;
if the relation information with the same time stamp is different from the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the corresponding relation information needs to be further verified.
The calculation of the MD5 value is well known in the art and will not be described in detail herein.
Sorting both sides by timestamp before matching speeds up the operation.
Because the initial node in the graph database corresponds to the source table in the relational database, if the relation information corresponding to the initial node has the same timestamp as the information in the source table and the MD5 values of the corresponding character strings are the same, the data of the corresponding information can be judged to be consistent.
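The first verification step can be sketched as a merge over two timestamp-sorted lists. This is an illustrative reading only: it assumes each record is a `(timestamp, text)` pair, that timestamps are unique on each side, and the function names are hypothetical.

```python
import hashlib


def md5_of(text):
    """MD5 hex digest of a character string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def first_verification(relation_info, source_rows):
    """Sort both sides by timestamp, pair equal timestamps, compare MD5.

    Returns (consistent, needs_further_check), each a list of timestamps.
    Records whose timestamp appears on one side only are skipped here.
    """
    rel = sorted(relation_info, key=lambda r: r[0])
    src = sorted(source_rows, key=lambda r: r[0])
    consistent, needs_further_check = [], []
    i = j = 0
    while i < len(rel) and j < len(src):
        ts_rel, text_rel = rel[i]
        ts_src, text_src = src[j]
        if ts_rel < ts_src:
            i += 1
        elif ts_rel > ts_src:
            j += 1
        else:
            if md5_of(text_rel) == md5_of(text_src):
                consistent.append(ts_rel)
            else:
                needs_further_check.append(ts_rel)
            i += 1
            j += 1
    return consistent, needs_further_check
```

With both lists sorted, matching equal timestamps takes a single linear pass, which is one way to read the remark that sorting improves the operation speed.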
Specifically, the second checking module extracts second nodes connected with the initial nodes, counts the number of the second nodes,
if the number of the second nodes is larger than the number preset value, the second checking module calculates checking complexity of any second node, and determines a corresponding checking strategy according to the checking complexity;
and if the number of the second nodes is smaller than or equal to the number preset value, checking according to the checking mode of the initial node related information.
After the data consistency verification of the information of the initial node and the relational database is completed, the second node connected with the initial node is verified, and the number of the second nodes reflects the data verification complexity.
In this embodiment, for setting the number preset value, it can be calculated according to the following manner:
and counting the number of nodes connected with any node in the graph database, calculating the average value of the number of the nodes connected with the node, and taking the average value as a number preset value.
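As a small illustration of the number preset value, the sketch below computes the average number of connected nodes over the whole graph and uses it to decide whether check complexity has to be calculated for the second nodes of a given initial node. The adjacency-dictionary representation and the function names are assumptions made for the example.

```python
from statistics import mean


def number_preset_value(adjacency):
    """Average number of nodes connected to a node in the graph."""
    return mean(len(neighbors) for neighbors in adjacency.values())


def needs_complexity_check(adjacency, initial_node):
    """True when the initial node has more second nodes than the preset value."""
    second_nodes = adjacency[initial_node]
    return len(second_nodes) > number_preset_value(adjacency)
```

For a star-shaped graph such as `{"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}`, the preset value is 1.5, so only the hub `"a"` takes the complexity-based path.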
Specifically, the second verification module calculates the verification complexity F of any second node according to the following formula when the number of the second nodes is greater than a number preset value;
[formula not reproduced in this text]
wherein N is the number of nodes connected to the second node, M is the number of directed edges connected to the second node, J is the number of undirected edges connected to the second node, and K is the number of additional directed edges connected to the second node.
According to the invention, the check complexity F is introduced, and is a characteristic parameter of the data check complexity, which is related to the number of nodes and the number of edges connected with the nodes.
Specifically, the second checking module calculates the sum A1 of the number N of nodes connected to the second node and the number J of undirected edges, setting A1 = N + J; calculates the sum A2 of the number M of directed edges connected to the second node and the number K of additional directed edges, setting A2 = M + K; calculates the ratio σ of A1 to A2, setting σ = A1/A2; and compares the ratio σ with the standard ratio σ0. If σ > σ0, the second checking module determines to correct the check complexity.
In this embodiment, the standard ratio is set such that 1.5 < σ0 < 2.
The complexity and the information content of the information corresponding to the nodes and the relation information corresponding to the undirected edges are generally higher than those of the information corresponding to the directed edges and the relation information corresponding to the attached directed edges, so that the invention calculates the sum A1 of the number N of the nodes and the number J of the undirected edges, the sum A2 of the number M of the directed edges connected with the second nodes and the number K of the attached directed edges, calculates the ratio sigma of the A1 and the A2, and re-evaluates the verification complexity of the second nodes through the ratio sigma to correct the calculated verification complexity F, thereby reflecting the verification complexity of the second nodes more objectively and accurately, leading the adopted verification strategy to be more targeted, and improving the accuracy of data consistency verification.
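The ratio test above can be written out as a short sketch. The default `sigma0=1.75` is an assumed example value chosen inside the stated range 1.5 < σ0 < 2, and raising an error when A2 is zero is also an assumption, since the text does not cover a second node with no directed or additional directed edges.

```python
def correction_ratio(n, m, j, k):
    """Return sigma = A1 / A2 with A1 = N + J and A2 = M + K."""
    a1 = n + j  # nodes plus undirected edges
    a2 = m + k  # directed edges plus additional directed edges
    if a2 == 0:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("A2 = M + K is zero, sigma is undefined")
    return a1 / a2


def needs_correction(n, m, j, k, sigma0=1.75):
    """True when sigma exceeds the standard ratio sigma0."""
    return correction_ratio(n, m, j, k) > sigma0
```

For example, N = 6, J = 4, M = 3, K = 2 gives A1 = 10, A2 = 5, and σ = 2, which exceeds any σ0 in the stated range, so the check complexity would be corrected.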
Specifically, the second checking module calculates the difference value between the ratio sigma and the standard ratio sigma 0, and a plurality of ways for correcting the checking complexity according to the difference value are arranged in the second checking module;
wherein, each correction mode is different in correction size for the check complexity.
Specifically, the second checking module calculates the difference Δσ between the ratio σ and the standard ratio σ0, setting Δσ = σ - σ0. The second checking module compares the difference Δσ with a first preset difference Δσ1 and a second preset difference Δσ2, where Δσ1 < Δσ2, and determines the correction mode for the check complexity according to the comparison result,
if Δσ < Δσ1, the second checking module adopts the first correction mode, that is, it uses a first correction coefficient f1 to correct the check complexity to a corresponding value;
if Δσ1 ≤ Δσ < Δσ2, the second checking module adopts the second correction mode, that is, it uses a second correction coefficient f2 to correct the check complexity to a corresponding value;
if Δσ ≥ Δσ2, the second checking module adopts the third correction mode, that is, it uses a third correction coefficient f3 to correct the check complexity to a corresponding value;
when the second checking module uses the k-th correction coefficient fk to correct the check complexity to the corresponding value, the corrected check complexity is set to F' = (1 + fk) × F, where F is the calculated check complexity and k = 1, 2, 3.
Wherein 0.1 < f1 < f2 < f3 < 0.5; f1 = 0.2, f2 = 0.3 and f3 = 0.4 are preferred in this embodiment.
In this embodiment, 2 < Δσ1 < 4 and 6 < Δσ2 < 8; Δσ1 = 3 and Δσ2 = 7 are preferred.
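The correction rule above can be sketched in Python as follows. This is only an illustrative sketch: the function name, the parameter names and the handling of A2 = 0 are assumptions, the standard ratio σ0 is passed in because no value is given for it here, and the condition that correction is applied only when σ > σ0 is taken from claim 2. The default thresholds and coefficients are the preferred values of this embodiment.

```python
def corrected_complexity(F, N, M, J, K, sigma0,
                         delta1=3.0, delta2=7.0,
                         coeffs=(0.2, 0.3, 0.4)):
    """Return the corrected check complexity F' for one second node.

    F      : check complexity already calculated for the node
    N, J   : number of connected nodes and undirected edges (A1 = N + J)
    M, K   : number of directed and additional directed edges (A2 = M + K)
    sigma0 : standard ratio (value not specified in the text; caller supplies it)
    """
    a1 = N + J
    a2 = M + K
    if a2 == 0:
        # Assumption: the text does not say what happens when A2 is zero.
        raise ValueError("A2 = M + K must be non-zero to compute the ratio")

    sigma = a1 / a2
    if sigma <= sigma0:
        # Per claim 2, correction is applied only when sigma > sigma0.
        return F

    delta = sigma - sigma0
    if delta < delta1:
        fk = coeffs[0]      # first correction mode
    elif delta < delta2:
        fk = coeffs[1]      # second correction mode
    else:
        fk = coeffs[2]      # third correction mode

    return (1 + fk) * F
```

With the preferred values, a node whose ratio exceeds σ0 by at least 3 but less than 7 has its check complexity increased by 30 percent.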
Specifically, a check complexity comparison parameter F0 is arranged in the second check module, the second check module compares the corrected check complexity F' with the check complexity comparison parameter F0 to determine the check complexity level of any second node, and a corresponding check strategy is determined according to the check complexity level,
if F' is less than or equal to F0, the second verification module judges that the verification complexity of the corresponding second node is at a first verification complexity level, and controls the central control module to verify the consistency of the data by adopting a first verification strategy;
if F' > F0, the second verification module judges that the verification complexity of the corresponding second node is at a second verification complexity level, and controls the central control module to verify the consistency of the data by adopting a second verification strategy.
In this embodiment, the check complexity comparison parameter F0 is the check complexity calculated with N equal to the number preset value, M equal to the average number of directed edges connected to any node, J equal to the average number of undirected edges connected to any node, and K equal to the average number of additional directed edges connected to any node.
By setting the check complexity comparison parameter F0, the invention selects a suitable check strategy for each second node to be checked; when the corrected check complexity is less than or equal to the comparison parameter, the first check strategy is adopted, which improves check efficiency.
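The selection between the two check strategies can be sketched as below. The names `Strategy`, `baseline_complexity` and `choose_strategy` are hypothetical, and `complexity_fn` is a placeholder for the check complexity formula, which is not reproduced in this sketch; any callable taking (N, M, J, K) can be substituted.

```python
from enum import Enum
from typing import Callable


class Strategy(Enum):
    FIRST = "count comparison"        # first check strategy
    SECOND = "sliced MD5 comparison"  # second check strategy


def baseline_complexity(complexity_fn: Callable[[float, float, float, float], float],
                        preset_count: float,
                        avg_directed: float,
                        avg_undirected: float,
                        avg_additional: float) -> float:
    """Compute F0 by evaluating the complexity formula at the reference point:
    N = number preset value, M/J/K = per-node averages of each edge type."""
    return complexity_fn(preset_count, avg_directed, avg_undirected, avg_additional)


def choose_strategy(corrected: float, f0: float) -> Strategy:
    """F' <= F0 selects the first strategy, F' > F0 the second."""
    return Strategy.FIRST if corrected <= f0 else Strategy.SECOND
```

For illustration only, `baseline_complexity(lambda n, m, j, k: n + m + j + k, 5, 2.0, 1.0, 0.5)` evaluates a toy stand-in formula at the reference point; the lambda is not the formula of the invention.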
Specifically, the central control module obtains the data entities in the source table, excluding the data entity corresponding to the initial node so as to avoid repeated verification, and matches the data entities with the information of the second nodes,
if a data entity matching the information of a second node exists, the corresponding two-source table of that data entity in the relational database is called, and the relationship information of the second node is checked against the information in the corresponding two-source table;
if there are second-node information and data entities that do not match, the relationship information between the second node and the initial node and the timestamp of the data entity are recorded, hash values of the relationship information and of the data entity are calculated, and the hash values are stored in the database to be verified.
The two-source table is a two-dimensional table corresponding to the information of the second node in a relational database.
The relationship information of the second node is the relationship information of a plurality of edges connected with the second node.
Converting data into character strings and calculating the hash values of those strings are mature prior art and are not described here.
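The matching step and the hashing of unmatched records can be sketched as follows. Everything here is illustrative: nodes and data entities are represented by plain string keys, the function names are hypothetical, and SHA-256 is an assumed choice because the text does not name the hash function used for the database to be verified.

```python
import hashlib


def partition_entities(second_node_keys, entity_keys, initial_key):
    """Split keys into (matched, unmatched).

    The data entity of the initial node is excluded first, so that it is
    not verified a second time.
    """
    candidates = set(entity_keys) - {initial_key}
    nodes = set(second_node_keys)
    matched = nodes & candidates
    unmatched = (nodes | candidates) - matched
    return sorted(matched), sorted(unmatched)


def record_hash(relation_info: str, entity: str, timestamp: str) -> str:
    """Serialise the record to a string and hash it.

    Assumption: fields are joined with '|' and hashed with SHA-256.
    """
    text = f"{relation_info}|{entity}|{timestamp}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def store_unmatched(unmatched, relation_of, timestamp_of, pending_store: dict):
    """Write one hash per unmatched key into the pending store, which
    stands in for the database to be verified."""
    for key in unmatched:
        pending_store[key] = record_hash(relation_of.get(key, ""), key,
                                         timestamp_of.get(key, ""))
    return pending_store
```

Matched keys would then be passed on to the selected check strategy, while the pending store keeps only hashes, so the unmatched records can be compared later without storing their full content.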
Specifically, under the first check strategy, the central control module uses the count function to calculate the count value of the relationship information of the second node and the count value of the two-source table of the corresponding data entity respectively; if the error rate of the count values is less than or equal to 0.1, the central control module judges that the data consistency of the corresponding information meets the standard;
if the error rate of the count values is greater than 0.1, the central control module judges that the data consistency of the corresponding information does not meet the standard.
Calculating a count value through the count function is mature prior art and is not described in detail here.
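A minimal sketch of the first check strategy is given below. The text does not define how the error rate is computed; this sketch assumes it is the absolute difference of the two counts relative to the count of the two-source table, and the function name and the treatment of an empty table are likewise assumptions.

```python
def counts_consistent(graph_count: int, table_count: int,
                      max_error_rate: float = 0.1) -> bool:
    """Compare the count of relationship records of a second node with the
    row count of the corresponding two-source table.

    Assumption: error rate = |graph_count - table_count| / table_count.
    """
    if table_count == 0:
        # Assumption: an empty table is consistent only with an empty graph side.
        return graph_count == 0
    error_rate = abs(graph_count - table_count) / table_count
    return error_rate <= max_error_rate
```

In practice the two counts would typically come from a count query on each side (for example a `COUNT(*)` on the relational table); only the comparison is shown here.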
Specifically, under the second check strategy, the central control module slices the relationship information of the second node and the information in the two-source table of the corresponding data entity. For any piece of relationship information, it calculates the MD5 value of each slice, calculates the average of the MD5 values of the slices, and uses this average as the MD5 value of the relationship information; likewise, it slices the information in any two-source table and uses the average MD5 value of the slices as the MD5 value of the corresponding information in the two-source table,
if the MD5 value of the relation information is the same as the MD5 value of the information in the two-source table, the central control module judges that the data consistency of the corresponding information accords with the standard;
if the MD5 value of the relation information is different from the MD5 value of the information in the two-source table, the central control module judges that the corresponding information needs to be further checked.
When the corresponding information is further checked, the information consistency check of the next node of the graph database is performed.
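The second check strategy can be sketched as follows. The text does not say how MD5 values are averaged; this sketch assumes each digest is read as a 128-bit integer and the integer mean is taken. The fixed slice size, the row-as-string representation and all function names are also assumptions.

```python
import hashlib


def slice_rows(rows, size):
    """Cut a list of row strings into consecutive slices of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def md5_as_int(rows):
    """MD5 of one slice, interpreted as an integer (assumed interpretation)."""
    text = "\n".join(rows)
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def averaged_md5(rows, size):
    """Average of the per-slice MD5 values, used as the MD5 value of the whole."""
    slices = slice_rows(rows, size)
    if not slices:
        return 0
    return sum(md5_as_int(s) for s in slices) // len(slices)


def sliced_md5_match(graph_rows, table_rows, size=1000):
    """True if both sides yield the same averaged MD5 value; False means the
    corresponding information needs to be checked further."""
    return averaged_md5(graph_rows, size) == averaged_md5(table_rows, size)
```

Both sides must be serialised and ordered identically before slicing for the comparison to be meaningful. Note also that an average does not depend on the order of the slices, so under this interpretation two sides whose slices are identical but arranged differently would still compare as equal.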
When the information of each node in the graph database and the information in each two-dimensional table in the relational database are compared and checked, the information related to the data in the database to be checked can be extracted at any time, so that the coverage of the check is improved and the check efficiency is improved.
According to the invention, nodes with greater check complexity adopt the second check strategy, which slices the information. The data can be sliced by dimension, attribute, time and so on, which yields more refined and accurate data and improves the accuracy of the data check; slicing also makes processing and analysis more convenient, which reduces load and improves the efficiency of the data check.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A data consistency check system for a graph database and a relational database, comprising:
the node acquisition module is used for selecting an initial node in the graph database and positioning a source table of the initial node in the relational database;
the analysis module is connected with the node acquisition module and used for analyzing the relation information of a plurality of edges connected with the initial node, wherein the edges comprise directed edges, undirected edges and additional directed edges;
the first verification module is respectively connected with the node acquisition module and the analysis module and is used for verifying the initial node and the corresponding relation information with the information in the source table through the time stamp and the MD5 value;
the second verification module is respectively connected with the first verification module and the analysis module, and is used for acquiring the number of second nodes connected with the initial node, calculating the verification complexity of any second node under the condition that the number of the second nodes is larger than a number preset value, correcting the verification complexity through the number of the second nodes and the number of edges connected with the second nodes, and determining a corresponding verification strategy according to the comparison result of the corrected verification complexity and the verification complexity comparison parameter;
the central control module is respectively connected with the first check module and the second check module and is used for calling the corresponding two-source table of the data entity in the source table that matches the information of the second node, verifying the relation information of the second node against the information in the corresponding two-source table by the corresponding verification strategy, and storing the unmatched information of the second node and data entities in the database to be verified;
a database to be checked, which is connected with the second checking module and the central control module and is used for storing data of the graph database which are not matched with the relational database;
the first verification module sorts the analyzed relation information according to the time stamp, sorts the information in the source table according to the time stamp, matches the sorted relation information with the information in the sorted source table, extracts the information with the same time stamp, calculates the character string MD5 value of each piece of information with the same time stamp,
if the relation information with the same time stamp is the same as the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the data consistency of the corresponding information accords with a standard;
if the relation information with the same time stamp is different from the MD5 value of the character string corresponding to the information in the source table, the first verification module judges that the corresponding relation information needs to be further verified;
the second checking module extracts second nodes connected with the initial nodes, counts the number of the second nodes,
if the number of the second nodes is larger than the number preset value, the second checking module calculates checking complexity of any second node, and determines a corresponding checking strategy according to the checking complexity;
if the number of the second nodes is smaller than or equal to the number preset value, checking according to the checking mode of the initial node related information;
the second checking module calculates checking complexity F of any second node according to the following formula under the condition that the number of the second nodes is larger than a number preset value;
wherein N is the number of nodes connected to the second node, M is the number of directed edges connected to the second node, J is the number of undirected edges connected to the second node, and K is the number of additional directed edges connected to the second node.
2. The system according to claim 1, wherein the second checking module calculates a sum A1 of the number N of nodes connected to the second node and the number J of undirected edges, sets A1 = N + J, calculates a sum A2 of the number M of directed edges connected to the second node and the number K of additional directed edges, sets A2 = M + K, calculates a ratio σ of A1 to A2, sets σ = A1/A2, and compares the ratio σ with a standard ratio σ0; if σ > σ0, the second checking module determines to correct the checking complexity.
3. The system for verifying the data consistency of the graph database and the relational database according to claim 2, wherein the second verification module calculates a difference value between the ratio σ and the standard ratio σ0, and a plurality of ways for correcting the verification complexity according to the difference value are arranged in the second verification module;
wherein, each correction mode is different in correction size for the check complexity.
4. The system for checking the consistency of data in graph databases and relational databases according to claim 3, wherein the second checking module is provided with a checking complexity comparison parameter F0, and the second checking module compares the corrected checking complexity F' with the checking complexity comparison parameter F0 to determine the checking complexity level of any second node, and determines the corresponding checking strategy according to the checking complexity level,
if F' is less than or equal to F0, the second verification module judges that the verification complexity of the corresponding second node is at a first verification complexity level, and controls the central control module to verify the consistency of the data by adopting a first verification strategy;
if F' > F0, the second verification module judges that the verification complexity of the corresponding second node is at a second verification complexity level, and controls the central control module to verify the consistency of the data by adopting a second verification strategy.
5. The system of claim 4, wherein the central control module obtains a data entity in the source table and matches the data entity with information of the second node,
if the data entity matched with the information of the second node exists, a corresponding two-source table of the data entity in the relational database is called, and the relational information of the second node is checked with the information in the corresponding two-source table;
if the unmatched information of the second node and the data entity exist, the relation information between the second node and the initial node and the time stamp of the data entity are recorded, hash values of the relation information and the data entity are calculated, and the hash values are stored in a database to be verified.
6. The system for verifying the data consistency of the graph database and the relational database according to claim 5, wherein the central control module calculates the count value of the relational information of the second node and the two source tables of the corresponding data entities respectively through a count function under the first verification policy, and if the error rate of the count value is less than or equal to 0.1, the central control module determines that the data consistency of the corresponding information meets the standard;
if the error rate of the count value is greater than 0.1, the central control module judges that the data consistency of the corresponding information does not accord with the standard.
7. The system of claim 6, wherein the central control module performs slicing processing on the relationship information of the second node and the information in the two source tables of the corresponding data entities under the second checking policy, calculates the MD5 value of the information of any slice of the relationship information for any one of the relationship information, calculates the average value of the MD5 values of the information of the slice, uses the average value as the MD5 value of the relationship information, performs slicing processing on the information in any two source table, uses the average MD5 value of the slice information as the MD5 value of the corresponding information in the two source tables,
if the MD5 value of the relation information is the same as the MD5 value of the information in the two-source table, the central control module judges that the data consistency of the corresponding information accords with the standard;
if the MD5 value of the relation information is different from the MD5 value of the information in the two-source table, the central control module judges that the corresponding information needs to be further checked.
CN202311002776.2A 2023-08-10 2023-08-10 Data consistency check system for graph database and relational database Active CN117251460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311002776.2A CN117251460B (en) 2023-08-10 2023-08-10 Data consistency check system for graph database and relational database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311002776.2A CN117251460B (en) 2023-08-10 2023-08-10 Data consistency check system for graph database and relational database

Publications (2)

Publication Number Publication Date
CN117251460A CN117251460A (en) 2023-12-19
CN117251460B true CN117251460B (en) 2024-04-05

Family

ID=89125562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311002776.2A Active CN117251460B (en) 2023-08-10 2023-08-10 Data consistency check system for graph database and relational database

Country Status (1)

Country Link
CN (1) CN117251460B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant