CN116701375A - Data real-time checking method and device, electronic equipment and storage medium - Google Patents

Data real-time checking method and device, electronic equipment and storage medium

Info

Publication number
CN116701375A
Authority
CN
China
Prior art keywords
data stream
target data
database
links
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310729761.XA
Other languages
Chinese (zh)
Inventor
李宏元
郑浩
侯鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Insurance Company of China
Original Assignee
Peoples Insurance Company of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Insurance Company of China
Priority to CN202310729761.XA
Publication of CN116701375A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the method, part or all of the data stream of a source database is written, as a target data stream, into the storage data units of different links, and a check database uses the digest values corresponding to the target data stream written into those storage data units to check whether the target data stream was successfully written into the storage data unit of every link, so that the target data stream written into the storage data units of multiple links is checked in real time. The method comprises the following steps: obtaining a data stream from a source database; writing the target data stream into storage data units of different links; determining the digest values corresponding to the target data stream written into the storage data units of different links; and sending all the digest values to the check database so that the check database executes a data checking method.

Description

Data real-time checking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data real-time verification, and in particular, to a data real-time verification method, apparatus, electronic device, and storage medium.
Background
Currently, a terminal device may provide a variety of services to a user, for example, services for viewing and searching data. To provide these services, the terminal device needs to continuously acquire data from a source database and store the data in the storage data units that are called when the services are implemented. This process may involve a plurality of storage data units, and a storage data unit may be a message queue, a database, or the like. To ensure that service data is timely and accurate, the data written into the plurality of storage data units needs to be checked so that data write problems are found promptly.
In the related art, data in two databases is checked through an ETL (extract, transform, load; a data warehouse technology) process, in which the data is compared manually. This requires manual intervention and is prone to human error and omission.
Therefore, how to verify the data in multiple storage data units in real time is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a data real-time checking method, a device, electronic equipment and a storage medium. Part or all of the data stream of a source database is written, as a target data stream, into the storage data units of different links, and a check database uses the digest values corresponding to the target data stream written into those storage data units to check whether the target data stream was successfully written into the storage data unit of every link, so that the target data stream written into the storage data units of multiple links is checked in real time.
In a first aspect, an embodiment of the present application provides a method for checking data in real time, including:
obtaining a data stream from a source database; writing a target data stream into storage data units of different links, wherein the target data stream is part or all of the data stream; determining the digest values corresponding to the target data stream written into the storage data units of different links;
sending all the digest values to a check database so that the check database executes a data checking method, wherein the data checking method comprises: storing the digest values corresponding to the target data stream; comparing whether the digest values are the same, and counting the number of digest values; and if the digest values are the same and the number of digest values is equal to the total number of links, determining that the target data stream was successfully written into all the storage data units.
In some embodiments, the target data stream includes primary key information; the method further comprises the steps of:
sending the primary key information to the check database, so that the check database receives the primary key information before executing the data checking method and preliminarily detects, according to the primary key information, whether the target data stream was successfully written into the storage data units of different links.
In some embodiments, the step in which the check database preliminarily detects, according to the primary key information, whether the target data stream was successfully written into the storage data units of different links includes:
if the primary key information sent by the storage data units of different links is the same, preliminarily determining that the target data stream was successfully written into the storage data units of different links.
In some embodiments, the step in which the check database preliminarily detects, according to the primary key information, whether the target data stream was successfully written into the storage data units of different links includes:
if the primary key information sent by the storage data units of different links is different, or the number of pieces of primary key information received by the check database is smaller than the total number of links, repeating the step of acquiring the data stream from the source database.
In some embodiments, the target data stream further comprises field information; the step of determining the digest value corresponding to the target data stream written into the storage data unit of the different link includes:
detecting whether the field information of the target data stream written into the storage data units of different links has intersection content;
if intersection content exists, determining the digest value corresponding to the target data stream according to the primary key information and the intersection content;
and if no intersection content exists, determining the digest value corresponding to the target data stream according to the primary key information.
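The field-intersection rule above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes "intersection content" means the fields present in every link's copy of the record, and it assumes SHA-256 over a canonical JSON encoding as the digest function, since the application does not name one. The record layout and the names `common_fields` and `digest_for` are hypothetical.

```python
import hashlib
import json


def common_fields(records):
    """Return the sorted field names present in every link's record."""
    field_sets = [set(r["fields"]) for r in records.values()]
    return sorted(set.intersection(*field_sets)) if field_sets else []


def digest_for(record, shared):
    """Digest over the primary key plus the shared fields (key only if none)."""
    payload = {"pk": record["pk"]}
    payload.update({name: record["fields"][name] for name in shared})
    # Sorting keys gives the same bytes regardless of field order (assumed encoding).
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical copies of one record as written to three links.
records = {
    "kafka": {"pk": "A", "fields": {"name": "x", "amount": 10, "offset": 7}},
    "hbase": {"pk": "A", "fields": {"name": "x", "amount": 10}},
    "es": {"pk": "A", "fields": {"name": "x", "amount": 10, "score": 1.5}},
}
shared = common_fields(records)
digests = {link: digest_for(rec, shared) for link, rec in records.items()}
```

Restricting the digest to the shared fields lets links that store extra, link-specific fields still produce equal digests when the common content matches.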
In some embodiments, the step of determining a digest value corresponding to the data stream comprises: and determining a digest value corresponding to the target data stream according to the primary key information.
In some embodiments, the method further comprises:
generating a unit identifier corresponding to the stored data unit;
sending the unit identifier to the check database so that, if the number of digest values is smaller than the total number of links, the check database determines, through the unit identifiers, the storage data units into which the target data stream was not successfully written.
In some embodiments, after the step of determining, through the unit identifiers, the storage data units into which the target data stream was not successfully written, the method further comprises: deleting the digest values and the unit identifiers stored in the check database, and repeating the step of acquiring the data stream from the source database.
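As a hedged sketch of how unit identifiers could locate the failed storage data unit, the snippet below keeps, for each primary key, a mapping from unit identifier to digest value. The in-memory dictionary `check_store`, the identifier strings, and both function names are assumptions made for illustration; the application does not specify how the check database stores these values.

```python
def missing_units(expected_units, received):
    """Return unit identifiers that sent no digest for this primary key."""
    return sorted(set(expected_units) - set(received))


def reset_for_retry(check_store, pk):
    """Drop stored digests and unit identifiers so the record can be re-sent."""
    check_store.pop(pk, None)


# Four links are expected, but only three reported a digest for key "A".
expected_units = ["kafka", "nosql", "hbase", "es"]
check_store = {"A": {"kafka": "d1", "nosql": "d1", "hbase": "d1"}}

failed = missing_units(expected_units, check_store["A"])
if failed:
    # Clear the stale entries before the data stream is acquired again.
    reset_for_retry(check_store, "A")
```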
In some embodiments, the step of writing the target data stream into the stored data units of the different links includes:
writing the target data stream into a message queue;
and sending the target data stream in the message queue to different service databases.
In a second aspect, an embodiment of the present application further provides a data real-time checking device, including:
an acquisition unit for acquiring a data stream from a source database;
the writing unit is used for writing the target data stream into the storage data units of different links, wherein the target data stream is a part or all of the data stream;
a determining unit, configured to determine a digest value corresponding to the target data stream written into the storage data unit of the different links;
a transmitting unit, configured to transmit all the digest values to a check database so that the check database executes a data checking method, wherein the data checking method comprises: storing the digest values corresponding to the target data stream; comparing whether the digest values are the same, and counting the number of digest values; and if the digest values are the same and the number of digest values is equal to the total number of links, determining that the target data stream was successfully written into all the storage data units.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the data real-time checking method when executing the computer program.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data real-time checking method.
The data real-time checking method can check whether the target data stream was successfully written into the storage data units of a plurality of links, shield the differences between databases, and find data stream problems promptly. Because the data stream is written into the storage data units in real time, the terminal device responds to services faster, which helps enterprises respond to service demands more quickly. Real-time data processing and checking can automatically detect anomalies and errors in the data stream, thereby reducing data quality problems. Real-time data acquisition, processing and checking help avoid data problems such as data delay and data errors, and improve data quality and accuracy. The check database can support a large number of data streams, and its processing capacity can be extended to accommodate growth in the amount of data delivered from the source database.
Drawings
FIG. 1 illustrates a flow chart of a data real-time checking method provided in accordance with some embodiments;
FIG. 2 illustrates a data flow diagram provided in accordance with some embodiments;
FIG. 3 illustrates yet another data flow diagram provided in accordance with some embodiments;
FIG. 4 illustrates a schematic diagram of yet another data real-time checking method provided in accordance with some embodiments;
FIG. 5 illustrates a schematic diagram of yet another data real-time checking method provided in accordance with some embodiments;
FIG. 6 illustrates a schematic diagram of yet another data real-time checking method provided in accordance with some embodiments;
FIG. 7 illustrates a schematic structural diagram of a data real-time checking device provided in accordance with some embodiments.
Detailed Description
For the purposes of making the objects and embodiments of the present application more apparent, an exemplary embodiment of the present application will be described in detail below with reference to the accompanying drawings in which exemplary embodiments of the present application are illustrated, it being apparent that the exemplary embodiments described are only some, but not all, of the embodiments of the present application.
It should be noted that the brief description of the terminology in the present application is for the purpose of facilitating understanding of the embodiments described below only and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms first, second, third and the like in the description and in the claims and in the above-described figures are used for distinguishing between similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
Currently, a terminal device may provide a variety of services to a user, for example, services for viewing and searching data. To provide these services, the terminal device needs to continuously acquire data from a source database and store the data in the storage data units that are called when the services are implemented. This process may involve a plurality of storage data units, and a storage data unit may be a message queue, a database, or the like. To ensure that service data is timely and accurate, the data written into the plurality of storage data units needs to be checked so that data write problems are found promptly.
In the related art, data in two databases is checked through an ETL (extract, transform, load; a data warehouse technology) process, in which the data is compared manually. This requires manual intervention and is prone to human error and omission.
Therefore, how to verify the data in multiple storage data units in real time is a technical problem to be solved by those skilled in the art.
In order to solve the technical problems described above, an embodiment of the present application provides a data real-time checking method, in which part or all of the data stream of a source database is written, as a target data stream, into storage data units of different links, and a check database uses the digest values corresponding to the target data stream written into those storage data units to check whether the target data stream was successfully written into the storage data unit of every link, so that the target data stream written into the storage data units of multiple links is checked in real time.
FIG. 1 illustrates a flow chart of a data real-time checking method provided in accordance with some embodiments. The method includes S100-S400.
S100, acquiring a data stream from a source database.
In the embodiment of the application, new data is continuously added to the source database, and the new data is continuously transmitted to the storage data unit in the form of data stream, so that the terminal equipment can realize service through the storage data unit. The embodiment of the application adopts the form of data stream to ensure the real-time property of the data obtained by the storage data unit.
S200, writing the target data stream into storage data units of different links, wherein the target data stream is part or all of the data stream.
In the embodiment of the application, according to the actual service requirement, the data stream acquired from the source database may be partially written into the storage data unit or may be completely written into the storage data unit.
In the embodiment of the present application, the storage data unit may be a unit with a storage function, such as a message queue or a service database. By way of example, the message queue may be a kafka (distributed message queue) message queue, and the service database may be a NoSql (non-relational) database, an Hbase (distributed storage system) database, an ES (Elasticsearch, a highly scalable open-source full-text search and analysis engine) database, or the like.
It should be noted that the specific type of the service database may be set according to the service implemented by the terminal device, and the specific types of the message queue and the service database are not limited herein.
In some embodiments, the target data stream may fail to be written into a storage data unit, and embodiments of the present application can detect this error condition.
In some embodiments, the step of writing the target data stream into the stored data units of the different links includes: writing the target data stream obtained from the source database into a message queue; and sending the target data streams in the message queue to different service databases respectively.
By way of example, FIG. 2 illustrates a target data stream flow diagram provided in accordance with some embodiments. The storage data units include a kafka message queue, a NoSql database, an Hbase database, and an ES database. The source database sends the target data stream to the kafka message queue, which sends the target data stream to the NoSql database, the Hbase database, and the ES database, respectively.
In the above example, the target data stream may be sent to the kafka message queue, the NoSql database, the Hbase database, and the ES database, i.e., there are four links. The four links are links corresponding to the kafka message queue, the NoSql database, the Hbase database and the ES database respectively.
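The fan-out topology of this example can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: `MemoryStore` and `fan_out` are hypothetical names, a `deque` stands in for the kafka message queue, and no real kafka, NoSql, Hbase, or ES client is used.

```python
from collections import deque


class MemoryStore:
    """Stand-in for a service database; a real one could reject a write."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, record):
        self.rows[record["pk"]] = record


def fan_out(queue, stores):
    """Drain the queue and write every record to each service database."""
    while queue:
        record = queue.popleft()
        for store in stores:
            store.write(record)


# Link 1 is the message queue; links 2-4 are the service databases.
queue = deque([{"pk": "A", "amount": 10}])
stores = [MemoryStore("nosql"), MemoryStore("hbase"), MemoryStore("es")]
total_links = 1 + len(stores)
fan_out(queue, stores)
```

Unlike this `deque`, a real message queue typically retains messages after they are consumed, which is what allows the queue itself to count as one of the four links to be checked.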
Of course, in other examples, the target data stream may be sent to the kafka message queue, the first NoSql database, the Hbase database, the ES database, and the second NoSql database, i.e., there are five links.
It should be noted that, the specific setting of the links is determined according to the actual needs, and the embodiment of the application does not limit the total number of links.
In other embodiments, because different services of the terminal device require data to be processed in different ways, the target data stream may not be written directly from the message queue that communicates with the source database into the service database corresponding to the service.
In one example, the storage data units include a first message queue, a first service database, a second message queue, and a second service database. The step of writing the target data stream into the storage data units of different links comprises the following steps: transmitting the target data stream to the first message queue; writing the target data stream in the first message queue into the first service database; transmitting the target data stream in the first service database to the second message queue; and writing the target data stream in the second message queue into the second service database.
For example, FIG. 3 illustrates yet another target data stream flow diagram provided in accordance with some embodiments. The source database sends the target data stream to a first kafka message queue, which sends the target data stream to the Hbase database; the Hbase database then sends the target data stream to a second kafka message queue, which sends the target data stream to the ES database. In this example, the target data stream may be stored in the first kafka message queue, the Hbase database, the second kafka message queue, and the ES database, i.e., there are four links, corresponding respectively to the first kafka message queue, the Hbase database, the second kafka message queue, and the ES database.
In another example, the storage data units include a first message queue, a first service database, a second message queue, a second service database, a third message queue, and a third service database. The step of writing the target data stream into the storage data units of different links comprises the following steps: transmitting the target data stream to the first message queue; writing the target data stream in the first message queue into the first service database; transmitting the target data stream in the first service database to the second message queue; writing the target data stream in the second message queue into the second service database; transmitting the target data stream in the second service database to the third message queue; and writing the target data stream in the third message queue into the third service database.
In the embodiment of the application, the arrangement modes of the message queues and the service databases are related to service requirements, the arrangement modes and the quantity are not limited, and the arrangement modes and the quantity are determined according to actual requirements.
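A chained arrangement, in which each storage data unit feeds the next, might be modelled as an ordered list of hops. The sketch below is an assumption-laden illustration: `run_chain`, the hop names, and the convention that a hop returns `None` on failure are all hypothetical and are not taken from the application.

```python
def run_chain(record, hops):
    """Pass a record through hops in order; return the units that stored it."""
    written = []
    current = record
    for name, write in hops:
        current = write(current)
        if current is None:  # this hop failed, so later units receive nothing
            break
        written.append(name)
    return written


def store_ok(record):
    """Hypothetical hop that stores the record and forwards it unchanged."""
    return record


def store_fail(record):
    """Hypothetical hop that fails to store the record."""
    return None


healthy = [("kafka-1", store_ok), ("hbase", store_ok),
           ("kafka-2", store_ok), ("es", store_ok)]
broken = [("kafka-1", store_ok), ("hbase", store_fail),
          ("kafka-2", store_ok), ("es", store_ok)]

ok_links = run_chain({"pk": "A"}, healthy)
bad_links = run_chain({"pk": "A"}, broken)
```

In this model a failure at one hop leaves every downstream link without the record, so fewer digest values than the total number of links would reach the check database.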
The embodiment of the application is not limited to the above-mentioned process of transferring the target data stream between different storage data units, and other contents which are not contrary to the protection content of the application are all within the protection scope of the application.
S300, determining the digest values corresponding to the target data stream written into the storage data units of different links.
In the embodiment of the application, if the content of the target data stream differs, the calculated digest values differ; if the content is the same, the calculated digest values are the same.
When the target data stream has been written into the storage data units of different links, a digest value can be calculated from the target data stream in each unit, so there may be multiple digest values. Of course, if the target data stream fails to be written into the storage data unit of a certain link, the digest value corresponding to that storage data unit cannot be calculated.
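The two properties described above (equal content gives equal digest values, and a link with no written data yields no digest value) can be illustrated as follows. The application does not name a digest algorithm; SHA-256 over a key-sorted JSON encoding is an assumption, and `digest` and `digests_by_link` are hypothetical helper names.

```python
import hashlib
import json


def digest(record):
    """Digest of a record, independent of the order of its fields."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def digests_by_link(stores, pk):
    """Compute one digest per link that actually holds the record."""
    return {name: digest(rows[pk]) for name, rows in stores.items() if pk in rows}


record = {"pk": "A", "amount": 10}
# The "es" link failed to receive the record, so it contributes no digest.
stores = {"kafka": {"A": record}, "hbase": {"A": dict(record)}, "es": {}}
result = digests_by_link(stores, "A")
```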
In some embodiments, the step of calculating the digest value from the target data stream is performed by a predetermined program in the electronic device.
S400, sending all the digest values to a check database so that the check database executes a data checking method, wherein the data checking method comprises the following steps: storing the digest values corresponding to the target data stream; comparing whether the digest values are the same, and counting the number of digest values; and if the digest values are the same and the number of digest values is equal to the total number of links, determining that the target data stream was successfully written into all the storage data units.
In the embodiment of the application, all the digest values corresponding to the storage data units are sent to the check database. If each of a plurality of storage data units has a corresponding digest value, all of these digest values are sent to the check database.
In the embodiment of the application, the check database is provided with the data checking method.
In the embodiment of the application, the digest values are compared to determine whether they are the same. If the digest values are the same, the target data streams written into the storage data units of different links are the same. If the digest values are different, the target data streams written into the storage data units of different links are different. In that case, the step of acquiring the data stream from the source database can be executed again from the beginning, the target data stream is written into the storage data units of different links again, and after the rewrite the check of whether the target data stream was successfully written into all the storage data units continues.
In the embodiment of the application, the number of digest values is counted and compared with the total number of links.
When the number of digest values is equal to the total number of links, i.e., the number of storage data units, and the digest values are the same, it is determined that the target data stream was successfully written into all the storage data units. If the number of digest values is less than the total number of links, it is determined that there are storage data units into which the target data stream was not written; in that case, the step of acquiring the data stream from the source database may be executed again, and the target data stream is then written again into the storage data units of different links.
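The decision made by the data checking method can be sketched as a small function. This is a Python stand-in for logic that, per the application, runs in the check database; the function name `check` and the returned reason strings are hypothetical.

```python
def check(digests, total_links):
    """Return (ok, reason) for the digests received for one target record."""
    if len(digests) < total_links:
        return False, "missing"   # some link never reported a digest
    if len(set(digests)) != 1:
        return False, "mismatch"  # links hold different content
    return True, "ok"


all_written = check(["d1", "d1", "d1", "d1"], total_links=4)
one_missing = check(["d1", "d1", "d1"], total_links=4)
one_differs = check(["d1", "d1", "d1", "d2"], total_links=4)
```

Either failure reason would lead back to re-acquiring the data stream from the source database, as described above.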
The embodiment of the application does not limit the execution sequence of the step of comparing the number of the summary values with the total number of links and the step of comparing whether the summary values are the same or not.
In some embodiments, the number of digest values may be counted and compared with the total number of links first, and the step of comparing whether the digest values are the same is performed when the counted number of digest values equals the total number of links. Alternatively, the step of comparing whether the digest values are the same may be performed first, and when all the digest values are the same, the number of digest values is counted and compared with the total number of links.
In some embodiments, the target data stream includes primary key information; the method further comprises the steps of:
sending the primary key information to the check database, so that the check database receives the primary key information before executing the data checking method and preliminarily detects, according to the primary key information, whether the target data stream was successfully written into the storage data units of different links.
In the embodiment of the application, in order to determine more accurately whether the target data stream was successfully written into the storage data units of different links, the check is performed in two steps, first on the primary key information and then on the digest values, which improves the accuracy of the data check.
In some embodiments, FIG. 4 illustrates a flow chart of yet another data real-time checking method provided in accordance with some embodiments. The step in which the check database preliminarily detects, according to the primary key information, whether the target data stream was successfully written into the storage data units of different links comprises the following steps:
S500, judging whether the primary key information sent by the storage data units of different links is the same, counting the number of pieces of primary key information, and comparing that number with the total number of links.
S600, if the primary key information sent by the storage data units of different links is the same, and the number of pieces of primary key information is equal to the total number of links, preliminarily determining that the target data stream was successfully written into the storage data units of different links, after which the check database continues to execute the data checking method.
S700, if the primary key information sent by the storage data units of different links is different, or the number of pieces of primary key information received is smaller than the total number of links, re-executing the step of acquiring the data stream from the source database.
In one example, as shown in FIG. 3, the primary key information sent by the kafka message queue of the first link is Pri [A], the primary key information sent by the Hbase database of the second link is Pri [A], the primary key information sent by the two kafka message queues of the third link is Pri [A], and the primary key information sent by the ES database of the fourth link is Pri [A]. Since there are four links in total and the primary key information received from the first to fourth links is Pri [A] in every case, it is preliminarily determined that the target data stream has been successfully written into the storage data units of all links.
In another example, the primary key information sent by the kafka message queue of the first link is Pri [A], the primary key information sent by the Hbase database of the second link is Pri [A], the primary key information sent by the two kafka message queues of the third link is Pri [A], and the primary key information sent by the ES database of the fourth link is Pri [B]. Since the primary key information of the first three links is Pri [A] and that of the fourth link is Pri [B], it is determined that the primary key information sent by the storage data units of different links differs.
If the primary key information received by the collation database from the storage data units differs, this indicates that the target data streams written into the storage data units may not be the same and that an error may have occurred.
In yet another example, the collation database receives Pri [A] from the kafka message queue of the first link, Pri [A] from the Hbase database of the second link, and Pri [A] from the two kafka message queues of the third link, but receives no primary key information from the ES database of the fourth link. Since the number of pieces of primary key information is three and the total number of links is four, it is determined that the collation database has not received primary key information from all storage data units.
If the collation database has not received primary key information from all storage data units, this indicates that the target data stream may not have been successfully written into every storage data unit.
In the embodiment of the application, if the collation database receives primary key information from all storage data units, the target data stream has been written into every storage data unit; if the received primary key information is also identical, it is preliminarily judged that the target data streams written into the storage data units are likely the same. When the primary key information sent by the storage data units of different links differs, or the number of pieces of primary key information received is smaller than the total number of links, at least one storage data unit has not had the target data stream successfully written into it. In that case the step of acquiring the data stream from the source database is re-executed, and the target data stream is written into the storage data units again.
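By way of illustration only, the preliminary detection of steps S500 to S700 can be sketched in Python as follows. The function name `preliminary_check`, the dictionary layout and the use of `None` for a link that sent nothing are assumptions made for this sketch and do not appear in the application.

```python
from typing import Dict, Optional


def preliminary_check(received_keys: Dict[str, Optional[str]], total_links: int) -> bool:
    """Return True when every link reported a primary key and all keys match.

    received_keys maps a (hypothetical) storage data unit name to the primary
    key it sent, or to None when nothing was received from that unit.
    """
    # Keep only the primary keys that actually reached the collation database.
    keys = [key for key in received_keys.values() if key is not None]
    # S500: compare the number of received keys with the total number of links.
    if len(keys) != total_links:
        return False  # S700: the caller should re-acquire the data stream
    # S500/S600: all received primary keys must be identical.
    return len(set(keys)) == 1
```

With the three examples above, four identical keys pass the check, a fourth key of B fails it, and a missing fourth key also fails it; in both failure cases the caller would re-execute the step of acquiring the data stream from the source database.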
In other embodiments, the step of preliminarily detecting, according to the primary key information, whether the target data stream has been successfully written into the storage data units of different links is performed before the step of determining the digest value corresponding to the target data stream written into the storage data units of different links.
If the preliminary detection according to the primary key information finds that the target data stream has not been successfully written into the storage data units of all links, the step of determining the digest value corresponding to the target data stream written into the storage data units of different links is not executed. If the preliminary detection finds that the target data stream has been successfully written into the storage data units of all links, the method continues with determining the digest value corresponding to the target data stream written into the storage data units of different links.
In some embodiments, fig. 5 illustrates a flow chart of yet another data real-time checking method provided in accordance with some embodiments. The target data stream further includes field information; the step of determining a digest value corresponding to the target data stream includes:
S301, detecting whether the field information of the target data streams written into the storage data units of different links has intersection content. The intersection content refers to the content that is the same across the field information.
In one example, as shown in fig. 3, the field information sent by the kafka message queue of the first link is Attr [B\C\D\E\F], the field information sent by the Hbase database of the second link is Attr [B\C\D\E], the field information sent by the two kafka message queues of the third link is Attr [B\C\D\E], and the field information sent by the ES database of the fourth link is Attr [B\C\E]. The field information of the target data stream written into the storage data units of different links therefore has intersection content, namely Attr [B\C\E].
In another example, as shown in fig. 3, the field information sent by the kafka message queue of the first link is Attr [B\C\D\E\F], the field information sent by the Hbase database of the second link is Attr [B\C\D\E], the field information sent by the two kafka message queues of the third link is Attr [B\C\D\E], and the field information sent by the ES database of the fourth link is Attr [G]. It is therefore determined that the field information of the target data stream written into the storage data units of different links has no intersection content.
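The detection of step S301 amounts to a set intersection over the field information reported by each link. A minimal sketch, assuming the field information arrives as a list of field names per storage data unit (the function name `field_intersection` and the dictionary keys are hypothetical):

```python
from typing import Dict, List, Set


def field_intersection(fields_by_unit: Dict[str, List[str]]) -> Set[str]:
    """Return the fields that every storage data unit reported (S301)."""
    field_sets = [set(fields) for fields in fields_by_unit.values()]
    if not field_sets:
        return set()  # nothing reported, so there is no intersection content
    # Intersect the field sets of all links in one call.
    return set.intersection(*field_sets)
```

For the first example the result is the set {B, C, E}; for the second example, where the fourth link reports only G, the result is the empty set, meaning no intersection content exists.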
S302, if the intersection content exists, determining the digest value corresponding to the target data stream according to the primary key information and the intersection content.
For example, the primary key information of the target data stream written into the storage data units of the four links is Pri [A] and the intersection content is Attr [B\C\E]; the primary key information and the intersection content are combined to obtain the data to be processed, A\B\C\E. A digest value (an MD5 value) is then calculated over the data to be processed A\B\C\E.
The embodiment of the application does not limit the method used to calculate the digest value; digest values calculated from different content are different.
It can be understood that, in the embodiment of the present application, the more content the data to be processed covers, the more accurate the collation result, that is, the more accurately it can be judged whether the target data stream was successfully written into the storage data units. The embodiment of the application therefore obtains the intersection content of the field information of the target data stream written into the storage data units of different links and, if intersection content exists, calculates the digest value from the intersection content together with the primary key information, which improves the accuracy of the collation result.
S303, if the intersection content does not exist, determining the digest value corresponding to the target data stream according to the primary key information.
For example, the primary key information of the target data stream written into the storage data units of the four links is Pri [A] and no intersection content exists, so the data to be processed, A, is determined from the primary key information Pri [A]. A digest value is then calculated over the data to be processed A.
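Steps S302 and S303 can be sketched as a single function that falls back to the primary key alone when the intersection is empty. This is only an illustration under stated assumptions: the function name `digest_value` is hypothetical, the backslash separator and sorted ordering merely follow the A\B\C\E notation of the examples, and MD5 is used because the example mentions an MD5 value, although the application does not limit the digest algorithm.

```python
import hashlib
from typing import Iterable


def digest_value(primary_key: str, intersection: Iterable[str] = ()) -> str:
    """Compute a digest over the primary key plus any intersection content."""
    # Sort the intersection so that every link builds the same byte sequence.
    parts = [primary_key] + sorted(intersection)
    # Join with a backslash, mirroring the A\B\C\E notation used above.
    data_to_process = "\\".join(parts)
    return hashlib.md5(data_to_process.encode("utf-8")).hexdigest()
```

Calling `digest_value("A", {"B", "C", "E"})` hashes the string A\B\C\E (S302), while `digest_value("A")` hashes A alone (S303). Sorting is a design choice of this sketch so that links reporting the same fields in a different order still produce the same digest.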
In other embodiments, the step of determining a digest value corresponding to the target data stream includes: directly determining the digest value corresponding to the target data stream according to the primary key information. In such embodiments, the step of detecting whether the field information sent by the storage data units of different links has intersection content is not executed, and the primary key information alone is used to determine the digest value corresponding to the target data stream.
In some embodiments, fig. 6 illustrates a flow chart of yet another data real-time checking method provided in accordance with some embodiments. The method further comprises the steps of:
S800, generating a unit identifier corresponding to the storage data unit. In the embodiment of the application, the unit identifier is a unique identifier of a storage data unit and corresponds one-to-one with the storage data unit.
In some embodiments, the unit identifier may include the storage data unit name and the link order. Illustratively, in Attr [es,4], es is the storage data unit name and 4 is the link order, indicating the fourth link.
S900, sending the unit identifier to the collation database, so that if the number of digest values is smaller than the total number of links, the collation database determines, through the unit identifiers, the storage data units into which the target data stream was not successfully written.
In the embodiment of the application, the unit identifiers are sent to the collation database to make it easier to find the storage data units into which the target data stream was not successfully written: the storage data unit corresponding to a missing digest value is looked up according to the unit identifiers. Illustratively, the unit identifiers received by the collation database are Attr [kafka,1], Attr [Hbase,2] and Attr [kafka2,3], while the total number of links is 4; the unit identifier of the storage data unit with link order 4 is therefore absent, and it is determined that the target data stream was not successfully written into the storage data unit with link order 4.
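The lookup described above can be sketched as follows, assuming that link orders run from 1 to the total number of links and that each unit identifier is modelled as a (name, link order) pair. The type alias `UnitId` and the function name `missing_link_orders` are hypothetical.

```python
from typing import List, Tuple

# (storage data unit name, link order), e.g. ("es", 4)
UnitId = Tuple[str, int]


def missing_link_orders(received: List[UnitId], total_links: int) -> List[int]:
    """Return the link orders whose unit identifier never arrived."""
    seen = {order for _name, order in received}
    # Assumption of this sketch: link orders are numbered 1..total_links.
    return [order for order in range(1, total_links + 1) if order not in seen]
```

For the example above, `missing_link_orders([("kafka", 1), ("Hbase", 2), ("kafka2", 3)], 4)` returns `[4]`, identifying the fourth link as the one whose storage data unit was not successfully written.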
In some embodiments, after the step of determining, through the unit identifiers, the storage data units into which the target data stream was not successfully written, the method further comprises:
deleting the digest values and the unit identifiers stored in the collation database, and repeating the step of acquiring the data stream from the source database.
In the embodiment of the application, because there is a storage data unit into which the target data stream was not successfully written, the content used for collation in the collation database, namely the stored digest values and unit identifiers, is deleted. The step of acquiring the data stream from the source database is then repeated, and the target data stream is written into the storage data units of different links again.
In some embodiments, if it is still determined after the process is repeated that there are storage data units into which the target data stream was not successfully written, the process may be repeated again until the number of repetitions reaches a preset number. For example, the preset number may be 3; once the process has been repeated 3 times, it is not repeated again.
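The bounded repetition can be sketched as a small retry loop. Everything named here is an assumption for illustration: `run_once` stands for one pass of acquiring, writing and collating that returns whether collation succeeded, and `reset` stands for deleting the digest values and unit identifiers stored in the collation database.

```python
from typing import Callable


def run_with_retries(run_once: Callable[[], bool],
                     reset: Callable[[], None],
                     max_repeats: int = 3) -> bool:
    """Run the process once, then repeat it at most max_repeats times."""
    if run_once():
        return True
    for _ in range(max_repeats):
        reset()        # clear stored digest values and unit identifiers
        if run_once():
            return True
    return False       # preset number of repetitions reached; stop repeating
```

With the default of 3, a process that never succeeds is attempted four times in total (the initial pass plus three repetitions) before the function gives up, at which point a prompt report could be raised.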
In some embodiments, when it is determined that there is a storage data unit into which the target data stream was not successfully written, a prompt report may be sent to the terminal of the relevant personnel, so that after viewing the prompt report on the terminal they can inspect the whole process of writing the target data stream into the storage data units, locate the specific problem, and correct it.
The data real-time checking method of the embodiment of the application can be implemented based on Hadoop (a framework for running big-data software systems) components.
In the embodiment of the application, it can be checked whether the target data stream has been successfully written into the storage data units of a plurality of links, differences between databases are masked, and data stream problems are found in time. Because the data stream is written into the storage data units in real time, the service response time of the terminal equipment is shorter, which helps an enterprise respond to service demands more quickly. Real-time data processing and collation can automatically detect abnormalities and errors in the data stream, thereby reducing data quality problems. Real-time data acquisition, processing and collation help to avoid data problems such as data delay and data errors, and improve data quality and accuracy. The collation database can support a large number of data streams, and its processing capacity can be extended to accommodate growth in the amount of data delivered from the source database.
In the data real-time checking method provided by the above embodiment, part or all of the data stream of the source database is written, as the target data stream, into the storage data units of different links, and the collation database uses the digest values corresponding to the target data stream written into the storage data units of different links to check whether the target data stream was successfully written into them, thereby checking in real time the target data streams written into the storage data units of multiple links. The method comprises the following steps:
obtaining a data stream from a source database;
writing a target data stream into storage data units of different links, wherein the target data stream is part or all of the data stream;
determining digest values corresponding to the target data streams written into the storage data units of different links;
transmitting all the digest values to a collation database to cause the collation database to execute a collation data method, wherein the collation data method comprises: storing the digest value corresponding to the target data stream; comparing whether the digest values are the same, counting the number of digest values, and comparing that number with the total number of links; and if all the digest values are the same and the number of digest values equals the total number of links, determining that the target data stream was successfully written into all the storage data units.
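The collation data method executed by the collation database (store, compare, count) can be illustrated with a small in-memory stand-in. The class name `CollationStore` and the idea of grouping digest values under a `stream_id` key are assumptions of this sketch; the application does not specify how the collation database organises its records.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CollationStore:
    """In-memory stand-in for the collation database (illustrative only)."""

    def __init__(self, total_links: int) -> None:
        self.total_links = total_links
        self._digests: DefaultDict[str, List[str]] = defaultdict(list)

    def store(self, stream_id: str, digest: str) -> None:
        # Store the digest value reported for one link of this target data stream.
        self._digests[stream_id].append(digest)

    def all_written(self, stream_id: str) -> bool:
        digests = self._digests.get(stream_id, [])
        # The count must equal the total number of links and all values must match.
        return len(digests) == self.total_links and len(set(digests)) == 1
```

A caller would invoke `store` once per link and then `all_written`; a result of False covers both a mismatched digest value and a missing one.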
Further, as a specific implementation of the methods of fig. 1 and fig. 4-6, an embodiment of the present application provides a data real-time checking device, a schematic structural diagram of which is shown in fig. 7. The device includes:
an obtaining unit 701, configured to obtain a data stream from a source database;
a writing unit 702, configured to write a target data stream into storage data units of different links, where the target data stream is a part or all of the data streams;
a determining unit 703, configured to determine digest values corresponding to the target data streams written into the storage data units of different links;
a transmitting unit 704, configured to transmit all the digest values to a collation database, so that the collation database performs a collation data method, wherein the collation data method comprises: storing the digest value corresponding to the target data stream; comparing whether the digest values are the same, counting the number of digest values, and comparing that number with the total number of links; and if all the digest values are the same and the number of digest values equals the total number of links, determining that the target data stream was successfully written into all the storage data units.
In a specific application scenario, the apparatus further includes:
a detection unit, configured to detect whether the field information of the target data stream written into the storage data units of different links has intersection content;
a first determining unit configured to determine a digest value corresponding to the target data stream based on the primary key information and the intersection content if the intersection content exists;
and a second determining unit configured to determine a digest value corresponding to the target data stream according to the primary key information if the intersection content does not exist.
It should be noted that, for other descriptions of the functional units of the data real-time checking device provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 and fig. 4-6, which are not repeated here.
In a specific application scenario, the apparatus further includes:
a generating unit, configured to generate a unit identifier corresponding to the storage data unit;
a comparison unit, configured to send the unit identifier to the collation database, so that if the number of digest values is smaller than the total number of links, the collation database determines, through the unit identifiers, the storage data units into which the target data stream was not successfully written.
It should be noted that, for other descriptions of the functional units of the data real-time checking device provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 and fig. 4-6, which are not repeated here.
Based on the above methods shown in fig. 1 and fig. 4-6, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, where the program is executed by a processor to implement the data real-time collation method shown in fig. 1 and fig. 4-6.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing an electronic device (may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective implementation scenario of the present application.
Based on the methods shown in fig. 1 and fig. 4-6 and the virtual device embodiment shown in fig. 7, in order to achieve the above objects, the embodiment of the present application further provides a physical device for checking data in real time, which may specifically be a computer, a smart phone, a tablet computer, a smart watch, a server, or a network device, where the physical device includes a storage medium and a processor; the storage medium is used for storing a computer program; the processor is used for executing the computer program to implement the data real-time checking method described above and shown in fig. 1 and fig. 4-6.
Optionally, the physical device may further include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a WI-FI module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface), and the like.
It will be appreciated by those skilled in the art that the structure of the electronic device provided in this embodiment does not constitute a limitation on the electronic device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the electronic device and supports the execution of information processing programs and other software and/or programs. The network communication module is used for implementing communication among the components within the storage medium, as well as communication with other hardware and software in the physical device.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of one preferred implementation scenario, and that the units or processes in the drawing are not necessarily required to practice the application. Those skilled in the art will also appreciate that the units of the apparatus in an implementation scenario may be distributed in the apparatus as described for that implementation scenario, or may be correspondingly changed and located in one or more apparatuses different from the present implementation scenario. The units of the implementation scenario may be combined into one unit, or may be further split into a plurality of sub-units.
The above serial numbers of the application are for description only and do not represent the relative merits of the implementation scenarios. The foregoing discloses only some embodiments of the application; the application is not limited thereto, and those skilled in the art may make modifications without departing from the scope of the application.

Claims (10)

1. A method for data real-time collation, comprising:
obtaining a data stream from a source database;
writing a target data stream into storage data units of different links, wherein the target data stream is part or all of the data stream;
determining digest values corresponding to the target data streams written into the storage data units of different links;
transmitting all the digest values to a collation database to cause the collation database to execute a collation data method, wherein the collation data method comprises: storing the digest value corresponding to the target data stream; comparing whether the digest values are the same, counting the number of digest values, and comparing that number with the total number of links; and if all the digest values are the same and the number of digest values equals the total number of links, determining that the target data stream was successfully written into all the storage data units.
2. The method of claim 1, wherein the target data stream includes primary key information; the method further comprises the steps of:
sending the primary key information to the collation database, so that before executing the collation data method the collation database receives the primary key information and preliminarily detects, according to the primary key information, whether the target data stream has been successfully written into the storage data units of all links.
3. The method of claim 2, wherein the step in which the collation database preliminarily detects, according to the primary key information, whether the target data stream has been successfully written into the storage data units of all links comprises: if the primary key information sent by the storage data units of different links is the same, preliminarily determining that the target data stream has been successfully written into the storage data units of all links.
4. The method of claim 2, wherein the step in which the collation database preliminarily detects, according to the primary key information, whether the target data stream has been successfully written into the storage data units of all links further comprises:
if the primary key information sent by the storage data units of different links differs, or the number of pieces of primary key information received by the collation database is smaller than the total number of links, repeating the step of acquiring the data stream from the source database.
5. The method of claim 1, wherein the target data stream further comprises field information; the step of determining the digest value corresponding to the target data stream written into the storage data unit of the different link includes:
detecting whether field information of target data streams written into storage data units of different links has intersection content or not;
if the intersection content exists, determining a digest value corresponding to the target data stream according to the primary key information and the intersection content;
and if the intersection content does not exist, determining the digest value corresponding to the target data stream according to the primary key information.
6. The method as recited in claim 1, further comprising:
generating a unit identifier corresponding to the storage data unit;
and sending the unit identifier to the collation database, so that if the number of digest values is smaller than the total number of links, the collation database determines, through the unit identifiers, the storage data units into which the target data stream was not successfully written.
7. The method of claim 6, further comprising, after the step of determining, through the unit identifiers, the storage data units into which the target data stream was not successfully written:
deleting the digest values and the unit identifiers stored in the collation database, and repeating the step of acquiring the data stream from the source database.
8. A data real-time collation apparatus, comprising:
an acquisition unit for acquiring a data stream from a source database;
a writing unit, configured to write a target data stream into storage data units of different links, wherein the target data stream is part or all of the data stream;
a determining unit, configured to determine digest values corresponding to the target data streams written into the storage data units of different links;
a transmitting unit, configured to transmit all the digest values to a collation database so that the collation database performs a collation data method, wherein the collation data method comprises: storing the digest value corresponding to the target data stream; comparing whether the digest values are the same, counting the number of digest values, and comparing that number with the total number of links; and if all the digest values are the same and the number of digest values equals the total number of links, determining that the target data stream was successfully written into all the storage data units.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the data stream real-time collation method according to any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the data stream real-time collation method according to any one of claims 1 to 7.
CN202310729761.XA 2023-06-19 2023-06-19 Data real-time checking method and device, electronic equipment and storage medium Pending CN116701375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310729761.XA CN116701375A (en) 2023-06-19 2023-06-19 Data real-time checking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310729761.XA CN116701375A (en) 2023-06-19 2023-06-19 Data real-time checking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116701375A true CN116701375A (en) 2023-09-05

Family

ID=87830945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310729761.XA Pending CN116701375A (en) 2023-06-19 2023-06-19 Data real-time checking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116701375A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination