CN115374137A

CN115374137A - Stream data processing method, device, storage medium and equipment

Info

Publication number: CN115374137A
Application number: CN202211032717.5A
Authority: CN
Inventors: 范佳佳; 文国军; 刘美花; 夏鼎玺; 余静莹; 张海洋
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2022-11-22

Abstract

The application discloses a stream data processing method, apparatus, storage medium and device, applied to the field of big data. The method comprises the following steps: acquiring stream data from a preset message queue; processing each data to obtain a processing result of each data; determining a first check code of the data based on the fields contained in the data; performing check code conversion on the processing result to obtain a second check code of the data; when the first check code differs from the second check code, correcting the processing result to obtain a correction result meeting a preset requirement; and storing the data and the correction result in a target database. By comparing the first check code and the second check code of the data in the stream data, and correcting the processing result of the data when the two differ so that the correction result meets the preset requirement, the method keeps the processed data consistent with the data before processing.

Description

Stream data processing method, device, storage medium and equipment

Technical Field

The present application relates to the field of big data, and in particular, to a stream data processing method, apparatus, storage medium, and device.

Background

At present, as data volumes grow rapidly, large amounts of business data are generated in various business scenarios at every moment. When this continuously generated data is processed, the processed data may fail to land in the respective databases as the business requires. Ensuring that the data is consistent before and after processing has therefore become an urgent problem in the field.

Disclosure of Invention

The application provides a stream data processing method, apparatus, storage medium and device, with the aim of ensuring that the processed data is consistent with the data before processing.

In order to achieve the above object, the present application provides the following technical solutions:

A stream data processing method, comprising:

acquiring streaming data from a preset message queue; the streaming data comprises a sequence of data; the data sequence comprises a plurality of data uploaded to the preset message queue through a source database, and each data is sequenced according to the uploading sequence of the data;

processing each data to obtain a processing result of each data;

determining a first check code of the data based on a field contained in the data;

performing check code conversion on the processing result to obtain a second check code of the data;

under the condition that the first check code is different from the second check code, correcting the processing result to obtain a correction result meeting a preset requirement; the preset requirements are as follows: a second check code obtained by performing check code conversion on the correction result is consistent with the first check code;

and storing the data and the correction result into a target database.

Optionally, processing each data to obtain a processing result of each data includes:

assigning a timestamp and a watermark to each data; the timestamp indicates the processing time of the data; the watermark indicates the delay time applied when processing the data;

and processing each data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

Optionally, processing each data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

deleting the data that meet preset conditions to obtain valid stream data; the preset conditions are: a field value of the data is null; a field of the data contains a preset sensitive character;

and processing each data in the valid stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

Optionally, processing each data in the valid stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

performing dimensionality reduction on each data in the valid stream data to remove its redundant attribute columns, obtaining target stream data;

and processing each data in the target stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

Optionally, processing each data in the target stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

classifying each data in the target stream data to obtain a plurality of data packets; each data packet comprises a plurality of data sharing the same preset attribute;

and, for each data packet, processing each data in the data packet in turn, in order of processing time from earliest to latest and after delaying each data by its delay time, to obtain the processing result of each data in the data packet.

Optionally, determining the first check code of the data based on the fields contained in the data includes:

concatenating all fields contained in the data to obtain a character string of the data;

and carrying out check code conversion on the character string to obtain a first check code of the data.

Optionally, after performing check code conversion on the processing result to obtain a second check code of the data, the method further includes:

and, when the first check code is the same as the second check code, storing the data and the processing result directly in the target database.

A stream data processing apparatus comprising:

an acquisition unit, configured to acquire stream data from a preset message queue; the stream data comprises a data sequence; the data sequence comprises a plurality of data uploaded to the preset message queue by a source database, and the data are ordered by the sequence in which they were uploaded;

a processing unit, configured to process each data to obtain a processing result of each data;

a determining unit, configured to determine a first check code of the data based on the fields contained in the data;

a conversion unit, configured to perform check code conversion on the processing result to obtain a second check code of the data;

a correction unit, configured to correct the processing result when the first check code differs from the second check code, to obtain a correction result meeting a preset requirement; the preset requirement is: a second check code obtained by performing check code conversion on the correction result is consistent with the first check code;

and a storage unit, configured to store the data and the correction result in a target database.

A computer-readable storage medium comprising a stored program, wherein the program, when run, executes the stream data processing method.

A stream data processing device, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for running the program, wherein the program, when run, executes the stream data processing method.

According to the above technical solution, stream data is acquired from the preset message queue; each data is processed to obtain its processing result; a first check code of the data is determined based on the fields contained in the data; check code conversion is performed on the processing result to obtain a second check code of the data; when the first check code differs from the second check code, the processing result is corrected to obtain a correction result meeting the preset requirement; and the data and the correction result are stored in the target database. By comparing the first and second check codes of the data in the stream data and correcting the processing result whenever they differ, the present application keeps the processed data consistent with the data before processing.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1a is a schematic flowchart of a stream data processing method according to an embodiment of the present application;

Fig. 1b is a schematic flowchart of a stream data processing method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another stream data processing method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an architecture of a stream data processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1a and Fig. 1b are schematic flowcharts of a stream data processing method provided in an embodiment of the present application; the method includes the following steps:

S101: Acquire the stream data from a preset message queue.

Wherein the stream data comprises a data sequence; the data sequence comprises a plurality of data uploaded to the preset message queue by a source database, and the data are ordered by the sequence in which they were uploaded. The preset message queue includes, but is not limited to, Kafka.

It should be noted that the stream data includes a data sequence made up of a plurality of data, sorted from earliest to latest according to the time at which each data was stored in the preset message queue.

Optionally, the DataStream interface provided by the Flink framework may be used as a message consumer of the preset message queue, and the DataStream interface may be called to obtain stream data from the preset message queue.
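The ordering guarantee described for S101 can be illustrated with a minimal in-memory sketch. It does not use the Flink or Kafka APIs: `Record`, `InMemoryQueue`, `publish` and `consume` are hypothetical names, and a single FIFO queue stands in for one Kafka partition, so consumption order equals upload order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterator


@dataclass
class Record:
    """One data item uploaded from the source database (hypothetical shape)."""
    fields: Dict[str, str]
    upload_time: int  # hypothetical upload timestamp, e.g. epoch milliseconds


class InMemoryQueue:
    """Hypothetical stand-in for the preset message queue (one partition).

    Records are appended in upload order, so consuming from the head
    yields them from earliest to latest upload.
    """

    def __init__(self) -> None:
        self._items: Deque[Record] = deque()

    def publish(self, record: Record) -> None:
        # Producer side: the source database uploads a record.
        self._items.append(record)

    def consume(self) -> Iterator[Record]:
        # Consumer side: drain the queue in FIFO (upload) order.
        while self._items:
            yield self._items.popleft()


queue = InMemoryQueue()
for t in (100, 200, 300):
    queue.publish(Record({"id": str(t)}, upload_time=t))
print([r.upload_time for r in queue.consume()])  # [100, 200, 300]
```

Note that Kafka itself only guarantees ordering within a single partition, so a topic with several partitions would not give a single global upload order.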

S102: Assign a timestamp and a watermark to each data.

Wherein the time stamp indicates a processing time of the data and the watermark indicates a delay time when the data is processed.

It should be noted that the Flink framework may be used to assign the timestamp and the watermark to each data in the stream data.
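The buffering behaviour implied by a timestamp plus a delay can be sketched as follows, in the spirit of Flink's bounded-out-of-orderness watermarks. This is an illustrative assumption, not Flink code: the watermark is taken to be the largest timestamp seen so far minus a fixed `delay`, and `release_in_time_order` is a hypothetical helper that yields each data only once the watermark has reached its timestamp.

```python
import heapq
import itertools
from typing import Iterable, Iterator, List, Optional, Tuple


def release_in_time_order(
    items: Iterable[Tuple[int, str]], delay: int
) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, payload) pairs from earliest to latest timestamp.

    Each item is buffered until the watermark, defined here as the largest
    timestamp seen so far minus `delay`, reaches its timestamp. An item that
    arrives after the watermark has already passed it is yielded right away
    (a production system might drop it or route it to a side output instead).
    """
    buffer: List[Tuple[int, int, str]] = []
    seq = itertools.count()  # tie-breaker: equal timestamps keep arrival order
    max_ts: Optional[int] = None
    for ts, payload in items:
        heapq.heappush(buffer, (ts, next(seq), payload))
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - delay
        while buffer and buffer[0][0] <= watermark:
            t, _, p = heapq.heappop(buffer)
            yield t, p
    # End of stream: flush whatever is still buffered, in timestamp order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield t, p


arrivals = [(1, "a"), (3, "c"), (2, "b"), (7, "d"), (5, "e")]
print(list(release_in_time_order(arrivals, delay=2)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (5, 'e'), (7, 'd')]
```

A larger `delay` tolerates more out-of-order arrivals at the cost of holding each data longer before it is processed.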

S103: Delete the data that meet the preset conditions to obtain valid stream data.

Wherein the preset conditions are: a field value of the data is null; a field of the data contains a preset sensitive character.

Note that the data includes fields and field values.

Optionally, the filter operator provided by the Flink framework may be called to delete the data that meet the preset conditions from the stream data, obtaining the valid stream data. Specifically, in the logic of the filter operator, if a data does not meet the preset conditions, i.e., the filter predicate evaluates to true for that data, the data is retained; if the data meets the preset conditions, i.e., the filter predicate evaluates to false, the data is deleted.
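A plain-Python sketch of the filter predicate described above follows. The sensitive character set `SENSITIVE_CHARS` is hypothetical, an empty string is treated as null, and the sketch assumes that either condition alone is enough to delete a record; in Flink the same predicate logic would be supplied to the filter operator.

```python
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical preset sensitive characters; the actual set is not specified.
SENSITIVE_CHARS = frozenset("<>;'")


def keep(record: Record) -> bool:
    """Filter predicate: True keeps the record, False deletes it.

    Assumption: a record is deleted as soon as ANY field value is null
    (None or empty) or ANY field value contains a preset sensitive character.
    """
    for value in record.values():
        if value is None or value == "":
            return False
        if any(ch in SENSITIVE_CHARS for ch in value):
            return False
    return True


def to_valid_stream(records: Iterable[Record]) -> List[Record]:
    # Equivalent of applying filter(keep) to the stream.
    return [r for r in records if keep(r)]


stream = [
    {"account": "1001", "memo": "salary"},
    {"account": None, "memo": "refund"},      # null field value
    {"account": "1002", "memo": "x'; DROP"},  # sensitive characters
]
print(to_valid_stream(stream))  # [{'account': '1001', 'memo': 'salary'}]
```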

S104: Perform dimensionality reduction on each data in the valid stream data to remove its redundant attribute columns, obtaining target stream data.

A redundant attribute column is an extra attribute column contained in the data structure, i.e., one that is not selected for further processing.

Optionally, the flatMap operator provided by the Flink framework may be used to adjust the data structure of each data in the valid stream data, removing the attribute columns that were not pre-selected, to obtain the target stream data. Specifically, the collect method provided by the Flink framework is typically used to emit the selected attribute columns of the data structure; the remaining, unselected attribute columns are regarded as redundant and are removed.
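The column selection can be sketched as a flatMap-style generator that emits one projected record per input. `SELECTED_COLUMNS` and all column names are hypothetical; in Flink, the emitting step would correspond to calling `collect` on the `Collector` passed to a `FlatMapFunction`.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Hypothetical pre-selected attribute columns; all others count as redundant.
SELECTED_COLUMNS = ("account", "initiator", "amount", "event_time")


def project(record: Dict[str, str],
            selected: Sequence[str] = SELECTED_COLUMNS) -> Iterator[Dict[str, str]]:
    """flatMap-style function: emits one record restricted to the selected
    columns, in the order given by `selected`. Unselected columns are dropped;
    selected columns missing from the record are skipped."""
    yield {col: record[col] for col in selected if col in record}


def to_target_stream(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    # Flatten the output of project() for every input record, as flatMap would.
    return [out for r in records for out in project(r)]


raw = {"account": "1001", "initiator": "u1", "amount": "25.00",
       "event_time": "1700000000", "etl_batch": "7", "debug_flag": "0"}
print(to_target_stream([raw]))
```

Here `etl_batch` and `debug_flag` play the role of redundant attribute columns and do not appear in the output.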

S105: Classify each data in the target stream data to obtain a plurality of data packets.

Each data packet comprises a plurality of data sharing the same preset attribute.

For example, transaction data (a specific form of the data) with the same transaction initiator (a specific form of the preset attribute) may be placed in the same data packet.

Optionally, the keyBy operator provided by the Flink framework may be used to classify the data in the target stream data, obtaining a plurality of data packets.
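Grouping by a preset attribute can be sketched with a dictionary of lists, assuming the transaction initiator is stored under a hypothetical `initiator` field. This is a batch analogy only: Flink's keyBy partitions a continuous stream by key rather than materialising lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Record = Dict[str, str]


def group_by_attribute(records: Iterable[Record],
                       key: str = "initiator") -> Dict[str, List[Record]]:
    """Split records into data packets that share the same value of `key`
    (e.g. the transaction initiator). Arrival order is preserved inside each
    packet, and packets appear in order of each key's first occurrence."""
    packets: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        packets[r[key]].append(r)
    return dict(packets)


txns = [
    {"initiator": "alice", "amount": "10"},
    {"initiator": "bob", "amount": "5"},
    {"initiator": "alice", "amount": "7"},
]
for initiator, packet in group_by_attribute(txns).items():
    print(initiator, [t["amount"] for t in packet])
# alice ['10', '7']
# bob ['5']
```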

S106: For each data packet, process each data in the data packet in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data in the data packet.

The specific implementation process of processing data is common knowledge familiar to those skilled in the art, and is not described herein again.

Optionally, each data in the data packet may be processed using the process operator provided by the Flink framework, to improve the processing efficiency of the data.

S107: For each data in the data packet, concatenate all fields contained in the data to obtain a character string of the data.

The fields contained in the data may be concatenated in a preset order, namely the front-to-back order of the fields' positions in the data. For example, if the data contains a field AA, a field BB and a field CC, where AA comes first, BB is in the middle and CC comes last, the concatenated character string is AABBCC.
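The concatenation rule can be sketched as follows, assuming each data is a dictionary whose insertion order models field position and that field values, not names, are concatenated. The helper name `concat_fields` and its `sep` parameter are hypothetical.

```python
from typing import Dict


def concat_fields(record: Dict[str, str], sep: str = "") -> str:
    """Join field values front to back (dict insertion order models position)."""
    return sep.join(record.values())


data = {"f1": "AA", "f2": "BB", "f3": "CC"}
print(concat_fields(data))  # AABBCC
```

With the default empty separator, different records can produce the same string ("A" + "ABB" and "AA" + "BB" both give "AABB"), so they would also share a check code; joining with a separator that cannot occur in field values is one way to avoid this.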

S108: Perform check code conversion on the character string of the data to obtain the first check code of the data.

The MD5 message digest algorithm may be used to perform check code (checksum) conversion on the character string of the data, so as to obtain a first check code of the data.
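Using Python's standard `hashlib`, the MD5 conversion of the concatenated string might look like this. `check_code` and `first_check_code` are hypothetical helper names; the digest is rendered as 32 lowercase hexadecimal characters.

```python
import hashlib
from typing import Dict


def check_code(text: str) -> str:
    """MD5 message digest of the UTF-8 bytes of `text`, as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def first_check_code(record: Dict[str, str]) -> str:
    # Concatenate field values in positional order, then digest the string.
    return check_code("".join(record.values()))


print(check_code("abc"))  # 900150983cd24fb0d6963f7d28e17f72 (RFC 1321 test vector)
```

MD5 is adequate for detecting accidental changes to the data, but it is not collision-resistant, so it should not be relied on to detect deliberate tampering.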

S109: Perform check code conversion on the processing result of the data to obtain the second check code of the data.

S110: When the first check code differs from the second check code, correct the processing result of the data to obtain a correction result meeting the preset requirement.

Wherein the preset requirement is: the second check code obtained by performing check code conversion on the correction result is consistent with the first check code.

Optionally, when the first check code is the same as the second check code, the data and the processing result are stored directly in the target database.

S111: Store the data and the correction result in the target database.
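The comparison, correction and storage steps S109 to S111 can be tied together in a short sketch. The source does not specify how the correction is performed, so `correct` uses a hypothetical strategy: every field lost or altered by processing is restored from the source data, in the source field order. Under the further assumption that the second check code is computed over the same fields as the first, the corrected result then satisfies the preset requirement by construction. A Python list stands in for the target database, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Fields = Dict[str, str]


def record_code(fields: Fields) -> str:
    # Check code: MD5 over the field values concatenated in positional order.
    return hashlib.md5("".join(fields.values()).encode("utf-8")).hexdigest()


def correct(original: Fields, result: Fields) -> Tuple[Fields, List[str]]:
    """Hypothetical correction strategy: restore every field that processing
    lost or altered from the source record, in the source field order.
    Returns the corrected result and the names of the repaired fields."""
    repaired = [name for name, value in original.items() if result.get(name) != value]
    corrected = {name: value for name, value in original.items()}
    return corrected, repaired


def reconcile_and_store(original: Fields, result: Fields,
                        target_db: List[Tuple[Fields, Fields]]) -> List[str]:
    """Compare first and second check codes, correct on mismatch, then store.
    `target_db` is a plain list standing in for the target database."""
    first = record_code(original)  # first check code (from the data's fields)
    second = record_code(result)   # second check code (from the processing result)
    repaired: List[str] = []
    if first != second:
        result, repaired = correct(original, result)
        # Defensive check of the preset requirement: codes must now agree.
        if record_code(result) != first:
            raise ValueError("correction did not reproduce the first check code")
    target_db.append((original, result))
    return repaired


db: List[Tuple[Fields, Fields]] = []
src = {"account": "1001", "amount": "25.00"}
print(reconcile_and_store(src, {"account": "1001", "amount": "25.00"}, db))  # []
print(reconcile_and_store(src, {"account": "1001", "amount": "25.0"}, db))   # ['amount']
print(db[1][1])  # {'account': '1001', 'amount': '25.00'}
```

Returning the names of repaired fields is a design choice for observability; a real pipeline might log them or route the record for review rather than silently overwriting the processing result.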

Through the process of S101-S111, the Flink framework can be used to improve the security of data transmission and to ensure that the data in the target database is the same as the data in the source database; for the massive data generated by numerous business systems, this improves the security of the data originating from the source database.

In summary, in this embodiment, the first check code and the second check code of the data shown in the stream data are compared, and the processing result of the data is corrected under the condition that the first check code is different from the second check code, so as to obtain a correction result meeting the preset requirement, so that the processed data is consistent with the data before processing.

It should be noted that, in the foregoing embodiment, steps S102 to S106 are one optional implementation of processing each data in the stream data processing method of the embodiments of the present application, and steps S107 and S108 are likewise one optional implementation of determining the first check code of the data. The flow of the above embodiment can therefore be summarized as the method shown in Fig. 2.

Fig. 2 is a schematic flowchart of another stream data processing method provided in an embodiment of the present application; the method includes the following steps:

S201: Acquire the stream data from a preset message queue.

Wherein the stream data comprises a data sequence; the data sequence comprises a plurality of data uploaded to the preset message queue by a source database, and the data are ordered by the sequence in which they were uploaded.

S202: Process each data to obtain a processing result of each data.

S203: Determine a first check code of the data based on the fields contained in the data.

S204: Perform check code conversion on the processing result to obtain a second check code of the data.

S205: When the first check code differs from the second check code, correct the processing result to obtain a correction result meeting the preset requirement.

Wherein the preset requirement is: the second check code obtained by performing check code conversion on the correction result is consistent with the first check code.

S206: Store the data and the correction result in the target database.

It should be noted that the stream data processing method provided by the invention can be used in fields such as artificial intelligence, blockchain, distributed systems, cloud computing, big data, the Internet of Things, mobile Internet, network security, chips, virtual reality, augmented reality, holography, quantum computing, quantum communication, quantum measurement, digital twins, and finance. The above is merely an example and does not limit the application fields of the stream data processing method provided by the invention.

The stream data processing method provided by the invention can be used in the financial field or in other fields; for example, it can be used in transaction scenarios in the financial field. The other fields are any fields other than the financial field, for example, the electric power field. The above is merely an example and does not limit the application fields of the stream data processing method provided by the invention.

Corresponding to the streaming data processing method provided by the embodiment of the present application, the embodiment of the present application further provides a streaming data processing apparatus.

Fig. 3 is a schematic architecture diagram of a stream data processing apparatus according to an embodiment of the present application; the apparatus includes:

An obtaining unit 100, configured to obtain stream data from a preset message queue; the stream data comprises a data sequence; the data sequence comprises a plurality of data uploaded to the preset message queue by a source database, and the data are ordered by the sequence in which they were uploaded.

The processing unit 200 is configured to process each data to obtain a processing result of each data.

Optionally, the processing unit 200 is specifically configured to: assign a timestamp and a watermark to each data, the timestamp indicating the processing time of the data and the watermark indicating the delay time applied when processing the data; and process each data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

The processing unit 200 is specifically configured to: delete the data that meet preset conditions to obtain valid stream data, the preset conditions being: a field value of the data is null; a field of the data contains a preset sensitive character; and process each data in the valid stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

The processing unit 200 is specifically configured to: perform dimensionality reduction on each data in the valid stream data to remove its redundant attribute columns, obtaining target stream data; and process each data in the target stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data.

The processing unit 200 is specifically configured to: classify each data in the target stream data to obtain a plurality of data packets, each data packet comprising a plurality of data sharing the same preset attribute; and, for each data packet, process each data in the data packet in turn, in order of processing time from earliest to latest and after delaying each data by its delay time, to obtain the processing result of each data in the data packet.

A determining unit 300, configured to determine a first check code of the data based on a field included in the data.

Optionally, the determining unit 300 is specifically configured to: concatenate all fields contained in the data to obtain a character string of the data; and perform check code conversion on the character string to obtain the first check code of the data.

A conversion unit 400, configured to perform check code conversion on the processing result to obtain a second check code of the data.

A correcting unit 500, configured to correct the processing result to obtain a correction result meeting a preset requirement when the first check code is different from the second check code; the preset requirements are as follows: the second check code obtained by converting the check code of the correction result is consistent with the first check code.

A saving unit 600, configured to save the data and the correction result in the target database.

Optionally, the saving unit 600 is further configured to store the data and the processing result directly in the target database when the first check code is the same as the second check code.

The present application also provides a computer-readable storage medium including a stored program, wherein the program executes the streaming data processing method provided by the present application.

The present application also provides a stream data processing device, including a processor, a memory and a bus. The processor is connected to the memory through the bus; the memory is used for storing a program, and the processor is used for running the program, wherein the program, when run, executes the stream data processing method provided by the present application, which comprises the following steps:

processing each data to obtain a processing result of each data;

and storing the data and the correction result into a target database.

Specifically, on the basis of the foregoing embodiment, processing each data to obtain the processing result of each data includes:

assigning a time stamp and a watermark to each of said data; the timestamp indicates a processing time of the data; the watermark indicating a delay time in processing the data;

Specifically, on the basis of the foregoing embodiment, processing each data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

Specifically, on the basis of the foregoing embodiment, processing each data in the valid stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

Specifically, on the basis of the foregoing embodiment, processing each data in the target stream data in order of processing time from earliest to latest, after delaying each data by its delay time, to obtain the processing result of each data includes:

Specifically, on the basis of the foregoing embodiment, the determining a first check code of the data based on a field included in the data includes:

Specifically, on the basis of the foregoing embodiment, after performing check code conversion on the processing result to obtain the second check code of the data, the method further includes:

The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the technical solutions of the embodiments of the present application that contributes to the prior art, or part of those technical solutions, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A stream data processing method, characterized by comprising:

acquiring stream data from a preset message queue; the streaming data comprises a sequence of data; the data sequence comprises a plurality of data uploaded to the preset message queue through a source database, and each data is sequenced according to the uploading sequence of the data;

processing each data to obtain a processing result of each data;

and storing the data and the correction result into a target database.

2. The method according to claim 1, wherein said processing the respective data to obtain a processing result of each of the data comprises:

3. The method of claim 1, wherein processing each of the data in the order of the processing time from earliest to latest, after delaying each of the data by its delay time, to obtain the processing result of each of the data comprises:

4. The method according to claim 3, wherein processing each of the data in the valid stream data in the order of the processing time from earliest to latest, after delaying each of the data by its delay time, to obtain the processing result of each of the data comprises:

5. The method according to claim 4, wherein processing each of the data in the target stream data in the order of the processing time from earliest to latest, after delaying each of the data by its delay time, to obtain the processing result of each of the data comprises:

6. The method of claim 1, wherein determining the first check code of the data based on the field included in the data comprises:

7. The method of claim 1, wherein after performing the check code conversion on the processing result to obtain the second check code of the data, the method further comprises:

8. A stream data processing apparatus characterized by comprising:

the determining unit is used for determining a first check code of the data based on the field contained in the data;

9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program executes the streaming data processing method of any one of claims 1 to 7.

10. A stream data processing apparatus characterized by comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the streaming data processing method according to any one of claims 1 to 7 when running.