CN117708246A

CN117708246A - MySQL-based data lineage graph processing method, apparatus and device

Info

Publication number: CN117708246A
Application number: CN202311544923.9A
Authority: CN
Inventors: 陈日炜; 吴小前
Original assignee: Beijing Deepexi Technology Co Ltd
Current assignee: Beijing Deepexi Technology Co Ltd
Priority date: 2023-11-20
Filing date: 2023-11-20
Publication date: 2024-03-15

Abstract

The application relates to the technical field of data processing, and in particular to a MySQL-based data lineage graph processing method, apparatus and device. Raw lineage data are first obtained and stored uniformly in a message queue in JSON format. The data in the message queue are screened, formed into a lineage graph data structure, and stored in a MySQL database. A user request to view the lineage graph is then obtained, the MySQL database is queried based on that request, and the query result is returned. With this arrangement, a user can implement data lineage graph functions, or similar graph-structure functions, directly on MySQL without introducing a new database or related middleware, which reduces access, deployment and maintenance costs.

Description

MySQL-based data lineage graph processing method, apparatus and device

Technical Field

The application relates to the technical field of data processing, and in particular to a MySQL-based data lineage graph processing method, apparatus and device.

Background

Relational databases are widely used in software development, whereas NoSQL databases and graph databases that support graph data structures are far less widely deployed; an enterprise is likely to run a relational database already but may never have used a graph database. Having to introduce a new database or related middleware merely to implement a data lineage function, or a similar graph-structure function, is often undesirable for such an enterprise.

Disclosure of Invention

In view of the foregoing, embodiments of the present application provide a MySQL-based data lineage graph processing method, apparatus and device, to at least partially alleviate the above-described problems.

A first aspect of the application provides a MySQL-based data lineage graph processing method, comprising:

obtaining raw lineage data, the raw lineage data being stored uniformly in a message queue in JSON format;

screening the data in the message queue, forming a lineage graph data structure, and storing it in a MySQL database;

obtaining a user request, the user request being a request to view the lineage graph;

querying the MySQL database based on the user request, and returning a query result.

In some embodiments, the raw lineage data includes address information of the nodes associated with the data.

In some embodiments, screening the data in the message queue, forming a lineage graph data structure, and storing it in a MySQL database includes:

de-duplicating the data according to type and business uniqueness rules to obtain a data set;

storing metadata of different types in different business tables, and obtaining the set of addresses of the corresponding records;

combining the data set and the address set to obtain a lineage graph data set carrying autonomously generated addresses;

storing the lineage graph data set in a table that represents the graph data structure.

In some embodiments, the autonomously generated address includes information indicating the associated business table.

In some embodiments, the lineage graph data structure includes: an autonomously generated address, the address of a target node, and the address of a node associated with the target node, where the associated node address is the address of an upstream node or the address of a downstream node;

the autonomously generated address is followed by a pair of brackets;

inside the brackets are three pairs of quotation marks, arranged in sequence and separated by commas;

the address of the target node is placed in the first pair of quotation marks;

the address of the upstream node is placed in the second pair of quotation marks;

the address of the downstream node is placed in the third pair of quotation marks.

In some embodiments, the user request includes the address of a central node;

querying the MySQL database and returning the query result comprises:

setting the current traversal level currentLevel = 0;

querying the lineage graph data structure to obtain the addresses of all downstream nodes of the central node, and recording them as the adjacency list;

setting currentLevel to currentLevel plus 1;

determining whether currentLevel is less than MAX_LEVEL;

if so, traversing the adjacency list, querying the lineage graph data structure for each address in it to obtain the addresses of the downstream nodes of those nodes, recording these addresses as the new adjacency list, and then returning to the step of setting currentLevel to currentLevel plus 1;

if not, determining the node data of the nodes corresponding to all queried addresses, assembling the node data, and returning it as the query result;

MAX_LEVEL is a preset value or a value set by the user through the user request.

In some embodiments, the user request further includes a query-upstream flag indicating whether to query upstream;

if the query-upstream flag indicates an upstream query, querying the MySQL database further comprises:

setting the current traversal level currentLevel = 0;

querying the lineage graph data structure to obtain the addresses of all upstream nodes of the central node, and recording them as the adjacency list;

setting currentLevel to currentLevel plus 1;

determining whether currentLevel is less than MAX_LEVEL;

if so, traversing the adjacency list, querying the lineage graph data structure for each address in it to obtain the addresses of the upstream nodes of those nodes, recording these addresses as the new adjacency list, and then returning to the step of setting currentLevel to currentLevel plus 1.

In some embodiments, the lineage graph data structure includes an ETL task lineage table;

the ETL task lineage table includes: a table name, an ETL node address, the address of the start node of the ETL task lineage relationship, and the address of the end node.

A second aspect of the present application provides a MySQL-based data lineage graph processing apparatus, including:

an acquisition module, configured to obtain raw lineage data, the raw lineage data being stored uniformly in a message queue in JSON format;

a storage module, configured to screen the data in the message queue, form a lineage graph data structure, and store it in the MySQL database;

the acquisition module being further configured to obtain a user request, the user request being a request to view the lineage graph;

a query module, configured to query the MySQL database based on the user request and return a query result.

A third aspect of the present application provides an electronic device, comprising:

a processor and a memory for storing a program executable by the processor;

the processor is configured to implement the data lineage graph processing method by running the program in the memory.

In the MySQL-based data lineage graph processing method provided by the application, raw lineage data are first obtained and stored uniformly in a message queue in JSON format; the data in the message queue are screened, formed into a lineage graph data structure, and stored in a MySQL database; a user request to view the lineage graph is obtained; and the MySQL database is queried based on the user request and the query result is returned. With this arrangement, a user can implement data lineage graph functions, or similar graph-structure functions, directly on MySQL without introducing a new database or related middleware, which reduces access, deployment and maintenance costs.

Drawings

The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate the application and not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.

Fig. 1 is a flowchart of a MySQL-based data lineage graph processing method according to an embodiment of the present application.

Fig. 2 is a conceptual diagram of a data flow of a method provided by an embodiment of the present application.

FIG. 3 is a partial flow diagram of a method provided in one embodiment of the present application.

Fig. 4 is a schematic structural diagram of a data security analysis device according to an embodiment of the present application.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Summary of the application

Prior-art schemes amount to using either a dedicated graph database or a NoSQL database to implement the graph function of data lineage; an implementation based on a relational database is lacking.

Relational databases are widely used in software development, whereas NoSQL databases and graph databases that support graph data structures are far less widely deployed; an enterprise is likely to run a relational database already but may never have used a graph database. Having to introduce a new database or related middleware merely to implement a data lineage function, or a similar graph-structure function, is often undesirable. When an enterprise treats implementation and maintenance cost as the priority, the prior-art schemes are unsatisfactory.

On this basis, the application provides a MySQL-based data lineage graph processing method. Raw lineage data are first obtained and stored uniformly in a message queue in JSON format; the data in the message queue are screened, formed into a lineage graph data structure, and stored in a MySQL database; a user request to view the lineage graph is obtained; and the MySQL database is queried based on the user request and the query result is returned. With this arrangement, a user can implement data lineage graph functions, or similar graph-structure functions, directly on MySQL without introducing a new database or related middleware, which reduces access, deployment and maintenance costs.

Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.

Exemplary method

Fig. 1 is a flowchart of a MySQL-based data lineage graph processing method according to an embodiment of the present application. Fig. 2 is a conceptual diagram of the data flow of the method. As shown in Fig. 1 and Fig. 2, the method includes the following steps.

Step S110: obtain raw lineage data; the raw lineage data are stored uniformly in a message queue in JSON format.

In some embodiments, the raw lineage data are generated by producers. The data come in several types, and all of them are stored in the message queue in JSON format. The raw lineage data includes address information of the nodes associated with the data.
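As an illustration of step S110, the following is a minimal sketch of a producer that serializes one raw lineage record to JSON and places it on a queue. The field names (type, source, target), the example addresses, and the use of an in-process BlockingQueue in place of a real message broker are all assumptions made for the sketch; the application does not specify the JSON schema or the queue product.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class LineageProducerSketch {
    // In-process stand-in for the message queue (assumption: a real deployment would use a broker).
    static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    // Escapes backslashes and double quotes only; control characters are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serializes one raw lineage record: its type plus the addresses of the two associated nodes.
    // The field names are hypothetical.
    static String toJson(String type, String sourceAddress, String targetAddress) {
        return "{\"type\":" + quote(type)
                + ",\"source\":" + quote(sourceAddress)
                + ",\"target\":" + quote(targetAddress) + "}";
    }

    // Every producer, whatever its data type, writes the same JSON shape to the same queue.
    static void publish(String type, String sourceAddress, String targetAddress) {
        QUEUE.offer(toJson(type, sourceAddress, targetAddress));
    }

    public static void main(String[] args) {
        publish("table", "db1.orders", "db1.orders_daily");
        System.out.println(QUEUE.peek());
    }
}
```

Keeping a single JSON shape for all producers is what lets the screening stage in step S120 consume the queue without knowing which producer wrote a given message.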

Step S120: screen the data in the message queue, form a lineage graph data structure, and store it in a MySQL database.

Specifically, the messages in the message queue are screened and processed, formed into a lineage graph data structure, and stored in MySQL. Note that steps S110 and S120 together store the raw lineage data in the MySQL database.

Step S130: obtain a user request; the user request is a request to view the lineage graph.

Step S140: query the MySQL database based on the user request, and return the query result.

Specifically, steps S130 and S140 use the lineage data held in the MySQL database. With this arrangement, a user can implement data lineage graph functions, or similar graph-structure functions, directly on MySQL without introducing a new database or related middleware, which reduces access, deployment and maintenance costs.

In some embodiments, step S120 includes: de-duplicating the data according to type and business uniqueness rules to obtain a data set; storing metadata of different types in different business tables, and obtaining the set of addresses of the corresponding records; combining the data set and the address set to obtain a lineage graph data set carrying autonomously generated addresses; and storing the lineage graph data set in a table that represents the graph data structure.

Specifically, the data are first de-duplicated according to type and business uniqueness rules to obtain a data set Sinput; metadata of different types are stored in different business tables, yielding the set Sid of ids of the corresponding records; Sinput and Sid are combined to obtain the lineage graph data set Soutput carrying autonomously generated ids; and Soutput is stored in a table that represents the graph data structure.

It should be noted that, to improve query efficiency, each business table needs its own id generation rule. For example, ids generated for the table business table carry the prefix table, so the id alone identifies which business table to join against. That is, the autonomously generated address includes information indicating the associated business table.
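The de-duplication and prefixed-id rule can be sketched as follows. This is an assumption-laden illustration, not the application's implementation: the id format (type prefix followed by a counter, as in table1), the in-memory maps standing in for the business tables, and the names idFor and tableOf are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixedIdSketch {
    // Per-type counters standing in for the ids generated by each business table.
    private final Map<String, Integer> counters = new LinkedHashMap<>();
    // (type + business key) -> generated id; doubles as the de-duplication index.
    private final Map<String, String> idsByKey = new LinkedHashMap<>();

    // Returns the id for a record, generating "<type><n>" the first time the key is seen.
    String idFor(String type, String businessKey) {
        String key = type + "|" + businessKey;
        String existing = idsByKey.get(key);
        if (existing != null) {
            return existing; // duplicate record: reuse the id, store nothing new
        }
        int next = counters.merge(type, 1, Integer::sum);
        String id = type + next;
        idsByKey.put(key, id);
        return id;
    }

    // Recovers the business table from the id alone by stripping the trailing digits.
    // Assumes type names do not themselves end in a digit.
    static String tableOf(String id) {
        return id.replaceAll("\\d+$", "");
    }

    public static void main(String[] args) {
        PrefixedIdSketch ids = new PrefixedIdSketch();
        System.out.println(ids.idFor("table", "db1.orders"));
        System.out.println(ids.idFor("table", "db1.orders"));
        System.out.println(tableOf(ids.idFor("job", "nightly_load")));
    }
}
```

Because the prefix is recoverable from the id, a later query can route each nodeId to its business table without an extra lookup.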

Further, the lineage graph data structure includes: an autonomously generated address, the address of a target node, and the address of a node associated with the target node, where the associated node address is the address of an upstream node or the address of a downstream node. The autonomously generated address is followed by a pair of brackets; inside the brackets are three pairs of quotation marks, arranged in sequence and separated by commas; the address of the target node is placed in the first pair, the address of the upstream node in the second pair, and the address of the downstream node in the third pair.

Specifically, the design of the table that holds the graph data structure is critical to query performance. The core table is:

adjacency(nodeId, upstream, downstream)

The fields have the following meanings:

nodeId: the node id, unique across the system and derived from Sid;

upstream: the upstream nodes, stored as a string of nodeIds separated by commas;

downstream: the downstream nodes, stored as a string of nodeIds separated by commas;

nodeId is the primary key of the table.
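Since nodeId is the primary key, fetching the neighbours of one node is a single-row read. The sketch below shows one way this might look over JDBC; it is not taken from the application. The column types, the helper names sqlFor, split and neighbours, and the choice to select only the column for the requested direction are assumptions, and the JDBC method is shown for shape only since it needs a live MySQL connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class AdjacencyLookupSketch {
    // Builds the primary-key lookup; only the selected column depends on the direction.
    static String sqlFor(boolean queryUpstream) {
        String column = queryUpstream ? "upstream" : "downstream";
        return "SELECT " + column + " FROM adjacency WHERE nodeId = ?";
    }

    // Splits the comma-separated column value into nodeIds; null or empty means no neighbours.
    static List<String> split(String csv) {
        List<String> ids = new ArrayList<>();
        if (csv == null || csv.isEmpty()) {
            return ids;
        }
        for (String id : csv.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ids.add(trimmed);
            }
        }
        return ids;
    }

    // Reads the neighbours of one node; returns an empty list when the node has no row.
    static List<String> neighbours(Connection conn, String nodeId, boolean queryUpstream)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sqlFor(queryUpstream))) {
            ps.setString(1, nodeId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? split(rs.getString(1)) : new ArrayList<String>();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sqlFor(false));
        System.out.println(split("table3,table4"));
    }
}
```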

Assume the data shown in Fig. 3, where all nodes belong to the table business type. The stored rows are then:

("table1", "table3,table4", "table2")

("table2", "table1", "")

("table3", "", "table1")

("table4", "", "table1")

The system maintains the structure of the graph mainly through these adjacent-node relationships.
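Each edge in this layout is recorded twice, once in the downstream column of its start node and once in the upstream column of its end node. The following sketch loads the four example rows into memory and checks that the two views agree. The class and method names are hypothetical, and the consistency check itself is an illustration added here, not a step described in the application.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AdjacencyRowsSketch {
    // nodeId -> {upstream csv, downstream csv}, mirroring the four stored example rows.
    static Map<String, String[]> exampleRows() {
        Map<String, String[]> rows = new LinkedHashMap<>();
        rows.put("table1", new String[] {"table3,table4", "table2"});
        rows.put("table2", new String[] {"table1", ""});
        rows.put("table3", new String[] {"", "table1"});
        rows.put("table4", new String[] {"", "table1"});
        return rows;
    }

    // Splits a comma-separated column value; an empty string means no neighbours.
    static List<String> split(String csv) {
        return csv.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(csv.split(","));
    }

    // True when every upstream entry is mirrored by a downstream entry on the neighbour, and vice versa.
    static boolean isConsistent(Map<String, String[]> rows) {
        for (Map.Entry<String, String[]> e : rows.entrySet()) {
            for (String up : split(e.getValue()[0])) {
                String[] other = rows.get(up);
                if (other == null || !split(other[1]).contains(e.getKey())) {
                    return false;
                }
            }
            for (String down : split(e.getValue()[1])) {
                String[] other = rows.get(down);
                if (other == null || !split(other[0]).contains(e.getKey())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(exampleRows()));
    }
}
```

Storing both directions costs a second write per edge, but it lets an upstream or downstream query read a single row per node.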

Further, if edges themselves need to be queried, a corresponding table can be created as the actual situation requires. For example, to query lineage by ETL task, a dedicated table can be built; that is, the lineage graph data structure includes an ETL task lineage table, which includes a table name, an ETL node address, the address of the start node of the ETL task lineage relationship, and the address of the end node. Specifically, the ETL task lineage table is as follows:

etl_edge(jobId, sourceId, targetId)

The fields have the following meanings:

jobId: the id of the ETL task;

sourceId: the id of the start node of the lineage relationship, matching a nodeId in the adjacency table;

targetId: the id of the end node of the lineage relationship, matching a nodeId in the adjacency table;

the three fields together form the primary key of the table.
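The effect of the three-column primary key can be illustrated with a small in-memory stand-in for etl_edge. This is a sketch under assumptions: the class name, the pipe-separated encoding of the key, the example job ids, and the edgesOf query are hypothetical and only mimic what the table would enforce and answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EtlEdgeSketch {
    // Each edge is kept as "jobId|sourceId|targetId"; the set plays the role of the composite primary key.
    // Assumes ids never contain the '|' character.
    private final Set<String> edges = new LinkedHashSet<>();

    // Returns false when the same (jobId, sourceId, targetId) triple is already present.
    boolean add(String jobId, String sourceId, String targetId) {
        return edges.add(jobId + "|" + sourceId + "|" + targetId);
    }

    // Lists the "source->target" edges recorded for one ETL task, in insertion order.
    List<String> edgesOf(String jobId) {
        List<String> result = new ArrayList<>();
        for (String edge : edges) {
            String[] parts = edge.split("\\|");
            if (parts[0].equals(jobId)) {
                result.add(parts[1] + "->" + parts[2]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EtlEdgeSketch table = new EtlEdgeSketch();
        table.add("job1", "table3", "table1");
        table.add("job1", "table4", "table1");
        table.add("job2", "table1", "table2");
        System.out.println(table.edgesOf("job1"));
    }
}
```

Because jobId is the leading part of the key, all edges of one task can be retrieved together, which is the edge-oriented query the adjacency table alone does not serve well.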

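In some embodiments, the user request includes the address of a central node.

Correspondingly, querying the MySQL database and returning the query result comprises: setting the current traversal level currentLevel = 0; querying the lineage graph data structure to obtain the addresses of all downstream nodes of the central node and recording them as the adjacency list; setting currentLevel to currentLevel plus 1; and determining whether currentLevel is less than MAX_LEVEL. If so, the adjacency list is traversed, each address in it is queried against the lineage graph data structure to obtain the addresses of the next layer of downstream nodes, these addresses become the new adjacency list, and the step of incrementing currentLevel is executed again. If not, the node data of the nodes corresponding to all queried addresses are determined, assembled, and returned as the query result.

When the user request carries a query-upstream flag indicating an upstream query, the same steps are performed with upstream nodes in place of downstream nodes, again starting from currentLevel = 0.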

In practical application, the specific steps are as follows:

1. Input: the nodeId of the central node, the flag queryUpstream indicating whether to query upstream, and the maximum query depth MAX_LEVEL.

2. Set the current traversal level currentLevel = 0.

3. If queryUpstream is true, query upstream; otherwise query downstream (the following steps use the downstream query as the example).

4. Query the adjacency table by nodeId to obtain the nodeIds of all downstream nodes, and record them as the adjacency list.

5. Set currentLevel = currentLevel + 1.

6. Check whether currentLevel < MAX_LEVEL.

7. If so, traverse the adjacency list and repeat from step 3 for each nodeId; otherwise go to the next step.

8. Group all nodeIds found so far by business prefix, and query the corresponding business tables to obtain the node details.

9. Assemble the data.

10. Output: the final query result.

The built-in MAX_LEVEL is generally a small positive integer; it limits the depth of a single query so that users do not wait too long.
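Step 8 can be sketched as follows: the collected nodeIds are grouped by prefix, and one parameterized query is built per business table. The mapping from prefix to table name (meta_ plus the prefix), the id column name, and the use of an IN list are assumptions made for illustration; the application does not give the business table schemas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DetailQuerySketch {
    // Groups nodeIds by business prefix (the id with its trailing digits removed).
    static Map<String, List<String>> groupByPrefix(List<String> nodeIds) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String id : nodeIds) {
            String prefix = id.replaceAll("\\d+$", "");
            groups.computeIfAbsent(prefix, k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    // Hypothetical naming rule for the business table behind a prefix.
    static String tableFor(String prefix) {
        return "meta_" + prefix;
    }

    // Builds one parameterized IN query per business table, with one placeholder per id.
    static String detailSql(String prefix, int idCount) {
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM " + tableFor(prefix) + " WHERE id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        Collections.addAll(found, "table1", "job1", "table2");
        for (Map.Entry<String, List<String>> group : groupByPrefix(found).entrySet()) {
            System.out.println(detailSql(group.getKey(), group.getValue().size()) + " <- " + group.getValue());
        }
    }
}
```

Batching by prefix keeps the number of detail queries equal to the number of business types involved rather than the number of nodes found.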

The core code of the algorithm is as follows:

// Requires java.util.ArrayDeque, Deque, HashSet, Objects, Set and java.util.function.BiConsumer.
private int bfs(String originId, int level, boolean queryUpstream,
                BiConsumer<BFSLevelDTO, Deque<String>> executeEveryLevel) {
    Deque<String> adjQueue = new ArrayDeque<>();
    Set<String> marked = new HashSet<>(16);
    // central node
    adjQueue.offer(originId);
    int i = 0;
    while (i < level && !adjQueue.isEmpty()) {
        int size = adjQueue.size();
        String nodeId = "";
        for (int j = 0; j < size; j++) {
            nodeId = adjQueue.poll();
            // avoid repeated queries
            if (marked.contains(nodeId)) {
                continue;
            } else {
                marked.add(nodeId);
            }
            // query the upstream/downstream nodes; the lookup helper name is assumed,
            // since the source text is illegible at this call
            for (String adj : queryAdjacency(nodeId, queryUpstream)) {
                adjQueue.offer(adj);
            }
        }
        i++;
        // process the data of each level
        if (Objects.nonNull(executeEveryLevel)) {
            BFSLevelDTO bfsLevelDTO = new BFSLevelDTO();
            bfsLevelDTO.setLevel(i);
            bfsLevelDTO.setOriginNodeId(nodeId);
            executeEveryLevel.accept(bfsLevelDTO, adjQueue);
        }
    }
    // return the depth actually traversed
    return i == level ? i : i - 1;
}
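The level-bounded traversal can be tried without a database by replacing the adjacency table with an in-memory map. The sketch below is a self-contained illustration built on the four example rows; the class name, the traverse signature, and the choice to mark nodes when they are enqueued rather than when they are dequeued are assumptions of this sketch and differ in detail from the core code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LineageBfsSketch {
    // nodeId -> [upstream ids, downstream ids]; stands in for the adjacency table.
    static final Map<String, List<List<String>>> GRAPH = new HashMap<>();

    static {
        put("table1", Arrays.asList("table3", "table4"), Arrays.asList("table2"));
        put("table2", Arrays.asList("table1"), Collections.<String>emptyList());
        put("table3", Collections.<String>emptyList(), Arrays.asList("table1"));
        put("table4", Collections.<String>emptyList(), Arrays.asList("table1"));
    }

    static void put(String id, List<String> upstream, List<String> downstream) {
        GRAPH.put(id, Arrays.asList(upstream, downstream));
    }

    // Returns the nodes reachable from the central node within maxLevel hops, excluding the central node.
    static Set<String> traverse(String centralId, boolean queryUpstream, int maxLevel) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(centralId);
        Deque<String> queue = new ArrayDeque<>();
        queue.offer(centralId);
        for (int level = 0; level < maxLevel && !queue.isEmpty(); level++) {
            int size = queue.size(); // number of nodes in the current layer
            for (int j = 0; j < size; j++) {
                List<List<String>> row = GRAPH.get(queue.poll());
                if (row == null) {
                    continue; // node has no adjacency row
                }
                for (String adj : row.get(queryUpstream ? 0 : 1)) {
                    if (visited.add(adj)) {
                        queue.offer(adj); // enqueue each node at most once
                    }
                }
            }
        }
        visited.remove(centralId);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(traverse("table3", false, 2));
        System.out.println(traverse("table2", true, 2));
    }
}
```

With a maximum depth of 1 only direct neighbours are returned, which shows how a small MAX_LEVEL bounds the number of adjacency lookups per request.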

In summary, the scheme provided by the application reduces the dependence on technical components, requires no new database, and lowers the cost of implementing a data lineage graph. It may also be applied outside lineage scenarios, to any function that requires a graph data structure.

Exemplary apparatus

The apparatus embodiments of the present application can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments.

Fig. 4 is a block diagram of a data security analysis device according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:

an acquisition module 41, configured to obtain raw lineage data, the raw lineage data being stored uniformly in a message queue in JSON format;

a storage module 42, configured to screen the data in the message queue, form a lineage graph data structure, and store it in the MySQL database;

the acquisition module 41 being further configured to obtain a user request, the user request being a request to view the lineage graph;

a query module 43, configured to query the MySQL database based on the user request and return a query result.

In some embodiments, the modules further implement the optional features of the method embodiments, including the format of the lineage graph data structure and the level-bounded upstream or downstream traversal from the central node carried in the user request, as described in the exemplary method above.

Exemplary electronic device

Next, an electronic device according to an embodiment of the present application is described with reference to fig. 5. Fig. 5 illustrates a block diagram of an electronic device according to an embodiment of the present application.

As shown in fig. 5, the electronic device 500 includes one or more processors 510 and memory 520.

Processor 510 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in electronic device 500 to perform desired functions.

Memory 520 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory, and the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 510 to implement the MySQL-based data lineage graph processing method and/or other desired functions of the various embodiments of the present application described above. Various contents such as category correspondences may also be stored in the computer-readable storage medium.

In one example, the electronic device 500 may further include: an input device 530 and an output device 540, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).

In addition, the input device 530 may also include, for example, a keyboard, mouse, interface, etc. The output device 540 may output various information including analysis results and the like to the outside. The output device 540 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device that are relevant to the present application are shown in fig. 5 for simplicity, components such as buses, input/output interfaces, etc. being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.

Exemplary computer program product and computer readable storage Medium

In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps of the MySQL-based data lineage graph processing method according to the various embodiments described in the "Exemplary method" section of this specification.

The computer program product may include program code, written in any combination of one or more programming languages, for performing the operations of embodiments of the present application, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the MySQL-based data lineage graph processing method according to the various embodiments described in the "Exemplary method" section of this specification.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

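1. A MySQL-based data lineage graph processing method, characterized by comprising:

obtaining raw lineage data, the raw lineage data being stored uniformly in a message queue in JSON format;

screening the data in the message queue, forming a lineage graph data structure, and storing it in a MySQL database;

obtaining a user request, the user request being a request to view the lineage graph;

querying the MySQL database based on the user request, and returning a query result.

2. The MySQL-based data lineage graph processing method according to claim 1, wherein the raw lineage data includes address information of the nodes associated with the data.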

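3. The MySQL-based data lineage graph processing method according to claim 2, wherein screening the data in the message queue, forming a lineage graph data structure, and storing it in a MySQL database includes:

de-duplicating the data according to type and business uniqueness rules to obtain a data set;

storing metadata of different types in different business tables, and obtaining the set of addresses of the corresponding records;

combining the data set and the address set to obtain a lineage graph data set carrying autonomously generated addresses;

storing the lineage graph data set in a table that represents the graph data structure.

4. The MySQL-based data lineage graph processing method according to claim 3, wherein the autonomously generated address includes information indicating the associated business table.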

5. The MySQL-based data lineage graph processing method according to claim 3, wherein the lineage graph data structure includes: an autonomously generated address, the address of a target node, and the address of a node associated with the target node, where the associated node address is the address of an upstream node or the address of a downstream node;

the autonomously generated address is followed by a pair of brackets;

inside the brackets are three pairs of quotation marks, arranged in sequence and separated by commas;

the address of the target node is placed in the first pair of quotation marks;

the address of the upstream node is placed in the second pair of quotation marks;

the address of the downstream node is placed in the third pair of quotation marks.

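6. The MySQL-based data lineage graph processing method according to claim 5, wherein the user request includes the address of a central node;

querying the MySQL database and returning the query result comprises:

setting the current traversal level currentLevel = 0;

querying the lineage graph data structure to obtain the addresses of all downstream nodes of the central node, and recording them as the adjacency list;

setting currentLevel to currentLevel plus 1;

determining whether currentLevel is less than MAX_LEVEL;

if so, traversing the adjacency list, querying the lineage graph data structure for each address in it to obtain the addresses of the downstream nodes of those nodes, recording these addresses as the new adjacency list, and then returning to the step of setting currentLevel to currentLevel plus 1;

if not, determining the node data of the nodes corresponding to all queried addresses, assembling the node data, and returning it as the query result;

MAX_LEVEL is a preset value or a value set by the user through the user request.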

7. The MySQL-based data lineage graph processing method according to claim 6, wherein the user request further includes a query-upstream flag indicating whether to query upstream;

if the query-upstream flag indicates an upstream query, querying the MySQL database further comprises:

setting the current traversal level currentLevel = 0;

querying the lineage graph data structure to obtain the addresses of all upstream nodes of the central node, and recording them as the adjacency list;

setting currentLevel to currentLevel plus 1;

determining whether currentLevel is less than MAX_LEVEL;

if so, traversing the adjacency list, querying the lineage graph data structure for each address in it to obtain the addresses of the upstream nodes of those nodes, recording these addresses as the new adjacency list, and then returning to the step of setting currentLevel to currentLevel plus 1.

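8. The MySQL-based data lineage graph processing method according to claim 3, wherein the lineage graph data structure includes an ETL task lineage table;

the ETL task lineage table includes: a table name, an ETL node address, the address of the start node of the ETL task lineage relationship, and the address of the end node.

9. A MySQL-based data lineage graph processing apparatus, comprising:

an acquisition module, configured to obtain raw lineage data, the raw lineage data being stored uniformly in a message queue in JSON format, and further configured to obtain a user request, the user request being a request to view the lineage graph;

a storage module, configured to screen the data in the message queue, form a lineage graph data structure, and store it in the MySQL database;

a query module, configured to query the MySQL database based on the user request and return a query result.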

10. An electronic device, comprising:

a processor and a memory for storing a program executable by the processor;

the processor is configured to implement the MySQL-based data lineage graph processing method according to any one of claims 1 to 8 by running the program in the memory.