CN114817298A - Method, device and equipment for extracting field-level data blood margin and storage medium - Google Patents


Publication number
CN114817298A
Authority
CN
China
Prior art keywords
field
sql statement
output
syntax tree
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210512520.5A
Other languages
Chinese (zh)
Inventor
李震川
李钊
李均
沈琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210512520.5A
Publication of CN114817298A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/425 Lexical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a field-level data blood margin extraction method, device, equipment and storage medium. The method comprises the following steps: preprocessing an initial SQL statement to obtain a target SQL statement, and parsing the target SQL statement into an abstract syntax tree; traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; performing recursive backtracking on the abstract syntax tree to obtain the physical table fields corresponding to the output fields, and determining the correspondence between the output fields and the physical table fields; and generating field blood relationship information between the output fields and the physical table fields according to that correspondence. By analyzing and extracting the blood margin data of the output fields of the input SQL statement, the scheme solves the technical problems in the prior art that various database systems cannot be covered and that the blood margin information among data cannot be accurately determined.

Description

Method, device and equipment for extracting field-level data blood margin and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a field-level data blood margin extraction method, a field-level data blood margin extraction device, field-level data blood margin extraction equipment and a storage medium.
Background
Data security management and control is one of the important means of protecting data security: by applying the principle of minimum authorization and precisely controlling field-level data access rights, it reduces the risk of data leakage. One of its most important links is tracing each field of a user's query result back to its source, so as to obtain the security management and control information of the actual table that the field belongs to (for example, whether access to the field is authorized and whether encryption or desensitization has been applied) and to take the corresponding intervention measures.
Data security management and control must cover all database systems of an enterprise, including relational databases such as MySQL and Oracle and big data systems such as Hive, Spark and Presto. How to cover these various database systems, accurately trace field-level query results back to their corresponding physical tables and fields, and determine the blood margin information between data has therefore become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The main purpose of the invention is to analyze and extract the blood margin data of the output fields of an input SQL statement, thereby solving the technical problems in the prior art that various database systems cannot be covered and that the blood margin information among data cannot be accurately determined.
The invention provides a field-level data blood margin extraction method in a first aspect, which comprises the following steps: acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format; performing lexical analysis on the target SQL statement to generate an abstract syntax tree; traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relationship between the output fields and the physical table fields; and generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
Optionally, in a first implementation manner of the first aspect of the present invention, the obtaining an initial SQL statement and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format includes: deleting useless data in the initial SQL statement based on a preset first regular expression, wherein the useless data comprises comments, blank spaces, line feed characters and end characters; deleting the paging operation limit code in the initial SQL statement; removing the life cycle in the initial SQL statement based on a preset second regular expression; and based on a preset strategy mode, carrying out syntax processing of a corresponding type on the initial SQL statement to obtain a target SQL statement in a preset format.
Optionally, in a second implementation manner of the first aspect of the present invention, the performing lexical analysis on the target SQL statement to generate an abstract syntax tree includes: acquiring a preset key field set, and screening out words consistent with key fields in the key field set from the target SQL statement to obtain standard key fields; the target SQL statement is segmented by utilizing the standard key field to obtain a plurality of SQL subfields; converting a plurality of the SQL subfields into an abstract syntax tree.
Optionally, in a third implementation manner of the first aspect of the present invention, the segmenting the target SQL statement by using the standard key field to obtain a plurality of SQL subfields includes: taking the standard key field as a segmentation node, and performing left-side and right-side segmentation on the target SQL statement based on the segmentation node to obtain a plurality of segmented statements; and performing secondary segmentation on the plurality of segmented statements, using the standard key fields as positioning points, to obtain a plurality of SQL subfields.
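As an illustration only, the two-pass segmentation might be sketched in Python as follows. The keyword set, the function name `split_by_keywords`, and the use of commas for the second pass are assumptions made for this example; the source does not enumerate the key field set or fix the secondary segmentation rule. The sketch handles one flat statement and ignores keywords that occur inside string literals or subqueries.

```python
import re

# Hypothetical key field set; the source does not list its contents.
KEYWORDS = ("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY")

def split_by_keywords(sql, keywords=KEYWORDS):
    """Return [(keyword, [subfields])] for one flat SQL statement."""
    # Longest keywords first; a space inside a keyword matches any whitespace run.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(k.replace(" ", r"\s+") for k in ordered)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

    # First pass: cut the statement to the left and right of every keyword.
    # With a capturing group, split() yields [prefix, kw, body, kw, body, ...].
    parts = pattern.split(sql)
    segments = zip(parts[1::2], parts[2::2])

    result = []
    for keyword, body in segments:
        name = " ".join(keyword.upper().split())
        # Second pass (assumed rule): split each segment on commas.
        subfields = [item.strip() for item in body.split(",") if item.strip()]
        result.append((name, subfields))
    return result

print(split_by_keywords("select a, b from t where a > 1"))
```

A production parser would tokenize the statement first, so that commas inside function calls and keywords inside quoted strings are not treated as split points.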
Optionally, in a fourth implementation manner of the first aspect of the present invention, the traversing nodes of multiple levels of the abstract syntax tree based on the parsing rule and the parsing policy of the abstract syntax tree to obtain each output field of the target SQL statement includes: traversing nodes of a plurality of levels of the abstract syntax tree based on a parsing rule and a parsing strategy of the abstract syntax tree, and acquiring field data involved from the nodes of the plurality of levels of the abstract syntax tree; determining an input node, an output node and an intermediate conversion node according to the out-degree and the in-degree; and traversing all the output nodes, judging whether the target SQL statement of the output node contains a field value, if so, taking the field in the target SQL statement as an output field, and if not, acquiring all the fields of the table in table output as the output field.
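The role assignment by in-degree and out-degree can be illustrated with a small Python sketch. It assumes, hypothetically, that field dependencies are stored as directed edges pointing from a source field to the field derived from it; the source does not state the edge direction or the data structure, and `classify_nodes` is a name invented for this example.

```python
from collections import defaultdict

def classify_nodes(edges):
    """Label each node as input, output, or intermediate by its degrees."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    roles = {}
    for node in set(in_deg) | set(out_deg):
        if in_deg[node] == 0:
            roles[node] = "input"          # nothing feeds it: a source field
        elif out_deg[node] == 0:
            roles[node] = "output"         # feeds nothing: a result field
        else:
            roles[node] = "intermediate"   # both consumed and produced
    return roles

# Hypothetical edges: t.a feeds sub.a, which feeds result.a.
edges = [("t.a", "sub.a"), ("sub.a", "result.a")]
roles = classify_nodes(edges)
```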
Optionally, in a fifth implementation manner of the first aspect of the present invention, the generating, according to the correspondence between the output field and the physical table field, field blood relationship information between the output field and the physical table field includes: determining a blood relationship analyzer corresponding to the output field according to the type of a preset basic Node in the abstract syntax tree; performing, based on the blood relationship analyzer, a field query on the basic Node to obtain a corresponding query field list; and traversing the query field list and the physical table field, and searching in a recursive tracing mode to obtain the field blood relationship information between the output field and the physical table field.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after the generating, according to the correspondence between the output field and the physical table field, field blood relationship information between the output field and the physical table field, the method further includes: inputting the field blood relationship information into the target SQL statement for data reduction to obtain target physical field information corresponding to the output field.
The second aspect of the present invention provides a field-level data blood margin extraction device, including: a preprocessing module, configured to acquire an initial SQL statement and preprocess the initial SQL statement to obtain a target SQL statement in a preset format; a parsing module, configured to perform lexical analysis on the target SQL statement to generate an abstract syntax tree; a traversal module, configured to traverse nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; a determining module, configured to perform recursive backtracking on the abstract syntax tree to obtain the physical table field corresponding to each output field, and determine the correspondence between the output fields and the physical table fields; and a generation module, configured to generate field blood relationship information between the output fields and the physical table fields according to that correspondence.
Optionally, in a first implementation manner of the second aspect of the present invention, the preprocessing module is specifically configured to: deleting useless data in the initial SQL statement based on a preset first regular expression, wherein the useless data comprises comments, blank spaces, line feed characters and end characters; deleting the paging operation limit code in the initial SQL statement; removing the life cycle in the initial SQL statement based on a preset second regular expression; and based on a preset strategy mode, carrying out syntax processing of a corresponding type on the initial SQL statement to obtain a target SQL statement in a preset format.
Optionally, in a second implementation manner of the second aspect of the present invention, the parsing module includes: the screening unit is used for acquiring a preset key field set, screening out words consistent with key fields in the key field set from the target SQL sentence, and obtaining standard key fields; the segmentation unit is used for segmenting the target SQL statement by using the standard key field to obtain a plurality of SQL subfields; a conversion unit for converting the plurality of SQL subfields into an abstract syntax tree.
Optionally, in a third implementation manner of the second aspect of the present invention, the splitting unit is specifically configured to: take the standard key field as a segmentation node, and perform left-side and right-side segmentation on the target SQL statement based on the segmentation node to obtain a plurality of segmented statements; and perform secondary segmentation on the plurality of segmented statements, using the standard key fields as positioning points, to obtain a plurality of SQL subfields.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the traversal module is specifically configured to: traversing nodes of a plurality of levels of the abstract syntax tree based on a parsing rule and a parsing strategy of the abstract syntax tree, and acquiring field data involved from the nodes of the plurality of levels of the abstract syntax tree; determining an input node, an output node and an intermediate conversion node according to the out-degree and the in-degree; and traversing all the output nodes, judging whether the target SQL statement of the output node contains a field value, if so, taking the field in the target SQL statement as an output field, and if not, acquiring all the fields of the table in table output as the output field.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the generating module is specifically configured to: determine a blood relationship analyzer corresponding to the output field according to the type of a preset basic Node in the abstract syntax tree; perform, based on the blood relationship analyzer, a field query on the basic Node to obtain a corresponding query field list; and traverse the query field list and the physical table field, and search in a recursive tracing mode to obtain the field blood relationship information between the output field and the physical table field.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the field-level data blood margin extraction apparatus further includes: and the data reduction module is used for inputting the field blood relationship information into the target SQL statement for data reduction to obtain target physical field information corresponding to the output field.
The third aspect of the present invention provides field-level data blood margin extraction equipment, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the field-level data blood margin extraction equipment to perform the steps of the field-level data blood margin extraction method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the above-described field-level data blood margin extraction method.
In the technical scheme provided by the invention, the initial SQL statement is preprocessed to obtain the target SQL statement, and the target SQL statement is parsed into an abstract syntax tree; nodes of a plurality of levels of the abstract syntax tree are traversed based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; recursive backtracking is performed on the abstract syntax tree to obtain the physical table fields corresponding to the output fields, and the correspondence between the output fields and the physical table fields is determined; and field blood relationship information between the output fields and the physical table fields is generated according to that correspondence. By analyzing and extracting the blood margin data of the output fields of the input SQL statement, the scheme solves the technical problems in the prior art that various database systems cannot be covered and that the blood margin information among data cannot be accurately determined.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of the field-level data blood margin extraction method provided by the present invention;
FIG. 2 is a schematic diagram of a second embodiment of the field-level data blood margin extraction method provided by the present invention;
FIG. 3 is a schematic diagram of a third embodiment of the field-level data blood margin extraction method provided by the present invention;
FIG. 4 is a schematic diagram of a fourth embodiment of the field-level data blood margin extraction method provided by the present invention;
FIG. 5 is a schematic diagram of a fifth embodiment of the field-level data blood margin extraction method provided by the present invention;
FIG. 6 is a schematic diagram of a first embodiment of the field-level data blood margin extraction device provided by the present invention;
FIG. 7 is a schematic diagram of a second embodiment of the field-level data blood margin extraction device provided by the present invention;
FIG. 8 is a schematic diagram of an embodiment of the field-level data blood margin extraction equipment provided by the present invention.
Detailed Description
The embodiment of the invention provides a field-level data blood margin extraction method, device, equipment and storage medium. In the technical scheme of the invention, an initial SQL statement is first preprocessed to obtain a target SQL statement, and the target SQL statement is parsed into an abstract syntax tree; nodes of a plurality of levels of the abstract syntax tree are traversed based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; recursive backtracking is performed on the abstract syntax tree to obtain the physical table fields corresponding to the output fields, and the correspondence between the output fields and the physical table fields is determined; and field blood relationship information between the output fields and the physical table fields is generated according to that correspondence. By analyzing and extracting the blood margin data of the output fields of the input SQL statement, the scheme solves the technical problems in the prior art that various database systems cannot be covered and that the blood margin information among data cannot be accurately determined.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the field-level data blood margin extraction method according to an embodiment of the present invention includes:
101. Acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format;
In this embodiment, the preprocessing is system-defined processing applied to the acquired initial SQL statement, and may include, but is not limited to, any one or a combination of the following: removing useless information from the initial SQL statement by using a first regular expression, where the useless information includes at least one of comments, spaces, line feeds and terminators; removing the paging operation (limit) code from the initial SQL statement; removing the life cycle from the initial SQL statement by using a second regular expression; and applying syntax processing of the corresponding type to different types of initial SQL statements by using a preset strategy mode.
The target SQL statement is a standard SQL statement that conforms to standard SQL syntax, for example an SQL statement that contains no comments, spaces, terminators, line feeds or other special symbols.
102. Performing lexical analysis on the target SQL statement to generate an abstract syntax tree;
In this embodiment, the source code of the SQL statement is obtained, the character stream of the source code is read in sequentially, and syntax analysis is performed on the source code to obtain an abstract syntax tree.
In another specific embodiment, the source code corresponding to the initial SQL statement is obtained and split at its terminators to obtain the multiple lines of code corresponding to the source code; redundant characters are removed from the multi-line code to obtain the processed multi-line code; and a syntax analyzer performs semantic analysis on the processed multi-line code to obtain the abstract syntax tree.
Specifically, the source code of the SQL statements written by a developer is obtained and split into multiple lines of code, using the semicolon terminator as the split point, to form the code structure corresponding to the source code. Redundant line feeds, index symbols and comment statements are removed from these lines to obtain the processed multi-line code, which is then subjected to SQL semantic analysis line by line to obtain the abstract syntax tree. Removing the redundant characters improves the efficiency of the semantic analysis and reduces the influence of those characters on it, which improves the accuracy of the analysis, so that the resulting abstract syntax tree matches the initial SQL statement more closely.
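The splitting and cleaning described above might look like the following Python sketch. The function name `split_statements` and the two comment styles handled (`/* ... */` and `--`) are assumptions for illustration; the source does not say which comment syntaxes are covered, and the sketch does not protect semicolons or comment markers that appear inside string literals.

```python
import re

def split_statements(source):
    """Split SQL source on semicolons and drop comments and redundant whitespace."""
    # Remove /* ... */ block comments, then -- line comments.
    no_block = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    no_comments = re.sub(r"--[^\n]*", " ", no_block)

    statements = []
    for chunk in no_comments.split(";"):
        # Collapse line feeds, tabs and repeated spaces into single spaces.
        cleaned = re.sub(r"\s+", " ", chunk).strip()
        if cleaned:
            statements.append(cleaned)
    return statements

source = "select a -- pick a\nfrom t;\n/* second */ select b\n\tfrom u;"
statements = split_statements(source)
```

Each cleaned statement can then be handed to a parser on its own, which keeps one malformed statement from affecting the others.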
103. Traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
In this embodiment, parsing rules and parsing strategies are defined in advance for the nodes of the multiple levels of the abstract syntax tree. Once the parsing rules and parsing strategies of the abstract syntax tree have been obtained, the nodes of the abstract syntax tree are parsed based on them, and all nodes are traversed from the high-level nodes to the low-level nodes to obtain each output field in the target SQL statement.
In another specific embodiment, the abstract syntax tree comprises, from high level to low level, unary nodes, binary nodes and leaf nodes. Parsing rules are set separately for the table objects and the field objects in the abstract syntax tree, and corresponding parsing strategies are set for objects of the same type at the unary, binary and leaf nodes. The unary, binary and leaf nodes of the different object types are then analyzed based on these preset parsing rules and parsing strategies to obtain the nodes of the multiple levels.
Specifically, each of the multiple types of data in the SQL statements corresponds to one parsing rule; each parsing rule contains different parsing strategies for nodes of different levels, and the parsing strategies are compatible with the syntax rules of multiple types of SQL statements. The parsing rules and parsing strategies are backward compatible with other SQL syntax rules such as Presto. Parsing strategies compatible with multiple syntax rules are preset for the different syntax rules, so as to meet the parsing requirements of different types of data under the various syntax rules and to improve compatibility when the abstract syntax tree is parsed.
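One way to picture rules per object type combined with strategies per node level is a dispatch table over a toy tree, sketched below in Python. The `Node` class, the `STRATEGIES` table and all function names are hypothetical; the source does not describe its node classes or how strategies are selected.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                       # "unary", "binary" or "leaf"
    obj: str = ""                   # "table" or "field" (leaves only)
    name: str = ""
    children: list = field(default_factory=list)

def parse_leaf(node, found):
    # Rule per object type: table names and field names are kept apart.
    found[node.obj].append(node.name)

def parse_unary(node, found):
    walk(node.children[0], found)

def parse_binary(node, found):
    left, right = node.children
    walk(left, found)
    walk(right, found)

# Strategy chosen by node level.
STRATEGIES = {"unary": parse_unary, "binary": parse_binary, "leaf": parse_leaf}

def walk(node, found=None):
    """Traverse from the high-level nodes down to the leaves."""
    if found is None:
        found = {"table": [], "field": []}
    STRATEGIES[node.kind](node, found)
    return found

tree = Node("binary", children=[
    Node("unary", children=[Node("leaf", "field", "a")]),
    Node("leaf", "table", "t"),
])
found = walk(tree)
```

Supporting another SQL dialect would then mean registering additional strategies rather than changing the traversal itself.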
104. Performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relation between the output fields and the physical table fields;
In this embodiment, recursive backtracking is performed on the abstract syntax tree: each operation node is processed in turn, and the input field set, the output field set and the field mapping information of each operation node are determined. This yields the physical table fields corresponding to the output fields and the correspondence between the output fields and the physical table fields. For a button script, as shown in FIG. 2, a set of input fields and a set of output fields are determined.
Specifically, the recursive backtracking uses a recursive backtracking algorithm. Backtracking is an enumeration-like search process: a solution to the problem is sought during the search, and when the conditions for a solution are found not to be met, the search retreats and tries another path. The backtracking method is an optimum-seeking search method that searches forward according to a selection criterion in order to reach the target. When the search reaches a step at which the earlier choice turns out not to be optimal, or the target cannot be reached, it steps back and chooses again. This technique of retreating when a path fails and then trying again is the backtracking method, and a state point that satisfies the backtracking condition is called a backtracking point. Many complex, large-scale problems can be solved with the backtracking method, which is known as a "general problem-solving method".
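A minimal Python sketch of tracing an output field back to physical table fields is given below. It is an illustration under stated assumptions rather than the patented procedure: queries are modeled as nested dictionaries instead of a real abstract syntax tree, the `CATALOG` of table schemas is invented, and `trace` is a hypothetical name. The retreat step is simple: each FROM source is tried in turn, and a source that cannot supply the column contributes nothing, so the search falls back to the next source.

```python
# Hypothetical physical table schemas.
CATALOG = {"users": {"id", "name"}, "orders": {"user_id", "price", "qty"}}

def trace(query, column, catalog):
    """Return the set of (physical_table, physical_field) pairs feeding `column`."""
    origins = set()
    # query["select"] maps an output column to the source columns it is built from.
    for src_col in query["select"].get(column, []):
        for source in query["from"]:
            if isinstance(source, str):
                # Physical table: the recursion bottoms out here.
                if src_col in catalog.get(source, ()):
                    origins.add((source, src_col))
            else:
                # Subquery: descend one level; an empty result means "try the next source".
                origins |= trace(source, src_col, catalog)
    return origins

# Models: SELECT name AS user_name, total AS amount
#         FROM users, (SELECT user_id, price * qty AS total FROM orders)
inner = {"select": {"user_id": ["user_id"], "total": ["price", "qty"]},
         "from": ["orders"]}
outer = {"select": {"user_name": ["name"], "amount": ["total"]},
         "from": ["users", inner]}

lineage = {col: trace(outer, col, CATALOG) for col in outer["select"]}
```

Here `lineage` maps each output field to the physical fields it depends on, which corresponds to the output-field-to-physical-field correspondence described above.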
105. And generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
In this embodiment, the field blood relationship information between the output field and the physical table field is generated according to the corresponding relationship between the output field and the physical table field.
Specifically, the abstract syntax tree is traversed, and the nodes that match the preset field are marked as target nodes; the data value in each target node is extracted and compared against the data tables in a database to obtain the operation table corresponding to the data value; the type of each operation table is determined from its label, the relationships between operation tables of different types are determined, and these relationships are summarized to obtain the blood relationship analysis result. In detail, in the embodiment of the present invention the preset field is table, and the value corresponding to the preset field in the target node is extracted. For example, for "table A" the corresponding value is A; the data table named A is then queried in the preset database to obtain the corresponding operation table.
Specifically, the type of an operation table is determined by the type of its label. For example, if the label of the operation table is a start data table label, the operation table is determined to be a start data table; if the label is a target data table label, the operation table is determined to be a target data table. The blood relationship between the two is that the start data table is an upstream data table of the target data table. Summarizing all start data tables and target data tables gives the blood relationship analysis result.
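The table-level summary can be sketched in a few lines of Python. The label values `"start"` and `"target"`, the input shape (a list of name and label pairs), and the function name `table_lineage` are assumptions for this example; the source does not specify how labels are stored.

```python
def table_lineage(tagged_tables):
    """Map each target table to the sorted list of its upstream start tables.

    tagged_tables: list of (table_name, label) pairs, label "start" or "target".
    """
    starts = [name for name, label in tagged_tables if label == "start"]
    targets = [name for name, label in tagged_tables if label == "target"]
    # Every start table is recorded as upstream of every target table.
    return {target: sorted(starts) for target in targets}

# Models: INSERT INTO report SELECT ... FROM users JOIN orders ...
tagged = [("report", "target"), ("users", "start"), ("orders", "start")]
summary = table_lineage(tagged)
```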
In the embodiment of the invention, the initial SQL statement is preprocessed to obtain the target SQL statement, and the target SQL statement is parsed into an abstract syntax tree; nodes of a plurality of levels of the abstract syntax tree are traversed based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; recursive backtracking is performed on the abstract syntax tree to obtain the physical table fields corresponding to each output field, and the correspondence between the output fields and the physical table fields is determined; and field blood relationship information between the output fields and the physical table fields is generated according to that correspondence. By analyzing and extracting the blood margin data of the output fields of the input SQL statement, the scheme solves the technical problems in the prior art that various database systems cannot be covered and that the blood margin information among data cannot be accurately determined.
Referring to FIG. 2, a second embodiment of the field-level data blood margin extraction method according to the embodiment of the present invention includes:
201. deleting useless data in the initial SQL statement based on a preset first regular expression;
In this embodiment, the first regular expression is written by the user in the Java programming language and is a logical formula that operates on character strings and special characters. The first regular expression is used to remove useless information from the SQL statement to be processed, where the useless information includes, but is not limited to, any one or a combination of the following: comments, spaces, line feeds, end characters, or other special characters/symbols.
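A minimal sketch of this cleaning step is given below. The embodiment writes its expression in Java and does not disclose it, so the Python patterns here are assumptions: `--` line comments, `/* */` block comments, runs of whitespace (collapsed to one space rather than deleted, so that keywords stay separated) and a trailing semicolon. The function name `strip_useless` is hypothetical.

```python
import re

# Assumed comment syntax: "-- ..." to end of line and "/* ... */" blocks.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_useless(sql: str) -> str:
    """Remove comments, collapse whitespace and drop the trailing terminator."""
    sql = _COMMENTS.sub(" ", sql)
    sql = re.sub(r"\s+", " ", sql)  # spaces, tabs and line feeds become one space
    return sql.strip().rstrip(";").strip()


print(strip_useless("SELECT a, -- pick a\n  b /* and b */\nFROM t ;\n"))
```

A pattern-based cleaner of this kind would also strip comment markers that appear inside string literals, so a production implementation would need to account for quoting.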
202. Deleting the paging operation limit code in the initial SQL statement;
In this embodiment, the code fragment following a paging operation (limit) in the SQL statement to be processed is removed, so that it does not prevent the blood relationship between data in the SQL statement from being obtained.
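As an illustration only, the paging clause could be removed as follows. The sketch assumes the limit clause sits at the end of the statement and takes one of the common forms `LIMIT n`, `LIMIT m, n` or `LIMIT n OFFSET m`; the function name `strip_limit` is hypothetical.

```python
import re

# Assumed form: a trailing "LIMIT n", "LIMIT m, n" or "LIMIT n OFFSET m".
_LIMIT = re.compile(r"\s+limit\s+\d+(\s*(,|offset)\s*\d+)?\s*$", re.IGNORECASE)


def strip_limit(sql: str) -> str:
    """Drop a trailing paging clause; statements without one are returned unchanged."""
    return _LIMIT.sub("", sql)


print(strip_limit("select a from t LIMIT 5, 10"))
```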
203. Eliminating the life cycle in the initial SQL statement based on a preset second regular expression;
In this embodiment, the life cycle in the initial SQL statement is removed based on the preset second regular expression. Specifically, the second regular expression is written by the user in the Java programming language and is a logical formula that operates on character strings and special characters. The first regular expression is different from the second regular expression. The second regular expression is used to remove any life cycle that may be present in the SQL statement to be processed.
Specifically, a regular expression (often abbreviated in code as regex, regexp or RE) is a concept in computer science. Regular expressions are typically used to retrieve or replace text that conforms to a certain pattern (rule).
Many programming languages support string operations using regular expressions. For example, a powerful regular expression engine is built into Perl. The concept of regular expressions was originally popularized by tool software in Unix (e.g., sed and grep). Regular expressions are often abbreviated as "regex" or "regexp" in the singular and as "regexps", "regexes" or "regexen" in the plural.
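The embodiment does not disclose the second regular expression or the form the life cycle takes. Purely as an assumption, the sketch below treats it as a clause of the form `LIFECYCLE <days>`, as found in the table definitions of some data warehouse dialects; the pattern and the function name `strip_lifecycle` are hypothetical.

```python
import re

# Assumption: the life cycle appears as "LIFECYCLE <number>" in the statement.
_LIFECYCLE = re.compile(r"\s+lifecycle\s+\d+", re.IGNORECASE)


def strip_lifecycle(sql: str) -> str:
    """Remove an assumed LIFECYCLE clause; other statements pass through unchanged."""
    return _LIFECYCLE.sub("", sql)


print(strip_lifecycle("create table t (a string) LIFECYCLE 30"))
```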
204. Based on a preset strategy mode, carrying out grammar processing of corresponding types on the initial SQL statement to obtain a target SQL statement in a preset format;
In this embodiment, based on the preset strategy mode, syntax processing of a corresponding type is performed on the initial SQL statement to obtain a target SQL statement in a preset format. Specifically, the preset strategy mode is custom-defined by the system and can be used to invoke a custom handler method, so that corresponding special syntax processing is performed on different types of SQL statements. The SQL statement to be processed can be given the corresponding special syntax processing by using the third regular expression indicated by the preset strategy mode.
For example, the following third regular expression is adopted in this embodiment to process the SQL statement to be processed: (?i)(create\s+(temp|TEMPORARY)\s+table\s+). It indicates that a "create temp table" or "create temporary table" prefix matching the regular expression in the SQL statement is replaced by "create table", so that the blood relationship between data can be analyzed normally by the subsequent operations.
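The strategy mode and the third regular expression can be sketched together as below. The regular expression is the one quoted in this embodiment; the handler registry, the choice of the first keyword as the statement type, and the names `HANDLERS`, `handle_create`, `handle_default` and `apply_strategy` are assumptions made for illustration.

```python
import re
from typing import Callable, Dict

# Third regular expression from the embodiment (case-insensitive).
_TEMP_TABLE = re.compile(r"(?i)(create\s+(temp|TEMPORARY)\s+table\s+)")


def handle_create(sql: str) -> str:
    """Rewrite 'create temp/temporary table' to a plain 'create table'."""
    return _TEMP_TABLE.sub("create table ", sql)


def handle_default(sql: str) -> str:
    """Statements with no special syntax are passed through unchanged."""
    return sql


# Hypothetical registry: one handler per statement type (strategy pattern).
HANDLERS: Dict[str, Callable[[str], str]] = {"create": handle_create}


def apply_strategy(sql: str) -> str:
    words = sql.split(None, 1)
    kind = words[0].lower() if words else ""
    return HANDLERS.get(kind, handle_default)(sql)


print(apply_strategy("CREATE TEMPORARY TABLE t AS select a from s"))
```

Keeping one handler per statement type means a new dialect-specific rewrite can be registered without touching the existing handlers.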
205. Performing lexical analysis on the target SQL statement to generate an abstract syntax tree;
206. traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
207. performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relation between the output fields and the physical table fields;
208. and generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
The steps 205-208 in this embodiment are similar to the steps 102-105 in the first embodiment, and are not described herein again.
Referring to fig. 3, a third embodiment of the method for extracting the blood margin of the field level data according to the embodiment of the present invention includes:
301. acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format;
302. acquiring a preset key field set, and screening out words consistent with key fields in the key field set from a target SQL sentence to obtain standard key fields;
In this embodiment, a preset key field set is obtained, and words consistent with key fields in the key field set are screened out from the target SQL statement to obtain standard key fields. Specifically, the key field set includes a start key field set and a target key field set, wherein the start key field set includes 'LEFT JOIN', 'RIGHT JOIN', 'LEFT OUTER JOIN', 'RIGHT OUTER JOIN', etc., and the target key field set includes 'CREATE', 'INSERT', 'SELECT INTO', or 'OVERWRITE'.
303. Taking the standard key field as a segmentation node, and performing left-side and right-side segmentation on the target SQL sentence based on the segmentation node to obtain a plurality of SQL subfields;
In this embodiment, the standard key field is used as a segmentation node, and the target SQL statement is segmented to the left and to the right of the segmentation node to obtain a plurality of SQL sub-statements (SQL subfields). Alternatively, the SQL statement is segmented based on a preset random segmentation length to obtain a plurality of segmented statements, and the segmented statements are segmented a second time using the standard key fields as positioning points to obtain a plurality of SQL sub-statements.
For example, if the SQL statement is "select col_a from a" and the standard key fields are "select" and "from", the standard key fields "select" and "from" are used as segmentation nodes, and two SQL sub-statements are obtained: "select col_a" and "from a".
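The segmentation in this example can be sketched as follows, assuming each sub-statement starts at a key field and runs up to the next one. The function name `split_on_key_fields` is hypothetical, and the sketch ignores key fields that occur inside string literals.

```python
import re
from typing import List, Sequence


def split_on_key_fields(sql: str, key_fields: Sequence[str]) -> List[str]:
    """Cut the statement so that every piece begins with one of the key fields."""
    if not key_fields:
        return [sql.strip()]
    # Longest key fields first, so 'left join' is tried before any shorter entry.
    ordered = sorted(key_fields, key=len, reverse=True)
    alternation = "|".join(r"\s+".join(map(re.escape, k.split())) for k in ordered)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    starts = [m.start() for m in pattern.finditer(sql)]
    if not starts:
        return [sql.strip()]
    bounds = starts + [len(sql)]
    pieces = [sql[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    head = sql[: starts[0]].strip()  # text before the first key field, if any
    return ([head] if head else []) + pieces


print(split_on_key_fields("select col_a from a", ["select", "from"]))
```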
304. Performing secondary segmentation on the plurality of segmentation sentences based on the standard key fields as positioning points to obtain a plurality of SQL subfields;
In this embodiment, the plurality of segmented statements are segmented a second time using the standard key fields as positioning points, so as to obtain a plurality of SQL subfields. Here, segmentation means word segmentation. Chinese word segmentation refers to the segmentation of a sequence of Chinese characters into individual words. Chinese word segmentation is the basis of text mining: once an input passage of Chinese has been successfully segmented, a computer can automatically identify the meaning of the sentences. The string-matching-based segmentation method, also called the mechanical word segmentation method, matches the character string to be analyzed against the entries in a sufficiently large machine dictionary according to a certain strategy; if a character string is found in the dictionary, the matching succeeds (a word is recognized).
305. Converting the plurality of SQL subfields into an abstract syntax tree;
In this embodiment, the SQL statement is segmented based on a preset random segmentation length to obtain a plurality of segmented statements, and the segmented statements are segmented a second time using the standard keywords as positioning points to obtain a plurality of SQL subfields. For example, if the SQL statement is "select col_a from a" and the standard keywords are "select" and "from", the standard keywords "select" and "from" are used as segmentation nodes, and two SQL subfields are obtained: "select col_a" and "from a".
Specifically, after the SQL statement is segmented by using the standard keywords to obtain a plurality of SQL subfields, the method further includes: labeling the plurality of SQL subfields. For example, the keyword corresponding to the SQL subfield "select col_a" is a keyword in the target keyword set, and thus the label corresponding to the SQL subfield "select col_a" is a target label.
Further, converting the plurality of SQL subfields into an abstract syntax tree comprises: analyzing the SQL sub-statements by using a preset lexical analyzer to obtain a plurality of tokens (lexical elements); and constructing a syntax tree from the plurality of tokens according to a preset syntax analysis method to obtain the abstract syntax tree. The preset syntax analysis method includes a top-down analysis method and a bottom-up analysis method.
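The two stages, lexical analysis followed by syntax tree construction, can be sketched for a very small grammar. This is a hypothetical top-down (recursive-descent style) parser that only understands `SELECT col (, col)* FROM table`; the token pattern, the dictionary shape of the tree and the names `tokenize` and `parse_select` are assumptions, not the lexical analyzer of the embodiment.

```python
import re
from typing import Dict, List

# Assumed token classes: identifiers/keywords and a few punctuation marks.
TOKEN = re.compile(r"\s*(?:(?P<word>[A-Za-z_][A-Za-z_0-9.]*)|(?P<punct>[,*()]))")


def tokenize(sql: str) -> List[str]:
    """Lexical analysis: turn the statement into a flat list of tokens."""
    tokens, pos, sql = [], 0, sql.strip()
    while pos < len(sql):
        m = TOKEN.match(sql, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {sql[pos]!r}")
        tokens.append(m.group("word") or m.group("punct"))
        pos = m.end()
    return tokens


def parse_select(tokens: List[str]) -> Dict:
    """Top-down parse of: SELECT col (, col)* FROM table."""
    pos = 0

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of statement")
        pos += 1
        return tokens[pos - 1]

    if take().lower() != "select":
        raise ValueError("expected 'select'")
    columns = [take()]
    while pos < len(tokens) and tokens[pos] == ",":
        take()  # consume the comma
        columns.append(take())
    if take().lower() != "from":
        raise ValueError("expected 'from'")
    return {"type": "Select", "columns": columns,
            "from": {"type": "Table", "name": take()}}


print(parse_select(tokenize("select col_a, col_b from a")))
```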
306. Traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
307. performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relation between the output fields and the physical table fields;
308. and generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
The steps 301 and 306-308 in this embodiment are similar to the steps 101 and 103-105 in the first embodiment, and are not described herein again.
Referring to fig. 4, a fourth embodiment of the method for extracting the blood-related data of field level according to the present invention includes:
401. acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format;
402. performing lexical analysis on the target SQL statement to generate an abstract syntax tree;
403. traversing nodes of multiple levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree, and acquiring related field data from the nodes of the multiple levels of the abstract syntax tree;
In this embodiment, based on the parsing rule and the parsing strategy of the abstract syntax tree, nodes at multiple levels of the abstract syntax tree are traversed, and the field data involved is acquired from those nodes. Specifically, each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule; a parsing rule includes different parsing strategies for nodes at different levels, and the parsing strategies are compatible with the syntax rules of multiple types of SQL statements. The parsing rules and parsing strategies are backward compatible with other SQL syntax rules such as Presto. Parsing strategies compatible with multiple syntax rules are preset for the different syntax rules, so as to meet the parsing requirements of different types of data under the various syntax rules and to improve compatibility when the abstract syntax tree is parsed.
404. Determining an input node, an output node and an intermediate conversion node according to the out-degree and the in-degree;
In this embodiment, the input nodes, the output nodes, and the intermediate conversion nodes are determined according to the out-degree and the in-degree. Specifically, the operation nodes are divided into input nodes, output nodes and intermediate conversion nodes according to the out-degree and the in-degree of each operation node in the directed acyclic graph. An input node is an operation node with an out-degree greater than 0 and an in-degree of 0. An output node is an operation node with an out-degree of 0 and an in-degree greater than 0. An operation node whose out-degree and in-degree are both greater than 0 is an intermediate conversion node.
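The classification by degree can be sketched as follows, assuming the directed acyclic graph is given as a list of (source, destination) edges. The function name `classify_nodes` and the role strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_nodes(edges: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each node as input, output or intermediate from its in/out degree."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    roles = {}
    for node in nodes:
        if in_deg[node] == 0:
            roles[node] = "input"          # out-degree > 0, in-degree == 0
        elif out_deg[node] == 0:
            roles[node] = "output"         # out-degree == 0, in-degree > 0
        else:
            roles[node] = "intermediate"   # both degrees > 0
    return roles


print(classify_nodes([("read_a", "join"), ("read_b", "join"), ("join", "write")]))
```

Because every node here is taken from an edge, at least one of its degrees is greater than 0, so the three branches cover all cases.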
405. Traversing all output nodes, judging whether a target SQL statement of the output nodes contains field values, if so, taking fields in the target SQL statement as output fields, and if not, acquiring all fields of a table in table output as output fields;
In this embodiment, all output nodes are traversed, and it is determined whether the target SQL statement of each output node contains field values; if so, the fields in the target SQL statement are used as output fields, and if not, all fields of the output table are obtained as output fields. Specifically, the core of generating the data blood relationship graph is the mapping between fields, so the present invention processes the input nodes, the output nodes and the intermediate conversion nodes separately to determine the corresponding input fields, output fields and the mapping relationships between the input fields and the output fields. For an input node, the SQL statement of the node is obtained, and it is determined whether the SQL statement contains field values. If it does, the field values in the SQL statement corresponding to the input node are taken directly as input fields. If the SQL statement does not contain field values but uses a full match, all fields of the table are obtained directly as input fields according to the input table in the SQL statement.
Specifically, in this embodiment, a corresponding operation table is determined based on an SQL statement, a corresponding database is dynamically connected according to the database connection information, and all fields of the current table are obtained from the database as input fields. According to the method, all input nodes are traversed, and the input fields obtained by processing each input node are combined into an input field set.
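The field-resolution rule for a single node can be sketched as below. The `schema` dictionary stands in for the column list that the embodiment fetches over a dynamic database connection, and treating `*` as the full match is an assumption; the function name `resolve_fields` is hypothetical.

```python
from typing import Dict, List


def resolve_fields(select_items: List[str], table: str,
                   schema: Dict[str, List[str]]) -> List[str]:
    """Use the explicit fields if present, otherwise every field of the table."""
    if select_items and select_items != ["*"]:
        return list(select_items)
    # No explicit fields: fall back to all columns of the table.
    # `schema` is a stand-in for a metadata lookup against the connected database.
    return list(schema[table])


schema = {"a": ["col_a", "col_b"]}
print(resolve_fields(["*"], "a", schema))
```

Running the same function over every input node and concatenating the results would give the input field set described above.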
406. Performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relation between the output fields and the physical table fields;
407. and generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
The steps 401-.
Referring to fig. 5, a fifth embodiment of the method for extracting the blood margin of the field level data according to the embodiment of the present invention includes:
501. acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format;
502. performing lexical analysis on the target SQL statement to generate an abstract syntax tree;
503. traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
504. performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relation between the output fields and the physical table fields;
505. determining a blood relationship analyzer corresponding to an output field according to the type of a preset basic Node in the abstract data tree;
In this embodiment, the blood relationship analyzer corresponding to the output field is determined according to the type of the preset base Node in the abstract data tree. Specifically, a corresponding blood relationship analyzer may be selected according to the type of the base Node to perform blood relationship analysis. Understandably, blood relationships are commonly found in InsertNode and CreateNode nodes. Therefore, if the type of the base Node is InsertNode, the blood relationship analyzer corresponding to InsertNode is selected; if the type of the base Node is CreateNode, the blood relationship analyzer corresponding to CreateNode is selected.
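Selecting an analyzer by node type can be sketched as a simple lookup. The node is represented here as a dictionary with a "type" key, and the names `RESOLVERS`, `resolve_insert`, `resolve_create` and `pick_resolver` are hypothetical; the embodiment does not specify how the analyzers are registered.

```python
from typing import Callable, Dict


def resolve_insert(node: dict) -> str:
    """Placeholder analyzer for InsertNode: reports the insert table as main table."""
    return f"insert:{node['table']}"


def resolve_create(node: dict) -> str:
    """Placeholder analyzer for CreateNode: reports the create table as main table."""
    return f"create:{node['table']}"


# Hypothetical registry keyed by the base Node type.
RESOLVERS: Dict[str, Callable[[dict], str]] = {
    "InsertNode": resolve_insert,
    "CreateNode": resolve_create,
}


def pick_resolver(node: dict) -> Callable[[dict], str]:
    try:
        return RESOLVERS[node["type"]]
    except KeyError:
        raise ValueError(f"no analyzer for node type {node.get('type')!r}")


node = {"type": "InsertNode", "table": "report"}
print(pick_resolver(node)(node))
```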
506. Based on a blood relationship analyzer, field query is carried out on a basic Node to obtain a corresponding query field list;
In this embodiment, based on the blood relationship analyzer, a field query is performed on the base Node to obtain a corresponding query field list. The query field list comprises a main table and the field information corresponding to the main table. The main table here may be an insert table or a create table. The field query includes, but is not limited to, any one or a combination of the following: conventional (ordinary) query, subquery, union query, and function field query. A corresponding query mode can be selected according to the actual situation of the base Node to perform the query processing and obtain the final query field list.
507. Traversing the query field list and the physical table field, and searching in a recursive tracing manner to obtain field blood relationship information between the output field and the physical table field;
In this embodiment, the query field list and the physical table fields are traversed, and the field blood relationship information between the output fields and the physical table fields is obtained by searching in a recursive source-tracing manner. Specifically, the query field list may be traversed in a loop to find the main table and the field information of the main table included in the list. The blood relationships among the corresponding data are then found from the main table and the field information of the main table in the query field list in a recursive source-tracing manner. The blood relationship includes, but is not limited to, at least one of: a table-level blood relationship, a field-level blood relationship, or a blood relationship at another level.
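Recursive source tracing can be sketched as follows, assuming the query field list has been reduced to a mapping from each derived (table, field) pair to the (table, field) pairs it is computed from. A field with no entry in the mapping is treated as a physical table field. The names `trace` and `derived` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Field = Tuple[str, str]  # (table or subquery alias, field name)


def trace(field: Field, derived: Dict[Field, List[Field]],
          seen: Optional[Set[Field]] = None) -> Set[Field]:
    """Follow a field back through subqueries until physical table fields are reached."""
    seen = set() if seen is None else seen
    if field in seen:          # already visited: guards against cyclic definitions
        return set()
    seen.add(field)
    sources = derived.get(field)
    if not sources:            # no upstream definition: this is a physical field
        return {field}
    result: Set[Field] = set()
    for src in sources:
        result |= trace(src, derived, seen)
    return result


derived = {
    ("report", "total"): [("tmp", "amt")],
    ("tmp", "amt"): [("orders", "price"), ("orders", "qty")],
}
print(sorted(trace(("report", "total"), derived)))
```

Here the output field `report.total` is traced through the intermediate `tmp.amt` to the two physical fields of `orders`, which is the field-level relationship the step is meant to produce.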
508. And inputting the field blood relationship information into a target SQL statement for data reduction to obtain target physical field information corresponding to the output field.
The steps 501-504 in this embodiment are similar to the steps 101-104 in the first embodiment, and are not described herein again.
The method for extracting the field-level data blood margin in the embodiment of the present invention has been described above; the device for extracting the field-level data blood margin in the embodiment of the present invention is described below. Referring to fig. 6, a first embodiment of the device for extracting the field-level data blood margin according to the embodiment of the present invention includes:
the preprocessing module 601 is configured to obtain an initial SQL statement and preprocess the initial SQL statement to obtain a target SQL statement in a preset format;
the parsing module 602 is configured to perform lexical parsing on the target SQL statement to generate an abstract syntax tree;
a traversing module 603, configured to traverse nodes of multiple levels of the abstract syntax tree based on the parsing rule and the parsing policy of the abstract syntax tree to obtain each output field in the target SQL statement;
a determining module 604, configured to perform recursive backtracking on the abstract syntax tree to obtain a physical table field corresponding to each output field, and determine a corresponding relationship between the output field and the physical table field;
a generating module 605, configured to generate, according to a corresponding relationship between the output field and the physical table field, field blood relationship information between the output field and the physical table field.
In the embodiment of the invention, the initial SQL statement is preprocessed to obtain the target SQL statement, and the target SQL statement is parsed into an abstract syntax tree; nodes at multiple levels of the abstract syntax tree are traversed based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement; recursive backtracking is performed on the abstract syntax tree to obtain the physical table field corresponding to each output field, and the correspondence between the output fields and the physical table fields is determined; and field blood relationship information between the output fields and the physical table fields is generated according to that correspondence. By parsing the input SQL statement and extracting the blood margin data of its output fields, this scheme solves the technical problems in the prior art that various database systems cannot all be covered and that the blood margin information among data cannot be accurately determined.
Referring to fig. 7, a second embodiment of the apparatus for extracting a field-level data blood margin according to the present invention specifically includes:
the preprocessing module 601 is configured to obtain an initial SQL statement and preprocess the initial SQL statement to obtain a target SQL statement in a preset format;
the parsing module 602 is configured to perform lexical parsing on the target SQL statement to generate an abstract syntax tree;
a traversing module 603, configured to traverse nodes of multiple levels of the abstract syntax tree based on the parsing rule and the parsing policy of the abstract syntax tree to obtain each output field in the target SQL statement;
a determining module 604, configured to perform recursive backtracking on the abstract syntax tree to obtain a physical table field corresponding to each output field, and determine a corresponding relationship between the output field and the physical table field;
a generating module 605, configured to generate, according to a corresponding relationship between the output field and the physical table field, field blood relationship information between the output field and the physical table field.
In this embodiment, the preprocessing module 601 is specifically configured to:
deleting useless data in the initial SQL statement based on a preset first regular expression, wherein the useless data comprises comments, blank spaces, line feed characters and end characters;
deleting the paging operation limit code in the initial SQL statement;
removing the life cycle in the initial SQL statement based on a preset second regular expression;
and based on a preset strategy mode, carrying out syntax processing of a corresponding type on the initial SQL statement to obtain a target SQL statement in a preset format.
In this embodiment, the parsing module 602 includes:
a screening unit 6021, configured to obtain a preset key field set, and screen out a word consistent with a key field in the key field set from the target SQL statement, to obtain a standard key field;
a splitting unit 6022, configured to split the target SQL statement by using the standard key field to obtain a plurality of SQL subfields;
a conversion unit 6023 for converting the plurality of SQL subfields into an abstract syntax tree.
In this embodiment, the splitting unit 6022 is specifically configured to:
taking the standard key field as a segmentation node, and performing left-side and right-side segmentation on the target SQL statement based on the segmentation node to obtain a plurality of SQL subfields;
and performing secondary segmentation on the plurality of segmentation sentences based on the standard key fields as positioning points to obtain a plurality of SQL subfields.
In this embodiment, the traversing module 603 is specifically configured to:
traversing nodes of a plurality of levels of the abstract syntax tree based on a parsing rule and a parsing strategy of the abstract syntax tree, and acquiring field data involved from the nodes of the plurality of levels of the abstract syntax tree;
determining an input node, an output node and an intermediate conversion node according to the out-degree and the in-degree;
and traversing all the output nodes, judging whether the target SQL statement of the output node contains a field value, if so, taking the field in the target SQL statement as an output field, and if not, acquiring all the fields of the table in table output as the output field.
In this embodiment, the generating module 605 is specifically configured to:
determining a blood relationship analyzer corresponding to the output field according to the type of a preset basic Node in the abstract data tree;
based on the blood relationship analyzer, field query is carried out on the basic Node to obtain a corresponding query field list;
and traversing the query field list and the physical table field, and searching in a recursive tracing mode to obtain the field blood relationship information between the output field and the physical table field.
In this embodiment, the field-level data blood margin extraction device further includes:
and the data reduction module 606 is configured to input the field blood relationship information into the target SQL statement to perform data reduction, so as to obtain target physical field information corresponding to the output field.
Fig. 6 and 7 describe the field-level data blood margin extraction device in the embodiment of the present invention in detail from the perspective of modular functional entities; the field-level data blood margin extraction device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a field-level data blood margin extraction device according to an embodiment of the present invention. The field-level data blood margin extraction device 800 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing applications 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the field-level data blood margin extraction device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute the series of instruction operations in the storage medium 830 on the field-level data blood margin extraction device 800, so as to implement the steps of the field-level data blood margin extraction method provided by the above method embodiments.
The field-level data blood margin extraction device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device configuration shown in fig. 8 does not constitute a limitation of the field-level data blood margin extraction device provided herein, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, having stored therein instructions which, when executed on a computer, cause the computer to perform the steps of the above field-level data blood margin extraction method.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A field-level data blood margin extraction method is characterized by comprising the following steps:
acquiring an initial SQL statement, and preprocessing the initial SQL statement to obtain a target SQL statement in a preset format;
performing lexical analysis on the target SQL statement to generate an abstract syntax tree;
traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
performing recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determining the corresponding relationship between the output fields and the physical table fields;
and generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
2. The field-level data blood margin extraction method according to claim 1, wherein the acquiring of the initial SQL statement and the preprocessing of the initial SQL statement to obtain the target SQL statement in the preset format comprises:
deleting useless data in the initial SQL statement based on a preset first regular expression, wherein the useless data comprises comments, blank spaces, line feed characters and end characters;
deleting the paging operation limit code in the initial SQL statement;
removing the life cycle in the initial SQL statement based on a preset second regular expression;
and based on a preset strategy mode, carrying out syntax processing of a corresponding type on the initial SQL statement to obtain a target SQL statement in a preset format.
3. The method of claim 1, wherein the performing of lexical analysis on the target SQL statement to generate an abstract syntax tree comprises:
acquiring a preset key field set, and screening out words consistent with key fields in the key field set from the target SQL statement to obtain standard key fields;
segmenting the target SQL statement by using the standard key field to obtain a plurality of SQL subfields;
converting a plurality of the SQL subfields into an abstract syntax tree.
4. The method according to claim 3, wherein the segmenting the target SQL statement by using the standard key field to obtain a plurality of SQL sub-fields comprises:
taking the standard key field as a segmentation node, and performing left-side and right-side segmentation on the target SQL statement based on the segmentation node to obtain a plurality of segmentation sentences;
and performing secondary segmentation on the plurality of segmentation sentences based on the standard key fields as positioning points to obtain a plurality of SQL sub-fields.
5. The method of claim 1, wherein traversing the nodes of the plurality of levels of the abstract syntax tree based on the parsing rules and parsing strategies of the abstract syntax tree to obtain each output field of the target SQL statement comprises:
traversing nodes of a plurality of levels of the abstract syntax tree based on a parsing rule and a parsing strategy of the abstract syntax tree, and acquiring field data involved from the nodes of the plurality of levels of the abstract syntax tree;
determining an input node, an output node and an intermediate conversion node according to the out-degree and the in-degree of each node;
and traversing all the output nodes, and judging whether the target SQL statement of each output node contains a field value; if so, taking the field in the target SQL statement as an output field; if not, acquiring all the fields of the output table as the output fields.
6. The field-level data blood margin extraction method according to claim 1, wherein the generating of the field blood relationship information between the output field and the physical table field according to the corresponding relationship between the output field and the physical table field comprises:
determining a blood relationship analyzer corresponding to the output field according to the type of a preset basic Node in the abstract syntax tree;
based on the blood relationship analyzer, field query is carried out on the basic Node to obtain a corresponding query field list;
and traversing the query field list and the physical table field, and searching in a recursive tracing mode to obtain the field blood relationship information between the output field and the physical table field.
7. The field-level data blood margin extraction method according to any one of claims 1-6, wherein after the generating of the field blood relationship information between the output field and the physical table field according to the correspondence between the output field and the physical table field, the method further comprises:
and inputting the field blood relationship information into the target SQL statement for data reduction to obtain target physical field information corresponding to the output field.
8. A field-level data blood margin extraction apparatus, comprising:
a preprocessing module, configured to acquire an initial SQL statement and preprocess the initial SQL statement to obtain a target SQL statement in a preset format;
the analysis module is used for carrying out lexical analysis on the target SQL statement to generate an abstract syntax tree;
the traversal module is used for traversing nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree to obtain each output field in the target SQL statement;
a determining module, configured to perform recursive backtracking on the abstract syntax tree to obtain physical table fields corresponding to the output fields, and determine a correspondence between the output fields and the physical table fields;
and the generation module is used for generating field blood relationship information between the output field and the physical table field according to the corresponding relation between the output field and the physical table field.
9. A field-level data blood margin extraction device, characterized in that the field-level data blood margin extraction device comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the field-level data blood margin extraction device to perform the steps of the field-level data blood margin extraction method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, carries out the steps of the field-level data blood margin extraction method according to any one of claims 1 to 7.
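The preprocessing recited in claim 2 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the claims do not disclose the first and second regular expressions, so every pattern below is a guess, and the "life cycle" is assumed to be a `LIFECYCLE <n>` clause of the kind some data warehouse dialects accept. The function name `preprocess` is likewise hypothetical.

```python
import re

# All patterns are illustrative assumptions; the actual "first" and
# "second" regular expressions are not disclosed in the source text.
COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
LIMIT = re.compile(r"\blimit\s+\d+(\s*,\s*\d+|\s+offset\s+\d+)?", re.IGNORECASE)
LIFECYCLE = re.compile(r"\blifecycle\s+\d+", re.IGNORECASE)  # assumed syntax
WHITESPACE = re.compile(r"\s+")


def preprocess(sql: str) -> str:
    """Return a one-line statement without comments, LIMIT, or terminator."""
    sql = COMMENTS.sub(" ", sql)    # strip line and block comments
    sql = LIMIT.sub(" ", sql)       # drop the paging (LIMIT) clause
    sql = LIFECYCLE.sub(" ", sql)   # drop the assumed LIFECYCLE clause
    sql = WHITESPACE.sub(" ", sql).strip()  # collapse spaces and line feeds
    return sql.rstrip(";").strip()  # remove the trailing statement terminator


raw = """
-- daily report
SELECT id, name  /* user columns */
FROM users
LIMIT 10, 20;
"""
target = preprocess(raw)
```

Plain regular expressions cannot tell SQL text from the contents of string literals, so a literal containing `--` or `limit 5` would be damaged by this sketch; a tokenizer-based cleanup would be needed to handle such input safely.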

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210512520.5A CN114817298A (en) 2022-05-12 2022-05-12 Method, device and equipment for extracting field-level data blood margin and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210512520.5A CN114817298A (en) 2022-05-12 2022-05-12 Method, device and equipment for extracting field-level data blood margin and storage medium

Publications (1)

Publication Number Publication Date
CN114817298A true CN114817298A (en) 2022-07-29

Family

ID=82514249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210512520.5A Pending CN114817298A (en) 2022-05-12 2022-05-12 Method, device and equipment for extracting field-level data blood margin and storage medium

Country Status (1)

Country Link
CN (1) CN114817298A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115421701A (en) * 2022-11-07 2022-12-02 北京滴普科技有限公司 Method and system for generating cypher statement based on model
CN115544065A (en) * 2022-11-28 2022-12-30 北京数语科技有限公司 Data blood relationship discovery method, system, equipment and storage medium
CN115544065B (en) * 2022-11-28 2023-02-28 北京数语科技有限公司 Data blood relationship discovery method, system, equipment and storage medium
CN116010438A (en) * 2022-12-22 2023-04-25 北京柏睿数据技术股份有限公司 Method and system for calculating database operation delay
CN116010438B (en) * 2022-12-22 2023-11-28 北京柏睿数据技术股份有限公司 Method and system for calculating database operation delay
CN116595038A (en) * 2023-07-17 2023-08-15 恒丰银行股份有限公司 Data blood edge tracing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination