CN117290383A - Query processing method and device based on multi-table connection

Info

Publication number: CN117290383A
Application number: CN202311340649.3A
Authority: CN
Inventors: 欧伟杰; 邹良港
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2023-12-26

Abstract

The application provides a query processing method and device based on multi-table connection. The method involves a user end, an optimizing end, an execution end and a database, and comprises the following steps: when the optimizing end receives a query request sent by the user end, the optimizing end retrieves the corresponding metadata from the database according to the query request; the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics; the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite; the optimizing end sends the query plan to the execution end; and the execution end feeds the query result generated by executing the query plan back to the user end. The method adds cross-table data dependency capability, reduces the read/write overhead and the amount of computation of scans at query time, lowers maintenance cost, and supports real-time data updates.

Description

Query processing method and device based on multi-table connection

Technical Field

The application relates to the field of data query, in particular to a query processing method and device based on multi-table connection.

Background

For query analysis in big data scenarios, query performance is one of the key indicators. To improve query performance, the industry has explored a variety of methods, including indexes, materialized views and cubes, and access constraints based on bounded computation.

Materialized views and cubes are difficult to update synchronously in real time. Their data generally has to be refreshed, the refresh takes a long time when the data volume is large, and the materialized view is unavailable while the refresh is in progress. These limitations make materialized views and cubes inflexible, so they can only be used for relatively fixed query statements.

In access constraints for bounded computation, it is necessary to find the repetition degree N of x -> y for a particular table and to store the repeated x -> y data as { x, y, count }. In practice, a query statement often involves multiple relation tables. For the query to use the access constraints (ACs) of each table, all access constraints of every relation table Ti involved have to be considered, and during optimization it is impossible to quickly judge which AC of a specific relation table Ti is optimal. In addition, scanning each AC incurs IO overhead.

Disclosure of Invention

In view of the foregoing, the present application is proposed to provide a query processing method and device based on multi-table connection that overcome, or at least partially solve, the foregoing problems, including:

The method is used for querying data in a scenario with a data table set that has a plurality of data tables and associations among the plurality of data tables. The method involves a user end, an optimizing end, an execution end and a database, wherein the user end is used for generating a query request according to user requirements and sending the query request to the optimizing end; the method comprises the following steps:

when the optimizing end receives a query request sent by the user end, the optimizing end retrieves the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set;

the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship;

the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite;

the optimizing end sends the query plan to the executing end; the execution end is used for feeding back the query result generated after the execution of the query plan to the user end.

Further, generating the first access constraint according to the data table set and the preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship, comprises:

when the cross-table semantics are the foreign key, the optimizing end acquires a first target data table from the data table set;

the optimizing end determines a first target value of a preset target field in the first target data table;

the optimizing end obtains a second target value with the same value as the first target value in the data table set according to the first target value;

the optimizing end determines a second target data table in the data table set according to the second target value;

the optimizing end generates the first access constraint according to the preset target field, the first target data table, the first target value, the second target data table and the second target value.

When the cross-table semantics are the same-semantic fields, the optimizing end obtains at least two data tables in the data table set, wherein the two data tables are a third target data table and a fourth target data table;

the optimizing end obtains a third target field from the third target data table and obtains a fourth target field from the fourth target data table;

when the third target field and the fourth target field have the same semantics, the optimizing end generates the first access constraint according to the third target data table, the fourth target data table, the third target field and the fourth target field.

when the cross-table semantics are the logic inclusion relationship, the optimizing end obtains at least two data tables in the data table set, wherein the two data tables are a fifth target data table and a sixth target data table;

the optimizing end obtains a fifth target field from the fifth target data table and obtains a sixth target field from the sixth target data table;

When the semantic meaning of the fifth target field includes the semantic meaning of the sixth target field or the semantic meaning of the sixth target field includes the semantic meaning of the fifth target field, the optimizing terminal generates the first access constraint according to the fifth target data table, the sixth target data table, the fifth target field and the sixth target field.

Further, generating the query plan according to the first access constraint and the query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite, comprises:

when the query rewrite type is the projection matching rewrite, the optimizing end determines whether the projection expressions in the analytical query statement can be obtained through expression operations on the first access constraint, and if so, the optimizing end generates the query plan according to the first access constraint; or,

when the query rewrite type is the Join rewrite, the optimizing end determines whether data of left and right nodes of a Join operator exist in the first access constraint, and if yes, the query plan is generated according to the first access constraint.

The embodiment of the application also discloses a query processing method based on multi-table connection, which is used for querying data in a data table set scene with a plurality of data tables and a plurality of relations among the data tables, and relates to a user side, an optimizing side, an executing side and a database, and the method comprises the following steps:

The execution end receives the query plan sent by the optimization end, and the optimization end is used for sending the query plan to the execution end;

the execution end determines target access constraint according to the query plan;

the execution end determines target data in the data table set according to the target access constraint;

the execution end generates a query result according to the target data;

the execution end sends the query result to the user end, and the user end is used for receiving the query result fed back by the execution end.

the user side responds to the requirement of a user to generate a query request, and sends the query request to the optimizing side, wherein the optimizing side is used for receiving the query request sent by the user side;

the user side receives the query result sent by the execution side and returns the query result to the corresponding target user, and the execution side is used for feeding back the query result to the user side.

The embodiment of the application also discloses a query processing device based on multi-table connection, the device is used for querying data in a data table set scene with a plurality of data tables and a plurality of relations among the data tables, the device relates to a user side, an optimizing side, an executing side and a database, and the device comprises:

a first calling module, configured for the optimizing end, when receiving a query request sent by the user end, to retrieve the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set;

a first generation module, configured for the optimizing end to generate a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship;

a second generation module, configured for the optimizing end to generate a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite;

a first sending module, configured for the optimizing end to send the query plan to the execution end, wherein the execution end is used for feeding the query result generated by executing the query plan back to the user end;

a second sending module, configured for the execution end to receive the query plan sent by the optimizing end, wherein the optimizing end is used for sending the query plan to the execution end;

a first determining module, configured for the execution end to determine a target access constraint according to the query plan;

a second determining module, configured for the execution end to determine target data in the data table set according to the target access constraint;

a third generation module, configured for the execution end to generate a query result according to the target data;

a third sending module, configured for the execution end to send the query result to the user end, wherein the user end is used for receiving the query result fed back by the execution end;

a fourth sending module, configured for the user end to generate a query request in response to a user requirement and send the query request to the optimizing end, wherein the optimizing end is used for receiving the query request sent by the user end;

a return module, configured for the user end to receive the query result sent by the execution end and return the query result to the corresponding target user, wherein the execution end is used for feeding the query result back to the user end.

The application has the following advantages:

In the prior art, it is impossible during optimization to quickly determine which AC of a specific relation table Ti is optimal, and scanning each AC incurs IO overhead. To address these problems, the embodiments of the present application provide a query processing method and device based on multi-table connection, specifically: when the optimizing end receives a query request sent by the user end, the optimizing end retrieves the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set; the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship; the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite; the optimizing end sends the query plan to the execution end; and the execution end feeds the query result generated by executing the query plan back to the user end.
By generating the first access constraint from the data table set and the preset cross-table semantics (a foreign key, a same-semantic field and a logical inclusion relationship), and by having the optimizing end generate the query plan according to the first access constraint and the query rewrite type, the application solves the problems that the optimal AC of a specific relation table Ti cannot be determined quickly during optimization and that scanning each AC incurs IO overhead. It adds cross-table data dependency capability to the multi-table access constraint (SAC) and reduces the IO overhead and Join computation of scanning the ACs of multiple tables at query time; based on data dependency, it also reduces the maintenance cost of the SAC and supports real-time data updates.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flowchart illustrating a method for processing a query based on multi-table join according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating steps of a query processing method based on multi-table join according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a method for processing a query based on multi-table join according to an embodiment of the present application;

FIG. 4 is a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application;

FIG. 5 is a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application;

FIG. 6 is a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present invention;

FIG. 8 is a flowchart illustrating an exemplary method for processing a multi-table connected query according to an embodiment of the present invention;

FIG. 9 is a flowchart of another embodiment of a query processing method for multi-table connection according to an embodiment of the present invention.

Detailed Description

In order to make the objects, features and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and detailed description. It will be apparent that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

The inventors found by analyzing the prior art that:

Materialized views in Oracle

The greatest advantage of materialized views is that performance can be improved: the materialized view of Oracle provides powerful functions that can be used to pre-compute and save the results of time-consuming operations such as table join or aggregation, so that these time-consuming operations can be avoided and the results obtained quickly when executing a query. The main constraints are as follows:

1. Materialized views work best in read-only or read-intensive environments and are not well suited to online transaction processing (OLTP) environments; when fact tables and the like are updated, the materialized view may lock rows, which affects system concurrency.

2. A materialized view sometimes cannot be fast-refreshed, which makes query results inaccurate.

3. A rowid materialized view (materialized views are generally created as primary-key, rowid, or subquery materialized views) has only a single master table and cannot include any of the following:

A. DISTINCT or aggregate functions;

B. GROUP BY, subqueries, joins, and set operations.

4. Materialized views may increase the demand for disk resources, i.e., permanently allocated hard disk space is needed for materialized views to store data.

5. How a materialized view operates is subject to certain constraints, such as primary keys and foreign keys.

Pre-calculation of Apache Kylin

The working principle of Apache Kylin is to pre-compute a Cube over a data model and use the pre-computed results to accelerate queries. The specific working process is as follows:

1) Specify a data model and define dimensions and measures.

2) Pre-compute the Cube: compute all cuboids and save them as materialized views.

3) When a query is executed, read the cuboids and compute the query result from them.

Because the Kylin query process does not scan the original records, but completes complex operations such as table joins and aggregation through pre-computation and executes queries against the pre-computed results, it is generally one to two orders of magnitude faster than query technologies without pre-computation, and the advantage is more pronounced on very large data sets. However, the pre-computation has to be completed in advance and does not support dynamic changes to the data.

Bounded computing

Bounded computation can greatly reduce the cost of accessing a single relation table through access constraints. With multi-table connections, however, the single-table access constraints (ACs) of the different tables have to be accessed first and then joined, so the IO and computation cost is higher than that of a dedicated multi-table access constraint (SAC).

In access constraints for bounded computation, it is necessary to find the repetition degree N of x -> y for a particular table and to store the repeated x -> y data as { x, y, count }. In practice, a query statement often involves multiple relation tables, so for the query to use the AC of each table, all access constraints of every relation table Ti involved have to be considered, and during optimization it cannot be quickly determined which AC of a specific relation table Ti is optimal. In addition, scanning each AC incurs IO overhead.

It should be noted that a single-table query access constraint (AC) is organized by compressing and de-duplicating the repeated data of the specified columns according to the model definition { x -> y, N }, where x and y are each a set of expressions and each expression contains one or more fields of the table. An AC therefore contains x, y, and the number of repeated records of each y value corresponding to each x value (called the repetition degree and denoted count); that is, an AC is a set of { x, y, count } data. In general, the more fields x and y involve, the harder it is for x -> y to reach a high repetition degree, i.e., the smaller the value of count.
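The { x, y, count } organization described above can be illustrated with a minimal Python sketch. The function name build_ac, the in-memory row format and the sample rows are hypothetical and chosen only for illustration; the application does not specify how an AC is physically built or stored.

```python
from collections import Counter

def build_ac(rows, x_fields, y_fields):
    """Compress rows into {x, y, count} records (illustrative sketch only)."""
    counts = Counter(
        (tuple(row[f] for f in x_fields), tuple(row[f] for f in y_fields))
        for row in rows
    )
    # One record per distinct (x, y) pair; count is its number of repetitions.
    return [{"x": x, "y": y, "count": n} for (x, y), n in counts.items()]

# Hypothetical sample rows of a single table.
sample_rows = [
    {"DestAirportID": 1, "FlightNo": "CA100"},
    {"DestAirportID": 1, "FlightNo": "CA100"},
    {"DestAirportID": 1, "FlightNo": "MU200"},
    {"DestAirportID": 2, "FlightNo": "CA100"},
]

ac = build_ac(sample_rows, ["DestAirportID"], ["FlightNo"])
for record in ac:
    print(record)
```

Four rows are compressed into three records here; the more often an (x, y) pair repeats, the greater the saving.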

Referring to fig. 1, 8 and 9, a flowchart of steps and an embodiment flowchart of a query processing method based on multi-table connection according to an embodiment of the present application are shown;

a multi-table connection-based query processing method for querying data in a data table set scenario having a plurality of data tables and associations between a plurality of the data tables, the method involving a user side, an optimizing side, an executing side, and a database, the method comprising:

S110, when the optimizing end receives a query request sent by the user end, the optimizing end retrieves the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set;

S120, the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship;

S130, the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite;

S140, the optimizing end sends the query plan to the execution end; the execution end is used for feeding the query result generated by executing the query plan back to the user end.

Next, a query processing method based on multi-table connection in the present exemplary embodiment will be further described.

In an embodiment of the invention, when a SAC is created, the user needs to give the complete definition of the SAC, including which relation tables the SAC involves, the specific definition of the x and y fields, and the connection conditions to be met. According to the SAC definition, the database filters out invalid data from the original tables using the specified connection conditions and generates { T1.X, T2.Y, count } data, where x and y belong to different relation tables.
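As a rough illustration of how { T1.X, T2.Y, count } data could be derived from two tables and a connection condition, consider the following Python sketch. It assumes a simple equality connection condition; the function build_sac, its parameters and the sample data are hypothetical and are not taken from the application.

```python
from collections import Counter, defaultdict

def build_sac(t1_rows, t2_rows, t1_key, t2_key, x_field, y_field):
    """Join T1 and T2 on t1_key = t2_key and compress to {T1.X, T2.Y, count}."""
    # Index T2 by its join key so that each T1 row is matched in one lookup.
    t2_index = defaultdict(list)
    for row in t2_rows:
        t2_index[row[t2_key]].append(row)

    counts = Counter()
    for left in t1_rows:
        # T1 rows without a matching T2 row are filtered out as invalid data.
        for right in t2_index.get(left[t1_key], []):
            counts[(left[x_field], right[y_field])] += 1
    return [{"T1.X": x, "T2.Y": y, "count": n} for (x, y), n in counts.items()]

# Hypothetical sample data.
on_time = [
    {"FlightNo": "CA100", "DestAirportID": 1},
    {"FlightNo": "CA100", "DestAirportID": 1},
    {"FlightNo": "MU200", "DestAirportID": 2},
    {"FlightNo": "ZZ999", "DestAirportID": 9},  # no matching airport
]
airport = [
    {"AirportID": 1, "Location": "Shenzhen"},
    {"AirportID": 2, "Location": "Beijing"},
]

sac = build_sac(on_time, airport, "DestAirportID", "AirportID",
                "FlightNo", "Location")
print(sac)
```

Here x comes from the first table and y from the second, and the row that fails the connection condition never reaches the SAC.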

In one embodiment of the invention, when data in the tables is queried, the optimizer determines during the query optimization phase whether the query can be rewritten as an access to the SAC.

In one embodiment of the present invention, in FIG. 8:

SAC: an access constraint created according to access patterns, which stores the values of the X fields and the corresponding Y fields across a plurality of associated data tables;

Metadata: stores the description information of the data tables and the SACs;

Optimizer: rewrites the query plan to take advantage of the SAC by finding a matching SAC for a particular Join operation, as shown in the figure;

Executor: executes the query according to the query plan generated by the optimizer and generates the query result;

Query: the optimizer generates a query plan from the query and hands it to the executor for execution, and the query result is finally returned to the user.

In one embodiment of the present invention, in FIG. 9:

tableFact: the fact table;

tableDim1: dimension table 1;

tableDim2: dimension table 2;

SAC: the access constraint for multi-table association queries, extracted according to the association relationship between the fact table and the two dimension tables;

Analytical Query: when the optimizer judges that the analytical query cannot use the access constraint SAC, the data of each table has to be queried and the association computed afterwards. When the SAC can be used, the result is obtained by querying the SAC data, which saves IO and computation cost.

In an embodiment of the present invention, the step of generating the first access constraint according to the data table set and the preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship, comprises:

It should be noted that a foreign key means that a foreign-key association exists between two fields of different tables, that is, every value of a certain field of one table must exist among the values of the corresponding field of another table.

It should be noted that a same-semantic field means that two fields of different tables represent the same semantics, for example the relationship between "zone" and "administrative zone" in one specific implementation.

A logical inclusion relationship means that the semantics of a certain field of one table include the semantics of the corresponding field of another table, for example the relationship between "province" and "city" in one embodiment.
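Of the three cross-table semantics, the foreign-key relationship is the one that can be checked mechanically from the data. The following Python sketch shows such a check under the definition given above; the function name satisfies_foreign_key and the sample rows are hypothetical, and the application does not state that the optimizing end verifies foreign keys in this way.

```python
def satisfies_foreign_key(child_rows, child_field, parent_rows, parent_field):
    """Return True if every child_field value exists among parent_field values."""
    parent_values = {row[parent_field] for row in parent_rows}
    return all(row[child_field] in parent_values for row in child_rows)

# Hypothetical sample data.
airport = [{"AirportID": 1}, {"AirportID": 2}]
valid_flights = [{"DestAirportID": 1}, {"DestAirportID": 2}, {"DestAirportID": 1}]
invalid_flights = [{"DestAirportID": 1}, {"DestAirportID": 9}]

fk_ok = satisfies_foreign_key(valid_flights, "DestAirportID", airport, "AirportID")
fk_broken = satisfies_foreign_key(invalid_flights, "DestAirportID", airport, "AirportID")
print(fk_ok, fk_broken)
```

Same-semantic fields and logical inclusion relationships describe meaning rather than values, so in this sketch they would have to be declared (the "preset" cross-table semantics) rather than inferred.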

In another embodiment of the present invention, the multi-table query access constraint SAC may further include the following attributes to further enhance the acceleration effect:

1. Condition filtering: filtering conditions are added to the SAC, and only data meeting the conditions is inserted into the SAC;

2. Aggregate functions: aggregate functions over specified fields or expressions can be added in addition to x and y, so that aggregation operations are pre-computed;

3. Expression calculation: the fields in x and y can be not only columns of a table but also expressions over those columns, so that expression calculations are pre-computed. Expressions may also appear in the filtering conditions.

In an embodiment of the present invention, the step of generating the query plan according to the first access constraint and the query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite, comprises:

It should be noted that the query statement is analyzed to determine the expressions corresponding to the projection matching rewrite and the Join rewrite, and the degree of overlap and/or the logical relationship between these expressions and the target expressions defined by the access constraints in the access constraint set is determined. Where an expression in the query statement can be computed from the expressions of an access constraint, it is rewritten as a direct query on the specific columns of that access constraint.

It should be noted that the purpose of projection matching rewrite is to ensure that all projection expressions required by a query can be provided by the data in the SAC. The optimizer therefore computes, from the logical computation tree generated for the query, the projection expressions that the bottom-level scan must pass upward, and matches them against the expressions in the SAC definition, so as to confirm that every projection expression can be computed from the expressions in the SAC definition.
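A much simplified version of the projection matching check can be sketched in Python. The sketch assumes that a projection is derivable when every column it references is stored in the SAC, and that expressions are plain arithmetic over column names written in Python syntax; a real optimizer would match expression trees instead. The function names and column sets are hypothetical.

```python
import ast

def referenced_columns(expression):
    """Collect the column names referenced by an arithmetic expression."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

def projection_matches(projections, sac_columns):
    """True if every projection can be computed from the SAC columns alone."""
    available = set(sac_columns)
    return all(referenced_columns(p) <= available for p in projections)

# Hypothetical SAC definition and query projections.
sac_columns = ["DestAirportID", "Location", "FlightNo", "SumDelayTime", "count"]

usable = projection_matches(["FlightNo", "SumDelayTime / count"], sac_columns)
not_usable = projection_matches(["FlightNo", "Date"], sac_columns)
print(usable, not_usable)
```

In the second case the query needs a column (Date) that the SAC does not store, so the query would fall back to the base tables.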

It should be noted that Join rewrite identifies whether the data of the left and right nodes of a Join operator contains a count column from a SAC; if both the left and right nodes contain one, the product of the left and right count columns is added to the output columns of the Join.
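The reason for multiplying the count columns is that each compressed record stands for count original rows, so a matching pair stands for left.count × right.count row combinations. The following Python sketch illustrates this with a nested-loop join; the function join_with_counts and the sample records are hypothetical, and the handling of inputs where only one side has a count column is an assumption of the sketch.

```python
def join_with_counts(left_rows, right_rows, left_key, right_key):
    """Equi-join two compressed inputs, carrying the count column through."""
    output = []
    for left in left_rows:
        for right in right_rows:
            if left[left_key] != right[right_key]:
                continue
            joined = {**left, **right}
            if "count" in left and "count" in right:
                # Each pair represents left.count * right.count original rows.
                joined["count"] = left["count"] * right["count"]
            # Assumption: if only one side has a count, it is kept unchanged.
            output.append(joined)
    return output

# Hypothetical compressed inputs.
left_ac = [
    {"DestAirportID": 1, "FlightNo": "CA100", "count": 3},
    {"DestAirportID": 2, "FlightNo": "MU200", "count": 5},
]
right_ac = [{"AirportID": 1, "Location": "Shenzhen", "count": 2}]

joined = join_with_counts(left_ac, right_ac, "DestAirportID", "AirportID")
print(joined)
```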

Once the SAC contains cross-table mappings, computation in the query can be moved forward to the time the SAC data is generated, which reduces the amount of computation during query execution. It also helps increase the repetition degree and reduce the data size of the SAC, which accelerates query performance.

Referring to FIG. 2, a flowchart of the steps of a query processing method based on multi-table connection according to an embodiment of the present application is shown;

S210, the executing end receives the query plan sent by the optimizing end, and the optimizing end is used for sending the query plan to the executing end;

S220, the execution end determines a target access constraint according to the query plan;

S230, the execution end determines target data in the data table set according to the target access constraint;

S240, the execution end generates a query result according to the target data;

S250, the execution end sends the query result to the user end, and the user end is used for receiving the query result fed back by the execution end.

Referring to FIG. 3, a flowchart of the steps of a query processing method based on multi-table connection according to an embodiment of the present application is shown;

S310, the user end generates a query request in response to a user requirement and sends the query request to the optimizing end, wherein the optimizing end is used for receiving the query request sent by the user end;

S320, the user terminal receives the query result sent by the execution terminal and returns the query result to the corresponding target user, and the execution terminal is used for feeding back the query result to the user terminal.

Example 1

on_time table structure:

Field name	Type	Description
FlightNo	varchar(15)	Flight number
OriginAirportID	int	Departure airport
DestAirportID	int	Arrival airport
Date	date	Date
DelayTime	decimal(15,2)	Delay time

The on_time table records flight delay information. It has more than 100 million records, and the fields FlightNo, DestAirportID and DelayTime contain a large number of repeated values.

airport table structure:

The airport table records airport information. It has 1000 records.

There are the following queries:

select FlightNo, avg(DelayTime) from on_time, airport where DestAirportID = AirportID and Location in (all domestic airports) group by FlightNo;

Assume there are 1000 flights and 1 million records satisfying the condition DestAirportID = AirportID, and that each DestAirportID corresponds to at most 1000 FlightNo values, i.e., the repetition degree of DestAirportID -> FlightNo is 1000. There is also a cross-table field association, DestAirportID -> AirportID -> Location: for a given AirportID, the Location is unique.

If the query statement is executed by filtering both full tables, joining the on_time table and the airport table, and then grouping and aggregating, then 100 million records and 1000 records are first scanned and joined, which produces a large intermediate result set; the 1 million records that pass the filter are then grouped and aggregated into 1000 groups, each of which needs 1000 accumulations, i.e., 1 million accumulations in total.

With single-table query access constraints (ACs), an AC needs to be created for the on_time table: { DestAirportID, FlightNo, sum(DelayTime) as SumDelayTime, count }, and an AC for the airport table: { AirportID, Location, count }. From the data volume assumed above, the AC of the on_time table has at most 1 million records. It is then joined with the records of the airport table to obtain the locations of all airports, and the result is filtered. The aggregation takes its values directly from SumDelayTime, so the amount of computation is lower than with the original tables, but the intermediate result set after the join is still large.

With a multi-table query access constraint (SAC), the access constraint is defined over the on_time table and the airport table as the X, Y data set { DestAirportID, AirportID, Location, FlightNo, sum(DelayTime) as SumDelayTime, count }. For the condition "DestAirportID = AirportID and Location in (all domestic airports)", the SAC records that satisfy the condition can be located precisely, which avoids scanning and joining the different tables.
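The example can be reproduced at toy scale in Python to show that answering the query from the SAC gives the same averages as joining the base tables. Everything in the sketch is hypothetical: the sample rows, the DOMESTIC set standing in for "all domestic airports", and the choice to keep Location in the SAC and filter on it at query time.

```python
from collections import defaultdict

DOMESTIC = {"Shenzhen", "Beijing"}  # stand-in for "all domestic airports"

on_time = [
    {"FlightNo": "CA100", "DestAirportID": 1, "DelayTime": 10},
    {"FlightNo": "CA100", "DestAirportID": 1, "DelayTime": 20},
    {"FlightNo": "CA100", "DestAirportID": 3, "DelayTime": 90},
    {"FlightNo": "MU200", "DestAirportID": 2, "DelayTime": 30},
]
airport = [
    {"AirportID": 1, "Location": "Shenzhen"},
    {"AirportID": 2, "Location": "Beijing"},
    {"AirportID": 3, "Location": "Tokyo"},
]

def query_base_tables():
    """Join, filter and aggregate directly on the original tables."""
    location = {a["AirportID"]: a["Location"] for a in airport}
    delays = defaultdict(list)
    for row in on_time:
        if location.get(row["DestAirportID"]) in DOMESTIC:
            delays[row["FlightNo"]].append(row["DelayTime"])
    return {flight: sum(d) / len(d) for flight, d in delays.items()}

def build_sac():
    """Precompute {DestAirportID, AirportID, Location, FlightNo, SumDelayTime, count}."""
    location = {a["AirportID"]: a["Location"] for a in airport}
    groups = defaultdict(lambda: [0, 0])  # key -> [SumDelayTime, count]
    for row in on_time:
        loc = location.get(row["DestAirportID"])
        if loc is None:
            continue  # fails DestAirportID = AirportID
        key = (row["DestAirportID"], loc, row["FlightNo"])
        groups[key][0] += row["DelayTime"]
        groups[key][1] += 1
    return [
        {"DestAirportID": k[0], "AirportID": k[0], "Location": k[1],
         "FlightNo": k[2], "SumDelayTime": s, "count": c}
        for k, (s, c) in groups.items()
    ]

def query_sac(sac):
    """Answer the same query from SAC records only, without any join."""
    totals = defaultdict(lambda: [0, 0])
    for rec in sac:
        if rec["Location"] in DOMESTIC:
            totals[rec["FlightNo"]][0] += rec["SumDelayTime"]
            totals[rec["FlightNo"]][1] += rec["count"]
    return {flight: s / c for flight, (s, c) in totals.items()}

sac = build_sac()
direct_result = query_base_tables()
sac_result = query_sac(sac)
print(direct_result, sac_result)
```

The average is recovered as sum(SumDelayTime) / sum(count), which is why the SAC stores both the pre-aggregated sum and the count.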

The technical scheme has the following advantages:

1. Cross-table data dependency capability is added to the SAC, which reduces the IO overhead of scanning multiple single-table ACs and the amount of Join calculation during a query;

2. Based on data dependency, the maintenance cost of the SAC is reduced, and real-time data update is supported.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.

Referring to fig. 4, a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application is shown;

a query processing device based on multi-table connection, wherein the device is used for querying data in a data table set scene with a plurality of data tables and correlation among the plurality of data tables, the device relates to a user side, an optimizing side, an executing side and a database, and the user side is used for generating a query request according to user requirements and sending the query request to the optimizing side; the device comprises:

a first invoking module 410, configured for the optimizing end, when receiving a query request sent by the user side, to retrieve the corresponding metadata from the database according to the query request, where the metadata includes a data table set and an access constraint set;

a first generating module 420, configured for the optimizing end to generate a first access constraint according to the data table set and preset cross-table semantics, where the cross-table semantics include a foreign key, a same-semantic field, and a logical inclusion relationship;

a second generating module 430, configured for the optimizing end to generate a query plan according to the first access constraint and a preset query rewrite type, where the query rewrite type includes projection matching rewrite and Join rewrite;

a first sending module 440, configured for the optimizing end to send the query plan to the executing end, where the executing end is used for feeding back to the user side the query result generated by executing the query plan.

In an embodiment of the present invention, the first generating module 420 includes:

a first obtaining sub-module, configured for the optimizing end to obtain a first target data table from the data table set when the cross-table semantics are the foreign key;

a first determining sub-module, configured for the optimizing end to determine a first target value of a preset target field in the first target data table;

a second obtaining sub-module, configured for the optimizing end to obtain, according to the first target value, a second target value in the data table set that has the same value as the first target value;

a second determining sub-module, configured for the optimizing end to determine a second target data table in the data table set according to the second target value;

a first generating sub-module, configured for the optimizing end to generate the first access constraint according to the preset target field, the first target data table, the first target value, the second target data table and the second target value;

a third obtaining sub-module, configured for the optimizing end to obtain at least two data tables from the data table set when the cross-table semantics are the same-semantic field, where the two data tables are a third target data table and a fourth target data table;

a fourth obtaining sub-module, configured for the optimizing end to obtain a third target field from the third target data table and a fourth target field from the fourth target data table;

a second generating sub-module, configured for the optimizing end to generate the first access constraint according to the third target data table, the fourth target data table, the third target field and the fourth target field when the third target field and the fourth target field have the same semantics;

a fifth obtaining sub-module, configured for the optimizing end to obtain at least two data tables from the data table set when the cross-table semantics are the logical inclusion relationship, where the two data tables are a fifth target data table and a sixth target data table;

a sixth obtaining sub-module, configured for the optimizing end to obtain a fifth target field from the fifth target data table and a sixth target field from the sixth target data table;

and a third generating sub-module, configured to generate the first access constraint according to the fifth target data table, the sixth target data table, the fifth target field and the sixth target field when the semantics of the fifth target field contain the semantics of the sixth target field or the semantics of the sixth target field contain the semantics of the fifth target field.
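The three branches handled by these sub-modules can be sketched as follows. This is only an illustration under stated assumptions: the `AccessConstraint` record, the functions `from_foreign_key` and `from_field_semantics`, the dict-based table layout and the caller-supplied `related` predicate are all hypothetical, since the application does not specify how matching values are searched or how semantic equality and inclusion are decided. The foreign-key search shown here is deliberately naive and simply returns the first field in another table that shares a value with the target field.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AccessConstraint:
    """Hypothetical record for a generated cross-table access constraint."""
    kind: str           # "foreign_key", "same_semantic" or "inclusion"
    tables: tuple       # (first table name, second table name)
    fields: tuple       # (field in first table, field in second table)
    values: tuple = ()  # shared values, used by the foreign-key branch only


def from_foreign_key(tables: dict, first_table: str,
                     target_field: str) -> Optional[AccessConstraint]:
    """Find another table that holds the same values as the target field."""
    first_values = {row[target_field] for row in tables[first_table]}
    for name, rows in tables.items():
        if name == first_table:
            continue
        field_names = {f for row in rows for f in row}
        for field in sorted(field_names):
            second_values = {row[field] for row in rows if field in row}
            shared = first_values & second_values
            if shared:
                return AccessConstraint(
                    "foreign_key", (first_table, name),
                    (target_field, field), tuple(sorted(shared)))
    return None


def from_field_semantics(kind: str, first: tuple, second: tuple,
                         related: Callable[[str, str], bool]
                         ) -> Optional[AccessConstraint]:
    """Same-semantic and inclusion branches; `related` is caller-supplied."""
    (t1, f1), (t2, f2) = first, second
    if kind == "same_semantic":
        ok = related(f1, f2)
    elif kind == "inclusion":
        # Inclusion may hold in either direction.
        ok = related(f1, f2) or related(f2, f1)
    else:
        raise ValueError(f"unsupported cross-table semantics: {kind}")
    return AccessConstraint(kind, (t1, t2), (f1, f2)) if ok else None


# Illustrative tables; the values are made up.
tables = {
    "on_time": [{"DestAirportID": 1, "FlightNo": "F100"},
                {"DestAirportID": 2, "FlightNo": "F300"}],
    "air": [{"AirportID": 1, "Location": "domestic"},
            {"AirportID": 2, "Location": "domestic"}],
}
fk = from_foreign_key(tables, "on_time", "DestAirportID")
print(fk)
```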

Referring to fig. 5, a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application is shown;

a multi-table connection based query processing apparatus for querying data in a data table set scenario having a plurality of data tables and associations between a plurality of the data tables, the apparatus involving a user side, an optimizing side, an executing side, and a database, the apparatus comprising:

a second sending module 510, configured for the executing end to receive the query plan sent by the optimizing end, where the optimizing end is configured to send the query plan to the executing end;

a first determining module 520, configured to determine a target access constraint according to the query plan by the executing end;

a second determining module 530, configured to determine target data in the data table set according to the target access constraint by the executing end;

a third generating module 540, configured to generate a query result according to the target data by the executing end;

and a third sending module 550, configured for the executing end to send the query result to the user side, where the user side is configured to receive the query result fed back by the executing end.
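The flow through modules 520 to 550 can be sketched as a single function. The plan layout (`constraint`, `predicate`, `columns`), the in-memory representation of an access constraint as a list of records, and the `send` callback are assumptions made for illustration only; the application does not define these interfaces.

```python
from typing import Callable


def execute_query_plan(plan: dict, access_constraints: dict,
                       send: Callable[[list], None]) -> list:
    """Run the executing-end steps for one query plan."""
    # Step 1: determine the target access constraint named by the plan.
    target = access_constraints[plan["constraint"]]
    # Step 2: determine the target data through that constraint.
    target_data = [rec for rec in target if plan["predicate"](rec)]
    # Step 3: generate the query result by projecting the requested columns.
    result = [{col: rec[col] for col in plan["columns"]}
              for rec in target_data]
    # Step 4: send the query result to the user side.
    send(result)
    return result


# Illustrative data: one SAC holding precomputed per-flight delay sums.
access_constraints = {
    "sac_on_time_air": [
        {"Location": "domestic", "FlightNo": "F100", "SumDelayTime": 15},
        {"Location": "overseas", "FlightNo": "F400", "SumDelayTime": 20},
    ]
}
plan = {
    "constraint": "sac_on_time_air",
    "predicate": lambda rec: rec["Location"] == "domestic",
    "columns": ["FlightNo", "SumDelayTime"],
}
received = []  # stands in for the user side
result = execute_query_plan(plan, access_constraints, received.append)
print(result)
```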

Referring to fig. 6, a block diagram of a query processing device based on multi-table connection according to an embodiment of the present application is shown;

a fourth sending module 610, configured to generate a query request by the user side in response to a requirement of a user, and send the query request to the optimizing side, where the optimizing side is configured to receive the query request sent by the user side;

and a return module 620, configured for the user side to receive the query result sent by the executing end and return the query result to the corresponding target user, where the executing end is configured to feed back the query result to the user side.

Referring to fig. 7, a computer device for implementing the multi-table connection-based query processing method of the present invention is shown, which may specifically include the following:

The computer device 12 is embodied in the form of a general purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, the program modules 42 being configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 7, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the multi-table connection-based query processing method provided by the embodiment of the present invention.

That is, when executing the program, the processing unit 16 implements the following: when the optimizing end receives a query request sent by the user side, the optimizing end retrieves the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set; the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship; the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite; the optimizing end sends the query plan to the executing end; the executing end is used for feeding back to the user side the query result generated by executing the query plan.

In an embodiment of the present invention, a computer readable storage medium is further provided, having stored thereon a computer program which, when executed by a processor, implements the multi-table connection-based query processing method provided in all embodiments of the present application:

That is, when executed by a processor, the program implements the following: when the optimizing end receives a query request sent by the user side, the optimizing end retrieves the corresponding metadata from the database according to the query request, wherein the metadata comprises a data table set and an access constraint set; the optimizing end generates a first access constraint according to the data table set and preset cross-table semantics, wherein the cross-table semantics comprise a foreign key, a same-semantic field and a logical inclusion relationship; the optimizing end generates a query plan according to the first access constraint and a preset query rewrite type, wherein the query rewrite type comprises projection matching rewrite and Join rewrite; the optimizing end sends the query plan to the executing end; the executing end is used for feeding back to the user side the query result generated by executing the query plan.

Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.

The multi-table connection-based query processing method and device provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims

1. The query processing method based on multi-table connection is characterized by comprising the steps of querying data in a data table set scene with a plurality of data tables and correlation among the plurality of data tables, and the method relates to a user side, an optimizing side, an executing side and a database, wherein the user side is used for generating a query request according to user requirements and sending the query request to the optimizing side;

the method comprises the following steps:

2. The method of claim 1, wherein the generating a first access constraint from the set of data tables and a preset cross-table semantic, wherein the cross-table semantic includes a foreign key, a homosemantic field, and a logical inclusion relationship, comprises:

3. The method of claim 1, wherein the generating a first access constraint from the set of data tables and a preset cross-table semantic, wherein the cross-table semantic includes a foreign key, a homosemantic field, and a logical inclusion relationship, comprises:

4. The method of claim 1, wherein the generating a first access constraint from the set of data tables and a preset cross-table semantic, wherein the cross-table semantic includes a foreign key, a homosemantic field, and a logical inclusion relationship, comprises:

5. The method of claim 1, wherein the generating a query plan in accordance with the first access constraint and a query rewrite type, wherein the query rewrite type includes a projection match rewrite and a Join rewrite, comprises:

6. A method for querying data in a data table set scenario having a plurality of data tables and a plurality of associations between the data tables, the method involving a user side, an optimizing side, an executing side, and a database, the method comprising:

The execution end generates a query result according to the target data;

7. A method for querying data in a data table set scenario having a plurality of data tables and a plurality of associations between the data tables, the method involving a user side, an optimizing side, an executing side, and a database, the method comprising:

8. The query processing device based on multi-table connection is characterized in that the device is used for querying data in a data table set scene with a plurality of data tables and a plurality of data tables are associated, the device relates to a user side, an optimizing side, an executing side and a database, and the user side is used for generating a query request according to user requirements and sending the query request to the optimizing side; the device comprises:

9. A multi-table connection-based query processing apparatus for querying data in a data table set scenario having a plurality of data tables and associations between a plurality of said data tables, said apparatus involving a user side, an optimizing side, an executing side, and a database, said apparatus comprising:

10. A multi-table connection-based query processing apparatus for querying data in a data table set scenario having a plurality of data tables and associations between a plurality of said data tables, said apparatus involving a user side, an optimizing side, an executing side, and a database, said apparatus comprising: