CN111949686A - Data processing method, device and equipment

Info

Publication number: CN111949686A
Application number: CN201910400008.XA
Authority: CN
Inventors: 李韬
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2020-11-17
Anticipated expiration: 2039-05-14

Abstract

The application provides a data processing method, a device and equipment, wherein the method comprises the following steps: obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes; selecting a target node which cannot be processed by a data source from the nodes of the original execution plan; performing equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan; determining a target execution plan according to the original execution plan and the equivalent execution plan; and sending the target execution plan to a data source so that the data source executes the target execution plan. By the technical scheme, the computing power of the data source can be reasonably utilized, the data transmission quantity is reduced, and higher query performance is obtained.

Description

Data processing method, device and equipment

Technical Field

The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, and device.

Background

Currently, data is typically stored in multiple data sources (i.e., databases); for example, one portion of an enterprise's data is stored in data source 1 and another portion is stored in data source 2. Because the data is scattered across different data sources, a query system must connect to each data source and read data from it in order to support data processing across the data sources. For example, the query system may read data from data source 1, read data from data source 2, and then process the data that was read.

However, if the query system reads all the data in data source 1 and all the data in data source 2, the cost of reading data is high and the overall processing efficiency of the query system suffers. For this reason, the query system usually pushes part of the processing down to the data source for execution, so that, on the one hand, the computing power of the data source itself is utilized and, on the other hand, the amount of data returned by the data source to the query system is reduced.

However, there is currently no effective method for determining which parts of the processing of a user's request should be pushed down to the data source for execution. As a result, the computing power of the data source cannot be reasonably utilized, and the user experience is poor.

Disclosure of Invention

The application provides a data processing method, which comprises the following steps:

obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes;

selecting a target node which cannot be processed by a data source from the nodes of the original execution plan;

performing equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan;

determining a target execution plan according to the original execution plan and the equivalent execution plan;

and sending the target execution plan to a data source so that the data source executes the target execution plan.

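The application provides another data processing method, which comprises the following steps:

obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes;

selecting a target node which cannot be processed by a data source from the nodes of the original execution plan;

performing segmentation processing on the target node in the original execution plan to obtain a first child node which can be processed by a data source and a second child node which cannot be processed by the data source;

replacing the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan;

determining a target execution plan according to the original execution plan and the equivalent execution plan;

and sending the target execution plan to a data source so that the data source executes the target execution plan.

The application provides another data processing method, which comprises the following steps:

obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes;

selecting a target node which cannot be processed by a data source from the nodes of the original execution plan;

determining a parent node corresponding to the target node from the nodes of the original execution plan;

pulling up the target node to be an upper node of the parent node to obtain an equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan;

determining a target execution plan according to the original execution plan and the equivalent execution plan;

and sending the target execution plan to a data source so that the data source executes the target execution plan.

The application provides another data processing method, which comprises the following steps:

obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes;

selecting a target node which cannot be processed by a data source from the nodes of the original execution plan;

performing equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan;

and processing data according to the equivalent execution plan.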

The present application provides a data processing apparatus, the apparatus comprising: an obtaining module, configured to obtain an original execution plan, where the original execution plan includes a plurality of nodes; a selection module, configured to select a target node that cannot be processed by a data source from the nodes of the original execution plan; the processing module is used for carrying out equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; the determining module is used for determining a target execution plan according to the original execution plan and the equivalent execution plan; and the sending module is used for sending the target execution plan to the data source so as to enable the data source to execute the target execution plan.

The present application provides a data processing apparatus comprising:

a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, the processor when executing the computer instructions performs:

Based on the above technical solution, in the embodiments of the application, the target node in the original execution plan can be equivalently transformed to obtain an equivalent execution plan. This increases the number of candidate execution plans and effectively expands the solution space, from which the optimal execution plan is selected as the target execution plan. Because more execution plans are available to choose from, the finally selected target execution plan is better: the processing executed by the data source is more reasonable, the computing power of the data source is reasonably utilized, the amount of data transmitted is reduced, query performance and overall query processing efficiency are higher, and the user experience is better.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments described in the present application, and those skilled in the art can obtain other drawings from these drawings.

FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an application scenario in one embodiment of the present application;

FIG. 3 is a schematic flow chart diagram of a data processing method according to another embodiment of the present application;

FIGS. 4A-4F are schematic diagrams of an execution plan in one embodiment of the present application;

FIG. 5 is a block diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. In addition, depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination".

The embodiment of the present application provides a data processing method, which may be applied to any device, such as any device of a query system. FIG. 1 is a flowchart of the method, and the method may include:

Step 101, obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes.

Step 102, selecting a target node which cannot be processed by a data source from the nodes of the original execution plan.

Specifically, the plurality of nodes of the original execution plan may be arranged in a tree structure; on this basis, the nodes of the tree structure are traversed in sequence from bottom to top, starting from the lowest node, until a node which cannot be processed by the data source is reached, and that node is determined as the target node.
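The bottom-up traversal described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `PlanNode` class, its `pushable` flag, and the single-child chain are assumptions made here to keep the example short (a real plan tree may have nodes with several children).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanNode:
    """One node of an execution plan; `child` is the node directly below it."""
    name: str
    pushable: bool  # assumption: True if the data source can process this node
    child: Optional["PlanNode"] = None


def find_target_node(root: PlanNode) -> Optional[PlanNode]:
    """Visit nodes from the lowest node upward; return the first non-pushable one."""
    chain = []
    node = root
    while node is not None:  # collect the nodes from top to bottom
        chain.append(node)
        node = node.child
    for node in reversed(chain):  # traverse from bottom to top
        if not node.pushable:
            return node
    return None  # every node can be processed by the data source


scan = PlanNode("scan", pushable=True)
filt = PlanNode("filter", pushable=False, child=scan)
agg = PlanNode("aggregate", pushable=True, child=filt)
print(find_target_node(agg).name)  # filter
```

The traversal stops at the first node that the data source cannot process, so nodes above the target node are not inspected.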

Step 103, performing equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan; wherein the execution result of the equivalent execution plan is the same as the execution result of the original execution plan.

Specifically, performing equivalence transformation on the target node to obtain an equivalent execution plan may include, but is not limited to, the following manners:

In a first manner, segmentation processing is performed on the target node in the original execution plan to obtain a first child node which can be processed by the data source and a second child node which cannot be processed by the data source; then, the target node in the original execution plan is replaced with the first child node and the second child node to obtain an equivalent execution plan.

Replacing the target node in the original execution plan with the first child node and the second child node to obtain the equivalent execution plan may include, but is not limited to: arranging the first child node and the second child node of the equivalent execution plan in a tree structure, wherein the first child node is located at the layer below the second child node.

In a second manner, a parent node corresponding to the target node is determined from the nodes of the original execution plan; then, the target node is pulled up to become an upper node of the parent node, and an equivalent execution plan is obtained.

Determining the parent node corresponding to the target node from the nodes of the original execution plan may include, but is not limited to: if the plurality of nodes of the original execution plan are arranged in a tree structure, the upper node connected to the target node may be determined as the parent node corresponding to the target node.

Step 104, determining a target execution plan according to the original execution plan and the equivalent execution plan.

Specifically, a cost value corresponding to the original execution plan may be determined, and a cost value corresponding to the equivalent execution plan may be determined; then, the target execution plan is determined according to the execution plan with the minimum cost value.
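The selection by minimum cost value can be sketched as below. The function name `choose_target_plan` and the cost figures are hypothetical and purely illustrative; the application does not specify how cost values are computed, so the cost function is passed in as a parameter.

```python
from typing import Callable, Sequence, TypeVar

Plan = TypeVar("Plan")


def choose_target_plan(plans: Sequence[Plan], cost_of: Callable[[Plan], float]) -> Plan:
    """Return the candidate plan whose cost value is smallest."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=cost_of)


# Illustrative cost values only; a real optimizer would estimate them.
illustrative_costs = {"original": 100.0, "equivalent": 40.0}
target_plan = choose_target_plan(list(illustrative_costs), illustrative_costs.__getitem__)
print(target_plan)  # equivalent
```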

Step 105, sending the target execution plan to a data source so that the data source executes the target execution plan. After the data source executes the target execution plan, the data source may return the execution results to the query system instead of returning all of the original data to the query system, thereby reducing the amount of data transmission.

In the above embodiment, obtaining the original execution plan may include, but is not limited to: acquiring a data processing request, and obtaining an original execution plan according to the data processing request; alternatively, after an equivalent execution plan is obtained (i.e., in step 103), that equivalent execution plan may itself be taken as an original execution plan.

The above execution sequence is only an example given for convenience of description; in practical applications, the execution sequence of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.

Based on the same application concept as the method, the embodiment of the present application further provides another data processing method, which may be applied to any device of a query system, and the method may include:

obtaining an original execution plan, which may include a plurality of nodes; selecting a target node which cannot be processed by a data source from the nodes of the original execution plan; performing segmentation processing on the target node in the original execution plan to obtain a first child node which can be processed by a data source and a second child node which cannot be processed by the data source; and replacing the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan, wherein the execution result of the equivalent execution plan is the same as that of the original execution plan. Then, a target execution plan is determined according to the original execution plan and the equivalent execution plan, and the target execution plan is sent to a data source so that the data source executes the target execution plan.

In one example, replacing a target node in an original execution plan with a first child node and a second child node to obtain an equivalent execution plan, includes: the first child node and the second child node of the equivalent execution plan are arranged by adopting a tree structure, wherein the first child node can be positioned at the lower layer of the second child node.

Based on the same application concept as the method, the embodiment of the present application further provides another data processing method, which may be applied to any device of a query system, and the method may include:

obtaining an original execution plan, which may include a plurality of nodes; selecting a target node which cannot be processed by a data source from the nodes of the original execution plan; determining a parent node corresponding to the target node from the nodes of the original execution plan, and pulling up the target node to be an upper node of the parent node to obtain an equivalent execution plan; wherein the execution result of the equivalent execution plan is the same as the execution result of the original execution plan. Then, a target execution plan is determined according to the original execution plan and the equivalent execution plan, and the target execution plan is sent to a data source so that the data source executes the target execution plan.

In one example, determining the parent node corresponding to the target node from the nodes of the original execution plan may include, but is not limited to: if the plurality of nodes of the original execution plan are arranged in a tree structure, determining the upper-layer node connected to the target node as the parent node corresponding to the target node.

Based on the same application concept as the method, the embodiment of the present application further provides another data processing method, which may be applied to any device of a query system, and the method may include:

obtaining an original execution plan, wherein the original execution plan comprises a plurality of nodes; selecting a target node which cannot be processed by a data source from the nodes of the original execution plan; performing equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan; and processing data according to the equivalent execution plan.

Optionally, in an example, processing data according to the equivalent execution plan may include, but is not limited to: determining a target execution plan according to the original execution plan and the equivalent execution plan, and sending the target execution plan to a data source so that the data source executes the target execution plan.

For implementation of the above steps, reference may be made to the above embodiments, which are not described herein again.

The data processing method is further described below with reference to specific application scenarios.

Referring to FIG. 2, which is a schematic diagram of an application scenario in the embodiment of the present application, a client may be an APP (application) included in a terminal device (e.g., a personal computer (PC), a notebook computer, or a mobile terminal), or may be a browser included in the terminal device, which is not limited herein.

The query system is used to implement the data processing function in the embodiment of the present application. Since user data is stored in a plurality of data sources (i.e., databases), the query system needs to connect to each data source and read data from it, so as to support data processing across the data sources. When using a query system, a user typically describes a query task in a query language (e.g., SQL) and then submits the query task to the query system for execution. The query system has a query optimization function and can automatically pick out a reasonable execution plan to process the user query.

The data sources may be databases. The embodiment of the present application may be applied to a scenario with heterogeneous data sources; that is, the data sources may be of the same type or of different types, and each data source may be a relational database or a non-relational database.

Further, the type of each data source may include, but is not limited to: OSS (Object Storage Service), TableStore, HBase (Hadoop Database), HDFS (Hadoop Distributed File System), MySQL (a relational database), RDS (Relational Database Service), DRDS (Distributed Relational Database Service), RDBMS (Relational Database Management System), SQL Server (a relational database), PostgreSQL (an object-relational database), MongoDB (a database based on distributed file storage), etc. These are only a few examples and do not limit the types of data sources.

The data stored in the data source may be various types of data, and the data type is not limited, such as user data, commodity data, map data, video data, image data, audio data, and the like.

In the above application scenario, FIG. 3 is a flowchart of a data processing method provided in the embodiment of the present application. The method may be applied to any device of a query system, and the method may include:

Step 301, obtaining a data processing request, such as a data processing request of SQL (Structured Query Language) type; the type of the data processing request is not limited.

Specifically, the client may send a data processing request to the query system, and the query system may receive the data processing request. For example, one example of a data processing request may be: SELECT c2, sum(c3) FROM datasource1.table1 WHERE c2 > 10 AND udf(c2) > 20 GROUP BY c2.

In the data processing request, datasource1 indicates the name of the data source, table1 indicates the name of the data table, and c2 and c3 indicate the names of columns in the data table "table1". Based on this, the query system may determine, according to the data processing request, the data source "datasource1" and the data table "table1" of that data source, and that the data of columns "c2" and "c3" in the data table "table1" needs to be operated on.

In the above data processing request, c2 > 10 indicates that data greater than 10 is filtered from all the data of column "c2". udf(c2) > 20 indicates that, based on all the data of column "c2", data whose udf(c2) value is greater than 20 is filtered, where udf(c2) indicates that the data of column "c2" is operated on using a UDF (user-defined function).

In the data processing request, sum(c3) GROUP BY c2 indicates that the data is grouped according to column "c2" and that, after grouping, a summation operation is performed on column "c3" within each group.

Of course, the above data processing request is only an example of the present application, and its content does not affect the technical solution of the present application; the following embodiments take this data processing request as an example.

Step 302, at least one original execution plan is obtained according to the data processing request.

Specifically, the data processing request is a SQL-type data processing request written by a user, and the data processing request may be converted into a machine-executable execution plan, where the execution plan describes a specific execution step, and the execution plan may be generated by an optimizer of the query system.

The query system may obtain at least one original execution plan according to the data processing request. For convenience of description, the following takes the generation of one original execution plan as an example, which is subsequently referred to as original execution plan A.

The original execution plan A may comprise a plurality of nodes, and each node may represent a calculation step. Examples include: scan nodes (e.g., Seq Scan, Index Only Scan, Bitmap Scan, etc.); join nodes (e.g., Join, Nested Loop, Hash Join, Merge Join, etc.); materialize nodes (e.g., Materialize); sort nodes (e.g., Sort); group nodes (e.g., Group); aggregation nodes (e.g., Aggregate); filter nodes (e.g., Filter); projection nodes (e.g., Projection); and append nodes (e.g., Append). Of course, the above are just a few examples of nodes, and other types of nodes may be included, which is not limited herein.

In an example, the plurality of nodes of original execution plan A may be arranged in a tree structure, and the arrangement manner is not limited. After the nodes of original execution plan A are arranged, executing original execution plan A means executing each node in sequence, starting from the lowest node (i.e., the bottom node) of the plan; this process is not described in further detail.

For example, for the data processing request "SELECT c2, sum(c3) FROM datasource1.table1 WHERE c2 > 10 AND udf(c2) > 20 GROUP BY c2", the original execution plan A generated from the data processing request may include a scan node (for performing a scan step), a filter node (for performing a filter step), and an aggregation node (for performing an aggregation step). Of course, original execution plan A may also include other types of nodes related to the data processing request and the specific algorithm, which are not described herein again.

Referring to FIG. 4A, a schematic diagram of the nodes of original execution plan A arranged in a tree structure is shown. Based on the tree structure, starting from the lowest node (i.e., the bottom node) of original execution plan A, the scan node is executed first, that is, the data in data table "table1" of data source "datasource1" is scanned. Then, the filter node is executed, that is, "c2 > 10 AND udf(c2) > 20" is evaluated: data greater than 10 is filtered from all the data of column "c2", and, based on that column, data whose udf(c2) value is greater than 20 is filtered, where udf(c2) indicates that the data of column "c2" is operated on by the UDF.

Then, the aggregation node is executed, that is, "sum(c3) GROUP BY c2" is evaluated: the data is grouped according to column "c2", and after grouping, a summation operation is performed on column "c3".
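To make the bottom-up execution of original execution plan A concrete, the following sketch runs the scan, filter, and aggregation nodes in order over a small in-memory table. The table contents and the body of `udf` are invented here for illustration (the application does not define them), and representing a plan as a Python list of functions is likewise an assumption of this sketch.

```python
from collections import defaultdict


def udf(value):
    # Hypothetical stand-in for the user-defined function in the request.
    return 100 - value


# Illustrative contents of datasource1.table1 (columns c2 and c3).
TABLE1 = [
    {"c2": 5, "c3": 1},
    {"c2": 12, "c3": 3},
    {"c2": 12, "c3": 4},
    {"c2": 90, "c3": 9},
    {"c2": 30, "c3": 2},
]


def scan_node():
    """Scan node: read every row of table1."""
    return list(TABLE1)


def filter_node(rows):
    """Filter node: keep rows with c2 > 10 AND udf(c2) > 20."""
    return [r for r in rows if r["c2"] > 10 and udf(r["c2"]) > 20]


def aggregate_node(rows):
    """Aggregation node: sum(c3) GROUP BY c2."""
    sums = defaultdict(int)
    for r in rows:
        sums[r["c2"]] += r["c3"]
    return dict(sums)


# Nodes above the scan node, listed from the lower node to the top node.
plan_a = [filter_node, aggregate_node]


def execute(plan):
    rows = scan_node()  # the lowest node is executed first
    for node in plan:   # then each upper node, from bottom to top
        rows = node(rows)
    return rows


print(execute(plan_a))  # {12: 7, 30: 2}
```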

Step 303, selecting a target node which cannot be processed by the data source from the nodes of the original execution plan.

Specifically, after arranging a plurality of nodes of the original execution plan by using the tree structure, each node of the tree structure may be sequentially traversed from the lowest node (i.e., the bottom node) of the tree structure from bottom to top (i.e., the tree structure is searched from bottom to top) until a node that cannot be processed by the data source is traversed, and the traversed node (i.e., the node that cannot be processed by the data source) is determined as the target node.

For example, referring to FIG. 4A, which shows the tree structure of original execution plan A, the traversal starts from the lowest node of the tree structure, so the first node traversed is the scan node. Because the data source has a scan function, it can execute the scan node, so the scan node is a node that can be processed by the data source.

Then, the second node, that is, the filter node, is traversed. The filter node needs to use a UDF (user-defined function), but the data source does not know what the UDF is and cannot execute it, so the data source cannot execute the filter node. The filter node is therefore a node that cannot be processed by the data source; it is determined as the target node (the first traversed node that cannot be processed by the data source is determined as the target node), and the traversal process ends.

Of course, the "user-defined function" is only an example, and a node that cannot be processed by the data source may also be identified in other ways, which is not limited herein, as long as such a node is found. For example, if a node needs to use data of data source 2, data source 1 cannot process the node, because data source 1 cannot obtain the data of data source 2. As another example, if a node needs to use function X and the data source does not provide function X, the data source cannot process the node.
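The two kinds of reasons given above (a function the data source lacks, or data held by a different data source) can be expressed as a simple capability check. This is a sketch under assumed names: `NodeNeeds`, `can_process`, and the `SUPPORTED` function set are hypothetical and are not part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeNeeds:
    """What a plan node requires in order to run (hypothetical model)."""
    functions: frozenset = frozenset()  # functions/operators the node calls
    sources: frozenset = frozenset()    # data sources whose data the node reads


def can_process(source_name, supported_functions, needs):
    """A data source can process a node only if it provides every required
    function and the node reads data from that data source alone."""
    return needs.functions <= supported_functions and needs.sources <= {source_name}


# Assumed capabilities of datasource1, for illustration only.
SUPPORTED = frozenset({">", "sum"})

scan = NodeNeeds(sources=frozenset({"datasource1"}))
udf_filter = NodeNeeds(functions=frozenset({">", "udf"}),
                       sources=frozenset({"datasource1"}))
cross_source = NodeNeeds(sources=frozenset({"datasource1", "datasource2"}))

print(can_process("datasource1", SUPPORTED, scan))          # True
print(can_process("datasource1", SUPPORTED, udf_filter))    # False
print(can_process("datasource1", SUPPORTED, cross_source))  # False
```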

Step 304, performing equivalence transformation on a target node in the original execution plan to obtain an equivalent execution plan; wherein, the execution result of the equivalent execution plan and the execution result of the original execution plan may be the same.

Specifically, after the target node in the original execution plan is found, if the target node can be equivalently transformed, the target node in the original execution plan can be equivalently transformed to obtain an equivalent execution plan. If the target node is not capable of performing the equivalence transformation, the equivalence transformation of the target node is not required.

Whether the target node can or cannot be equivalently transformed is determined as follows: after the target node is transformed, if the execution result of the resulting execution plan is the same as the execution result of the original execution plan, the target node can be equivalently transformed, and the resulting execution plan is an equivalent execution plan. If the execution result of the resulting execution plan differs from the execution result of the original execution plan, the target node cannot be equivalently transformed.

As a typical example, relational algebra describes a number of equivalence transformation rules, which can be applied to generate an equivalent execution plan.

For example, suppose the execution result of execution plan 1, obtained by transforming the target node, is data set A, and the execution result of the original execution plan is data set B. If data set A and data set B are completely the same, the target node can be equivalently transformed, and execution plan 1 is an equivalent execution plan. If data set A differs from data set B, the target node cannot be equivalently transformed.

In one example, an equivalence transformation policy (or rule) may be configured. If the target node matches the equivalence transformation policy, the target node can be equivalently transformed, and the target node in the original execution plan may be equivalently transformed to obtain an equivalent execution plan. If the target node does not match the equivalence transformation policy, the target node cannot be equivalently transformed. The content of the equivalence transformation policy may be configured according to experience and is not limited herein, as long as it can distinguish whether equivalence transformation is possible.
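The comparison of data set A and data set B can be sketched as a multiset comparison of rows. Treating the two results as equal regardless of row order is an assumption of this sketch; the application only states that the two data sets must be completely the same. The helper name `same_result` is hypothetical.

```python
from collections import Counter


def same_result(result_a, result_b):
    """Compare two results as multisets of rows (row order is ignored,
    but duplicate rows must occur the same number of times)."""
    return Counter(map(tuple, result_a)) == Counter(map(tuple, result_b))


data_set_a = [(12, 7), (30, 2)]
data_set_b = [(30, 2), (12, 7)]
print(same_result(data_set_a, data_set_b))          # True
print(same_result(data_set_a, [(12, 7), (12, 7)]))  # False
```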

For example, the equivalence transformation strategy may include a segmentation strategy (for performing segmentation operation on the target node) and a pull-up strategy (for performing pull-up operation on the target node), and both the segmentation strategy and the pull-up strategy need to guarantee equivalence transformation, and do not change the semantics and query results of the original query, thereby guaranteeing the generality of the algorithm.

If the target node matches the segmentation strategy, the target node can be equivalently transformed, and the target node in the original execution plan may be equivalently transformed to obtain an equivalent execution plan; if the target node does not match the segmentation strategy, the target node cannot be equivalently transformed in this way. The content of the segmentation strategy can be configured according to experience, as long as it can distinguish whether equivalence transformation is possible.

If the target node matches the pull-up strategy, the target node can be equivalently transformed, and the target node in the original execution plan may be equivalently transformed to obtain an equivalent execution plan; if the target node does not match the pull-up strategy, the target node cannot be equivalently transformed in this way. The content of the pull-up strategy can be configured according to experience, as long as it can distinguish whether equivalence transformation is possible.

The process of step 304 is described in detail below with reference to two specific implementations.

The method comprises the steps that firstly, a target node in an original execution plan is subjected to segmentation processing, and a first child node capable of being processed by a data source and a second child node incapable of being processed by the data source are obtained; and replacing the target node in the original execution plan by using the first child node and the second child node to obtain an equivalent execution plan. Specifically, a tree structure may be adopted to arrange a first child node and a second child node of the equivalent execution plan, where the first child node is located at a lower layer of the second child node, that is, the lower layer is the first child node that can be processed by the data source.

For example, referring to fig. 4A, the target node may be a filter node configured to execute "c2 > 10 AND udf(c2) > 20". The filter node in the original execution plan A can be subjected to segmentation processing to obtain a filtering child node 1 and a filtering child node 2, where the filtering child node 1 is configured to execute "c2 > 10" and the filtering child node 2 is configured to execute "udf(c2) > 20". Since the data source can execute "c2 > 10" but cannot execute "udf(c2) > 20", the filtering child node 1 is the first child node that can be processed by the data source, and the filtering child node 2 is the second child node that cannot be processed by the data source.

Further, the filtering child node 1 and the filtering child node 2 are used for replacing a target node in the original execution plan to obtain an equivalent execution plan, and for convenience of distinguishing, the equivalent execution plan is called as an equivalent execution plan B in the following. Referring to fig. 4B, the nodes of the equivalent execution plan B may be arranged in a tree structure, and the filtering child node 1 and the filtering child node 2 replace the positions of the filtering nodes, where the filtering child node 1 is located at the lower layer of the filtering child node 2.
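The segmentation processing of manner one can be illustrated with a minimal Python sketch. It is not taken from the application: the `Filter` class, the `source_can_evaluate` check (which simply treats any conjunct calling `udf(` as unsupported), and the representation of a predicate as a tuple of AND-ed conjuncts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def source_can_evaluate(conjunct: str) -> bool:
    # Assumption: the data source understands everything except udf(...) calls.
    return "udf(" not in conjunct


@dataclass(frozen=True)
class Filter:
    # Conjuncts are implicitly joined by AND.
    conjuncts: Tuple[str, ...]


def split_filter(node: Filter) -> Optional[List[Filter]]:
    """Split a filter into [first_child, second_child], lower layer first.

    Returns None when the node cannot be split, i.e. when all conjuncts
    fall on the same side (all supported or all unsupported).
    """
    pushable = tuple(c for c in node.conjuncts if source_can_evaluate(c))
    residual = tuple(c for c in node.conjuncts if not source_can_evaluate(c))
    if not pushable or not residual:
        return None
    # The first child (processable by the data source) sits below the second.
    return [Filter(pushable), Filter(residual)]


target = Filter(("c2 > 10", "udf(c2) > 20"))
children = split_filter(target)
```

Here `children[0]` plays the role of the filtering child node 1 and `children[1]` the role of the filtering child node 2; a filter that contains only the `udf` conjunct yields `None`, which corresponds to a node that cannot be split further.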

Manner two: a parent node corresponding to the target node is determined from the nodes of the original execution plan, and the target node is pulled up to become the upper node of the parent node, so as to obtain an equivalent execution plan. The upper node connected to the target node in the original execution plan may be determined as the parent node corresponding to the target node.

For example, referring to fig. 4A, the target node may be a filter node configured to execute "c2 > 10 AND udf(c2) > 20". The upper node connected to the filter node is an aggregation node configured to execute "sum(C3) GROUP BY C2", so the aggregation node is determined as the parent node corresponding to the filter node, and the filter node is pulled up to be the upper node of the parent node (i.e., the filter node is pulled up along the tree structure to become the upper node of the parent node), so as to obtain an equivalent execution plan. For convenience of distinction, this equivalent execution plan is referred to below as the equivalent execution plan C.

Referring to fig. 4C, the nodes of the equivalent execution plan C may be arranged in a tree structure, and the order of the aggregation node and the filter node is changed, that is, the aggregation node is located at the lower layer of the filter node.
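The pull-up of manner two can be sketched as follows. This is an illustrative simplification rather than the application's implementation: the plan is assumed to be a single chain of nodes, stored as a Python list ordered from the lowest layer to the top, so that the parent of the node at index `i` is the node at index `i + 1`. The function name `pull_up` and the node labels are hypothetical.

```python
from typing import List, Optional

SCAN = "scan"
FILTER = "filter: c2 > 10 AND udf(c2) > 20"
AGGREGATE = "aggregate: sum(C3) GROUP BY C2"


def pull_up(plan: List[str], target_index: int) -> Optional[List[str]]:
    """Swap the target node with its parent; return None if it has no parent."""
    parent_index = target_index + 1
    if parent_index >= len(plan):
        return None  # the target is already the top node
    new_plan = list(plan)  # leave the original plan untouched
    new_plan[target_index], new_plan[parent_index] = (
        new_plan[parent_index],
        new_plan[target_index],
    )
    return new_plan


plan_a = [SCAN, FILTER, AGGREGATE]  # lowest layer first
plan_c = pull_up(plan_a, plan_a.index(FILTER))
```

After the call, the aggregation node is at the lower layer of the filter node in `plan_c`, while `plan_a` is unchanged, so both plans remain available for later cost comparison.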

Step 305, determining a target execution plan according to the original execution plan and the equivalent execution plan.

Specifically, a cost value (i.e., Cost) corresponding to the original execution plan may be determined, and a cost value corresponding to the equivalent execution plan may be determined; the target execution plan is then determined according to the execution plan with the minimum cost value. For example, a part of the execution plan with the minimum cost value may be determined as the target execution plan.

For example, a cost value 1 corresponding to the original execution plan A, a cost value 2 corresponding to the equivalent execution plan B, and a cost value 3 corresponding to the equivalent execution plan C may be determined. The cost values 1, 2, and 3 may be determined by combining the execution cost on the data source, the execution cost in the query system, and the data transmission cost between the two; the specific calculation may follow a conventional method and is not described herein again. Then, assuming that the cost value 2 is the minimum, the target execution plan can be determined from the equivalent execution plan B. For example, a partial execution plan of the equivalent execution plan B is determined as the target execution plan.
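A minimal sketch of the cost comparison is given below. The application leaves the cost calculation to conventional methods, so the simple sum of three components used here is only an assumption, and all numbers are invented for illustration so that plan B comes out cheapest, as in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    source_cost: float        # execution cost on the data source
    transfer_cost: float      # data transmission cost between the two
    query_system_cost: float  # execution cost in the query system

    def total(self) -> float:
        # Assumption: the three components are simply added together.
        return self.source_cost + self.transfer_cost + self.query_system_cost


# Invented figures, chosen only so that plan B has the minimum cost value.
costs = {
    "A": PlanCost(source_cost=10.0, transfer_cost=100.0, query_system_cost=50.0),
    "B": PlanCost(source_cost=15.0, transfer_cost=40.0, query_system_cost=20.0),
    "C": PlanCost(source_cost=10.0, transfer_cost=100.0, query_system_cost=60.0),
}

best_plan = min(costs, key=lambda name: costs[name].total())
```

In this sketch plan B wins mainly because filtering at the data source lowers the assumed transmission cost; a real optimizer would derive these components from statistics rather than constants.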

In another example, the original execution plan A may be added to the execution plan set after the original execution plan A is obtained according to the data processing request. Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and since there is an unprocessed original execution plan A, the original execution plan A may be processed in steps 303 and 304 to obtain an equivalent execution plan B and an equivalent execution plan C, and then the equivalent execution plan B and the equivalent execution plan C are used as original execution plans, and the equivalent execution plan B and the equivalent execution plan C are added to the execution plan set.

Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and since there is an unprocessed equivalent execution plan B (which has already been taken as an original execution plan), the equivalent execution plan B may be processed using step 303 and step 304. Referring to fig. 4B, the target node that cannot be processed by the data source is the filtering child node 2, and since the filtering child node 2 cannot be split, manner one is not applied. When manner two is adopted, the filtering child node 2 is pulled up to be the upper node of the aggregation node, and an equivalent execution plan D is obtained, which is shown in fig. 4D. Then, the equivalent execution plan D is taken as an original execution plan, and the equivalent execution plan D is added to the execution plan set.

Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and since there is an unprocessed equivalent execution plan C (which has already been taken as an original execution plan), the equivalent execution plan C is processed using step 303 and step 304. Referring to fig. 4C, the target node that cannot be processed by the data source is the filter node, and since the filter node has no parent node, manner two is not applied. When manner one is adopted, the filter node is split into the filtering child node 1 and the filtering child node 2, with the filtering child node 1 located at the lower layer of the filtering child node 2, and an equivalent execution plan E is obtained, as shown in fig. 4E. The equivalent execution plan E is taken as an original execution plan and is added to the execution plan set.

Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and since there is an unprocessed equivalent execution plan D (which is already the original execution plan), the equivalent execution plan D may be processed using step 303 and step 304. Referring to fig. 4D, the target node that cannot be processed by the data source is the filtering child node 2, and since the filtering child node 2 cannot be split and the filtering child node 2 has no parent node, the processing is no longer performed in the first and second manners.

Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and since there is an unprocessed equivalent execution plan E (which is already the original execution plan), the equivalent execution plan E can be processed using step 303 and step 304. Referring to fig. 4E, the target node that cannot be processed by the data source is the filtering child node 2, and since the filtering child node 2 cannot be split and the filtering child node 2 has no parent node, the processing is no longer performed in the first and second manners.

Then, it is determined whether there is an unprocessed original execution plan in the execution plan set, and the traversal process is terminated because there is no unprocessed original execution plan. Further, the target execution plan may be determined in the manner of step 305 based on the cost value of each execution plan in the execution plan set (e.g., the original execution plan A and the equivalent execution plans B through E).
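The traversal of the execution plan set described above can be sketched as a worklist loop. This is an illustrative reconstruction under several assumptions that are not part of the application: a plan is a single chain stored as a tuple ordered from the lowest layer upward, a node is a `(kind, payload)` pair, a node is unsupported by the data source exactly when its payload mentions `udf(`, and duplicate plans are skipped. The table name `t` and all function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Node = Tuple[str, Tuple[str, ...]]  # (kind, payload)
Plan = Tuple[Node, ...]             # lowest layer first


def can_push(node: Node) -> bool:
    # Assumption: only udf(...) calls are unsupported by the data source.
    return not any("udf(" in part for part in node[1])


def find_target(plan: Plan) -> Optional[int]:
    """Bottom-up search for the first node the data source cannot process."""
    for index, node in enumerate(plan):
        if not can_push(node):
            return index
    return None


def split(plan: Plan, i: int) -> Optional[Plan]:
    """Manner one: split a filter into a pushable child below a residual child."""
    kind, payload = plan[i]
    if kind != "filter":
        return None
    ok = tuple(p for p in payload if "udf(" not in p)
    bad = tuple(p for p in payload if "udf(" in p)
    if not ok or not bad:
        return None
    return plan[:i] + (("filter", ok), ("filter", bad)) + plan[i + 1:]


def pull_up(plan: Plan, i: int) -> Optional[Plan]:
    """Manner two: swap the target node with its parent, if it has one."""
    if i + 1 >= len(plan):
        return None
    return plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]


def enumerate_plans(original: Plan) -> List[Plan]:
    plan_set = [original]
    pending = deque([original])  # plans not yet processed
    while pending:
        plan = pending.popleft()
        i = find_target(plan)
        if i is None:
            continue
        for candidate in (split(plan, i), pull_up(plan, i)):
            if candidate is not None and candidate not in plan_set:
                plan_set.append(candidate)
                pending.append(candidate)
    return plan_set


SCAN: Node = ("scan", ("t",))
AGG: Node = ("aggregate", ("sum(C3) GROUP BY C2",))
plan_a: Plan = (SCAN, ("filter", ("c2 > 10", "udf(c2) > 20")), AGG)
plans = enumerate_plans(plan_a)
```

Under these assumptions the loop produces five plans in the order A, B, C, D, E, mirroring the walkthrough: B and C come from A, D from pulling up the residual filter in B, and E from splitting the filter in C.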

In one example, a cost value 1 corresponding to the original execution plan A, a cost value 2 corresponding to the equivalent execution plan B, a cost value 3 corresponding to the equivalent execution plan C, a cost value 4 corresponding to the equivalent execution plan D, and a cost value 5 corresponding to the equivalent execution plan E may be determined. Then, assuming that the cost value 4 is the minimum, the target execution plan can be determined from the equivalent execution plan D.

When determining the target execution plan according to the equivalent execution plan D, the nodes of the tree structure of the equivalent execution plan D may be traversed in sequence from the lowest node upward (i.e., the tree structure is searched from bottom to top) until a node that cannot be processed by the data source is reached, and all nodes below that node are determined as the target execution plan. For example, referring to fig. 4D, the scan node, the filtering child node 1, and the aggregation node may be used as the target execution plan, that is, all nodes in the target execution plan can be processed by the data source. Fig. 4F shows an example of the target execution plan.

In addition, for the remaining execution plans in the equivalent execution plan D (i.e., other execution plans besides the target execution plan), such as the filtering child node 2, the processing may be performed by the query system itself.
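The division of the equivalent execution plan D into the target execution plan and the remaining execution plan can be sketched as follows. The list representation (lowest layer first), the node labels, and the `SOURCE_SUPPORTED` set are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def split_for_pushdown(plan: List[str], supported: Set[str]) -> Tuple[List[str], List[str]]:
    """Walk the plan bottom-up and cut it at the first unsupported node.

    Returns (target_plan, remaining_plan): the nodes below the cut are sent
    to the data source, the rest stay in the query system.
    """
    target_plan: List[str] = []
    for node in plan:  # plan is ordered from the lowest layer upward
        if node not in supported:
            break
        target_plan.append(node)
    return target_plan, plan[len(target_plan):]


# Equivalent execution plan D, lowest layer first (hypothetical labels).
plan_d = ["scan", "filter child 1", "aggregate", "filter child 2"]
SOURCE_SUPPORTED = {"scan", "filter child 1", "aggregate"}

target_plan, remaining_plan = split_for_pushdown(plan_d, SOURCE_SUPPORTED)
```

With these inputs, `target_plan` holds the scan node, the filtering child node 1, and the aggregation node, and `remaining_plan` holds only the filtering child node 2.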

Step 306, sending the target execution plan to the data source so that the data source executes the target execution plan.

After the data source executes the target execution plan (the execution process is not limited herein), the data source may return only part of the data to the query system according to the execution result, instead of returning all of the data in the data source to the query system. After receiving the data returned by the data source, the query system may execute the remaining execution plan (i.e., the part of the execution plan other than the target execution plan), such as the filtering child node 2, which is not described in detail herein.
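The division of work between the data source and the query system can be simulated with a small sketch. Everything concrete here is invented for illustration: the rows, the two-column layout `(c2, c3)`, and the body of `udf`, which stands in for a function known only to the query system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical table contents as (c2, c3) rows.
ROWS: List[Tuple[int, float]] = [(5, 1.0), (12, 2.0), (12, 3.0), (30, 4.0)]


def udf(c2: int) -> int:
    # Invented stand-in for a user-defined function unknown to the data source.
    return c2 - 5


def data_source_execute(rows: List[Tuple[int, float]]) -> Dict[int, float]:
    """Target execution plan: scan, filter c2 > 10, sum(c3) GROUP BY c2."""
    sums: Dict[int, float] = defaultdict(float)
    for c2, c3 in rows:
        if c2 > 10:
            sums[c2] += c3
    return dict(sums)


def query_system_finish(partial: Dict[int, float]) -> Dict[int, float]:
    """Remaining execution plan: filter udf(c2) > 20 on the returned groups."""
    return {c2: total for c2, total in partial.items() if udf(c2) > 20}


partial = data_source_execute(ROWS)    # only aggregated groups are returned
result = query_system_finish(partial)  # executed by the query system itself
```

In this toy run the data source returns two aggregated groups instead of four raw rows, and the query system then applies the `udf` filter to those groups only.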

Based on the above technical solution, in the embodiment of the application, the query system pushes part of the query down to the data source for execution by sending the target execution plan to the data source, so that the computing power of the data source can be utilized and the data returned by the data source to the query system can be reduced. In addition, the target node in the original execution plan can be equivalently transformed to obtain an equivalent execution plan, which increases the number of execution plans and effectively expands the solution space of the execution plans; the optimal execution plan is then selected from these execution plans and used as the target execution plan. Because more execution plans are available for selection, the finally selected target execution plan is better, the processing executed by the data source is more reasonable, and the computing power of the data source is reasonably utilized. The solution also has good portability, can be applied to different heterogeneous scenarios and query optimizers, and provides a better user experience.

Based on the same application concept as the method, an embodiment of the present application further provides a data processing apparatus, as shown in fig. 5, which is a structural diagram of the data processing apparatus, and the data processing apparatus includes:

an obtaining module 51, configured to obtain an original execution plan, where the original execution plan includes a plurality of nodes; a selecting module 52, configured to select a target node that cannot be processed by a data source from the nodes of the original execution plan; a processing module 53, configured to perform an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan; a determining module 54, configured to determine a target execution plan according to the original execution plan and the equivalent execution plan; a sending module 55, configured to send the target execution plan to the data source, so that the data source executes the target execution plan.

When performing equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, the processing module 53 is specifically configured to: perform segmentation processing on the target node in the original execution plan to obtain a first child node which can be processed by a data source and a second child node which cannot be processed by the data source; and replace the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan.

When performing equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, the processing module 53 is specifically configured to: determine a parent node corresponding to the target node from the nodes of the original execution plan; and pull up the target node to be an upper node of the parent node to obtain an equivalent execution plan.

Based on the same application concept as the method, an embodiment of the present application further provides a data processing apparatus, including: a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, the processor when executing the computer instructions performs:

The embodiment of the application also provides a machine-readable storage medium, wherein a plurality of computer instructions are stored on the machine-readable storage medium; the computer instructions when executed perform the following:

Referring to fig. 6, which is a block diagram of a data processing device proposed in the embodiment of the present application, the data processing device 60 may include: a processor 61, a network interface 62, a bus 63, and a memory 64. The memory 64 may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the memory 64 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, or any type of storage disk (e.g., a compact disc, a DVD, etc.).

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method of data processing, the method comprising:

2. The method of claim 1,

selecting a target node from the nodes of the original execution plan that cannot be processed by a data source, comprising:

arranging the plurality of nodes of the original execution plan by adopting a tree structure;

and traversing from the lowest node of the tree structure to the top in sequence until the node which cannot be processed by the data source is traversed, and determining the traversed node as the target node.

3. The method of claim 1, wherein performing an equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan comprises:

performing segmentation processing on the target node in the original execution plan to obtain a first child node which can be processed by a data source and a second child node which cannot be processed by the data source; and replacing the target node in the original execution plan by using the first child node and the second child node to obtain an equivalent execution plan.

4. The method of claim 3, wherein replacing the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan comprises:

and arranging the first child node and the second child node of the equivalent execution plan by adopting a tree structure, wherein the first child node is positioned at the lower layer of the second child node.

5. The method of claim 1, wherein performing an equivalence transformation on the target node in the original execution plan to obtain an equivalent execution plan comprises:

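determining a parent node corresponding to the target node from the nodes of the original execution plan;

and pulling up the target node as an upper node of the parent node to obtain an equivalent execution plan.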

6. The method of claim 5,

determining a parent node corresponding to the target node from the nodes of the original execution plan, including:

and if the plurality of nodes of the original execution plan are arranged by adopting a tree structure, determining an upper node connected with the target node as a father node corresponding to the target node.

7. The method according to any one of claims 1 to 6,

the execution result of the equivalent execution plan is the same as the execution result of the original execution plan.

8. The method of claim 1,

determining a target execution plan according to the original execution plan and the equivalent execution plan, including:

determining a cost value corresponding to the original execution plan;

determining a cost value corresponding to the equivalent execution plan;

and determining the target execution plan according to the execution plan with the minimum cost value.

9. The method of claim 1, wherein obtaining an original execution plan comprises:

acquiring a data processing request, and acquiring an original execution plan according to the data processing request; or,

after obtaining the equivalent execution plan, determining the equivalent execution plan as an original execution plan.

10. A method of data processing, the method comprising:

11. The method of claim 10, wherein replacing the target node in the original execution plan with the first child node and the second child node to obtain an equivalent execution plan comprises:

12. A method of data processing, the method comprising:

13. The method of claim 12,

14. A method of data processing, the method comprising:

and processing data according to the equivalent execution plan.

15. A data processing apparatus, characterized in that the apparatus comprises:

an obtaining module, configured to obtain an original execution plan, where the original execution plan includes a plurality of nodes;

a selection module, configured to select a target node that cannot be processed by a data source from the nodes of the original execution plan;

the processing module is used for carrying out equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan;

the determining module is used for determining a target execution plan according to the original execution plan and the equivalent execution plan;

and the sending module is used for sending the target execution plan to the data source so as to enable the data source to execute the target execution plan.

16. The apparatus of claim 15, wherein the processing module performs an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, and is specifically configured to:

17. The apparatus of claim 15, wherein the processing module performs an equivalent transformation on the target node in the original execution plan to obtain an equivalent execution plan, and is specifically configured to:

18. A data processing apparatus, characterized by comprising: