CN111367954A - Data query processing method, device and system and computer readable storage medium - Google Patents


Info

Publication number
CN111367954A
CN111367954A
Authority
CN
China
Prior art keywords
data
global metadata
information
query
query processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811599635.2A
Other languages
Chinese (zh)
Inventor
鲍焱
丁岩
黄峰
叶松青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201811599635.2A priority Critical patent/CN111367954A/en
Priority to PCT/CN2019/128782 priority patent/WO2020135613A1/en
Publication of CN111367954A publication Critical patent/CN111367954A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data query processing method, device, and system, and a computer-readable storage medium. The data query processing method comprises the following steps: receiving a query processing request; determining the data components involved in the query processing request according to global metadata information obtained from a global metadata base, and generating a query execution plan according to the global metadata information, wherein the global metadata information of the global metadata base comprises the metadata information of the data components; and, based on the query execution plan, completing data operations by interacting with the data components involved in the query processing request. In the scheme provided by the embodiments, because the involved data components are determined from the global metadata information, the storage details of the underlying data can be hidden from the business application layer.

Description

Data query processing method, device and system and computer readable storage medium
Technical Field
Embodiments of the present invention relate to a data query processing method, a data query processing apparatus, a data query processing system, and a computer-readable storage medium.
Background
For the storage of mass data, the number of data records is extremely large and exceeds the design range of a relational database; in such cases, the solutions commonly used in the industry are Hive, HBase, and the like.
Hive is a data warehouse infrastructure built on Hadoop that stores data as file blocks. Hive provides a simple SQL-like query language (SQL: Structured Query Language) that allows users to query data in a SQL-like manner. Hive converts SQL requests into MapReduce jobs for processing, which incurs substantial overhead in job submission and scheduling. It therefore cannot achieve fast, low-latency queries on large-scale datasets; the best use case for Hive is batch processing of large datasets, such as the analysis of behavior logs.
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system; a large-scale structured storage cluster can be built on inexpensive PC servers. HBase differs from a typical relational database in that it adopts column-oriented storage and stores data in a key-value format. As a result, once the row key (rowkey) is known, the value of the corresponding field can be fetched quickly; consequently, row-key design is strongly coupled to the business scenario, and extensibility is poor. HBase does not provide a SQL-like interface; developers must call it through an Application Programming Interface (API).
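The rowkey-coupled access pattern described above can be sketched as follows (a toy in-memory model with hypothetical row keys and column names, not the actual HBase API): once the row key is known, field values are fetched directly without scanning, which is exactly why row-key design must mirror the query pattern of the business scenario.

```python
# Toy model of HBase-style key-value storage: rowkey -> {column -> value}.
# Row keys and column families here are hypothetical examples.
table = {
    "user#1001": {"cf:name": "alice", "cf:age": "30"},
    "user#1002": {"cf:name": "bob", "cf:age": "25"},
}

def get(rowkey, column):
    """Direct lookup by (rowkey, column), mimicking an HBase Get."""
    return table.get(rowkey, {}).get(column)

print(get("user#1001", "cf:name"))  # -> alice
```

A query that does not start from the row key (e.g. "all users named bob") would have to scan every row, which is why the scheme is poorly extensible once the business query pattern changes.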
Other components in the field of mass data applications likewise struggle to cover all application scenarios. In practice, therefore, different components must be adopted for specific application scenarios, with customized development against the query interface each component provides. Under this mechanism, multiple data processing components must be deployed to support the complete set of business requirements, which brings the following problems: development must be done against the access interface provided by each data component, increasing engineering complexity; and the data components are physically isolated, so association analysis (join, union, etc.) cannot be performed across their data.
Two types of solutions are currently common:
In the first type of solution, the data in each data component is first analyzed and processed, with the results forming intermediate tables stored in a relational database or a distributed relational database; an association operation is then performed on the intermediate tables; and finally, detailed records are retrieved from each data component according to the resulting association information. This approach has the following significant limitations: 1. each data component must be interfaced with directly, increasing the development and maintenance workload of the business code; 2. the data storage layout is highly coupled to the application scenario, the data processing logic is complex, and rapid deployment of business applications is hampered; 3. the data is read multiple times, heavily consuming cluster resources.
The second type of solution deploys a component that supports association analysis over data from multiple data sources; among these, Presto is widely used. As shown in fig. 1, Presto is a big data distributed SQL query engine developed in Java and introduced by Facebook; it can query data stored in products such as Hive, Redis, Cassandra, Mysql, and Oracle, and perform association analysis on data from multiple sources. However, Presto has a significant limitation: the data source to be used must be specified through the schema; for the business layer, data storage must still be logically partitioned, which increases engineering complexity.
Disclosure of Invention
At least one embodiment of the invention provides a data query processing method, a data query processing device, a data query processing system and a computer-readable storage medium, which can reduce query complexity.
At least one embodiment of the present invention provides a data query processing method, including:
receiving a query processing request, determining a data component related to the query processing request according to global metadata information acquired from a global metadata database, and generating a query execution plan according to the global metadata information; wherein the global metadata information of the global metadata base comprises metadata information of data components;
and, based on the query execution plan, completing data operations by interacting with the data components involved in the query processing request.
At least one embodiment of the present invention provides a data query processing system, including:
the global metadata base is used for storing global metadata information, and the global metadata information comprises metadata information of data components;
the component management node is used for receiving a query processing request, determining a data component related to the query processing request according to global metadata information acquired from the global metadata database, and generating a query execution plan according to the global metadata information; generating an execution task based on the query execution plan, and issuing the execution task to a task execution node;
and the task execution node is used for finishing data operation with the data assembly related to the query processing request after receiving the execution task.
At least one embodiment of the present invention provides a data query processing apparatus, including a memory and a processor, where the memory stores a program, and the program, when read and executed by the processor, implements the data query processing method according to any one of the embodiments.
At least one embodiment of the present invention provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the data query processing method according to any one of the embodiments.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it.
FIG. 1 is a diagram of Presto architecture in the related art;
FIG. 2 is a block diagram of a data query processing system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data query processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of service-information group-table creation in application example 1 of the present invention;
FIG. 5 is a timing diagram of the operation of application example 1 of the present invention;
FIG. 6 is a schematic diagram of the dimension-analysis application of application example 2 of the present invention;
FIG. 7 is a timing diagram of the DML operation of application example 2 of the present invention;
FIG. 8a is a diagram of the behavior-log query implementation of application example 3 of the present invention;
FIG. 8b is a diagram of the syntax tree used in application example 3 of the present invention;
FIG. 9 is a schematic diagram of conditional-expression push-down in application example 3 of the present invention;
FIG. 10 is a diagram of the data-export implementation of application example 4 of the present invention;
FIG. 11 is a diagram of the heterogeneous-data-source query implementation of application example 5 of the present invention;
FIG. 12 is a diagram of the index-and-detail association implementation of application example 6 of the present invention;
FIG. 13 is a block diagram of a data query processing apparatus according to an embodiment of the present invention;
fig. 14 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
As shown in fig. 2, an embodiment of the present invention provides a data query processing system, which includes a component management node (Coordinator)201, a global metadata base 202, and at least one task execution node (Worker)203, where:
the component management node 201 is configured to receive a query processing request, determine a data component related to the query processing request according to global metadata information acquired from the global metadata repository 202, and generate a query execution plan; generating an execution task (task) based on the query execution plan, and issuing the execution task to a task execution node;
The global metadata base 202 is used for storing global metadata information; the global metadata information includes the metadata information of each data component, so the component management node does not need to access each data component to obtain it. The global metadata information may be all of a data component's metadata information or only part of it.
The task execution node 203 is configured to, after receiving the execution task, complete data operation by interacting with the data component related to the query processing request. The task execution node 203 is responsible for actually executing the task.
The system may also include databases such as an RDBMS, Redis, Hive, HBase, and Solr.
According to the scheme provided by the embodiment, the related data components are determined according to the global metadata information, a data source or a data alias (Schema) does not need to be specified, namely, the data storage details do not need to be known, and the bottom data storage details can be shielded for the service application layer. In addition, the global metadata information is stored in the global metadata database, and the global metadata information is directly acquired from the global metadata database. The scheme provided by the embodiment realizes metadata lifting operation, isolates the service application from the bottom data storage, and provides powerful support for rapid design and deployment of the application service.
The global metadata base may include all of the metadata in the data components, or only part of it. When the global metadata base does not contain the metadata of some tables in a data component, users cannot access the data in those tables; therefore, the metadata of data that needs protection can be kept out of the global metadata base so that users cannot access it, thereby protecting the data. The metadata in the global metadata base can be imported from the data components, and can also be updated by the task execution nodes.
The component management node 201 is a management node: it parses the query processing request into a query execution plan, divides the plan into query execution stages, further divides the stages into tasks, and issues the tasks to the task execution nodes 203. A task execution node 203 is a worker node, invoked by the component management node 201 to actually complete each task and interact with the data components.
The query processing request may be an SQL statement. SQL statements include Data Manipulation Language (DML) and Data Definition Language (DDL). DML includes SELECT (read data from a table), UPDATE (update data in a table), DELETE (delete data from a table), INSERT INTO (insert data into a table), and so on; DDL includes CREATE TABLE (create a new table), ALTER TABLE (change a table), DROP TABLE (delete a table), and so on. When the query processing request is an SQL statement, the component management node parses the SQL statement to obtain the query processing request.
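As an illustration of the DML/DDL split named above, the component management node's first step can be sketched as classifying the incoming statement by its leading keyword (a minimal sketch; a real parser would build a full syntax tree rather than keyword-match):

```python
# Classify an SQL request as DML or DDL by its leading keyword.
DML_KEYWORDS = {"SELECT", "UPDATE", "DELETE", "INSERT"}
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}

def classify(sql: str) -> str:
    """Return 'DML', 'DDL', or 'UNKNOWN' for a single SQL statement."""
    head = sql.strip().split(None, 1)[0].upper()
    if head in DML_KEYWORDS:
        return "DML"
    if head in DDL_KEYWORDS:
        return "DDL"
    return "UNKNOWN"

print(classify("select * from t where id = 1"))  # DML
print(classify("CREATE TABLE t (id int)"))       # DDL
```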
A Java Database Connectivity (JDBC) client package and a command-line window are provided, so that business application developers can conveniently and quickly submit SQL and obtain query results.
Of course, the query processing request may be other types of statements.
In an embodiment, the global metadata information comprises at least one of: table definition information, table extension information, table field definition information, metadata rule information, and table statistics information. This information may be stored in the form of tables, or in other forms. Specifically:
the table definition information includes: data component information, table name information, etc. of the data table;
the table extension information defines the component-specific characteristics of each data table, such as the Region partitioning mode of HBase or the partition fields of Hive;
the table field definition information comprises the fields of the data table, including the field name, data type, whether the field is indexed, whether it is the primary key, and so on;
the metadata rule information includes rules of the form: when a preset condition is met, a preset operation is executed on preset data components, where the preset data components are one or more data components;
the table statistics support cost-based optimization (CBO), which requires statistics on the data tables, such as the number of records in a table, the maximum and minimum values of fields, and histogram distributions.
It should be noted that the above-mentioned global metadata information is only an example, and other information may be included as needed.
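The metadata categories listed above can be sketched as plain data structures (all names and the sample table are hypothetical illustrations, not a schema prescribed by the text):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FieldDef:                    # table field definition information
    name: str
    data_type: str
    indexed: bool = False
    primary_key: bool = False

@dataclass
class TableStats:                  # table statistics, used for CBO
    row_count: int = 0
    min_max: dict = field(default_factory=dict)   # field name -> (min, max)

@dataclass
class TableDef:                    # table definition + extension information
    table_name: str
    component: str                 # owning data component, e.g. "HBASE"
    fields: list = field(default_factory=list)
    extensions: dict = field(default_factory=dict)  # e.g. HBase Regions, Hive partitions
    stats: Optional[TableStats] = None

# A toy global metadata base: table name -> definition.
global_metadata = {
    "xytj_orders": TableDef(
        "xytj_orders", "HBASE",
        fields=[FieldDef("rowkey", "string", primary_key=True)],
        extensions={"region_split": "hash"},
    ),
}
```

Note how the component name lives inside the table definition: this is what lets the planner resolve a bare table name to a data component without the request ever naming a schema.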
In one embodiment, the component management node 201 includes: a query execution plan module 2011, cluster management nodes 2012, a global metadata management module 2013, and a data component interface layer 2014, where:
the Query execution plan module (Query)2011 is configured to receive a Query processing request, determine a data component related to the Query processing request according to the global metadata information, and generate a Query execution plan; generating an execution task based on the query execution plan, and issuing the execution task to a task execution node accessed to a cluster management node;
the cluster management node (Discovery Server)2012 is configured to manage access and exit of a task execution node; after the task execution node is started, the task execution node registers to the cluster management node 2012 to realize access. And the execution task is issued to the accessed task execution node.
The Global MetaData management module (Global MetaData, abbreviated as GMD)2013 is configured to obtain the Global MetaData information from the Global MetaData repository; the global metadata information may be retrieved from the global metadata repository upon initialization after startup. The module is mainly responsible for managing the global metadata information, such as interaction with a global metadata base, acquisition and synchronization of the global metadata information, and the like.
The data component interface layer (Connector) 2014 is used for interfacing with data components so that task execution nodes can access them, for example to collect data from a data component or write data into it. When the query execution plan contains a table scan node, the data collection function is implemented through the data component interface layer 2014.
It should be noted that the cluster management node 2012, the global metadata management module 2013, and the data component interface layer 2014 may be disposed in the component management node 201, or may be independent modules.
It should be noted that fig. 2 only illustrates the data component interface layer 2014 generally; in an actual application, depending on the data components accessed, it may include a Hive Connector, an HBase Connector, a Solr Connector, and the like.
The global metadata management module 2013 is independent of data components and is closer to a business application layer.
In one embodiment, the query execution plan module generates the query execution plan by matching against predefined metadata rules in the global metadata information and generating the plan according to the matched rule. Metadata rules may be defined as desired. For example, a group-table creation rule may be predefined: when a Create Table statement is submitted and the table name starts with a preset prefix, the data table is created in preset data components (one or more). For instance, a rule may state that when a Create Table statement is submitted and the table name starts with xytj, the table is created in both Mysql and HBase; another rule may state that when the table name starts with MT_, the table is created in Hive; and so on.
By defining metadata rules, the implementation complexity of association requirements can be reduced: a series of rules can be defined for common requirements. Common requirements include, for example, data association across different data components, export of data from one data component to another, and export and combination of data from several data components; each can have a predefined rule to simplify its implementation. Take the creation of the same group of tables in two data components as an example. In the related art, two different SQL statements must be submitted, one per data component, to create the same group of tables in both. In the present application, a rule may be defined so that when an SQL statement requests creation of a group table with a preset name prefix, the table is created in both data components at once; a single SQL statement thus creates the same group table in two data components.
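The prefix-based group-table rules above can be sketched as a small rule table (the prefixes xytj and MT_ come from the text; the function name and structure are illustrative):

```python
import re

# Prefix-based metadata rules: table-name pattern -> target data components.
# The xytj and MT_ prefixes follow the examples in the description.
RULES = [
    (re.compile(r"^xytj"), ["MYSQL", "HBASE"]),
    (re.compile(r"^MT_"), ["HIVE"]),
]

def target_components(table_name, default=("HIVE",)):
    """Resolve which data components a CREATE TABLE should run against."""
    for pattern, components in RULES:
        if pattern.match(table_name):
            return list(components)
    return list(default)

print(target_components("xytj_stats"))  # ['MYSQL', 'HBASE']
print(target_components("MT_logs"))     # ['HIVE']
```

With such a rule matched, one submitted statement fans out to every listed component, which is the single-statement group-table creation the paragraph describes.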
In an embodiment, the global metadata management module 2013 is further configured to: and pushing the global metadata information to the task execution node. Since the task execution node may need the global metadata information when executing the task, the global metadata information may be pushed to the task execution node for its use when executing the task.
In an embodiment, the task execution node 203 is further configured to obtain the global metadata information locally, when it is needed, while completing data operations by interacting with the data components involved in the query processing request. Compared with the related art, in which metadata information is requested from the data component during task execution, the scheme of this embodiment improves execution efficiency.
In an embodiment, the global metadata management module 2013 is further configured to update the global metadata database when the task execution node updates the data in the data component, and synchronize local global metadata information and global metadata information of the task execution node according to the updated global metadata database.
Each query execution plan generates an abstract syntax tree (AST), and each node in the tree is an execution node (e.g., a JoinNode, FilterNode, or TableScanNode). In one embodiment, generating the query execution plan includes generating a syntax tree with multiple levels of execution nodes and assigning at least some of the filter conditions to the table scan nodes (TableScanNodes); all of the filter conditions may be assigned to the TableScanNode. In the solution of this embodiment, the filter condition is pushed down to the TableScanNode. Compared with the related-art approach of assigning the filter condition to a FilterNode, fetching the result set from the data component, and then filtering in the FilterNode, filtering before the result set leaves the data component reduces the amount of data in the result set obtained from the data component and improves execution efficiency. The filter conditions to be pushed down to the TableScanNode may be defined in the metadata rules, and the push-down may be full or partial. It should be noted that the table scan node denotes the step in the syntax tree that interacts directly with the data component, and is not limited to the name TableScanNode. Filter conditions include, for example, OR statements and conditional expressions involving format conversion, arithmetic, trigonometric functions, and the like.
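The push-down rewrite can be sketched on two toy plan nodes (the class and attribute names mirror the node names in the text but are otherwise illustrative; real plan nodes carry much more state):

```python
# Minimal plan nodes: a scan of one table, and a filter above it.
class TableScanNode:
    def __init__(self, table, predicate=None):
        self.table = table
        self.predicate = predicate   # condition evaluated inside the data component

class FilterNode:
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate   # condition evaluated after rows are fetched

def push_down(node):
    """Move a FilterNode's predicate into its child TableScanNode.

    The FilterNode itself is kept (with an empty predicate) so the shape
    of the established query execution plan stays consistent."""
    if isinstance(node, FilterNode) and isinstance(node.child, TableScanNode):
        node.child.predicate = node.predicate
        node.predicate = None
    return node

plan = push_down(FilterNode(TableScanNode("t"), "age > 30"))
print(plan.child.predicate)  # age > 30  (now filtered inside the component)
print(plan.predicate)        # None      (FilterNode retained but empty)
```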
In an embodiment, a filter node (FilterNode) whose condition is empty remains at the level above the table scan node in the syntax tree; that is, the filter condition originally assigned to the FilterNode is assigned to the TableScanNode, and the FilterNode is retained in order to keep the established query execution plan consistent. It should be noted that, in other embodiments, the FilterNode may be removed from the syntax tree.
In one embodiment, generating the query execution plan includes generating a cost-optimized query execution plan according to the statistics in the global metadata information. The global metadata information may include statistics such as the discrete distribution of each field in a table; the cost (time and resources consumed) of different query execution plans is estimated from these statistics, and the optimal execution plan is selected.
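A toy cost-based choice under assumed statistics (the row counts, cost formula, and the build-smaller-side heuristic are illustrative assumptions, not prescribed by the text): given table sizes from the metadata base, the planner prices each candidate join order and keeps the cheapest.

```python
# Assumed row counts from the global metadata base's table statistics.
stats = {"orders": 10_000_000, "users": 50_000}

def join_cost(build, probe):
    """Toy hash-join cost: building the hash table is weighted heavier,
    so the smaller table should be the build side."""
    return stats[build] * 2 + stats[probe]

candidates = [("users", "orders"), ("orders", "users")]
best = min(candidates, key=lambda plan: join_cost(*plan))
print(best)  # ('users', 'orders') -- build on the smaller table
```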
As shown in fig. 3, an embodiment of the present invention provides a data query processing method, including:
step 301, receiving a query processing request, determining a data component related to the query processing request according to global metadata information acquired from a global metadata database, and generating a query execution plan according to the global metadata information; wherein the global metadata information of the global metadata base comprises metadata information of data components;
after the component management node is started, the global metadata information is loaded from the global metadata base.
Step 302, based on the query execution plan, data operation is completed by interacting with the data component related to the query processing request.
According to the scheme provided by this embodiment, the involved data components are determined from the global metadata information, so no data source or schema (data alias) needs to be specified; that is, the data storage details need not be known, and the underlying storage details can be hidden from the business application layer. In addition, the global metadata information is stored in, and obtained directly from, the global metadata base. The scheme thus lifts the metadata up, isolating the business application from the underlying data storage and providing strong support for rapid design and deployment of application services. Moreover, the format of the query processing request is independent of the data components, so the business application does not need to develop an interfacing program for each data component; the interfaces are consistent, which significantly reduces engineering complexity.
The global metadata base may include all of a data component's metadata information or only part of it; when it includes only part, the remaining data in the data component can be protected. That is, only data tables whose metadata is in the global metadata base can be accessed; data tables without metadata there cannot.
In an embodiment, the global metadata information comprises at least one of: table definition information, table extension information, table field definition information, metadata rule information, table statistics information. The meaning of each information is described in the foregoing system embodiments, and is not described herein again. It should be noted that the above-mentioned global metadata information is only an example, and other information may be added as needed.
In an embodiment, the determining, in step 301, the data component involved in the query processing request according to the global metadata information obtained from the global metadata base includes at least one of:
when the global metadata information comprises table definition information which comprises component information, determining the related data components according to the component information in the table definition information in the global metadata information;
and when the global metadata information comprises the metadata rule, determining the related data component according to the matched metadata rule in the global metadata information.
In one embodiment, generating the query execution plan in step 301 includes matching the query processing request against predefined metadata rules in the global metadata information and generating the plan according to the matched rule. A metadata rule is, for example: when a preset condition is met, execute a preset operation in preset data components; the operation can be defined as needed, for example according to access requirements. For instance, by defining access rules for multiple data sources and matching one SQL statement against them, the multiple data sources can be accessed directly.
In an embodiment, in the step 302, the performing data operation with the data component involved in the query processing request includes:
dividing the query execution plan into execution tasks, issuing the execution tasks to task execution nodes, and having the task execution nodes complete the data operation through interaction between the data-component docking layer of each data component involved in the query processing request and that data component. Dividing the query execution plan into execution tasks includes: dividing the query execution plan into query execution paragraphs and splitting the query execution paragraphs into execution tasks; the specific ways of doing so are found in the related art and are not described here again. The query execution paragraphs have a hierarchical relationship with each other, and each query execution paragraph represents a portion of the query execution plan, with its own input (upstream) and output (downstream).
Lexical and grammatical analysis is performed according to the global metadata information to generate the query execution plan; before the plan is divided into query execution paragraphs, rule-based optimization (RBO) may also be performed on it.
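The plan-to-paragraph-to-task decomposition above can be sketched as follows; this is an illustrative data model (the `Stage` class, fields, and fixed parallelism are assumptions, not the patent's implementation).

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    """One query execution paragraph: a portion of the query execution plan."""
    stage_id: int
    operations: list
    upstream: list = field(default_factory=list)  # stages feeding data into this one

def split_into_tasks(stage, parallelism=2):
    """Split one paragraph into execution tasks for the task execution nodes."""
    return [{"task_id": f"stage{stage.stage_id}-task{i}", "ops": stage.operations}
            for i in range(parallelism)]

# Hierarchical paragraphs: stage2 is upstream of stage1 (its output feeds the join).
stage2 = Stage(2, ["scan:Mysql"])
stage1 = Stage(1, ["scan:HBase", "join"], upstream=[stage2])
tasks = split_into_tasks(stage1)
```

Each task carries its paragraph's operations and is then issued to a task execution node for execution against the data component's docking layer.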
In an embodiment, before receiving the query processing request, the step 301 further includes: and the component management node acquires the global metadata information from the global metadata database and pushes the global metadata information to the task execution node.
In an embodiment, in step 302, in the process of completing data operation by interacting with the data component related to the query processing request, when the global metadata information needs to be used, the global metadata information is locally obtained from the task execution node.
In an embodiment, the method further comprises: when the data in the data component is updated, the global metadata base is updated, and the global metadata information of the component management node and the global metadata information in the task execution node are updated. For example, a new table is created in the data component, the global metadata base is updated at this time, and the global metadata information at the management node of the component and the global metadata information at the task execution node are updated.
In one embodiment, the query execution plan is a syntax tree comprising multiple levels of execution nodes, and at least some of the filter conditions are assigned to the TableScanNode in the syntax tree. All filter conditions may be assigned to the TableScanNode, or a filter-condition pushdown rule may be defined in the global metadata information so that only the filter conditions satisfying that rule are assigned to the TableScanNode. The TableScanNode performs the operation of collecting data from the data component. In the related-art Presto implementation, OR statements and conditional expressions containing functions are converted into a FilterNode and are not passed down to the data component, so the result set collected from the data component is huge and execution efficiency suffers. In the present scheme, the query execution plan is optimized: the filter conditions are pushed down to the data component for execution, the result set returned from the data component shrinks greatly, and execution efficiency improves.
In one embodiment, the syntax tree includes, at the level above the TableScanNode, a FilterNode whose value is assigned to be empty. That is, the FilterNode is retained in the syntax tree even though its assignment is null.
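A minimal sketch of the pushdown described above: a filter condition that satisfies the push rule is moved from the FilterNode into its child TableScanNode, and the FilterNode is retained with an empty condition so the plan's shape stays consistent. The node classes here are toy stand-ins, not the actual planner types.

```python
class TableScanNode:
    def __init__(self, table):
        self.table = table
        self.pushed_filters = []   # conditions the data component will evaluate

class FilterNode:
    def __init__(self, condition, child):
        self.condition = condition
        self.child = child

def push_down(node, can_push=lambda cond: True):
    """Push a FilterNode's condition into a child TableScanNode when allowed."""
    if isinstance(node, FilterNode) and isinstance(node.child, TableScanNode):
        if node.condition is not None and can_push(node.condition):
            node.child.pushed_filters.append(node.condition)
            node.condition = None   # nulled, but the node itself is retained
    return node

plan = FilterNode("userid like 'a1743%'", TableScanNode("useropra"))
push_down(plan)
```

After pushdown, the nulled FilterNode simply passes the (already filtered) result set through.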
In an embodiment, the query execution plan is optimized based on cost according to the statistical information in the global metadata information. The global metadata information may include statistics such as the discrete distribution of each field in a table; the cost (time and resources consumed) of different query execution plans is estimated from these statistics, so that the query execution plan can be optimized and the optimal plan selected. Optimizing the execution plan greatly improves query efficiency.
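A hedged sketch of the cost-based choice: here the statistics are reduced to row counts and a predicate selectivity estimate, and the cost model is a deliberately simplistic row-pair count. It only illustrates the principle that statistics let the planner pick the cheaper of two equivalent plans.

```python
def cost_join_then_filter(rows_a, rows_b):
    # join everything first, filter afterwards: every row pair is examined
    return rows_a * rows_b

def cost_filter_then_join(rows_a, rows_b, selectivity):
    # shrink one side with the filter first, then join the survivors
    return rows_a * selectivity * rows_b

def choose_plan(rows_a, rows_b, selectivity):
    """Pick the plan with the lower estimated cost."""
    plans = {
        "join_then_filter": cost_join_then_filter(rows_a, rows_b),
        "filter_then_join": cost_filter_then_join(rows_a, rows_b, selectivity),
    }
    return min(plans, key=plans.get)

# A highly selective predicate (0.1% of rows match) favors filtering first.
best = choose_plan(rows_a=1_000_000, rows_b=50_000, selectivity=0.001)
```

This is the same reasoning applied in application example six, where the selective gcwz condition is evaluated before the Join.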
In an embodiment, step 302 further includes returning the execution result or result set to the business application that initiated the query processing request. Specifically, the task execution node returns the execution result or result set to the component management node, and the component management node returns it to the business application that initiated the query processing request, for example in ResultSet form or command-line playback mode per the JDBC (Java Database Connectivity) specification. JDBC is a Java API for executing SQL statements that provides uniform access to multiple relational databases.
In one embodiment, the method further comprises extending the supported data components, supporting components such as HBase, Solr, and Elasticsearch; and introducing a secondary index mechanism to improve HBase query efficiency and application-level scalability. The scheme provided by this embodiment docks with common data sources, including traditional relational databases, in-memory databases, full-text retrieval components, and components such as HIVE and HBase commonly used in the mass-data field, remarkably reducing the workload of business application development.
In this embodiment, automatic selection of data components according to rules is supported, as is modification of metadata definitions through data definition statements. The global metadata information is managed by the component management node and issued/synchronized to the task execution nodes through the cluster management node, so all operations involving global metadata information can obtain it directly from local memory, improving execution efficiency. By contrast, the related-art Presto scheme requests global metadata information directly from each data component; consequently, when docking a business application, one must know which data component stores the required data, and during analysis and execution of the query execution plan, global metadata information must be requested from the corresponding data component in real time, making execution inefficient.
Compared with the related technology, the scheme provided by at least one embodiment of the invention can improve the development and deployment efficiency of the service application aiming at mass data, reduce the implementation complexity and reduce the workload of design and development.
The application is further illustrated by the following specific application examples. In the example figures, solid lines represent message flows and dashed lines represent data flows.
Application example 1
The present example relates to the creation of a service information group table. In the field of mass-data applications, it is common to need to create correlated tables in different data components at the same time to support a business application. In this example, when the business application executes CREATE TABLE xytj_tableA (assuming an established rule that a table name starting with xytj_ denotes a service information group table), xytj_tableA_Mysql needs to be created in Mysql and xytj_tableA_HBase needs to be created in HBase.
The process of creating the service information group table is shown in fig. 4. The components described in this scheme are denoted by GSQL in the flow chart herein, the same applies hereinafter.
The precondition of this example is that global metadata information has been imported from the data components according to the business application. For this example, the relevant rule for creating the group table needs to be defined in advance as follows (the rule applies to submitted Create Table statements: if the table name starts with xytj_, the data table needs to be created in HBase and Mysql simultaneously):
<rule name="MultipleCreateTable">
<operator>CreateTable</operator>
<regex tablename="xytj_*"/>
<functions>
<function>createMysqlTable</function>
<function>createHBaseTable</function>
</functions>
</rule>
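The rule above can be loaded and interpreted along the following lines; the XML layout follows the rule as shown, while the loader itself and the glob-to-regex convention are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

RULE_XML = """
<rule name="MultipleCreateTable">
  <operator>CreateTable</operator>
  <regex tablename="xytj_*"/>
  <functions>
    <function>createMysqlTable</function>
    <function>createHBaseTable</function>
  </functions>
</rule>
"""

def load_rule(xml_text):
    root = ET.fromstring(xml_text)
    # translate the glob-style "xytj_*" pattern into a regular expression
    pattern = root.find("regex").get("tablename").replace("*", ".*")
    return {
        "name": root.get("name"),
        "operator": root.find("operator").text,
        "pattern": re.compile(pattern),
        "functions": [f.text for f in root.findall("functions/function")],
    }

rule = load_rule(RULE_XML)
matches = bool(rule["pattern"].fullmatch("xytj_tableA"))
```

When the submitted Create Table statement matches, the two listed functions would drive the table creation in Mysql and HBase respectively.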
step 401, the business application submits a query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
Create Table xytj_tableA(teamcode varchar2(20),contenttype number,contentcode varchar2(64))
This SQL represents a service-side request to create the xytj_tableA group table. Other implementation schemes in the industry require separate submissions for each data source to meet this requirement; with the present scheme, a single submission suffices.
Step 402, after receiving the SQL statement, the component management node matches the create-service-information-group-table rule (MultipleCreateTable) in combination with the global metadata information, and constructs a query execution plan according to the rule.
Therein, step 402.1 is the interaction of the component management node and the GMD.
In step 403, after the query execution plan is optimized according to the rules (Rule-Based Optimization, RBO), the query execution paragraphs are formed.
For specific implementation of RBO, reference may be made to related technologies, which are not described herein in detail.
In this example, two query execution paragraphs are generated: Stage-1 comprises a table-creation operation on HBase; Stage-2 comprises a table-creation operation on Mysql. Stage-1 is executed first.
Step 404, creating xytj _ tableA _ HBase in the HBase, where the HBase statement actually executed in this embodiment is as follows:
create 'jz:xytj_tableA_hbase', {NAME=>'cf', DATA_BLOCK_ENCODING=>'DIFF', COMPRESSION=>'SNAPPY'}, {NAME=>'idx', DATA_BLOCK_ENCODING=>'DIFF', COMPRESSION=>'SNAPPY'}
The above command line creates a table in HBase. In other industry schemes, meeting this requirement means having business personnel log into HBase and write and submit the command by hand; with the present scheme, the command is generated and submitted automatically, without manual execution.
Step 405, adding the xytj _ tableA _ hbase related information in the global metadata database.
Step 406, triggering the execution of Stage-2.
Step 407, create xytj _ tableA _ Mysql in Mysql, and the Mysql SQL statements actually executed in this embodiment are as follows:
Create Table xytj_tableA_mysql(teamcode varchar2(20), contenttype numeric, contentcode varchar2(64))
The above command line creates a table in Mysql. In other industry schemes, meeting this requirement means having business personnel log into Mysql and write and submit the command by hand; with the present scheme, the command is generated and submitted automatically, without manual execution.
In step 408, the relevant information of xytj _ tableA _ mysql is added to the global metadata database.
In step 409, the execution result (here, the indication of successful execution of the data definition statement) is output to the component management node.
In step 410, the component management node returns the execution result to the service application.
In the present embodiment, the timing of the related global metadata operation is as shown in fig. 5.
When the component service is started, the implementation flow of the related operation is as follows:
step 501, a component management node process (a process running on a component management node) triggers a metadata information initialization operation, obtains metadata information from a global metadata base, and stores the metadata information in a local memory.
Step 502, the component management node process pushes the global metadata information to the cluster management node.
Step 503, the task execution node service is started and completed, and registers to the cluster management node.
In step 504, after receiving the registration message of the task execution node, the cluster management node pushes the global metadata information to the task execution node. And the task execution node receives the information and stores the information in a local memory.
When executing a data definition statement (DDL), comprising:
and 505, the component management node process receives the SQL statement and triggers the query execution plan construction operation.
Step 506, according to the transmitted SQL statement, rule matching is carried out, the related data component is determined, whether business application customization operation exists or not is determined, and according to the matched rule, the sorted SQL statement is generated.
And 507, performing semantic analysis according to the metadata information to generate a query execution plan. In the syntax tree of the query execution plan, MetaDataUpdateNode is added. The node is used for updating the metadata information.
And step 508, generating an execution task according to the query execution plan, and completing the DDL operation on the data assembly through a corresponding data assembly docking layer.
In step 509, a global metadata update operation is performed in the MetaDataUpdateNode, and the operation updates the metadata information (located in the memory) of the global metadata base and the component management node. And then informing the component management node process to perform global metadata synchronization operation.
Step 510, the component management node process pushes the metadata information to the cluster management node.
Step 511, the cluster management node pushes the metadata information to each task execution node in a normal state. And after receiving the metadata information, the task execution node stores the metadata information in a local memory.
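The synchronization flow in steps 501–511 can be sketched with in-memory stand-ins for the roles involved: the cluster management node pushes the global metadata to each task execution node on registration, and broadcasts updates after a DDL changes the metadata, so later lookups are purely local. Class and method names are illustrative only.

```python
class TaskExecutionNode:
    def __init__(self, name):
        self.name = name
        self.local_metadata = {}   # the node's local in-memory copy

class ClusterManagementNode:
    def __init__(self):
        self.nodes = []

    def register(self, node, metadata):
        """Step 503/504: on registration, push current metadata to the node."""
        self.nodes.append(node)
        node.local_metadata = dict(metadata)

    def broadcast(self, metadata):
        """Steps 510/511: re-push metadata to every registered node."""
        for node in self.nodes:
            node.local_metadata = dict(metadata)

metadata = {"xytj_tableA": {"components": ["Mysql", "HBase"]}}
cluster = ClusterManagementNode()
worker = TaskExecutionNode("exec-1")
cluster.register(worker, metadata)

metadata["xytj_tableB"] = {"components": ["HBase"]}   # a DDL adds a table
cluster.broadcast(metadata)                           # re-sync all nodes
```

After the broadcast, every task execution node can answer metadata lookups from its own memory without contacting the data components.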
Application example two
This example relates to a multidimensional analysis application. In multidimensional analysis of mass data, associated queries across data stored in different data sources are common. In this example, the table kkjkxx stores collected track information and resides in HIVE; the table kksppp stores the pictures and radio-frequency image information related to the tracks and resides in HBase; the business application needs to Join the data of the two tables on the track number (gjxxbh) field.
The currently common solution is to aggregate by track number from table kkjkxx and table kksppp separately and then merge the statistics, a relatively complex implementation. With the present scheme, the business requirement is met by submitting a single SQL statement.
The precondition of this embodiment is to import global metadata information according to the service application. For this embodiment, metadata information of the table kkjkxx and the table kkppp and used data component information need to be defined in advance.
The multidimensional analysis (Join) flow is shown in FIG. 6 and includes:
step 601, the business application submits the query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
Select I01.clxxbh, I01.hphm, I01.dwbh, I02.SPSPLSH, I02.GCBQLSH From kkjkxx I01 join kksppp I02 on I01.gjxxbh=I02.gjxxbh
This SQL statement queries the records of the two data tables kkjkxx and kksppp whose gjxxbh fields are equal.
Step 602, after receiving the SQL statement, the component management node constructs a query execution plan by combining the global metadata information.
Step 603, after the query execution plan is optimized according to the rules, it is divided into multiple query execution paragraphs. In this example, two query execution paragraphs are generated: Stage-1 comprises the data collection operation on HBase and the Join operation; Stage-2 comprises the data collection operation on Hive. Both query execution paragraphs start running simultaneously. In Stage-1, the HBaseConnector is executed first; the JoinNode operation is executed only after Stage-2 completes and its data has been transmitted.
In step 604, the HBase Connector acquires data from the HBase.
Step 605, constructing a Hive Connector; and when the global metadata information is needed, local reading operation is performed as much as possible, namely the global metadata information is obtained from the local.
In step 606, the Hive Connector acquires data from HIVE.
And step 607, transmitting the data acquired from HIVE to Stage-1 to perform Join operation.
Step 608, output Join operation result set to component management node.
In step 609, the component management node returns the query result to the service application.
Steps 602.1, 603.1, and 605.1 in fig. 6 represent interactions with the global metadata management module, and interact with the global metadata management module when metadata needs to be acquired or updated during processing.
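The Stage-1/Stage-2 data flow above can be sketched as a toy hash join: one function stands in for the Hive Connector, another for the HBase Connector, and the join runs once both sides' rows are available. This is an illustration of the flow, not the patented implementation; the connector functions just return canned rows.

```python
def hive_connector():
    # Stage-2: rows collected from table kkjkxx in HIVE (canned sample data)
    return [{"gjxxbh": "T1", "hphm": "A-123"}, {"gjxxbh": "T2", "hphm": "B-456"}]

def hbase_connector():
    # Stage-1: rows collected from table kksppp in HBase (canned sample data)
    return [{"gjxxbh": "T1", "SPSPLSH": "p1"}]

def hash_join(left, right, key):
    """Join two row lists on a key by indexing one side first."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

joined = hash_join(hive_connector(), hbase_connector(), "gjxxbh")
```

Only the track present in both tables (gjxxbh "T1") survives the join, mirroring the single-SQL cross-source association the example describes.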
As shown in fig. 7, when executing a data manipulation statement (DML), the following operations are included:
step 701, triggering and constructing query execution plan operation by the component management node process.
Step 702, according to the transmitted SQL statement, matching with the global metadata information, determining the related data component, and determining whether business application customization operation exists. And generating the sorted SQL statement according to the relevant rule.
And 703, performing semantic analysis according to the global metadata information to generate a query execution plan.
Step 704, generating an execution task according to the query execution plan, and completing data operation on the data component through the corresponding data component docking layer. In the step of performing data operation by the data component docking layer docking the data component, if the global metadata information is needed, the global metadata information is directly acquired from the local memory of the task execution node.
Application example three
This example relates to queries based on behavior logs, a frequent scenario in mass-data applications; such data is often stored in HBase, and queries filter out unnecessary data as much as possible through partitioning, predicate pushdown, and the like. In this example, assume the behavior log table useropra is stored in HBase and partitioned by day; the business application needs to query the behavior-log information of a certain class of agents within a specified date range.
The precondition of this embodiment is to import global metadata information according to the service application. For this embodiment, metadata information and component information used in the table useropra need to be predefined, and the table useropra is partitioned by day according to the service data scene.
The HBase query flow is shown in fig. 8a, and includes:
step 801, the business application submits a query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
select userid, opraid, capture_time from useropra where userid like 'a1743%' and (substr(opraid,1,3)='101' or substr(opraid,1,3)='102') and capture_time>=1526400000 and capture_time<=1529078400
the SQL represents the records that query the useropra table for, and meet the following conditions:
a: userid begins with "a 1743";
b: the first three characters of oproid are "101" or "102";
c: capture_time is between 1526400000 and 1529078400 (inclusive).
step 802, after receiving the SQL statement, the component management node constructs a query execution plan by combining the global metadata information.
In step 803, values are assigned to the FilterNode and TableScanNode in the query execution plan. The query execution plan is traversed; when a FilterNode is reached, the filter conditions that satisfy the rules (such as OR statements and conditional expressions containing format conversion, the four arithmetic operations, or trigonometric functions) are pushed into the child TableScanNode, and the FilterNode whose conditions were pushed is left empty. In this example, the conditional expressions "userid like 'a1743%'" and "(substr(opraid,1,3)='101' or substr(opraid,1,3)='102')" are both pushed down to the TableScanNode, converging the result set dramatically.
The syntax tree is shown in fig. 8b; it includes an OutputNode (output node), an ExchangeNode (exchange node), a FilterNode, and a TableScanNode. In the related art, the filter conditions are executed in the FilterNode; in this example, they are executed in the TableScanNode.
After the query execution plan is optimized according to the rules, it is divided into multiple query execution paragraphs. Combined with the partition-type definition of the data table in the global metadata information, an HBase Connector is constructed (under the preconditions of this example, multiple HBase Scans are formed for the specified time range (capture_time>=1526400000 and capture_time<=1529078400) and queried concurrently).
And step 804, the HBase receives the Scan request, and acquires the related behavior log information of the designated agent through the coprocessor and the secondary index.
To better support this business demand, the row keys in HBase need to be indexed: the business data and the secondary-index information are both stored in HBase. The rowkey of a business-data record is the record's unique identifier; the rowkey of a secondary-index entry is generated from the actual values of the business fields, and its corresponding value is the rowkey of the business data. When a query is processed, a coprocessor interface of HBase is implemented so that the rowkey range of the secondary index can be assembled from the query conditions, quickly locating the rowkeys of the business data and thereby improving query efficiency.
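A hedged sketch of that secondary-index layout, using a plain dict as a stand-in for an HBase table: each business row is stored under its unique rowkey, and each indexed field additionally gets an index row whose key encodes the field value and whose value is the business rowkey; a prefix scan over the index keys then locates the business rowkeys. The key format here is an assumption for illustration, not the patent's exact layout.

```python
def index_rowkey(field, value, data_rowkey):
    # assumed index-key format: idx|<field>|<value>|<business rowkey>
    return f"idx|{field}|{value}|{data_rowkey}"

def put(store, data_rowkey, row, indexed_fields):
    """Store the business row plus one index row per indexed field."""
    store[data_rowkey] = row
    for f in indexed_fields:
        store[index_rowkey(f, row[f], data_rowkey)] = data_rowkey

def lookup(store, field, value):
    """Prefix-scan the index keys; return the matching business rowkeys."""
    prefix = f"idx|{field}|{value}|"
    return [store[k] for k in sorted(store) if k.startswith(prefix)]

store = {}
put(store, "r001", {"userid": "a1743x", "opraid": "101"}, ["userid"])
hits = lookup(store, "userid", "a1743x")
```

In HBase itself, the sorted-key prefix scan would be performed by a Scan over the index rows, with the coprocessor assembling the prefix from the query conditions.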
Step 805, the HBase returns the query information to the HBase Connector.
In step 806, the HBase Connector processes the returned result and outputs it to the downstream query execution section (in this embodiment, the component management node).
In step 807, the component management node returns the query result to the business application.
Steps 802.1 and 803.1 in fig. 8a represent interactions with the global metadata management module; the module is contacted whenever metadata needs to be acquired or updated during processing.
The Filter condition depression timing diagram is shown in fig. 9 and includes:
in step 901, the component management node process receives the SQL statement and triggers the construction of the query execution plan.
Step 902, the query execution plan is generated, and values are assigned to the FilterNodes in it.
In step 903, values are assigned to the TableScanNode in the execution plan.
Step 904, optimization is performed for the TableScanNode: its immediate parent FilterNode is traced back, and the filter conditions that satisfy the rules (e.g. OR statements and conditional expressions containing format conversion, the four arithmetic operations, or trigonometric functions) are pushed down into the TableScanNode.
Step 905, a FilterNode whose conditions have been pushed down is nulled; it is not removed from the query execution plan, so as to maintain the plan's consistency.
Step 906, data is collected from the data component according to the information designated by the TableScanNode, through the data-component connection layer.
Step 907, the FilterNode operation is triggered to filter the result set returned by the TableScanNode. A nulled FilterNode does no substantive processing and returns the result set directly.
Step 908, the result set is returned to the component management node process, which returns the result set to the business application.
Application example four
This example relates to a data export operation. In mass-data application scenarios, the query result set is often too large to read record by record; typically an SQL statement like CREATE TABLE ... AS is submitted to write the result set out, and the resulting file is then fetched directly. In this example, assume a data table tamee in HBase is queried and the result is written into HIVE.
The precondition of this example is that global metadata information is imported according to the business application. For this example, the metadata information of table tamee and the used component information need to be defined in advance, and the relevant rule is defined as follows (the rule indicates that if a table name starts with MT_, the table is a data table in HIVE):
<rule name="HiveTablePre">
<operator>*</operator>
<regex tablename="MT_*"/>
<functions>
<function>hiveTable</function>
</functions>
</rule>
the data export process is shown in fig. 10 and includes:
step 1001, the service application submits a query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
CREATE TABLE MT_161220102036554_070 AS Select msisdn_1, msimsi_1 From tamee Where dt=20161219
This SQL queries the data table tamee for the record set with dt equal to 20161219 and saves it to the data table MT_161220102036554_070.
In step 1002, after receiving the SQL statement, the component management node constructs a query execution plan in combination with the global metadata information.
Step 1003, after the query execution plan is optimized according to the rules, the query execution paragraphs are formed. In this example, two query execution paragraphs are generated: Stage-1 comprises the data collection operation on HBase; Stage-2 comprises the table-creation operation on Hive. Both query execution paragraphs start running simultaneously; however, having no data yet, Stage-2 is in a suspended state.
At step 1004, the HBase Connector acquires data from HBase.
And step 1005, transmitting the data acquired from the HBase to Stage-2.
In step 1006, a table MT _161220102036554_070 is created in HIVE. Based on submitted SQL, the table MT _161220102036554_070 only comprises two fields of msisdn _1 and msimsi _1, and the field information of the table MT _161220102036554_070 refers to tamee related information defined by global metadata; the generated HIVE tabulation statement is as follows:
CREATE TABLE IF NOT EXISTS MT_161220102036554_070(msisdn_1 STRING, msimsi_1 STRING).
After the table is created, the data is persisted to it.
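The derivation of that HIVE DDL can be sketched as follows: the target table's columns come from the SELECT list, while their types are looked up in the global metadata for the source table tamee. The metadata content below is an assumption for illustration.

```python
# Hypothetical global metadata entry for the source table (types assumed).
GLOBAL_METADATA = {"tamee": {"msisdn_1": "STRING", "msimsi_1": "STRING", "dt": "STRING"}}

def build_ctas_ddl(target, source, selected_columns, metadata=GLOBAL_METADATA):
    """Build a Hive CREATE TABLE statement whose columns mirror the SELECT list."""
    cols = ", ".join(f"{c} {metadata[source][c]}" for c in selected_columns)
    return f"CREATE TABLE IF NOT EXISTS {target}({cols})"

ddl = build_ctas_ddl("MT_161220102036554_070", "tamee", ["msisdn_1", "msimsi_1"])
```

Only the two selected fields appear in the generated table, matching the statement shown above.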
Step 1007, the global metadata is updated; specifically, the global metadata information of the global metadata base and of the component management node is updated in the MetaDataUpdateNode. The detailed process may refer to the description of application example one. The related information of the newly created HIVE table is added to the global metadata base.
Step 1008 outputs the result of the execution (here, the total number of records of the export data) to the component management node.
In step 1009, the component management node returns the execution result to the service application.
Steps 1002.1 and 1003.1 in fig. 10 represent interaction with the global metadata management module, and interact with the global metadata management module when metadata is needed or updated in the processing process.
Application example five
This example relates to heterogeneous data-source queries. In mass-data application scenarios, hot data and historical data are frequently stored separately so that hot data can be queried quickly while the historical full data remains available. In this example, the hot-spot portion of the gprs data (data of the last 3 months) is stored in Elasticsearch, and the historical full data is stored in HBase. At query time, the selection and subsequent processing of the heterogeneous data sources are completed automatically according to the submitted SQL; the business application layer need not be designed or developed around the actual storage location of the data.
The precondition of this example is that global metadata information is imported from the data components into the global metadata repository according to the business application. For this example, the metadata information of the table gprs and the used component information need to be predefined, and the data-table-related rules are predefined as follows (the rules indicate that query operations on the table gprs will be queried from Elasticsearch and HBase according to the date field (dt)):
<rule name="MultipleSelectTable">
<operator>Select</operator>
<regex tablename="gprs"/>
<MultipleSelect>
<function>dateMultipleSelect</function>
<parameter>dt,m3</parameter>
</MultipleSelect>
<functions>
<function>selectEsTable</function>
<function>selectHBaseTable</function>
</functions>
</rule>
In the present example, "<parameter>dt,m3</parameter>" describes a specific time-judgment rule: dt is the date field of the data, and m3 denotes the last 3 months. Other rule-description modes may be used.
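The "dt, m3" routing can be sketched as a date cutoff: rows with dt within the last three months go to Elasticsearch, older rows to HBase, and a query range straddling the cutoff must hit both components. The three-month cutoff below uses a fixed ~90-day approximation; the rule's exact month arithmetic is not specified in the text.

```python
from datetime import date, timedelta

def route_by_date(dt_value, today, months=3):
    """Route a yyyymmdd date string to the hot or historical component."""
    cutoff = today - timedelta(days=30 * months)   # assumed approximation of "last 3 months"
    d = date(int(dt_value[:4]), int(dt_value[4:6]), int(dt_value[6:8]))
    return "Elasticsearch" if d >= cutoff else "HBase"

def components_for_range(dt_lo, dt_hi, today):
    """A range straddling the cutoff needs both components (then a Union)."""
    return sorted({route_by_date(dt_lo, today), route_by_date(dt_hi, today)})
```

For the example query's range ('20171219' to '20180531'), both components are selected, which is why Stage-1 (HBase) and Stage-2 (Elasticsearch) are both generated and then combined with a Union.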
Heterogeneous data source access as shown in fig. 11, includes:
step 1101, the business application submits a query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
Select msisdn_1, msimsi_1, sdnnum From gprs Where dt>'20171219' and dt<'20180531'
the SQL statement represents a query for data between '20171219' and '20180531' in the gprs table, fields msisdn _1, msimsi _1, sdnnum.
Step 1102, after receiving the SQL statement, the component management node constructs a query execution plan in combination with the global metadata information.
Wherein, according to the predefined rule, the query to the gprs table needs to acquire data from HBase and Elasticsearch, respectively.
Step 1103, the query execution plan is optimized according to rules and divided into multiple query execution paragraphs. In this example, two query execution paragraphs are generated: Stage-1 comprises the collection operation on the historical data (the gprs table in HBase) and the Union operation; Stage-2 comprises the collection operation on the hot-spot data (the gprs table in Elasticsearch). Both query execution paragraphs start running simultaneously. In Stage-1, the HBase Connector is executed first; the UnionNode operation is executed after Stage-2 completes and its data has been transmitted.
In step 1104, the HBase Connector acquires historical data (excluding hotspot data) from the HBase.
In step 1105, an elastic search Connector is constructed.
In step 1106, the Elasticsearch Connector obtains hotspot data from the Elasticsearch.
Step 1107, the hotspot data obtained from the Elasticsearch is transmitted to Stage-1, and Union operation is performed on the hotspot data and the history data obtained from the HBase.
Step 1108, the result set of the Union operation is output to the component management node.
Step 1109, the component management node returns the result set to the service application.
Steps 1102.1, 1103.1, and 1105.1 in fig. 11 represent interactions with the global metadata management module; these interactions are performed when metadata is needed or updated during processing.
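The flow above can be sketched minimally: the two query execution paragraphs start at the same time, and Stage-1 performs the UnionNode operation only after Stage-2 has transmitted its hot-spot data. The scan functions and row values below are invented in-memory stand-ins for the HBase and Elasticsearch Connectors, not real client calls:

```python
from concurrent.futures import ThreadPoolExecutor

def hbase_scan():
    # Stand-in for the HBase Connector: historical rows (invented values).
    return [("hist-1",), ("hist-2",)]

def es_scan():
    # Stand-in for the Elasticsearch Connector: hot-spot rows (invented values).
    return [("hot-1",)]

def run_plan():
    """Both paragraphs start simultaneously; Stage-1 runs its own scan,
    then waits on Stage-2's result before the UnionNode operation."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        stage2 = pool.submit(es_scan)         # Stage-2 starts immediately
        stage1_rows = hbase_scan()            # Stage-1's HBase Connector
        return stage1_rows + stage2.result()  # UnionNode: merge both sources
```

The design point is that neither store sees the other: the component management node's plan is the only place where the two result sets meet.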
Application example six
The present embodiment relates to index and detail association queries. In a massive data application scenario, an index (or information generated by statistical operations) is often stored separately from the details. In this embodiment, it is assumed that a certain trajectory application service stores trajectory information in Solr (table gclli) and stores trajectory details in HBase (table gclliDesc); the complete trajectory information meeting the conditions can be obtained through an associated SQL (Join) statement. When processing this SQL, the present scheme can perform cost-based optimization (CBO).
The precondition of this embodiment is that global metadata information has been imported from the data components into the global metadata repository according to the service application. For this embodiment, the metadata information of the table gclli and the table gclliDesc and the component information used need to be defined in advance, and the automatic statistics function for the table gclli has been enabled. The index query flow is shown in fig. 12 and includes:
step 1201, the business application submits a query SQL statement to the component management node through the JDBC interface. The SQL statements submitted in this embodiment are as follows:
Select T1.gcbh,T1.gcwz,T2.sxrs,T2.sxfl From gclli T1 join gclliDesc T2 on T1.gcbh=T2.gcbh Where T1.gcwz='N073E011'
This SQL statement queries records where the gcbh fields of the gclli and gclliDesc tables are equal and the gcwz field of the gclli table is 'N073E011'.
Step 1202, after receiving the SQL statement, the component management node constructs a query execution plan by combining the global metadata information. In this embodiment, cost-based optimization is performed: judging from the statistical information in the global metadata information, few records satisfy the conditional expression ("T1.gcwz='N073E011'"), so an execution plan for sequential execution is formed instead of the original Join execution plan. That is, the records satisfying "T1.gcwz='N073E011'" are taken out first, and then the Join (association) is performed. Taking out the filtered records first and then associating them is faster than associating first and then filtering.
Step 1203, the query execution plan is optimized according to rules and divided into query execution paragraphs. In this embodiment, two query execution paragraphs are generated: Stage-1 includes the data acquisition operation on the table gclli (stored in Solr); Stage-2 includes the data acquisition operation on the table gclliDesc (stored in Hive). Stage-1 is executed first.
In step 1204, the Solr Connector acquires qualified gclli data from the Solr.
Step 1205, the acquired gclli data is transmitted to Stage-2.
Step 1206, a query statement for the table gclliDesc is generated according to the gclli data, and data is collected from Hive.
Step 1207, the collected data set is output to the component management node.
In step 1208, the component management node returns the execution result to the service application.
Where step 1202.1 in FIG. 12 represents interaction with the global metadata management module when metadata is needed or updated during processing.
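The cost-based choice in step 1202 — taking out the few filtered records first and only then associating — can be sketched as follows. The row contents, field values, and function name are invented for illustration; plain dictionaries stand in for the Solr index table and the detail table:

```python
def filter_then_join(index_rows, detail_rows, wanted_gcwz):
    """Apply the cheap, selective filter first, then associate by gcbh,
    instead of joining the full tables and filtering afterwards."""
    keep = [r for r in index_rows if r["gcwz"] == wanted_gcwz]  # few survivors
    by_gcbh = {d["gcbh"]: d for d in detail_rows}               # detail lookup
    return [{**r, **by_gcbh[r["gcbh"]]} for r in keep if r["gcbh"] in by_gcbh]

# Invented sample data standing in for the gclli (index) and gclliDesc (detail) tables.
index_rows = [{"gcbh": "G1", "gcwz": "N073E011"},
              {"gcbh": "G2", "gcwz": "N000E000"}]
detail_rows = [{"gcbh": "G1", "sxrs": 5, "sxfl": "A"},
               {"gcbh": "G2", "sxrs": 7, "sxfl": "B"}]
result = filter_then_join(index_rows, detail_rows, "N073E011")
```

With table statistics indicating the filter is highly selective, the association work is proportional to the filtered set rather than to the full index table, which is the speedup the embodiment describes.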
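Step 1206's generation of a detail-table statement from the index results might look like the following sketch. The statement shape, field list, and helper name are assumptions for illustration, not the scheme's actual internals:

```python
def detail_statement(gcbh_values):
    """Build the follow-up query for the detail table from the index hits.

    Values here come from the scheme's own index store; a production
    implementation would bind parameters rather than inline literals."""
    in_list = ",".join("'{}'".format(v) for v in gcbh_values)
    return "Select gcbh,sxrs,sxfl From gclliDesc Where gcbh in ({})".format(in_list)
```

For example, index hits G1 and G2 would yield a statement restricting the Hive scan to exactly those two gcbh keys.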
As shown in fig. 13, an embodiment of the present invention provides a data query processing apparatus 130, which includes a memory 1310 and a processor 1320, where the memory 1310 stores a program, and when the program is read and executed by the processor 1320, the program implements the data query processing method according to any embodiment.
As shown in fig. 14, an embodiment of the present invention provides a computer-readable storage medium 140, which stores one or more programs 141, and the one or more programs 141 are executable by one or more processors to implement the data query processing method according to any embodiment.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (22)

1. A data query processing method, comprising:
receiving a query processing request, determining a data component related to the query processing request according to global metadata information acquired from a global metadata database, and generating a query execution plan according to the global metadata information; wherein the global metadata information of the global metadata base comprises metadata information of data components;
and based on the query execution plan, performing data operation with the data component related to the query processing request in an interactive mode.
2. The data query processing method of claim 1, wherein the global metadata information comprises at least one of: table definition information, table extension information, table field definition information, metadata rule information, table statistics information.
3. The data query processing method according to claim 1, wherein the determining, based on the global metadata information obtained from the global metadata repository, the data component involved in the query processing request includes at least one of:
when the global metadata information comprises table definition information which comprises component information, determining the related data components according to the component information in the table definition information in the global metadata information;
and when the global metadata information comprises the metadata rule, determining the related data component according to the matched metadata rule in the global metadata information.
4. The data query processing method of claim 1, wherein the generating a query execution plan according to the global metadata information comprises: and matching the query processing request with a predefined metadata rule in the global metadata information, and generating the query execution plan according to the matched metadata rule when the metadata rule is matched.
5. The data query processing method of claim 1, wherein the performing data operations with the data component involved in the query processing request comprises:
and dividing the query execution plan into execution tasks, issuing the execution tasks to task execution nodes, and finishing data operation by the task execution nodes through interaction between the data component docking layer of the data component related to the query processing request and the data component.
6. The data query processing method of claim 5, before receiving the query processing request, further comprising: and the component management node acquires the global metadata information from the global metadata database and pushes the global metadata information to the task execution node.
7. The data query processing method of claim 6, wherein in the process of finishing data operations by interacting with the data components involved in the query processing request, when the global metadata information needs to be used, the global metadata information is locally obtained from the task execution node.
8. The data query processing method of claim 6, wherein the method further comprises: when the data in the data component is updated, the global metadata base is updated, and the global metadata information of the component management node and the global metadata information in the task execution node are updated.
9. The data query processing method of any one of claims 1 to 8, wherein the query execution plan is a syntax tree comprising multiple levels of execution nodes, and at least some of the filter conditions are assigned to table scan nodes in the syntax tree.
10. The data query processing method of claim 9, wherein a filtering node whose value is null is located at a level above the table scan node in the syntax tree.
11. The data query processing method according to any one of claims 1 to 8, wherein the query execution plan is a query execution plan optimized based on a cost according to statistical information in the global metadata information.
12. A data query processing system, comprising:
the global metadata base is used for storing global metadata information, and the global metadata information comprises metadata information of data components;
the component management node is used for receiving a query processing request, determining a data component related to the query processing request according to global metadata information acquired from the global metadata database, and generating a query execution plan according to the global metadata information; generating an execution task based on the query execution plan, and issuing the execution task to a task execution node;
and the task execution node is used for finishing data operation with the data assembly related to the query processing request after receiving the execution task.
13. The data query processing system of claim 12, wherein the global metadata information includes at least one of: table definition information, table extension information, table field definition information, metadata rule information, table statistics information.
14. The data query processing system of claim 12, wherein the component management node comprises: the system comprises a query execution plan module, a cluster management node, a global metadata management module and a data component docking layer, wherein:
the global metadata management module is used for acquiring the global metadata information from the global metadata base;
the query execution plan module is used for receiving a query processing request, determining a data component related to the query processing request according to the global metadata information, and generating a query execution plan according to the global metadata information; generating an execution task based on the query execution plan, and issuing the execution task to a task execution node accessed to a cluster management node;
the cluster management node is used for managing the access and exit of the task execution node;
the data component docking layer is used for docking with the data component to realize the access of the task execution node to the data component.
15. The data query processing system of claim 14, wherein the query execution plan module generating a query execution plan based on the global metadata information comprises: and matching the query processing request with a predefined metadata rule in the global metadata information, and generating the query execution plan according to the matched metadata rule.
16. The data query processing system of claim 14, wherein the global metadata management module is further configured to: and pushing the global metadata information to the task execution node.
17. The data query processing system of claim 16, wherein the task execution node is further configured to, during a data operation performed by interacting with the data component involved in the query processing request, locally obtain the global metadata information from the task execution node when the global metadata information is needed.
18. The data query processing system of claim 16, wherein the global metadata management module is further configured to update the global metadata repository when the task execution node updates the data in the data component, and synchronize the local global metadata information with the global metadata information of the task execution node according to the updated global metadata repository.
19. The data query processing system of any of claims 12 to 18, wherein the query execution plan module generating the query execution plan based on the global metadata information comprises: a syntax tree is generated that includes multiple levels of execution nodes, and at least some of the filter conditions are assigned to table scan nodes in the syntax tree.
20. The data query processing system of any of claims 12 to 18, wherein the query execution plan module generating the query execution plan based on the global metadata information comprises: and generating a query execution plan which is optimized based on cost according to the statistical information in the global metadata information.
21. A data query processing apparatus comprising a memory and a processor, the memory storing a program which, when read and executed by the processor, implements a data query processing method as claimed in any one of claims 1 to 11.
22. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the data query processing method of any one of claims 1 to 11.
CN201811599635.2A 2018-12-26 2018-12-26 Data query processing method, device and system and computer readable storage medium Pending CN111367954A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811599635.2A CN111367954A (en) 2018-12-26 2018-12-26 Data query processing method, device and system and computer readable storage medium
PCT/CN2019/128782 WO2020135613A1 (en) 2018-12-26 2019-12-26 Data query processing method, device and system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811599635.2A CN111367954A (en) 2018-12-26 2018-12-26 Data query processing method, device and system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111367954A true CN111367954A (en) 2020-07-03

Family

ID=71126821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811599635.2A Pending CN111367954A (en) 2018-12-26 2018-12-26 Data query processing method, device and system and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111367954A (en)
WO (1) WO2020135613A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328624A (en) * 2020-11-06 2021-02-05 瀚高基础软件股份有限公司 SQL forwarding method and device and readable storage medium
CN112434056A (en) * 2020-10-12 2021-03-02 南京江北新区生物医药公共服务平台有限公司 Method and device for inquiring detailed data
CN112463837A (en) * 2020-12-17 2021-03-09 四川长虹电器股份有限公司 Relational database data storage query method
CN113076306A (en) * 2021-06-07 2021-07-06 航天神舟智慧***技术有限公司 Data resource automatic collection method and system based on cataloguing rule
CN113094340A (en) * 2021-04-28 2021-07-09 杭州海康威视数字技术股份有限公司 Data query method, device and equipment based on Hudi and storage medium
CN113271307A (en) * 2021-05-18 2021-08-17 中国工商银行股份有限公司 Data assembling method, device, computer system and storage medium
CN113590217A (en) * 2021-07-26 2021-11-02 北京百度网讯科技有限公司 Function management method and device based on engine, electronic equipment and storage medium
CN115563183A (en) * 2022-09-22 2023-01-03 北京百度网讯科技有限公司 Query method, device and program product
CN116595232A (en) * 2023-05-24 2023-08-15 杭州金智塔科技有限公司 Cross-data-source data processing system, method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102982075A (en) * 2012-10-30 2013-03-20 北京京东世纪贸易有限公司 Heterogeneous data source access supporting system and method thereof
CN103605698A (en) * 2013-11-06 2014-02-26 广东电子工业研究院有限公司 Cloud database system used for distributed heterogeneous data resource integration
CN104484472A (en) * 2014-12-31 2015-04-01 天津南大通用数据技术股份有限公司 Database cluster for mixing various heterogeneous data sources and implementation method
CN104636329A (en) * 2013-11-06 2015-05-20 北京航天长峰科技工业集团有限公司 Unified management method for large-scale cross-platform heterogeneous data
US20150143503A1 (en) * 2012-08-31 2015-05-21 Salesforce.Com, Inc. Systems and methods for content management in an on-demand environment
CN107491510A (en) * 2017-08-03 2017-12-19 国网江苏省电力公司信息通信分公司 One kind mixing heterogeneous data source unified query system and distributed enquiring method

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US7974967B2 (en) * 2008-04-15 2011-07-05 Sap Ag Hybrid database system using runtime reconfigurable hardware
US9081826B2 (en) * 2013-01-07 2015-07-14 Facebook, Inc. System and method for distributed database query engines
US9805092B1 (en) * 2013-02-25 2017-10-31 EMC IP Holding Company LLC Parallel processing database system
CN104008135A (en) * 2014-05-07 2014-08-27 南京邮电大学 Multi-source heterogeneous database fusion system and data query method thereof

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US20150143503A1 (en) * 2012-08-31 2015-05-21 Salesforce.Com, Inc. Systems and methods for content management in an on-demand environment
CN102982075A (en) * 2012-10-30 2013-03-20 北京京东世纪贸易有限公司 Heterogeneous data source access supporting system and method thereof
CN103605698A (en) * 2013-11-06 2014-02-26 广东电子工业研究院有限公司 Cloud database system used for distributed heterogeneous data resource integration
CN104636329A (en) * 2013-11-06 2015-05-20 北京航天长峰科技工业集团有限公司 Unified management method for large-scale cross-platform heterogeneous data
CN104484472A (en) * 2014-12-31 2015-04-01 天津南大通用数据技术股份有限公司 Database cluster for mixing various heterogeneous data sources and implementation method
CN107491510A (en) * 2017-08-03 2017-12-19 国网江苏省电力公司信息通信分公司 One kind mixing heterogeneous data source unified query system and distributed enquiring method

Non-Patent Citations (1)

Title
LU BAOGUO et al.: "A Unified Access Framework for Multi-source Heterogeneous Data Oriented to Data Centers", Command Information System and Technology *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN112434056A (en) * 2020-10-12 2021-03-02 南京江北新区生物医药公共服务平台有限公司 Method and device for inquiring detailed data
CN112328624A (en) * 2020-11-06 2021-02-05 瀚高基础软件股份有限公司 SQL forwarding method and device and readable storage medium
CN112463837A (en) * 2020-12-17 2021-03-09 四川长虹电器股份有限公司 Relational database data storage query method
CN112463837B (en) * 2020-12-17 2022-08-16 四川长虹电器股份有限公司 Relational database data storage query method
CN113094340A (en) * 2021-04-28 2021-07-09 杭州海康威视数字技术股份有限公司 Data query method, device and equipment based on Hudi and storage medium
CN113271307A (en) * 2021-05-18 2021-08-17 中国工商银行股份有限公司 Data assembling method, device, computer system and storage medium
CN113076306A (en) * 2021-06-07 2021-07-06 航天神舟智慧***技术有限公司 Data resource automatic collection method and system based on cataloguing rule
CN113590217A (en) * 2021-07-26 2021-11-02 北京百度网讯科技有限公司 Function management method and device based on engine, electronic equipment and storage medium
CN113590217B (en) * 2021-07-26 2022-12-02 北京百度网讯科技有限公司 Function management method and device based on engine, electronic equipment and storage medium
CN115563183A (en) * 2022-09-22 2023-01-03 北京百度网讯科技有限公司 Query method, device and program product
CN115563183B (en) * 2022-09-22 2024-04-09 北京百度网讯科技有限公司 Query method, query device and program product
CN116595232A (en) * 2023-05-24 2023-08-15 杭州金智塔科技有限公司 Cross-data-source data processing system, method and device

Also Published As

Publication number Publication date
WO2020135613A1 (en) 2020-07-02

Similar Documents

Publication Publication Date Title
CN111367954A (en) Data query processing method, device and system and computer readable storage medium
US10963428B2 (en) Multi-range and runtime pruning
JP6617117B2 (en) Scalable analysis platform for semi-structured data
US7577637B2 (en) Communication optimization for parallel execution of user-defined table functions
US4769772A (en) Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases
US10007698B2 (en) Table parameterized functions in database
US9298829B2 (en) Performing a function on rows of data determined from transitive relationships between columns
US10534770B2 (en) Parallelizing SQL on distributed file systems
US11494337B2 (en) Data pruning based on metadata
JP4747094B2 (en) DML statement for densifying data in a relational database system
CN109885585B (en) Distributed database system and method supporting stored procedures, triggers and views
CN110399368B (en) Method for customizing data table, data operation method and device
CN110019314B (en) Dynamic data packaging method based on data item analysis, client and server
US20180150544A1 (en) Synchronized updates across multiple database partitions
US11893026B2 (en) Advanced multiprovider optimization
US7958160B2 (en) Executing filter subqueries using a parallel single cursor model
Ding et al. Sagedb: An instance-optimized data analytics system
CN106339432A (en) System and method for balancing load according to content to be inquired
El-Helw et al. Optimization of common table expressions in mpp database systems
CN114416884A (en) Method and device for connecting partition table
Zhu et al. Hydb: Access optimization for data-intensive service
Barhate Shuffle Overhead Analysis for the Layered Data Abstractions
CN116975098A (en) Query plan construction method, device, electronic equipment and storage medium
Pipita Dynamic query optimization in spark
Vissapragada Optimizing SQL Query Execution over Map-Reduce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200703