WO2022247201A1 - Data query method and device - Google Patents

Data query method and device

Info

Publication number
WO2022247201A1
Authority
WO
WIPO (PCT)
Prior art keywords
data query
data
sql
subtask
script
Prior art date
Application number
PCT/CN2021/134954
Other languages
English (en)
French (fr)
Inventor
王和平
尹强
黄山
杨峙岳
刘有
杨永坤
华德义
白乐
徐嘉杨
饶进阳
邸帅
卢道和
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022247201A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a data query method and device.
  • Fintech: financial technology
  • Existing solutions are usually based on data query engine tools (such as Presto or openLooKeng): the full data in the database tables of each data source is obtained through the data source connector corresponding to that data source, and a mixed calculation over the full data of all data sources produces the final query result.
  • On the one hand, this processing method consumes more computing resources when performing mixed calculations on the full data in the database tables of each data source; on the other hand, transmitting that full data consumes more network resources.
  • Embodiments of the present invention provide a data query method and device, which are used to reduce network resources consumed during data transmission.
  • an embodiment of the present invention provides a data query method, including:
  • the computing component receives a data query request;
  • the data query request includes a structured query language sql data query script;
  • the computing component parses the sql data query script to generate m data query subtasks with execution dependencies; the m data query subtasks include at least one first data query subtask that involves only a single data source node;
  • the computing component distributes at least one first data query subtask to respective data source nodes; the data source node is used to execute the first data query subtask and obtain data query subresults;
  • the computing component executes a second data query subtask among the m data query subtasks based on the execution dependencies and the data query subresults to obtain a data query result; the second data query subtask involves multiple data source nodes.
  • The sql data query script is parsed to generate m data query subtasks with execution dependencies, including at least one first data query subtask that involves only a single data source node. Each first data query subtask is distributed to its data source node, which executes it and obtains a data query subresult. The computing component therefore does not need to fetch the full data in the database tables from each data source node involved in the sql data query script and run the query calculation locally.
  • The computing component then executes the second data query subtask among the m data query subtasks based on the execution dependencies and the data query subresults to obtain the required data query result. Because it only integrates a small number of data query subresults according to the execution dependencies (far fewer than the full data in the database tables), the computing pressure on the computing component is reduced.
  • Because each data source node transmits only data query subresults to the computing component, not the full data in its database tables, the amount of data transmitted between the computing component and the data source nodes is greatly reduced. This reduces the network resources consumed during data transmission and solves the prior-art problem of having to obtain the full data in the database tables of every data source node.
  • In addition, because the scheme makes full use of the computing power of each data source node to execute data query subtasks and obtain data query subresults, it reduces the computing resources consumed by the computing component, reduces the computing capabilities that must be developed for the computing component, and so reduces the workload of developers.
  • the calculation component parses the sql data query script to generate m data query subtasks with execution dependencies, including:
  • the computing component generates the syntax tree of the sql data query script according to the syntax parsing rules;
  • the computing component determines from the syntax tree the first data query subtasks, each involving only a single data source node;
  • for any table join keyword in the syntax tree, the computing component constructs a second data query subtask from the data query subresults of the first data query subtasks corresponding to that table join keyword;
  • the computing component determines m data query subtasks with execution dependencies according to the execution order of each first data query subtask and each second data query subtask.
  • By generating the syntax tree of the sql data query script, the first data query subtasks can be determined promptly and accurately and distributed to the corresponding data source nodes for execution, which avoids obtaining the full data in the database tables from each data source node involved in the sql data query script.
  • The second data query subtask is constructed from the data query subresults of the first data query subtasks corresponding to the table join keyword, so the computing component can execute it based on the execution dependencies and the data query subresults. The required data query result is obtained promptly and effectively, and the computing pressure on the computing component is reduced.
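The relationship between first and second data query subtasks described above can be sketched as a small data model. This is only an illustration under stated assumptions: the `SubTask` class, the `build_plan` helper, and the sample SQL strings are hypothetical and do not come from the patent. Only the idea of two single-source subtasks feeding one join subtask is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubTask:
    name: str
    sql: str
    # Identifier of the data source node; None means the subtask runs
    # in the computing component (hypothetical convention).
    data_source: Optional[str]
    depends_on: List[str] = field(default_factory=list)


def build_plan(left: SubTask, right: SubTask, join_sql: str) -> List[SubTask]:
    """Wrap two single-source subtasks with one join subtask that depends on both."""
    second = SubTask(
        name="c",
        sql=join_sql,
        data_source=None,
        depends_on=[left.name, right.name],
    )
    return [left, right, second]


# Hypothetical first subtasks, each touching a single data source node.
a = SubTask("a", "SELECT id, amount FROM tableA WHERE amount > 100", "IDC_a")
b = SubTask("b", "SELECT id, city FROM tableB", "IDC_b")
plan = build_plan(a, b, "SELECT * FROM a JOIN b ON a.id = b.id")
```

Here the `depends_on` list stands in for the execution dependencies: subtask `c` cannot start until the subresults of `a` and `b` are available.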
  • the calculation component generates a syntax tree of the sql data query script according to syntax parsing rules, including:
  • the computing component parses each keyword in the sql data query script in sequence according to the syntax parsing rules;
  • if the computing component determines that a parsed table name keyword does not conform to the table name naming rules, it continues to parse that keyword according to the syntax parsing rules until a table name keyword that conforms to the naming rules is obtained, thereby producing the syntax tree of the sql data query script.
  • In this way, a table name keyword that does not conform to the naming rules is parsed further until a conforming one is obtained, which yields a complete and clear syntax tree. Multiple data query subtasks can then be split out based on the syntax tree, so that the subtasks to be executed by data source nodes and the subtasks to be executed by the computing component are identified. This supports determining promptly which data query subtasks each data source node needs to execute.
  • the data source node is determined by the following methods, including:
  • the calculation component determines the data source node involved in the sql data query script according to the specified tag name in the syntax parsing rule; or,
  • the calculation component determines the data source nodes involved in the sql data query script according to the table name rules in the syntax analysis rules.
  • the data source nodes involved in the sql data query script can be determined timely and accurately by following the specified tag name in the syntax analysis rule or the table name rule in the syntax analysis rule.
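As a rough illustration of determining data source nodes from a label, the sketch below parses labels of the form "database table name-data source name-database name", the label layout this document describes later. The literal hyphen separator, the `TableRef` type, and the helper names are assumptions made for the example, not details confirmed by the patent.

```python
from typing import List, NamedTuple


class TableRef(NamedTuple):
    table: str
    data_source: str
    database: str


def parse_label(label: str) -> TableRef:
    """Split a 'table-datasource-database' label (assumed hyphen-separated)."""
    parts = label.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected table-datasource-database, got {label!r}")
    return TableRef(*parts)


def data_sources(labels: List[str]) -> List[str]:
    """Return the distinct data source names in first-seen order."""
    seen: List[str] = []
    for label in labels:
        source = parse_label(label).data_source
        if source not in seen:
            seen.append(source)
    return seen


ref = parse_label("tableA-IDC1_Hive-DB1")
```

If `data_sources` returns more than one name, the script involves multiple data sources and would need to be split.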
  • the computing component marks at least one first data query subtask among the m data query subtasks with corresponding data source node identifiers;
  • the computing component distributes the first data query subtask to respective data source nodes, including:
  • for each first data query subtask, the computing component distributes the first data query subtask to the corresponding data source node based on the data source node identifier corresponding to that subtask.
  • By marking each first data query subtask with the corresponding data source node identifier, the computing component can distribute each first data query subtask to its data source node promptly and accurately. This avoids the case where the computing component cannot identify which node a first data query subtask belongs to, distributes it to an unmatched data source node, and fails to obtain the correct data query result.
  • before the computing component parses the sql data query script, the method further includes:
  • the calculation component determines that the sql data query script is executable.
  • the calculation component verifies the syntax and/or parameters of the sql data query script through the set sql data query script verification rules, so as to determine whether the sql data query script is executable.
  • the embodiment of the present invention also provides a data query device, including:
  • a receiving unit configured to receive a data query request; the data query request includes a structured query language sql data query script;
  • a processing unit configured to parse the sql data query script to generate m data query subtasks with execution dependencies; at least one of the m data query subtasks involves only a single data source node.
  • processing unit is specifically configured to:
  • processing unit is specifically configured to:
  • each keyword in the sql data query script is parsed in sequence
  • processing unit is specifically configured to:
  • the data source nodes involved in the sql data query script are determined according to the table name rules in the grammar parsing rules.
  • processing unit is also used for:
  • the processing unit is specifically used for:
  • For each first data query subtask based on the data source node identifier corresponding to the first data query subtask, distribute the first data query subtask to the corresponding data source node.
  • processing unit is also used for:
  • processing unit is specifically configured to:
  • the syntax and/or parameters of the sql data query script are verified through the set sql data query script verification rules, so as to determine whether the sql data query script is executable.
  • an embodiment of the present invention provides a computing device, including at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor executes any data query method described in the first aspect above.
  • an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device executes any data query method described in the first aspect above.
  • FIG. 1 is a schematic diagram of a data query system architecture provided by an embodiment of the present invention
  • Fig. 2 is a schematic flow chart of a data query method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a benchmark syntax tree provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a syntax tree for sql data query scripts provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a dependency tree provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another dependency tree provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a data query device provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • the system structure shown in FIG. 1 is taken as an example to describe the architecture of the data query system applicable to the embodiment of the present invention.
  • the data query system architecture can be applied to scenarios such as mixed queries involving multiple data sources.
  • the data query system architecture may include a client 100, a computing component 200, and at least one data source node (such as a data source node 301, a data source node 302, and a data source node 303, etc.).
  • the computing component 200 may include a task acceptance verification module 201 , a task analysis and arrangement module 202 and a task distribution module 203 .
  • each data source node can be connected to the computing component 200, for example, it can be connected by wire or wirelessly, which is not limited in this embodiment of the present invention.
  • each data source node may include at least one database engine.
  • the data source node 301 may include database engines such as SparkEngine and HiveEngine
  • the data source node 302 may include database engines such as HbaseEngine and HiveEngine
  • the data source node 303 may include database engines such as SparkEngine and HbaseEngine.
  • the client 100 is configured to send the data query script to the computing component 200 after receiving the data query script submitted by the user.
  • the client 100 may be client software on a terminal, and the terminal may be a mobile phone, a notebook computer, a desktop computer, a tablet computer, etc., which is not limited in this embodiment of the present invention.
  • the computing component 200 is used to parse the structured query language sql data query script sent by the client 100 to obtain data source information. If the data source information involves multiple data sources, the data query script can be split into multiple sql data query subscripts, and at least one sql data query subscript that needs to be sent to the data source node for execution is distributed to the respective corresponding data source nodes. Then, according to the query dependencies of the plurality of sql data query subscripts, the data query subresults obtained after each data source node executes the corresponding data query subscripts are integrated, so as to obtain the final target data query result.
  • the task acceptance verification module 201 first verifies the received sql data query script. After the verification succeeds, the task analysis and arrangement module 202 parses the sql data query script to obtain data source information. If the data source information involves multiple data sources, the data query script can be split into multiple sql data query subscripts, which are then distributed to the corresponding data source nodes.
  • the computing component may be, for example, Linkis.
  • the application layer and the engine layer can be decoupled through the computing component 200, which simplifies the complex network call relationship, reduces the overall complexity, and saves overall development and maintenance costs.
  • Each data source node, after receiving its sql data query subscript, executes the subscript, obtains a data query subresult, and stores it, for example in the context.
  • FIG. 1 is only an example, which is not limited in this embodiment of the present invention.
  • FIG. 2 exemplarily shows a flow of a data query method provided by an embodiment of the present invention, and the flow can be executed by a data query device.
  • the process specifically includes:
  • Step 201 the computing component receives a data query request.
  • Step 202 the computing component parses the sql data query script to generate m data query subtasks with execution dependencies.
  • Step 203 the computing component distributes at least one first data query subtask to respective data source nodes.
  • Step 204, the computing component executes a second data query subtask among the m data query subtasks based on the execution dependencies and the data query subresults to obtain a data query result.
  • the computing component may receive a data query request that a user submits through a client on a terminal.
  • the user can submit a data query request through the client on a mobile phone, through the web (World Wide Web) interface on a laptop computer, or directly through the service interface provided by the system where the computing component is located.
  • the data query request may include a sql data query script.
  • the calculation component parses the sql data query script to generate m data query subtasks with execution dependencies.
  • the m data query subtasks may include at least one first data query subtask involving only a single data source node. Each data query subtask may include a sql data query subscript: when a data source node executes a first data query subtask, it executes the sql data query subscript in that subtask, and when the computing component executes the second data query subtask, it executes the sql data query subscript in the second data query subtask.
  • the calculation component generates a syntax tree of the sql data query script according to syntax analysis rules, and determines the first data query subtask involving only a single data source node from the syntax tree. Then, for any table connection keyword in the syntax tree, a second data query subtask is constructed through the data query subresult of the first data query subtask corresponding to the table connection keyword. Then, according to the execution sequence of each first data query subtask and each second data query subtask, m data query subtasks with execution dependencies are determined.
  • the table join keywords can include left join, right join, inner join (or simply join), full join, etc.
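To make the role of table join keywords concrete, here is a minimal sketch that locates them in a script with a regular expression. It is a simplification, not the syntax-tree parsing the patent describes: it ignores string literals, comments, and forms such as `left outer join`, and the `find_joins` helper is a hypothetical name.

```python
import re
from typing import List

# Optional qualifier (left/right/inner/full) followed by the word "join".
JOIN_RE = re.compile(r"\b(?:(left|right|inner|full)\s+)?join\b", re.IGNORECASE)


def find_joins(sql: str) -> List[str]:
    """Return the normalized join keyword for every join found, in order."""
    return [
        f"{m.group(1).lower()} join" if m.group(1) else "join"
        for m in JOIN_RE.finditer(sql)
    ]


joins = find_joins("SELECT * FROM a LEFT JOIN b ON a.id = b.id join c ON b.id = c.id")
```

Each keyword found this way marks a place where a second data query subtask would combine the subresults of the subtasks on either side of the join.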
  • By generating the syntax tree of the sql data query script, the first data query subtasks can be determined promptly and accurately and distributed to the corresponding data source nodes for execution, which avoids obtaining the full data in the database tables from each data source node involved in the sql data query script.
  • The second data query subtask is constructed from the data query subresults of the first data query subtasks corresponding to the table join keyword, so the computing component can execute it based on the execution dependencies and the data query subresults. The required data query result is obtained promptly and effectively, and the computing pressure on the computing component is reduced.
  • the syntax analysis of the sql data query script is performed to generate a syntax tree of the sql data query script, and two data query subtasks are determined from the syntax tree, namely the data query subtask a and data query subtask b. Then construct the data query subtask c through the execution result of the data query subtask a corresponding to the table connection keyword (such as join) and the execution result of the data query subtask b.
  • the execution of data query subtask c depends on the execution result of data query subtask a and the data query subresult of data query subtask b.
  • the data query subtask a is executed in data source node a,
  • the data query subtask b is executed in data source node b,
  • and the data query subtask c is executed in the computing component.
  • the keywords in the sql data query script can be parsed in turn according to the syntax parsing rules. If a parsed table name keyword does not conform to the table name naming rules, parsing of that keyword continues according to the syntax parsing rules until a table name keyword conforming to the naming rules is obtained, which produces the syntax tree of the sql data query script. In this way, a table name keyword that does not conform to the naming rules is parsed further until a conforming one is obtained, which yields a complete and clear syntax tree.
  • Multiple data query subtasks can then be split out based on the syntax tree.
  • The data query subtasks to be executed by data source nodes and those to be executed by the computing component can thus be identified, which supports determining promptly and accurately the data query subtasks that each data source node needs to execute.
  • the calculation component verifies the syntax and/or parameters of the sql data query script through the set sql data query script verification rules, so as to determine whether the sql data query script is executable. In this way, it can be timely and accurately determined whether the syntax and/or parameters of the sql data query script are correct, so that it can be timely and effectively determined whether the sql data query script can be executed successfully.
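The verification step can be pictured with a deliberately simple checker. The three rules used here (non-empty SELECT script, balanced parentheses, required parameters present) are illustrative assumptions; the patent does not say which verification rules are configured, and `validate_script` is a hypothetical name.

```python
from typing import Dict, List, Set


def validate_script(sql: str, params: Dict[str, object], required_params: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the script looks executable."""
    problems: List[str] = []
    text = sql.strip()
    # Syntax-level checks (assumed rules, for illustration only).
    if not text:
        problems.append("script is empty")
    elif not text.lower().startswith("select"):
        problems.append("only SELECT scripts are accepted")
    if text.count("(") != text.count(")"):
        problems.append("unbalanced parentheses")
    # Parameter-level check: every required parameter must be supplied.
    missing = sorted(required_params - set(params))
    if missing:
        problems.append("missing parameters: " + ", ".join(missing))
    return problems


ok = validate_script("SELECT * FROM t", {}, set())
```

A real verifier would more likely run the script through a full SQL parser; the point here is only that syntax checks and parameter checks can be reported separately.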
  • the calculation component distributes at least one first data query subtask to respective data source nodes, so that each data source node executes the received first data query subtask and obtains data query subresults. Specifically, after generating m data query subtasks with execution dependencies, the computing component marks at least one first data query subtask among the m data query subtasks with corresponding data source node identifiers. For each first data query subtask, the calculation component distributes the first data query subtask to the corresponding data source node based on the data source node identifier corresponding to the first data query subtask.
  • By marking each first data query subtask with a corresponding data source node identifier, the computing component can distribute each first data query subtask to its data source node promptly and accurately. This prevents the case where the computing component cannot identify which node a first data query subtask belongs to, distributes it to an unmatched data source node, and fails to obtain the correct data query result.
  • the data source node can be determined in either of the following ways: the computing component determines the data source nodes involved in the sql data query script according to the specified tag name in the syntax parsing rules; or, it determines them according to the table name rules in the syntax parsing rules. Either way, the data source nodes involved in the sql data query script can be determined promptly and accurately.
  • the data query subtask a is sent to the data source node a (such as IDC_a) for execution, so the data query subtask a can be marked with the identifier of IDC_a;
  • the data query subtask b is sent to the data source node b (such as IDC_b) for execution, so the data query subtask b can be marked with the identifier of IDC_b.
  • In this way, the data query subtask a can be sent to the data source node a corresponding to IDC_a, and the data query subtask b to the data source node b corresponding to IDC_b, in a timely and accurate manner.
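Identifier-based distribution can be sketched as a lookup from node identifier to an executor. The callables below merely stand in for remote data source nodes; the `dispatch` function and the tuple layout are hypothetical, and only the identifiers IDC_a and IDC_b come from the text.

```python
from typing import Callable, Dict, List, Tuple

Executor = Callable[[str], str]


def dispatch(tasks: List[Tuple[str, str, str]], nodes: Dict[str, Executor]) -> Dict[str, str]:
    """tasks are (task_name, node_id, sql) triples; returns task_name -> subresult."""
    results: Dict[str, str] = {}
    for name, node_id, sql in tasks:
        # A subtask marked with an unknown identifier is rejected rather
        # than being sent to an unmatched node.
        if node_id not in nodes:
            raise KeyError(f"no data source node registered for {node_id!r}")
        results[name] = nodes[node_id](sql)
    return results


# Stand-ins for real data source nodes (assumption: each just echoes the SQL).
nodes = {
    "IDC_a": lambda sql: f"IDC_a ran: {sql}",
    "IDC_b": lambda sql: f"IDC_b ran: {sql}",
}
tasks = [("a", "IDC_a", "SELECT 1"), ("b", "IDC_b", "SELECT 2")]
results = dispatch(tasks, nodes)
```

Failing fast on an unknown identifier mirrors the concern in the text that a subtask sent to an unmatched node cannot produce a correct result.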
  • the computing component executes the second data query subtask among the m data query subtasks based on the execution dependencies and the data query subresults to obtain the data query result.
  • the second data query subtask involves multiple data source nodes.
  • three data query subtasks with dependencies are generated through grammatical analysis of the sql data query script, namely data query subtask a, data query subtask b and data query subtask c.
  • the execution of the data query subtask c needs to depend on the execution result of the data query subtask a and the data query subresult of the data query subtask b.
  • the computing component can execute the data query subtask c based on the dependencies between data query subtask a, data query subtask b, and data query subtask c, and on the execution results of data query subtask a and data query subtask b, so that the data query result required by the user is accurately obtained.
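One way to picture dependency-driven execution of subtasks a, b, and c is a topological ordering followed by an in-memory join of the two subresults. The sketch uses Python's standard `graphlib` module (Python 3.9+); the row format and the `join_on` helper are assumptions for illustration, not the patent's implementation.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """dependencies maps each subtask to the set of subtasks it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key` (the work done by subtask c)."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [
        {**left_row, **right_row}
        for left_row in left_rows
        for right_row in index.get(left_row[key], [])
    ]


# Subtask c depends on the subresults of a and b.
deps = {"a": set(), "b": set(), "c": {"a", "b"}}
order = execution_order(deps)

sub_a = [{"id": 1, "amount": 5}, {"id": 2, "amount": 7}]
sub_b = [{"id": 1, "city": "SZ"}]
joined = join_on(sub_a, sub_b, "id")
```

The ordering guarantees that `c` comes after both `a` and `b`; the relative order of `a` and `b` is not significant, since they are independent and could run in parallel on their own nodes.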
  • Step1 The calculation component verifies the data query script submitted by the client.
  • the computing component may be set on an independent physical machine, or may be set on a server cluster or a distributed system composed of multiple physical servers, which is not limited in the embodiment of the present invention.
  • the user when the user needs to query data, he can submit a data query request to the task receiving verification module in the computing component through the client on the terminal.
  • the data query request can include the pre-edited sql data query script.
  • the computing component parses out the sql data query script from the data query request.
  • the user can directly edit the sql data query script in real time on the service interface provided by the system where the computing component resides or directly input the pre-edited sql data query script.
  • the service interface provided by the system where the computing component is located can be displayed by the client on the terminal.
  • the task receiving verification module in the computing component verifies the sql data query script submitted by the user to determine whether it can be executed successfully, for example by verifying the syntax and/or parameters of the script.
  • Step2 The calculation component performs syntax analysis on the sql data query script that passes the verification, and generates at least one data query subscript.
  • the task analysis and arrangement module in the computing component after receiving the sql data query script, performs grammatical analysis on the sql data query script according to the sql syntax analysis rules, and parses out at least one data source involved in the sql data query script , and then divide the sql data query script into multiple steps to execute based on the at least one data source.
  • the sql data query script is divided into multiple sql data query subscripts. For each sql data query subscript that needs to be sent to a data source node for execution, a corresponding sql data query subtask is generated; for each sql data query subscript that does not need to be sent to a data source node, a corresponding sql data query subtask is generated and saved locally.
  • each sql data query subtask that needs to be sent to a data source node is labeled with the corresponding data source label, such as IDC1 or IDC2 (IDC: Internet Data Center, also known as a computer room or computing center).
  • Engine1 can be used to represent the database engine SparkEngine or HiveEngine, etc., and Engine2 can be used to represent the database engine HiveEngine or SparkEngine, etc.; or, Engine1 can be used to represent the database engine SparkEngine or HbaseEngine, etc., and Engine2 can be used to represent the database engine HbaseEngine or SparkEngine, etc.; or, Engine1 may be used to represent the database engine HiveEngine or HbaseEngine, etc., and Engine2 may be used to represent the database engine HbaseEngine or HiveEngine, etc., which is not limited in this embodiment of the present invention.
  • For example, if sql data query subtask A needs to be sent to data source node A (such as IDC1_Engine1) and sql data query subtask B needs to be sent to data source node B (such as IDC2_Engine2), then sql data query subtask A is labeled with the data source label IDC1_Engine1 and sql data query subtask B is labeled with IDC2_Engine2.
  • a sql data query script involves two data sources, that is, data source A and data source B. Based on the two data sources, the sql data query script can be divided into multiple data query steps (such as stage1, stage2 and stage3) for execution. That is, based on the two data sources, the sql data query script is divided into corresponding multiple sql data query subscripts, that is, based on data source A and data source B, the sql data query script is divided into sql data Query subscript A, sql data query subscript B and sql data query subscript C.
  • in the existing approach, the entire sql data query script has to be executed in the data query engine tool, which must read the full amount of data in the database tables of every data source involved in the script. This scheme does not need to read the full amount of data in the database tables of each data source node. Instead, it splits the sql data query script into multiple sql data query subscripts and distributes each subscript that must be executed by a data source node to the corresponding node. Executing each subscript with the computing power of its own data source node makes full use of that computing power, reduces the computing resources consumed by the computing component, and reduces the amount of computing capability that has to be developed for the computing component.
  • Step a: the task parsing and arrangement module in the computing component completes the data source analysis for the sql data query script.
  • the data source information involved in the sql data query script can be obtained through two implementation manners.
  • the first implementation method is: the user specifies the corresponding data source label information in the submitted data query request, and then the corresponding data source information can be obtained directly through the data source label information. That is to say, the user needs to specify the database table name and data source information involved in the data source label information.
  • the meaning of the data source label information is: database table name-data source name-database name.
  • tableA is used to indicate the name of the database table
  • IDC1_Hive is used to indicate the name of the data source
  • DB1 is used to indicate the name of the database in Hive
  • tableB is used to indicate the name of the database table
  • IDC2_Hbase is used to indicate the name of the data source
  • DB1 is used to indicate the name of the database in Hbase.
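As a rough sketch of the first implementation manner, the label format "database table name-data source name-database name" can be split into its three parts. The concrete label strings used here ("tableA-IDC1_Hive-DB1" and "tableB-IDC2_Hbase-DB1") are assumed from the described format and the names listed above; the `TableSource` type and both function names are hypothetical.

```python
from typing import NamedTuple


class TableSource(NamedTuple):
    table: str
    data_source: str
    database: str


def parse_label(label: str) -> TableSource:
    """Split one 'table-datasource-database' label into its three fields."""
    parts = label.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected table-datasource-database, got {label!r}")
    return TableSource(*parts)


def parse_labels(labels):
    """Index the parsed labels by table name for later lookup."""
    return {src.table: src for src in map(parse_label, labels)}


# Assumed label strings, built from the names in the text above.
sources = parse_labels(["tableA-IDC1_Hive-DB1", "tableB-IDC2_Hbase-DB1"])
```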
  • the second implementation method is: the user specifies the corresponding data source information in the database table name in the submitted sql data query script, then the corresponding data source information can be obtained by parsing the sql data query script. That is to say, the user needs to specify the data source name, database name and other information in the database table name.
  • the user specifies the table name rule containing data source information in the sql data query script as: data source name.database name.database table name.
  • IDC1_Hive is used to represent the data source name
  • DB1 is used to represent the database name in Hive
  • tableA is used to represent the database table name
  • IDC2_Hbase is used to represent the data source name
  • DB1 is used to represent the database name in Hbase
  • tableB is used to represent the database table name.
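For the second implementation manner, a minimal sketch of reading the data source information out of the script itself might look like the following. It assumes every table reference follows `from` or `join` and uses the three-part rule "data source name.database name.database table name"; the regular expression, the function name and the join condition in the sample script are illustrative assumptions, and a real implementation would work on a parsed syntax tree rather than a regular expression.

```python
import re

# Matches "from X.Y.Z" or "join X.Y.Z" and captures the three name parts.
TABLE_REF = re.compile(r"\b(?:from|join)\s+(\w+)\.(\w+)\.(\w+)", re.IGNORECASE)


def data_sources(sql):
    """Return (data source, database, table) for each table reference."""
    return [match.groups() for match in TABLE_REF.finditer(sql)]


# Sample script; the "on tableA.id = tableB.id" condition is assumed.
sql = (
    "select * from IDC1_Hive.DB1.tableA tableA "
    "join IDC2_Hbase.DB1.tableB tableB on tableA.id = tableB.id "
    "where tableA.id > 1 and tableB.id > 1"
)
refs = data_sources(sql)

# More than one distinct data source means the script is a mixed query.
is_mixed = len({source for source, _, _ in refs}) > 1
```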
  • Step b: the task parsing and arrangement module in the computing component generates a dependency tree for the sql data query script.
  • when the sql data query script involves only a single data source, no dependency tree is needed: the corresponding sql data query task can be generated directly from the sql data query script and distributed to the corresponding data source node, so that the data source node executes the sql data query script in the task, obtains the data query result and caches it.
  • the sql data query script involves mixed data query, and a dependency tree for the sql data query script needs to be generated. That is, by parsing and splitting the sql data query script based on multiple data sources involved in the sql data query script, multiple sql data query subscripts are determined. Specifically, the task parsing and arrangement module in the computing component can analyze and split the sql data query script according to the sql syntax parsing rules, so as to obtain multiple sql data query subscripts.
  • FIG. 3 is a schematic diagram of a reference syntax tree provided by an embodiment of the present invention.
  • the lexical analysis tool parses the sql data query script according to tokens such as select, from, join and where. It first parses out the corresponding data source information (such as IDC1_Hive and IDC2_Hbase) and then the corresponding query conditions (such as where tableA.id>1 and tableB.id>1), which yields the syntax tree shown in FIG. 4. Then, based on the syntax tree corresponding to the sql data query script, a dependency tree involving multiple data query steps (that is, multiple sql data query subscripts) as shown in FIG. 5 can be generated.
  • the task analysis and arrangement module in the computing component analyzes and splits the SQL data query script based on multiple data sources involved in the SQL data query script, and determines multiple data query steps, namely Stage1, Stage2 and Stage3 .
  • Stage1 is to execute sql data query subscript A (i.e. execute Select * from IDC1_Hive.DB1.tableA tableA where tableA.id>1)
  • Stage2 is to execute sql data query subscript B (i.e. execute Select * from IDC2_Hbase.DB1.tableB tableB where tableB.id>1)
  • Stage1 is executed in the data source node IDC1_Hive
  • Stage2 is executed in the data source node IDC2_Hbase
  • Stage3 is executed in the computing component.
  • Stage3 is executed in the calculation component based on the execution result CSTableA of Stage1 and the execution result CSTableB of Stage2.
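The three stages can be simulated with Python's built-in `sqlite3` module. This is a toy sketch, not the patented system: two in-memory databases stand in for the data source nodes IDC1_Hive and IDC2_Hbase, a third stands in for the computing component, the rows are made-up illustration data, and the join condition on `id` is an assumption because the Stage3 subscript is not shown in the text.

```python
import sqlite3


def make_node(table, rows):
    """Stand-in for a data source node: an in-memory database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"create table {table} (id integer, name text)")
    conn.executemany(f"insert into {table} values (?, ?)", rows)
    return conn


# Made-up rows, for illustration only.
idc1_hive = make_node("tableA", [(1, "a1"), (2, "a2"), (3, "a3")])
idc2_hbase = make_node("tableB", [(2, "b2"), (3, "b3"), (4, "b4")])

# Stage1 and Stage2: each node runs its own subscript, so the filter is
# applied at the source and only matching rows leave the node.
cs_table_a = idc1_hive.execute("select * from tableA where id > 1").fetchall()
cs_table_b = idc2_hbase.execute("select * from tableB where id > 1").fetchall()

# Stage3: the computing component loads the two sub-results and joins them.
compute = sqlite3.connect(":memory:")
for name, rows in (("CSTableA", cs_table_a), ("CSTableB", cs_table_b)):
    compute.execute(f"create table {name} (id integer, name text)")
    compute.executemany(f"insert into {name} values (?, ?)", rows)

result = compute.execute(
    "select a.id, a.name, b.name from CSTableA a "
    "join CSTableB b on a.id = b.id order by a.id"
).fetchall()
```

The row with id 1 in tableA never reaches the computing component, which is the point of running Stage1 and Stage2 at the data source nodes.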
  • the relatively complex sql data query script queries for persons who appear under the same name both among the adults of a given city (such as city A) in the data source ES and among the adults of that city in the data source Hive.
  • the more complicated sql data query script is:
  • the lexical analysis tool parses the more complex sql data query script according to tokens such as select, from, join and where. It first parses out the corresponding data source information (such as IDC1_ES, IDC2_Hive1 and IDC2_Hive2) and then the corresponding query conditions (such as where es_tableA.age>18 and hive_tableB.age>18), which yields the corresponding syntax tree. Then, based on the syntax tree corresponding to the relatively complex sql data query script, a dependency tree involving multiple data query steps (i.e., multiple sql data query subscripts) as shown in FIG. 6 can be generated.
  • the task parsing and arrangement module in the computing component parses and splits the more complex sql data query script based on the multiple data sources it involves, and determines multiple data query steps, namely Stage1, Stage2, Stage3, Stage4 and Stage5. Specifically, two data query steps (namely Stage1 and Stage*) are first parsed out through the outer join condition of the script; the outermost where condition is then parsed out and pushed down into the corresponding subqueries on both sides of the join.
  • because the second data query step (that is, Stage*) still contains a join over mixed data sources, it is parsed further to generate two data query steps (that is, Stage2 and Stage3). The result of the mixed query over Stage2 and Stage3 is stored in CSTableD, so that Stage4 executes sql data query subscript 4 based on the CSTableD.
  • Stage5 obtains the final target data query result based on the execution results of Stage1 and Stage4.
  • Stage1 is to execute sql data query subscript 1 (i.e.
  • Stage2 is to execute sql data query subscript 2 (i.e. execute Select tableB.name, tableB.age from IDC2_Hive1.hive1.tableB tableB where tableB.age>18);
  • Stage4 is to execute sql data query subscript 4 (i.e. execute Select B.name,B.age,C.area from CSTableB B join CSTableC C on B.
  • Stage1 is executed in the data source node IDC1_ES
  • Stage2 is executed in the data source node IDC2_Hive1
  • Stage3 is executed in the data source node IDC2_Hive2
  • Stage4 and Stage5 are executed in the computing component.
  • Stage4 is executed in the computing component based on the execution result CSTableB of Stage2 and the execution result CSTableC of Stage3
  • Stage5 is executed in the computing component based on the execution result CSTableA of Stage1 and the execution result of Stage4.
  • Step3: The computing component encapsulates at least one sql data query subscript as a corresponding data query subtask, and distributes at least one data query subtask to the respective corresponding data source nodes.
  • the sql data query subscripts of Stage1 and Stage2 are packaged into corresponding sql data query subtasks, namely sql data query subtask A and sql data query subtask B, and each sql data query subtask is marked with the corresponding data source label and database engine label; for example, sql data query subtask A is marked with IDC1_Hive and sql data query subtask B with IDC2_Hbase. Then, sql data query subtask A is distributed to the corresponding data source node IDC1_Hive, and sql data query subtask B to the corresponding data source node IDC2_Hbase.
  • the sql data query subtask C generated from sql data query subscript C is stored locally in the computing component. Once it is detected that sql data query subtask A and sql data query subtask B have finished executing on their respective data source nodes (alternatively, each data source node can notify the computing component with an execution completion message after executing its subtask, or can send its execution result to the computing component), sql data query subtask C is executed based on the execution results of subtask A and subtask B to obtain the final target data query result.
  • the sql data query subscripts of Stage1, Stage2 and Stage3 are packaged into corresponding sql data query subtasks, namely sql data query subtask 1, sql data query subtask 2 and sql data query subtask 3, and each sql data query subtask is marked with the corresponding data source label and database engine label; for example, sql data query subtask 1 is marked with IDC1_ES, sql data query subtask 2 with IDC2_Hive1, and sql data query subtask 3 with IDC2_Hive2.
  • the sql data query subtask 1 is distributed to the corresponding data source node IDC1_ES
  • the sql data query subtask 2 is distributed to the corresponding data source node IDC2_Hive1
  • the sql data query subtask 3 is distributed to the corresponding data source node IDC2_Hive2.
  • sql data query subscript 4 and sql data query subscript 5 are used to generate sql data query subtask 4 and sql data query subtask 5 respectively, which are stored locally in the computing component.
  • once sql data query subtask 1, sql data query subtask 2 and sql data query subtask 3 have finished executing on their respective data source nodes (alternatively, each data source node can notify the computing component with an execution completion message after executing its subtask, or can send its execution result to the computing component), sql data query subtask 4 is executed based on the execution results of subtask 2 and subtask 3.
  • after it is detected that sql data query subtask 4 has finished executing, sql data query subtask 5 is executed based on the execution results of subtask 1 and subtask 4 to obtain the final target data query result.
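The execution order implied by these dependencies can be derived with a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the `dependencies` mapping restates the stage relationships from the text, while the function name and the idea of grouping stages into "waves" that may run concurrently are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose results it needs.
dependencies = {
    "Stage1": set(),                 # runs on IDC1_ES
    "Stage2": set(),                 # runs on IDC2_Hive1
    "Stage3": set(),                 # runs on IDC2_Hive2
    "Stage4": {"Stage2", "Stage3"},  # runs in the computing component
    "Stage5": {"Stage1", "Stage4"},  # runs in the computing component
}


def execution_waves(deps):
    """Return the stages grouped into waves whose members are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave as finished so that dependents become ready.
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
```

Here the first wave contains the three stages sent to data source nodes, followed by Stage4 and then Stage5 in the computing component.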
  • Step4: Each data source node receives the corresponding data query subtask, and executes the sql data query subscript in the data query subtask.
  • For each data source node, after receiving the data query subtask, the data source node parses the data query subtask to obtain the corresponding sql data query subscript, and then executes the sql data query subscript. Exemplarily, continue with the example shown in FIG. 5.
  • the data source node IDC1_Hive parses the sql data query subtask A to obtain the sql data query subscript A, then executes the sql data query subscript A and stores the execution result in the context CSTableA.
  • After the data source node IDC2_Hbase receives the sql data query subtask B, it parses the subtask to obtain the sql data query subscript B, then executes the sql data query subscript B and stores the execution result in the context CSTableB. Then, the computing component executes the sql data query subtask C based on the execution result of the sql data query subtask A and the execution result of the sql data query subtask B to obtain the final target data query result. Alternatively, continue with the example shown in FIG. 6.
  • the data source node IDC1_ES parses the sql data query subtask 1 to obtain the sql data query subscript 1, then executes the sql data query subscript 1 and stores the execution result in the context CSTableA.
  • the data source node IDC2_Hive1 parses the sql data query subtask 2 to obtain the sql data query subscript 2, then executes the sql data query subscript 2 and stores the execution result in the context CSTableB.
  • After the data source node IDC2_Hive2 receives the sql data query subtask 3, it parses the subtask to obtain the sql data query subscript 3, then executes the sql data query subscript 3 and stores the execution result in the context CSTableC. Then, the computing component executes the sql data query subtask 4 based on the execution results of the sql data query subtask 2 and the sql data query subtask 3 and, after detecting that the sql data query subtask 4 has finished executing, executes the sql data query subtask 5 based on the execution results of the sql data query subtask 1 and the sql data query subtask 4 to obtain the final target data query result.
  • the above embodiment shows that parsing the sql data query script generates m data query subtasks with execution dependencies, including at least one first data query subtask that involves only a single data source node. The at least one first data query subtask is distributed to the respective data source nodes, so that each data source node executes the first data query subtask it receives and obtains a data query sub-result; the computing component does not need to obtain the full amount of data in the database tables from each data source node involved in the sql data query script and perform the query calculation locally.
  • the computing component then executes the second data query subtask among the m data query subtasks based on the execution dependencies and the data query sub-results to obtain the required data query result.
  • since the computing component only performs integration calculation, following the execution dependencies, on a small number of data query sub-results (far fewer than the full amount of data in the database tables), the calculation pressure on the computing component is reduced.
  • since each data source node transmits only data query sub-results to the computing component, and not the full amount of data in its database tables, the amount of data transmitted between the computing component and the data source nodes is greatly reduced. This lowers the network resources consumed during data transmission and solves the prior-art problem that the full amount of data in the database tables of each data source node has to be obtained.
  • since the scheme makes full use of the computing power of each data source node to execute the data query subtasks and obtain the data query sub-results, it reduces the computing resources consumed by the computing component and the amount of computing capability that has to be developed for the computing component, which reduces the workload of developers.
  • FIG. 7 exemplarily shows a data query device provided by an embodiment of the present invention, and the device can execute the flow of the data query method.
  • the device includes:
  • the receiving unit 701 is configured to receive a data query request; the data query request includes a structured query language sql data query script;
  • the processing unit 702 is configured to perform syntax parsing on the sql data query script and generate m data query subtasks with execution dependencies; the m data query subtasks include at least one first data query subtask involving only a single data source node; distribute the at least one first data query subtask to the respective data source nodes; the data source node is used to execute the first data query subtask and obtain a data query sub-result; and, based on the execution dependencies and the data query sub-results, execute a second data query subtask among the m data query subtasks to obtain a data query result; the second data query subtask involves multiple data source nodes.
  • processing unit 702 is specifically configured to:
  • processing unit 702 is specifically configured to:
  • each keyword in the sql data query script is parsed in turn;
  • processing unit 702 is specifically configured to:
  • the data source nodes involved in the sql data query script are determined according to the table name rules in the grammar parsing rules.
  • processing unit 702 is further configured to:
  • the processing unit 702 is specifically used for:
  • For each first data query subtask, based on the data source node identifier corresponding to the first data query subtask, distribute the first data query subtask to the corresponding data source node.
  • processing unit is also used for:
  • processing unit 702 is specifically configured to:
  • the syntax and/or parameters of the sql data query script are verified through the set sql data query script verification rules, so as to determine whether the sql data query script is executable.
  • the embodiment of the present invention also provides a computing device, as shown in FIG. 8 , including at least one processor 801 and a memory 802 connected to the at least one processor.
  • the specific connection medium between the processor 801 and the memory 802 is not limited in this embodiment; the bus connection between the processor 801 and the memory 802 in FIG. 8 is taken as an example.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the memory 802 stores instructions executable by at least one processor 801, and at least one processor 801 can execute the steps included in the aforementioned data query method by executing the instructions stored in the memory 802.
  • the processor 801 is the control center of the computing device. It can use various interfaces and lines to connect the various parts of the computing device, and performs data processing by running or executing instructions stored in the memory 802 and calling data stored in the memory 802.
  • the processor 801 may include one or more processing units, and the processor 801 may integrate an application processor and a modem processor.
  • the modem processor mainly handles issuing instructions. It can be understood that the foregoing modem processor may not be integrated into the processor 801.
  • the processor 801 and the memory 802 can be implemented on the same chip, and in some embodiments, they can also be implemented on independent chips.
  • the processor 801 can be a general processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array or other programmable logic devices, discrete gates or transistors Logic devices and discrete hardware components can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention.
  • a general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the data query method can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • the memory 802 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules.
  • Memory 802 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read only memory (Programmable Read Only Memory, PROM), read only memory (Read Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, magnetic disk, optical disc, etc.
  • Memory 802 is, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 802 in the embodiment of the present invention may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
  • an embodiment of the present invention also provides a computer-readable storage medium, which stores a computer program executable by a computing device; when the program is run on the computing device, the computing device executes the steps of the above data query method.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means; the instruction means implement the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

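An embodiment of the present invention provides a data query method and device. In the method, a computing component receives a data query request, performs syntax parsing on the sql data query script, generates m data query subtasks with execution dependencies, distributes at least one first data query subtask to the respective data source nodes, and, based on the execution dependencies and the data query sub-results, executes a second data query subtask among the m data query subtasks to obtain a data query result. Because each data source node transmits only data query sub-results to the computing component, and not the full data of its database tables, the amount of data transmitted between the computing component and the data source nodes is reduced, which lowers the network resources consumed during data transmission. In addition, because the scheme makes full use of the computing power of each data source node to execute the data query subtasks, the computing resources consumed by the computing component are reduced.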

Description

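A data query method and device
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202110572421.1, entitled "A data query method and device", filed with the China Patent Office on May 25, 2021, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a data query method and device.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. However, the security and real-time requirements of the financial industry also place higher demands on technology. As financial business keeps expanding, financial business data grows in volume and becomes more varied in form, which raises the requirements on applications of that data; on this basis, data query is also widely used in financial business. How to perform data query operations in a timely and effective manner so as to meet the needs of financial business has therefore become an urgent problem.
When an existing solution performs mixed query computation over the multiple data sources involved in a data query script submitted by a client, it typically relies on a data query engine tool (such as Presto or openLooKeng): it obtains the full data of the database tables of each data source through the data source connector corresponding to that data source, and performs the mixed computation over that full data to obtain the final query result. However, this approach consumes considerable computing resources when performing the mixed computation over the full data of the database tables of each data source, and consumes considerable network resources when obtaining that full data.
In summary, a data query method is urgently needed to reduce the network resources consumed during data transmission.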
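Summary of the invention
Embodiments of the present invention provide a data query method and device for reducing the network resources consumed during data transmission.
In a first aspect, an embodiment of the present invention provides a data query method, including:
a computing component receives a data query request; the data query request includes a structured query language (sql) data query script;
the computing component performs syntax parsing on the sql data query script and generates m data query subtasks with execution dependencies; the m data query subtasks include at least one first data query subtask that involves only a single data source node;
the computing component distributes the at least one first data query subtask to the respective data source nodes; each data source node is used to execute the first data query subtask and obtain a data query sub-result;
the computing component executes, based on the execution dependencies and the data query sub-results, a second data query subtask among the m data query subtasks to obtain a data query result; the second data query subtask involves multiple data source nodes.
In the above technical solution, syntax parsing of the sql data query script generates m data query subtasks with execution dependencies, including at least one first data query subtask that involves only a single data source node. The at least one first data query subtask is distributed to the respective data source nodes, so that each data source node executes the first data query subtask it receives and obtains a data query sub-result; the computing component does not need to obtain the full data of the database tables from each data source node involved in the sql data query script and run the query computation locally. The computing component then executes the second data query subtask among the m data query subtasks, based on the execution dependencies and the data query sub-results, to obtain the required data query result. Because the computing component only performs integration computation, following the execution dependencies, over a small number of data query sub-results (far fewer than the full data of the database tables), the computing pressure on the computing component is relieved. Likewise, because each data source node transmits only data query sub-results to the computing component, and not the full data of its database tables, the amount of data transmitted between the computing component and the data source nodes is greatly reduced. This lowers the network resources consumed during data transmission and solves the prior-art problem of having to obtain the full data of the database tables of every data source node. In addition, because the scheme makes full use of the computing power of each data source node to execute the data query subtasks and obtain the data query sub-results, it reduces the computing resources consumed by the computing component and the amount of computing capability that has to be developed for the computing component, which reduces developers' workload.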
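Optionally, the computing component performing syntax parsing on the sql data query script and generating m data query subtasks with execution dependencies includes:
the computing component generates a syntax tree of the sql data query script according to syntax parsing rules;
the computing component determines, from the syntax tree, the first data query subtasks that involve only a single data source node;
for any table join keyword in the syntax tree, the computing component constructs a second data query subtask from the data query sub-results of the first data query subtasks corresponding to the table join keyword;
the computing component determines the m data query subtasks with execution dependencies according to the execution order of the first data query subtasks and the second data query subtasks.
In the above technical solution, the syntax tree of the sql data query script makes it possible to determine the first data query subtasks promptly and accurately and to distribute them to the corresponding data source nodes for execution, which avoids obtaining the full data of the database tables from each data source node involved in the script. In addition, the second data query subtask is constructed from the data query sub-results of the first data query subtasks corresponding to the table join keyword, so that the computing component can execute it based on the execution dependencies and the data query sub-results. The required data query result is thus obtained promptly and effectively, and the computing pressure on the computing component is relieved.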
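Optionally, the computing component generating a syntax tree of the sql data query script according to syntax parsing rules includes:
the computing component parses out each keyword in the sql data query script in turn according to the syntax parsing rules;
if the computing component determines that a parsed table name keyword does not conform to the table name naming rule, it continues to parse the table name keyword according to the syntax parsing rules until a table name keyword conforming to the table name naming rule is reached, thereby obtaining the syntax tree of the sql data query script.
In the above technical solution, when a parsed table name keyword is found not to conform to the table name naming rule, it can be parsed further until a conforming table name keyword is reached, which yields a complete and clear syntax tree. Multiple data query subtasks can be split out based on this syntax tree, and the data query subtasks to be executed by data source nodes can be distinguished from those to be executed by the computing component, which supports the later, timely determination of the data query subtasks each data source node needs to execute.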
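Optionally, the data source nodes are determined in the following manner:
the computing component determines the data source nodes involved in the sql data query script according to the specified label names in the syntax parsing rules; or,
the computing component determines the data source nodes involved in the sql data query script according to the table name rule in the syntax parsing rules.
In the above technical solution, the data source nodes involved in the sql data query script can be determined promptly and accurately according to either the specified label names or the table name rule in the syntax parsing rules.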
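Optionally, after the m data query subtasks with execution dependencies are generated, the method further includes:
the computing component labels each of the at least one first data query subtask among the m data query subtasks with the corresponding data source node identifier;
the computing component distributing the first data query subtasks to the respective data source nodes includes:
for each first data query subtask, the computing component distributes the first data query subtask to the corresponding data source node based on the data source node identifier corresponding to the first data query subtask.
In the above technical solution, labeling each first data query subtask with the corresponding data source node identifier ensures that the computing component distributes each first data query subtask to its corresponding data source node promptly and accurately. It also prevents the case where the computing component cannot tell which node a first data query subtask belongs to, distributes it to a mismatched data source node, and therefore cannot obtain the correct data query result.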
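Optionally, before the computing component performs syntax parsing on the sql data query script, the method further includes:
the computing component determines that the sql data query script is executable.
In the above technical solution, determining whether the sql data query script is executable before parsing it ensures that a correct sql data query script can be executed successfully to obtain the correct data query result, and avoids failing to obtain the required data query result because the script is not executable.
Optionally, whether the sql data query script is executable is determined in the following manner:
the computing component verifies the syntax and/or parameters of the sql data query script against set sql data query script verification rules, thereby determining whether the sql data query script is executable.
In the above technical solution, verifying the syntax and/or parameters of the sql data query script against the set verification rules makes it possible to determine promptly and accurately whether the syntax and/or parameters are correct, and therefore whether the script can be executed successfully.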
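In a second aspect, an embodiment of the present invention further provides a data query device, including:
a receiving unit, configured to receive a data query request; the data query request includes a structured query language (sql) data query script;
a processing unit, configured to perform syntax parsing on the sql data query script and generate m data query subtasks with execution dependencies; the m data query subtasks include at least one first data query subtask that involves only a single data source node; distribute the at least one first data query subtask to the respective data source nodes; each data source node is used to execute the first data query subtask and obtain a data query sub-result; and, based on the execution dependencies and the data query sub-results, execute a second data query subtask among the m data query subtasks to obtain a data query result; the second data query subtask involves multiple data source nodes.
Optionally, the processing unit is specifically configured to:
generate a syntax tree of the sql data query script according to syntax parsing rules;
determine, from the syntax tree, the first data query subtasks that involve only a single data source node;
for any table join keyword in the syntax tree, construct a second data query subtask from the data query sub-results of the first data query subtasks corresponding to the table join keyword;
determine the m data query subtasks with execution dependencies according to the execution order of the first data query subtasks and the second data query subtasks.
Optionally, the processing unit is specifically configured to:
parse out each keyword in the sql data query script in turn according to the syntax parsing rules;
if it is determined that a parsed table name keyword does not conform to the table name naming rule, continue to parse the table name keyword according to the syntax parsing rules until a table name keyword conforming to the table name naming rule is reached, thereby obtaining the syntax tree of the sql data query script.
Optionally, the processing unit is specifically configured to:
determine the data source nodes involved in the sql data query script according to the specified label names in the syntax parsing rules; or,
determine the data source nodes involved in the sql data query script according to the table name rule in the syntax parsing rules.
Optionally, the processing unit is further configured to:
after the m data query subtasks with execution dependencies are generated, label each of the at least one first data query subtask among the m data query subtasks with the corresponding data source node identifier;
the processing unit is specifically configured to:
for each first data query subtask, distribute the first data query subtask to the corresponding data source node based on the data source node identifier corresponding to the first data query subtask.
Optionally, the processing unit is further configured to:
determine that the sql data query script is executable before performing syntax parsing on the sql data query script.
Optionally, the processing unit is specifically configured to:
verify the syntax and/or parameters of the sql data query script against set sql data query script verification rules, thereby determining whether the sql data query script is executable.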
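In a third aspect, an embodiment of the present invention provides a computing device, including at least one processor and at least one memory, where the memory stores a computer program that, when executed by the processor, causes the processor to execute the data query method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program executable by a computing device; when the program is run on the computing device, the computing device executes the data query method of any implementation of the first aspect.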
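Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data query system architecture provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data query method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a reference syntax tree provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a syntax tree for a sql data query script provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a dependency tree provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another dependency tree provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a data query device provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.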
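Detailed description
To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, the data query system architecture applicable to the embodiments is first described, taking the system structure shown in FIG. 1 as an example. The data query system architecture can be applied to scenarios such as mixed queries involving multiple data sources. As shown in FIG. 1, the architecture may include a client 100, a computing component 200 and at least one data source node (such as data source node 301, data source node 302 and data source node 303). The computing component 200 may include a task receiving and verification module 201, a task parsing and arrangement module 202 and a task distribution module 203. The client 100 is connected to the computing component 200, and each data source node may be connected to the computing component 200, for example by wire or wirelessly, which is not limited in this embodiment of the present invention. In addition, each data source node may include at least one database engine. Exemplarily, data source node 301 may include database engines such as SparkEngine and HiveEngine; data source node 302 may include database engines such as HbaseEngine and HiveEngine; data source node 303 may include database engines such as SparkEngine and HbaseEngine.
The client 100 is used to send a data query script to the computing component 200 after receiving the script submitted by a user. The client 100 may be client software on a terminal, and the terminal may be a mobile phone, a notebook computer, a desktop computer, a tablet computer, etc., which is not limited in this embodiment of the present invention.
The computing component 200 is used to perform syntax parsing on the structured query language (sql) data query script sent by the client 100 and obtain data source information. If the data source information involves multiple data sources, the data query script can be split into multiple sql data query subscripts, and the at least one sql data query subscript that needs to be executed by a data source node is distributed to the corresponding data source node. Then, following the query dependencies among the multiple sql data query subscripts, the data query sub-results obtained by the data source nodes from executing their subscripts are integrated to obtain the final target data query result. In a specific implementation, the task receiving and verification module 201 first verifies the received sql data query script. After the verification succeeds, the task parsing and arrangement module 202 performs syntax parsing on the script and obtains the data source information; if the data source information involves multiple data sources, the script can be split into multiple sql data query subscripts, which the task distribution module 203 distributes to the corresponding data source nodes. The computing component 200 (such as Linkis) provides powerful connectivity, reuse, orchestration, extension, and governance and control capabilities. The computing component 200 decouples the application layer from the engine layer, simplifies complex network call relationships, reduces overall complexity, and saves overall development and maintenance costs.
For each data source node, after receiving the corresponding sql data query subscript, the data source node executes the subscript, obtains a data query sub-result, and stores it, for example in the context.
It should be noted that the structure shown in FIG. 1 is only an example, which is not limited in this embodiment of the present invention.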
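Based on the above description, FIG. 2 exemplarily shows the flow of a data query method provided by an embodiment of the present invention; the flow can be executed by a data query device.
As shown in FIG. 2, the flow specifically includes:
Step 201: the computing component receives a data query request.
Step 202: the computing component performs syntax parsing on the sql data query script and generates m data query subtasks with execution dependencies.
Step 203: the computing component distributes at least one first data query subtask to the respective data source nodes.
Step 204: the computing component executes, based on the execution dependencies and the data query sub-results, a second data query subtask among the m data query subtasks to obtain a data query result.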
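In step 201, the computing component can receive a data query request submitted by a user through a client on a terminal. For example, the user can submit the data query request through a client on a mobile phone or through a web (World Wide Web) interface on a notebook computer, or can edit the sql data query script directly in real time on the service interface provided by the system where the computing component is located, so as to generate the data query request and upload it on that service interface. The data query request may include a sql data query script.
In step 202, the computing component performs syntax parsing on the sql data query script and generates m data query subtasks with execution dependencies. The m data query subtasks may include at least one first data query subtask that involves only a single data source node. Each data query subtask may include a sql data query subscript; that is, when a data source node executes a first data query subtask, it executes the sql data query subscript in that subtask, and when the computing component executes a second data query subtask, it executes the sql data query subscript in that subtask. Specifically, the computing component generates a syntax tree of the sql data query script according to syntax parsing rules and determines from the syntax tree the first data query subtasks that involve only a single data source node. Then, for any table join keyword in the syntax tree, it constructs a second data query subtask from the data query sub-results of the first data query subtasks corresponding to the table join keyword. Finally, it determines the m data query subtasks with execution dependencies according to the execution order of the first and second data query subtasks. The table join keywords may include left join, right join, inner join/join, full join, etc. In this way, the syntax tree of the sql data query script makes it possible to determine the first data query subtasks promptly and accurately and to distribute them to the corresponding data source nodes for execution, which avoids obtaining the full data of the database tables from each data source node involved in the script. In addition, the second data query subtask is constructed from the data query sub-results of the first data query subtasks corresponding to the table join keyword, so that the computing component can execute it based on the execution dependencies and the data query sub-results; the required data query result is thus obtained promptly and effectively, and the computing pressure on the computing component is relieved.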
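Exemplarily, for a given sql data query script, syntax parsing generates the syntax tree of the script, and two data query subtasks, namely data query subtask a and data query subtask b, are determined from the syntax tree. Data query subtask c is then constructed from the execution results of data query subtask a and data query subtask b that correspond to a table join keyword (such as join). The execution of data query subtask c depends on the execution result of data query subtask a and the data query sub-result of data query subtask b. For example, data query subtask a is executed in data source node a, data query subtask b is executed in data source node b, and data query subtask c is executed in the computing component.
Further, when the syntax tree of the sql data query script is generated, each keyword in the script can be parsed out in turn according to the syntax parsing rules. If a parsed table name keyword is found not to conform to the table name naming rule, it is parsed further according to the syntax parsing rules until a table name keyword conforming to the table name naming rule is reached, which yields the syntax tree of the sql data query script. The result is a complete and clear syntax tree from which multiple data query subtasks can be split out, and from which the data query subtasks to be executed by data source nodes can be distinguished from those to be executed by the computing component. This supports the later, timely determination of the data query subtasks each data source node needs to execute.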
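In addition, before syntax parsing is performed on the sql data query script, it is necessary to determine whether the script is executable, so as to ensure that a correct sql data query script can be executed successfully to obtain the correct data query result, and to avoid failing to obtain the required data query result because the script is not executable. Specifically, the computing component verifies the syntax and/or parameters of the sql data query script against set sql data query script verification rules, thereby determining whether the script is executable. In this way, whether the syntax and/or parameters of the script are correct, and therefore whether the script can be executed successfully, can be determined promptly and effectively.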
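A pre-parse check of this kind could be sketched as follows. The patent does not list the verification rules, so the four rules below (the script must be a select statement, parentheses must balance, string literals must be closed, and table names must follow the "data source name.database name.database table name" rule) are purely illustrative assumptions, as is the function name `check_script`.

```python
import re


def check_script(sql):
    """Return a list of problems; an empty list means the script passed."""
    problems = []
    text = sql.strip().rstrip(";")
    # Assumed rule 1: only select statements are accepted.
    if not re.match(r"(?i)select\b", text):
        problems.append("script must start with SELECT")
    # Assumed rule 2: parentheses must be balanced.
    if text.count("(") != text.count(")"):
        problems.append("unbalanced parentheses")
    # Assumed rule 3: single-quoted string literals must be closed.
    if text.count("'") % 2:
        problems.append("unterminated string literal")
    # Assumed rule 4: table names must have exactly three dot-separated parts.
    for match in re.finditer(r"(?i)\b(?:from|join)\s+([\w.]+)", text):
        if match.group(1).count(".") != 2:
            problems.append(
                f"table {match.group(1)!r} is not source.database.table"
            )
    return problems


ok = check_script(
    "select * from IDC1_Hive.DB1.tableA tableA where tableA.id > 1"
)
bad = check_script("select * from tableA where (id > 1")
```

Returning a list of problems rather than a single boolean lets the task receiving and verification module report every failed rule to the user at once.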
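In step 203, the computing component distributes the at least one first data query subtask to the respective data source nodes, so that each data source node executes the first data query subtask it receives and obtains a data query sub-result. Specifically, after generating the m data query subtasks with execution dependencies, the computing component labels each of the at least one first data query subtask with the identifier of its data source node. For each first data query subtask, the computing component distributes the subtask to the corresponding data source node based on that identifier. Labeling each first data query subtask in this way lets the computing component route it to the correct data source node. It also prevents a subtask from being sent to a mismatched node because its owner could not be identified, which would make it impossible to obtain the correct data query result.
In addition, the data source nodes may be determined in either of the following ways: the computing component determines the data source nodes involved in the SQL data query script according to the specified label name in the syntax parsing rules; or it determines them according to the table-name rule in the syntax parsing rules.
For example, suppose there are two first data query subtasks, data query subtask a and data query subtask b. Subtask a is to be executed on data source node a (for example IDC_a), so it is labeled with the identifier IDC_a; subtask b is to be executed on data source node b (for example IDC_b), so it is labeled with the identifier IDC_b. When the first data query subtasks are distributed, subtask a is sent to data source node a corresponding to IDC_a, and subtask b is sent to data source node b corresponding to IDC_b.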
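Label-based distribution can be sketched as a lookup from data source node identifier to node. In the Python sketch below, `dispatch_first_subtasks` and the callables standing in for data source nodes are hypothetical; a real deployment would send the sub-script over the network instead of calling a local function.

```python
def dispatch_first_subtasks(subtasks, nodes):
    """Send each labeled subtask to the node registered under its label.

    `subtasks` is a list of (name, node_id, sub_script) tuples;
    `nodes` maps a node identifier to a callable that executes a sub-script.
    """
    sub_results = {}
    for name, node_id, sub_script in subtasks:
        if node_id not in nodes:
            # An unknown label means the owner of the subtask cannot be identified.
            raise KeyError(f"no data source node registered for label {node_id!r}")
        sub_results[name] = nodes[node_id](sub_script)
    return sub_results


# Stand-in nodes that simply echo which node ran which sub-script.
nodes = {
    "IDC_a": lambda sql: ("IDC_a", sql),
    "IDC_b": lambda sql: ("IDC_b", sql),
}
sub_results = dispatch_first_subtasks(
    [("a", "IDC_a", "select * from tableA"), ("b", "IDC_b", "select * from tableB")],
    nodes,
)
```

Failing loudly on an unknown label mirrors the concern in the text that a subtask must never reach a mismatched data source node.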
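In step 204, the computing component executes the second data query subtask among the m data query subtasks based on the execution dependencies and the data query sub-results, and obtains the data query result. The second data query subtask involves multiple data source nodes.
For example, for a given SQL data query script, syntax parsing generates three data query subtasks with dependencies, namely data query subtasks a, b, and c, where the execution of subtask c depends on the execution results of subtasks a and b. The computing component can therefore execute subtask c based on the dependencies among subtasks a, b, and c and on the execution results of subtasks a and b, and thereby obtain the data query result the user requires.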
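Executing subtasks in dependency order amounts to a topological traversal. The following Python sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later); the function `run_in_dependency_order` and the toy runners are assumptions for illustration, with lists of integers standing in for sub-results.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(deps, runners):
    """Run every subtask after the subtasks it depends on.

    `deps` maps a subtask name to the names it depends on;
    `runners` maps a subtask name to a callable that receives the
    results of its dependencies as positional arguments.
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = runners[name](*inputs)
    return results


deps = {"a": [], "b": [], "c": ["a", "b"]}
runners = {
    "a": lambda: [1, 2, 3],                           # sub-result of subtask a
    "b": lambda: [2, 3, 4],                           # sub-result of subtask b
    "c": lambda ra, rb: [x for x in ra if x in rb],   # join-like combination
}
results = run_in_dependency_order(deps, runners)
```

Subtask `c` runs only once the results of `a` and `b` exist, which is the ordering the execution dependencies are meant to guarantee.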
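On this basis, the implementation of the data query method in the embodiments of the present invention is described in detail below.
Step1: The computing component validates the data query script submitted by the client.
The computing component may be deployed on a single standalone physical machine, or on a server cluster or distributed system composed of multiple physical servers; the embodiments of the present invention do not limit this.
In a specific implementation, when a user needs to query data, the user may submit a data query request through a client on a terminal to the task receiving and validation module in the computing component; the request may include an SQL data query script edited in advance. After receiving the request, the computing component extracts the SQL data query script from it. Alternatively, the user may edit an SQL data query script in real time, or enter a script edited in advance, directly on the service interface provided by the system in which the computing component resides; this interface may be displayed through the client on the terminal. The task receiving and validation module then validates the submitted script to determine whether it can be executed successfully, for example by checking the syntax and/or parameters of the script.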
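Step2: The computing component parses the syntax of the validated SQL data query script and generates at least one data query sub-script.
Specifically, after receiving the SQL data query script, the task parsing and orchestration module in the computing component parses the script according to the SQL syntax parsing rules, identifies the at least one data source involved in the script, and divides the script into multiple steps based on the at least one data source. That is, the script is divided into multiple SQL data query sub-scripts. For each sub-script that needs to be sent to a data source node, a corresponding SQL data query subtask is generated for distribution; for each sub-script that does not need to be sent to a data source node, a corresponding SQL data query subtask is generated and kept locally. Each subtask to be sent to a data source node is also labeled with the corresponding data source label, for example IDC1 or IDC2 (IDC: Internet Data Center, also called a machine room or computing center), and may additionally be labeled with the database engine that executes it, for example Engine1 or Engine2. For example, Engine1 may denote the database engine SparkEngine or HiveEngine and Engine2 may denote HiveEngine or SparkEngine; or Engine1 may denote SparkEngine or HbaseEngine and Engine2 may denote HbaseEngine or SparkEngine; or Engine1 may denote HiveEngine or HbaseEngine and Engine2 may denote HbaseEngine or HiveEngine; the embodiments of the present invention do not limit this. For example, suppose two SQL data query subtasks, A and B, need to be sent to data source nodes: subtask A to data source node A (for example IDC1_Engine1) and subtask B to data source node B (for example IDC2_Engine2). Subtask A is then labeled with the data source label IDC1_Engine1 and subtask B with IDC2_Engine2.
For example, suppose an SQL data query script involves two data sources, data source A and data source B. Based on these two data sources, the script can be divided into multiple data query steps (for example stage1, stage2, and stage3). That is, based on data source A and data source B, the script is divided into SQL data query sub-script A, SQL data query sub-script B, and SQL data query sub-script C. In existing solutions, the whole SQL data query script is executed in a data query engine tool, which must read the full data of the database tables of every data source involved. The present solution does not read the full data of the database tables of the data source nodes. Instead, it splits the script into multiple sub-scripts and distributes those that data source nodes should execute to the corresponding nodes, relying on the computing capability each node already supports. This makes full use of the computing capability of the data source nodes, lowers the computing resources consumed by the computing component, and reduces the amount of computing capability that has to be developed for the computing component.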
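The process of parsing an SQL data query script based on the SQL syntax parsing rules is described below.
Step a: The task parsing and orchestration module in the computing component resolves the data sources of the SQL data query script.
Specifically, the data source information involved in the SQL data query script can be obtained in two ways.
In the first implementation, the user specifies the data source label information in the submitted data query request, and the data source information is obtained directly from that label information. In other words, the user must specify the database table names involved and their data source information in the data source label information.
For example, a data query request submitted by a user includes the SQL data query script select * from tableA join tableB on tableA.c1=tableB.c2, together with the data source label information tableA-IDC1_Hive-DB1 and tableB-IDC2_Hbase-DB1. The label format is: database table name-data source name-database name. That is, tableA is the database table name, IDC1_Hive is the data source name, and DB1 is the database name in Hive; tableB is the database table name, IDC2_Hbase is the data source name, and DB1 is the database name in Hbase.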
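Reading a label of the form table name-data source name-database name is a plain split on the hyphen. The Python sketch below assumes, as in the example labels, that none of the three names contains a hyphen; the helper name `parse_source_label` is hypothetical.

```python
def parse_source_label(label):
    """Split 'table-source-database' into its three parts."""
    parts = label.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected table-source-database, got {label!r}")
    table, source, database = parts
    return {"table": table, "source": source, "database": database}


labels = ["tableA-IDC1_Hive-DB1", "tableB-IDC2_Hbase-DB1"]
# Map each table name used in the script to where it actually lives.
table_to_source = {p["table"]: p for p in map(parse_source_label, labels)}
```

With this mapping, a table name that appears unqualified in the script can be resolved to its data source and database.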
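In the second implementation, the user specifies the data source information inside the database table names in the submitted SQL data query script, and the data source information is obtained by parsing the script. In other words, the user must include the data source name, database name, and similar information in each database table name.
For example, an SQL data query script submitted by a user is: select * from IDC1_Hive.DB1.tableA tableA join IDC2_Hbase.DB1.tableB tableB on tableA.c1=tableB.c2. Here the table-name rule that carries the data source information is: data source name.database name.database table name. That is, for the first data source involved in the script, IDC1_Hive is the data source name, DB1 is the database name in Hive, and tableA is the database table name; for the second data source, IDC2_Hbase is the data source name, DB1 is the database name in Hbase, and tableB is the database table name.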
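Under the table-name rule, the data sources can be picked out of the script by looking for three-part dotted names. The Python sketch below uses a regular expression for this; it is a simplification of real syntax parsing, and `QUALIFIED_NAME` and `find_data_sources` are hypothetical names. Two-part references such as tableA.c1 are not matched because they contain only one dot.

```python
import re

# Three dot-separated identifiers: source.database.table
QUALIFIED_NAME = re.compile(r"\b(\w+)\.(\w+)\.(\w+)\b")


def find_data_sources(script):
    """Return the distinct (source, database, table) entries named in the script."""
    found = []
    for source, database, table in QUALIFIED_NAME.findall(script):
        entry = {"source": source, "database": database, "table": table}
        if entry not in found:
            found.append(entry)
    return found


script = (
    "select * from IDC1_Hive.DB1.tableA tableA "
    "join IDC2_Hbase.DB1.tableB tableB on tableA.c1=tableB.c2"
)
sources = find_data_sources(script)
```

If `sources` contains entries for more than one data source name, the script is a mixed query and needs to be split.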
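After the data source information involved in the SQL data query script is obtained in either of these ways, it is mapped against the data source module Datasource in the computing component (not shown in Fig. 1) to obtain the actual information of the corresponding data sources.
Step b: The task parsing and orchestration module in the computing component generates a dependency tree for the SQL data query script.
After the data source information is obtained, if it involves only one data source, that is, the script is not a mixed data query, no dependency tree needs to be generated. An SQL data query task is generated directly from the script and distributed to the corresponding data source node, which executes the SQL data query script in the task, obtains the data query result, and caches it.
If the data source information involves two or more data sources, the script is a mixed data query and a dependency tree must be generated for it. That is, the script is parsed and split, based on the multiple data sources it involves, into multiple SQL data query sub-scripts. Specifically, the task parsing and orchestration module parses and splits the script according to the SQL syntax parsing rules to obtain the sub-scripts. Fig. 3 is a schematic diagram of a reference syntax tree provided by an embodiment of the present invention. The SQL syntax parsing rules use a top-down analysis method, and the lexical analysis tool parses the script by tokens such as select, from, join, and where. As shown in Fig. 3, the input is read character by character from left to right and leftmost derivation is applied until terminals are reached, so that the script is parsed into the corresponding syntax tree. Take as an example the following script submitted by a user: select * from IDC1_Hive.DB1.tableA tableA join IDC2_Hbase.DB1.tableB tableB on tableA.c1=tableB.c2 where tableA.id>1 and tableB.id>1. The lexical analysis tool parses the script by tokens such as select, from, join, and where; it first resolves the data source information (for example IDC1_Hive and IDC2_Hbase) and then the query conditions (for example where tableA.id>1 and tableB.id>1), producing the syntax tree shown in Fig. 4. From this syntax tree, the dependency tree shown in Fig. 5, which involves multiple data query steps (that is, multiple SQL data query sub-scripts), can be generated. Based on Fig. 5, the task parsing and orchestration module splits the script according to the data sources it involves into the data query steps Stage1, Stage2, and Stage3. Stage1 executes SQL data query sub-script A (select * from IDC1_Hive.DB1.tableA tableA where tableA.id>1); Stage2 executes SQL data query sub-script B (select * from IDC2_Hbase.DB1.tableB tableB where tableB.id>1); Stage3 executes SQL data query sub-script C (select * from CSTableA a join CSTableB b on a.c1=b.c2). Stage1 is executed on data source node IDC1_Hive, Stage2 on data source node IDC2_Hbase, and Stage3 in the computing component. Stage3 is executed in the computing component on the basis of CSTableA, the execution result of Stage1, and CSTableB, the execution result of Stage2.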
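The three stages of Fig. 5 can be imitated locally with the standard-library `sqlite3` module. In the sketch below, two in-memory SQLite databases stand in for the Hive and Hbase data source nodes and a third stands in for the computing component; the sample rows and the helper names `run_on_source` and `join_on_compute` are assumptions made only for illustration.

```python
import sqlite3


def run_on_source(rows, predicate_sql):
    """Stand-in for a data source node: filter locally, return only matching rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, c integer)")
    con.executemany("insert into t values (?, ?)", rows)
    out = con.execute(f"select id, c from t where {predicate_sql} order by id").fetchall()
    con.close()
    return out


def join_on_compute(cs_table_a, cs_table_b):
    """Stand-in for the computing component: join the two sub-results (Stage3)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table CSTableA (id integer, c1 integer)")
    con.execute("create table CSTableB (id integer, c2 integer)")
    con.executemany("insert into CSTableA values (?, ?)", cs_table_a)
    con.executemany("insert into CSTableB values (?, ?)", cs_table_b)
    out = con.execute(
        "select a.id, b.id, a.c1 from CSTableA a join CSTableB b "
        "on a.c1 = b.c2 order by a.id, b.id"
    ).fetchall()
    con.close()
    return out


table_a = [(1, 10), (2, 20), (3, 30)]   # (id, c1) rows held by the first source
table_b = [(1, 20), (2, 30), (3, 40)]   # (id, c2) rows held by the second source
cs_table_a = run_on_source(table_a, "id > 1")   # Stage1
cs_table_b = run_on_source(table_b, "id > 1")   # Stage2
final = join_on_compute(cs_table_a, cs_table_b)  # Stage3
```

Only the rows that survive the `id > 1` filter leave each stand-in source, so the join in the computing component works on two rows per side instead of the full tables.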
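Further, the parsing process is described below using a more complex SQL data query script.
For example, the more complex script finds the people whose names appear both among the adults of a given city (for example city A, the literal '城市A' in the script) in data source ES and among the adults of the same city in data source Hive. The script is: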
select * from
(
select name,age,area
from IDC1_ES.ES1.tableA tableA where area='城市A'
)as es_tableA
join
(
select tableB.name,tableB.age,tableC.area
from IDC2_Hive1.hive1.tableB tableB
join IDC2_Hive2.hive2.tableC tableC on tableB.name=tableC.name
where tableC.area='城市A'
)as hive_tableB
on es_tableA.name=hive_tableB.name
where es_tableA.age>18 and hive_tableB.age>18
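For this more complex script, the lexical analysis tool parses by tokens such as select, from, join, and where; it first resolves the data source information (for example IDC1_ES, IDC2_Hive1, and IDC2_Hive2) and then the query conditions (for example where es_tableA.age>18 and hive_tableB.age>18), producing the corresponding syntax tree. From this syntax tree, the dependency tree shown in Fig. 6, which involves multiple data query steps (that is, multiple SQL data query sub-scripts), can be generated. Based on Fig. 6, the task parsing and orchestration module splits the script according to the data sources it involves into the data query steps Stage1, Stage2, Stage3, Stage4, and Stage5. Specifically, the outer join condition is parsed first, giving two data query steps (Stage1 and Stage*); the outermost where condition is parsed next and pushed down into the subqueries on the two sides of the join. Because the second step (Stage*) itself still contains a mixed-query join, it is parsed further into two data query steps (Stage2 and Stage3). Stage4 then joins the results of Stage2 and Stage3 by executing SQL data query sub-script 4 and stores the mixed-query result in CSTableD. Finally, Stage5 obtains the final target data query result from the execution results of Stage1 and Stage4. Stage1 executes SQL data query sub-script 1 (select name,age,area from IDC1_ES.ES1.tableA tableA where tableA.area='城市A' and tableA.age>18); Stage2 executes SQL data query sub-script 2 (select tableB.name,tableB.age from IDC2_Hive1.hive1.tableB tableB where tableB.age>18); Stage3 executes SQL data query sub-script 3 (select tableC.name,tableC.area from IDC2_Hive2.hive2.tableC tableC where tableC.area='城市A'); Stage4 executes SQL data query sub-script 4 (select B.name,B.age,C.area from CSTableB B join CSTableC C on B.name=C.name); Stage5 executes SQL data query sub-script 5 (select A.name,A.age,A.area from CSTableA A join CSTableD D on A.name=D.name). Stage1 is executed on data source node IDC1_ES, Stage2 on data source node IDC2_Hive1, Stage3 on data source node IDC2_Hive2, and Stage4 and Stage5 in the computing component. Stage4 is executed in the computing component on the basis of CSTableB, the execution result of Stage2, and CSTableC, the execution result of Stage3; Stage5 is executed in the computing component on the basis of CSTableA, the execution result of Stage1, and CSTableD, the execution result of Stage4.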
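Pushing the outermost where condition down into the two sides of the join can be sketched as routing each conjunct to the single alias it mentions. The Python sketch below is a deliberately naive illustration and not the parser described in the source: it splits on the keyword `and`, so it would mishandle string literals or `between ... and ...`; `push_down` is a hypothetical helper name.

```python
import re


def push_down(where_clause, aliases):
    """Route each `and`-separated conjunct to the one alias it references.

    Conjuncts that mention zero or several aliases stay at the join level.
    """
    pushed = {alias: [] for alias in aliases}
    remaining = []
    for conjunct in re.split(r"\s+and\s+", where_clause.strip(), flags=re.IGNORECASE):
        used = {a for a in aliases if re.search(rf"\b{re.escape(a)}\.", conjunct)}
        if len(used) == 1:
            pushed[used.pop()].append(conjunct)
        else:
            remaining.append(conjunct)
    return pushed, remaining


pushed, remaining = push_down(
    "es_tableA.age>18 and hive_tableB.age>18",
    ["es_tableA", "hive_tableB"],
)
```

Each pushed conjunct can then be appended to the sub-script of the matching subquery, which is how the age filter ends up inside Stage1 and Stage2 in the example above.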
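Step3: The computing component encapsulates at least one SQL data query sub-script as a corresponding data query subtask and distributes the at least one data query subtask to the corresponding data source nodes.
Continuing with the example of Fig. 5, Stage1 and Stage2 are encapsulated as SQL data query subtasks A and B respectively, and each subtask is labeled with its data source label and database engine label; for example, subtask A is labeled IDC1_Hive and subtask B is labeled IDC2_Hbase. Subtask A is then distributed to data source node IDC1_Hive and subtask B to data source node IDC2_Hbase. Meanwhile, SQL data query sub-script C is turned into SQL data query subtask C, which is kept locally in the computing component. Once the computing component detects that subtasks A and B have finished on their data source nodes (each data source node may either notify the computing component that execution is complete or send its execution result to the computing component), it executes subtask C based on the execution results of subtasks A and B and obtains the final target data query result. Alternatively, continuing with the example of Fig. 6, Stage1, Stage2, and Stage3 are encapsulated as SQL data query subtasks 1, 2, and 3 respectively, and each subtask is labeled with its data source label and database engine label; for example, subtask 1 is labeled IDC1_ES, subtask 2 is labeled IDC2_Hive1, and subtask 3 is labeled IDC2_Hive2. Subtask 1 is then distributed to data source node IDC1_ES, subtask 2 to data source node IDC2_Hive1, and subtask 3 to data source node IDC2_Hive2. Meanwhile, SQL data query sub-scripts 4 and 5 are turned into SQL data query subtasks 4 and 5, which are kept locally in the computing component. Once the computing component detects that subtasks 1, 2, and 3 have finished on their data source nodes (again signaled either by a completion notification or by the returned execution results), it executes subtask 4 based on the execution results of subtasks 2 and 3; after detecting that subtask 4 has finished, it executes subtask 5 based on the execution results of subtasks 1 and 4 and obtains the final target data query result. In existing solutions, the data query engine tool must read the full data of the database tables from the data source nodes and then compute on that data. The present solution instead uses the computing capability of each data source node to execute the SQL data query sub-scripts, which helps lower the computing resources consumed by the computing component and reduces the network resources consumed in transferring the full data of the database tables.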
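The wait-then-continue behavior for the Fig. 6 example can be sketched with the standard-library `concurrent.futures` module. The sketch assumes the variant in which each data source node returns its execution result; zero-argument callables stand in for remote execution, Python sets of names stand in for sub-results, and `run_query` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(remote_tasks, local_tasks):
    """Run remote subtasks concurrently, then local subtasks in the given order.

    `remote_tasks` maps a subtask name to a zero-argument callable;
    `local_tasks` is an ordered list of (name, dependency_names, callable).
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in remote_tasks.items()}
        # result() blocks until the corresponding subtask has finished.
        results = {name: future.result() for name, future in futures.items()}
    for name, deps, fn in local_tasks:
        results[name] = fn(*(results[d] for d in deps))
    return results


remote_tasks = {
    "subtask1": lambda: {"ann", "bob"},          # stand-in for IDC1_ES
    "subtask2": lambda: {"bob", "cat"},          # stand-in for IDC2_Hive1
    "subtask3": lambda: {"bob", "cat", "dan"},   # stand-in for IDC2_Hive2
}
local_tasks = [
    ("subtask4", ["subtask2", "subtask3"], lambda r2, r3: r2 & r3),
    ("subtask5", ["subtask1", "subtask4"], lambda r1, r4: r1 & r4),
]
results = run_query(remote_tasks, local_tasks)
```

Subtasks 1 to 3 run in parallel because they do not depend on each other, while subtasks 4 and 5 run in the computing component only after their inputs are available.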
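Step4: Each data source node receives its data query subtask and executes the SQL data query sub-script in the subtask.
For each data source node, after receiving a data query subtask, the node parses the subtask to obtain the corresponding SQL data query sub-script and then executes that sub-script. For example, continuing with the example of Fig. 5, after receiving SQL data query subtask A, data source node IDC1_Hive parses it to obtain SQL data query sub-script A, executes the sub-script, and stores the execution result in the context CSTableA. After receiving SQL data query subtask B, data source node IDC2_Hbase parses it to obtain SQL data query sub-script B, executes the sub-script, and stores the execution result in the context CSTableB. The computing component then executes SQL data query subtask C based on the execution results of subtasks A and B and obtains the final target data query result. Alternatively, continuing with the example of Fig. 6, after receiving SQL data query subtask 1, data source node IDC1_ES parses it to obtain SQL data query sub-script 1, executes the sub-script, and stores the execution result in the context CSTableA. After receiving SQL data query subtask 2, data source node IDC2_Hive1 parses it to obtain SQL data query sub-script 2, executes the sub-script, and stores the execution result in the context CSTableB. After receiving SQL data query subtask 3, data source node IDC2_Hive2 parses it to obtain SQL data query sub-script 3, executes the sub-script, and stores the execution result in the context CSTableC. The computing component then executes SQL data query subtask 4 based on the execution results of subtasks 2 and 3 and, after detecting that subtask 4 has finished, executes SQL data query subtask 5 based on the execution results of subtasks 1 and 4 to obtain the final target data query result.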
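The above embodiments show that syntax parsing of the SQL data query script generates m data query subtasks with execution dependencies, including at least one first data query subtask that involves only a single data source node, and that the at least one first data query subtask is distributed to the respective data source nodes, each of which executes the subtask it receives and obtains a data query sub-result. The computing component therefore does not need to fetch the full data of the database tables from the data source nodes involved in the script and run the query locally. The computing component then executes the second data query subtask among the m data query subtasks based on the execution dependencies and the data query sub-results to obtain the required data query result. Because the computing component only combines, in dependency order, a small number of data query sub-results (far fewer rows than the full data of the database tables), its computing load is reduced. Because each data source node transmits only data query sub-results, and not the full data of the database tables, to the computing component, the amount of data transferred between the computing component and the data source nodes is greatly reduced, which lowers the network resources consumed and solves the prior-art problem of having to fetch the full data of the database tables of every data source node. In addition, because the solution makes full use of the computing capability each data source node already supports to execute the data query subtasks, it lowers the computing resources consumed by the computing component and reduces the computing capability that has to be developed for the computing component, and thus the workload of developers.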
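Based on the same technical concept, Fig. 7 shows an exemplary data query apparatus provided by an embodiment of the present invention; the apparatus can execute the flow of the data query method.
As shown in Fig. 7, the apparatus includes:
a receiving unit 701, configured to receive a data query request, the data query request including a Structured Query Language (SQL) data query script;
a processing unit 702, configured to: parse the syntax of the SQL data query script and generate m data query subtasks with execution dependencies, the m data query subtasks including at least one first data query subtask that involves only a single data source node; distribute the at least one first data query subtask to the respective data source nodes, each data source node being configured to execute a first data query subtask and obtain a data query sub-result; and, based on the execution dependencies and the data query sub-results, execute the second data query subtask among the m data query subtasks to obtain a data query result, the second data query subtask involving multiple data source nodes.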
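Optionally, the processing unit 702 is specifically configured to:
generate a syntax tree of the SQL data query script according to the syntax parsing rules;
determine, from the syntax tree, the first data query subtasks that each involve only a single data source node;
for any table join keyword in the syntax tree, construct a second data query subtask from the data query sub-results of the first data query subtasks corresponding to the table join keyword;
determine the m data query subtasks with execution dependencies according to the execution order of the first data query subtasks and the second data query subtasks.
Optionally, the processing unit 702 is specifically configured to:
parse the keywords in the SQL data query script in turn according to the syntax parsing rules;
if a parsed table-name keyword is determined not to conform to the table naming rule, continue parsing the table-name keyword according to the syntax parsing rules until a table-name keyword conforming to the table naming rule is reached, thereby obtaining the syntax tree of the SQL data query script.
Optionally, the processing unit 702 is specifically configured to:
determine the data source nodes involved in the SQL data query script according to the specified label name in the syntax parsing rules; or
determine the data source nodes involved in the SQL data query script according to the table-name rule in the syntax parsing rules.
Optionally, the processing unit 702 is further configured to:
after the m data query subtasks with execution dependencies are generated, label each of the at least one first data query subtask among the m data query subtasks with the identifier of the corresponding data source node;
and the processing unit 702 is specifically configured to:
for each first data query subtask, distribute the first data query subtask to the corresponding data source node based on the data source node identifier of that subtask.
Optionally, the processing unit 702 is further configured to:
determine, before the syntax of the SQL data query script is parsed, that the SQL data query script is executable.
Optionally, the processing unit 702 is specifically configured to:
check the syntax and/or parameters of the SQL data query script against preset SQL data query script validation rules, thereby determining whether the SQL data query script is executable.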
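Based on the same technical concept, an embodiment of the present invention further provides a computing device which, as shown in Fig. 8, includes at least one processor 801 and a memory 802 connected to the at least one processor. The embodiments of the present invention do not limit the specific connection medium between the processor 801 and the memory 802; in Fig. 8 they are connected by a bus as an example. Buses can be divided into address buses, data buses, control buses, and so on.
In the embodiments of the present invention, the memory 802 stores instructions executable by the at least one processor 801, and by executing the instructions stored in the memory 802 the at least one processor 801 can perform the steps of the data query method described above.
The processor 801 is the control center of the computing device. It can connect the various parts of the computing device through various interfaces and lines, and implements data processing by running or executing the instructions stored in the memory 802 and calling the data stored in the memory 802. Optionally, the processor 801 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles issued instructions. It can be understood that the modem processor may also not be integrated into the processor 801. In some embodiments the processor 801 and the memory 802 may be implemented on the same chip; in other embodiments they may be implemented on separate chips.
The processor 801 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the data query method embodiments may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
As a non-volatile computer-readable storage medium, the memory 802 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 802 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 802 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto. The memory 802 in the embodiments of the present invention may also be a circuit or any other device capable of implementing a storage function, used to store program instructions and/or data.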
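Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to perform the steps of the data query method described above.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present invention is also intended to include them.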

Claims (10)

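  1. A data query method, characterized by comprising:
    receiving, by a computing component, a data query request, the data query request including a Structured Query Language (SQL) data query script;
    parsing, by the computing component, the syntax of the SQL data query script and generating m data query subtasks with execution dependencies, the m data query subtasks including at least one first data query subtask that involves only a single data source node;
    distributing, by the computing component, the at least one first data query subtask to the respective data source nodes, each data source node being configured to execute a first data query subtask and obtain a data query sub-result;
    executing, by the computing component, based on the execution dependencies and the data query sub-results, the second data query subtask among the m data query subtasks to obtain a data query result, the second data query subtask involving multiple data source nodes.
  2. The method according to claim 1, characterized in that the parsing, by the computing component, of the syntax of the SQL data query script and generating of m data query subtasks with execution dependencies comprises:
    generating, by the computing component, a syntax tree of the SQL data query script according to syntax parsing rules;
    determining, by the computing component, from the syntax tree, the first data query subtasks that each involve only a single data source node;
    for any table join keyword in the syntax tree, constructing, by the computing component, a second data query subtask from the data query sub-results of the first data query subtasks corresponding to the table join keyword;
    determining, by the computing component, the m data query subtasks with execution dependencies according to the execution order of the first data query subtasks and the second data query subtasks.
  3. The method according to claim 2, characterized in that the generating, by the computing component, of a syntax tree of the SQL data query script according to syntax parsing rules comprises:
    parsing, by the computing component, the keywords in the SQL data query script in turn according to the syntax parsing rules;
    if the computing component determines that a parsed table-name keyword does not conform to the table naming rule, continuing to parse the table-name keyword according to the syntax parsing rules until a table-name keyword conforming to the table naming rule is reached, thereby obtaining the syntax tree of the SQL data query script.
  4. The method according to claim 1, characterized in that the data source nodes are determined in the following manner:
    the computing component determines the data source nodes involved in the SQL data query script according to the specified label name in the syntax parsing rules; or
    the computing component determines the data source nodes involved in the SQL data query script according to the table-name rule in the syntax parsing rules.
  5. The method according to claim 1, characterized in that, after the m data query subtasks with execution dependencies are generated, the method further comprises:
    labeling, by the computing component, each of the at least one first data query subtask among the m data query subtasks with the identifier of the corresponding data source node;
    and the distributing, by the computing component, of the at least one first data query subtask to the respective data source nodes comprises:
    for each first data query subtask, distributing, by the computing component, the first data query subtask to the corresponding data source node based on the data source node identifier of that subtask.
  6. The method according to claim 1, characterized in that, before the computing component parses the syntax of the SQL data query script, the method further comprises:
    determining, by the computing component, that the SQL data query script is executable.
  7. The method according to claim 6, characterized in that whether the SQL data query script is executable is determined in the following manner:
    the computing component checks the syntax and/or parameters of the SQL data query script against preset SQL data query script validation rules, thereby determining whether the SQL data query script is executable.
  8. A data query apparatus, characterized by comprising:
    a receiving unit, configured to receive a data query request, the data query request including a Structured Query Language (SQL) data query script;
    a processing unit, configured to: parse the syntax of the SQL data query script and generate m data query subtasks with execution dependencies, the m data query subtasks including at least one first data query subtask that involves only a single data source node; distribute the at least one first data query subtask to the respective data source nodes, each data source node being configured to execute a first data query subtask and obtain a data query sub-result; and, based on the execution dependencies and the data query sub-results, execute the second data query subtask among the m data query subtasks to obtain a data query result, the second data query subtask involving multiple data source nodes.
  9. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program runs on the computing device, it causes the computing device to perform the method according to any one of claims 1 to 7.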
PCT/CN2021/134954 2021-05-25 2021-12-02 Data query method and apparatus WO2022247201A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110572421.1 2021-05-25
CN202110572421.1A CN113177062B (zh) 2021-05-25 Data query method and apparatus

Publications (1)

Publication Number Publication Date
WO2022247201A1 true WO2022247201A1 (zh) 2022-12-01

Family

ID=76929983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134954 WO2022247201A1 (zh) 2021-05-25 2021-12-02 Data query method and apparatus

Country Status (2)

Country Link
CN (1) CN113177062B (zh)
WO (1) WO2022247201A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680061A (zh) * 2023-08-02 2023-09-01 腾讯科技(深圳)有限公司 任务执行方法、装置、设备及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177062B (zh) * 2021-05-25 2023-06-09 深圳前海微众银行股份有限公司 一种数据查询方法及装置
CN113836186B (zh) * 2021-09-28 2023-10-10 北京环境特性研究所 基于es搜索引擎的仿真数据查询方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140280326A1 (en) * 2013-03-15 2014-09-18 Looker Data Sciences Inc. Querying one or more databases
CN110674177A (zh) * 2019-09-30 2020-01-10 奇安信科技集团股份有限公司 数据查询方法、装置、电子设备和存储介质
CN111190924A (zh) * 2019-12-18 2020-05-22 中思博安科技(北京)有限公司 跨域的数据查询方法及装置
CN111930770A (zh) * 2020-07-15 2020-11-13 北京金山云网络技术有限公司 数据查询方法、装置及电子设备
CN112699141A (zh) * 2020-12-29 2021-04-23 医渡云(北京)技术有限公司 多源异构数据的数据查询方法、装置、存储介质及设备
CN113177062A (zh) * 2021-05-25 2021-07-27 深圳前海微众银行股份有限公司 一种数据查询方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10719188B2 (en) * 2016-07-21 2020-07-21 Palantir Technologies Inc. Cached database and synchronization system for providing dynamic linked panels in user interface
CN107220376B (zh) * 2017-06-21 2020-10-27 北京奇艺世纪科技有限公司 一种数据查询方法和装置
CN110765157B (zh) * 2019-09-06 2024-02-02 中国平安财产保险股份有限公司 数据查询方法、装置、计算机设备及存储介质
KR102247249B1 (ko) * 2019-10-31 2021-05-03 주식회사 티맥스티베로 데이터베이스 관리 시스템에서 비동기적 데이터 처리를 위한 컴퓨터 프로그램
CN112748993A (zh) * 2019-10-31 2021-05-04 北京国双科技有限公司 任务执行方法、装置、存储介质及电子设备
CN111949856B (zh) * 2020-08-11 2023-12-22 北京金山云网络技术有限公司 基于web的对象存储查询方法及装置
CN112527848B (zh) * 2020-12-22 2023-05-12 苏州科达科技股份有限公司 基于多数据源的报表数据查询方法、装置、***及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680061A (zh) * 2023-08-02 2023-09-01 腾讯科技(深圳)有限公司 任务执行方法、装置、设备及存储介质
CN116680061B (zh) * 2023-08-02 2024-03-15 腾讯科技(深圳)有限公司 任务执行方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN113177062B (zh) 2023-06-09
CN113177062A (zh) 2021-07-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942764

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.03.2024)