WO2020228801A1 - Multi-language fusion query method and multi-model database system - Google Patents

Info

Publication number
WO2020228801A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
engine
type
management system
extended
Prior art date
Application number
PCT/CN2020/090393
Other languages
English (en)
French (fr)
Inventor
周敏奇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CA3137857A priority Critical patent/CA3137857A1/en
Publication of WO2020228801A1 publication Critical patent/WO2020228801A1/zh
Priority to US17/525,792 priority patent/US11907216B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Definitions

  • This application relates to the field of databases, and more specifically, to a fusion query method and a multi-modal database system.
  • the database system is the core of many application systems.
  • the traditional database system is a relational database system based on a relational model, which is specially used to process structured data.
  • a relational model is a two-dimensional table model
  • a relational database is a data organization composed of two-dimensional tables and the connections between them.
  • Typical applications of structured data include bank transactions; semi-structured data is used on a large scale in scenarios such as user profiling, IoT device log collection, and application clickstream analysis; and unstructured data corresponds to services that process massive volumes of pictures, videos, and documents.
  • many non-relational special database systems have been developed, including XML databases, graph databases, time series databases, document databases, key-value (KV) databases, etc.
  • the current application system is becoming more and more complex.
  • applications need to use multiple types of data at the same time, such as relational data, graphs, and time series data.
  • the database also needs to provide corresponding computing capabilities, such as graph traversal, graph analysis, and time series computation.
  • the police need to query the suspect's basic information and behavior records through a relational database, and also to analyze and query the suspect's relationships, such as traveling companions, cohabitants, call contacts, and social contacts, through a graph computing engine and a graph database, and then to search for people who have direct or indirect contact with the suspect.
  • the storage and management services of different types of data are usually provided by different types of databases. Therefore, users need to use multiple database systems. The use process is cumbersome. Multiple sets of independent database systems lead to complex management and maintenance of the system. Importing and exporting data between them increases the risk of data exposure, and data consistency is difficult to guarantee.
  • the prior art adds specific data types, such as the JSON type and the Spatial type, to a relational database in the form of user-defined types (UDTs), and uses user-defined functions (UDFs) to add computing capabilities for those data types.
  • This application provides a fusion query method and a multi-modal database management system that give users unified data access and maintenance interfaces for multi-modal databases such as relational databases, graph databases, and time series databases, reducing the learning and usage costs for operation and maintenance staff and application developers, and improving the security of data usage.
  • an embodiment of the present application provides a database system, including: a main calculation engine, one or more extended calculation engines, and an adapter; the main calculation engine is used to receive a fusion query from a client, the fusion query including a first type of query and a second type of query, to process the first type of query to obtain a first processing result, and to pass the second type of query to the adapter through a first interface;
  • the adapter is configured to determine, based on the metadata of the one or more extended computing engines, a first extended computing engine used to process the second type of query and a second interface corresponding to the first extended computing engine, and to pass the second type of query to the first extended computing engine through the second interface; the first extended computing engine is used to process the second type of query to obtain a second processing result, and to return the second processing result to the main calculation engine through the adapter; the main calculation engine is also used to generate a query result according to the first processing result and the second processing result, and to return the query result to the client.
  • the first extended calculation engine converts the second type of query into the first type of query, and sends the converted query to the main calculation engine; the main calculation engine processes the converted query to obtain query results.
  • the first type of query is a SQL query
  • the second type of query is a graph query, a time series query, or an approximate query.
  • the second type of query is defined by a user-defined function UDF.
  • the first interface includes at least one hook function; the at least one hook function is associated with the UDF.
  • the metadata includes: information about an extended computing engine supported by the multi-mode database management system.
  • the information about the extended computing engine includes: the type of the extended computing engine, the address of the server where one or more instances of the extended computing engine are located, and the location of the extended computing engine.
  • the adapter is specifically configured to query the metadata to determine the first engine instance of the first extended computing engine, and the interface corresponding to the first engine instance, through the first engine The interface corresponding to the instance transmits the second type of query to the first engine instance for processing.
  • the metadata is stored in a user table of the multi-mode database management system.
  • the main calculation engine is a structured query language SQL engine
  • the one or more extended calculation engines include at least one of a graph calculation engine, a time series engine, or an approximate query engine.
  • the first type of query is a structured query statement
  • the second type of query is a graph query statement
  • the first extended calculation engine is a graph calculation engine
  • an embodiment of the present application provides a fusion query method that can be applied to a multi-modal database management system.
  • the method includes: a database management system receives a fusion query submitted by a client, the fusion query including a first type of query and a second type of query; the first type of query is processed by the main calculation engine to obtain a first processing result; based on the metadata, a first extended calculation engine for processing the second type of query, and the interface corresponding to the first extended calculation engine, are determined; the second type of query is passed to the first extended calculation engine through the interface; the first extended calculation engine processes the second type of query to obtain a second processing result; and the main calculation engine receives the second processing result through the interface, generates a query result according to the first processing result and the second processing result, and returns the query result to the client.
  • the first extended calculation engine converts the second type of query into the first type of query, and sends the converted query to the main calculation engine; the main calculation engine processes the converted query to obtain query results.
  • an embodiment of the present application provides a database server, which includes one or more functional units for executing the multi-mode database management system described in the first aspect or any implementation of the first aspect; these functional units can be implemented by a software module, by hardware such as a processor, or by software combined with the necessary hardware.
  • an embodiment of the present application provides a database server, including a memory, a processor, and a computer program stored on the memory.
  • when the processor executes the computer program, the functions of the multi-mode database management system described in the first aspect or any implementation of the first aspect are implemented.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program (instruction) is stored; when the program (instruction) is executed by a processor, the functions of the multi-mode database management system described in the first aspect or any implementation of the first aspect are realized.
  • Fig. 1 is an architecture diagram of a database system provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of a database management system according to an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a calculation engine of an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the work flow of the database management system of the embodiment of the present application.
  • FIG. 5 is a schematic diagram of the work flow of the database management system of the embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a database management system according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the work flow of the database management system of the embodiment of the present application.
  • Fig. 8 is an architecture diagram of a database system according to an embodiment of the present application.
  • FIG. 1 shows a typical logical architecture of a database system.
  • the database system 100 includes a database 110 and a database management system (DBMS) 130.
  • the database 110 is an organized data collection stored in a data storage (Data Storage) 120, that is, an associated data collection organized, stored, and used according to a specific data model.
  • Relational data is data modeled using a relational model, usually expressed as a table, and rows in the table represent a collection of related values of an object or entity.
  • Graph data, referred to as "graphs" for short, is used to represent relationships between objects or entities, such as social relationships.
  • Time series data, referred to as "time series" for short, is a sequence of data recorded and indexed in chronological order, used to describe how the state of an object changes over time.
  • the database management system 130 is the core of the database system, and is system software for organizing, storing, and maintaining data.
  • the client 200 can access the database 110 through the database management system 130, and the database administrator also performs database maintenance work through the database management system.
  • the database management system 130 provides multiple functions for the client 200 to establish, modify, and query a database.
  • the client 200 may be an application program or a user device.
  • the functions provided by the database management system 130 may include but are not limited to the following: (1) data definition: the database management system 130 provides a Data Definition Language (DDL) to define the structure of the database 110; the DDL describes the database framework, which can be stored in the data dictionary; (2) data access: the database management system 130 provides a Data Manipulation Language (DML) to implement basic access operations on the database 110, such as retrieval, insertion, modification, and deletion; (3) database operation management: the database management system 130 provides data control functions to effectively control and manage the operation of the database 110 and ensure that the data is correct and valid; (4) database establishment and maintenance: including initial data loading, database dumping, restoration, and reorganization, as well as system performance monitoring and analysis; (5) data transmission: the database management system 130 handles the transmission of data to implement communication between the client and the database management system, usually in coordination with the operating system.
  • the data storage 120 includes, but is not limited to, solid state drives (SSD), disk arrays, cloud storage, or other types of non-transitory computer-readable storage media.
  • a database system may include fewer or more components than those shown in FIG. 1, or include components different from those shown in FIG. 1.
  • FIG. 1 shows only the components most relevant to the implementations disclosed in the embodiments of the present invention.
  • the embodiment of the application provides a multi-model database (MMDB) management system that can simultaneously support multiple data models (such as relational, graph, key-value, and time series), and a multi-language fusion query method based on the multi-model database management system.
  • the method and the device are based on the same inventive concept. Because the method and the device solve the problem on similar principles, their implementations can be referred to each other, and repeated descriptions are omitted.
  • Figure 2 shows an example of a multi-mode database management system according to an embodiment of the present application.
  • the database management system 130 includes: a storage engine 170, an adapter 135, and several calculation engines (the main calculation engine 132 and the extended calculation engines 140 and 150 shown in FIG. 2).
  • the calculation engines 132, 140, and 150 are different types of calculation engines.
  • Each type of calculation engine supports one type of query language.
  • a relational database engine (referred to as a "relational engine") supports a relational data model and is used to process relational queries, such as Structured Query Language (SQL) queries; a graph computing engine is used to process graph queries, such as Gremlin queries; and a time series engine is used to process time series queries.
  • the main function of the calculation engine is to generate a corresponding execution plan according to the query (Query) submitted by the client 200, and perform data operations according to the execution plan to generate query results.
  • the calculation engine mainly includes a SQL engine and an execution engine.
  • the SQL engine mainly completes the analysis of SQL queries, the rewriting of queries, and the generation of execution plans;
  • the execution engine is composed of operation operators and related execution environments. Commonly used operation operators include scan, hash join, aggregate, etc.
  • the execution environment is mainly composed of an execution framework and a resource manager.
  • the storage engine 170 is responsible for providing an interface for accessing data to the computing engine on the file system, and at the same time providing index management, runtime cache, transaction, log and other data management. For example, the storage engine 170 may write the execution result of the main calculation engine 132 into the data storage 120 through physical I/O.
  • a calculation engine includes a parser 210, a rewriter 230, an optimizer 250, and an executor 270.
  • the parser 210 is used to perform lexical analysis, syntax analysis, and semantic analysis on the input query sentence, and output a query parse tree.
  • the rewriter 230 is used to transform the query into a format that is easy to optimize, for example, to rewrite the query statement through operation merging, predicate conversion, etc.
  • the optimizer 250 is used to select an optimal execution path based on query cost estimation, rule-based, or machine learning-based methods, and then generate an execution plan.
  • the executor 270 is configured to read data through the storage engine, process the data according to the execution plan to obtain a processing result, and return the processing result to the client.
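The four stages described above (parser, rewriter, optimizer, executor) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation described in the application: the function names `parse`, `rewrite`, `optimize`, and `execute`, the toy query form `select <column> from <table> where <column> = <value>`, and the in-memory `STORAGE` table are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Query:
    table: str
    column: str
    predicates: list  # (column, value) equality predicates


# Hypothetical in-memory stand-in for the storage engine: table -> rows.
STORAGE = {
    "person": [
        {"id": 1, "name": "Ann", "city": "Oslo"},
        {"id": 2, "name": "Bob", "city": "Rome"},
    ]
}


def parse(text):
    """Parser: turn query text into a structured form (stand-in for a parse tree)."""
    m = re.fullmatch(
        r"select (\w+) from (\w+)(?: where (.+))?", text.strip(), re.IGNORECASE
    )
    if m is None:
        raise ValueError(f"unsupported query: {text!r}")
    column, table, where = m.groups()
    predicates = []
    if where:
        for part in re.split(r"\s+and\s+", where, flags=re.IGNORECASE):
            col, value = (s.strip() for s in part.split("="))
            predicates.append((col, value.strip("'")))
    return Query(table, column, predicates)


def rewrite(query):
    """Rewriter: normalize the query, here by dropping duplicate predicates."""
    return Query(query.table, query.column, list(dict.fromkeys(query.predicates)))


def optimize(query):
    """Optimizer: emit an execution plan as a list of operators (trivially rule-based)."""
    plan = [("scan", query.table)]
    plan += [("filter", p) for p in query.predicates]
    plan.append(("project", query.column))
    return plan


def execute(plan):
    """Executor: run each operator of the plan against the storage."""
    rows = []
    for op, arg in plan:
        if op == "scan":
            rows = list(STORAGE[arg])
        elif op == "filter":
            col, value = arg
            rows = [r for r in rows if str(r[col]) == value]
        elif op == "project":
            rows = [r[arg] for r in rows]
    return rows


def run(text):
    return execute(optimize(rewrite(parse(text))))
```

Calling `run("select name from person where city = 'Rome'")` walks the query through all four stages; a real optimizer would choose among alternative plans by cost rather than emit a single fixed one.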
  • on the basis of the main calculation engine 132, other calculation engines, such as calculation engines 140 and 150, are also added as extensions.
  • the original data is always stored in relational type, and only one copy is stored.
  • the main computing engine 132 can dynamically call the extended computing engine for specific processing, to support fusion queries in multiple query languages and to avoid importing and exporting data between different database systems, improving the security of the system.
  • the extended calculation engines 140 and 150 are different types of calculation engines from the main calculation engine 132.
  • the main calculation engine 132 may be a relational calculation engine
  • the extended calculation engine 140 is a graph engine
  • the extended calculation engine 150 is a time series engine.
  • the database management system 130 receives the query from the client 200, passes the query to the main computing engine 132 for processing, and returns the processing result to the client 200.
  • the query initiated by the client 200 is a fusion query, that is, an extended query statement that includes multiple query languages.
  • fusion query is given below:
  • the above query statement is a fusion query statement that includes both SQL and graph query.
  • the parts in bold and italics are graph query statements, and the part at the beginning of "select" is SQL query statement.
  • the database management system 130 of the embodiment of the present application can dynamically expand an external computing engine at runtime to support a fusion query composed of multiple types of query languages. Specifically, after the database management system 130 receives the fusion query, it identifies the first type of query (such as SQL query) and the second type of query (such as graph query) included in the fusion query, and submits the first type of query to The main calculation engine 132 performs processing, and transmits the second type of query to the adapter 135 through one or more pre-configured interfaces, such as the interface 142 integrated in the main calculation engine 132.
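The identification step can be illustrated with a short sketch. The embedding syntax used here, `gremlin('<graph query>')` inside an SQL statement, is an assumption made for the example (the text only says that the graph part begins with "Gremlin" and is treated as a UDF), and the function name `split_fusion_query` and the `$graphN` placeholders are hypothetical.

```python
import re

# Assumed embedding syntax: gremlin('<graph query text>') inside an SQL statement.
# Single quotes inside the graph query are doubled, as in SQL string literals.
GRAPH_CALL = re.compile(r"gremlin\(\s*'((?:[^']|'')*)'\s*\)", re.IGNORECASE)


def split_fusion_query(fusion_query):
    """Return (sql_part, graph_parts).

    Each embedded graph query is replaced in the SQL text by a placeholder
    ($graph0, $graph1, ...), so that the SQL part can go to the main engine
    while the graph parts are handed to the adapter.
    """
    graph_parts = []

    def replace(match):
        # Undo SQL-style quote doubling inside the string literal.
        graph_parts.append(match.group(1).replace("''", "'"))
        return f"$graph{len(graph_parts) - 1}"

    sql_part = GRAPH_CALL.sub(replace, fusion_query)
    return sql_part, graph_parts


sql_part, graph_parts = split_fusion_query(
    "select name from person where id in (gremlin('g.V(1).out(\"knows\").id()'))"
)
```

In this sketch the graph text is treated as an opaque string; it is the extended engine, not the main engine, that would parse it.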
  • the adapter 135 is a bridge between the main computing engine 132 and the extended computing engines 140 and 150.
  • Metadata (Pseudo Catalog) 122 is used to store information about the extended computing engine.
  • Metadata 122 includes but is not limited to one or more of the following: the types of the extended computing engines currently available in the system, the ID of each extended computing engine, the address of the server where each extended computing engine instance is located, and the interface information of each extended computing engine.
  • the Pseudo Catalog may include the mapping between the type of the extended calculation engine and the address of the server where the external extension engine is located, and the mapping between the type of the extended calculation engine and the interface of the extended calculation engine.
  • the Pseudo Catalog also includes the mapping between the external extension engine type and the external extension engine instance.
  • the database management system may store the above-mentioned mapping in the form of one or more user tables, so that the core of the main calculation engine 132 is less modified.
  • the adapter 135 determines, based on the information recorded in the Pseudo Catalog 122, the extended computing engine 140 for processing the second type of query and the corresponding interface of the extended computing engine 140, and passes the second type of query, or the parameters of the second type of query, to the extended computing engine 140 through this interface for processing.
  • the extended calculation engine 140 obtains the processing result after processing the second type of query, and feeds the processing result back to the main calculation engine 132 through the adapter. It is understandable that the extended calculation engine 140 may also return intermediate results to the main calculation engine 132 through the adapter 135 in the process of processing the second type of query, and the main calculation engine 132 may perform query processing based on the intermediate results returned by the extended calculation engine 140 . That is, the main calculation engine 132 may refer to the intermediate result of the second type of query processing by the extended calculation engine 140 when processing the first type of query.
  • the adapter 135 includes a common module wrapper (Common Envelope Wrapper) and an external engine wrapper (Foreign Engine Wrapper).
  • the Common Envelope Wrapper is used to initialize, start, and terminate the extended computing engine, and realize the heartbeat, handshake, exception handling, etc. between the extended computing engine and the main computing engine.
  • The Foreign Engine Wrapper provides hook functions that pass information such as query parameters to the extended computing engine and that, at the processing stage of each component of the extended computing engine (parser, rewriter, optimizer, and executor), return results to the main computing engine for corresponding processing.
  • The Common Envelope Wrapper is called to perform operations such as initializing the Pseudo Catalog in the system table.
  • hook functions are added to realize interaction with the extended computing engine.
  • the main computing engine 132 and the extended computing engines 140 and 150 can all register hook functions, and each hook function will be called under specific conditions or events to implement corresponding functions, such as message delivery.
  • the main calculation engine 132 may register one or more hook functions. When the main calculation engine 132 is processing a query, the registered hook function will be called.
  • the hook function triggers the adapter 335 to determine a specific extended calculation engine or extended calculation engine instance and the relevant interface, and then passes information to the extended calculation engine through the corresponding interface, for example, information related to the second type of query.
  • the extended computing engine can also register a series of hook functions.
  • In the process of processing the second type of query by the extended computing engine, for example, after the graph query has been transformed into a SQL query through parsing, rewriting, and optimization, the hook function is called. The hook function then returns the SQL query converted from the graph query to the main calculation engine 132 through the adapter 135 and the interface corresponding to the main calculation engine 132, and the main calculation engine 132 continues to process the converted SQL query to obtain the query result. At the end of the thread, the Common Envelope Wrapper is called again to release resources and clear the cache.
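The hook mechanism described in the preceding paragraphs can be sketched as a registry that maps events to callbacks. This is only an illustration under assumptions: the class `HookRegistry`, the event names `graph_udf_called` and `graph_rewritten`, and the hard-coded converted SQL text are hypothetical, and the adapter and engines are reduced to plain functions.

```python
from collections import defaultdict


class HookRegistry:
    """Maps an event name to the hook functions registered for it."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, event, func):
        self._hooks[event].append(func)

    def call(self, event, payload):
        # Invoke every hook registered for the event and collect the results.
        return [func(payload) for func in self._hooks[event]]


hooks = HookRegistry()
log = []  # records which side handled which message, for inspection


def to_extended_engine(graph_query):
    # Hook on the main-engine side: hand the second-type query to the adapter.
    log.append(("adapter", graph_query))
    # Stand-in for the extended engine finishing its rewrite stage and
    # firing its own hook with the converted SQL text (hard-coded here).
    return hooks.call("graph_rewritten", "select dst from edges where src = 1")[0]


def to_main_engine(sql_text):
    # Hook on the extended-engine side: return the converted SQL to the main engine.
    log.append(("main", sql_text))
    return sql_text


hooks.register("graph_udf_called", to_extended_engine)
hooks.register("graph_rewritten", to_main_engine)

result = hooks.call("graph_udf_called", "g.V(1).out()")
```

The point of the design is that neither engine calls the other directly: each only fires an event, and whatever is registered for that event decides where the message goes.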
  • FIG. 5 shows the process of the database management system 130 processing the fusion query statement including SQL and graph query.
  • the database management system 130 may define a user-defined function (UDF), the input parameter of the UDF is a graph query statement supported by the graph computing engine, and the return type is a table result set of multiple records.
  • the graph query in the fusion query can be regarded as the UDF.
  • the UDF transfers the graph query statement to the adapter 135 by calling the interface 142; the adapter 135 determines the interface 152 corresponding to the graph calculation engine 340 based on the metadata of the extended computing engine recorded in the Pseudo Catalog 122, and then passes the graph query statement to the graph calculation engine 340 for processing by calling the interface 152.
  • the graph calculation engine 340 sequentially performs operations such as parsing, rewriting, optimizing, and executing graph query statements to obtain query results. Further, the graph calculation engine 340 may return the query result to the main calculation engine 132 through the adapter 135.
  • the graph calculation engine 340 may also return intermediate results to the main calculation engine 132 through the adapter 135 in each stage of the graph query processing.
  • the graph computing engine 340 can convert graph queries into SQL queries through operations such as parsing and rewriting, and then pass the converted SQL queries to the main computing engine (relational computing engine) through the adapter 135, and the relational computing engine then further processes the converted SQL query to obtain the processing result.
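To make the graph-to-SQL conversion concrete, the sketch below rewrites a very small subset of graph traversals into SQL over a relational edge table and runs the result with Python's built-in `sqlite3` module standing in for the relational engine. The supported query form (`g.V(<id>)` followed by one or more `.out()` steps), the `edges(src, dst)` table, and the function name `graph_to_sql` are assumptions made for this example; the application does not specify the conversion rules.

```python
import re
import sqlite3


def graph_to_sql(graph_query):
    """Convert g.V(<id>).out()[.out()...] into (sql, params) over edges(src, dst)."""
    m = re.fullmatch(r"g\.V\((\d+)\)((?:\.out\(\))+)", graph_query)
    if m is None:
        raise ValueError(f"unsupported graph query: {graph_query!r}")
    start = int(m.group(1))
    steps = m.group(2).count(".out()")
    # One self-join of the edge table per additional hop.
    joins = "".join(
        f" JOIN edges e{i} ON e{i}.src = e{i - 1}.dst" for i in range(1, steps)
    )
    sql = f"SELECT e{steps - 1}.dst FROM edges e0{joins} WHERE e0.src = ? ORDER BY 1"
    return sql, (start,)


# The data is stored only once, in relational form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (1, 3), (2, 4), (3, 4)])

# The graph side converts the traversal; the relational side executes it.
sql, params = graph_to_sql("g.V(1).out().out()")
two_hop = [row[0] for row in conn.execute(sql, params)]
```

Vertex 4 appears once per path that reaches it, which mirrors how a traversal without deduplication behaves; adding `DISTINCT` would return each reachable vertex once.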
  • the database management system 130 may include fewer or more components than those shown in FIG. 2, or include components different from those shown in FIG. 2; FIG. 2 shows only the components most relevant to the implementations disclosed in the embodiments of the invention.
  • the extended computing engine included in the database management system 130 includes but is not limited to the two shown in FIG. 2, and there may be one or more than two.
  • Fig. 6 shows another example of a database management system according to an embodiment of the present application.
  • the database management system 230 includes a SQL engine 330, a graph calculation engine 340, a timing engine 350, an approximate query engine 360, an adapter 335, and a storage engine 370.
  • the database management system 230 supports multiple types of fusion queries, such as fusion queries including relational queries and graph queries, fusion queries including relational queries and time series queries, and so on.
  • After the SQL engine 330 receives the fusion query, it identifies the specific types of query contained in the fusion query, such as a graph query or a time series query. The embodiment of the present application may extend SQL queries with other types of queries, such as graph queries and time series queries, through user-defined functions (UDFs). Taking the query in the upper right corner of Figure 5 as an example, the statement in italics beginning with "Gremlin" can be regarded as a UDF. A UDF is usually associated with a specific interface.
  • the SQL engine 330 includes a parser 332, an optimizer 334, an executor 336, and a jump module 338.
  • the parser 332 is used to parse the SQL query statement into a specific structure, such as a query tree, through lexical and syntax analysis.
  • the optimizer 334 generates an optimal execution plan corresponding to the query sentence based on the rule or based on the cost model.
  • the executor 336 executes the execution plan generated by the optimizer 334 to obtain the query result.
  • the jump module 338 includes a series of hook functions (Hook), and each UDF is associated with one or more hook functions.
  • the UDF calls its associated hook function, and the hook function then transmits information to the extended calculation engine through the adapter 335, such as one of the graph calculation engine 340, the timing engine 350, and the approximate query engine 360.
  • the processing result of the extended calculation engine can also be returned to the UDF through the adapter 335.
  • the UDF corresponding to the graph query will call its associated hook function, and then query the graph through the hook function Pass to the adapter 335.
  • the adapter 335 determines the type of extended computing engine used to process the query of the graph, the server ID where the extended computing engine instance of this type is located, and the interface corresponding to the extended computing engine instance, Then, the graph query is sent to the extended computing engine instance for processing through the interface.
  • the server ID here includes but is not limited to the server IP address and/or port number.
  • the metadata recorded by Pseudo Catalog 122 is shown in Table 1:
  • the adapter 335 receives the graph query, based on the metadata, it determines that the external engine currently supported by the database management system includes the graph calculation engine, and then determines the graph calculation that can be used to process the graph query according to the foreign_engine_mapping in the metadata Engine instance, as shown in the figure, calculate the IP address of the server where the engine instance is located. Further, the adapter 335 determines the foreign_engine_wrapper corresponding to the graph calculation engine instance, that is, the interface and related hook functions, according to the foreign_engine_wrapper information in the metadata. Finally, the adapter 335 sends the graph query to the graph computing engine instance for processing through the determined interface, and returns the intermediate result and/or final structure of the processing to the SQL engine 330 through the hook function.
  • Fig. 8 shows a database system, provided by an embodiment of the present application, that integrates the multi-model database management system described in the above embodiments. It includes a data storage 203, a database management system 200, and a database 201 stored in the data storage 203.
  • The database 201 contains data tables organized according to the relational model.
  • The client 10 establishes a communication connection with the database management system 200 through the network 30, and sends requests or queries to the database management system 200 to access and/or modify the database 201 in the data storage 203, or to import new data into the database 201.
  • The database management system 200 performs the corresponding operations according to the received query to generate the query result, and returns the query result to the client 10.
  • The client 10 includes any type of device or application that is configured to interact with the database management system 200.
  • In some examples, the client 10 includes one or more application servers.
  • The query initiated by the client 10 is described in a specific database language.
  • Database languages include but are not limited to: Structured Query Language (SQL), suitable for relational databases; graph query languages (such as Gremlin), suitable for graph databases; and time series languages, suitable for time series databases.
  • In one embodiment, the query submitted by the client 10 is a fusion query composed of multiple types of query languages, for example a fusion query that includes a first type of query (such as a SQL query) and a second type of query (such as a graph query).
  • The database management system 200 may be the multi-model database management system described in the foregoing embodiments; for its specific working process, refer to those embodiments.
  • The operation of the database management system 200 depends on the necessary software and hardware environment, including but not limited to the hardware layer 251 and the operating system 255.
  • The hardware layer 251 includes the basic hardware units required to run the operating system 255 and the database management system 200, such as processors, memory, input/output (I/O) devices, and network interface controllers (NICs).
  • The operating system 255 is system software that manages the hardware units, and can provide functions such as memory management and thread scheduling.
  • The data storage 203 may be a non-transitory computer-readable storage medium such as a hard disk, a magnetic disk, a storage array, a storage server, cloud storage, or a storage area network (SAN), and is communicatively connected to the computing node where the hardware layer 251 is located.
  • Alternatively, the data storage 203 may be integrated in the computing node where the hardware layer 251 is located, and exchange data with the processor and I/O devices through a bus or other internal communication methods.
  • A "computing node" in the embodiments of this application is an entity that has the hardware resources required to perform data computation and/or storage, such as a physical machine or database server, or an entity that can call hardware resources for computation and/or storage, such as a virtual machine (VM) or container deployed in a physical machine.
  • In one embodiment, the functions of the database management system 200 may be implemented by a processor executing an executable program stored in the memory.
  • "Executable program" should be interpreted broadly as including but not limited to: instructions, instruction sets, code, code segments, subroutines, software modules, applications, software packages, threads, processes, functions, firmware, middleware, and so on.
  • Those skilled in the art will understand that a database system may include fewer or more components than those shown in Fig. 8, or components different from those shown in Fig. 8; Fig. 8 shows only the components most relevant to the implementations disclosed in the embodiments of the present invention.
  • The sequence numbers of the method steps described in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

Abstract

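A fusion query method and a multi-model database (MMDB) framework are provided. Extensibility for external engines is added to a relational database engine, and the metadata of the external extension engines is managed through user tables, which minimizes intrusion into the relational database engine and allows external engines to be dynamically loaded and unloaded at runtime. Users are given a unified data access and maintenance interface across multi-model databases such as relational, graph, and time series databases, which reduces the learning and usage costs for operations staff and application developers and improves the security of data use.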

Description

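Multi-language fusion query method and multi-model database system
Technical Field
This application relates to the database field, and more specifically, to a fusion query method and a multi-model database system.
Background
Database systems are at the core of many application systems. A traditional database system is a relational database system built on the relational model and dedicated to processing structured data. Put simply, the relational model is a two-dimensional table model, and a relational database is a data organization made up of two-dimensional tables and the relationships between them. With the development of the Internet and artificial intelligence, semi-structured data, such as data in JSON and XML formats, and unstructured data, such as text, audio, and video data, have gradually emerged alongside structured data. Typical applications of structured data include bank transactions; semi-structured data is used at scale in scenarios such as user profiling, Internet of Things device log collection, and application clickstream analysis; unstructured data corresponds to services that process massive volumes of pictures, videos, and documents. To meet the management needs of these data types, many special-purpose non-relational database systems have been developed, including XML databases, graph databases, time series databases, document databases, and key-value (KV) databases.
Application systems are becoming increasingly complex. In many scenarios, an application needs to use multiple types of data at the same time, such as relational data, graphs, and time series data, and the database must provide the corresponding computing capabilities, such as graph traversal, graph analysis, and time series computation. Take the "safe city" scenario as an example: when a crime occurs, the police need to query a relational database for a suspect's basic information and behavior records, and also need a graph computing engine and a graph database to analyze and query the suspect's relationships, such as who the suspect traveled with, lived with, called, or socialized with, in order to find people directly or indirectly connected to the suspect. However, storage and management services for different types of data are usually provided by different types of databases, so users have to work with several database systems separately, which is cumbersome. Running multiple independent database systems complicates system management and maintenance, and data must be imported and exported between the databases, which increases the risk of data exposure and makes data consistency hard to guarantee.
To address these problems, the prior art adds specific data types, such as a JSON type or a Spatial type, to a relational database as user-defined types (UDTs), and adds computing capabilities for those types through user-defined functions (UDFs). Compared with building a new database system, this approach can extend support for new data types relatively quickly, but it is constrained by the table structure of the original relational database: only data types with small data lengths can be added, and extension is difficult for data types with large data volumes, such as graph data. Supporting graph data processing requires substantial changes to the kernel of the original relational database, which means a long development cycle, and new extended computing engines cannot be added or unloaded at runtime.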
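Summary
This application provides a fusion query method and a multi-model database management system that give users a unified data access and maintenance interface across multi-model databases such as relational, graph, and time series databases, reducing the learning and usage costs for operations staff and application developers and improving the security of data use.
In a first aspect, an embodiment of this application provides a database system, including: a main computing engine, one or more extended computing engines, and an adapter. The main computing engine is configured to: receive a fusion query from a client, where the fusion query includes a first type of query and a second type of query; process the first type of query to obtain a first processing result; and pass the second type of query to the adapter through a first interface. The adapter is configured to: determine, based on metadata of the one or more extended computing engines, a first extended computing engine for processing the second type of query and a second interface corresponding to the first extended computing engine; and pass the second type of query to the first extended computing engine through the second interface. The first extended computing engine is configured to process the second type of query to obtain a second processing result, and return the second processing result to the main computing engine through the adapter. The main computing engine is further configured to generate a query result based on the first processing result and the second processing result, and return the query result to the client.
In a possible implementation, the first extended computing engine converts the second type of query into a query of the first type and sends the converted query to the main computing engine; the main computing engine processes the converted query to obtain the query result.
In a possible implementation, the first type of query is a SQL query, and the second type of query is a graph query, a time series query, or an approximate query.
In a possible implementation, the second type of query is defined through a user-defined function (UDF).
In a possible implementation, the first interface includes at least one hook function, and the at least one hook function is associated with the UDF.
In a possible implementation, the metadata includes information about the extended computing engines supported by the multi-model database management system.
In a possible implementation, the information about an extended computing engine includes: the type of the extended computing engine, the address of the server where one or more instances of the extended computing engine are located, and the interface information corresponding to the extended computing engine. The adapter is specifically configured to query the metadata to determine a first engine instance of the first extended computing engine and the interface corresponding to the first engine instance, and to pass the second type of query to the first engine instance for processing through that interface.
In a possible implementation, the metadata is stored in a user table of the multi-model database management system.
In a possible implementation, the main computing engine is a Structured Query Language (SQL) engine, and the one or more extended computing engines include at least one of a graph computing engine, a time series engine, or an approximate query engine.
In a possible implementation, the first type of query is a structured query statement, the second type of query is a graph query statement, and the first extended computing engine is a graph computing engine.
In a second aspect, an embodiment of this application provides a fusion query method applicable to a multi-model database management system. The method includes: receiving, by the database management system, a fusion query submitted by a client, where the fusion query includes a first type of query and a second type of query; processing the first type of query by a main computing engine to obtain a first processing result; determining, based on metadata, a first extended computing engine for processing the second type of query and an interface corresponding to the first extended computing engine; passing the second type of query to the first extended computing engine through the interface; processing, by the first extended computing engine, the second type of query to obtain a second processing result; and receiving, by the main computing engine, the second processing result through the interface, generating a query result based on the first processing result and the second processing result, and returning the query result to the client.
In a possible implementation, the first extended computing engine converts the second type of query into a query of the first type and sends the converted query to the main computing engine; the main computing engine processes the converted query to obtain the query result.
In a third aspect, an embodiment of this application provides a database server, including one or more functional units for implementing the multi-model database management system described in the first aspect or any implementation of the first aspect. These functional units may be implemented by software modules, by hardware such as a processor, or by software combined with the necessary hardware.
In a fourth aspect, an embodiment of this application provides a database server, including a memory, a processor, and a computer program stored in the memory. When the processor executes the computer program, the functions of the multi-model database management system described in the first aspect or any implementation of the first aspect are implemented.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium on which a computer program (instructions) is stored. When the program (instructions) is executed by a processor, the functions of the multi-model database management system described in the first aspect or any implementation of the first aspect are implemented.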
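Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below.
Fig. 1 is an architecture diagram of a database system provided by an embodiment of this application.
Fig. 2 is a schematic structural diagram of a database management system according to an embodiment of this application.
Fig. 3 is a schematic diagram of a computing engine according to an embodiment of this application.
Fig. 4 is a schematic diagram of the workflow of a database management system according to an embodiment of this application.
Fig. 5 is a schematic diagram of the workflow of a database management system according to an embodiment of this application.
Fig. 6 is a schematic structural diagram of a database management system according to an embodiment of this application.
Fig. 7 is a schematic diagram of the workflow of a database management system according to an embodiment of this application.
Fig. 8 is an architecture diagram of a database system according to an embodiment of this application.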
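Detailed Description
The technical solutions in the embodiments of this application are described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application.
In the embodiments of this application, "multiple" means two or more. In addition, in the description of this application, words such as "first" and "second" are used only to distinguish between the items being described; they neither indicate or imply relative importance nor indicate or imply an order.
The method provided in the embodiments of this application can be applied to a database system. Fig. 1 shows a typical logical architecture of a database system. According to Fig. 1, the database system 100 includes a database 110 and a database management system (DBMS) 130.
The database 110 is an organized collection of data stored in the data storage 120, that is, a collection of related data that is organized, stored, and used according to a specific data model. Depending on the data model used to organize it, data can be divided into multiple types, such as relational data, graph data, and time series data. Relational data is data modeled with the relational model and is usually represented as tables, where a row in a table represents a set of related values of an object or entity. Graph data, or simply a "graph", represents relationships between objects or entities, such as social relationships. Time series data is a sequence of data recorded and indexed in time order, and describes how the state of an object changes over time.
The database management system 130 is the core of the database system; it is system software for organizing, storing, and maintaining data. The client 200 can access the database 110 through the database management system 130, and the database administrator also maintains the database through the database management system. The database management system 130 provides a variety of functions that let the client 200 create, modify, and query the database, where the client 200 may be an application or a user device. The functions provided by the database management system 130 may include but are not limited to the following: (1) data definition: the database management system 130 provides a data definition language (DDL) to define the structure of the database 110; the DDL describes the database schema and can be stored in the data dictionary; (2) data access: the database management system 130 provides a data manipulation language (DML) to implement basic access operations on the database 110, such as retrieval, insertion, modification, and deletion; (3) database operation management: the database management system 130 provides data control functions to effectively control and manage the operation of the database 110 and ensure that data is correct and valid; (4) database creation and maintenance, including loading of initial data, dumping, recovery, and reorganization of the database, and monitoring and analysis of system performance; (5) data transmission: the database management system handles the transmission of data to implement communication between the client and the database management system, usually in coordination with the operating system.
The data storage 120 includes but is not limited to a solid-state drive (SSD), a disk array, cloud storage, or another type of non-transitory computer-readable storage medium. Those skilled in the art will understand that a database system may include fewer or more components than those shown in Fig. 1, or components different from those shown in Fig. 1; Fig. 1 shows only the components most relevant to the implementations disclosed in the embodiments of the present invention.
The embodiments of this application provide a multi-model database (MMDB) management system that supports multiple data models at the same time (such as relational, graph, key-value, and time series), and a multi-language fusion query method based on that multi-model database management system. The method and the apparatus are based on the same inventive concept. Because they solve the problem on similar principles, the implementations of the apparatus and the method may refer to each other, and repeated descriptions are omitted.
Fig. 2 shows an example of a multi-model database management system according to an embodiment of this application. According to Fig. 2, the database management system 130 includes a storage engine 170, an adapter 135, and several computing engines (the main computing engine 132 and the extended computing engines 140 and 150 shown in Fig. 2). The computing engines 132, 140, and 150 are of different types, and each type of computing engine supports one type of query language. For example, a relational database engine ("relational engine" for short) supports the relational data model and processes relational queries, such as Structured Query Language (SQL) queries; a graph computing engine processes graph queries, such as Gremlin queries; and a time series engine processes time series queries. The main function of a computing engine is to generate an execution plan for the query submitted by the client 200 and to perform data operations according to the execution plan to produce the query result. For a relational database management system, the computing engine mainly includes a SQL engine and an execution engine. The SQL engine is mainly responsible for parsing SQL queries, rewriting queries, and generating execution plans; the execution engine consists of operators and their associated execution environment. Common operators include scan, hash join, and aggregate; the execution environment mainly consists of an execution framework and a resource manager.
The storage engine 170 sits on top of the file system and provides the computing engines with an interface for accessing data; it also manages indexes and runtime data such as caches, transactions, and logs. For example, the storage engine 170 can write the execution results of the main computing engine 132 to the data storage 120 through physical I/O.
In one embodiment, as shown in Fig. 3, a computing engine includes a parser 210, a rewriter 230, an optimizer 250, and an executor 270. The parser 210 performs lexical, syntactic, and semantic analysis on the input query statement and outputs a query parse tree. The rewriter 230 transforms the query into a form that is easier to optimize, for example by rewriting the query statement through operation merging and predicate transformation. The optimizer 250 selects the optimal execution path using methods based on query cost estimation, rules, or machine learning, and then generates the execution plan. The executor 270 reads data through the storage engine, processes the data according to the execution plan to obtain the processing result, and returns the processing result to the client.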
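The four stages of Fig. 3 can be sketched as a chain of small functions. The following is only a toy illustration and not the engine described in this application: the `ParseTree` and `Plan` types, the single supported query shape (`select * from t where a = 1 and ...`), the "drop constant-true predicates" rewrite, and the one-rule optimizer are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class ParseTree:
    table: str
    predicates: list  # (column, value) equality predicates


@dataclass
class Plan:
    access_path: str  # "index_scan" or "seq_scan"
    table: str
    predicates: list


def parse(query):
    """Parser: turn 'select * from t [where a = 1 and ...]' into a parse tree."""
    m = re.fullmatch(r"\s*select \* from (\w+)(?: where (.+?))?\s*", query, re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported query: " + query)
    predicates = []
    if m.group(2):
        for part in re.split(r"\s+and\s+", m.group(2), flags=re.IGNORECASE):
            column, value = (s.strip() for s in part.split("="))
            predicates.append((column, value.strip("'")))
    return ParseTree(m.group(1), predicates)


def rewrite(tree):
    """Rewriter: drop constant-true predicates (such as 1 = 1) and duplicates."""
    kept = []
    for column, value in tree.predicates:
        if column.isdigit() and column == value:
            continue
        if (column, value) not in kept:
            kept.append((column, value))
    return ParseTree(tree.table, kept)


def optimize(tree, indexed_columns):
    """Optimizer (rule-based): use an index scan if any predicate column is indexed."""
    indexed = indexed_columns.get(tree.table, set())
    use_index = any(column in indexed for column, _ in tree.predicates)
    return Plan("index_scan" if use_index else "seq_scan", tree.table, tree.predicates)


def execute(plan, storage):
    """Executor: read rows from the (in-memory) storage and apply the predicates."""
    return [row for row in storage[plan.table]
            if all(str(row.get(c)) == v for c, v in plan.predicates)]


def run_query(query, storage, indexed_columns):
    return execute(optimize(rewrite(parse(query)), indexed_columns), storage)
```

A real optimizer would compare estimated costs of alternative paths; the single rule here only shows where that decision sits in the pipeline.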
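On top of the main computing engine 132, the database management system 130 of the embodiments of this application adds further computing engines, such as the computing engines 140 and 150. The original data is always stored in relational form, and only one copy is stored. While executing a query, the main computing engine 132 can dynamically call the extended computing engines to perform specific processing, which supports fusion queries spanning multiple query languages, avoids importing and exporting data between different database systems, and improves system security. It can be understood that the extended computing engines 140 and 150 are of types different from that of the main computing engine 132. For example, the main computing engine 132 may be a relational computing engine, the extended computing engine 140 a graph engine, and the extended computing engine 150 a time series engine.
Referring to Fig. 2, the database management system 130 receives a query from the client 200, passes the query to the main computing engine 132 for processing, and returns the processing result to the client 200. In one embodiment, the query initiated by the client 200 is a fusion query, that is, an extended query statement containing multiple query languages. An example of a fusion query is given below: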
with suspects(cid) as Gremlin('
g.V().has("cid1", "1111111111").
outE("flight", "CA1315").has("time1", "2016/7/1").
outV().
inE("call").has("time1", gt("2016/6/24")).count().gt(3)')
select photo, phone#, wechatid
from suspects s, citizen c
where c.id = s.cid
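The statement above is a fusion query that contains both SQL and a graph query. The part passed to Gremlin(...), set in bold italics in the original publication, is the graph query statement, and the part beginning with "select" is the SQL query statement.
Another example of a fusion query is as follows: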
with crossing_traffic_flow(cno int, direction char, agg_traffic int) as
Timeseries('
select cno, direction, sum(laneout) - sum(lanein)
from traffic_flow
group by laneid, timestamp, direction, cno')
select crossing.add, traffic.cno, sum(laneout) - sum(lanein)
from crossing, crossing_traffic_flow traffic
where crossing.cno = traffic.cno
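This is a fusion query containing a time series query and SQL. The part passed to Timeseries(...), set in bold italics in the original publication, is the time series query statement, and the part beginning with "select" is the SQL query statement.
A traditional database management system supports only a single type of query and cannot support fusion queries. The database management system 130 of the embodiments of this application can dynamically add external computing engines at runtime to support fusion queries composed of multiple types of query languages. Specifically, after receiving a fusion query, the database management system 130 identifies the first type of query (such as a SQL query) and the second type of query (such as a graph query) contained in it, hands the first type of query to the main computing engine 132 for processing, and passes the second type of query to the adapter 135 through one or more preconfigured interfaces, such as the interface 142 integrated in the main computing engine 132. The adapter 135 is the bridge between the main computing engine 132 and the extended computing engines 140 and 150.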
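One way to picture the identification step is a function that separates the embedded second-type query from the surrounding SQL. The sketch below is a hedged illustration, not the mechanism claimed here: it assumes the `with <name>(<columns>) as <UDF>('<query>') select ...` shape of the two examples above, assumes the embedded query contains no single quotes, and uses a hypothetical `UDF_QUERY_TYPES` table to map a UDF name to a query type. A real engine would recognize the UDF in its parser rather than with a regular expression.

```python
import re
from typing import NamedTuple

# Hypothetical mapping from UDF name (lower-cased) to the type of embedded query.
UDF_QUERY_TYPES = {"gremlin": "graph", "timeseries": "time_series"}


class FusionParts(NamedTuple):
    result_name: str     # name of the table-valued result, e.g. "suspects"
    query_type: str      # type of the second-type query, e.g. "graph"
    embedded_query: str  # text handed to the adapter
    sql_query: str       # text handled by the main (SQL) engine


_FUSION = re.compile(
    r"\s*with\s+(\w+)\s*\([^)]*\)\s*as\s+(\w+)\s*\(\s*'(.*?)'\s*\)\s*(select\b.*)",
    re.IGNORECASE | re.DOTALL,
)


def split_fusion_query(text):
    """Split a fusion query into its second-type part and its SQL part."""
    m = _FUSION.fullmatch(text)
    if m is None:
        raise ValueError("not a fusion query of the assumed shape")
    result_name, udf_name, embedded, sql = m.groups()
    query_type = UDF_QUERY_TYPES.get(udf_name.lower())
    if query_type is None:
        raise ValueError("no extended engine type registered for UDF " + udf_name)
    return FusionParts(result_name, query_type, embedded.strip(), sql.strip())
```

For example, `split_fusion_query` applied to the first fusion query above would report the query type `graph`, with the Gremlin traversal as the embedded query and the trailing `select ... from suspects s, citizen c ...` statement as the SQL part.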
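The metadata (Pseudo Catalog) 122 stores information about the extended computing engines. It includes but is not limited to one or more of the following: the types of extended computing engines currently available in the system, the IDs of the extended computing engines, the addresses of the servers where extended computing engine instances are located, and the interface information of the extended computing engines. For example, the Pseudo Catalog may include a mapping between the type of an extended computing engine and the address of the server where the external extension engine is located, and a mapping between the type of an extended computing engine and its interface. When an extended computing engine is deployed as multiple instances, that is, when multiple instances of the same extended computing engine are distributed across multiple computing nodes, the Pseudo Catalog also includes a mapping between the external extension engine type and the external extension engine instances. In one embodiment, the database management system can store these mappings in one or more user tables, so that only minor changes to the kernel of the main computing engine 132 are needed.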
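Because the mappings live in ordinary user tables, registering or removing an extended engine at runtime reduces to inserting or deleting rows. The sketch below illustrates that idea with Python's built-in `sqlite3` module standing in for the relational engine. The table names `foreign_engine_mapping` and `foreign_engine_wrapper` follow the metadata field names used later in this description, but the `foreign_engine` table, every column, and the sample addresses are assumptions, since the layout of Table 1 is not available in this text.

```python
import sqlite3


def create_pseudo_catalog(conn):
    """Create the (hypothetical) user tables that hold the Pseudo Catalog."""
    conn.executescript("""
        CREATE TABLE foreign_engine (engine_type TEXT PRIMARY KEY);
        CREATE TABLE foreign_engine_mapping (
            engine_type    TEXT,
            instance_id    TEXT,
            server_address TEXT,
            PRIMARY KEY (engine_type, instance_id));
        CREATE TABLE foreign_engine_wrapper (
            engine_type    TEXT PRIMARY KEY,
            interface_name TEXT);
    """)


def register_engine(conn, engine_type, interface_name, instances):
    """Make an extended engine available: one wrapper row, one row per instance."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO foreign_engine VALUES (?)", (engine_type,))
        conn.execute("INSERT INTO foreign_engine_wrapper VALUES (?, ?)",
                     (engine_type, interface_name))
        conn.executemany("INSERT INTO foreign_engine_mapping VALUES (?, ?, ?)",
                         [(engine_type, iid, addr) for iid, addr in instances])


def unregister_engine(conn, engine_type):
    """Remove an extended engine by deleting its rows."""
    with conn:
        for table in ("foreign_engine_mapping", "foreign_engine_wrapper", "foreign_engine"):
            conn.execute(f"DELETE FROM {table} WHERE engine_type = ?", (engine_type,))


def lookup(conn, engine_type):
    """Return (instance_id, server_address, interface_name) rows for an engine type."""
    return conn.execute(
        """SELECT m.instance_id, m.server_address, w.interface_name
           FROM foreign_engine_mapping m
           JOIN foreign_engine_wrapper w ON w.engine_type = m.engine_type
           WHERE m.engine_type = ?
           ORDER BY m.instance_id""",
        (engine_type,),
    ).fetchall()
```

With this layout, an adapter only needs read access to the tables, and the relational kernel itself does not have to know which extended engines exist.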
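In one embodiment, based on the information recorded in the Pseudo Catalog 122, the adapter 135 determines the extended computing engine 140 for processing the second type of query and the interface corresponding to the extended computing engine 140, and passes the second type of query, or the parameters of the second type of query, to the extended computing engine 140 for processing through that interface. The extended computing engine 140 processes the second type of query to obtain a processing result and feeds the result back to the main computing engine 132 through the adapter. It can be understood that, while processing the second type of query, the extended computing engine 140 can also return intermediate results to the main computing engine 132 through the adapter 135, and the main computing engine 132 can perform query processing based on the intermediate results returned by the extended computing engine 140. In other words, when processing the first type of query, the main computing engine 132 can draw on the intermediate results of the extended computing engine 140's processing of the second type of query.
In one embodiment, the adapter 135 includes a Common Envelope Wrapper and a Foreign Engine Wrapper. The Common Envelope Wrapper initializes, starts, and terminates the extended computing engines, and implements heartbeat, handshake, and exception handling between an extended computing engine and the main computing engine. The Foreign Engine Wrapper provides hook functions for the function execution process; these hooks pass information such as query parameters to the extended computing engine, and return results to the main computing engine for corresponding processing at the processing stage of each component of the extended computing engine, such as the parser, rewriter, optimizer, and executor.
Specifically, in one embodiment, as shown in Fig. 4, in the thread start-up phase (InitPostgres) of the main computing engine 132, the Common Envelope Wrapper is called to perform operations such as initializing the Pseudo Catalog in the system tables. Hook functions are added to every stage in which the main computing engine 132 processes a query, such as the parser, rewriter, optimizer, and executor stages, to enable interaction with the extended computing engines. For example, the main computing engine 132 and the extended computing engines 140 and 150 can all register hook functions; each hook function is called under a specific condition or event and then performs the corresponding function, such as passing a message. For example, the main computing engine 132 can register one or more hook functions and call them while processing a query. A hook function triggers the adapter 135 to determine a specific extended computing engine or extended computing engine instance and the related interface, and then passes information, such as information related to the second type of query, to the extended computing engine through that interface. An extended computing engine can also register a series of hook functions. While processing the second type of query, for example after a graph query has been parsed, rewritten, and optimized and thereby converted into a SQL query, the extended computing engine calls a hook function, which returns the converted SQL query to the main computing engine 132 through the adapter 135 and the interface corresponding to the main computing engine 132. The main computing engine 132 continues to process the converted query to obtain the query result. In the thread end phase, the Common Envelope Wrapper is called again to release resources and clear caches.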
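The hook mechanism described above can be pictured as a registry keyed by processing stage. The following minimal sketch makes several assumptions that are not taken from this application: the `HookRegistry` and `MainEngine` classes, the fixed list of stage names, and the "init" and "cleanup" trace markers that stand in for the Common Envelope Wrapper calls at thread start and end.

```python
STAGES = ("parser", "rewriter", "optimizer", "executor")


class HookRegistry:
    """Holds the hook functions registered for each processing stage."""

    def __init__(self):
        self._hooks = {}

    def register(self, stage, hook):
        if stage not in STAGES:
            raise ValueError("unknown stage: " + stage)
        self._hooks.setdefault(stage, []).append(hook)

    def fire(self, stage, payload):
        # Call every hook registered for this stage and collect the replies.
        return [hook(payload) for hook in self._hooks.get(stage, ())]


class MainEngine:
    """Toy main engine that only records which hooks ran and what they returned."""

    def __init__(self, registry):
        self.registry = registry

    def process(self, query):
        trace = ["init"]  # stands in for the start-up call to the common wrapper
        try:
            for stage in STAGES:
                for reply in self.registry.fire(stage, query):
                    trace.append((stage, reply))
        finally:
            trace.append("cleanup")  # stands in for releasing resources at thread end
        return trace
```

A hook registered for the `parser` stage could, for instance, forward the embedded graph query to the adapter, while one registered for the `executor` stage could hand back rows produced by the extended engine.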
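Fig. 5 shows how the database management system 130 processes a fusion query statement containing SQL and a graph query. According to Fig. 5, the database management system 130 can define a user-defined function (UDF) whose input parameter is a graph query statement supported by the graph computing engine and whose return type is a table result set of multiple records. The graph query in the fusion query can be regarded as this UDF. The UDF passes the graph query statement to the adapter 135 by calling the interface 142. Based on the metadata of the extended computing engines recorded in the Pseudo Catalog 122, the adapter 135 determines the interface 152 corresponding to the graph computing engine 340, and then passes the graph query statement to the graph computing engine 340 for processing by calling the interface 152.
In one embodiment, the graph computing engine 340 parses, rewrites, optimizes, and executes the graph query statement in turn to obtain the query result. Further, the graph computing engine 340 can return the query result to the main computing engine 132 through the adapter 135.
In another embodiment, the graph computing engine 340 can also return intermediate results to the main computing engine 132 through the adapter 135 at each stage of graph query processing. For example, the graph computing engine 340 can convert the graph query into a SQL query through operations such as parsing and rewriting, and then pass the converted SQL query to the main computing engine (the relational computing engine) through the adapter 135; the relational computing engine then processes the converted SQL query further to obtain the processing result.
Those skilled in the art will understand that the database management system 130 may include fewer or more components than those shown in Fig. 2, or components different from those shown in Fig. 2; Fig. 2 shows only the components most relevant to the implementations disclosed in the embodiments of the present invention. For example, the extended computing engines included in the database management system 130 are not limited to the two shown in Fig. 2; there may be one, or more than two.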
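Fig. 6 shows another example of a database management system according to an embodiment of this application. According to Fig. 6, the database management system 230 includes a SQL engine 330, a graph computing engine 340, a time series engine 350, an approximate query engine 360, an adapter 335, and a storage engine 370. The database management system 230 supports multiple types of fusion queries, such as fusion queries that combine relational queries with graph queries, fusion queries that combine relational queries with time series queries, and so on.
After the SQL engine 330 receives a fusion query, it identifies the specific types of query contained in the fusion query, such as a graph query or a time series query. The embodiments of this application may embed other types of queries, such as graph queries and time series queries, in SQL queries through user-defined functions (UDFs). Taking the query in the box in the upper right corner of Fig. 5 as an example, the italicized statement beginning with "Gremlin" can be regarded as a UDF. A UDF is usually associated with a specific interface.
In one embodiment, as shown in Fig. 6, the SQL engine 330 includes a parser 332, an optimizer 334, an executor 336, and a jump module 338. The parser 332 parses a SQL query statement, through lexical and syntactic analysis, into a specific structure such as a query tree. The optimizer 334 generates an optimal execution plan for the query statement based on rules or on a cost model. The executor 336 executes the execution plan generated by the optimizer 334 to obtain the query result.
In one embodiment, the jump module 338 includes a series of hook functions (hooks), and each UDF is associated with one or more hook functions. A UDF calls its associated hook function, and the hook function then passes information through the adapter 335 to an extended computing engine, such as one of the graph computing engine 340, the time series engine 350, and the approximate query engine 360. The processing result of the extended computing engine can likewise be returned to the UDF through the adapter 335.
In one embodiment, as shown in Fig. 7, assuming that the query initiated by the client 200 is a fusion query containing a relational query and a graph query, the UDF corresponding to the graph query calls its associated hook function and passes the graph query to the adapter 335 through that hook function. Based on the metadata of the extended computing engines recorded in the Pseudo Catalog 122, the adapter 335 determines the type of extended computing engine used to process the graph query, the ID of the server where an extended computing engine instance of that type is located, and the interface corresponding to that instance, and then sends the graph query to the instance for processing through the interface. The server ID here includes but is not limited to the server IP address and/or port number.
In one embodiment, the metadata recorded by the Pseudo Catalog 122 is shown in Table 1: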
[Table 1 is reproduced only as an image in the original publication: Figure PCTCN2020090393-appb-000001]
Table 1
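As shown in Fig. 7, after the adapter 335 receives the graph query, it determines, based on the metadata, that the external engines currently supported by the database management system include a graph computing engine. It then uses the foreign_engine_mapping in the metadata to determine a graph computing engine instance that can process the graph query, for example the IP address of the server where the graph computing engine instance is located. Further, the adapter 335 uses the foreign_engine_wrapper information in the metadata to determine the foreign_engine_wrapper corresponding to the graph computing engine instance, that is, the interface and related hook functions. Finally, the adapter 335 sends the graph query through the determined interface to the graph computing engine instance for processing, and returns the intermediate results and/or final result of the processing to the SQL engine 330 through the hook function.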
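The lookup sequence in this paragraph (engine type, then instance, then wrapper, then dispatch) can be sketched as follows. This is a simplified illustration under stated assumptions: the `PseudoCatalog` and `Adapter` classes, the in-memory dictionaries, the "pick the first instance" policy, and the stub `fake_graph_interface` are all hypothetical; only the field names `foreign_engine_mapping` and `foreign_engine_wrapper` come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PseudoCatalog:
    # engine type -> addresses of the servers hosting instances of that engine
    foreign_engine_mapping: Dict[str, List[str]]
    # engine type -> interface used to reach an instance: (address, query) -> rows
    foreign_engine_wrapper: Dict[str, Callable[[str, str], list]]


class Adapter:
    def __init__(self, catalog, udf_engine_types):
        self.catalog = catalog
        self.udf_engine_types = udf_engine_types  # UDF name (lower case) -> engine type

    def resolve(self, udf_name):
        """Determine the engine instance and interface for a UDF."""
        engine_type = self.udf_engine_types.get(udf_name.lower())
        instances = self.catalog.foreign_engine_mapping.get(engine_type, [])
        if not instances:
            raise LookupError(f"no extended engine instance for UDF {udf_name!r}")
        # Assumed policy: use the first instance; a real adapter might balance load.
        return instances[0], self.catalog.foreign_engine_wrapper[engine_type]

    def dispatch(self, udf_name, query):
        """Send the query through the resolved interface and return its result."""
        address, interface = self.resolve(udf_name)
        return interface(address, query)


def fake_graph_interface(address, query):
    # Stub standing in for a call to a remote graph engine instance.
    return [{"cid": "1111111111", "served_by": address}]
```

For example, with a catalog that maps `"graph"` to `["10.0.0.1:8182"]` and to `fake_graph_interface`, calling `Adapter(catalog, {"gremlin": "graph"}).dispatch("Gremlin", "g.V()")` returns the stub rows tagged with that address, which the hook function would then hand back to the SQL engine.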
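Fig. 8 shows a database system, provided by an embodiment of this application, that integrates the multi-model database management system described in the above embodiments. It includes a data storage 203, a database management system 200, and a database 201 stored in the data storage 203. The database 201 contains data tables organized according to the relational model.
The client 10 establishes a communication connection with the database management system 200 through the network 30, and sends requests or queries to the database management system 200 to access and/or modify the database 201 in the data storage 203, or to import new data into the database 201. The database management system 200 performs the corresponding operations according to the received query to generate the query result, and returns the query result to the client 10.
The client 10 includes any type of device or application that is configured to interact with the database management system 200. In some examples, the client 10 includes one or more application servers. The query initiated by the client 10 is described in a specific database language. Database languages include but are not limited to: Structured Query Language (SQL), suitable for relational databases; graph query languages (such as Gremlin), suitable for graph databases; and time series languages, suitable for time series databases. In one embodiment, the query submitted by the client 10 is a fusion query composed of multiple types of query languages, for example a fusion query that includes a first type of query (such as a SQL query) and a second type of query (such as a graph query).
The database management system 200 may be the multi-model database management system described in the foregoing embodiments; for its specific working process, refer to those embodiments.
The operation of the database management system 200 depends on the necessary software and hardware environment, including but not limited to the hardware layer 251 and the operating system 255. The hardware layer 251 includes the basic hardware units required to run the operating system 255 and the database management system 200, such as processors, memory, input/output (I/O) devices, and network interface controllers (NICs). The operating system 255 is system software that manages the hardware units, and can provide functions such as memory management and thread scheduling.
The data storage 203 may be a non-transitory computer-readable storage medium such as a hard disk, a magnetic disk, a storage array, a storage server, cloud storage, or a storage area network (SAN), and is communicatively connected to the computing node where the hardware layer 251 is located. Alternatively, the data storage 203 may be integrated in the computing node where the hardware layer 251 is located, and exchange data with the processor and I/O devices through a bus or other internal communication methods. It should be noted that a "computing node" in the embodiments of this application is an entity that has the hardware resources required to perform data computation and/or storage, such as a physical machine or database server, or an entity that can call hardware resources for computation and/or storage, such as a virtual machine (VM) or container deployed in a physical machine.
In one embodiment, the functions of the database management system 200 may be implemented by a processor executing an executable program stored in the memory. It should be understood that, in the various embodiments of the present invention, "executable program" should be interpreted broadly as including but not limited to: instructions, instruction sets, code, code segments, subroutines, software modules, applications, software packages, threads, processes, functions, firmware, middleware, and so on.
Those skilled in the art will understand that a database system may include fewer or more components than those shown in Fig. 8, or components different from those shown in Fig. 8; Fig. 8 shows only the components most relevant to the implementations disclosed in the embodiments of the present invention. The sequence numbers of the method steps described in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in hardware, or in a combination of computer software and hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application.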

Claims (14)

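  1. A multi-model database management system, characterized by comprising: a main computing engine, one or more extended computing engines, and an adapter;
    the main computing engine is configured to: receive a fusion query from a client, wherein the fusion query comprises a first type of query and a second type of query; process the first type of query to obtain a first processing result; and pass the second type of query to the adapter through a first interface;
    the adapter is configured to: determine, based on metadata of the one or more extended computing engines, a first extended computing engine for processing the second type of query and a second interface corresponding to the first extended computing engine; and pass the second type of query to the first extended computing engine through the second interface;
    the first extended computing engine is configured to process the second type of query to obtain a second processing result, and return the second processing result to the main computing engine through the adapter;
    the main computing engine is further configured to generate a query result based on the first processing result and the second processing result, and return the query result to the client.
  2. The multi-model database management system according to claim 1, characterized in that the second type of query is defined through a user-defined function (UDF).
  3. The multi-model database management system according to claim 2, characterized in that the first interface comprises at least one hook function, and the at least one hook function is associated with the UDF.
  4. The multi-model database management system according to any one of claims 1 to 3, characterized in that the metadata comprises information about the extended computing engines supported by the multi-model database management system.
  5. The multi-model database management system according to claim 4, characterized in that the information about an extended computing engine comprises: the type of the extended computing engine, the address of the server where one or more instances of the extended computing engine are located, and the interface information corresponding to the extended computing engine.
  6. The multi-model database management system according to any one of claims 1 to 5, characterized in that the metadata is stored in a user table of the multi-model database management system.
  7. The multi-model database management system according to any one of claims 1 to 6, characterized in that the main computing engine is a Structured Query Language (SQL) engine, and the one or more extended computing engines comprise at least one of a graph computing engine, a time series engine, or an approximate query engine.
  8. The multi-model database management system according to claim 7, characterized in that the first type of query is a structured query statement, the second type of query is a graph query statement, and the first extended computing engine is a graph computing engine.
  9. A database server, comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, characterized in that the processor, when executing the program, implements the functions of the multi-model database management system according to any one of claims 1 to 8.
  10. A fusion query method, characterized by comprising:
    receiving a fusion query submitted by a client, wherein the fusion query comprises a first type of query and a second type of query;
    processing the first type of query by a main computing engine to obtain a first processing result;
    determining, based on metadata, a first extended computing engine for processing the second type of query and an interface corresponding to the first extended computing engine;
    passing the second type of query to the first extended computing engine through the interface;
    processing the second type of query by the first extended computing engine to obtain a second processing result;
    generating a query result based on the first processing result and the second processing result, and returning the query result to the client.
  11. The method according to claim 10, characterized in that the second type of query is defined through a user-defined function (UDF).
  12. The method according to claim 10 or 11, characterized in that the metadata comprises information about the extended computing engines supported by the multi-model database management system.
  13. The method according to any one of claims 10 to 12, characterized in that the information about an extended computing engine comprises: the type of the extended computing engine, the address of the server where one or more instances of the extended computing engine are located, and the interface information corresponding to the extended computing engine; the adapter is specifically configured to query the metadata to determine a first engine instance of the first extended computing engine and the interface corresponding to the first engine instance, and to pass the second type of query to the first engine instance for processing through the interface corresponding to the first engine instance.
  14. The method according to any one of claims 10 to 13, characterized in that the first type of query is a structured query statement, the second type of query is a graph query statement, and the first extended computing engine is a graph computing engine.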
PCT/CN2020/090393 2019-05-15 2020-05-15 Multi-language fusion query method and multi-model database system WO2020228801A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA3137857A CA3137857A1 (en) 2019-05-15 2020-05-15 Multi-language fusion query method and multi-model database system
US17/525,792 US11907216B2 (en) 2019-05-15 2021-11-12 Multi-language fusion query method and multi-model database system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910403857.0 2019-05-15
CN201910403857.0A CN111949650A (zh) 2019-05-15 2019-05-15 Multi-language fusion query method and multi-model database system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/525,792 Continuation US11907216B2 (en) 2019-05-15 2021-11-12 Multi-language fusion query method and multi-model database system

Publications (1)

Publication Number Publication Date
WO2020228801A1 true WO2020228801A1 (zh) 2020-11-19

Family

ID=73289817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090393 WO2020228801A1 (zh) 2019-05-15 2020-05-15 Multi-language fusion query method and multi-model database system

Country Status (4)

Country Link
US (1) US11907216B2 (zh)
CN (1) CN111949650A (zh)
CA (1) CA3137857A1 (zh)
WO (1) WO2020228801A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064925A (zh) * 2021-03-15 2021-07-02 深圳依时货拉拉科技有限公司 一种大数据查询方法、***及计算机可读存储介质

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210336848A1 (en) * 2020-04-27 2021-10-28 Puneet Kumar Agarwal System for networking device with data model engines for configuring network parameters
CN112925916A (zh) * 2021-01-27 2021-06-08 北京一帆新媒网络科技有限公司 基于nlp和知识图谱的机器学习训练架构设计方法
CN112783927B (zh) * 2021-01-27 2023-03-17 浪潮云信息技术股份公司 一种数据库查询方法及***
CN113761290A (zh) * 2021-03-10 2021-12-07 中科天玑数据科技股份有限公司 一种基于sql实现全文检索图数据库的查询方法及查询***
US11720570B2 (en) * 2021-03-26 2023-08-08 Thoughtspot, Inc. Aggregation operations in a distributed database
CN113312378B (zh) * 2021-07-28 2022-02-01 阿里云计算有限公司 建模方法、组件、设备、存储介质和数据仓库创建***
US20230185817A1 (en) * 2021-12-09 2023-06-15 Dell Products L.P. Multi-model and clustering database system
US20220171772A1 (en) * 2022-02-15 2022-06-02 Garner Distributed Workflow Inc. Structured query language interface for tabular abstraction of structured and unstructured data
CN115269561B (zh) * 2022-09-21 2023-01-24 国网智能电网研究院有限公司 一种混合数据库管理方法、装置、混合数据库及电子设备
CN115358729B (zh) * 2022-10-21 2023-01-13 成都戎星科技有限公司 一种卫星影像数据智能发布***
CN116483886B (zh) * 2023-04-10 2024-04-02 上海沄熹科技有限公司 结合kv存储引擎和时序存储引擎查询olap的方法
CN117290411B (zh) * 2023-11-22 2024-02-13 深圳九有数据库有限公司 一种多模数据库查询方法、装置、电子设备及存储介质
CN117632970B (zh) * 2023-12-18 2024-06-14 智人开源(北京)科技有限公司 多模融合数据库、数据库的数字孪生实体数据存储方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706810A (zh) * 2009-11-23 2010-05-12 北京中创信测科技股份有限公司 一种数据库查询方法及装置
US20160224667A1 (en) * 2015-02-04 2016-08-04 Xinyu Xingbang Information Industry Co., Ltd. Method and system of implementing an integrated interface supporting operation in multi-type databases
CN107273422A (zh) * 2017-05-17 2017-10-20 南京中孚信息技术有限公司 一种为关系型数据库扩展图计算功能的***
CN109145025A (zh) * 2018-09-14 2019-01-04 阿里巴巴集团控股有限公司 一种多数据源集成的数据查询方法、装置及业务服务器
CN109241054A (zh) * 2018-08-02 2019-01-18 成都松米科技有限公司 一种多模型数据库***、实现方法以及服务器

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101705810B (zh) 2009-12-11 2012-09-05 安东石油技术(集团)有限公司 一种存在多孔管的油气井的控流过滤器管柱分段控流方法
CN104699720A (zh) * 2013-12-10 2015-06-10 中兴通讯股份有限公司 海量数据融合存储方法及***
WO2016018201A1 (en) * 2014-07-28 2016-02-04 Hewlett-Packard Development Company, L.P. Searching relational and graph databases
US9971755B2 (en) 2015-02-03 2018-05-15 Flipboard, Inc. Selecting additional supplemental content for presentation in conjunction with a content item presented via a digital magazine
US20160253380A1 (en) * 2015-02-26 2016-09-01 Red Hat, Inc. Database query optimization
US10452975B2 (en) * 2016-06-19 2019-10-22 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
EP3475845A1 (en) * 2016-06-23 2019-05-01 Schneider Electric USA Inc. Contextual-characteristic data driven sequential federated query methods for distributed systems
CN106844545A (zh) * 2016-12-30 2017-06-13 江苏瑞中数据股份有限公司 一种基于标准sql的双引擎数据库***的实现方法
US10776189B2 (en) * 2017-12-22 2020-09-15 MuleSoft, Inc. API query
US11321330B1 (en) * 2018-06-26 2022-05-03 Amazon Technologies, Inc. Combining nested data operations for distributed query processing
US10880273B2 (en) * 2018-07-26 2020-12-29 Insight Sciences Corporation Secure electronic messaging system
US11080282B2 (en) * 2018-10-02 2021-08-03 Sap Se Complex filter query of multiple data sets
US10963518B2 (en) * 2019-02-22 2021-03-30 General Electric Company Knowledge-driven federated big data query and analytics platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, SHU: "Optimizations of Graph Queries in Relational Databases", CHINESE MASTER’S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, 15 July 2016 (2016-07-15), pages 1 - 66, XP055753560 *

Also Published As

Publication number Publication date
CA3137857A1 (en) 2020-11-19
CN111949650A (zh) 2020-11-17
US11907216B2 (en) 2024-02-20
US20220075780A1 (en) 2022-03-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20805369

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3137857

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20805369

Country of ref document: EP

Kind code of ref document: A1