CN111475534B

CN111475534B - Data query method and related equipment

Info

Publication number: CN111475534B
Application number: CN202010397694.2A
Authority: CN
Inventors: 钟舒妍; 邓范鑫
Original assignee: Beijing Aibee Technology Co Ltd
Current assignee: Beijing Aibee Technology Co Ltd
Priority date: 2020-05-12
Filing date: 2020-05-12
Publication date: 2023-04-14
Anticipated expiration: 2040-05-12
Also published as: CN111475534A

Abstract

The application discloses a data query method and related equipment. The method comprises the following steps: after a query statement input by a user is obtained, the query statement is first parsed to obtain first information and second information. The first information is information about the data source in which the query target is stored, and includes third information and fourth information; the third information represents the data type of a target data set, the fourth information represents the storage identifier of the target data set in the data source, and the target data set is the data set required for query processing of the query target; the second information represents a feature identifier of the query target. Then, a query action is determined according to the first information, and the target data set is determined from a data pool according to the first information; the data pool comprises N data sources; each data source includes at least one data set, and the data sets stored in different data sources have different data types. Finally, the query target is determined using the query action and the target data set, which improves query efficiency.

Description

Data query method and related equipment

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data query method and related devices.

Background

With the proliferation of data, the data types of data sets (e.g., tables, texts, documents, knowledge graphs, etc.) used to record data have also diversified. For example, the data types of a data set may include structured data (e.g., tables) as well as unstructured data (e.g., documents or knowledge graphs).

At present, when a technician needs to query data, the technician must first determine a target query language according to the data type of the target data set (that is, the data set from which the data to be queried is determined). The target query language is the query language needed to process the target data set, such as an object-oriented programming language or the Structured Query Language (SQL). The technician then uses the target query language to perform query processing on the target data set and determine the data to be queried. For example, when the target data set is structured data, the technician may first write an SQL query statement and then use it to query the database. The data query process is therefore complicated, because technicians must use different query languages for data sets of different data types.

Disclosure of Invention

In order to solve the technical problems in the prior art, the present application provides a data query method and related equipment, so that technicians do not need to use different query languages for data sets of different data types, the data query process is simplified, and the data query efficiency is improved.

In order to achieve the above object, the embodiments of the present application provide the following technical solutions:

the embodiment of the application provides a data query method, which comprises the following steps:

acquiring a query statement input by a user; the query statement carries information required for querying a query target;

analyzing the query statement to obtain first information and second information; wherein the first information comprises third information and fourth information; the third information represents the data type of a target data set, the fourth information represents the storage identifier of the target data set in a data source, and the target data set is a data set required for query processing of the query target; the second information represents the characteristic identification of the query target;

determining a query action according to the third information, and determining the target data set from a data pool according to the first information; the data pool comprises N data sources, wherein N is a positive integer; the data sources comprise at least one data set, and the data types of the data sets stored in different data sources are different;

determining the query target using the query action and the target dataset.

Optionally, the determining a query action according to the third information specifically includes:

determining the query action according to the third information and the first mapping relation; and the first mapping relation is used for recording query actions corresponding to data sets of different data types.

Optionally, the analyzing the query statement to obtain the first information and the second information specifically includes:

analyzing the query statement to obtain first information, second information and data operation information;

the determining a query action according to the third information specifically includes:

and generating a query action according to the third information and the data operation information.

Optionally, the generating a query action according to the third information and the data operation information specifically includes:

determining an initial action according to the third information and a second mapping relation; the second mapping relation is used for recording query actions corresponding to data sets of different data types;

and generating a query action according to the initial action and the data operation information.

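Optionally, the analyzing the query statement to obtain the first information and the second information specifically includes:

analyzing the query statement to obtain first information, second information and fifth information; wherein the fifth information is attribute description information of the query target in the target data set;

the determining a query action according to the third information specifically includes:

generating a query action according to the third information and the fifth information.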

Optionally, the determining the target data set from the data pool according to the first information specifically includes:

determining a target data source from the data pool according to the third information;

and determining a target data set from the target data source according to the fourth information.

Optionally, the analyzing the query statement to obtain the first information and the second information specifically includes:

identifying the programming paradigm type used by the query statement, and determining that programming paradigm type as a target programming paradigm type;

and analyzing the query statement according to the target programming paradigm type to obtain first information and second information.

An embodiment of the present application further provides a data query device, including:

an acquisition unit, configured to acquire a query statement input by a user; the query statement carries information required for querying a query target;

an analysis unit, configured to analyze the query statement to obtain first information and second information; wherein the first information comprises third information and fourth information; the third information represents the data type of a target data set, the fourth information represents the storage identifier of the target data set in a data source, and the target data set is a data set required for query processing of the query target; the second information represents the characteristic identification of the query target;

a first determining unit, configured to determine a query action according to the third information, and determine the target data set from a data pool according to the first information; the data pool comprises N data sources, wherein N is a positive integer; the data sources comprise at least one data set, and the data types of the data sets stored in different data sources are different;

a second determining unit for determining the query target using the query action and the target dataset.

An embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:

the memory is used for storing a computer program;

the processor is configured to execute any implementation manner of the data query method provided by the embodiment of the application according to the computer program.

The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a computer program, and the computer program is used for executing any implementation manner of the data query method provided in the embodiment of the present application.

Compared with the prior art, the embodiment of the application has at least the following advantages:

In the data query method provided by the embodiments of the present application, after the query statement input by the user is obtained, the query statement is first parsed to obtain the first information and the second information. The first information comprises third information and fourth information; the third information represents the data type of the target data set, the fourth information represents the storage identifier of the target data set in the data source, and the target data set is the data set required for query processing of the query target; the second information represents a feature identifier of the query target. Then, a query action is determined according to the first information, and the target data set is determined from the data pool according to the first information; the data pool comprises N data sources; each data source includes at least one data set, and the data sets stored in different data sources have different data types. Finally, the query target is determined using the query action and the target data set.

It can be seen that the query statement input by the user carries the information required for querying the query target (for example, the data type of the target data set, the storage identifier of the target data set, and the feature identifier of the query target). Therefore, after the first information and the second information are obtained by parsing the query statement, the query action and the target data set used for querying the query target can be determined directly from the first information and the second information, and the query target is then determined from the target data set by using the query action. In this way, data sets of different data types can be queried with a single query statement input by the user, technicians no longer need to use different query languages for data sets of different data types, the data query process is simplified, and the data query efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the description below are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flowchart of a data query method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a data pool provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of a data query provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a syntax structure of an MQL statement provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a graph provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of a data set for storing RDF data provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of a data source for storing document data provided in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a data query device provided in an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus provided in an embodiment of the present application.

Detailed Description

In studying conventional data query, the inventors found that it has the following defects: (1) In a conventional data query process, data sets of different data types must be queried with different query languages, so technicians need to know several types of query languages, which raises the technical threshold for technicians. (2) In a conventional data query process, one query language is suitable for querying data sets of only one data type (e.g., SQL is suitable only for querying structured data such as tables). However, many data query processes involve data sets of multiple data types; in this case a technician has to alternate between several different query languages, which complicates the query process and lowers query efficiency. (3) In some complex query scenarios, dependency relationships often exist between the query tasks of different data sets, so a single language cannot meet the requirements of these scenarios.

In order to solve the above technical problems, an embodiment of the present application provides a data query method, including: after the query statement input by the user is obtained, the query statement is first parsed to obtain first information and second information. The first information comprises third information and fourth information; the third information represents the data type of the target data set, the fourth information represents the storage identifier of the target data set in the data source, and the target data set is the data set required for query processing of the query target; the second information represents a feature identifier of the query target. Then, a query action is determined according to the first information, and the target data set is determined from the data pool according to the first information; the data pool comprises N data sources; each data source includes at least one data set, and the data sets stored in different data sources have different data types. Finally, the query target is determined using the query action and the target data set.

It can be seen that the query statement input by the user carries the information required for querying the query target (for example, the data type of the target data set, the storage identifier of the target data set, and the feature identifier of the query target). Therefore, after the first information and the second information are obtained by parsing the query statement, the query action and the target data set used for querying the query target can be determined directly from the first information and the second information, and the query target is then determined from the target data set by using the query action. In this way, data sets of different data types can be queried based on a single query statement input by the user, the defects of conventional data query are overcome, the data query process is simplified, and the data query efficiency is improved.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

Method embodiment

Referring to fig. 1, the figure is a flowchart of a data query method provided in an embodiment of the present application.

The data query method provided by the embodiment of the application comprises the following steps of S1-S5:

S1: Acquire a query statement input by a user.

The query statement refers to an instruction statement input by a user for performing data query; and the query statement carries information required for querying the query target. The query target refers to a query result determined by using the query statement. For example, when a user queries "sum of 3 and 2" using a query statement, then the query target is "5" (i.e., the sum of 3 and 2).

It should be noted that the query target is not limited in the embodiments of the present application, for example, the query target may include at least one of data existing in a table, data calculated by using the data in the table, entities and/or relationships in a graph, characters recorded in a document, text information (e.g., semantic information, translation information, subject information, etc.) processed by using the characters recorded in the document, data existing in stream data, and data information mined from the stream data.

It should be further noted that the syntax structure of the query statement is not limited in the embodiments of the present application; for example, the query statement may adopt the syntax structure of the MQL statement provided in the Grammar structure embodiment below. That is, the query statement may be an MQL statement.

In addition, the information carried by the query statement is not limited in the embodiments of the present application. For ease of understanding and explanation, three cases are described below.

In the first case, the query statement carries the first information and the second information. The first information represents information of a data source storing the query target (namely, storage information of data required when the query target is determined); the second information represents the feature identifier (such as name, count, etc.) of the query target. It should be noted that the feature identifier of the query target is not limited in the embodiment of the present application, for example, the feature identifier of the query target may be a name identifier (e.g., an attribute name such as name, count, etc.).

In addition, the first information includes third information and fourth information. The third information represents the data type (e.g., table) of the target data set, the fourth information represents the storage identifier (e.g., web_data.websites) of the target data set in the data source, and the target data set is the data set required for query processing of the query target. Each data source comprises at least one data set (in particular, a plurality of data sets of the same data type), and the data sets stored in different data sources have different data types. The data sources together form a data pool, so the data pool comprises N data sources, where N is a positive integer. For example, the data pool shown in FIG. 2 includes a first data source and a second data source. The first data source may comprise a relational database (e.g., an SQL database) as shown in FIG. 3 and is used to store structured data. The second data source may comprise a graph database and/or the Hadoop Distributed File System (HDFS) as shown in FIG. 3 and is used to store unstructured data.

It should be noted that the data type of the data set is not limited in the embodiments of the present application; for example, the data type of the data set may be structured data (e.g., table data) or unstructured data (e.g., graph data, stream data, Resource Description Framework (RDF) data, and document data).
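The data pool organization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class names `DataSource` and `DataPool`, the method `find_data_set`, and the sample identifiers and rows are all hypothetical, and the sketch assumes exactly one data source per data type.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One data source: holds data sets that all share a single data type."""
    data_type: str                                 # e.g. "table" or "graph"
    data_sets: dict = field(default_factory=dict)  # storage identifier -> data set


class DataPool:
    """A pool of N data sources, keyed by data type (assumed: one source per type)."""

    def __init__(self, sources):
        self._sources = {}
        for source in sources:
            if source.data_type in self._sources:
                raise ValueError(f"duplicate data type: {source.data_type}")
            self._sources[source.data_type] = source

    def find_data_set(self, data_type, storage_id):
        """Pick the source by data type, then the data set by storage identifier."""
        source = self._sources[data_type]
        return source.data_sets[storage_id]


# Hypothetical sample content for a pool with N = 2 data sources.
pool = DataPool([
    DataSource("table", {"web_data.websites": [{"name": "example", "count": 3}]}),
    DataSource("graph", {"kg.people": {"nodes": ["alice"], "edges": []}}),
])
target = pool.find_data_set("table", "web_data.websites")
```

Keying the pool by data type mirrors the statement that different data sources store data sets of different data types, so the third information alone is enough to select a source and the fourth information then selects the data set inside it.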

Based on the first condition, the query statement may carry the data type of the target data set, the storage identifier of the target data set in the data source, and the feature identifier of the query target, so that the query target can be queried subsequently based on the information carried in the query statement.

In the second case, the query statement carries data operation information in addition to the first information and the second information. The data operation information refers to related information of part or all of data operations required to be used when querying a query target. It should be noted that the content of the data operation information is not limited in the embodiments of the present application, and in one possible implementation, the data operation information includes a data operation and/or a constraint condition of the data operation. For example, the data operation information may be "query person. Where "query" is a data operation and "person.

In the third case, the query statement carries fifth information in addition to part or all of the above information. The fifth information is attribute description information of the query target in the target data set. For example, when the target data set is a graph and the query target is an entity in the graph, the fifth information is entity description information.

Based on the above, in the embodiment of the application, when a user (especially a technician) needs to determine a query target from a data pool, the user may input a query statement carrying information required for querying the query target, so that the query target can be determined by performing data query processing from the data pool based on the query statement in the following step.
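The three cases above can be summarized as one record whose optional fields are filled only when the statement carries them. The following Python sketch is illustrative only; the type names `FirstInformation` and `ParsedQuery`, the field names, and the sample values are assumptions introduced here and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInformation:
    data_type: str   # third information: data type of the target data set
    storage_id: str  # fourth information: storage identifier in the data source


@dataclass(frozen=True)
class ParsedQuery:
    first: FirstInformation                      # first information
    feature_id: str                              # second information
    data_operation: Optional[str] = None         # carried in the second case
    attribute_description: Optional[str] = None  # fifth information, third case

    def carried_fields(self):
        """List which kinds of information the statement carried."""
        names = ["first", "second"]
        if self.data_operation is not None:
            names.append("data_operation")
        if self.attribute_description is not None:
            names.append("fifth")
        return names


# First case: only the first and second information are present.
basic = ParsedQuery(FirstInformation("table", "web_data.websites"), "name")
# Third case combined with the second: everything is present (values are made up).
full = ParsedQuery(FirstInformation("graph", "kg.people"), "name",
                   data_operation="filter", attribute_description="person entity")
```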

S2: Parse the query statement to obtain the first information and the second information.

The embodiments of the present application do not limit the parsing process; it can be any process that extracts, from the query statement, the information needed to query the query target. For example, the parsing process may specifically be: performing lexical and syntactic analysis on the query statement to generate a syntax tree, and identifying a prefix declaration, keywords (SELECT, FROM, and WHERE), an expression, a data source, and a query target from the syntax tree.
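As a rough illustration of extracting this information from a statement, the Python sketch below parses a hypothetical SQL-like statement of the form `SELECT <target> FROM <type>:<storage id> [WHERE <condition>]`. This form is an assumption made for the example and is not the MQL grammar of the application; the sketch also uses a single regular expression in place of real lexical and syntactic analysis with a syntax tree, and the function name `parse_query` and the result keys are hypothetical.

```python
import re

# Hypothetical statement shape: SELECT <target> FROM <type>:<storage id> [WHERE <condition>]
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<target>\S+)"
    r"\s+FROM\s+(?P<data_type>\w+):(?P<storage_id>[\w.]+)"
    r"(?:\s+WHERE\s+(?P<condition>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_query(statement):
    """Extract the pieces of information carried by one query statement."""
    match = _PATTERN.match(statement)
    if match is None:
        raise ValueError(f"unsupported query statement: {statement!r}")
    return {
        "second_information": match.group("target"),            # feature identifier
        "third_information": match.group("data_type").lower(),  # data type
        "fourth_information": match.group("storage_id"),        # storage identifier
        "data_operation": match.group("condition"),  # None when WHERE is absent
    }


parsed = parse_query("SELECT name FROM table:web_data.websites WHERE count > 1")
```

A production parser would build a syntax tree so that nested expressions and prefix declarations can be handled; the regular expression is only enough to show which fields end up in which kind of information.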

In some cases, the query statement may be parsed according to information carried by the query statement, and based on this, some possible implementations of S2 are also provided, which are described in turn below.

In a first possible implementation manner, if the query statement carries the first information and the second information, S2 is specifically: parsing the query statement to obtain the first information and the second information. That is, when the query statement carries the first information and the second information, only these two kinds of information need to be parsed from the query statement.

In a second possible implementation manner, if the query statement carries the first information, the second information, and the data operation information, S2 is specifically: parsing the query statement to obtain the first information, the second information, and the data operation information. That is, when the query statement carries these three kinds of information, only these three kinds of information need to be parsed from the query statement.

In a third possible implementation manner, if the query statement carries the first information, the second information, and the fifth information, S2 is specifically: parsing the query statement to obtain the first information, the second information, and the fifth information. That is, when the query statement carries these three kinds of information, only these three kinds of information need to be parsed from the query statement.

In a fourth possible implementation manner, if the query statement carries the first information, the second information, the data operation information, and the fifth information, S2 is specifically: parsing the query statement to obtain the first information, the second information, the data operation information, and the fifth information. That is, when the query statement carries these four kinds of information, only these four kinds of information need to be parsed from the query statement.

Based on the four possible implementation manners of S2, in the embodiment of the present application, useful information carried in the query statement (that is, information required when the query is performed on the query target) may be correspondingly analyzed, so as to obtain various information used when the query is performed on the query target.

In addition, the embodiments of the present application do not limit the programming paradigm types supported by the query language; for example, an object-oriented programming paradigm, a functional programming paradigm, and an SQL-like programming paradigm may be supported. In the object-oriented programming paradigm, the value to be queried is regarded as an object whose data type and behavior are defined by a class; the class contains the corresponding data operation methods and necessary attributes, and data query and processing are performed by calling the methods of the class. In the functional programming paradigm, an operation or query process is written as an expression inside a function, and the function value is the result returned after the expression is evaluated; a function can be expressed on its own or nested inside a higher-order function. In the SQL-like programming paradigm, the user inputs an SQL-like query statement containing a data source, keywords and a query body; the statement can call existing functions to perform simple operations, and can also perform somewhat more complex data processing through user-defined methods. SQL-like programming is better suited to structured data queries.

Because query statements written in different programming paradigms have different characteristics, they should be parsed with different parsing methods. Therefore, after the query statement is obtained, the programming paradigm type used by the query statement can be determined first, and the query statement can then be parsed based on the determined programming paradigm type. Based on this, the embodiments of the present application also provide another implementation manner of S2, in which S2 may specifically be: identifying the programming paradigm type used by the query statement, and determining that programming paradigm type as a target programming paradigm type; and parsing the query statement according to the target programming paradigm type to obtain the first information and the second information.

In the embodiments of the present application, after the query statement input by the user is obtained, the programming paradigm type used by the query statement is identified according to the structural features of the statement and taken as the target programming paradigm type; the query statement is then parsed according to the target programming paradigm type to obtain at least two of the first information, the second information, the data operation information and the fifth information. Therefore, a user can write the query statement in any supported programming paradigm according to personal habits or business requirements, and the query statement is parsed according to the programming paradigm type actually used, which improves parsing accuracy. It can be seen that a user (especially a technician) needs to understand only one programming paradigm to query both structured and unstructured data, which effectively lowers the technical threshold for technicians.
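One way to picture identification by statement structure followed by paradigm-specific parsing is the Python sketch below. The structural rules used here (a leading SELECT keyword, a method call on a receiver, a bare function call) are assumptions chosen for illustration; the application does not specify which structural features are used, and the function names and sample statements are hypothetical.

```python
import re


def identify_paradigm(statement):
    """Guess the programming paradigm type from structural features of the statement."""
    text = statement.strip()
    # A leading SELECT keyword is treated as SQL-like.
    if re.match(r"(?i)^select\b", text):
        return "sql_like"
    # A call on a receiver such as websites.filter(...) is treated as object-oriented.
    if re.match(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+\(", text):
        return "object_oriented"
    # A bare function call such as filter(...) is treated as functional.
    if re.match(r"^[A-Za-z_]\w*\(", text):
        return "functional"
    raise ValueError(f"cannot identify paradigm: {statement!r}")


def parse_with_paradigm(statement, parsers):
    """Dispatch to the parser registered for the identified paradigm type."""
    paradigm = identify_paradigm(statement)
    return paradigm, parsers[paradigm](statement)


# Placeholder parsers: each one would extract the first and second information.
parsers = {
    "sql_like": lambda s: ("parsed as SQL-like", s),
    "object_oriented": lambda s: ("parsed as object-oriented", s),
    "functional": lambda s: ("parsed as functional", s),
}
paradigm, result = parse_with_paradigm("websites.filter(count > 1)", parsers)
```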

Based on the related content of S2, in the embodiment of the present application, after the query statement input by the user is obtained, the query statement may be analyzed, so as to obtain various information required when querying the query target.

S3: Determine the query action according to the third information.

The query action refers to a data operation used when a query target is queried from a data pool. In addition, the query action is not limited in the embodiments of the present application, for example, if the query target is related to data in the table (for example, the query target is data existing in the table or data calculated by using data in the table), the query action may include a table query processing action; if the query target is related to data in the graph (e.g., entities and/or relationships in the graph), the query action may include a graph query processing action; if the query target is related to data in the document (such as characters recorded in the document or text information processed by using the characters recorded in the document), the query action may include a document query processing action; the query action may include a streaming data query processing action if the query target is related to data in the streaming data (e.g., data present in the streaming data, or data information mined from the streaming data).

The composition of the query action is also not limited in the embodiments of the present application. For example, the query action may include at least one calculation function, and the at least one calculation function may include a calculation function from a conventional calculation function set and/or a calculation function from a preset function library. The conventional calculation functions provide arithmetic operators and logical operators to support simple logical and mathematical operations, and also support dictionary, list, and set comprehensions and iterative expressions. The preset function library may be a standard library and/or a third-party library and provides mathematical function support for complex operations; during programming, the standard library can be called to implement functions such as database access, text processing, image processing, and XML processing, and functions in a third-party library can be used for scientific computing, such as matrix calculation, linear algebra, data modeling, and data visualization.

The query action may be determined based on the data type of the target dataset (i.e., the dataset needed for query processing for the query target).

In addition, the embodiments of the present application do not limit the manner of determining the query action, which is described below with reference to several possible implementations of S3.

In a first possible implementation, S3 may specifically be: determining the query action according to the third information and a first mapping relation, where the first mapping relation records the query actions corresponding to data sets of different data types.

Based on the first possible implementation manner, if the query action of the data sets with different data types is recorded by using the first mapping relationship in advance, after the third information (that is, the data type of the data set required for performing query processing on the query target) is analyzed from the query statement, the query action corresponding to the third information may be determined from the first mapping relationship.
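The lookup described above amounts to a table from data type to query action. The Python sketch below shows one possible shape of such a first mapping relation; the names `FIRST_MAPPING`, `query_table`, `query_graph` and `determine_query_action`, and the layout of the sample data sets, are hypothetical and introduced only for illustration.

```python
def query_table(rows, feature_id):
    """Table query action: read one column from every row."""
    return [row[feature_id] for row in rows]


def query_graph(graph, feature_id):
    """Graph query action: read one attribute from every node that has it."""
    return [node[feature_id] for node in graph["nodes"] if feature_id in node]


# First mapping relation: data type of the data set -> query action.
FIRST_MAPPING = {
    "table": query_table,
    "graph": query_graph,
}


def determine_query_action(data_type):
    """Look up the query action recorded for the given data type."""
    try:
        return FIRST_MAPPING[data_type]
    except KeyError:
        raise ValueError(f"no query action recorded for {data_type!r}") from None


action = determine_query_action("table")
names = action([{"name": "a"}, {"name": "b"}], "name")
```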

In a second possible implementation, when the query statement carries the third information and the data operation information, S3 may specifically be: generating a query action according to the third information and the data operation information.

The embodiment of the present application does not limit a specific implementation manner of generating the query action based on the third information and the data operation information. In a possible embodiment, S3 may specifically be: determining an initial action according to the third information and the second mapping relation; and generating a query action according to the initial action and the data operation information. Wherein the second mapping relation is used for recording the query action of the data sets of different data types.

Based on the related content of the second possible implementation manner of the S3, when the query statement carries the third information and the data operation information, the query action may be generated based on the third information and the data operation information, so that the determined query action meets the data type of the target data set carried in the query statement and the query requirement specified by the data operation information.
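One way to picture the second possible implementation is the following sketch, which assumes that the second mapping relation yields an initial action per data type and that the data operation information has already been turned into a row predicate. The names scan_all, SECOND_MAPPING, and generate_query_action are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Action = Callable[[List[Row]], List[Row]]

def scan_all(rows: List[Row]) -> List[Row]:
    """Initial action for tabular data: return every row unchanged."""
    return list(rows)

# Assumed form of the second mapping relation: data type -> initial action.
SECOND_MAPPING: Dict[str, Action] = {"table": scan_all}

def generate_query_action(data_type: str,
                          constraint: Optional[Callable[[Row], bool]]) -> Action:
    """Combine the initial action with the data operation constraint, if any."""
    initial = SECOND_MAPPING[data_type]
    if constraint is None:
        # No data operation information: the initial action is the query action.
        return initial

    def action(rows: List[Row]) -> List[Row]:
        return [row for row in initial(rows) if constraint(row)]

    return action
```

For example, generate_query_action("table", lambda r: r["country"] == "CN") returns an action that keeps only the rows whose country field equals "CN".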

In a third possible implementation, when the query statement carries the third information and the fifth information, S3 may specifically be: generating a query action according to the third information and the fifth information.

Based on the related content of the third possible implementation of S3, when the query statement carries the third information and the fifth information, the query action may be generated based on the third information and the fifth information, so that the determined query action matches the data type of the target data set carried in the query statement and meets the query requirement specified by the fifth information.

In a fourth possible implementation, S3 may specifically be: determining at least one group of candidate actions according to the third information; and determining, as the query action, the group of candidate actions that meets a preset condition among the at least one group of candidate actions. The preset condition is set in advance and is not limited in the embodiment of the present application; for example, the preset condition may be to select the group of actions that takes the shortest time.

Based on the related content of the fourth possible implementation of S3, after multiple groups of candidate actions are determined according to the third information, the group of candidate actions that meets the preset condition may be selected from them as the query action, so that the finally determined query action is the best of the candidates under that condition (for example, the fastest one).
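The selection step can be sketched as follows, assuming the preset condition is "shortest time" and that each group of candidate actions comes with a time estimate. The CandidateGroup type and the estimated_seconds field are hypothetical; the embodiment does not say how the time is obtained.

```python
from typing import List, NamedTuple

class CandidateGroup(NamedTuple):
    actions: List[str]        # names of the actions in this group
    estimated_seconds: float  # hypothetical time estimate for the group

def select_query_action(candidates: List[CandidateGroup]) -> CandidateGroup:
    """Pick the group of candidate actions with the shortest estimated time."""
    if not candidates:
        raise ValueError("at least one group of candidate actions is required")
    return min(candidates, key=lambda group: group.estimated_seconds)
```

Any other preset condition could be expressed by swapping the key function passed to min.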

Based on the above-mentioned related content of S3, in the embodiment of the present application, after the third information is extracted from the query statement, the query action may be determined by using the third information, so that the query of the query target can be performed in the data pool based on the query action in the following.

S4: a target data set is determined from the data pool based on the first information.

In this embodiment of the application, after the first information is obtained, a target data set may be determined from the data pool according to the first information, which may specifically be: determining a target data source from the data pool according to the third information; and determining the target data set from the target data source according to the fourth information. As can be seen, after the first information is parsed from the query statement, the data source in the data pool that corresponds to the data type of the target data set recorded in the first information (that is, the third information) may be determined as the target data source; the data set in the target data source that corresponds to the storage identifier recorded in the first information (that is, the fourth information) is then determined as the target data set.
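The two-level lookup of S4 can be illustrated with a short sketch. It assumes the data pool is a dictionary from data type to data source and that each data source is a dictionary from storage identifier to data set; DATA_POOL, its contents, and determine_target_dataset are hypothetical.

```python
from typing import Any, Dict

DataSource = Dict[str, Any]  # storage identifier -> data set

# Illustrative data pool: one data source per data type (contents are made up).
DATA_POOL: Dict[str, DataSource] = {
    "table": {"web_data.Websites": [{"name": "A", "country": "CN"}]},
    "doc": {"doc_set": [{"name": "Forrest Gump", "score": 4.8}]},
}

def determine_target_dataset(pool: Dict[str, DataSource],
                             data_type: str,
                             storage_id: str) -> Any:
    """Resolve the target data source by type, then the data set by identifier."""
    if data_type not in pool:
        raise LookupError(f"no data source for data type {data_type!r}")
    source = pool[data_type]           # third information -> target data source
    if storage_id not in source:
        raise LookupError(f"no data set {storage_id!r} in the {data_type!r} source")
    return source[storage_id]          # fourth information -> target data set
```

Because data sets of different data types are stored in different data sources, the data type alone is enough to pick the data source in this sketch.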

It should be noted that the target data set is not limited by the embodiments of the present application, and for example, the target data set may include at least one of a table, a map, stream data, and a document.

It should be noted that the embodiment of the present application does not limit the execution order of S3 and S4. For example, S3 and S4 may be performed sequentially, S4 and S3 may be performed sequentially, and S3 and S4 may be performed simultaneously.

S5: a query target is determined using the query action and the target data set.

In the embodiment of the application, after the query action and the target data set are obtained, data query can be performed from the target data set by using the query action, and a query target is determined.

Based on the relevant contents of S1 to S5, in the data query method provided in the embodiment of the present application, after the query statement input by the user is obtained, the query statement is first parsed to obtain the first information and the second information. The first information is information on the data source in which the query target is stored; the first information comprises third information and fourth information; the third information represents the data type of the target data set, the fourth information represents the storage identifier of the target data set in the data source, and the target data set is the data set required for query processing of the query target; the second information represents the feature identifier of the query target. Then, a query action is determined according to the first information, and a target data set is determined from the data pool according to the first information; the data pool comprises N data sources; each data source includes at least one data set, and the data types of the data sets stored in different data sources are different. Finally, the query target is determined using the query action and the target data set.

It can be seen that, because the query statement input by the user carries information (for example, a data type of the target data set, a storage identifier of the target data set, and feature identifier information of the query target) required for querying the query target, after the first information and the second information are obtained by parsing the query statement, the query action and the target data set used for querying the query target can be directly determined by using the first information and the second information, and the query target is determined from the target data set by using the query action, so that the purpose of performing data query (as shown in fig. 3) on data sets of different data types based on one query statement input by the user is achieved, the disadvantages of conventional data query are overcome, the data query process is simplified, and the data query efficiency is improved.

In addition, embodiments of the present application further provide a Multi-modal Query Language (MQL) statement (i.e., the query statement above) that can be applied to the above data query method. The MQL statement is explained below with reference to the syntax structure example.

Example of syntactic Structure

Based on the above, the MQL statement provided in the embodiment of the present application can support various types of programming paradigms, so the embodiment of the present application does not limit the types of the programming paradigms supported by the MQL statement. For convenience of explaining the syntax structure of MQL, an MQL statement using a syntax similar to SQL will be described as an example.

Referring to fig. 4, this figure is a schematic diagram of a syntax structure of an MQL statement provided in the embodiment of the present application.

In a possible implementation, as shown in fig. 4, the MQL statement provided in the embodiment of the present application may be a multi-modal fused query statement. Its syntax structure is similar to that of an SQL statement and is case-insensitive. It supports user update, query, and command operations, and includes rich function packages and mllibs as well as user-defined parameters, files, and functions.

In addition, as shown in fig. 4, the MQL statement includes a prefix declaration section, a query target information section, a data set storage information section, and a data manipulation information section. To facilitate understanding of the MQL statements, the above parts are described below in connection with table 1, respectively.

The prefix declaration part comprises a type declaration and an attribute declaration; wherein, the type declaration is used for declaring the data type of the target data set carried by the MQL statement. The attribute declaration is used to declare attribute description information that a query target carried by the MQL statement has in a target dataset (e.g., the attribute declaration may be an entity and/or a relationship in the graph). Based on this, when the query statement input by the user is an MQL statement, the third information above and the fifth information above may be parsed from the prefix declaration section of the query statement.

The query target information part is used for pointing out characteristic identification information (such as attribute identification in a table, entity identification in a graph, relation identification in the graph and the like) of a query target carried by the MQL statement. Based on this, when the query statement input by the user is an MQL statement, the above second information can be parsed from the query target information portion of the query statement.

The data set storage information part is used for pointing out the storage identification information of the target data set carried by the MQL statement in the data source. In this way, when the query statement input by the user is an MQL statement, the fourth information above can be parsed from the data set storage information part of the query statement.

The data operation information part is used for pointing out data operation related information carried by the MQL statement, and the data operation information part comprises data operation identification information and data operation constraint information. And the data operation identification information is used for uniquely identifying the data operation. The data operation constraint information refers to constraint condition information to which the data operation should comply. Based on this, when the query statement input by the user is an MQL statement, the above data operation information can be parsed from the data operation information part of the query statement.

It should be noted that the attribute declaration in the prefix declaration section is an optional parameter, that is, there may be no attribute declaration in some MQL statements, and there may be attribute declaration in other MQL statements. Similarly, the data operation information part is an optional part, that is, there may be no data operation information part in some MQL statements, and there may be a data operation information part in other MQL statements.

TABLE 1

In addition, in order to facilitate understanding of the syntax structure of the MQL statement shown in fig. 4, the following description is given by taking the query syntax of data sets of different data types as an example.

(1) MQL statement introduction to structured data (e.g., table data).

The data characteristics of the structured data are: the database for storing structured data (i.e., the above relational database) may contain a plurality of tables, and each table is a data structure in a two-dimensional form, with one row of data representing one entity information in units of rows.

The query paradigm of the MQL statement for structured data is structured as follows: the data type table is declared in advance in the query statement; in the query statement, database represents the database name, tablename represents the table name, select_list is the query target, and expressions are the possible constraints (i.e., the data operation information). That is, the syntax structure of the MQL statement for structured data is specifically as follows:

# data type declaration

PREFIX table

# query statement

SELECT <select_list> FROM <database.tablename> WHERE <expressions>

The syntax structure of the MQL statement for structured data described above is explained below with reference to specific examples.

For example, to select the name and country columns from the Websites table of the web_data database (i.e., one of the data sources in the data pool) and store the query result in a result table, the user may enter the following query statement:

PREFIX table

SELECT name, country FROM web_data.Websites
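How the first information and the second information might be parsed out of such a statement can be sketched as follows. This is only an illustration under strong assumptions: it handles a single type declaration followed by one SELECT ... FROM ... [WHERE ...] clause, it ignores attribute declarations, and the ParsedMQL type and parse_mql function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ParsedMQL:
    data_type: str          # third information (type declaration)
    select_list: List[str]  # second information (feature identifiers)
    storage_id: str         # fourth information (storage identifier)
    where: Optional[str]    # data operation constraint, if present

# Keywords are matched case-insensitively, mirroring the case-insensitive syntax.
_PATTERN = re.compile(
    r"^\s*PREFIX\s+(?P<type>\w+)\s+"
    r"SELECT\s+(?P<select>.+?)\s+"
    r"FROM\s+(?P<source>[\w.]+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*$",
    re.IGNORECASE | re.DOTALL,
)

def parse_mql(statement: str) -> ParsedMQL:
    """Split a simple MQL statement into its declared parts."""
    match = _PATTERN.match(statement)
    if match is None:
        raise ValueError("statement does not match the simple PREFIX/SELECT/FROM form")
    return ParsedMQL(
        data_type=match["type"].lower(),
        select_list=[item.strip() for item in match["select"].split(",")],
        storage_id=match["source"],
        where=match["where"],
    )
```

Applied to the statement above, the sketch yields the data type table, the select list name and country, and the storage identifier web_data.Websites, with no constraint.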

(2) MQL statement introduction for unstructured data (such as graph data, stream data, RDF data, document data, or mixed data).

(1) Graph data query

The data characteristics of graph data are as follows: there is only one graph in the database for storing the graph data (i.e., the graph database above), and the graph is composed of nodes and edges. The nodes carry the variables, attributes, and labels of entities, and the edges represent relationship types, relationship attributes, and directions.

The query paradigm of the MQL statement for graph data is structured as follows: the data type graph is declared in advance in the query statement, and then a restriction statement on the entity and relationship attributes is defined as a supplementary data definition for the FROM clause. That is, the syntax structure of the MQL statement for graph data is specifically as follows:

# data type declaration

PREFIX graph

# Attribute declaration

[PREFIX entity:<expression>

PREFIX relation:<expression>]

# query statement

SELECT <select_list> [FROM entity|relation] WHERE <expressions>

The syntax structure of the MQL statement for graph data described above is explained below with reference to specific examples.

For example, when there is a data source in a data pool that includes the graph shown in fig. 5, the data pool may be queried for entities in fig. 5 (i.e., nodes in the graph) and relationships between entities (i.e., edges in the graph), and the query contents are as follows:

As an example of a node query in the graph: to search the graph shown in fig. 5 for a node that has a relationship with a node labeled movie and whose ID is 1, the user may declare the data type in the query statement, declare the relationship that the node satisfies, and call the id function in the WHERE clause. The user may therefore input the following query statement:

PREFIX graph

PREFIX relation:{(n)—(movie)}

SELECT n FROM relation WHERE id(n) = 1

As an example of a relationship query in the graph: to find the relationship between Tom Hanks and the movie Forrest Gump in the graph shown in fig. 5, the user may input the following query statement:

PREFIX graph

PREFIX relation:{(person)—[r]->(movie)}

SELECT r, type(r) FROM relation WHERE person.name = 'Tom Hanks' and movie.name = 'Forrest Gump'
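The effect of this relationship query can be pictured with a small sketch that stores the graph as a list of directed edges. The edge list, the relationship type names ACTED_IN and DIRECTED, and the relation_types function are all assumptions for illustration; the actual content of fig. 5 is not reproduced here.

```python
from typing import List, NamedTuple

class Edge(NamedTuple):
    source: str    # name of the person node
    rel_type: str  # relationship type carried by the edge
    target: str    # name of the movie node

# Illustrative stand-in for the graph of fig. 5.
GRAPH: List[Edge] = [
    Edge("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    Edge("Robert Zemeckis", "DIRECTED", "Forrest Gump"),
]

def relation_types(edges: List[Edge], person: str, movie: str) -> List[str]:
    """Return the types of all edges leading from the person to the movie."""
    return [edge.rel_type for edge in edges
            if edge.source == person and edge.target == movie]
```

Here the two equality checks play the role of the WHERE constraints on person.name and movie.name, and the returned list corresponds to type(r).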

(2) streaming data queries

The data characteristics of the stream data are similar to the relational data, and the stream data refers to real-time data in a rolling time window and can return the calculation result of the stream data at a certain moment. Additionally, the data source of the streaming data may be streaming data or other types of data.

The structure of the query paradigm for MQL statements of streaming data is: the data type stream is predefined in the query statement, and the configuration attribute is defined. In addition, when data query is performed on streaming data, the SELECT statement cannot be used alone, and can only be used together with the Insert statement. That is, the syntax structure of the MQL statement for stream data is specifically as follows:

# data type declaration

PREFIX stream

# Attribute declaration

PREFIX properties:<expression>

# query statement

INSERT INTO STREAM streamname(select_list definition) properties

SELECT <select_list> FROM stream|datasource WHERE <expressions>

The syntax structure of the MQL statement for stream data described above is explained below with reference to specific examples.

For example, to import the data in the relational data table context_tb into a stream that has not yet been defined, the user may enter the following query statement:

PREFIX stream

PREFIX properties:(topic='mqlout', zookeepers='127.0.0.1:2181', brokers='127.0.0.1:9092')

INSERT INTO STREAM s1(context String, user_id String) properties

SELECT context, user_id FROM context_tb
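The data flow of this statement (project two columns of a relational table and append them to a stream) can be sketched as follows. The sketch models the stream s1 as an in-memory queue and does not model the topic, zookeepers, or brokers properties; the table contents and the insert_into_stream function are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List

Record = Dict[str, str]

def insert_into_stream(stream: Deque[Record],
                       rows: Iterable[Record],
                       columns: List[str]) -> int:
    """Project the selected columns of each row and append the result to the stream."""
    inserted = 0
    for row in rows:
        stream.append({column: row[column] for column in columns})
        inserted += 1
    return inserted

# Hypothetical contents of the relational table context_tb.
context_tb: List[Record] = [
    {"context": "hello", "user_id": "u1", "lang": "en"},
    {"context": "world", "user_id": "u2", "lang": "en"},
]

# In-memory stand-in for the stream s1(context String, user_id String).
s1: Deque[Record] = deque()
```

Calling insert_into_stream(s1, context_tb, ["context", "user_id"]) corresponds to the INSERT INTO STREAM ... SELECT pair: the SELECT supplies the rows and columns, and the INSERT appends them to the stream.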

(3) RDF data queries

The data characteristics of RDF data are as follows: RDF is used to assist in the query of dynamic web pages and is stored in the form of graph data; the data consists of subject-predicate-object triples, in which a subject node, a predicate node, and an object node are connected in sequence, and a query may relate objects across multiple RDF data sets.

The query paradigm of the MQL statement for RDF data is structured as follows: the data type rdf is declared in advance in the query statement, together with the RDF data associated with the query. In addition, the triples to be queried can be defined if necessary. That is, the syntax structure of the MQL statement for RDF data is specifically as follows:

# data type declaration

PREFIX rdf

# Attribute declaration

PREFIX url_name:<url>

[PREFIX tri:<expression>]

# query statement

SELECT <select_list> FROM url_name WHERE <expression>

The syntax structure of the MQL statement for RDF data is explained below with reference to specific examples.

For example, assume that the data set shown in fig. 6 exists in the data source for storing RDF data in the data pool, and that this data set is RDF data describing apartments and their locations. Based on this assumption, to find the apartments with fewer than 4 bedrooms in the data set shown in fig. 6, the user can input the following query statement:

PREFIX rdf

PREFIX swp:<http://www.semanticwebprimer.org/ontology/apartments.ttl#>

PREFIX dbpedia:<http://www.dbpedia.org/resource/>

PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>

PREFIX tri:{(apartment)-[swp:hasNumberOfBedrooms]-(num)}

SELECT apartment FROM tri

WHERE num < 4
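What this triple pattern selects can be shown with a sketch over a plain list of (subject, predicate, object) tuples. The triples below are invented for illustration and are not the content of fig. 6; apartments_with_fewer_bedrooms is a hypothetical helper, and no RDF library is used.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, Union[str, int]]

# Illustrative triples; the subjects and values are made up.
TRIPLES: List[Triple] = [
    ("swp:ApartmentA", "swp:hasNumberOfBedrooms", 3),
    ("swp:ApartmentB", "swp:hasNumberOfBedrooms", 5),
    ("swp:ApartmentA", "swp:isPartOf", "swp:BuildingX"),
]

def apartments_with_fewer_bedrooms(triples: List[Triple], limit: int) -> List[str]:
    """Return subjects whose swp:hasNumberOfBedrooms object is below the limit."""
    return [subject for subject, predicate, obj in triples
            if predicate == "swp:hasNumberOfBedrooms"
            and isinstance(obj, int)
            and obj < limit]
```

The predicate check corresponds to the tri declaration, and the comparison obj < limit corresponds to the constraint num < 4.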

(4) document data query

The data characteristics of document data are as follows: document data is stored in JSON form, and multiple sets of document data are stored in a database (i.e., the HDFS above). It can be seen that a document set, as a collection of documents, is analogous to a table in a relational database, with each document analogous to a row of data in the relational database.

The query paradigm for the MQL statement of document data has the structure: predefining a data type doc in a query statement; docset is a storage identifier of a document set (i.e., a data source for storing document data), and docset is equivalent to a table name in relational data, and the rest of the query process is similar to that of the relational data. That is, the syntax structure of the MQL sentence for document data is specifically as follows:

# data type declaration

PREFIX doc

# query statement

SELECT <select_list> FROM database.docset WHERE <expression>

The syntax structure of the MQL sentence for document data described above is explained below with reference to a specific example.

For example, assume that the data pool contains the data source for storing document data shown in fig. 7, that the storage identifier of this data source in the data pool is doc_set, and that two documents are stored in it. Based on this assumption, to query the scores that are less than 5.0 in the data source shown in fig. 7, the user can input the following query statement:

PREFIX doc

SELECT score FROM doc_set

WHERE score < 5.0
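Since document data is stored in JSON form, the effect of this query can be sketched with the standard json module. The two documents below are placeholders rather than the content of fig. 7, and select_scores_below is a hypothetical helper.

```python
import json
from typing import List

# Illustrative document set with two documents (values are made up).
DOC_SET_JSON = '[{"name": "Movie A", "score": 4.5}, {"name": "Movie B", "score": 8.1}]'

def select_scores_below(docs_json: str, threshold: float) -> List[float]:
    """Return the score field of every document whose score is below the threshold."""
    documents = json.loads(docs_json)
    return [doc["score"] for doc in documents
            if "score" in doc and doc["score"] < threshold]
```

Each document is treated like a row and the score field like a column, in line with the analogy to relational data drawn above.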

(5) hybrid data query

The data characteristics of mixed data are as follows: the query targets multiple fields that come from different data sources, and the result is finally stored in a relational data format.

The query paradigm of the MQL statement for mixed data is structured as follows: the query statement declares all the data types to which the data sets required for querying the query target belong, together with the necessary attribute declarations; and each field in the query statement is qualified with its data type. That is, the syntax structure of the MQL statement for mixed data is specifically as follows:

PREFIX datatypeA

PREFIX datatypeB

[PREFIX……]

SELECT datatypeA.fieldA, datatypeB.fieldB FROM

datatypeA.database.table, datatypeB.database.table

WHERE <expression>

The syntax structure of the MQL statement for mixed data described above is explained below with reference to a specific example.

For example, assume that the data pool includes a graph database and a document database (i.e., the HDFS above), wherein the graph database includes a graph recording the relationships between movies and persons, and the document database stores a plurality of documents recording movie scores. Based on this assumption, to query the score of a movie in which Tom Hanks appears, the user may enter the following query statement:

PREFIX doc

PREFIX graph

PREFIX relation:{(person)—[r]->(movie)}

SELECT doc.doc_set.score, graph.relation.r FROM doc.doc_set, graph.relation

WHERE doc.doc_set.name = graph.relation.movie and graph.person.name = 'Tom Hanks'
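The cross-source join expressed by this statement can be sketched as follows, assuming the graph is available as (person, relationship type, movie) tuples and the document set as a list of dictionaries. All data values, the relationship type ACTED_IN, and the scores_for_person function are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (person, relationship type, movie)

# Illustrative graph edges and documents; the scores are made up.
GRAPH_EDGES: List[Edge] = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robin Wright", "ACTED_IN", "Forrest Gump"),
]
DOC_SET: List[Dict[str, object]] = [
    {"name": "Forrest Gump", "score": 8.8},
    {"name": "Some Other Movie", "score": 4.2},
]

def scores_for_person(edges: List[Edge],
                      docs: List[Dict[str, object]],
                      person: str) -> List[Tuple[object, str]]:
    """Join documents to graph edges on the movie name for the given person."""
    # Graph side: movies related to the person, with the relationship type.
    movie_to_relation = {movie: rel for who, rel, movie in edges if who == person}
    # Document side: keep documents whose name matches one of those movies.
    return [(doc["score"], movie_to_relation[doc["name"]])
            for doc in docs if doc.get("name") in movie_to_relation]
```

The result is a list of (score, relationship) rows, which matches the statement that the result of a mixed query is finally stored in a relational format. If a person has several relationships to the same movie, this simplified dictionary keeps only the last one.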

Based on the above content on the MQL statement, the MQL statement provided by the embodiment of the present application breaks down the language barrier and makes it possible to query data sets of various data types with a single query language, so that a user can efficiently and accurately query the various data in the data pool by using MQL statements.

Based on the data query method provided by the above method embodiment, the embodiment of the present application further provides a data query device, which is described below with reference to the accompanying drawings.

Device embodiment

Please refer to the above method embodiment for technical details of the data query device provided by the device embodiment.

Referring to fig. 8, the figure is a schematic structural diagram of a data query device according to an embodiment of the present application.

The data query apparatus 800 provided in the embodiment of the present application includes:

an acquisition unit 801, configured to acquire a query statement input by a user; the query statement carries information required for querying a query target;

an analyzing unit 802, configured to analyze the query statement to obtain first information and second information; wherein the first information comprises third information and fourth information; the third information represents the data type of a target data set, the fourth information represents the storage identifier of the target data set in a data source, and the target data set is a data set required for query processing of the query target; the second information represents the characteristic identification of the query target;

a first determining unit 803, configured to determine a query action according to the third information, and determine the target data set from a data pool according to the first information; the data pool comprises N data sources, wherein N is a positive integer; the data sources comprise at least one data set, and the data types of the data sets stored in different data sources are different;

a second determining unit 804, configured to determine the query target by using the query action and the target data set.

In one possible implementation, the first determining unit 803 includes:

the first determining subunit is configured to determine the query action according to the third information and the first mapping relationship; wherein the first mapping relation is used for recording query actions of data sets of different data types.

In a possible implementation manner, the parsing unit 802 is specifically configured to parse the query statement to obtain first information, second information, and data operation information;

the first determining subunit is specifically configured to generate a query action according to the third information and the data operation information.

In a possible implementation manner, the first determining subunit is specifically configured to: determining an initial action according to the third information and a second mapping relation; the second mapping relation is used for recording query actions of data sets of different data types; and generating a query action according to the initial action and the data operation information.

In a possible implementation manner, the parsing unit 802 is specifically configured to parse the query statement to obtain first information, second information, and fifth information; wherein the fifth information is attribute description information of the query target in the target data set;

the first determining subunit is specifically configured to generate a query action according to the third information and the fifth information.

In a possible implementation, the first determining unit 803 includes:

a second determining subunit, configured to determine, according to the third information, a target data source from the data pool; and determining a target data set from the target data source according to the fourth information.

In a possible implementation manner, the parsing unit 802 is specifically configured to identify a programming paradigm type used by the query statement, and determine that the programming paradigm type is a target programming paradigm type; and analyzing the query statement according to the target programming paradigm type to obtain first information and second information.

As can be seen from the related contents of the data query apparatus 800 provided above, in the embodiment of the present application, after the query statement input by the user is acquired, the query statement is first parsed to obtain the first information and the second information. The first information is information on the data source in which the query target is stored; the first information comprises third information and fourth information; the third information represents the data type of the target data set, the fourth information represents the storage identifier of the target data set in the data source, and the target data set is the data set required for query processing of the query target; the second information represents the feature identifier of the query target. Then, a query action is determined according to the first information, and a target data set is determined from the data pool according to the first information; the data pool comprises N data sources; each data source includes at least one data set, and the data types of the data sets stored in different data sources are different. Finally, the query target is determined using the query action and the target data set.

Based on the data query method provided by the above method embodiment, the embodiment of the present application further provides a device, which is described below with reference to the accompanying drawings.

Apparatus embodiment

Please refer to the above method embodiment for the device technical details provided by the device embodiment.

Referring to fig. 9, the drawing is a schematic structural diagram of an apparatus provided in the embodiment of the present application.

The device 900 provided in the embodiment of the present application includes: a processor 901 and a memory 902;

the memory 902 is used for storing computer programs;

the processor 901 is configured to execute any implementation manner of the data query method provided by the above method embodiments according to the computer program. That is, the processor 901 is configured to perform the following steps:

determining the query target using the query action and the target data set.

determining the query action according to the third information and the first mapping relation; wherein the first mapping relation is used for recording query actions of data sets of different data types.

determining an initial action according to the third information and a second mapping relation; wherein the second mapping relation is used for recording query actions of data sets of different data types;

analyzing the query statement to obtain first information, second information and fifth information; wherein the fifth information is attribute description information that the query target has in the target dataset;

The above describes the device 900 provided in the embodiment of the present application.

Based on the data query method provided by the method embodiment, the embodiment of the application also provides a computer readable storage medium.

Media embodiments

For technical details of a computer-readable storage medium provided in the media embodiment, please refer to the method embodiment.

The embodiment of the present application provides a computer-readable storage medium, which is used for storing a computer program, where the computer program is used for executing any implementation manner of the data query method provided by the above method embodiment. That is, the computer program is for performing the steps of:

determining the query target using the query action and the target dataset.

The above is related to the computer-readable storage medium provided in the embodiments of the present application.

It should be understood that, in this application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b and c may be single or plural.

The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any manner. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Using the methods and techniques disclosed above, those skilled in the art can make many possible variations and modifications to the disclosed solution, or modify it into equivalent embodiments, without departing from the scope of the solution. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims

1. A method for querying data, comprising:

determining the query target using the query action and the target dataset;

the analyzing the query statement to obtain first information and second information specifically comprises:

2. The method according to claim 1, wherein the determining a query action according to the third information specifically comprises:

3. The method according to claim 1, wherein the parsing the query statement to obtain first information and second information specifically comprises:

4. The method according to claim 3, wherein the generating a query action according to the third information and the data operation information specifically comprises:

5. The method according to claim 1, wherein the parsing the query statement to obtain first information and second information specifically comprises:

6. The method according to claim 1, wherein the determining the target data set from a data pool according to the first information comprises:

7. A data query apparatus, comprising:

an acquisition unit, configured to acquire a query statement input by a user; the query statement carries information required for querying a query target;

a second determining unit for determining the query target using the query action and the target data set;

the parsing unit is specifically configured to identify the programming paradigm type used by the query statement, and determine the programming paradigm type as a target programming paradigm type; and parse the query statement according to the target programming paradigm type to obtain first information and second information.

8. An apparatus, comprising a processor and a memory:

the memory is used for storing a computer program;

the processor is configured to perform the method of any one of claims 1-6 in accordance with the computer program.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium is used for storing a computer program for performing the method of any of claims 1-6.