CN118193568A - Data query processing method, device, computer equipment and storage medium

Info

Publication number: CN118193568A
Application number: CN202410277242.9A
Authority: CN
Inventors: 林谊
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2024-03-12
Filing date: 2024-03-12
Publication date: 2024-06-14

Abstract

The application relates to the technical field of big data, and in particular discloses a data query processing method, a data query processing apparatus, a computer device, a computer-readable storage medium and a computer program product. The method comprises: acquiring a data query request for a database, and extracting the query conditions carried in the data query request; determining the screening dimension corresponding to the query conditions and the screening range for the screening dimension in the query conditions; determining, based on the screening dimension, a selected index group matching the data query request from a plurality of index groups configured for the database; and determining, from a plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range; wherein the alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension, and the target index alias is used to determine a data query path for responding to the data query request. With this method, data traversal during querying can be reduced and data query efficiency improved.

Description

Data query processing method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of big data technology, and in particular, to a data query processing method, apparatus, computer device, computer readable storage medium, and computer program product.

Background

With the rapid development of internet technology, the amount of resources and traffic on the internet is increasing rapidly, and databases holding hundreds of millions or even billions of records are emerging. When processing data queries against such a large database, determining a reasonable data query path is key to improving data query efficiency.

In the conventional technology, the database tables are split across databases and tables to obtain a plurality of fragment tables, and a data query is completed by traversing the fragment tables. However, the essence of such splitting is to break the whole database into parts. If the database is split along multiple dimensions according to several splitting criteria, a large amount of redundant data is generated and a large amount of computing resources is occupied, which does not help improve data query efficiency. If the database is split according to a single splitting criterion, then whenever the data query condition is inconsistent with that criterion, the query can be completed only by traversing all the fragment tables, which also results in low data query efficiency.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a data query processing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve data query efficiency.

In a first aspect, the present application provides a data query processing method, where the method includes:

acquiring a data query request aiming at a database, and extracting query conditions carried in the data query request;

determining the screening dimension corresponding to the query conditions and the screening range for the screening dimension in the query conditions;

determining, based on the screening dimension, a selected index group that matches the data query request from a plurality of index groups configured for the database; the alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension;

determining, from a plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range; the target index alias is used for determining a data query path for responding to the data query request.

In one embodiment, the determining, based on the screening dimension, a selected index group matching the data query request from a plurality of index groups configured for the database, comprises:

determining alias classification dimensions corresponding to each of a plurality of index groups configured for the database;

and when the alias classification dimensions include the screening dimension, determining the index group corresponding to the screening dimension as the selected index group matching the data query request.

In one embodiment, the method further comprises:

when the alias classification dimensions do not include the screening dimension, determining, from the alias classification dimensions, a selected classification dimension that is semantically similar to the screening dimension;

and determining the index group corresponding to the selected classification dimension as the selected index group matching the data query request.

In one embodiment, the determining, from a plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range comprises:

determining the index interval corresponding to each index alias in the selected index group;

determining, from the index aliases, candidate index aliases whose index intervals intersect the screening range;

and determining, based on the candidate index aliases, the target index alias matching the screening range.

In one embodiment, the determining, from the index aliases, candidate index aliases whose index intervals intersect the screening range comprises:

grouping index aliases whose index intervals have the same interval span into the same index hierarchy level;

determining, from the index hierarchy levels, a target level whose interval span matches the screening range; among the plurality of index intervals corresponding to the target level there are at least one first-type interval contained in the screening range and one second-type interval not contained in the screening range;

and determining, from the index aliases of the target level, candidate index aliases whose index intervals intersect the screening range.

In one embodiment, the determining, from the index aliases of the target level, candidate index aliases whose index intervals intersect the screening range comprises:

acquiring the data validity conditions configured for the database;

determining the index aliases in the target level whose corresponding data information satisfies the data validity conditions as index aliases to be selected;

and determining, from the index aliases to be selected, candidate index aliases whose index intervals intersect the screening range.

In one embodiment, the method further comprises:

determining, based on the index interval of each index alias, the node to which the index alias is mapped in an index tree; the index interval corresponding to a parent node in the index tree is the union of the index intervals corresponding to the child nodes connected to that parent node;

the determining, based on the candidate index aliases, the target index alias matching the screening range comprises:

searching the index tree for leaf nodes associated with the candidate index aliases;

and when the index interval corresponding to a leaf node intersects the screening range, determining the index alias corresponding to the leaf node as a target index alias.

In one embodiment, there are a plurality of target index aliases; the method further comprises:

splicing the target index aliases into an alias character string;

and performing a data query in the database based on the alias character string, and determining the data query result corresponding to the data query request.

In a second aspect, the present application further provides a data query processing apparatus, where the apparatus includes:

the query condition extraction module is used for obtaining a data query request aiming at a database and extracting query conditions carried in the data query request;

the dimension range determining module is used for determining the screening dimension corresponding to the query conditions and the screening range for the screening dimension in the query conditions;

the index group selection module is used for determining, based on the screening dimension, a selected index group matching the data query request from a plurality of index groups configured for the database; the alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension;

the index alias selection module is used for determining, from a plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range; the target index alias is used for determining a data query path for responding to the data query request.

In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.

In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.

In a fifth aspect, the application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above method.

The data query processing method, the data query processing device, the computer equipment, the computer readable storage medium and the computer program product configure a plurality of index groups for the database from different classification dimensions, and do not directly perform database and table splitting processing on the database, so that the separation of indexes and applications can be realized, and the occupation amount of resources is reduced. In the data query process, a selected index group with an alias classification dimension matched with a screening dimension of the data query can be determined from a plurality of index groups according to query conditions in the data query request, and then a target index alias with an index interval matched with the screening range of the data query is determined from a plurality of index aliases contained in the selected index group so as to determine a data query path corresponding to the data query request. By adopting the technical scheme of the application, the configuration of index aliases can be flexibly carried out according to the specific data composition of the database, and the corresponding index aliases are matched according to the query conditions in the data query process, so that the corresponding indexes can be obtained through screening under different data query scenes, thereby being beneficial to reducing the data traversal in the query process and improving the data query efficiency.

Drawings

FIG. 1 is an application environment diagram of a data query processing method in one embodiment;

FIG. 2 is a flow diagram of a data query processing method in one embodiment;

FIG. 3 is a flow diagram of determining, based on the screening dimension, a selected index group matching a data query request from a plurality of index groups configured for a database in one embodiment;

FIG. 4 is a flow diagram of determining a target index alias for which an index interval matches a filter range from among a plurality of index aliases included in a selected index group, according to one embodiment;

FIG. 5 is a flow diagram of determining, from the index aliases, candidate index aliases whose index intervals intersect the screening range in one embodiment;

FIG. 6 is a schematic diagram of an index tree structure in one embodiment;

FIG. 7 is a flow diagram of determining candidate index aliases for which an intersection exists between an index interval and a filter range from among index aliases of a target hierarchy in one embodiment;

FIG. 8 is a flow chart of a data query processing method in another embodiment;

FIG. 9 is a schematic diagram of a process for initializing data in one embodiment;

FIG. 10 is a schematic diagram of a data query processing process in one embodiment;

FIG. 11 is a diagram of an index determination process in one embodiment;

FIG. 12 is a block diagram of a data query processing apparatus in one embodiment;

FIG. 13 is an internal structural view of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.

The data query processing method provided by the embodiment of the application can be applied to an application environment shown in figure 1. Wherein the terminal 102 communicates with the server 104. The communication network may be a wired network or a wireless network. Accordingly, the terminal 102 and the server 104 may be directly or indirectly connected through wired or wireless communication. For example, the terminal 102 may be indirectly connected to the server 104 through a wireless access point, or the terminal 102 may be directly connected to the server 104 through the internet, although the application is not limited in this respect.

The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device, where the internet of things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network) services, and big data and artificial intelligence platforms. The data storage system may store data that the server 104 needs to process. The data storage system may be provided separately, may be integrated on the server 104, or may be located on a cloud or other server.

It should be noted that, in the data query processing method provided by the embodiment of the present application, the execution subject of each step may be a computer device, where the computer device refers to an electronic device having data computing, processing and storage capabilities. The data query processing method can be implemented by data query components in the computer device, such as HBase and Elasticsearch.

Taking the implementation scenario illustrated in FIG. 1 as an example, the computer device may comprise at least one of the terminal 102 or the server 104. That is, the data query processing method may be executed by the terminal 102, by the server 104, or by the terminal 102 and the server 104 in cooperation. Take the case where the server 104 performs the data query processing method as an example. A user may send a data query request to the server 104 through the terminal 102, and the server 104 performs the data query processing: acquiring a data query request for a database, and extracting the query conditions carried in the data query request; determining the screening dimension corresponding to the query conditions and the screening range for the screening dimension in the query conditions; determining, based on the screening dimension, a selected index group matching the data query request from a plurality of index groups configured for the database; and determining, from a plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range. The alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension; the target index alias is used to determine a data query path for responding to the data query request.

In one embodiment, as shown in FIG. 2, a data query processing method is provided. Taking its application to the server 104 in FIG. 1 as an example, the method includes the following steps:

Step S202, a data query request aiming at a database is obtained, and query conditions carried in the data query request are extracted.

Herein, a database refers to a container that stores data, and is also called a data store. A database can store large amounts of structured and unstructured data of various types, including text, numbers, images, audio, and the like. Databases are among the most important components of computer systems and are widely used in various applications and business fields. The data query request may use an XML message as its carrier and may carry service interface information, request parameters, and the like. The request parameters may include query conditions, sorting fields, and the like. The query conditions may include, for example, start and end times, start and end institution identifiers, start and end customer numbers, and the like.

Specifically, the server may obtain the data query request sent by the terminal and extract the query conditions carried in the request according to the field information it contains. One data query request may correspond to one or more query conditions. Where the data query request corresponds to a single query condition, the server may perform the subsequent data query processing based on that condition. Where the data query request corresponds to a plurality of query conditions, the server may perform data query processing for each query condition separately to obtain an intermediate query result per condition, and then, according to the logical connections among the query conditions, take the intersection or union of the intermediate query results to determine the final data query result. For example, where the query conditions are "condition A and (condition B or condition C)", the server may, after obtaining the intermediate query results of conditions A, B and C, first take the union of the intermediate query results of condition B and condition C, and then intersect that union with the intermediate query result of condition A to obtain the data query result.
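The combination of intermediate query results for "condition A and (condition B or condition C)" can be illustrated with set operations. This is a minimal sketch, assuming each intermediate result is a set of record identifiers; the identifiers and the function name `combine` are hypothetical and do not come from the application.

```python
# Hypothetical intermediate results: sets of record IDs matched by each condition.
result_a = {1, 2, 3, 4}
result_b = {2, 5}
result_c = {3, 6}


def combine(a, b, c):
    """Evaluate "A and (B or C)" over intermediate query results."""
    union_bc = b | c       # "B or C"  -> union of the two results
    return a & union_bc    # "A and …" -> intersection with the result of A


final_result = combine(result_a, result_b, result_c)
print(sorted(final_result))
```

Only records that satisfy condition A and at least one of conditions B and C remain in `final_result`.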

In the case that the data query request corresponds to a plurality of query conditions, the server may also determine, from among the query conditions, a target query condition strongly related to the sorting field, and perform the subsequent data query processing based on the target query condition. The target query condition may be determined in more than one way. For example, the server may extract the sorting field from the data query request and determine the query condition that is strongly related to the sorting field as the target query condition; or the server may set a plurality of keywords in advance and determine, from among the query conditions, a query condition containing one of the keywords as the target query condition. A keyword may be, for example, "time" or "amount".
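One possible reading of the two selection strategies is sketched below. The application does not define how "strongly related" is measured, so the sketch assumes, purely for illustration, that a condition is related to the sorting (ranking) field when its field name contains the sorting field name; the field names and the function `pick_target_condition` are hypothetical.

```python
# Keywords configured in advance; "time" and "amount" are the examples given in the text.
KEYWORDS = ("time", "amount")


def pick_target_condition(condition_fields, sort_field=None):
    """Pick the query condition that drives the subsequent index selection.

    Assumption: "strongly related" is modelled as the condition field name
    containing the sorting field name; the keyword lookup is the fallback.
    """
    if sort_field:
        for field in condition_fields:
            if sort_field in field:
                return field
    for field in condition_fields:
        if any(keyword in field for keyword in KEYWORDS):
            return field
    return None


target = pick_target_condition(["customer_no", "transaction_time"], sort_field="amount")
print(target)
```

In this call no condition field contains the sorting field "amount", so the keyword fallback selects the time condition.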

Step S204, determining the screening dimension corresponding to the query conditions and the screening range for the screening dimension in the query conditions.

The screening dimension corresponding to a query condition is the criterion by which data is screened under that condition, for example time, institution, customer number, or customer type. The screening range for the screening dimension is the range of values, within that dimension, of the data the query expects to obtain, such as a time range, institution range, customer number range, or customer type range. It will be appreciated that the screening range may be characterized by a set, which may be a continuous range of values, such as a time range, or a set of discrete values, such as a customer type range.

Specifically, the server may perform semantic analysis on the query condition to determine the screening dimension corresponding to the query condition and the screening range for that dimension in the query condition. For example, where the query condition includes "January 1, 2004 to May 21, 2004", the screening dimension corresponding to the query condition may include time, and the screening range for the screening dimension may include "20040101-20040521".
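As a rough illustration of this step, the sketch below derives a screening dimension and screening range from a structured query condition. It replaces the semantic analysis described above with a simple lookup table, which is an assumption made for brevity; the `start_`/`end_` key convention, the field names and `FIELD_TO_DIMENSION` are all hypothetical.

```python
from datetime import date

# Hypothetical mapping from request field names to screening dimensions.
FIELD_TO_DIMENSION = {
    "date": "time",
    "org_id": "institution",
    "customer_no": "customer number",
}


def parse_condition(condition):
    """Return (screening dimension, (lower bound, upper bound)) for one condition.

    Assumes the condition carries one pair of `start_<field>` / `end_<field>` keys.
    """
    for key, lower in condition.items():
        if key.startswith("start_"):
            field = key[len("start_"):]
            upper = condition["end_" + field]
            return FIELD_TO_DIMENSION[field], (lower, upper)
    raise ValueError("no start_/end_ pair found in condition")


dimension, screening_range = parse_condition(
    {"start_date": date(2004, 1, 1), "end_date": date(2004, 5, 21)}
)
print(dimension, screening_range[0].isoformat(), screening_range[1].isoformat())
```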

Step S206, based on the screening dimension, a selected index group matching the data query request is determined from a plurality of index groups configured for the database.

In a relational database, an index is a separate physical storage structure that orders the values of one or more columns of a database table: it is a collection of the values of one or more columns in the table together with a corresponding list of logical pointers to the data pages that physically store those values. An index works like the table of contents of a book, through which the needed content can be found quickly by page number. That is, the data to be queried can be located quickly in the database through the index, and the data query result obtained.

An index group contains a plurality of index aliases that are classified according to a particular alias classification dimension. The alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension. The alias classification dimension is the criterion by which the plurality of index aliases are classified. In practice, index aliases may be set for the database from a number of different alias classification dimensions. For example, index aliases may be set from the time dimension: a global alias, a month alias and a year alias are set for month-level index data; a global alias and a year alias are set for year-level index data. For another example, the server may set index aliases from the institution dimension: head office, province, county, branch, etc. For another example, the server may set index aliases from the customer number dimension. Further, taking index aliases set from the time dimension as an example, to prevent excessive index fragmentation, the period at which indexes are created and data split can be chosen according to the daily incremental data volume of each service table: for example, where the daily increment is on the order of hundreds of millions of records, an index is created per day; where the daily increment is between one hundred thousand and one million records, an index is created per month; and for smaller data volumes, an index is created per year.
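The time-dimension alias configuration and the choice of index creation period can be sketched as follows. The alias naming scheme (`txn_all`, `txn_2004`, `txn_200401`) is hypothetical, and the exact thresholds are assumptions: the text only gives rough orders of magnitude and does not say how daily increments between one million and one hundred million records are handled, so the sketch treats them as monthly.

```python
def aliases_for_month_index(year, month):
    """Aliases attached to one month-level index: global, year and month alias."""
    return ["txn_all", f"txn_{year}", f"txn_{year}{month:02d}"]


def aliases_for_year_index(year):
    """Aliases attached to one year-level index: global and year alias."""
    return ["txn_all", f"txn_{year}"]


def index_period(daily_increment):
    """Choose how often a new index is created from the daily data growth.

    Thresholds are illustrative assumptions, not values fixed by the source.
    """
    if daily_increment >= 100_000_000:
        return "day"
    if daily_increment >= 100_000:
        return "month"
    return "year"


print(aliases_for_month_index(2004, 1))
print(index_period(500_000))
```

Because every index also carries the coarser aliases, a query can address a single month, a whole year, or all data without knowing how the underlying indexes were split.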

Specifically, the server may determine the alias classification dimension of each of the plurality of index groups configured in advance for the database, match the screening dimension corresponding to the query condition against each alias classification dimension, and determine from the index groups a selected index group whose alias classification dimension matches the screening dimension. Here, an alias classification dimension matching the screening dimension may mean that the two are identical or semantically similar.

Step S208, a target index alias whose index interval matches the screening range is determined from a plurality of index aliases contained in the selected index group.

The index interval corresponding to an index alias refers to the range that the index alias covers in the alias classification dimension. For example, the index interval corresponding to a month index alias is one month, and the index interval corresponding to a year index alias is one year.

Specifically, the server may determine, based on the index interval corresponding to each of the plurality of index aliases contained in the selected index group, a target index alias whose index interval matches the screening range. In a specific embodiment, the union of the index intervals corresponding to the finest-granularity index aliases among the target index aliases contains the screening range, so that the completeness of the data query is ensured. For example, where the screening range is January to February 2004, an index alias whose index interval is "first quarter of 2004" may be determined as the target index alias; alternatively, a plurality of index aliases whose index intervals are "January 2004", "February 2004" and "March 2004", respectively, may be determined as target index aliases.
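The interval matching can be illustrated with a simple overlap test. This is a minimal sketch, assuming closed date intervals and month-level aliases only; the alias names, the table `MONTH_ALIASES` and the helper functions are hypothetical, and the sketch does not cover the hierarchy levels or index tree of the other embodiments.

```python
from datetime import date

# Hypothetical month-level aliases of a time-dimension index group,
# each mapped to the closed interval of dates it covers.
MONTH_ALIASES = {
    "txn_200401": (date(2004, 1, 1), date(2004, 1, 31)),
    "txn_200402": (date(2004, 2, 1), date(2004, 2, 29)),
    "txn_200403": (date(2004, 3, 1), date(2004, 3, 31)),
    "txn_200404": (date(2004, 4, 1), date(2004, 4, 30)),
}


def intersects(interval, screening_range):
    """True when two closed intervals share at least one point."""
    low, high = interval
    start, end = screening_range
    return low <= end and start <= high


def select_target_aliases(aliases, screening_range):
    """Keep the aliases whose index interval intersects the screening range."""
    return [name for name, interval in aliases.items()
            if intersects(interval, screening_range)]


targets = select_target_aliases(MONTH_ALIASES, (date(2004, 1, 15), date(2004, 3, 10)))
print(targets)
```

The union of the selected intervals (January 1 to March 31) contains the screening range, so no matching record can be missed, while the April index is never touched.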

Further, the target index alias is used to determine a data query path for responding to the data query request. As described above, the storage location of the data to be queried can be located quickly through the index, and the data query result is then obtained through condition matching. In practical applications, the server may determine the data fragment to which the target index alias is mapped, and determine the data query result by traversing the data stored in that data fragment.

According to the data query processing method, a plurality of index groups are configured for the database from different classification dimensions, rather than directly carrying out database and table dividing processing on the database, separation of indexes and applications can be realized, and the occupation amount of resources is reduced. In the data query process, a selected index group with an alias classification dimension matched with a screening dimension of the data query can be determined from a plurality of index groups according to query conditions in the data query request, and then a target index alias with an index interval matched with the screening range of the data query is determined from a plurality of index aliases contained in the selected index group so as to determine a data query path corresponding to the data query request. By adopting the technical scheme of the application, the configuration of index aliases can be flexibly carried out according to the specific data composition of the database, and the corresponding index aliases are matched according to the query conditions in the data query process, so that the corresponding indexes can be obtained through screening under different data query scenes, thereby being beneficial to reducing the data traversal in the query process and improving the data query efficiency.

In one embodiment, there are a plurality of target index aliases. In this embodiment, the data query processing method further includes: splicing the target index aliases into an alias character string; and performing a data query in the database based on the alias character string to determine the data query result corresponding to the data query request.

Specifically, the server may splice the plurality of target index aliases into an alias character string, construct a data query instruction for the database based on the alias character string, and determine the data query result corresponding to the data query request from the returned data. The data query result may include the detailed data returned by the query, and may also include statistical information obtained by aggregating the detailed data, which is not limited herein.
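The splicing step might look like the sketch below. The delimiter is not specified in the application; a comma is assumed here because Elasticsearch, which the description names as a possible component, accepts comma-separated index or alias names in a search path. The function names and the alias names are hypothetical.

```python
def build_alias_string(target_aliases):
    """Join target aliases into one string, dropping duplicates but keeping order.

    The comma delimiter is an assumption, not something the source specifies.
    """
    return ",".join(dict.fromkeys(target_aliases))


def build_search_path(target_aliases):
    """Hypothetical request path in the style of an Elasticsearch multi-index search."""
    return "/" + build_alias_string(target_aliases) + "/_search"


path = build_search_path(["txn_200401", "txn_200402", "txn_200401"])
print(path)
```

The application layer only produces the string; which data fragments are actually read is left to the index layer that resolves the aliases.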

In a specific embodiment, the logic of a traditional single-database implementation is split into an application layer and an index layer. In addition to handling the application logic, the application layer configures the index aliases and routes data queries according to them, while the index layer is responsible for accessing each data fragment. The application logic and the data access logic are thus connected through a mapping but are not strongly coupled. Specifically, the application layer determines the target index aliases from the plurality of index aliases and splices them into an alias character string, and the index layer then accesses the fragment data based on the alias character string to query the required data information.

In the above embodiment, each target index alias is spliced into an alias character string, and data query is performed in the database based on the alias character string, so that the data query result corresponding to the data query request is determined, and the accuracy of the query result can be ensured on the basis of improving the data query efficiency.

In one embodiment, as shown in FIG. 3, step S206 includes:

Step S302, determining alias classification dimensions corresponding to each of the plurality of index groups configured for the database.

The alias classification dimension is as defined above and is not described again here. Specifically, the server may obtain the configuration information of each index group, and determine the alias classification dimension corresponding to the index group from a classification field included in the configuration information. The server may also determine the alias classification dimension that distinguishes the index aliases by semantically analyzing the index aliases of the same index group.

Step S304, when the alias classification dimensions include the screening dimension, the index group corresponding to the screening dimension is determined as the selected index group matching the data query request.

Specifically, after determining the screening dimension corresponding to the query condition and the alias classification dimension corresponding to each of the plurality of index groups configured for the database, the server may compare the screening dimension with each alias classification dimension to determine whether the alias classification dimensions include the screening dimension. Where they do, the index group corresponding to the screening dimension is determined as the selected index group matching the data query request. For example, when the alias classification dimensions include time, institution and customer number, and the screening dimension is time, the index group containing the index aliases classified by time may be determined as the selected index group.

In the above embodiment, when the alias classification dimensions include the screening dimension, the index group corresponding to the screening dimension is determined as the selected index group matching the data query request. This ensures that the selected index group is strongly correlated with the data query request, and in turn ensures the efficiency of the subsequent data query processing based on the selected index group.

In practical applications, there are also cases where the alias classification dimensions do not include the screening dimension. In a specific embodiment, with continued reference to fig. 3, step S206 further includes:

Step S306, in the case that the alias classification dimensions do not include the screening dimension, determining, from the alias classification dimensions, a selected classification dimension semantically similar to the screening dimension.

Specifically, in the case that the alias classification dimensions do not include the screening dimension, the server may perform semantic analysis on each alias classification dimension and on the screening dimension to determine a first semantic feature of each alias classification dimension and a second semantic feature of the screening dimension. The server then performs similarity matching between the second semantic feature and each first semantic feature, and determines the alias classification dimension corresponding to the first semantic feature with the highest similarity to the second semantic feature as the selected classification dimension. For example, where the alias classification dimensions include time and customer number, and the screening dimension is customer type, the customer number may be determined to be the selected classification dimension.

Step S308, determining the index group corresponding to the selected classification dimension as the selected index group matched with the data query request.

Specifically, after determining the selected classification dimension, the server may determine the index group corresponding to the selected classification dimension as the selected index group that matches the data query request.

In the above embodiment, when the alias classification dimensions do not include the screening dimension, the index group corresponding to the selected classification dimension that is semantically similar to the screening dimension is determined as the selected index group matching the data query request. This ensures that the selected index group is strongly correlated with the data query request, and in turn ensures the efficiency of the subsequent data query processing based on the selected index group.
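The two branches of step S206 (exact match first, semantic fallback second) can be sketched as follows. This is only an illustrative sketch: the group names, the `select_index_group` helper and the word-overlap similarity are assumptions made for the example, and the word-overlap score is a simple stand-in for whatever semantic analysis the server actually applies.

```python
from typing import Dict, Optional


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word tokens; a stand-in for real semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_index_group(filter_dim: str, groups: Dict[str, str]) -> Optional[str]:
    """groups maps a (hypothetical) index group name to its alias classification dimension."""
    # Steps S302/S304: use the group whose dimension equals the filtering dimension.
    for group_name, dim in groups.items():
        if dim == filter_dim:
            return group_name
    # Steps S306/S308: otherwise fall back to the most similar dimension.
    if not groups:
        return None
    return max(groups, key=lambda g: token_similarity(groups[g], filter_dim))


groups = {"by_time": "time", "by_customer": "customer number"}
exact = select_index_group("time", groups)              # exact match branch
fallback = select_index_group("customer type", groups)  # similarity branch
```

With these inputs the filtering dimension "customer type" falls back to the group classified by "customer number", which mirrors the example given for step S306.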

In one embodiment, as shown in fig. 4, step S208 includes:

Step S402, determining an index interval corresponding to each index alias in the selected index group.

The index interval corresponding to an index alias is the range that the index alias covers in the alias classification dimension. Specifically, the server may determine the index interval corresponding to each index alias in the selected index group according to the alias classification dimension corresponding to the selected index group and the semantics of each index alias in the selected index group. For example, when the alias classification dimension is time, the index interval corresponding to a month index alias is one month, and the index interval corresponding to a year index alias is one year.

Step S404, a candidate index alias with intersection between the index section and the filtering range is determined from the index aliases.

Specifically, the server may determine whether an intersection exists between each index section and the filtering range according to a value range represented by each index section and the filtering range in a corresponding dimension, and further determine candidate index aliases having intersections between the index section and the filtering range from each index aliases.

Step S406, determining the target index alias matched with the screening range based on each candidate index alias.

Specifically, the server may determine every candidate index alias as a target index alias matching the screening range; alternatively, the server may determine the candidate index alias with the smallest classification granularity as the target index alias.

Taking time as the alias classification dimension as an example, in the case where the filtering range is January to February 2004 and the index aliases include quarter indexes and month indexes, the candidate index aliases may include a candidate quarter index alias whose index interval is "first quarter of 2004" and candidate month index aliases whose index intervals are "January 2004" and "February 2004", respectively. In this case, the server may determine the candidate quarter index alias as the target index alias, or may determine each candidate month index alias as a target index alias.

In this embodiment, candidate index aliases whose index intervals intersect the filtering range are determined from the index aliases, and the target index alias matching the filtering range is then determined based on the candidate index aliases. Index aliases that are completely irrelevant to the filtering range are thereby removed, which narrows the index range and improves the efficiency of data query.
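The intersection test of step S404 can be illustrated with a short sketch. The alias names (`q_2004_1`, `m_2004_01`, and so on) and the representation of an index interval as an inclusive pair of dates are assumptions made for this example only.

```python
from datetime import date
from typing import Dict, List, Tuple

Interval = Tuple[date, date]  # inclusive (start, end)


def intersects(a: Interval, b: Interval) -> bool:
    """Two inclusive intervals overlap when each starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidate_aliases(index_intervals: Dict[str, Interval],
                      filter_range: Interval) -> List[str]:
    """Keep only the aliases whose index interval intersects the filtering range."""
    return [alias for alias, iv in index_intervals.items()
            if intersects(iv, filter_range)]


# Hypothetical quarter and month aliases for early 2004.
index_intervals = {
    "q_2004_1": (date(2004, 1, 1), date(2004, 3, 31)),
    "q_2004_2": (date(2004, 4, 1), date(2004, 6, 30)),
    "m_2004_01": (date(2004, 1, 1), date(2004, 1, 31)),
    "m_2004_02": (date(2004, 2, 1), date(2004, 2, 29)),
    "m_2004_03": (date(2004, 3, 1), date(2004, 3, 31)),
}
filter_range = (date(2004, 1, 1), date(2004, 2, 29))  # January to February 2004
candidates = candidate_aliases(index_intervals, filter_range)
```

Here the first-quarter alias and the January and February aliases survive, and step S406 may then keep all of them or only the finest-grained ones.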

In a specific embodiment, as shown in fig. 5, step S404 includes:

Step S502, according to the interval spans of the index intervals, the index aliases with the same corresponding interval span are determined to be the same index level.

The interval span is used for representing the value span in the index interval. Taking the case where the alias classification dimension is time as an example, as shown in fig. 6, the index sections corresponding to the plurality of month index aliases are different from each other, but each month index alias corresponds to the same section span, that is, one month.

Specifically, the server may determine, as the same index hierarchy, each index alias having the same corresponding section span according to the section spans of each index section. For example, three index levels are included in fig. 6, namely a year index level, a season index level, and a month index level.

Step S504, determining a target level of which the interval span is matched with the screening range from the index levels.

Among the plurality of index intervals corresponding to the target level, there exist at least one first-type interval that is contained in the filtering range and at least one second-type interval that is not contained in the filtering range.

Specifically, the server may determine, from among the index levels, a target level whose interval span matches the filtering range according to the relationship between the filtering range and the plurality of index intervals in the same index level. Taking the index aliases corresponding to 2004 as an example, when the filtering range is January to May 2004, the filtering range does not cover a whole year, so no first-type interval contained in the filtering range exists in the year index level, and the year index level cannot be determined as the target level. In the quarter index level, the index interval corresponding to the first quarter is a first-type interval contained in the filtering range, while the index intervals of the second to fourth quarters are second-type intervals not contained in the filtering range, so the quarter index level can be determined as the target level. Likewise, the month index level may also be determined as the target level.

Step S506, determining candidate index aliases with intersections between the index interval and the screening range from the index aliases of the target hierarchy.

Specifically, after determining the target level, the server may determine, from among the index aliases of the target level, candidate index aliases whose index intervals intersect the filtering range. Taking the index aliases corresponding to 2004 as an example, when the filtering range is January to May 2004 and the target level is the quarter index level, the index aliases of the first quarter and the second quarter may be determined as candidate index aliases.

In the above embodiment, each index alias with the same corresponding interval span is determined as the same index level, the target level with the interval span matched with the filtering range is determined from each index level, and then the candidate index alias with the intersection between the index interval and the filtering range is determined from each index alias of the target level, so that the candidate index alias with relatively finer granularity can be determined, thereby reducing the searching range.
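Steps S502 to S506 can be sketched as follows, under the simplifying assumption that every index interval lies within 2004 and is written as an inclusive pair of month numbers. The alias names and helper functions are hypothetical and only illustrate the grouping by span and the target-level test.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (first_month, last_month) within one year


def group_by_span(intervals: Dict[str, Interval]) -> Dict[int, List[str]]:
    """S502: aliases whose intervals have the same span form one index level."""
    levels: Dict[int, List[str]] = defaultdict(list)
    for alias, (lo, hi) in intervals.items():
        levels[hi - lo + 1].append(alias)
    return dict(levels)


def is_target_level(aliases: List[str], intervals: Dict[str, Interval],
                    rng: Interval) -> bool:
    """S504: at least one interval inside the range and at least one not inside it."""
    inside = [rng[0] <= intervals[a][0] and intervals[a][1] <= rng[1] for a in aliases]
    return any(inside) and not all(inside)


def candidates_in_level(aliases: List[str], intervals: Dict[str, Interval],
                        rng: Interval) -> List[str]:
    """S506: aliases of the level whose interval intersects the range."""
    return [a for a in aliases
            if intervals[a][0] <= rng[1] and rng[0] <= intervals[a][1]]


intervals: Dict[str, Interval] = {"y2004": (1, 12)}
intervals.update({f"q{q}": (3 * q - 2, 3 * q) for q in range(1, 5)})
intervals.update({f"m{m:02d}": (m, m) for m in range(1, 13)})

filter_range = (1, 5)  # January to May 2004
levels = group_by_span(intervals)
target_spans = sorted(s for s, names in levels.items()
                      if is_target_level(names, intervals, filter_range))
quarter_candidates = candidates_in_level(levels[3], intervals, filter_range)
```

For the January to May range, the year level is rejected because no whole year fits inside the range, while the quarter level (span 3) and the month level (span 1) both qualify; within the quarter level the first and second quarters are the candidates.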

In a specific embodiment, as shown in fig. 7, step S506 includes:

Step S702, obtaining data effective conditions configured for a database;

Step S704, determining index aliases of which the corresponding data information in the target level meets the data effective conditions as index aliases to be selected;

Step S706, determining, from the index aliases to be selected, candidate index aliases whose index intervals intersect the screening range.

The data valid condition may mean that the generation time of the service data is within a valid time range. The valid time range may be determined according to specific business requirements. Specifically, the server may obtain the data valid condition configured for the database and, after determining the target level, determine the index aliases in the target level whose corresponding data information meets the data valid condition as index aliases to be selected. The server then determines, from the index aliases to be selected, the candidate index aliases whose index intervals intersect the filtering range.

Taking time as the alias classification dimension as an example, after determining the target level, the server may determine, according to the index interval corresponding to each index alias, whether the data information corresponding to each index alias in the target level meets the data valid condition. If so, the index alias is determined as an index alias to be selected, and the candidate index aliases whose index intervals intersect the filtering range are then determined from the index aliases to be selected.

In the above embodiment, in the process of determining the target index alias, the index alias corresponding to the expired data is removed based on the data valid condition, so that data traversal can be further reduced, and data query efficiency can be improved.
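A minimal sketch of steps S702 to S706 follows. It assumes the data valid condition is expressed as an earliest valid date (`valid_from`), which is only one possible form of a valid time range, and it uses hypothetical month alias names.

```python
import calendar
from datetime import date
from typing import Dict, List, Tuple

Interval = Tuple[date, date]  # inclusive (start, end)


def month_intervals(year: int) -> Dict[str, Interval]:
    """Build hypothetical month aliases and their index intervals for one year."""
    return {
        f"m_{year}_{m:02d}": (date(year, m, 1),
                              date(year, m, calendar.monthrange(year, m)[1]))
        for m in range(1, 13)
    }


def aliases_to_be_selected(level: Dict[str, Interval],
                           valid_from: date) -> Dict[str, Interval]:
    """S704: drop aliases whose interval ends before the valid time range starts."""
    return {a: iv for a, iv in level.items() if iv[1] >= valid_from}


def candidate_aliases(pending: Dict[str, Interval],
                      filter_range: Interval) -> List[str]:
    """S706: keep the remaining aliases that intersect the filtering range."""
    return [a for a, iv in pending.items()
            if iv[0] <= filter_range[1] and filter_range[0] <= iv[1]]


level = month_intervals(2004)
pending = aliases_to_be_selected(level, valid_from=date(2004, 3, 1))
result = candidate_aliases(pending, (date(2004, 1, 1), date(2004, 5, 31)))
```

With data older than March 2004 treated as expired, a January to May query only touches the March, April and May aliases.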

In one embodiment, the data query processing method further includes: based on the index intervals of the index aliases, the nodes mapped by the index aliases in the index tree are determined. In the case of this embodiment, step S406 includes: searching leaf nodes associated with candidate index aliases in the index tree; and when the intersection exists between the index interval corresponding to the leaf node and the screening range, determining the index alias corresponding to the leaf node as the target index alias.

The index interval corresponding to a parent node in the index tree is the union of the index intervals corresponding to the child nodes connected to that parent node. For example, in the case where the first quarter in fig. 6 is a parent node, January, February and March are the child nodes of that parent node. Leaf nodes are nodes in the index tree that have no child nodes; that is, the leaf nodes are at the lowest level of the index tree and represent the finest classification granularity of the index tree. The leaf node associated with a candidate index alias may be the node of the candidate index alias itself, a leaf node directly connected to the node of the candidate index alias, or a leaf node indirectly connected to the node of the candidate index alias in the direction away from the root node.

Specifically, the server may map each index alias to a node based on its index interval, thereby determining the node to which each index alias is mapped in the index tree. After the candidate index aliases are determined, the server searches the index tree for the leaf nodes associated with the candidate index aliases and, in the case that the index interval corresponding to a leaf node intersects the screening range, determines the index alias corresponding to that leaf node as a target index alias. Taking the index aliases corresponding to 2004 as an example, if the screening range is January to May 2004 and the first quarter and the second quarter are determined to be candidate index aliases, the leaf nodes associated with the candidate index aliases are the leaf nodes mapped to January through June. Because the index intervals of January through May each intersect the screening range, the index aliases of January through May are determined as the target index aliases.

In the above embodiment, each index alias is mapped into the index tree, and the leaf node associated with the candidate index alias is searched from the index tree, and when the intersection exists between the index interval corresponding to the leaf node and the filtering range, the index alias corresponding to the leaf node is determined to be the target index alias, so that the finally determined target index alias is the index alias under the finest granularity, thereby further reducing the data traversal range and improving the data query efficiency.
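The leaf-node lookup can be sketched as below. The `Node` class, the alias names and the month-number intervals are assumptions introduced for illustration; the source does not prescribe a concrete tree representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    alias: str
    interval: Tuple[int, int]  # inclusive (first_month, last_month) in 2004
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes at or below the given node."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def target_aliases(candidates: List[Node], rng: Tuple[int, int]) -> List[str]:
    """Leaves under the candidate nodes whose interval intersects the range."""
    result: List[str] = []
    for node in candidates:
        for leaf in leaves(node):
            lo, hi = leaf.interval
            if lo <= rng[1] and rng[0] <= hi and leaf.alias not in result:
                result.append(leaf.alias)
    return result


# A parent's interval is the union of its children's intervals.
quarters = [
    Node(f"q{q}", (3 * q - 2, 3 * q),
         [Node(f"m{m:02d}", (m, m)) for m in range(3 * q - 2, 3 * q + 1)])
    for q in range(1, 5)
]
year = Node("y2004", (1, 12), quarters)

targets = target_aliases(quarters[:2], (1, 5))  # candidates: first and second quarter
```

Starting from the first and second quarter, the six month leaves are visited and June is dropped because it does not intersect the January to May range.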

In one embodiment, as shown in fig. 8, a data query processing method is provided. The method may be performed by a computer device, which may be the terminal or the server shown in fig. 1. Taking the case where the computer device is a server as an example, in this embodiment the method includes the following steps:

Step S801, a data query request aiming at a database is obtained, and query conditions carried in the data query request are extracted;

Step S802, determining the screening dimension corresponding to the query condition and the screening range for the screening dimension in the query condition;

Step S803, determining alias classification dimensions corresponding to each of a plurality of index groups configured for the database;

Step S804, in the case that the alias classification dimensions include the screening dimension, determining the index group corresponding to the screening dimension as the selected index group matched with the data query request;

Step S805, in the case that the alias classification dimensions do not include the screening dimension, determining, from the alias classification dimensions, a selected classification dimension semantically similar to the screening dimension;

Step S806, determining the index group corresponding to the selected classification dimension as the selected index group matched with the data query request;

Step S807, determining an index interval corresponding to each index alias in the selected index group;

Step S808, determining the nodes mapped by the index aliases in the index tree according to the index intervals of the index aliases;

wherein the index interval corresponding to a parent node in the index tree is the union of the index intervals corresponding to the child nodes connected to that parent node, and index aliases whose index intervals have the same span are at the same index level in the index tree;

Step S809, determining a target level of which the interval span is matched with the screening range from all index levels;

wherein, among the plurality of index intervals corresponding to the target level, there exist at least one first-type interval contained in the screening range and at least one second-type interval not contained in the screening range;

Step S810, acquiring data effective conditions configured for a database, and determining index aliases of which the corresponding data information in the target level meets the data effective conditions as index aliases to be selected;

Step S811, determining, from the index aliases to be selected, candidate index aliases whose index intervals intersect the screening range;

Step S812, searching leaf nodes associated with candidate index aliases in the index tree;

Step S813, when the intersection exists between the index interval corresponding to the leaf node and the screening range, determining the index alias corresponding to the leaf node as the target index alias;

Step S814, splicing the target index aliases into an alias character string; and carrying out data query in the database based on the alias character string, and determining a data query result corresponding to the data query request.

The data query processing method of the present application will be described in detail below taking the case where the alias classification dimension includes time as an example. In particular, in a data query process of a large data volume, a large volume of data needs to be searched and filtered quickly, and a response time is required to be very short. For trillion-level mass data, index scanning needs to be reduced as much as possible, useless data scanning is avoided, resource consumption is reduced, and query efficiency is improved.

Existing query mechanisms and methods mainly take the following two forms:

First, the database tables are split into multiple databases and tables according to data volume, and indexes are built on the split tables for fast query. However, splitting databases and tables essentially breaks one large table into many small ones, which makes management complex and maintenance costly.

Second, the indexes are pre-sorted when data is inserted, rather than being sorted again at query time, which improves the performance of range queries and sorting operations. However, pre-sorting increases the write cost of the data analysis engine, and turning on index pre-sorting degrades write performance.

Therefore, the traditional methods suffer from low data query efficiency. Based on this, the present application combines database capability with application-side optimization to implement a query method with configurable query routing for large-data-volume queries. This addresses the problem that a query has to scan too many indexes, and thereby achieves fast and efficient queries.

Specifically, the implementation logic of a traditional single database is split into an application layer and an index layer. In addition to handling the application logic, the application layer configures index aliases and routes data queries according to the index aliases, while the index layer is responsible for accessing each data fragment. The application logic and the data access logic are thus related through a mapping but remain loosely coupled. Specifically, the application layer determines the target index aliases from the plurality of index aliases and splices the target index aliases into an alias character string, and the index layer then accesses the fragment data based on the alias character string to query the required data information.

Further, index aliases may be set for the database from a variety of alias classification dimensions. For example, index aliases may be set from the time dimension: a global alias, a month alias and a year alias are set for month index data, and a global alias and a year alias are set for year index data. For another example, the server may set index aliases from the organization dimension: general, province, county, branch, etc. For another example, the server may set index aliases from the client number dimension. Further, taking index aliases set from the time dimension as an example, to prevent excessive index fragmentation, different periods can be selected for creating indexes and splitting data according to the daily incremental data volume of each service table. If the daily increment is at the hundred-million level, indexes are created by day; if the daily increment is between one hundred thousand and one million, indexes are created by month; for smaller data volumes, indexes are created by year.

In one embodiment, the data initialization process is as shown in FIG. 9. Specifically, the server may select the period for creating indexes according to the daily incremental data volume and the data age. For example, when the daily increment is at the ten-thousand level, indexes may be created by year; when the daily increment is at the million level, indexes may be created by month; and when the daily increment is at the ten-million or hundred-million level, indexes may be created by day. After the indexes are created, the data is loaded, and after the data is loaded, the corresponding index aliases are set. This completes the data initialization so that data queries can be performed afterwards.
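The period selection during initialization can be sketched as a simple threshold function. The description only gives orders of magnitude (hundred-million level by day, roughly one hundred thousand to one million by month, smaller volumes by year), so the exact cut-off values below are assumptions chosen for illustration, as is the function name.

```python
def choose_index_period(daily_increment: int) -> str:
    """Pick the index creation period from the daily incremental row count.

    The thresholds are illustrative assumptions; the description states
    only approximate orders of magnitude.
    """
    if daily_increment >= 10_000_000:   # ten-million level and above
        return "day"
    if daily_increment >= 100_000:      # hundred-thousand to million level
        return "month"
    return "year"                       # smaller volumes


periods = {n: choose_index_period(n) for n in (10_000, 500_000, 200_000_000)}
```

In practice the cut-offs would be tuned per service table so that no single index grows too large and the number of indexes stays manageable.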

In one embodiment, the data query process is as shown in FIG. 10. Specifically, an external request is carried in an XML message, where the request message contains a service interface ID, request parameters and the like; the request parameters contain the query condition, which may include a start time, an end time and a sort field. The server then provides a service method that parses the XML message, obtains the predefined index routing table to be queried according to the service interface ID, and obtains the index name and index alias information. Next, the server quickly maps the corresponding fragments according to the aliases, invokes the index time-interval filtering query method to obtain the array of index names to be queried, and joins the array into a comma-separated character string. Finally, the server submits the resulting index alias character string to the database for querying and obtains the returned data.
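The request-handling flow can be sketched as follows. The XML element names, the service interface ID `SVC001`, the routing table contents and the month alias naming scheme are all hypothetical; the source only states that the message carries a service interface ID and request parameters with a start time, an end time and a sort field.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical request message; the real message layout is not given in the source.
REQUEST = """<request>
  <serviceId>SVC001</serviceId>
  <params>
    <startTime>2004-01-01</startTime>
    <endTime>2004-05-31</endTime>
    <sortField>txn_time</sortField>
  </params>
</request>"""

# Hypothetical index routing table keyed by service interface ID.
ROUTING_TABLE = {"SVC001": {"index_name": "txn_detail", "alias_prefix": "txn_detail_"}}


def parse_request(xml_text: str) -> Dict[str, str]:
    """Extract the service interface ID and the query condition from the message."""
    root = ET.fromstring(xml_text)
    return {
        "service_id": root.findtext("serviceId"),
        "start": root.findtext("params/startTime"),
        "end": root.findtext("params/endTime"),
        "sort": root.findtext("params/sortField"),
    }


def month_aliases(prefix: str, start: str, end: str) -> List[str]:
    """List one month alias per month between the start and end dates (inclusive)."""
    year, month = int(start[:4]), int(start[5:7])
    end_year, end_month = int(end[:4]), int(end[5:7])
    aliases = []
    while (year, month) <= (end_year, end_month):
        aliases.append(f"{prefix}{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return aliases


req = parse_request(REQUEST)
route = ROUTING_TABLE[req["service_id"]]
alias_string = ",".join(month_aliases(route["alias_prefix"], req["start"], req["end"]))
```

The resulting comma-separated string is what the application layer would hand to the index layer as the query target.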

The algorithm logic of the index determination process can be encapsulated as a general-purpose method to be called by the steps in the above flow. Specifically, as shown in fig. 11, whether the queried period contains complete years and complete months can be judged in turn according to the start time and the end time in the query condition, the arrays of index year aliases and index month aliases are obtained, and the index name string is spliced with comma separators.

By adopting the scheme of the application, index alias design is combined with mapping and comparison between aliases and index fields to realize routing on configurable fields. The queried time interval is located quickly, the corresponding indexes are screened out, and data traversal is reduced, thereby achieving fast queries.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to that order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
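One possible reading of the fig. 11 logic is sketched below: every calendar year that is fully covered by the queried period contributes a year alias, and the remaining months contribute month aliases. The alias naming (`idx_2004`, `idx_200311`), the month-granularity inputs and the ordering of the output are assumptions for illustration only.

```python
from typing import List, Tuple

YearMonth = Tuple[int, int]  # (year, month)


def aliases_for_range(start: YearMonth, end: YearMonth, prefix: str = "idx_") -> str:
    """Return a comma-separated alias string covering start..end (inclusive)."""
    (start_year, start_month), (end_year, end_month) = start, end
    if start > end:
        raise ValueError("start must not be later than end")
    year_aliases: List[str] = []
    month_aliases: List[str] = []
    for year in range(start_year, end_year + 1):
        first = start_month if year == start_year else 1
        last = end_month if year == end_year else 12
        if first == 1 and last == 12:
            # The whole year is covered, so one year alias is enough.
            year_aliases.append(f"{prefix}{year}")
        else:
            # Only part of the year is covered, so fall back to month aliases.
            month_aliases.extend(f"{prefix}{year}{m:02d}" for m in range(first, last + 1))
    return ",".join(year_aliases + month_aliases)


alias_string = aliases_for_range((2003, 11), (2005, 2))
```

For a query from November 2003 to February 2005, the year 2004 is covered by a single year alias while the partial years fall back to four month aliases, so five indexes are addressed instead of sixteen monthly ones.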

Based on the same inventive concept, the embodiment of the application also provides a data query processing device for implementing the data query processing method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the data query processing device provided below, reference may be made to the limitations of the data query processing method above, which are not repeated here.

In one embodiment, as shown in fig. 12, there is provided a data query processing apparatus, including: a query condition extraction module 1201, a dimension range determination module 1202, an index group selection module 1203, and an index alias selection module 1204, wherein:

The query condition extraction module 1201 is configured to obtain a data query request for a database, and extract a query condition carried in the data query request;

a dimension range determining module 1202, configured to determine the screening dimension corresponding to the query condition and the screening range for the screening dimension in the query condition;

an index group selection module 1203, configured to determine, based on the screening dimension, a selected index group matching the data query request from a plurality of index groups configured for the database; wherein the alias classification dimension of the plurality of index aliases in the selected index group matches the screening dimension;

an index alias selection module 1204, configured to determine, from among a plurality of index aliases included in the selected index group, a target index alias whose index interval matches the filtering range; the target index alias is used to determine a data query path in response to the data query request.

In one embodiment, the index group selection module 1203 is specifically configured to:

determining alias classification dimensions corresponding to each of a plurality of index groups configured for a database;

in the case that the alias classification dimensions include the screening dimension, determining the index group corresponding to the screening dimension as the selected index group matching the data query request.

In one embodiment, the index group selection module 1203 is further configured to: in the case that the alias classification dimensions do not include the screening dimension, determine, from the alias classification dimensions, a selected classification dimension semantically similar to the screening dimension; and determine the index group corresponding to the selected classification dimension as the selected index group matching the data query request.

In one embodiment, the index alias selection module 1204 includes: an index interval determining unit, configured to determine an index interval corresponding to each index alias in the selected index group; a candidate index alias determination unit for determining a candidate index alias having an intersection between the index section and the filtering range from among the index aliases; and the target index alias determination unit is used for determining a target index alias matched with the screening range based on each candidate index alias.

In one embodiment, the candidate index alias determination unit includes: an index level dividing component, configured to determine index aliases whose index intervals have the same span as the same index level according to the interval span of each index interval; a target level determining component, configured to determine, from among the index levels, a target level whose interval span matches the screening range, wherein among the plurality of index intervals corresponding to the target level there exist at least one first-type interval contained in the screening range and at least one second-type interval not contained in the screening range; and a candidate index alias determination component, configured to determine, from the index aliases of the target level, candidate index aliases whose index intervals intersect the screening range.

In one embodiment, the candidate index alias determination component is specifically configured to: acquire the data valid condition configured for the database; determine the index aliases in the target level whose corresponding data information meets the data valid condition as index aliases to be selected; and determine, from the index aliases to be selected, candidate index aliases whose index intervals intersect the screening range.

In one embodiment, the data query processing apparatus further includes: an index tree mapping module, configured to determine the node mapped by each index alias in the index tree based on the index interval of each index alias; the index interval corresponding to a parent node in the index tree is the union of the index intervals corresponding to the child nodes connected to that parent node. In the case of this embodiment, the target index alias determination unit is specifically configured to: search the index tree for leaf nodes associated with the candidate index aliases; and, when the index interval corresponding to a leaf node intersects the screening range, determine the index alias corresponding to the leaf node as the target index alias.

In one embodiment, there are multiple target index aliases. In the case of this embodiment, the data query processing apparatus further includes a data query module configured to: splice the target index aliases into an alias character string; and perform a data query in the database based on the alias character string to determine the data query result corresponding to the data query request.

The respective modules in the above-described data query processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 13. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store data information and index information. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data query processing method.

It will be appreciated by those skilled in the art that the structure shown in FIG. 13 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the data query processing method described above when executing the computer program.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the data query processing method described above.

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the data query processing method described above.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the computer program, when executed, may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile memory and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of the technical features that involves no contradiction should be considered to fall within the scope of this description.
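By way of non-limiting illustration, the candidate selection recited in the claims (grouping index aliases into levels by interval span, choosing a target level, and keeping the aliases whose index intervals intersect the screening range) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the alias names, the use of half-open integer intervals, and the rule of preferring the widest qualifying level are hypothetical choices made for the example and are not taken from the application.

```python
from collections import defaultdict
from typing import NamedTuple


class IndexAlias(NamedTuple):
    name: str   # hypothetical alias name
    start: int  # inclusive lower bound of the index interval
    end: int    # exclusive upper bound of the index interval


def intersects(alias, lo, hi):
    """True if the alias interval overlaps the screening range [lo, hi)."""
    return alias.start < hi and lo < alias.end


def contained(alias, lo, hi):
    """True if the alias interval lies entirely inside [lo, hi)."""
    return lo <= alias.start and alias.end <= hi


def group_by_span(aliases):
    """Aliases whose intervals have the same span form one index level."""
    levels = defaultdict(list)
    for alias in aliases:
        levels[alias.end - alias.start].append(alias)
    return dict(levels)


def pick_target_level(levels, lo, hi):
    """Return the span of a level that has at least one interval contained
    by the screening range and at least one interval not contained by it.
    Assumption: when several levels qualify, the widest span is preferred."""
    for span in sorted(levels, reverse=True):
        group = levels[span]
        has_inside = any(contained(a, lo, hi) for a in group)
        has_outside = any(not contained(a, lo, hi) for a in group)
        if has_inside and has_outside:
            return span
    return None


def candidate_aliases(aliases, lo, hi):
    """Names of target-level aliases whose intervals intersect [lo, hi)."""
    levels = group_by_span(aliases)
    span = pick_target_level(levels, lo, hi)
    if span is None:
        return []
    return [a.name for a in levels[span] if intersects(a, lo, hi)]


# Hypothetical aliases over months 1..12: one year, four quarters, twelve months.
ALIASES = (
    [IndexAlias("year", 1, 13)]
    + [IndexAlias(f"q{i + 1}", 1 + 3 * i, 4 + 3 * i) for i in range(4)]
    + [IndexAlias(f"m{m}", m, m + 1) for m in range(1, 13)]
)

# Screening range: months 3 through 9, written as the half-open range [3, 10).
candidates = candidate_aliases(ALIASES, 3, 10)
print(candidates)  # ['q1', 'q2', 'q3']
```

In this example the year-level alias is skipped because its interval is not contained by the screening range, while the quarter level qualifies because q2 and q3 are contained and q1 and q4 are not. The alias q1 is kept as a candidate because it only partially overlaps the range; how such a partially overlapping candidate is refined into target index aliases is left to the embodiments.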

The foregoing embodiments represent only a few implementations of the application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. A method of data query processing, the method comprising:

2. The method of claim 1, wherein the determining a selected index group that matches the data query request from a plurality of index groups configured for the database based on the screening dimension comprises:

3. The method according to claim 2, wherein the method further comprises:

4. The method of claim 1, wherein said determining a target index alias whose index interval matches the screening range from among a plurality of index aliases contained in the selected index group comprises:

5. The method of claim 4, wherein said determining, from each of said index aliases, candidate index aliases whose index interval intersects said screening range comprises:

6. The method of claim 5, wherein said determining, from among the index aliases of the target level, candidate index aliases whose index intervals intersect the screening range comprises:

Acquiring data effective conditions configured for the database;

7. The method according to claim 4, wherein the method further comprises:

8. The method of any one of claims 1 to 7, wherein there are a plurality of target index aliases; the method further comprises:

splicing the target index aliases into an alias character string;

9. A data query processing apparatus, the apparatus comprising:

10. The apparatus of claim 9, wherein the index group selection module is specifically configured to:

11. The apparatus of claim 10, wherein the index group selection module is further configured to:

12. The apparatus of claim 9, wherein the index alias selection module comprises:

an index interval determining unit, configured to determine an index interval corresponding to each index alias in the selected index group;

a candidate index alias determination unit, configured to determine, from among the index aliases, candidate index aliases whose index intervals intersect the screening range;

and a target index alias determination unit, configured to determine, based on the candidate index aliases, the target index alias matching the screening range.

13. The apparatus of claim 12, wherein the candidate index alias determination unit comprises:

an index level dividing component, configured to assign, according to the interval span of each index interval, the index aliases having the same interval span to the same index level;

a target level determining component, configured to determine, from among the index levels, a target level whose interval span matches the screening range, wherein the plurality of index intervals corresponding to the target level include at least one first-type interval contained by the screening range and at least one second-type interval not contained by the screening range;

and a candidate index alias determination component, configured to determine, from among the index aliases of the target level, candidate index aliases whose index intervals intersect the screening range.

14. The apparatus of claim 13, wherein the candidate index alias determination component is specifically configured to:

Acquiring data effective conditions configured for the database;

15. The apparatus of claim 12, wherein the apparatus further comprises:

an index tree mapping module, configured to determine, based on the index interval of each index alias, the node to which that index alias is mapped in an index tree, wherein the index interval corresponding to a parent node in the index tree is the union of the index intervals corresponding to the child nodes connected to that parent node;

the target index alias determination unit is specifically configured to:

16. The apparatus of any one of claims 9 to 15, wherein there are a plurality of target index aliases; the apparatus further comprises a data query module configured for:

splicing the target index aliases into an alias character string;

17. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.

18. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.

19. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
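
As a non-limiting illustration of the index tree and alias splicing recited in the claims above, the following Python sketch builds a tree in which each parent node's index interval is the union of its children's intervals, selects aliases for a screening range, and splices them into one alias character string. It is a minimal sketch under stated assumptions: the claims as reproduced here do not spell out how the tree is traversed, so the top-down descent shown is only one plausible reading, and the alias names, the half-open integer intervals, and the comma separator are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    alias: str                                    # hypothetical alias name
    start: int                                    # inclusive lower bound
    end: int                                      # exclusive upper bound
    children: list = field(default_factory=list)  # child Node objects


def make_parent(alias, children):
    """Build a parent whose interval is the union of its children's
    intervals. Assumption: the children are contiguous, so the union is
    simply [min start, max end)."""
    ordered = sorted(children, key=lambda c: c.start)
    return Node(alias, ordered[0].start, ordered[-1].end, ordered)


def select(node, lo, hi):
    """Collect aliases covering the screening range [lo, hi).
    Assumed traversal: take a node whole if the range contains it,
    otherwise descend into its children; take a partially covered leaf."""
    if node.end <= lo or hi <= node.start:
        return []                      # no intersection with the range
    if lo <= node.start and node.end <= hi:
        return [node.alias]            # fully contained: use this alias
    if not node.children:
        return [node.alias]            # leaf that only partially overlaps
    selected = []
    for child in node.children:
        selected.extend(select(child, lo, hi))
    return selected


def splice(aliases, sep=","):
    """Splice the target aliases into one alias string (separator assumed)."""
    return sep.join(aliases)


# Hypothetical tree: twelve monthly leaves, four quarters, one year.
months = [Node(f"m{m}", m, m + 1) for m in range(1, 13)]
quarters = [make_parent(f"q{i + 1}", months[3 * i:3 * i + 3]) for i in range(4)]
year = make_parent("year", quarters)

targets = select(year, 3, 10)          # months 3 through 9
alias_string = splice(targets)
print(alias_string)                    # m3,q2,q3
```

In this sketch a query for months 3 through 9 resolves to one monthly alias and two quarterly aliases rather than seven monthly ones, which illustrates how a coarser alias can stand in for a group of finer aliases whenever its interval is fully contained by the screening range.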