CN118132609A

CN118132609A - Streaming data service quick response method, computer program product and electronic equipment

Info

Publication number: CN118132609A
Application number: CN202410535104.6A
Authority: CN
Inventors: 张垚
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2024-04-30
Filing date: 2024-04-30
Publication date: 2024-06-04

Abstract

The application provides a fast response method for a streaming data service, a computer program product, and an electronic device, and relates to the field of computer technologies. The method comprises: receiving a target query request invoked by a target device, and determining a target cache policy corresponding to the target query request; obtaining, from the cached data, target cache data corresponding to the target cache policy, selecting the required cache data from the target cache data, and performing a calculation to obtain a data calculation result corresponding to the target query request; and sending the data calculation result to the target device. By classifying the data query requests of similar queries and applying a corresponding cache policy to each class according to the classification result, the application improves the response speed of the streaming data service.

Description

Streaming data service quick response method, computer program product and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a fast response method for streaming data service, a computer program product, and an electronic device.

Background

A streaming data service is a technique for processing and analyzing real-time data streams. It allows an organization to receive data from multiple data sources continuously and in real time, and to process that data so that decisions or responses can be made quickly.

In the related art, a streaming data service lets a service caller invoke an interface and receive the calculation result of a real-time data stream, computed according to a previously configured query rule, for example a Structured Query Language (SQL) statement. However, stream computing systems in the related art generally create a task for each query and load the data for calculation, so the response is slow.

Based on this, there is a strong need for a fast response method for streaming data services to improve the response speed of the streaming data services.

Disclosure of Invention

The application aims to provide a fast response method for a streaming data service, a computer program product, and an electronic device, which improve the response speed of the streaming data service by classifying the data query requests of similar queries and applying a corresponding cache policy to each class according to the classification result.

The application provides a quick response method for stream data service, comprising the following steps:

Receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; acquiring target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data for calculation to obtain a data calculation result corresponding to the target query request; transmitting the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation.

Optionally, before obtaining the target cache data corresponding to the target cache policy in the cache data, the method further includes: acquiring all data query requests registered in the data service, and classifying the data query requests based on a request field of each data query request to obtain at least one group of similar queries; wherein each set of homogeneous queries includes a plurality of data query requests.

Optionally, the classifying the data query request based on the request field of each data query request to obtain at least one group of similar queries includes: dividing a plurality of data query requests with the same main table, the same slave table and the same conditional statement of the joint query into a group of similar queries; the master table is a data table indicated by a from field in a data query request, the slave table is a data table indicated by a join field in the data query request, and the conditional statement of the join query is an on condition in the data query request.
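The grouping step above can be illustrated with a minimal sketch. It assumes the registered SQL has already been parsed into a hypothetical `RegisteredQuery` record; the record type, its field names and the sample table names are illustrative assumptions and are not taken from the application.

```python
from collections import defaultdict
from typing import NamedTuple


class RegisteredQuery(NamedTuple):
    """Hypothetical, already-parsed view of one registered SQL query."""
    name: str
    main_table: str    # table indicated by the from field
    slave_table: str   # table indicated by the join field ("" if none)
    on_condition: str  # normalized on condition of the join ("" if none)


def group_similar_queries(queries):
    """Group queries that share the same main table, slave table and on condition."""
    groups = defaultdict(list)
    for q in queries:
        key = (q.main_table, q.slave_table, q.on_condition)
        groups[key].append(q.name)
    return dict(groups)


# Illustrative data only.
queries = [
    RegisteredQuery("q1", "orders", "users", "orders.uid = users.id"),
    RegisteredQuery("q2", "orders", "users", "orders.uid = users.id"),
    RegisteredQuery("q3", "orders", "items", "orders.item_id = items.id"),
]
groups = group_similar_queries(queries)
```

Here `q1` and `q2` fall into one group of similar queries, while `q3` forms a group of its own because its slave table and on condition differ.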

Optionally, before obtaining the target cache data corresponding to the target cache policy in the cache data, the method further includes: performing multi-stage classification based on an aggregation function, a preset field and a window in stream data calculation of each data query request in the plurality of data query requests to obtain a multi-stage classification result; and determining a caching strategy of the data query request corresponding to each level based on each level indicated by the multi-level classification result, and caching data based on the caching strategy of the data query request corresponding to each level to obtain the cached data.

Optionally, the preset field includes: the first preset field and the second preset field; the multi-stage classification is performed based on the aggregation function, the preset field and the window in the stream data calculation of each data query request in the plurality of data query requests, so as to obtain a multi-stage classification result, including: dividing a first level based on the aggregation function, and dividing the data query requests with the same aggregation function in the plurality of data query requests into the same category; dividing a second level based on the classification result of the first level and the first preset field, and dividing the data query requests with the same filtering condition corresponding to the first preset field in the plurality of data query requests into the same category; dividing a third level based on the classification result of the second level and the second preset field, and dividing the data query requests with the same filtering condition corresponding to the third preset field in the plurality of data query requests into the same category; dividing a fourth level based on the classification result of the third level and windows in stream data calculation, and dividing the data query requests with the same windows in the plurality of data query requests into the same category; the first preset field is a Group by field; the second preset field is a Where field.
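As a simplified sketch of the four-level classification, the queries of one similar-query group can be nested by aggregation function, Group by clause, Where condition and window, in that order. The dictionary layout and the keys `agg`, `group_by`, `where` and `window` are assumptions made for illustration; the sketch only groups identical values and does not model any further subdivision within a level.

```python
def classify(queries):
    """Nest queries by aggregation function, Group by, Where, then window."""
    tree = {}
    for q in queries:
        node = tree
        # Descend one level per classification criterion, creating nodes as needed.
        for level in ("agg", "group_by", "where", "window"):
            node = node.setdefault(q[level], {})
        node.setdefault("queries", []).append(q["name"])
    return tree


# Illustrative data only.
queries = [
    {"name": "q1", "agg": "sum", "group_by": "region",
     "where": "status = 'paid'", "window": "1h"},
    {"name": "q2", "agg": "sum", "group_by": "region",
     "where": "status = 'paid'", "window": "1h"},
    {"name": "q3", "agg": "sum", "group_by": "region",
     "where": "status = 'paid'", "window": "1d"},
    {"name": "q4", "agg": "count", "group_by": "region",
     "where": "status = 'paid'", "window": "1h"},
]
tree = classify(queries)
```

Each leaf of `tree` holds the queries that agree on all four criteria, which is the unit to which one caching strategy would be assigned in this sketch.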

Optionally, the dividing the data query requests with the same aggregation function in the plurality of data query requests into the same category based on the first level division of the aggregation function includes: dividing the data query requests with the same aggregation function in the plurality of data query requests into first sub-levels of the first hierarchy, wherein the aggregation function is the data query request of a first type; and/or dividing the data query requests with the same aggregation function in the plurality of data query requests into second sub-levels of the first level, wherein the aggregation function is the data query request of the second type; wherein the aggregation function of the first type is: the calculation result of the whole data can be calculated by the calculation result of each data in the data; the aggregation function of the second type is: the calculation result of the entirety of the plurality of data cannot be calculated by the calculation result of each of the plurality of data alone.

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: caching the overall calculation result of each data corresponding to each data query request aiming at the data query request contained in the first sub-level of the first level; and/or, for the data query requests contained in the second sub-level of the first level, respectively caching the calculation results of each data in the respective data corresponding to each data query request.
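The difference between the two types of aggregation function, and why they call for different caches, can be sketched as follows. The application defines the two types but does not list concrete functions here; treating sum, count, max and min as the first type and median as the second type is an assumption made for illustration.

```python
from statistics import median

# Assumed examples of first-type functions: an overall result can be
# obtained by combining partial results.
MERGEABLE = {
    "sum": lambda a, b: a + b,
    "count": lambda a, b: a + b,
    "max": max,
    "min": min,
}


def merge_cached(func_name, cached_result, new_partial):
    """Combine a cached overall result with the result for newly arrived data."""
    return MERGEABLE[func_name](cached_result, new_partial)


# First type: caching only the overall result is sufficient.
cached_sum = sum([3, 5, 8])
total = merge_cached("sum", cached_sum, sum([4, 10]))

# Second type (median, an assumed example): the per-record values are
# cached, because two medians cannot be combined into an overall median.
cached_values = [3, 5, 8]
overall_median = median(cached_values + [4, 10])
```

The first-type cache stays small regardless of data volume, whereas the second-type cache grows with the data, which is one plausible reason to separate the two sub-levels.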

Optionally, the classifying the data query requests with the same filtering condition corresponding to the first preset field in the plurality of data query requests into the same category based on the classification result of the first hierarchy and the first preset field for performing the second hierarchy classification includes: dividing the data query request of which the clause of the first preset field in the first data query request meets a first preset condition into a first sub-level of the second level; and/or dividing the data query request of which the clause of the first preset field does not meet the first preset condition in the first data query request into a second sub-level of the second level; the first data query request is: the data query requests which belong to the same category in the first hierarchy and have query frequencies smaller than a preset frequency threshold value in the plurality of data query requests; the first preset condition is as follows: and the clauses of the first preset field have inclusion relations.

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: caching aggregated data queried by a data query request meeting a second preset condition in the first data query request aiming at the data query request contained in a first sub-level of the second level; and/or, for the data query requests contained in the second sub-level of the second level, caching the data queried by each data query request in the first data query requests; wherein the second preset condition includes: and the clause of the first preset field contains the most content.
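One way to read the inclusion relation between Group by clauses is sketched below: aggregated data is cached only for the clause that contains the most fields, and a query whose clause is contained in it is answered by rolling the cached data up. The function `roll_up`, the field names and the use of a sum are illustrative assumptions, and the approach only applies to aggregation functions whose partial results can be combined.

```python
from collections import defaultdict


def roll_up(fine_cache, fine_fields, coarse_fields):
    """Derive a coarser Group by sum from a cached finer-grained one."""
    idx = [fine_fields.index(f) for f in coarse_fields]
    out = defaultdict(int)
    for key, value in fine_cache.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


# Cached result of a hypothetical SUM(amount) ... GROUP BY region, city,
# the clause with the most content among the similar queries.
fine_cache = {
    ("east", "suzhou"): 10,
    ("east", "shanghai"): 7,
    ("west", "chengdu"): 4,
}

# A query with GROUP BY region is answered from the cache alone.
by_region = roll_up(fine_cache, ("region", "city"), ("region",))
```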

Optionally, the dividing the third level based on the classification result of the second level and the second preset field divides the data query requests with the same filtering condition corresponding to the third preset field in the plurality of data query requests into the same category, including: dividing the data query request meeting a third preset condition in the second data query request into a first sub-level of the third level; and/or dividing the data query request meeting a fourth preset condition in the second data query request into a second sub-level of the third level; and/or dividing the data query request meeting a fifth preset condition in the second data query request into a third sub-level of the third level; and/or dividing the data query request meeting a sixth preset condition in the second data query request into a fourth sub-level of the third level; and/or dividing the data query request meeting a seventh preset condition in the second data query request into a fifth sub-level of the third level; and/or dividing the data query request meeting an eighth preset condition in the second data query request into a sixth sub-level of the third level; and/or dividing the data query request meeting a ninth preset condition in the second data query request into a seventh sub-level of the third level; wherein the second data query request is: the data query requests belonging to the same category in the second hierarchy among the plurality of data query requests.

The third preset condition includes: the first preset sub-condition is not satisfied and the second preset sub-condition is not satisfied; the fourth preset condition includes: the first preset sub-condition is not satisfied, the second preset sub-condition is satisfied, the third preset sub-condition is not satisfied, and the fourth preset sub-condition is not satisfied; the fifth preset condition includes: the first preset sub-condition is not satisfied, the second preset sub-condition is satisfied, the third preset sub-condition is not satisfied, and the fourth preset sub-condition is satisfied; the sixth preset condition includes: the first preset sub-condition is met, and the fifth preset sub-condition is not met; the seventh preset condition includes: the first preset sub-condition, the fifth preset sub-condition and the second preset sub-condition are met, or the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition and the sixth preset sub-condition are met, or the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition and the seventh preset sub-condition are met; the eighth preset condition includes: the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition, the seventh preset sub-condition and the eighth preset sub-condition are met; the ninth preset condition includes: the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition, the seventh preset sub-condition and the eighth preset sub-condition are met.

The first preset sub-condition is: the field judgment condition corresponding to the second preset field is a dynamic condition, a dynamic condition being a judgment condition that is not preset; the second preset sub-condition is: the field judgment condition comprises a logical operation; the third preset sub-condition is: the query frequency is greater than or equal to a preset frequency threshold; the fourth preset sub-condition is: the field judgment condition comprises a non-resolvable condition field, a non-resolvable condition field being an AND operation condition field; the fifth preset sub-condition is: the field judgment condition is a field of the dimension type, a field of the dimension type being a field judgment condition for which the number of rows of the slave table is smaller than the number of rows of the master table of the data query statement, or for which the number of distinct values, in the master table, of the field corresponding to the field judgment condition is smaller than the number of rows of the master table; the sixth preset sub-condition is: the field judgment condition comprises a plurality of logical operations; the seventh preset sub-condition is: the number of the plurality of logical operations comprised in the field judgment condition exceeds a preset logical operation number threshold; the eighth preset sub-condition is: historical result cache data exists for a data query request having the same plurality of logical operations as those contained in the field judgment condition.

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: caching aggregated data of all data query requests for the data query requests contained in the first sub-level of the third level; and/or, for the data query requests contained in the second sub-level of the third level, caching the aggregate data of all the data query requests; and/or, for the data query request contained in the third sub-level of the third level, caching the aggregate data of each logical operation; and/or, for the data query requests contained in the fourth sub-level of the third level, invoking a computing engine to query the data queried by each data query request; and/or, aiming at the data query request contained in the fifth sub-level of the third level, querying the data corresponding to the data query request of each dimension type field; and/or, for the data query requests contained in the sixth sub-level of the third level, invoking a computing engine to query the data queried by each data query request and caching the queried data; and/or, for the data query requests contained in the seventh sub-level of the third level, invoking a calculation engine to query the data queried by each data query request, merging the queried data to be merged, and caching the merged data with the historical result data cache data.

Optionally, the dividing the third level based on the classification result of the second level and the second preset field divides the data query requests with the same filtering condition corresponding to the third preset field in the plurality of data query requests into the same category, including: dividing the data query request meeting the tenth preset condition in the third data query request into a first sub-level of the fourth level; and/or dividing the data query request meeting the eleventh preset condition in the third data query request into a second sub-level of the fourth level; and/or dividing the data query request meeting the twelfth preset condition in the third data query request into a third sub-level of the fourth level; and/or dividing the data query request meeting the thirteenth preset condition in the third data query request into a fourth sub-level of the fourth level; wherein the third data query request is: the data query requests belonging to the same category in the third hierarchy among the plurality of data query requests.

The tenth preset condition includes: the ninth preset sub-condition is not satisfied and the tenth preset sub-condition is satisfied; the eleventh preset condition includes: the ninth preset sub-condition is satisfied and the eleventh preset sub-condition is not satisfied; the twelfth preset condition includes: the ninth preset sub-condition, the eleventh preset sub-condition and the twelfth preset sub-condition are met; the thirteenth preset condition includes: the ninth preset sub-condition is met, the eleventh preset sub-condition is met, and the twelfth preset sub-condition is not met.

The ninth preset sub-condition is: a window condition exists in the field judgment condition corresponding to the third preset field; the tenth preset sub-condition is: the query frequency is less than a preset frequency threshold; the eleventh preset sub-condition is: the window condition is a session window, a session window being used for dividing the streaming data into windows based on a session interval; the twelfth preset sub-condition is: the interval of the incremental data exceeds the session interval, the incremental data being the data added in the current query compared with the last query.

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: aiming at the data query request contained in the first sub-level of the fourth level, merging the increment data obtained during each query with the historical result data during the last query, caching, and taking the merged aggregate data as the historical result data during the next query; and/or, for the data query requests contained in the second sub-level of the fourth level, obtaining a cache granularity based on the calculation granularity of the windows of the plurality of data query requests, and caching the aggregated data based on the cache granularity; the calculated granularity is used for indicating the moving distance of the window; the cache granularity is calculated based on integer time unit distances of a plurality of calculation granularities; and/or dividing the incremental data according to the session interval aiming at the data query request contained in the third sub-level of the fourth level to obtain a plurality of pieces of data, and calculating the aggregate data of each piece of divided data; combining the result of the first section of data in the plurality of sections of data with the historical result data, and caching the result of the last section of data to be used as the historical result data of the next calculation; and/or, for the data query request contained in the fourth sub-level of the fourth level, aggregating and calculating the data corresponding to the window belonging to the historical result data, and caching the aggregated data obtained by calculation.
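The cache granularity of the second sub-level of the fourth level can be sketched under one possible reading: if the windows of several queries move in steps of different lengths, aggregates are cached in buckets whose length is the greatest common divisor of those steps, so that every window boundary coincides with a bucket boundary. Using the greatest common divisor is an assumption made here; the application only states that the cache granularity is derived from the calculation granularities. The function names and sample numbers are likewise illustrative.

```python
from functools import reduce
from math import gcd


def cache_granularity(step_seconds):
    """Assumed rule: bucket length is the GCD of all window moving distances."""
    return reduce(gcd, step_seconds)


def window_sum(buckets, start, end, granularity):
    """Sum the cached bucket aggregates covering the window [start, end)."""
    return sum(buckets.get(t, 0) for t in range(start, end, granularity))


# Two queries whose windows move by 10 and 15 minutes respectively.
g = cache_granularity([600, 900])

# Cached per-bucket sums, keyed by bucket start time in seconds.
buckets = {0: 2, 300: 5, 600: 1, 900: 4}
first_window = window_sum(buckets, 0, 900, g)
shifted_window = window_sum(buckets, 300, 1200, g)
```

With a 300-second bucket, both the 10-minute and the 15-minute window steps land on bucket boundaries, so either query can be answered by summing cached buckets.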

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: for a first data query request with the query frequency greater than or equal to a preset frequency threshold value in the second hierarchy, caching all aggregate data queried by the first data query request; and/or, for a second data query request in the third hierarchy, which does not satisfy the first preset sub-condition, satisfies the second preset sub-condition, and has a query frequency greater than or equal to a preset frequency threshold, caching all aggregated data queried by the second data query request; and/or, aiming at a third data query request with the query frequency greater than or equal to a preset frequency threshold in the fourth level, merging the query result of the third data query request with the historical result data when each data query is performed, and taking the merged result as the historical result data when the next query is performed; wherein, the first preset sub-condition is: the field judgment condition corresponding to the second preset field is a dynamic condition; the dynamic conditions are: judging conditions which are not preset; the second preset sub-condition is: the field judgment condition includes a logical operation.

Optionally, the preset field further includes: a third preset field; the data caching is performed based on the caching strategy of the data query request corresponding to each level, so as to obtain the cached data, which comprises the following steps: and caching data indicated by the union of fields in the clauses of the third preset field of each data query request in the plurality of data query requests.

Optionally, the determining, based on each level indicated by the multi-level classification result, a caching policy of the data query request corresponding to each level includes: and judging whether the cache data can be multiplexed by subsequent computation based on the window, and discarding the cache data which cannot be multiplexed by subsequent computation.

Optionally, after the data is cached based on the caching policy of the data query request corresponding to each level to obtain the cached data, the method further includes: judging whether the data queried by the data query request is data in the user subscription period or not, and discarding the cache data which does not belong to the user subscription period.

The application also provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of a streaming data service fast response method as described in any of the above.

The application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the streaming data service quick response method as described in any of the above when executing the program.

The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the streaming data service fast response method as described in any of the above.

The application provides a quick response method of stream data service, a computer program product and electronic equipment, which comprises the steps of firstly, receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; then, target cache data corresponding to the target cache policy is obtained from the cache data, and needed cache data is screened from the target cache data to calculate, so that a data calculation result corresponding to the target query request is obtained; finally, sending the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation. Therefore, the response speed of the streaming data service is improved by classifying the data query requests of the same type of query and executing corresponding caching strategies for each class according to the classification result.

Drawings

In order to more clearly illustrate the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of the structure of a streaming data service provided by the present application;

Fig. 2 is a flow chart of a fast response method for streaming data service provided by the present application;

Fig. 3 is a schematic flow chart of generating a caching strategy based on a where condition provided by the present application;

Fig. 4 is a schematic flow chart of generating a caching strategy based on window conditions provided by the present application;

Fig. 5 is a schematic structural diagram of an electronic device provided by the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.

Data aggregation techniques in the related art can pre-aggregate data along the data dimensions of a query and store the results, so that the results are widely reused and the amount of computation is reduced. However, they have the following problems. The data dimensions, aggregation operations and so on must be defined in advance and maintained manually, so aggregated data built for one aggregation operation on one data dimension can only accelerate, through caching, that aggregation operation on that dimension. In addition, the query forms commonly used by business personnel must be analyzed manually and the corresponding aggregations created, and this manual maintenance demands a high level of technical skill. If the cache granularity is too coarse, the applicable acceleration scenarios are limited; if it is too fine, storage space is wasted and the secondary aggregation calculation reduces efficiency.

Aiming at the technical problems in the related art, the embodiment of the application provides a quick response method for stream data service, which accelerates the calculation process and reduces the response time of an interface by maintaining the data result after aggregation, thereby improving the response speed of the stream data service.

The following description is made with respect to terms related to embodiments of the present application:

Similar query: queries for which the master table indicated by the from field in SQL is the same, the slave table indicated by the join field is the same, and the on condition of the join query is the same. Cached aggregate results can potentially be reused between similar queries.

Similar queries with reusable cache data: the queries in a group of similar queries do not necessarily all have the ability to reuse the cached results of other queries. The queries that, after analysis, are found able to use the cached results of other queries in the same group are called similar queries with reusable cache data.

Frequent queries: queries that users invoke frequently, defined by a configured threshold (e.g., a frequency of 1/min or more). Frequent queries may employ different caching policies.

Stream processing system: a system that processes unbounded data. Its data source can be regarded as generating data continuously, like a data table with neither head nor tail. The stream processing system performs a series of transformation or aggregation operations on this continuously arriving data. Stream processing is real-time: it requires a fast calculation response and includes the continuously arriving real-time data in the calculation, so it responds to data changes more promptly than batch computing does.

Watermark: for various reasons, the raw data in stream computation may arrive out of order, and feeding it into the stream computing system as is may cause problems such as inaccurate results. The watermark was introduced to cope with this disorder. A watermark is a timestamp generated automatically as data is received (the generation rule is customizable). Data earlier than the watermark is considered to have arrived completely, and only then is the calculation triggered. For example, if the watermark is generated as the timestamp of the currently received data minus 5 seconds, subsequent data may arrive up to 5 seconds late.
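The 5-second example can be sketched as follows. The class `WatermarkTracker` is hypothetical, and basing the watermark on the largest timestamp seen so far (so that it never moves backwards) is an assumption; the text above only says that the watermark is derived from the currently received data.

```python
class WatermarkTracker:
    """Tracks a watermark defined as (largest event time seen) - allowed lateness."""

    def __init__(self, allowed_lateness=5):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def observe(self, event_time):
        """Record an event; return True if it is on time, False if it is too late."""
        on_time = self.watermark is None or event_time >= self.watermark
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return on_time


tracker = WatermarkTracker(allowed_lateness=5)
results = [tracker.observe(t) for t in [100, 103, 97, 110, 104]]
```

In this run the events stamped 97 and 104 arrive after the watermark has passed 98 and 105 respectively, so they are flagged as late; how late data is then handled is left open here.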

Window: a window is a time-range-based constraint on the source data in stream computation. Because a data stream has neither head nor tail, a stream-based aggregation calculation must cut the stream into a bounded data set with a start and an end before the aggregation can be performed. This way of cutting the stream by time range is called a window.

Session window: a window, as defined above, whose boundaries follow a rule suited to specific business scenarios. The rule is that if, in a series of stream data, the time interval between two adjacent data items exceeds the session interval time, the later item belongs to a new session window. A typical applicable scenario is querying the number of consecutive login attempts a user makes within 3 minutes.
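As a rough illustration of the session rule, the following sketch splits a list of event timestamps into session windows. The function name `split_sessions` and the use of plain numeric timestamps (in seconds) are assumptions made for the example only.

```python
from typing import List


def split_sessions(timestamps: List[float], gap: float) -> List[List[float]]:
    """Group event timestamps into session windows.

    A new session starts whenever the interval between two adjacent
    events exceeds `gap` (the session interval time).
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # interval exceeded: open a new session
    return sessions
```

For a 3-minute session interval (`gap=180`), login attempts at 0 s, 60 s and 100 s fall into one session, while attempts at 400 s and 430 s form a second one.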

Data service: put simply, a customized data query service. The stream processing engine supports generic queries but is relatively complex to use. A data service is a customized query with which the caller can quickly obtain specific data without knowing the details of use.

The query rules configured in a streaming data service are fixed, and the data involved in the same query may overlap or be logically related. Even different queries in a streaming data service may be logically related to one another. This property makes data cache optimization possible for such scenarios.

Streaming data services differ from traditional data analysis. In traditional data analysis, the query statements executed by users are unpredictable and any query statement may be executed, whereas the SQL contained in a streaming data service is defined in advance. For this predefined logic, the invention automatically analyzes the calculation logic and converts it into a matching caching policy. The invention analyzes all of the SQL together and formulates a caching policy that allows as much cached aggregate data as possible to be reused among these SQL queries.

Stream computation and batch computation also differ greatly. In batch analysis, the raw data to be analyzed can be assumed not to change over a period of time. Stream computation analysis is different: the data to be analyzed is generated in real time. For example, if 2 identical queries are executed within a short period of time, the returned results are likely to differ, because the data to be analyzed has changed. However, the raw data covered by these 2 calculations may partly overlap, and for a data service the user's queries are predictable (i.e., configured in advance). A data aggregation cache is therefore useful in this scenario.

The purposes of the aggregate cache include: 1. previous calculation results can be reused, and cached results can even be reused among different but similar queries; 2. because the user's query logic is predictable, aggregate results can be calculated in advance.

As shown in fig. 1, the streaming data service in the embodiment of the present application includes the following modules:

1. Data loading module. A query task of the calculation engine, used to call the calculation engine and load results into the cache as required by the caching policy.

The data loading module is a resident stream computing task, which avoids the extra time cost of frequent starts and stops. As required by the caching policy, the module starts a calculation periodically or according to a trigger rule, reads the raw stream data that has reached the system at the current moment, filters and aggregates it according to the conditions of the caching policy to produce an aggregation result, and then hands the result to the cache storage module for persistent storage.

2. Cache storage module. Stores the cached aggregate values and associates them with specific query services. The cache storage module is responsible for persisting the results calculated by the data loading module.

3. Cache policy module. Analyzes the features of the SQL registered in the data service, and generates and stores caching policies.

The cache policy module is responsible for analyzing all query SQL registered in the data service by the administrator and generating caching policies that allow cached results to be reused as much as possible. The policies then control the data loading module to load data and perform aggregation operations periodically or automatically as needed. Finally, the generated policies are saved.

4. Query execution module. When a user calls a data query SQL, the module loads the results in the cache according to the corresponding caching policy, generates the final query result, and returns it.

The query execution module intercepts the user's query execution, reads the caching policy corresponding to the query, and reads the corresponding cached data according to that policy. If some part requires incremental calculation (as determined by the caching policy), the calculation engine is called again to compute that part. Finally, the results read from the cache and the newly calculated incremental results are merged into the final query result, which is returned to the query caller.

The fast response method for stream data service provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.

As shown in fig. 2, a method for fast response of streaming data service according to an embodiment of the present application may include the following steps 201 to 203:

Step 201, receiving a target query request called by a target device, and determining a target cache policy corresponding to the target query request.

The target device is, for example, a device that invokes a target query request, and after receiving the target query request invoked by the target device, the query execution module shown in fig. 1 intercepts a query process of the target query request and reads a target cache policy corresponding to the target query request.

Step 202, obtaining target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data to calculate, thereby obtaining a data calculation result corresponding to the target query request.

Wherein the cache data is data cached based on caching policies formulated from the classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; a homogeneous query satisfies at least one of: the master tables are the same, the slave tables are the same, and the conditional statements of the join query are the same; the classification results are obtained by performing multi-level classification based on the aggregation function in the data query request, the preset fields, and the window in stream data calculation.

After the target cache policy corresponding to the target query request is obtained, the corresponding target cache data can be read according to the target cache policy, and calculation is performed based on the target cache data to obtain the data calculation result required by the target query request. If some part requires incremental calculation, the calculation engine is called to compute it, and finally the result read from the cache data is merged with the result of the incremental calculation to obtain the final query result, namely the data calculation result.
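The merge of cached and incrementally calculated results can be sketched as below. This is only an illustration under stated assumptions: the aggregate is a sum, the cache is keyed by hypothetical time-bucket identifiers, and `compute_bucket` stands in for a call to the calculation engine; none of these names come from the embodiment.

```python
from typing import Callable, Dict, Iterable


def execute_query(buckets: Iterable[int],
                  cache: Dict[int, float],
                  compute_bucket: Callable[[int], float]) -> float:
    """Answer a sum query over `buckets`, preferring cached partial sums."""
    total = 0.0
    for bucket in buckets:
        if bucket in cache:
            total += cache[bucket]            # part already covered by the cache
        else:
            total += compute_bucket(bucket)   # incremental part: ask the engine
    return total
```

Only the buckets missing from the cache reach the calculation engine, which is where the response-time saving would come from.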

The cache data is obtained based on a cache policy formulated based on classification results of a plurality of data query requests, the plurality of data query requests are similar queries, and the target query request is one of the plurality of data query requests.

Step 203, sending the data calculation result to the target device.

Illustratively, after the final query result (i.e., the data calculation result) is obtained, the query result may be returned to the query caller, i.e., the target device.

It should be noted that the fast response method for streaming data service in the embodiment of the present application applies in the following three cases:

Case 1, an administrator edits the data service, adding, modifying, or deleting data query SQL; case 2, aggregated data is cached as stream data continuously arrives; case 3, a user calls a data service and the corresponding SQL is executed. These 3 cases are described below, respectively.

For case 1: because the caching policy is generated by analyzing SQL, after an administrator changes the SQL, the caching policy needs to be regenerated by the cache policy module, which causes the caching policy to change. The newly modified SQL points to the changed caching policy. For whether the original SQL adopts the new caching policy generated after new SQL is added, see the description of the cache policy module. If SQL is changed, the query execution module uses the new caching policy to load the cache corresponding to that SQL.

For case 2: unlike conventional batch calculation, the data to be analyzed in stream calculation is generated in real time. In this case, according to the caching policy generated in case 1, the data loading module periodically performs aggregation operations on the incoming stream data and caches the aggregation results. The cached results are persisted by the cache storage module.

For case 3: in this case, the query execution module intercepts the user's query rather than submitting it directly to the stream computing system. It first finds the caching policy corresponding to the query and then, according to the caching policy generated in case 1, either reads the stored cached aggregate data and returns it to the user, or returns the result of a secondary calculation performed on the previously cached aggregate data.

The advantage of the fast response method for streaming data service provided by the embodiment of the application lies in its strategy for generating a reusable data aggregation cache in the data service scenario (where the query pattern is predictable). A corresponding caching policy is generated by analyzing the elements of the query service SQL. The cache generated according to the policy reduces the execution time of the current query SQL and can be reused by other query SQL as much as possible, which improves cache utilization and reduces cache storage usage.

According to the fast response method for streaming data service provided by the embodiment of the application, the SQL is analyzed to make a caching plan when the query service is configured, a resident data loading module is maintained to load incremental data at regular intervals or as the policy requires, and user queries are intercepted and answered with calculation results generated from the cached data.

Optionally, in the embodiment of the present application, the SQL elements of users' data query requests can be analyzed, and cache data that can be reused as much as possible is generated for all the data query requests in the service.

Illustratively, before the step 202, the method for fast responding to streaming data services provided in the embodiment of the present application may further include the following step 204:

Step 204, all data query requests registered in the data service are acquired, and the data query requests are classified based on the request field of each data query request, so as to obtain at least one group of similar queries.

Wherein each set of homogeneous queries includes a plurality of data query requests.

Illustratively, since the cache policy is formulated for queries within the same group (i.e., queries of the same type) in the embodiments of the present application, all data query requests registered by an administrator into a data service need to be classified first before the cache policy is formulated.

Specifically, the step 204 may further include the following step 204a:

Step 204a, dividing a plurality of data query requests with the same master table, the same slave table and the same conditional statement of the join query into a group of similar queries.

The master table is a data table indicated by a from field in a data query request, the slave table is a data table indicated by a join field in the data query request, and the conditional statement of the join query is an on condition in the data query request.

Illustratively, the data query statements whose master table indicated by the from field is the same, whose slave table indicated by the join field is the same, and whose on condition of the join query is the same are determined to be a group of homogeneous queries. The cached aggregate results of queries in the same group can potentially be reused among them.

For example, after all the data query statements have been classified, multi-level classification can be performed on the data query statements within the same group of similar queries, and different caching policies can be formulated for different types of data query statements according to the classification results.
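The grouping in step 204a can be sketched as a keyed grouping. The sketch assumes the SQL has already been parsed into a small record; the `QuerySpec` type and its field names are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QuerySpec:
    """Hypothetical parsed form of one registered data query request."""
    name: str
    master_table: str     # table named in the from clause
    slave_table: str      # table named in the join clause ("" if none)
    join_condition: str   # the on condition ("" if none)


def group_homogeneous(queries: List[QuerySpec]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group queries sharing master table, slave table and on condition."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for q in queries:
        key = (q.master_table, q.slave_table, q.join_condition)
        groups[key].append(q.name)
    return dict(groups)
```

Each key of the returned dictionary identifies one group of similar queries, and the caching policy is then formulated per group.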

Illustratively, before the step 202, the method for fast responding to streaming data services provided by the embodiment of the present application may further include the following steps 205 and 206:

Step 205, performing multi-level classification based on the aggregation function of each data query request in the plurality of data query requests, a preset field and a window in stream data calculation to obtain a multi-level classification result.

Illustratively, in the embodiment of the present application, all data query requests in the same group may be split up one by one and reclassified in the following hierarchical order: condition 1, aggregation function; condition 2, group by field; condition 3, where field; condition 4, window. The 4 levels are applied in sequence: the requests are first classified by condition 1, those with the same condition 1 are then classified by condition 2, and so on.

Illustratively, after multi-level classification of the same group of similar queries, the queries in each sub-class can potentially reuse the cached results of other queries in the same sub-class.

Step 206, determining a caching policy of the data query request corresponding to each level based on each level indicated by the multi-level classification result, and caching data based on the caching policy of the data query request corresponding to each level to obtain the cached data.

For example, after the plurality of data query requests belonging to the same group of similar queries have been classified at multiple levels, a corresponding caching policy may be formulated for each sub-level of each hierarchy, and data caching may be performed based on the formulated caching policy, so as to obtain the cached data.

Illustratively, the preset fields include: the first preset field and the second preset field; the first preset field is a Group by field; the second preset field is a Where field.

Specifically, the step 205 may further include the following steps 205a, 205b, 205c, and 205d:

Step 205a, dividing the first hierarchy based on the aggregation function, and dividing the data query requests with the same aggregation function in the plurality of data query requests into the same category.

Step 205b, dividing the second hierarchy based on the classification result of the first hierarchy and the first preset field, and dividing the data query requests with the same filtering condition corresponding to the first preset field in the plurality of data query requests into the same category.

Step 205c, dividing the third hierarchy based on the classification result of the second hierarchy and the second preset field, and dividing the data query requests with the same filtering condition corresponding to the second preset field in the plurality of data query requests into the same category.

Step 205d, dividing the fourth hierarchy based on the classification result of the third hierarchy and the window in stream data calculation, and dividing the data query requests with the same window in the plurality of data query requests into the same category.

Illustratively, a first hierarchy of partitioning is first performed on a plurality of data query requests in the same class of queries, then a second hierarchy of partitioning is performed on each sub-level of the first hierarchy, and so on, for a total of four hierarchies of partitioning of the plurality of query requests. Each sub-level of each hierarchy has a corresponding cache policy.
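The four-level partition can be pictured as a nested grouping, as in the sketch below. It is a simplification under stated assumptions: each query is a dictionary with hypothetical keys `agg`, `group_by`, `where` and `window`, and requests are grouped by plain equality at every level, whereas the embodiment further splits each level into sub-levels with their own rules.

```python
from typing import Any, Dict, List

# Level order from the text: aggregation function, group by, where, window.
LEVELS = ("agg", "group_by", "where", "window")


def classify_multilevel(queries: List[Dict[str, Any]]) -> Dict[Any, Any]:
    """Build a nested dict: agg -> group_by -> where -> window -> [query names]."""
    tree: Dict[Any, Any] = {}
    for q in queries:
        node = tree
        for level in LEVELS[:-1]:
            node = node.setdefault(q[level], {})   # descend, creating sub-levels
        node.setdefault(q[LEVELS[-1]], []).append(q["name"])
    return tree
```

Queries that end up in the same leaf list share all four characteristics, which makes them the most direct candidates for sharing one cache entry.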

Specifically, the step of classifying based on the aggregation function in the step 205a may further include at least one of the following steps 205a1 and 205a2:

Step 205a1, among the data query requests with the same aggregation function, dividing the data query requests whose aggregation function is of the first type into a first sub-level of the first hierarchy.

Step 205a2, among the data query requests with the same aggregation function, dividing the data query requests whose aggregation function is of the second type into a second sub-level of the first hierarchy.

Wherein an aggregation function of the first type is one for which the calculation result over a whole set of data can be calculated from the results calculated separately on each part of the data; an aggregation function of the second type is one for which the calculation result over the whole set of data cannot be calculated solely from the results calculated separately on each part of the data.

Illustratively, the first sub-level of the first hierarchy and the second sub-level of the first hierarchy correspond to different cache policies, respectively.

Specifically, based on the step 205a1 and the step 205a2, the step of formulating the cache policy for each sub-level of the first hierarchy in the step 206 may further include at least one of the following steps 206a1 and 206a2:

Step 206a1, for the data query requests included in the first sub-level of the first hierarchy, caching the overall calculation results of the respective data corresponding to each data query request.

Step 206a2, for the data query requests included in the second sub-level of the first hierarchy, caching the calculation results of each item of data among the data corresponding to each data query request.

Illustratively, for the step of classifying based on the aggregation function, the cache policy module does not necessarily cache the aggregation result directly, and the aggregate data to be cached differs between aggregation functions. For example, for the minimum value aggregation function, the minimum of two pieces of data taken together can be calculated from the minimum of each piece, so the minimum value result can be cached directly. However, the average of two pieces of data taken together cannot be calculated from the average of each piece alone, so the average itself cannot be cached; instead, mergeable intermediate values need to be cached according to the formula of the aggregation function. For example, for the average, the sample sum and the sample count may be cached; the average of two pieces of data can then be found by adding their sample sums and dividing by the total of their sample counts. For a function such as the variance, the sum of squares of samples, the sum of samples, and the number of samples may be cached. Other aggregation functions are handled in the same way.

Specifically, the step of classifying based on the first preset field in the step 205b may further include at least one of the following steps 205b1 and 205b2:
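The difference between the two kinds of aggregation function can be made concrete with a short sketch. The `AvgState` class and `merge_min` function are hypothetical names introduced for illustration; they show one way to keep the sample sum and sample count as the cached intermediate state for the average.

```python
from dataclasses import dataclass
from typing import Iterable


def merge_min(min_a: float, min_b: float) -> float:
    """Minimum: the cached results of two pieces merge directly."""
    return min(min_a, min_b)


@dataclass(frozen=True)
class AvgState:
    """Cached intermediate state for the average: sample sum and sample count."""
    total: float
    count: int

    @staticmethod
    def of(values: Iterable[float]) -> "AvgState":
        values = list(values)
        return AvgState(sum(values), len(values))

    def merge(self, other: "AvgState") -> "AvgState":
        # Sums and counts add up, so two cached states combine exactly.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        return self.total / self.count
```

For the pieces `[1, 2, 3]` and `[10]`, merging the cached states gives 16 / 4 = 4.0, whereas averaging the two cached averages (2.0 and 10.0) would wrongly give 6.0.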

Step 205b1, dividing the data query request in which the clause of the first preset field in the first data query request satisfies the first preset condition into a first sub-level of the second hierarchy.

Step 205b2, dividing the data query request in which the clause of the first preset field in the first data query request does not satisfy the first preset condition into a second sub-level of the second hierarchy.

Wherein the first data query request is: among the plurality of data query requests, the data query requests that belong to the same category in the first hierarchy and whose query frequency is smaller than a preset frequency threshold; the first preset condition is: the clauses of the first preset field have an inclusion relationship.

Illustratively, the second hierarchy may be divided into two sub-levels according to whether an inclusion relationship exists between the clauses of the first preset field, with different sub-levels corresponding to different caching policies.

Specifically, based on the step 205b1 and the step 205b2, the step of formulating the cache policy for each sub-level of the second hierarchy in the step 206 may include at least one of the following steps 206b1 and 206b2:

Step 206b1, for the data query request included in the first sub-level of the second level, caching the aggregate data queried by the data query request meeting the second preset condition in the first data query request.

Step 206b2, for the data query requests included in the second sub-level of the second hierarchy, caching the data queried by each data query request in the first data query requests.

Wherein the second preset condition includes: the clause of the first preset field contains the most content.

Illustratively, the fields in the group by clause are dimension fields. If the group by clauses of several queries in the same group have an inclusion relationship (e.g., group by a, b and group by a), the aggregate data for the clause with the most group by fields is cached. In this case, the result of group by a can be obtained by reading the cache of group by a, b and, for each value of a, merging the cached results over all values of b. If the fields after group by differ among the queries in the same group, the content of each query is cached separately (e.g., for group by a, b and group by c, the aggregate data of group by a, b and the aggregate data of group by c are cached separately).
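Deriving group by a from the cache of group by a, b can be sketched as a roll-up. The example assumes a sum aggregate and a cache keyed by `(a, b)` tuples; the function name `roll_up_sum` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def roll_up_sum(cache_ab: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Derive the `group by a` sums from cached `group by a, b` sums."""
    result: Dict[str, float] = defaultdict(float)
    for (a, _b), partial in cache_ab.items():
        result[a] += partial   # merge every b entry found under the same a
    return dict(result)
```

Only the finer-grained cache has to be stored; the coarser result is a secondary aggregation over it, which is also why frequent queries may prefer a dedicated cache to avoid this extra step.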

Specifically, the step 206 includes the following step 206b3:

Step 206b3, for a first data query request with a query frequency greater than or equal to a preset frequency threshold in the second hierarchy, caching all aggregate data queried by the first data query request.

Illustratively, the group by condition also has an optimization strategy for frequent queries: the group by condition of a frequent query is not decomposed. The final result can thus be produced without a secondary aggregation over aggregate data of finer dimensions, which reduces calculation time.

Specifically, the step of classifying based on the second preset field in the step 205c may further include at least one of the following steps 205c1 to 205c7:

Step 205c1, dividing the data query request satisfying the third preset condition in the second data query request into the first sub-level of the third hierarchy.

Step 205c2, dividing the data query request satisfying the fourth preset condition in the second data query request into a second sub-level of the third level.

Step 205c3, dividing the data query request meeting the fifth preset condition in the second data query request into a third sub-level of the third level.

Step 205c4, dividing the data query request satisfying the sixth preset condition in the second data query request into a fourth sub-level of the third level.

Step 205c5, dividing the data query request satisfying the seventh preset condition in the second data query request into a fifth sub-level of the third level.

Step 205c6, dividing the data query request satisfying the eighth preset condition in the second data query request into a sixth sub-level of the third level.

Step 205c7, dividing the data query request satisfying the ninth preset condition in the second data query request into a seventh sub-level of the third level.

Wherein the second data query request is: data query requests belonging to the same category in the second hierarchy among the plurality of data query requests.

The third preset condition includes: the first preset sub-condition is not satisfied and the second preset sub-condition is not satisfied.

The fourth preset condition includes: the first preset sub-condition is not satisfied, the second preset sub-condition is satisfied, the third preset sub-condition is not satisfied, and the fourth preset sub-condition is not satisfied.

The fifth preset condition includes: the first preset sub-condition is not satisfied, the second preset sub-condition is satisfied, the third preset sub-condition is not satisfied, and the fourth preset sub-condition is satisfied.

The sixth preset condition includes: the first preset sub-condition is met, and a fifth preset sub-condition is not met.

The seventh preset condition includes: the first preset sub-condition, the fifth preset sub-condition and the second preset sub-condition are met; or the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition and the sixth preset sub-condition are met; or the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition and the seventh preset sub-condition are met.

The eighth preset condition includes: the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition, the seventh preset sub-condition and the eighth preset sub-condition are met.

The ninth preset condition includes: the first preset sub-condition, the fifth preset sub-condition, the second preset sub-condition, the sixth preset sub-condition, the seventh preset sub-condition and the eighth preset sub-condition are met.

The first preset sub-condition is: the field judgment condition corresponding to the second preset field is a dynamic condition; a dynamic condition is a judgment condition that is not preset.

The second preset sub-condition is: the field judgment condition comprises a logic operation.

The third preset sub-condition is: the query frequency is greater than or equal to a preset frequency threshold.

The fourth preset sub-condition is: the field judgment condition comprises a non-resolvable condition field; a non-resolvable condition field is a condition field of an and operation.

The fifth preset sub-condition is: the field judgment condition is a field of dimension type; a field of the dimension type is a field judgment condition for which the number of rows of the slave table is smaller than the number of rows of the master table of the data query statement, or for which the number of non-repeated values of the corresponding field in the master table is smaller than the number of rows of the master table.

The sixth preset sub-condition is: the field judgment condition comprises a plurality of logic operations.

The seventh preset sub-condition is: the number of the plurality of logic operations contained in the field judgment condition exceeds a preset logic operation number threshold.

The eighth preset sub-condition is: there is historical result data cache data of the same data query request as the plurality of logical operations contained in the field judgment condition.

Illustratively, fig. 3 shows a flow diagram of generating a caching policy based on the where condition. It includes 8 preset sub-conditions (conditions 1 to 8, corresponding to the first to eighth preset sub-conditions), and based on these 8 preset sub-conditions the third level can be divided into 7 sub-levels (sub-levels 1 to 7, corresponding to the first to seventh sub-levels of the third level).

Specifically, based on the steps 205c1 to 205c7, the step of formulating the cache policy for each sub-level of the third level in the step 206 may further include at least one of the following steps 206c1 to 206c7:

Step 206c1, for the data query requests contained in the first sub-level of the third level, caching the aggregate data of all the data query requests.

Step 206c2, for the data query requests contained in the second sub-level of the third level, caching the aggregate data of all the data query requests.

Step 206c3, for the data query requests contained in the third sub-level of the third level, caching the aggregate data of each logical operation.

Step 206c4, for the data query requests contained in the fourth sub-level of the third level, invoking a calculation engine to query the data queried by each data query request.

Step 206c5, for the data query requests contained in the fifth sub-level of the third level, querying the data corresponding to the data query request for each field of the dimension type.

Step 206c6, for the data query requests contained in the sixth sub-level of the third level, invoking a calculation engine to query the data queried by each data query request and caching the queried data.

Step 206c7, for the data query requests contained in the seventh sub-level of the third level, invoking a calculation engine to query the data queried by each data query request, merging the queried data to be merged, and caching the merged data with the historical result data cache data.

Illustratively, the where filtering condition part can be understood from the policy generation flow diagram shown in fig. 3. Where conditions may be divided into equality conditions (e.g., where field_a = x) and non-equality conditions (e.g., where field_a < x). For both types of queries, the results filtered by these conditions are cached directly. If the where filter contains an or logical operation, the or operation needs to be decomposed: the sub-conditions of all or conditions appearing in the group of queries are cached separately, at the finest granularity that appears in the user queries. For example, if where field_a = x or field_a = y or field_a = z and where field_a = x or field_a = z appear in the same group of queries, the aggregate results of two conditions need to be calculated: field_a = x or field_a = z, and field_a = y. The condition field_a = x or field_a = y or field_a = z can then be obtained by merging these two results. When the same group contains many similar queries, this reduces the amount of cached aggregate data and the number of cache rebuilds. If the user adds a query with a finer-grained condition (e.g., where field_a = z), the data under three filtering conditions, field_a = x, field_a = y, and field_a = z, needs to be cached. If the user then creates another query, e.g., where field_a = y or field_a = z, its filtering condition can be obtained by merging field_a = y and field_a = z, so introducing the query may not even require rebuilding the cached data. The or logical operation of non-equality queries is handled in the same way as that of equality queries. For an and condition, cached aggregate results cannot be merged by a logical operation, so the and condition needs to be cached as a whole (an and condition is not resolvable, whereas an or condition is resolvable).
Where a sub-condition of an and condition is itself complex, for example A and (B or C), and a finer-grained condition exists in the same group (for example, A and C), A and (B or C) can be equivalently rewritten through the distributive law as (A and B) or (A and C), and the results of A and B and of A and C are cached. Not logical operations are handled in the same way.
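One possible way to find the finest-granularity or sub-conditions is sketched below. It is an assumption-laden illustration rather than the procedure of fig. 3: each query is modeled as a set of atomic condition strings joined by or, and atoms that always occur together in exactly the same queries are kept as a single cached block. The function name `finest_or_blocks` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def finest_or_blocks(queries: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Partition atomic or-conditions into the blocks that need caching.

    `queries` maps a query name to the atoms of its or-condition.
    """
    # For every atom, record the set of queries in which it appears.
    signature: Dict[str, Set[str]] = defaultdict(set)
    for query_name, atoms in queries.items():
        for atom in atoms:
            signature[atom].add(query_name)
    # Atoms with identical signatures never need to be separated.
    blocks: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for atom, query_names in signature.items():
        blocks[frozenset(query_names)].add(atom)
    return sorted((frozenset(b) for b in blocks.values()), key=sorted)
```

With the two queries from the text, the sketch yields the blocks {field_a = x, field_a = z} and {field_a = y}; once a query on field_a = z alone is registered, it yields three single-atom blocks, and a later query on field_a = y or field_a = z needs no new block.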

Illustratively, as can be seen from the generation policy flow diagram shown in fig. 3, a where dynamic condition contains a question mark in place of a value (e.g., WHERE field_a = ?). The caching policy in this case needs to determine whether field_a is of a dimension type. A dimension type is characterized by discrete and finite values. Whether a field is a dimension field can be configured explicitly by the user or detected automatically.

As an example, based on the generation policy flow diagram shown in fig. 3, automatic detection counts the number of rows of the table corresponding to the field (the table named in the join clause) and the number of rows of the main table (the table named in the from clause). If the number of rows of the field's table is far smaller than that of the main table, or falls within a given range (the criterion is configurable), the field is regarded as a dimension type. Alternatively, if no table corresponding to field_a can be found (there is no join clause), the count distinct of the field (the number of non-duplicate values of the field in the main table) can be compared with the number of rows of the main table; if the number of non-duplicate values is far smaller than the number of main-table rows (the criterion is configurable), the field is regarded as a dimension type.
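A minimal sketch of the automatic detection described above follows. The thresholds `max_ratio` and `max_rows` stand in for the configurable criteria, and their default values are assumptions chosen only for illustration; the function name is hypothetical.

```python
def is_dimension_field(main_rows, field_table_rows=None, distinct_values=None,
                       max_ratio=0.01, max_rows=10_000):
    """Heuristic dimension check; max_ratio and max_rows are assumed criteria."""
    if main_rows <= 0:
        return False
    if field_table_rows is not None:
        # A join table exists: compare its row count with the main table,
        # or accept it outright when it is small in absolute terms.
        return (field_table_rows <= max_rows
                or field_table_rows / main_rows <= max_ratio)
    if distinct_values is not None:
        # No join table: fall back to COUNT(DISTINCT field) on the main table.
        return distinct_values / main_rows <= max_ratio
    return False
```

For example, `is_dimension_field(1_000_000, field_table_rows=50)` treats the field as a dimension, whereas `is_dimension_field(1_000_000, distinct_values=900_000)` does not.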

Illustratively, as can be seen from the generation policy flow diagram shown in fig. 3, for a dynamic condition on a dimension-type field, the results under the filtering condition of each dimension value need to be cached. The number of caches is therefore controllable (the number of values of a dimension field is limited) and does not occupy too much storage space, and the response is fast (the results for every dimension value are cached, so a call only retrieves the cached result and returns it). For a non-equivalent dynamic condition, the above caching scheme also applies if the condition field can be regarded as a dimension field (and the field type is comparable). If the field of the dynamic condition is not a dimension field, acceleration cannot be achieved through caching, and the calculation engine must be called directly to obtain the query result.

Illustratively, based on the generation policy flow diagram shown in fig. 3, if logical operations over multiple dimension fields occur in a dynamic condition, the logical operations need to be decomposed in the manner described above, regardless of whether the query is frequent. If the decomposition finally yields conditions similar to WHERE field_a = ?, the number of combinations is compared with a configured upper limit. If the number of combinations exceeds the configured upper limit, caching in advance is abandoned and the query engine is called; the result of the call is then cached, so that it can be reused when a call with the same parameters occurs again. If the number of combinations does not exceed the configured upper limit, each combination condition is cached separately.
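The combination check can be sketched as follows, assuming the known values of each dimension field are available as lists. `plan_combinations`, `OnDemandCache` and the `run_query` callable are hypothetical names introduced only for this illustration.

```python
from itertools import product

def plan_combinations(dimension_values, max_combinations):
    """Return every value combination to pre-cache, or None if there are too many."""
    total = 1
    for values in dimension_values.values():
        total *= len(values)
    if total > max_combinations:
        return None
    fields = sorted(dimension_values)
    return [dict(zip(fields, combo))
            for combo in product(*(dimension_values[f] for f in fields))]

class OnDemandCache:
    """Fallback: call the engine once per parameter set and reuse the result."""
    def __init__(self, run_query):
        self.run_query = run_query   # hypothetical callable that hits the query engine
        self.results = {}

    def get(self, **params):
        key = tuple(sorted(params.items()))
        if key not in self.results:
            self.results[key] = self.run_query(**params)
        return self.results[key]
```

When `plan_combinations` returns a list, each entry is cached in advance; when it returns `None`, the `OnDemandCache` path applies, so only parameter sets that are actually requested occupy storage.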

Illustratively, the caching policy module needs to optimize the where condition for frequent queries. Frequent queries are called at a high frequency, so secondary calculation on the cached aggregate results should be reduced as much as possible. The where condition of a frequent query is therefore not decomposed at all; it is used as a whole as the filtering condition under which the aggregated data is cached.

Specifically, the step 206 may further include the following step 206c8:

Step 206c8, for a second data query request in the third hierarchy, where the first preset sub-condition is not satisfied, the second preset sub-condition is satisfied, and the query frequency is greater than or equal to the preset frequency threshold, caching all aggregated data queried by the second data query request.

Illustratively, as shown in the generation policy flow diagram of fig. 3, step 206c8 corresponds to the eighth sub-level of the third hierarchy in fig. 3, and its caching policy is that of sub-level 8, obtained based on the judgment of condition 3.

Specifically, the step of performing the fourth-level sub-division based on the window in the step 205 may include at least one of the following steps 205d1 to 205d4:

Step 205d1, dividing the data query request satisfying the tenth preset condition in the third data query request into the first sub-level of the fourth hierarchy.

Step 205d2, dividing the data query request meeting the eleventh preset condition in the third data query request into a second sub-level of the fourth level.

Step 205d3, dividing the data query request meeting the twelfth preset condition in the third data query request into a third sub-level of the fourth level.

Step 205d4, dividing the data query request meeting the thirteenth preset condition in the third data query request into a fourth sub-level of the fourth level.

Wherein the third data query request is: a data query request, among the plurality of data query requests, that belongs to the same category in the third hierarchy; the tenth preset condition includes: the ninth preset sub-condition is not satisfied and the tenth preset sub-condition is satisfied; the eleventh preset condition includes: the ninth preset sub-condition is satisfied and the eleventh preset sub-condition is not satisfied; the twelfth preset condition includes: the ninth preset sub-condition, the eleventh preset sub-condition and the twelfth preset sub-condition are all satisfied; the thirteenth preset condition includes: the ninth preset sub-condition is satisfied, the eleventh preset sub-condition is satisfied, and the twelfth preset sub-condition is not satisfied; the ninth preset sub-condition is: a window condition exists in the field judgment conditions corresponding to the third preset field; the tenth preset sub-condition is: the query frequency is less than the preset frequency threshold; the eleventh preset sub-condition is: the window condition is a session window, the session window being used to divide the streaming data into windows based on a session interval; the twelfth preset sub-condition is: the interval of the incremental data exceeds the session interval; the incremental data is the data newly added in the current query compared with the last query.

Illustratively, the cache policy module may divide the fourth hierarchy into at most four sub-levels based on the window.

Specifically, the step of specifying the cache policy for each sub-level of the fourth hierarchy in the step 206 may further include at least one of the following steps 206d1 to 206d4, based on the steps 205d1 to 205d4 described above:

Step 206d1, for the data query request included in the first sub-level of the fourth level, merging the incremental data obtained during each query with the historical result data during the previous query, and then caching, and taking the merged aggregated data as the historical result data during the next query.

Step 206d2, for the data query requests included in the second sub-level of the fourth level, obtaining a cache granularity based on the calculation granularity of the windows of the plurality of data query requests, and caching the aggregate data based on the cache granularity; the calculation granularity is used for indicating the moving distance of the window; the cache granularity is calculated based on integer time unit distances of a plurality of calculation granularities.

Step 206d3, for the data query requests included in the third sub-level of the fourth level, dividing the incremental data according to the session interval to obtain a plurality of segments of data, and calculating the aggregate data of each segment; merging the result of the first segment among the plurality of segments with the historical result data, and caching the result of the last segment as the historical result data for the next calculation.

Step 206d4, for the data query request included in the fourth sub-level of the fourth level, performing aggregate calculation on the data corresponding to the window belonging to the history result data, and caching the calculated aggregate data.
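Step 206d3 can be sketched as follows, under several stated assumptions: events are time-ordered `(timestamp, value)` pairs, the aggregate is SUM, and the first segment of the increment continues the session held in the history result, as the step is worded. The function names are hypothetical.

```python
def split_by_session_gap(events, session_gap):
    """Split time-ordered (timestamp, value) events wherever the gap exceeds session_gap."""
    segments = []
    for ts, value in events:
        if segments and ts - segments[-1][-1][0] <= session_gap:
            segments[-1].append((ts, value))
        else:
            segments.append([(ts, value)])
    return segments

def process_increment(history_sum, events, session_gap):
    """Return (sums of closed sessions, open-session sum cached as the next history)."""
    segments = split_by_session_gap(events, session_gap)
    if not segments:
        return [], history_sum
    sums = [sum(v for _, v in seg) for seg in segments]
    sums[0] += history_sum  # the first segment is merged with the history result
    return sums[:-1], sums[-1]

closed, open_sum = process_increment(
    100, [(1, 1), (2, 2), (10, 5), (11, 5), (30, 7)], session_gap=5)
print(closed, open_sum)  # [103, 10] 7
```

Only the sum of the last segment is kept as history, since later data may still fall into that session window.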

An exemplary flowchart of generating a cache policy based on window conditions is shown in fig. 4, wherein 4 preset sub-conditions (including condition 1 to condition 4, corresponding to the ninth preset sub-condition to the twelfth preset sub-condition) are involved, and 4 sub-levels (including sub-level 1 to sub-level 4, corresponding to the first sub-level of the fourth hierarchy to the fourth sub-level of the fourth hierarchy) are obtained based on the judgment result of the 4 preset sub-conditions.

Illustratively, as can be seen from the flow chart of generating the caching policy based on the window condition shown in fig. 4, the essence of a window condition is to split the stream data according to time. A user query may or may not have a window condition, and the caching strategies in these two cases are quite different. For queries without a window condition, historical data needs to be considered, and processing is divided into two cases: frequent queries and infrequent queries. Each time a frequent query is executed, the cached aggregate result of the last query (called the history result) is combined with the aggregate result of the incremental data between the last query and the current query (called the increment result) and then returned, and the combined result is cached for use by the next query. The amount of data to be calculated in each query is thus reduced to the incremental data, which reduces the calculation time.
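The history-plus-increment scheme for queries without a window can be sketched as below. The example assumes an AVG aggregate, for which the history result has to be kept as a mergeable pair (count, total) rather than as the average itself; the class name is hypothetical.

```python
class IncrementalAverage:
    """Keeps (count, total) as the history result so AVG can be merged incrementally."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def query(self, increment):
        """Merge the increment result into the history result and return the average."""
        self.count += len(increment)
        self.total += sum(increment)
        return self.total / self.count if self.count else None

q = IncrementalAverage()
print(q.query([1, 2, 3]))  # 2.0
print(q.query([6]))        # 3.0
```

Each call touches only the new values, so the cost of a query depends on the size of the increment rather than on the full history.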

For example, as shown in the flow chart of generating the caching policy based on the window condition in fig. 4, infrequent queries use a caching measure similar to the above; in addition, a maximum interval time may be specified so that a query does not spend too long calculating the increment result. When the maximum interval time elapses, the increment calculation is triggered automatically and its result is merged with the history result and cached, so that the data covered by each increment calculation never exceeds the maximum interval time. A query with a window condition may be regarded as a where condition that filters on a timestamp field. The aggregate results of window data are the easiest to multiplex, so a special caching strategy is devised for them. First, the windowed queries in the same group must have aligned initial window offsets (unaligned offsets are equivalent to different starting lines, and the results cannot be multiplexed). The principle is to cache window aggregate data at the minimum granularity. To this end, a concept called window calculation granularity is introduced. The calculation granularity of a tumbling window (which can be regarded as a sliding window whose sliding distance equals the window time span) is the span of the tumbling window. For a sliding window (which has a time span of its own and slides backward by a period of time after each calculation, so that the data of two successive calculations may overlap), the calculation granularity is the sliding distance if the window length is an integer multiple of the sliding distance; otherwise it is an integer time unit distance (for example, with a window span of 30 seconds and a sliding distance of 20 seconds, the calculation granularity is 10 seconds).
After the calculation granularity of each window has been calculated, an integer time unit distance of the calculation granularities of the windows is taken as the cache granularity (for example, if the calculation granularities are 40 s and 20 s, the cache granularity is 10 s). Not taking the greatest common divisor here reduces the probability of reconstructing the cache (for example, with calculation granularities of 40 s and 20 s, if the cache granularity were the greatest common divisor 20 s and a query with a window calculation granularity of 50 s were created later, the cache could not be reused and would need to be reconstructed; window caches are usually numerous, so reconstruction is costly). With this strategy, a window result can be obtained by a secondary merge of the cached fine-granularity window aggregate data, which speeds up the return of results. Stream computing also has a special session window. Its window length is not fixed: a session window has a configuration item called the session interval, and when the time difference between two adjacent pieces of data exceeds the session interval, the two pieces of data mark a demarcation point of the session window, with the second piece belonging to another session window. Because neither the window length nor the demarcation points are fixed, cached results cannot be multiplexed, and session-window queries can instead be processed like frequent queries without a window.
The difference is that, at calculation time, it must be checked whether any interval within the incremental data exceeds the session interval. If not, the history-plus-increment mode is adopted (as in frequent query processing without a window). If some intervals exceed the session interval, the data is split into groups at those points: the earliest group needs to take into account the history data of the last calculation (which belongs to the previous session window), while each later group is calculated without the history result of the preceding group. When the last group has been calculated, its result is saved, because the earlier data of the next batch of incremental data may belong to the session window in which this group lies.
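Returning to the window calculation granularity and cache granularity for tumbling and sliding windows, the following sketch reproduces the worked numbers given above. The application does not define "integer time unit distance" precisely; the ladder `TIME_UNITS` is an assumption made here for illustration (the largest round unit that divides all values is chosen), and all function names are hypothetical.

```python
from functools import reduce
from math import gcd

# Assumed ladder of "integer time units", in seconds (not from the application).
TIME_UNITS = (3600, 600, 60, 10, 1)

def largest_unit_dividing(values):
    """Largest assumed time unit that divides every value."""
    common = reduce(gcd, values)
    return next(unit for unit in TIME_UNITS if common % unit == 0)

def calc_granularity(window_len, slide):
    """Calculation granularity of a sliding window (tumbling: slide == window_len)."""
    if window_len % slide == 0:
        return slide
    return largest_unit_dividing([window_len, slide])

def cache_granularity(granularities):
    """Cache granularity shared by all windows of a group."""
    return largest_unit_dividing(granularities)

def window_sum(buckets, start, end, step):
    """Secondary merge: sum the cached buckets covering [start, end)."""
    return sum(buckets.get(t, 0) for t in range(start, end, step))

print(calc_granularity(30, 20))         # 10
print(cache_granularity([40, 20]))      # 10
print(cache_granularity([40, 20, 50]))  # 10
```

Because the cache granularity for 40 s and 20 s is already 10 s under this assumption, a later query with a calculation granularity of 50 s leaves it unchanged, which matches the motivation given above for not using the greatest common divisor.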

Specifically, the step 206 may further include the following step 206d5:

Step 206d5, for the third data query request with the query frequency greater than or equal to the preset frequency threshold in the fourth hierarchy, merging the query result of the third data query request with the historical result data each time when the data query is performed, and taking the merged result as the historical result data when the next query is performed.

Illustratively, as shown in fig. 4, step 206d5 is a caching policy corresponding to the sub-level 5 obtained by the determination based on the condition 2.

Illustratively, in an embodiment of the present application, for the Select part of the data query requests contained in each of the above levels, the union of the fields in all Select clauses is cached. Each query can then obtain the fields it actually needs by projecting the cached result, so that one cache can be multiplexed by a plurality of SQL statements.

Illustratively, the preset field further includes: a third preset field; the third preset field is a Select field.

Specifically, the step 206 may further include the following step 206e:

Step 206e, caching data indicated by the union of fields in the clauses of the third preset field of each data query request in the plurality of data query requests.
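Step 206e can be sketched as follows, assuming each Select clause is represented as a list of field names and each cached row as a dictionary; both helper names are hypothetical.

```python
def cached_fields(select_clauses):
    """Union of the fields of all Select clauses, in a stable order."""
    return sorted({field for clause in select_clauses for field in clause})

def project(cached_rows, wanted_fields):
    """Keep only the fields that one query actually selects."""
    return [{f: row[f] for f in wanted_fields} for row in cached_rows]

fields = cached_fields([["a", "b"], ["b", "c"]])
print(fields)                                          # ['a', 'b', 'c']
print(project([{"a": 1, "b": 2, "c": 3}], ["b", "c"]))  # [{'b': 2, 'c': 3}]
```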

Therefore, through classifying and multi-level division of SQL sentences corresponding to registered data query requests, corresponding caching strategies can be formulated for different sub-levels of different levels, the execution time consumption of the current query SQL can be reduced, and meanwhile, generated caches can be multiplexed by other query SQL to the greatest extent, so that the utilization rate of the caches is improved, and the storage occupation of the caches is reduced.

Optionally, in the embodiment of the present application, in order to limit the occupation of storage space, caches cannot be stored without limit. Cache discarding applies only to aggregate results filtered by time (e.g., by a window), or to the case where the query service corresponding to a cache that cannot be multiplexed has been deleted.

Illustratively, after the step 206, the method for fast responding to streaming data services provided in the embodiment of the present application may further include at least one of the following steps 207 and 208:

Step 207, determining whether the cache data can be multiplexed by the subsequent computation based on the window, and discarding the cache data which cannot be multiplexed by the subsequent computation.

Step 208, determining whether the data queried by the data query request is data in the user subscription period, and discarding the cached data not in the user subscription period.

Illustratively, whether a cache is discarded depends on two considerations: case 1, whether the cache can still be multiplexed or used directly; and case 2, whether the user's query needs to obtain historical data. For case 1, for example, if the window from time 50 s to time 80 s has not yet received all of its data and has therefore not triggered calculation, the cached results from 50 s to 60 s and from 60 s to 70 s can still be multiplexed and cannot be discarded, whereas the cached aggregate data between 40 s and 50 s can be discarded. Case 2 has three sub-cases: if the user queries the full data, historical data is involved and the cache cannot be discarded; if the user needs to query incrementally the data since the SQL was created, historical data is also involved and the cache cannot be discarded; if the user only needs to query data during the subscription to the service, only the aggregate results during the user's subscription are cached. While the query is not subscribed to by any user, only the last calculation result is cached as historical data, so that a quick return is ensured on the first query after subscription.
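Case 1 can be sketched as follows, assuming cached aggregates are stored in buckets keyed by their start time at a fixed cache granularity, and that the start times of the windows still awaiting calculation are known. The function name and data layout are hypothetical.

```python
def discard_unreusable(buckets, pending_window_starts, granularity):
    """Drop cached buckets that end before every window still awaiting calculation."""
    if not pending_window_starts:
        return dict(buckets)
    earliest = min(pending_window_starts)
    return {start: agg for start, agg in buckets.items()
            if start + granularity > earliest}

# Window 50 s to 80 s has not been triggered yet; buckets are 10 s wide.
buckets = {40: 7, 50: 3, 60: 5}
print(discard_unreusable(buckets, [50], 10))  # {50: 3, 60: 5}
```

The bucket covering 40 s to 50 s ends where the pending window begins, so it is the only one dropped.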

Therefore, the occupation of the cache to the storage space can be reduced to a great extent, and the space utilization rate of the storage space is improved.

According to the streaming data service quick response method provided by the embodiment of the application, the data loading module runs as a resident service that performs the data aggregation operations periodically or on demand and stores the results in the cache, so that calculation is done in advance and the amount of calculation needed when a query service request arrives is reduced.

The quick response method for stream data service provided by the embodiment of the application comprises the steps of firstly, receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; then, target cache data corresponding to the target cache policy is obtained from the cache data, and needed cache data is screened from the target cache data to calculate, so that a data calculation result corresponding to the target query request is obtained; finally, sending the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation. Therefore, the response speed of the streaming data service is improved by classifying the data query requests of the same type of query and executing corresponding caching strategies for each class according to the classification result.
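The grouping of registered requests into homogeneous queries can be sketched as below. The sketch takes the stricter reading in which the main table, the slave tables and the join condition must all match, and it assumes each registered query is a dictionary with the hypothetical keys `name`, `main_table`, `slave_tables` and `join_condition`.

```python
from collections import defaultdict

def group_homogeneous(queries):
    """Group registered queries by (main table, slave tables, join condition)."""
    groups = defaultdict(list)
    for q in queries:
        key = (q["main_table"], frozenset(q["slave_tables"]), q["join_condition"])
        groups[key].append(q["name"])
    return list(groups.values())

registered = [
    {"name": "q1", "main_table": "orders", "slave_tables": ["users"],
     "join_condition": "orders.uid = users.id"},
    {"name": "q2", "main_table": "orders", "slave_tables": ["users"],
     "join_condition": "orders.uid = users.id"},
    {"name": "q3", "main_table": "clicks", "slave_tables": [],
     "join_condition": None},
]
print(group_homogeneous(registered))  # [['q1', 'q2'], ['q3']]
```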

Fig. 5 illustrates a physical schematic diagram of an electronic device, as shown in fig. 5, which may include: processor 510, communication interface (Communications Interface) 520, memory 530, and communication bus 540, wherein processor 510, communication interface 520, memory 530 complete communication with each other through communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a streaming data service quick response method comprising: receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; acquiring target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data for calculation to obtain a data calculation result corresponding to the target query request; transmitting the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation.

Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In another aspect, the present application also provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the streaming data service quick response method provided above, the method comprising: receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; acquiring target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data for calculation to obtain a data calculation result corresponding to the target query request; transmitting the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation.

In yet another aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the streaming data service quick response method provided above, the method comprising: receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request; acquiring target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data for calculation to obtain a data calculation result corresponding to the target query request; transmitting the data calculation result to the target equipment; wherein, the cache data is: based on a cache strategy formulated by classification results of a plurality of data query requests; the plurality of data query requests are homogeneous queries, and the target query request is one of the plurality of data query requests; the homogeneous query satisfies at least one of: the master table is the same, the slave tables are the same, and the conditional sentences for the joint query are the same; the classification result is as follows: and carrying out multi-level classification based on the aggregation function in the data query request, the preset field and the window in stream data calculation.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of fast response for streaming data services, comprising:

Receiving a target query request called by target equipment, and determining a target cache strategy corresponding to the target query request;

Acquiring target cache data corresponding to the target cache policy from the cache data, and screening out needed cache data from the target cache data for calculation to obtain a data calculation result corresponding to the target query request;

transmitting the data calculation result to the target equipment;

2. The method of claim 1, wherein prior to obtaining target cache data corresponding to the target cache policy from the cache data, the method further comprises:

acquiring all data query requests registered in the data service, and classifying the data query requests based on a request field of each data query request to obtain at least one group of similar queries;

3. The method of claim 2, wherein classifying the data query requests based on the request field of each data query request results in at least one set of homogeneous queries, comprising:

dividing a plurality of data query requests with the same main table, the same slave table and the same conditional statement of the joint query into a group of similar queries;

4. The method of claim 1, wherein prior to obtaining target cache data corresponding to the target cache policy from the cache data, the method further comprises:

performing multi-stage classification based on an aggregation function, a preset field and a window in stream data calculation of each data query request in the plurality of data query requests to obtain a multi-stage classification result;

And determining a caching strategy of the data query request corresponding to each level based on each level indicated by the multi-level classification result, and caching data based on the caching strategy of the data query request corresponding to each level to obtain the cached data.

5. The method of claim 4, wherein the preset field comprises: the first preset field and the second preset field;

The multi-stage classification is performed based on the aggregation function, the preset field and the window in the stream data calculation of each data query request in the plurality of data query requests, so as to obtain a multi-stage classification result, including:

Dividing a first level based on the aggregation function, and dividing the data query requests with the same aggregation function in the plurality of data query requests into the same category;

dividing a second level based on the classification result of the first level and the first preset field, and dividing the data query requests with the same filtering condition corresponding to the first preset field in the plurality of data query requests into the same category;

dividing a third level based on the classification result of the second level and the second preset field, and dividing the data query requests with the same filtering condition corresponding to the third preset field in the plurality of data query requests into the same category;

Dividing a fourth level based on the classification result of the third level and windows in stream data calculation, and dividing the data query requests with the same windows in the plurality of data query requests into the same category;

the first preset field is a Group by field; the second preset field is a Where field.

6. The method of claim 5, wherein the first hierarchical division based on the aggregate function divides the data query requests having the same aggregate function from the plurality of data query requests into the same category, comprising:

Dividing the data query requests, among the plurality of data query requests, whose aggregation functions are the same and are of a first type into a first sub-level of the first hierarchy;

And/or,

Dividing the data query requests, among the plurality of data query requests, whose aggregation functions are the same and are of a second type into a second sub-level of the first level;

7. The method of claim 6, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

Caching the overall calculation result of each data corresponding to each data query request aiming at the data query request contained in the first sub-level of the first level;

And/or,

And caching calculation results of each data in the data corresponding to each data query request according to the data query requests contained in the second sub-level of the first level.

8. The method of claim 5, wherein the classifying the data query requests with the same filtering condition corresponding to the first preset field in the plurality of data query requests into the same category based on the classification result of the first hierarchy and the first preset field comprises:

dividing the data query requests, among the first data query requests, whose clause of the first preset field meets a first preset condition into a first sub-level of the second level;

and/or

dividing the data query requests, among the first data query requests, whose clause of the first preset field does not meet the first preset condition into a second sub-level of the second level.

9. The method of claim 8, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

for the data query requests contained in the first sub-level of the second level, caching the aggregated data queried by those of the first data query requests that meet a second preset condition;

and/or

for the data query requests contained in the second sub-level of the second level, caching the data queried by each of the first data query requests.

10. The method of claim 5, wherein the classifying the data query requests with the same filtering condition corresponding to the second preset field in the plurality of data query requests into the same category based on the classification result of the second hierarchy and the second preset field comprises:

dividing the data query requests, among the second data query requests, that meet a third preset condition into a first sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet a fourth preset condition into a second sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet a fifth preset condition into a third sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet a sixth preset condition into a fourth sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet a seventh preset condition into a fifth sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet an eighth preset condition into a sixth sub-level of the third level;

and/or

dividing the data query requests, among the second data query requests, that meet a ninth preset condition into a seventh sub-level of the third level.

11. The method of claim 10, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

for the data query requests contained in the first sub-level of the third level, caching the aggregated data of all the data query requests;

and/or

for the data query requests contained in the second sub-level of the third level, caching the aggregated data of all the data query requests;

and/or

for the data query requests contained in the third sub-level of the third level, caching the aggregated data of each logical operation;

and/or

for the data query requests contained in the fourth sub-level of the third level, invoking a computing engine to query the data requested by each data query request;

and/or

for the data query requests contained in the fifth sub-level of the third level, caching the data queried by the data query requests corresponding to the fields of each dimension type;

and/or

for the data query requests contained in the sixth sub-level of the third level, invoking a computing engine to query the data requested by each data query request, and caching the queried data;

and/or

for the data query requests contained in the seventh sub-level of the third level, invoking a computing engine to query the data requested by each data query request, merging the queried data with the historical result data, and caching the merged data.

12. The method of claim 5, wherein the classifying the data query requests with the same windows in the plurality of data query requests into the same category based on the classification result of the third hierarchy and the windows in stream data calculation comprises:

dividing the data query requests, among the third data query requests, that meet a tenth preset condition into a first sub-level of the fourth level;

and/or

dividing the data query requests, among the third data query requests, that meet an eleventh preset condition into a second sub-level of the fourth level;

and/or

dividing the data query requests, among the third data query requests, that meet a twelfth preset condition into a third sub-level of the fourth level;

and/or

dividing the data query requests, among the third data query requests, that meet a thirteenth preset condition into a fourth sub-level of the fourth level.

13. The method of claim 12, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

for the data query requests contained in the first sub-level of the fourth level, merging the incremental data obtained at each query with the historical result data from the previous query, caching the merged aggregated data, and using the merged aggregated data as the historical result data for the next query;

and/or

for the data query requests contained in the second sub-level of the fourth level, obtaining a cache granularity based on the calculation granularities of the windows of the plurality of data query requests, and caching the aggregated data at the cache granularity, wherein the calculation granularity indicates the moving distance of the window, and the cache granularity is calculated based on the integer time-unit distances of the plurality of calculation granularities;

and/or

for the data query requests contained in the third sub-level of the fourth level, dividing the incremental data according to the session interval to obtain a plurality of data segments, and calculating the aggregated data of each data segment; merging the result of the first data segment with the historical result data, and caching the result of the last data segment as the historical result data for the next calculation;

and/or

for the data query requests contained in the fourth sub-level of the fourth level, performing an aggregation calculation on the data corresponding to the window belonging to the historical result data, and caching the calculated aggregated data.
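
Two of the policies in claim 13 can be illustrated with a short Python sketch. It is offered as a hedged example only. `aggregate`, `merge_incremental`, and `cache_granularity` are hypothetical helpers, and the partial aggregates shown (count, sum, max) are chosen for illustration. The claim does not state how the cache granularity is derived from the calculation granularities; taking their greatest common divisor is an assumption made here.

```python
from functools import reduce
from math import gcd


def aggregate(values):
    """Compute partial aggregates for one non-empty batch of incremental data."""
    return {"count": len(values), "sum": sum(values), "max": max(values)}


def merge_incremental(history, increment):
    """Merge the aggregates of new data into the cached historical result."""
    return {
        "count": history["count"] + increment["count"],
        "sum": history["sum"] + increment["sum"],
        "max": max(history["max"], increment["max"]),
    }


def cache_granularity(slide_seconds):
    """ASSUMPTION: use the greatest common divisor of the window slide distances."""
    return reduce(gcd, slide_seconds)


# First query: only historical data exists.
history = aggregate([3, 5])
# Next query: merge the increment, then keep the result as the new history.
history = merge_incremental(history, aggregate([10, 2]))
print(history)

# Windows sliding by 60 s, 90 s and 300 s can all be assembled from 30 s buckets.
granularity = cache_granularity([60, 90, 300])
print(granularity)
```

Under this assumption, every window boundary of every request falls on a bucket boundary, so cached buckets can be combined without recomputing raw data.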

14. The method of claim 5, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

for a first data query request in the second level whose query frequency is greater than or equal to a preset frequency threshold, caching all the aggregated data queried by the first data query request;

and/or

for a second data query request in the third level that does not meet a first preset sub-condition, meets a second preset sub-condition, and has a query frequency greater than or equal to the preset frequency threshold, caching all the aggregated data queried by the second data query request;

and/or

for a third data query request in the fourth level whose query frequency is greater than or equal to the preset frequency threshold, merging the query result of the third data query request with the historical result data each time a data query is performed, and using the merged result as the historical result data for the next query;

wherein the first preset sub-condition is that the field judgment condition corresponding to the second preset field is a dynamic condition, the dynamic condition being a judgment condition that is not preset; and the second preset sub-condition is that the field judgment condition includes a logical operation.
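
The frequency-threshold idea in claim 14 can be sketched as follows. This is a simplified illustration, not the claimed method: `FrequencyCache` is a hypothetical helper, the query is identified by an arbitrary key, and invalidation of cached results as new stream data arrives is deliberately left out.

```python
from collections import Counter


class FrequencyCache:
    """Cache results only for queries seen at least `threshold` times (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.hits = Counter()   # query frequency per key
        self.cache = {}         # cached aggregated results

    def query(self, key, compute):
        self.hits[key] += 1
        if key in self.cache:
            return self.cache[key]
        result = compute()
        # Cache only once the query frequency reaches the preset threshold.
        if self.hits[key] >= self.threshold:
            self.cache[key] = result
        return result


calls = []


def compute():
    """Stand-in for invoking the computing engine."""
    calls.append(1)
    return 42


cache = FrequencyCache(threshold=2)
results = [cache.query("q1", compute) for _ in range(3)]
print(results, len(calls))
```

In this sketch the third request is served from the cache, so the stand-in engine is invoked only for the first two requests.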

15. The method according to any one of claims 4 to 14, wherein the preset field further comprises a third preset field;

the performing data caching based on the caching policy of the data query requests corresponding to each level to obtain the cached data comprises:

caching the data indicated by the union of the fields in the clauses of the third preset field of each of the plurality of data query requests.
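
The field-union step of claim 15 can be sketched in a few lines of Python. The helper names and the example field names are hypothetical, and each clause is modeled simply as a list of field names.

```python
def fields_to_cache(clauses):
    """Return the union of the fields named across all requests' clauses."""
    union = set()
    for fields in clauses:
        union.update(fields)
    return union


def project(record, fields):
    """Keep only the cached fields of one incoming record."""
    return {name: value for name, value in record.items() if name in fields}


# Two requests that name overlapping fields in the same kind of clause.
clauses = [["region", "amount"], ["amount", "device"]]
cached_fields = fields_to_cache(clauses)

record = {"region": "east", "amount": 7, "device": "d1", "debug": "x"}
cached_record = project(record, cached_fields)
print(sorted(cached_fields))
print(cached_record)
```

Caching the union once lets every request in the group be answered from the same cached records, while fields that no request names are never stored.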

16. The method of claim 4, wherein determining a caching policy for the data query request corresponding to each tier based on the respective tiers indicated by the multi-tier classification result comprises:

determining, based on the window, whether the cached data can be reused by subsequent computation, and discarding the cached data that cannot be reused by subsequent computation.
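
One possible reading of the window-based discard in claim 16 is sketched below. It assumes, for illustration only, that aggregates are cached in fixed-size time buckets keyed by their start time, and that a bucket is reusable as long as it overlaps the next window to be computed. `evict_unreusable` and its parameters are hypothetical names.

```python
def evict_unreusable(buckets, bucket_size, next_window_start):
    """Keep only the buckets that still overlap the next window to be computed."""
    return {
        start: agg
        for start, agg in buckets.items()
        if start + bucket_size > next_window_start
    }


# Cached per-minute sums keyed by bucket start time in seconds.
buckets = {0: 10, 60: 12, 120: 7}

# The next sliding window starts at t = 60 s, so bucket [0, 60) is discarded.
kept = evict_unreusable(buckets, bucket_size=60, next_window_start=60)
print(kept)
```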

17. The method of claim 4, wherein after the caching of the data based on the caching policy of the data query request corresponding to each level, the method further comprises:

determining whether the data queried by the data query request falls within the user subscription period, and discarding the cached data that does not fall within the user subscription period.

18. A computer program product comprising computer program/instructions which, when executed by a processor, implement the steps of the streaming data service fast response method of any one of claims 1 to 17.

19. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the streaming data service fast response method according to any one of claims 1 to 17 when the program is executed.

20. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the streaming data service fast response method according to any one of claims 1 to 17.