CN115098542A - Streaming big data frequency-division pre-aggregation and query method

Info

Publication number: CN115098542A
Application number: CN202210691498.5A
Authority: CN
Inventors: 蒋烁淼; 陆宏鸣
Original assignee: Shanghai Cloudcare Information Technology Co ltd
Current assignee: Shanghai Cloudcare Information Technology Co ltd
Priority date: 2022-06-17
Filing date: 2022-06-17
Publication date: 2022-09-23

Abstract

The invention provides a method and a system for frequency-division pre-aggregation and querying of streaming big data. The method comprises the following steps: grouping the metric data, and performing minute-level pre-aggregation statistics on the grouped metric data and on the call relation data between services on the same link; starting an hour-level scheduled task that aggregates the minute-level data into hour-level pre-aggregated statistics; starting a day-level scheduled task that aggregates the hour-level data into day-level pre-aggregated statistics; splitting the time range of a data query into segments; retrieving, through a querier, the data of each segment from the frequency-division pre-aggregated data of the corresponding level; and summarizing and aggregating the results in memory to obtain a calculation result, which is returned to the caller. The invention uses a small amount of storage to reduce the consumption of computing resources and shorten the user's query time, and it automatically queries the pre-aggregated data of the appropriate levels according to the time range the user requests.

Description

Streaming big data frequency-division pre-aggregation and query method

Technical Field

The invention relates to the technical field of statistical analysis of big data, and in particular to a frequency-division pre-aggregation and query method for streaming big data.

Background

In the current internet information era, systems are becoming increasingly complex, and a large amount of data must be collected to make the whole system observable. With tens of billions, hundreds of billions, or even more records, aggregating the data is very difficult: it consumes server resources and takes a long time to compute, which results in a poor user experience.

Therefore, a method for reducing the computing resource consumption and shortening the computing time is needed to achieve the purpose of reducing the computing cost and improving the user experience.

Disclosure of Invention

In view of this, the present invention provides a method for statistical analysis of big data that trades a small amount of storage for lower computation cost and a better user experience.

The core of the invention is that a large amount of data is grouped and pre-aggregated at different frequency bands (time granularities); at query time, the time range selected by the user is automatically split into segments, and each segment is answered by a statistical query against the pre-aggregated data of the corresponding level.

The invention provides a frequency-division pre-aggregation and query method for streaming big data, which comprises the following steps:

a1, before the raw data flows into the data center and is written to the storage engine, grouping the metric data by the service name of the data link service, and performing minute-level pre-aggregation statistics on the grouped metric data and on the call relation data between services on the same link;

that is, pre-aggregation grouping statistics are performed on the streaming data before the raw data flows into the data center and enters the storage engine, and the pre-aggregated data is stored separately; to reduce the volume of statistical data while still letting users analyze data over different time ranges, a balance point for the pre-aggregation time precision is chosen according to the precision the business requires, for example one pre-aggregated statistic point per minute;

a2, starting an hour-level scheduled task that performs one level of aggregation on the minute-level pre-aggregated data of the previous whole hour, producing grouped hour-level pre-aggregated statistics;

a3, starting a day-level scheduled task that performs one level of aggregation on the hour-level pre-aggregated data of the previous whole day, producing grouped day-level pre-aggregated statistics;

a4, splitting the time range of a data query at hour-level boundaries into one of five forms, including one hour-level range plus several minute-level ranges, and several hour-level ranges plus several minute-level ranges;

for example, if the list of link service call relations between 1:40 and 5:10 (3.5 hours in total) is to be queried together with metric data such as the number of requests, the number of errors, and the average response time of each service, the range 1:40 to 5:10 is automatically split into two minute-level ranges, 1:40 to 2:00 and 5:00 to 5:10, and one hour-level range, 2:00 to 5:00;

a5, retrieving, through a querier, the data of the minute-level ranges from the minute-level pre-aggregated data, the data of the hour-level ranges from the hour-level pre-aggregated data, and so on, so that each range is answered from the frequency-division pre-aggregated data of the corresponding level;

in the example above, the querier reads the two minute-level ranges, 1:40 to 2:00 and 5:00 to 5:10, from the minute-level pre-aggregated data, and the range 2:00 to 5:00 from the hour-level pre-aggregated data;

a6, summarizing and aggregating, in memory, the data link services and metric data retrieved from the frequency-division pre-aggregated data of the different levels to obtain a calculation result.
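The range splitting of step a4 can be sketched as follows. This is a minimal illustration that handles only the minute and hour levels; the function name `split_range`, the tuple layout, and the half-open interval convention are assumptions made for the example and are not taken from the source.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def split_range(start: datetime, end: datetime):
    """Split [start, end) into minute-level edges and one hour-level middle.

    Returns a list of (level, segment_start, segment_end) tuples.
    Assumes start and end are aligned to whole minutes.
    """
    if end <= start:
        return []
    # First whole-hour boundary at or after start.
    floor_start = start.replace(minute=0, second=0, microsecond=0)
    first_hour = floor_start if floor_start == start else floor_start + HOUR
    # Last whole-hour boundary at or before end.
    last_hour = end.replace(minute=0, second=0, microsecond=0)

    if first_hour >= last_hour:
        # No complete hour inside the range: use minute-level data only.
        return [("minute", start, end)]

    segments = []
    if start < first_hour:
        segments.append(("minute", start, first_hour))
    segments.append(("hour", first_hour, last_hour))
    if last_hour < end:
        segments.append(("minute", last_hour, end))
    return segments


# The 1:40 to 5:10 query from the text (the date is arbitrary).
example = split_range(datetime(2022, 6, 17, 1, 40), datetime(2022, 6, 17, 5, 10))
for level, seg_start, seg_end in example:
    print(level, seg_start.strftime("%H:%M"), seg_end.strftime("%H:%M"))
```

For the example query this yields a minute-level segment 01:40 to 02:00, an hour-level segment 02:00 to 05:00, and a minute-level segment 05:00 to 05:10, matching the split described in the text.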

Further, the metric data grouped by service name in step a1 includes: number of requests, number of errors, and average response time.

Further, the pre-aggregation statistics of step a1 further include:

storing the pre-aggregated data separately: the raw data is written to the storage engine, and a separate copy of the pre-aggregated data is also stored in the storage engine;

in particular, the pre-aggregated data is the grouped statistical data that has already been pre-aggregated at the minute level.
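The minute-level grouping of step a1 might look like the sketch below. The record layout `(timestamp, service, is_error, response_ms)`, the `Bucket` class, and the service names are hypothetical; the source does not specify a data model. The sketch keeps a running total of response time rather than an average, which is an assumption that makes later merging straightforward. Call relation data is not modelled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Bucket:
    """Additive statistics for one (service, minute) group."""
    requests: int = 0
    errors: int = 0
    total_response_ms: float = 0.0

    @property
    def avg_response_ms(self) -> float:
        return self.total_response_ms / self.requests if self.requests else 0.0


def pre_aggregate_minute(records):
    """Group raw records by (service name, minute) before they are stored.

    Each record is a hypothetical tuple:
    (timestamp, service, is_error, response_ms).
    """
    buckets = defaultdict(Bucket)
    for ts, service, is_error, response_ms in records:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        bucket = buckets[(service, minute)]
        bucket.requests += 1
        bucket.errors += int(is_error)
        bucket.total_response_ms += response_ms
    return dict(buckets)


raw = [
    (datetime(2022, 6, 17, 1, 40, 5), "checkout", False, 120.0),
    (datetime(2022, 6, 17, 1, 40, 30), "checkout", True, 300.0),
    (datetime(2022, 6, 17, 1, 41, 2), "checkout", False, 90.0),
    (datetime(2022, 6, 17, 1, 40, 9), "cart", False, 40.0),
]
minute_stats = pre_aggregate_minute(raw)
```

In a real pipeline the resulting buckets would be written to the separate pre-aggregation store while the raw records continue on to the storage engine.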

Further, retrieving the data of the corresponding time ranges from the frequency-division pre-aggregated data of different levels in step a5 comprises:

if the query time range exceeds one day, automatically splitting it into query ranges at three levels: minute, hour, and day.
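A three-level split of this kind can be written recursively: carve out the largest block aligned to the coarsest level, then split the two leftovers with the next finer level. The sketch below is one possible way to do this under the assumption of half-open intervals aligned to whole minutes; `split_multi`, `floor_to`, and `STEP` are hypothetical names, not part of the source.

```python
from datetime import datetime, timedelta

STEP = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def floor_to(ts: datetime, level: str) -> datetime:
    """Truncate a timestamp to the start of its day or hour."""
    if level == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return ts.replace(minute=0, second=0, microsecond=0)


def split_multi(start, end, levels=("day", "hour")):
    """Split [start, end) into day-, hour- and minute-level segments."""
    if end <= start:
        return []
    if not levels:
        # No coarser level left: the remainder is read from minute-level data.
        return [("minute", start, end)]
    level, finer = levels[0], levels[1:]
    lo = floor_to(start, level)
    if lo < start:
        lo += STEP[level]          # first aligned boundary at or after start
    hi = floor_to(end, level)      # last aligned boundary at or before end
    if lo >= hi:
        return split_multi(start, end, finer)
    return (
        split_multi(start, lo, finer)
        + [(level, lo, hi)]
        + split_multi(hi, end, finer)
    )


plan = split_multi(datetime(2022, 6, 16, 22, 40), datetime(2022, 6, 18, 3, 10))
for level, seg_start, seg_end in plan:
    print(level, seg_start, seg_end)
```

For the sample range this produces five contiguous segments in the order minute, hour, day, hour, minute, with the single day-level segment covering all of 2022-06-17.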

Further, retrieving the data of the corresponding time ranges from the frequency-division pre-aggregated data of different levels in step a5 further comprises:

if data covering a one-month time range is queried, applying one or N further layers of pre-aggregation, each over a larger time range, on top of the first-layer minute-level pre-aggregated data;

if the raw data contains many groups, the volume of minute-level pre-aggregated data is still very large, and a query over a one-month range cannot then meet the goals of low computing-resource consumption and fast statistics. The first-layer minute-level data therefore needs one or N further layers of pre-aggregation over larger time ranges, as follows: a scheduled task started every hour aggregates the minute-level pre-aggregated data into hour-level pre-aggregated data; a scheduled task started every day aggregates the hour-level pre-aggregated data into day-level pre-aggregated data. By analogy, pre-aggregation time ranges of further levels can be defined according to business needs.
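The layered roll-up described above can be illustrated with a single helper that is applied once per level. The bucket layout `(requests, errors, total_response_ms)` and the function name `roll_up` are assumptions for the example; the scheduling itself (hourly and daily tasks) is left out, and the caller is assumed to pass in only the buckets of the previous whole hour or day.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(buckets, level):
    """Merge finer buckets into 'hour' or 'day' buckets.

    buckets maps (service, bucket_start) to a tuple
    (requests, errors, total_response_ms); all three fields are additive.
    """
    def truncate(ts):
        if level == "hour":
            return ts.replace(minute=0, second=0, microsecond=0)
        if level == "day":
            return ts.replace(hour=0, minute=0, second=0, microsecond=0)
        raise ValueError(f"unsupported level: {level}")

    merged = defaultdict(lambda: (0, 0, 0.0))
    for (service, ts), (req, err, total_ms) in buckets.items():
        key = (service, truncate(ts))
        r, e, t = merged[key]
        merged[key] = (r + req, e + err, t + total_ms)
    return dict(merged)


minute_level = {
    ("checkout", datetime(2022, 6, 17, 1, 40)): (2, 1, 420.0),
    ("checkout", datetime(2022, 6, 17, 1, 41)): (1, 0, 90.0),
    ("checkout", datetime(2022, 6, 17, 2, 5)): (4, 0, 400.0),
}
hour_level = roll_up(minute_level, "hour")  # what the hourly task would produce
day_level = roll_up(hour_level, "day")      # what the daily task would produce
```

Because every stored field is a sum, the day-level result is the same whether it is built from hour-level or directly from minute-level buckets, which is what allows further levels to be stacked as business needs require.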

Further, step a6 is followed by: returning the calculation result to the caller.

The invention also provides a frequency-division pre-aggregation and query system for streaming big data, which executes the frequency-division pre-aggregation and query method described above and comprises:

a pre-aggregation statistics module: used to group the metric data by the service name of the data link service before the raw data flows into the data center and is written to the storage engine, and to perform minute-level pre-aggregation statistics on the grouped metric data and on the call relation data between services on the same link;

an hourly aggregation module: used to start an hour-level scheduled task that performs one level of aggregation on the minute-level pre-aggregated data of the previous whole hour, producing grouped hour-level pre-aggregated statistics;

a daily aggregation module: used to start a day-level scheduled task that performs one level of aggregation on the hour-level pre-aggregated data of the previous whole day, producing grouped day-level pre-aggregated statistics;

a query range splitting module: used to split the time range of a data query at hour-level boundaries into one of five forms, including one hour-level range plus several minute-level ranges, and several hour-level ranges plus several minute-level ranges;

a data statistics and query module: used to retrieve, through a querier, the data of the minute-level ranges from the minute-level pre-aggregated data, the data of the hour-level ranges from the hour-level pre-aggregated data, and so on, so that each range is answered from the frequency-division pre-aggregated data of the corresponding level;

a summary aggregation module: used to summarize and aggregate, in memory, the data link services and metric data retrieved from the frequency-division pre-aggregated data of the different levels to obtain a calculation result, and to return the calculation result to the caller.
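One detail the summary aggregation has to respect is that averages taken from different buckets cannot simply be averaged again. As a hedged illustration, assume (this is not stated in the source) that each pre-aggregated bucket $i$ records its request count $n_i$ and its average response time $a_i$. The combined average over all buckets is then the count-weighted mean:

```latex
\bar{a} \;=\; \frac{\sum_{i} n_i \, a_i}{\sum_{i} n_i}
```

For instance, merging a minute-level bucket with $n_1 = 20$, $a_1 = 50$ ms and an hour-level bucket with $n_2 = 100$, $a_2 = 200$ ms gives $(20 \cdot 50 + 100 \cdot 200) / 120 = 175$ ms, whereas the unweighted mean of the two averages would be 125 ms. Request and error counts, by contrast, are plain sums across buckets.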

The present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the streaming big data frequency division pre-aggregation and query method as described above.

The invention also provides a computer device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the program to implement the steps of the streaming big data frequency division pre-aggregation and query method as described above.

Compared with the prior art, the invention has the beneficial effects that:

the invention uses a small amount of storage to greatly reduce the consumption of computing resources and greatly shorten the user's data query time; it avoids running expensive grouping statistics over the huge volume of raw data, and it automatically queries the pre-aggregated data of the appropriate pre-aggregation levels according to the time range the user requests.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.

In the drawings:

FIG. 1 is a flow chart of the streaming big data frequency-division pre-aggregation and query method of the present invention;

FIG. 2 is a schematic diagram of a computer device according to an embodiment of the present invention;

FIG. 3 is a processing flow diagram of the streaming big data frequency-division pre-aggregation and query method according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, and third may be used in this disclosure to describe various signals, these signals should not be limited to these terms. These terms are only used to distinguish one type of signal from another. For example, a first signal may also be referred to as a second signal, and similarly, a second signal may also be referred to as a first signal, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

The core of the embodiment of the invention is that a large amount of data is grouped and pre-aggregated at different frequency bands (time granularities); at query time, the time range selected by the user is automatically split into segments, and each segment is answered by a statistical query against the pre-aggregated data of the corresponding level.

The embodiment of the invention provides a frequency-division pre-aggregation and query method for streaming big data which, as shown in FIG. 1, comprises the following steps:

preferably, the metric data grouped by service name includes: number of requests, number of errors, and average response time;

the pre-aggregated data is stored separately: the raw data is written to the storage engine, and a separate copy of the pre-aggregated data is also stored in the storage engine, as shown in FIG. 3;

in particular, the pre-aggregated data is the grouped statistical data that has already been pre-aggregated at the minute level;

a3, starting a day-level scheduled task that performs one level of aggregation on the hour-level pre-aggregated data of the previous whole day, producing grouped day-level pre-aggregated statistics;

in this embodiment, the list of link service call relations between 1:40 and 5:10 (3.5 hours in total) is to be queried together with metric data such as the number of requests, the number of errors, and the average response time of each service, and the range 1:40 to 5:10 is automatically split into two minute-level ranges, 1:40 to 2:00 and 5:00 to 5:10, and one hour-level range, 2:00 to 5:00;

in this embodiment, the querier reads the two minute-level ranges, 1:40 to 2:00 and 5:00 to 5:10, from the minute-level pre-aggregated data, and the range 2:00 to 5:00 from the hour-level pre-aggregated data;

a6, summarizing and aggregating, in memory, the data link services and metric data retrieved from the frequency-division pre-aggregated data of the different levels to obtain a calculation result.

Step a6 is followed by: returning the calculation result to the caller.
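The querying and in-memory merging of steps a5 and a6 could be sketched as follows for the 1:40 to 5:10 example. The in-memory `stores` dictionary stands in for the separate pre-aggregation tables, and the bucket layout `(requests, errors, total_response_ms)`, the function name `query_and_merge`, and the sample numbers are all assumptions made for illustration; a real system would issue one range query per segment rather than scan a dictionary.

```python
from collections import defaultdict
from datetime import datetime


def query_and_merge(segments, stores):
    """Read each segment from the store of its level and merge per service.

    segments: list of (level, start, end), half-open [start, end).
    stores:   level -> {(service, bucket_start): (requests, errors, total_ms)}.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])
    for level, start, end in segments:
        for (service, ts), (req, err, total_ms) in stores[level].items():
            if start <= ts < end:
                acc = totals[service]
                acc[0] += req
                acc[1] += err
                acc[2] += total_ms
    return {
        service: {
            "requests": req,
            "errors": err,
            "avg_response_ms": total_ms / req if req else 0.0,
        }
        for service, (req, err, total_ms) in totals.items()
    }


day = (2022, 6, 17)
segments = [
    ("minute", datetime(*day, 1, 40), datetime(*day, 2, 0)),
    ("hour", datetime(*day, 2, 0), datetime(*day, 5, 0)),
    ("minute", datetime(*day, 5, 0), datetime(*day, 5, 10)),
]
stores = {
    "minute": {
        ("checkout", datetime(*day, 1, 40)): (2, 1, 420.0),
        ("checkout", datetime(*day, 2, 5)): (4, 0, 400.0),   # covered by the hour level
        ("checkout", datetime(*day, 5, 5)): (1, 0, 80.0),
    },
    "hour": {
        ("checkout", datetime(*day, 2, 0)): (10, 0, 1000.0),
        ("checkout", datetime(*day, 3, 0)): (5, 1, 750.0),
        ("checkout", datetime(*day, 5, 0)): (9, 0, 900.0),   # outside [2:00, 5:00)
    },
}
result = query_and_merge(segments, stores)
```

Each bucket is read from exactly one level, so nothing is double counted, and the average response time is recomputed from the summed totals rather than by averaging the per-bucket averages.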

The embodiment of the present invention further provides a frequency-division pre-aggregation and query system for streaming big data, which executes the frequency-division pre-aggregation and query method described above and comprises:

a pre-aggregation statistics module: used to group the metric data by the service name of the data link service before the raw data flows into the data center and is written to the storage engine, and to perform minute-level pre-aggregation statistics on the grouped metric data and on the call relation data between services on the same link;

an hourly aggregation module: used to start an hour-level scheduled task that performs one level of aggregation on the minute-level pre-aggregated data of the previous whole hour, producing grouped hour-level pre-aggregated statistics;

a daily aggregation module: used to start a day-level scheduled task that performs one level of aggregation on the hour-level pre-aggregated data of the previous whole day, producing grouped day-level pre-aggregated statistics;

The embodiment of the invention uses a small amount of storage to greatly reduce the consumption of computing resources and greatly shorten the user's data query time; it avoids running expensive grouping statistics over the huge volume of raw data, and it automatically queries the pre-aggregated data of the appropriate pre-aggregation levels according to the time range the user requests.

FIG. 2 is a schematic structural diagram of a computer device provided in an embodiment of the present invention. Referring to FIG. 2, the computer device comprises: an input device 23, an output device 24, a memory 22, and a processor 21. The memory 22 is used to store one or more programs; when the one or more programs are executed by the one or more processors 21, the one or more processors 21 implement the streaming big data frequency-division pre-aggregation and query method provided by the above embodiment. The input device 23, the output device 24, the memory 22, and the processor 21 may be connected by a bus or by other means; FIG. 2 takes a bus connection as an example.

The memory 22, which is a readable and writable storage medium of a computing device, may be used to store a software program, a computer executable program, and program instructions corresponding to the streaming big data frequency division pre-aggregation and query method according to the embodiment of the present invention; the memory 22 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like; further, the memory 22 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device; in some examples, the memory 22 may further include memory located remotely from the processor 21, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 23 may be used to receive input numeric or character information and to generate key signal inputs relating to user settings and function control of the apparatus; the output device 24 may include a display device such as a display screen.

The processor 21 executes software programs, instructions and modules stored in the memory 22, so as to execute various functional applications and data processing of the device, that is, implement the streaming big data frequency division pre-aggregation and query method described above.

The computer device provided above can be used to execute the streaming big data frequency division pre-aggregation and query method provided above, and has corresponding functions and advantages.

Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the streaming big data frequency-division pre-aggregation and query method provided in the above embodiments. The storage medium is any of various types of memory devices or storage devices, including: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a different, second computer system connected to the first computer system through a network (such as the internet); the second computer system may provide program instructions to the first computer for execution. The storage medium may include two or more storage media that reside in different locations, such as in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the streaming big data frequency division pre-aggregation and query method described in the above embodiments, and may also perform related operations in the streaming big data frequency division pre-aggregation and query method provided by any embodiments of the present invention.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the accompanying drawings, but it is apparent to those skilled in the art that the scope of the present invention is not limited to these specific embodiments. Without departing from the principle of the present invention, a person skilled in the art can make equivalent changes or substitutions to the related technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations of this invention will occur to those skilled in the art. Any modification, substitution and improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A frequency-division pre-aggregation and query method for streaming big data, characterized by comprising the following steps:

a2, starting an hour-level scheduled task that performs one level of aggregation on the minute-level pre-aggregated data of the previous whole hour, producing grouped hour-level pre-aggregated statistics;

a5, retrieving, through a querier, the data of the minute-level ranges from the minute-level pre-aggregated data, the data of the hour-level ranges from the hour-level pre-aggregated data, and so on, so that each range is answered from the frequency-division pre-aggregated data of the corresponding level;

2. The streaming big data frequency-division pre-aggregation and query method according to claim 1, wherein the service name of step a1 includes: number of requests, number of errors, average response time.

3. The streaming big data frequency-division pre-aggregation and query method according to claim 1, wherein the pre-aggregation statistics of step a1 further include:

storing the pre-aggregated data separately: the raw data is written to the storage engine, and a separate copy of the pre-aggregated data is also stored in the storage engine.

4. The streaming big data frequency-division pre-aggregation and query method according to claim 1, wherein retrieving the data of the corresponding time ranges from the frequency-division pre-aggregated data of different levels in step a5 comprises:

5. The streaming big data frequency-division pre-aggregation and query method according to claim 1, wherein retrieving the data of the corresponding time ranges from the frequency-division pre-aggregated data of different levels in step a5 further comprises:

if data covering a one-month time range is queried, applying one or N further layers of pre-aggregation, each over a larger time range, on top of the first-layer minute-level pre-aggregated data.

6. The streaming big data frequency-division pre-aggregation and query method according to claim 1, wherein step a6 is followed by: returning the calculation result to the caller.

7. A streaming big data frequency-division pre-aggregation and query system for performing the streaming big data frequency-division pre-aggregation and query method according to any one of claims 1 to 6, comprising:

a pre-aggregation statistics module: used to group the metric data by the service name of the data link service before the raw data flows into the data center and is written to the storage engine, and to perform minute-level pre-aggregation statistics on the grouped metric data and on the call relation data between services on the same link;

an hourly aggregation module: used to start an hour-level scheduled task that performs one level of aggregation on the minute-level pre-aggregated data of the previous whole hour, producing hour-level pre-aggregated statistics;

8. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the steps of the streaming big data frequency-division pre-aggregation and query method of any one of claims 1 to 6.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the streaming big data frequency-division pre-aggregation and query method according to any one of claims 1 to 6.