CN111723114A

CN111723114A - Stream type statistical method and device and electronic equipment

Info

Publication number: CN111723114A
Application number: CN202010585418.9A
Authority: CN
Inventors: 赵文越; 徐端丰; 章孜谦; 朱敏
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2020-09-29
Anticipated expiration: 2040-06-24
Also published as: CN111723114B

Abstract

The disclosure provides a streaming statistics method, a streaming statistics apparatus, and an electronic device. The method comprises: setting a variable scale; determining, on the variable scale, a statistical start point and a statistical end point for a specified variable; determining statistical points, wherein the statistical points comprise at least the statistical start point and the statistical end point; and determining a statistical result of the stream data for the specified variable based on the statistical results at the statistical points; wherein the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data.

Description

Stream type statistical method and device and electronic equipment

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a streaming statistics method and apparatus, and an electronic device.

Background

For shopping malls or everyday promotional activities, it is often desirable to track changes in the trading of goods over a given period as accurately as possible. In the related art, the processing of streaming data only involves acquiring and aggregating the streaming data by time window, and little work addresses streaming statistics and presentation. For example, for streaming statistics, the related art may temporarily store the stream data by time window, then cyclically accumulate the stream data window by window, and publish the data for presentation with a delay of several minutes.

In the course of implementing the disclosed concept, the inventors found that the related art has at least the following problem: computing cyclically, one time window after another, causes high latency.

Disclosure of Invention

In view of the above, the present disclosure provides a streaming statistics method, an apparatus, and an electronic device, which help mitigate the problem of high latency.

One aspect of the present disclosure provides a streaming statistics method, in which stream data passes through data processing stages to complete a data processing process, the method including: setting a variable scale; determining, on the variable scale, a statistical start point and a statistical end point for a specified variable; determining statistical points, wherein the statistical points comprise at least the statistical start point and the statistical end point; and determining a statistical result of the stream data for the specified variable based on the statistical results at the statistical points; wherein the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data.

The streaming statistics method provided by the embodiments of the disclosure introduces the concept of a variable scale. With a plurality of statistical points determined on the variable scale, the streaming statistics for a series of statistical points are obtained in a single cut-and-compute pass, without cyclically computing variable windows one by one, which effectively improves statistical efficiency and reduces latency.

One aspect of the present disclosure provides a streaming statistics apparatus comprising a scale setting module, a start-and-end point determining module, a statistical point determining module, and a statistics module. The scale setting module is used for setting a variable scale. The start-and-end point determining module is used for determining, on the variable scale, a statistical start point and a statistical end point for a specified variable, wherein the statistical start point and the statistical end point are determined based on the processing speed of the data processing stage of the stream data. The statistical point determining module is used for determining statistical points, which comprise at least the statistical start point and the statistical end point. The statistics module is used for determining the statistical result of the stream data for the specified variable based on the statistical results at the statistical points.

The streaming statistics apparatus provided by the embodiments of the disclosure sets the variable scale through the scale setting module, so that the streaming statistics for a series of statistical points that meet the shortest latency are computed in a single cut based on the variable scale. The data precision can be specified, and both the committed latency and the precision are explicit and controllable.

Another aspect of the present disclosure provides an electronic device comprising one or more processors and a storage for storing executable instructions that, when executed by the processors, implement the method as described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.

Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario of a streaming statistics method, apparatus and electronic device according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates an exemplary system architecture to which the streaming statistics method, apparatus and electronic device may be applied, according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a streaming statistics method according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart for determining a processing latency according to an embodiment of the disclosure;

FIG. 5 schematically shows a schematic diagram of data processing stages according to an embodiment of the disclosure;

FIG. 6 schematically shows a diagram of processing durations of data processing phases according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates a logic diagram of a streaming statistics method according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a schematic diagram of a streaming statistics method according to an embodiment of the present disclosure;

FIG. 9 schematically illustrates a schematic diagram of a streaming statistics method according to another embodiment of the present disclosure;

FIG. 10 schematically shows a schematic structural diagram of a streaming statistics apparatus according to an embodiment of the present disclosure; and

FIG. 11 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. One or more embodiments may be practiced without these specific details. In the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art and are to be interpreted as having a meaning that is consistent with the context of this specification and not in an idealized or overly formal sense expressly so defined herein.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). The terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more features.

The embodiments of the disclosure provide a streaming statistics method, a streaming statistics apparatus, and an electronic device. The streaming statistics method comprises a start-and-end point determining process and a statistics process. In the start-and-end point determining process, a variable scale is first set; then a statistical start point and a statistical end point for a specified variable are determined on the variable scale; then statistical points are determined, wherein the statistical points comprise at least the statistical start point and the statistical end point, and the statistical start point and the statistical end point are determined based on the processing speed of the data processing stage of the stream data. After the start-and-end point determining process is completed, the statistics process begins, in which a statistical result of the stream data for the specified variable is determined based on the statistical results at the statistical points. The embodiments of the disclosure determine a plurality of statistical points on the variable scale and, instead of cyclically computing variable windows one by one, obtain the streaming statistics for a series of statistical points in a single cut-and-compute pass, which effectively improves statistical efficiency and reduces latency.

Fig. 1 schematically illustrates an application scenario of a streaming statistics method, apparatus and electronic device according to an embodiment of the present disclosure.

As shown in FIG. 1, for shopping malls or everyday sales promotions, such as the Double 11 shopping festival, the 618 shopping festival, home appliance subsidy campaigns, and merchant promotions, users want to determine changes in commodity transactions efficiently and accurately. As shown in FIG. 1, a platform launches the XX shopping festival, and the platform operator wants to know the transaction status of various commodities in real time, such as the number of transactions and the transaction amount for categories such as electronic products, clothing, and tourism. In addition, the platform operator may want to know the transaction status of more finely classified commodities, such as mobile phones, computers, and household appliances under the electronic products category. As another example, the platform operator may wish to know the transaction amount of each item during the time period from aa:bb to cc:dd, where the values aa:bb and cc:dd can be adjusted by the user in real time. Values such as X, A, B, and C in FIG. 1 may change dynamically, and a visual chart may further be generated to make the results more intuitive. The platform operator can allocate resources and judge the running state of the platform according to the real-time statistical results, thereby improving the platform's operating performance.

Fig. 2 schematically illustrates an exemplary system architecture to which the streaming statistics method, apparatus and electronic device may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 2 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in FIG. 2, the system architecture 200 according to this embodiment may include terminal devices 201, 202, 203, a network 204, and a server 205. The network 204 may include a plurality of gateways, hubs, network lines, etc. to provide a medium for communication links between the terminal devices 201, 202, 203 and the server 205. The network 204 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user can use the terminal devices 201, 202, 203 to interact with other terminal devices and the server 205 through the network 204 to receive or send information, for example to send an association request, send an information request, or receive a processing result. The terminal devices 201, 202, 203 may have various communication client applications installed, such as banking applications, web browser applications, search applications, office applications, instant messaging tools, mailbox clients, and social platform software (examples only).

The terminal devices 201, 202, 203 include, but are not limited to, self-service terminals, smart phones, virtual reality devices, augmented reality devices, tablets, laptop computers, and the like.

The server 205 may receive requests, such as commodity information requests, real-time statistics requests, and shopping requests, from the terminal devices 201, 202, 203. The server 205 may also obtain stream data from a terminal or another server (e.g., an information platform, a database server, or a cloud database) and perform statistics on the stream data. For example, the server 205 may be a back-end management server, a server cluster, or the like. The back-end management server may analyze and process received service requests, information requests, and the like, and feed back a processing result (such as the requested statistical result) to the terminal device.

It should be noted that the streaming statistics method provided by the embodiments of the present disclosure may generally be executed by the server 205. The streaming statistics method may also be performed by a server or a server cluster that is different from the server 205 and capable of communicating with the terminal devices 201, 202, 203 and/or the server 205. It should be understood that the numbers of terminal devices, networks, and servers are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

Fig. 3 schematically shows a flow chart of a streaming statistics method according to an embodiment of the present disclosure.

As shown in fig. 3, the streaming statistical method includes operations S301 to S307.

In operation S301, a variable scale is set.

In this embodiment, the parameter of the variable scale may be any physical parameter, such as a time, length, volume, weight, flow, or electric quantity parameter. For example, the variable scale may be a time scale, a length scale, a volume scale, a flow scale, or the like. The variable scale chosen may depend on the variable of the particular application scenario. By setting a variable scale suited to the actual scenario, the method extends to other measurement scales and can be applied wherever the statistical values corresponding to a series of variable points are to be computed in one pass by mapping onto the scale.

The following description will be made by taking a time scale as an example. The time scale may include parameters such as scale length and scale accuracy. For example, the total length of the time scale is in hours, the accuracy is in seconds, etc. When using the time scale, the time scale may be initialized first.

In one embodiment, the variable scale may be initialized based on the total length of the variable scale and a preset precision. Taking the time scale as an example, its precision and total length can be set flexibly for different application scenarios, so that the requirements of various scenarios can be met. In scenarios with high precision requirements in particular, the latency reduction is more pronounced. It should be noted that the total length of the time scale may be determined by the processing duration of the data processing stage; for example, the total length of the time scale is an integer multiple of the processing duration.

In one embodiment, the total length of the time scale (e.g., 1 minute, 5 minutes, 10 minutes, half an hour, 1 hour, 5 hours, 1 day, or 1 week), the precision value (e.g., 1, 3, 5, 10, 30, 50, or 100), and the precision unit (e.g., seconds) may be determined, and an initialization parameter table (shown in Table 1 below) may be designed with fields such as the time scale value, the time scale precision a, and the time precision unit. It should be noted that the graduations of the time scale may be equally spaced or variably spaced, for example with one graduation spacing in one segment and a different spacing in another, which is not limited herein.

The initialization algorithm is exemplified below.

When the time scale coefficient table (Table 1) contains no time scale value, a record is inserted, which may include the following parameters: the zero time scale value, the time scale precision, and the time precision unit.

When the maximum value of the time scale value in the time scale coefficient table 1 is smaller than the total length of the time scale, the maximum value of the current time scale value is increased by one time scale unit to obtain a new time scale value, and then the new time scale value is inserted into the time scale coefficient table 1. This procedure is repeated until the maximum value of the time scale values in coefficient table 1 is greater than or equal to the total length of the time scale.

For example, the total length of the time scale is 24 hours, the input time scale precision a is 10, the time precision unit is second(s), that is, the time scale precision is 10 seconds, and a time scale coefficient table can be obtained, as shown in table 1.

TABLE 1

Time scale value	Time scale precision	Time precision unit
00:00:00	10	s
00:00:10	10	s
00:00:20	10	s
...	...	...
23:59:40	10	s
23:59:50	10	s
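
The initialization described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names `init_time_scale` and `format_hms` are hypothetical, the table is held as an in-memory list instead of a database table, and the loop stops before the total length so that the last row matches Table 1 (23:59:50).

```python
def format_hms(seconds):
    """Format a second count as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def init_time_scale(total_seconds, precision_seconds):
    """Build the rows of a time scale table (hypothetical helper).

    Each row is (time scale value, time scale precision, time precision unit).
    Assumption: equally spaced graduations, unit fixed to seconds.
    """
    if precision_seconds <= 0:
        raise ValueError("precision must be positive")
    table = []
    value = 0  # zero time scale value, inserted first
    while value < total_seconds:
        table.append((format_hms(value), precision_seconds, "s"))
        value += precision_seconds  # advance by one time scale unit
    return table


# 24-hour scale with 10-second precision, as in the Table 1 example
scale = init_time_scale(24 * 3600, 10)
print(scale[0], scale[-1], len(scale))
```

A variably spaced scale would replace the fixed increment with a per-segment step.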

In addition, variable values of equal or unequal intervals can be used as the parameter table of the initialization variable scale according to actual requirements.

In operation S303, a statistical start point and a statistical end point for a specified variable on a variable scale are determined.

The stream data passes through data processing stages to complete the data processing process. The statistical start point and the statistical end point are determined based on processing characteristics of the data processing stage of the stream data, such as the processing duration and the processing speed. The processing speed of the data processing stage may be determined from the processing duration of the data processing node and a processing duration threshold: if the processing duration of the data processing node is greater than the threshold, the processing speed of the data processing stage may be considered slow, and if it is less than the threshold, the processing speed may be considered fast.

In one embodiment, the specified variable is a time variable. Determining the statistical start point and the statistical end point for the specified variable on the variable scale may include the following operation: determining a statistical start point and a statistical end point for the current time point based on the processing latency and/or the duration information of the processing units, wherein each processing unit comprises at least one data processing stage.

FIG. 4 schematically illustrates a flow chart for determining a processing latency according to an embodiment of the disclosure.

As shown in FIG. 4, the duration information of the processing units may be determined first, and then the processing latency may be determined.

Specifically, the duration information of the processing units can be determined by identifying how many stages the data processing comprises (and at which stage the streaming statistics take place) and estimating the time consumed by each stage. For example, the duration information of the processing units is determined as follows. First, the specified number of data processing stages that the stream data needs to pass through is determined. Then, the processing duration of each data processing stage is determined. Finally, at least some of the specified number of data processing stages are merged based on a preset rule and the processing duration of each data processing stage, to obtain at least one processing unit and its duration information.

Fig. 5 schematically shows a schematic diagram of data processing stages according to an embodiment of the present disclosure.

By investigation, the stream data processing process can be divided into a number of processing stages that are relatively independent in technology and function. As shown in FIG. 5, after investigating the functions of the stream data processing of a certain system, the processing can be roughly divided into a real-time data acquisition stage A, a per-minute aggregation stage B, a Spark SQL platform processing stage C, a file transmission stage D, an application-side streaming statistics stage E, and a front-end server display request-response stage F.

FIG. 6 schematically shows a diagram of processing durations of data processing stages according to an embodiment of the disclosure.

As shown in FIG. 6, the processing duration of each stage, starting with the real-time data acquisition stage A, may be preliminarily determined by investigation, statistics, and the like. The per-minute aggregation stage B takes about 1 minute, the Spark SQL platform processing stage C takes about 2 minutes 45 seconds (2min45s), the file transmission stage D takes less than 2 minutes, the application-side streaming statistics stage E takes less than 1 minute, and the front-end server request-response stage F takes on the order of seconds.

Determining the processing latency may include the following operations. First, the duration of a time slice is determined based on the at least one processing unit. Then, the processing latency is determined based on the duration of the time slice and the number of processing units.

For example, find the data processing stage estimated to take the longest and denote its duration as Tmax; denote the processing duration of the i-th stage as T_i, where i is a positive integer greater than or equal to 1. The amount of data handled in one round by each stage is measured as a length of time and is called a single time slice L. For example, the stream data includes at least one round of data, and each round includes the amount of data processed by the data processing stage with the longest processing duration.

The data processed by the i-th stage must be complete within a single time slice, and the processing of the i-th stage depends on the processing of the (i-1)-th stage. Therefore, if the i-th stage takes longer than the (i-1)-th stage, the assigned time slice must satisfy L mod T_{i-1} = 0; if the i-th stage takes less time than the (i-1)-th stage, the assigned time slice must satisfy L ≥ T_{i-1}. Here mod is the remainder function. For example, the duration Tb of the per-minute aggregation stage B is greater than the duration Ta of the real-time data acquisition stage A, so L mod Ta = 0 is required. The duration Tc of the Spark SQL platform processing stage C is greater than Tb, so L mod Tb = 0 is required. The duration Td of the file transmission stage D is less than Tc, so L ≥ Tc is required; similarly, L ≥ Td and L ≥ Te are required. In summary, the set of admissible values of L is {3, 4, 5, ...}, that is, the natural numbers greater than or equal to 3 (in minutes), and the minimum value min(L) is 3 minutes.

If the precision of the time scale is a, the time slice L is subdivided into b = L/a smaller precision slices. Clearly, when the time slice L takes its minimum value, the whole process achieves the shortest latency.
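
The time slice constraints can be checked with a short search. The sketch below is illustrative only: `min_time_slice` is a hypothetical helper, candidate slices are restricted to whole minutes, equal-duration neighbours are treated like the "shorter" case (the source does not cover equality), and the stage durations in seconds are assumed estimates (A and F taken as 1 second, D and E taken at their stated upper bounds).

```python
def min_time_slice(durations, step=60, limit=3600):
    """Return the smallest slice L (seconds) meeting the stage constraints.

    For each stage i with predecessor i-1:
      - if T_i >  T_{i-1}: L mod T_{i-1} must be 0
      - otherwise:         L must be >= T_{i-1}
    Candidates are multiples of `step` up to `limit` (assumption).
    """
    for candidate in range(step, limit + 1, step):
        ok = True
        for prev, cur in zip(durations, durations[1:]):
            if cur > prev:
                if candidate % prev != 0:
                    ok = False
                    break
            elif candidate < prev:
                ok = False
                break
        if ok:
            return candidate
    return None  # no admissible slice within the search limit


# Assumed estimates for stages A..F, in seconds
stage_seconds = [1, 60, 165, 120, 60, 1]
L = min_time_slice(stage_seconds)
precision = 10                       # time scale precision a, in seconds
print(L // 60, "minutes;", L // precision, "precision slices")
```

With these assumed durations the search returns 180 seconds, matching min(L) = 3 minutes in the text, and b = L/a gives 18 precision slices at a 10-second precision.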

The preset rule includes at least the following: for each data processing stage, if the total processing duration after merging the stage into an adjacent data processing stage is less than a set duration threshold, the stage is merged into the adjacent stage, and this operation is repeated until the total processing duration of the merged stages is greater than or equal to the set duration threshold. It should be noted that data processing stages with little functional dependency need not be merged.

For example, if a stage is assessed to take less than 1 second in practice, its effect on the processing pipeline is ignored: it is treated as part of a continuous, uninterrupted operation and is not counted as a separate processing unit. If the total duration of m consecutive stages is less than the time slice length L, they may be merged into the same processing unit. If two stages each take less than the time slice L but their combined duration is greater than L, they are placed in different processing units. The minimum number of processing units, min(N), can thus be calculated. The ideal total processing time is Tx = L × N (the time slice length times the number of processing units), and the fastest processing latency is min(Tx) = min(L) × min(N). Based on the per-stage durations and the preset rule, stage A and stage B may be merged into one processing unit, stage B and stage C belong to different processing units, stage C and stage D belong to different processing units, and stage D and stage E are merged into the same processing unit. The minimum number of processing units min(N) is therefore 3, and the fastest processing latency is min(Tx) = min(L) × min(N) = 9 minutes, as shown in Table 2.

TABLE 2

Time period	A	B	C	D and E	F
0-3min	<1 second	Round 1	Wait
3-6min	<1 second	Round 2	Round 1	Wait
6-9min	<1 second	Round 3	Round 2	Round 1	Wait
9-12min	......	......	Round 3	Round 2	Round 1
12-15min	......	......	......	Round 3	Round 2
15-18min	......	......	......	......	Round 3
18-21min	......
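
The grouping of stages into processing units and the resulting fastest latency can be sketched as below. This is an illustration under stated assumptions, not the disclosed implementation: `group_stages` is a hypothetical helper that merges consecutive stages greedily, stages under 1 second are skipped as negligible, and the durations in seconds are assumed estimates (D and E are taken slightly below their stated upper bounds so that their sum is strictly less than the time slice).

```python
def group_stages(durations, time_slice, negligible=1.0):
    """Greedily merge consecutive stages into processing units.

    A stage joins the current unit while the unit total stays below
    `time_slice`; otherwise it opens a new unit. Stages shorter than
    `negligible` seconds are ignored. Returns the total duration of each unit.
    """
    units = []
    current = 0.0
    started = False
    for t in durations:
        if t < negligible:
            continue                  # treated as part of continuous operation
        if started and current + t < time_slice:
            current += t              # merge into the current unit
        else:
            if started:
                units.append(current)
            current = t               # open a new unit
            started = True
    if started:
        units.append(current)
    return units


# Assumed estimates for stages A..F, in seconds
stage_seconds = [0.5, 60, 165, 110, 50, 0.5]
time_slice = 180                      # min(L) = 3 minutes
units = group_stages(stage_seconds, time_slice)
fastest_latency = time_slice * len(units)   # Tx = L x N
print(len(units), "units;", fastest_latency / 60, "minutes")
```

Under these assumptions the units are {A, B}, {C}, and {D, E}, giving N = 3 and a fastest latency of 9 minutes, in line with Table 2.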

In another embodiment, the method may further include the following operations. After the processing latency is determined, the fluctuation range of the processing latency is determined. Then, a committed latency is determined based on the processing latency and the fluctuation range.

The externally committed latency may be (9 + Δ) minutes, where Δ is a margin determined from the fluctuation range or the like. For example, if the fluctuation range of stage A through stage F is 1 second to 60 seconds, the externally committed latency may be 10 minutes.

Accordingly, because a stream data system is fast but, compared with a batch system, less stable and more prone to lapses, for each new round of streaming statistics, determining the statistical start point and the statistical end point for the current time point based on the processing latency and/or the duration information of the processing units may include: determining the statistical start point and the statistical end point for the current time point based on the committed latency and/or the duration information of the processing units.

In another embodiment, the method may further include the following operations.

After the committed latency is determined based on the processing latency and the fluctuation range, the committed latency is updated if the deviation between the committed latency and the actual latency exceeds a set deviation threshold.

For example, if the average actual running time of each stage in production is recalculated after a period of time and no longer matches the estimates of the original design, whether it has become longer or shorter, each parameter should be readjusted following the previous steps so that the data remains as accurate and as close to real time as possible, and the committed latency is republished externally.
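
The committed latency and its update check can be sketched as follows. Both helper names are hypothetical, and rounding the margin Δ up to a whole minute is an assumption chosen to reproduce the 10-minute example; the source only says that Δ is determined from the fluctuation range.

```python
import math


def committed_latency(processing_minutes, max_fluctuation_seconds):
    """Committed latency = processing latency + margin (delta).

    Assumption: delta is the upper end of the fluctuation range,
    rounded up to a whole minute.
    """
    delta = math.ceil(max_fluctuation_seconds / 60)
    return processing_minutes + delta


def needs_update(committed_minutes, actual_minutes, threshold_minutes):
    """True if the deviation exceeds the set threshold in either direction."""
    return abs(committed_minutes - actual_minutes) > threshold_minutes


committed = committed_latency(9, 60)   # 9-minute latency, 1 s to 60 s fluctuation
print(committed, needs_update(committed, 13, 2))
```

The threshold value of 2 minutes in the usage line is purely illustrative.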

After the committed latency is determined, the statistical start point and the statistical end point may be determined as follows.

In one embodiment, for the j-th round of data and the (j-1)-th round of data, where j is a positive integer greater than or equal to 1, determining the statistical start point and the statistical end point for the current time point based on the committed latency and/or the duration information of the processing units may include the following operations.

When the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is greater than or equal to the committed latency, the statistical start point for the j-th round of data is determined based on the system time for the j-th round of data and the committed latency.

When the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is greater than or equal to the committed latency, the statistical end point for the j-th round of data is determined based on the system time for the j-th round of data, the duration of the time slice, and the committed latency.

And when the difference value between the system time aiming at the j-th turn data and the statistical termination point aiming at the j-1 th turn data is larger than or equal to zero and is smaller than the promised time limit, determining the statistical start point aiming at the j-th turn data based on the statistical termination point aiming at the j-1 th turn data.

And when the difference value between the system time for the jth alternate data and the statistical termination point for the jth-1 alternate data is greater than or equal to zero and less than the promised time limit, determining the statistical termination point for the jth alternate data based on the latest input time for the jth alternate data.

For example, to satisfy the externally committed aging Tx, the start timestamp and the end timestamp of the data to be processed are calculated from the single time-slice length L and the precision a. Let In(i) denote the maximum timestamp of the input data of the i-th round of the streaming statistics stage, and let the i-th round output data Dat whose minimum timestamp is P(i) and whose maximum timestamp is Q(i). In(i) is the timestamp of the last data item in the i-th round of input data; it is affected by the network and by the processing speed of other devices, so some data may arrive late.

For the j-th round of statistics to be processed, the current system time is recorded as Sys(j), and the maximum timestamp of the data already processed is obtained, that is, the maximum timestamp Q(j-1) of the output data of the (j-1)-th round. For the first round, Q(j-1) does not exist, so it is initialized to a specified starting value, for example 00:00:00.

The time difference between Sys(j) and Q(j-1) is denoted Δt1(j) = Sys(j) - Q(j-1). From the relation between Δt1(j) and the committed aging Tx, the start timestamp P(j) can be determined such that, at the current system time, the maximum amount of data is processed within the committed aging Tx and the time slice L. Because the (j-1)-th round generally does not produce output data with future timestamps, Q(j-1) ≤ Sys(j), that is, Δt1(j) ≥ 0.

When Δt1(j) < Tx, the streaming statistics of the (j-1)-th round ran faster than the preset aging, so the j-th round can continue from the next timestamp; therefore, when Δt1(j) < Tx, P(j) = Q(j-1). When Δt1(j) ≥ Tx, the (j-1)-th round ran slower than the preset aging or only just met it, so the j-th round processes output data starting from Sys(j) - Tx. Input data still arrives at the latest maximum volume, and the timestamps from Q(j-1) to Sys(j) - Tx already exceed the aging, so output for those timestamps is skipped; therefore, when Δt1(j) ≥ Tx, P(j) = Sys(j) - Tx.

The time difference between the current system time Sys(j) and In(j) is denoted Δt2(j) = Sys(j) - In(j). From the relation between Δt2(j) and the committed aging Tx, the end timestamp Q(j) can be determined such that the maximum amount of data is processed within the committed aging Tx and the time slice L. Because processing in the preceding stage takes time, In(j) ≤ Sys(j) when the j-th round of statistics is performed, that is, Δt2(j) ≥ 0.

When Δt2(j) < Tx, the preceding stage (for example, stage C), which supplies the input data of the streaming statistics stage, is processing faster than the preset aging, so the j-th round of streaming statistics processes up to the maximum timestamp delivered by the preceding stage, In(j); therefore, when Δt2(j) < Tx, Q(j) = In(j). When Δt2(j) ≥ Tx, the preceding stage is lagging behind the preset aging, so the j-th round processes up to the timestamp obtained by subtracting Tx from Sys(j) + L, the future timestamp that reserves this round's running time. This keeps the error in the data-to-time correspondence as small as possible while still meeting the aging; therefore, when Δt2(j) ≥ Tx, Q(j) = Sys(j) + L - Tx. The calculation formulas for the statistical start point and the statistical end point may be as shown in Table 3.

TABLE 3
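The start-point and end-point rules described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function names `statistical_start` and `statistical_end` are introduced here, and timestamps are modeled as `datetime` values with Tx and L as `timedelta` values.

```python
from datetime import datetime, timedelta


def statistical_start(sys_j: datetime, q_prev: datetime, tx: timedelta) -> datetime:
    """P(j): Q(j-1) when dt1 < Tx, otherwise Sys(j) - Tx."""
    dt1 = sys_j - q_prev  # delta t1(j) = Sys(j) - Q(j-1)
    if dt1 < timedelta(0):
        raise ValueError("Q(j-1) must not be later than Sys(j)")
    return q_prev if dt1 < tx else sys_j - tx


def statistical_end(
    sys_j: datetime, in_j: datetime, tx: timedelta, slice_len: timedelta
) -> datetime:
    """Q(j): In(j) when dt2 < Tx, otherwise Sys(j) + L - Tx."""
    dt2 = sys_j - in_j  # delta t2(j) = Sys(j) - In(j)
    if dt2 < timedelta(0):
        raise ValueError("In(j) must not be later than Sys(j)")
    return in_j if dt2 < tx else sys_j + slice_len - tx


# Hypothetical inputs chosen for illustration only.
TX = timedelta(minutes=10)
SLICE = timedelta(minutes=3)
NOW = datetime(2020, 6, 24, 12, 0)
```

With these hypothetical inputs, a previous end point of 11:55 and a latest input timestamp of 11:58 give P(j) = 11:55 and Q(j) = 11:58, because both differences are below Tx. A previous end point of 11:40 and a latest input timestamp of 11:45 give P(j) = 11:50 and Q(j) = 11:53, because both differences reach or exceed Tx.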

For example, if the shortest guaranteed aging min(Tx) is 10 min and the minimum time-slice length min(L) is 3 min, the calculated statistical start point and statistical end point may be as shown in Table 4.

TABLE 4

After the calculated committed aging has been manually confirmed, it is set as the specified value of the committed aging and included in the fixed parameters, and the subsequent algorithm and program run with this parameter value.

In operation S305, a statistical point is determined, the statistical point including at least a statistical start point and a statistical end point.

Specifically, determining the statistic point may include the following operations.

First, the variable scale is divided based on a preset precision to obtain at least one statistical slice. Note that a statistical slice obtained by the division may correspond to one graduation of the variable scale or to several graduations. The statistical slices may have the same length or different lengths. Referring to Table 1, one or more of the time points may be selected as statistical points.

Then, the statistical slices located between the statistical start point and the statistical end point are determined from the at least one statistical slice.

Specifically, after manual confirmation, the specified or optimal value of the time slice or statistical slice is set and included in the fixed parameters, and the subsequent algorithm and program run with this parameter value.
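The division of the scale and the selection of statistical points between the start and end points can be sketched as follows. This is a minimal sketch, assuming an integer scale measured in minutes with equal-length slices; the names `build_scale` and `statistical_points` are introduced here for illustration only.

```python
def build_scale(total_length: int, precision: int) -> list[int]:
    """Graduations 0, precision, 2 * precision, ... up to total_length."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return list(range(0, total_length + 1, precision))


def statistical_points(scale: list[int], p: int, q: int) -> list[int]:
    """Graduations inside [p, q], always including p and q themselves."""
    if p > q:
        raise ValueError("start point must not exceed end point")
    inner = {t for t in scale if p <= t <= q}
    # The statistical points at least include the start and end points.
    return sorted(inner | {p, q})


scale = build_scale(60, 5)
points = statistical_points(scale, 12, 31)
```

Here a 60-minute scale with a precision of 5 minutes, a start point of 12 and an end point of 31 yield the statistical points 12, 15, 20, 25, 30 and 31.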

In operation S307, a statistical result of the stream data for the specified variable is determined based on the statistical results of the statistical points.

In one embodiment, determining the statistical result of the stream data for the specified variable based on the statistical results of the statistical points may include establishing a mapping relation between each statistical point and an operator, so that the statistical result of each statistical point is determined by the operator. The operator may be a statistical operator over a data set, including but not limited to SUM (total), AVERAGE (mean), MAX (maximum), and variance or mean square deviation.
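The mapping between statistical points and operators can be sketched as a lookup table of functions. This is an illustrative sketch only: the operator names, the `OPERATORS` table and the `evaluate_points` function are assumptions introduced here, and population variance from the Python standard library stands in for the variance operator.

```python
import statistics
from typing import Callable, Optional, Sequence

# Hypothetical operator table; keys mirror the operators named in the text.
OPERATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "SUM": sum,
    "AVERAGE": statistics.fmean,
    "MAX": max,
    "VARIANCE": statistics.pvariance,
}


def evaluate_points(
    point_to_operator: dict[int, str],
    point_to_values: dict[int, Sequence[float]],
) -> dict[int, Optional[float]]:
    """Apply each point's operator to the values mapped to that point."""
    results: dict[int, Optional[float]] = {}
    for point, op_name in point_to_operator.items():
        values = point_to_values.get(point, [])
        # A point with no data yet gets None instead of raising.
        results[point] = OPERATORS[op_name](values) if values else None
    return results
```

For example, mapping point 5 to SUM over the values 1, 2, 3 gives 6, and a point that has received no data yet gives `None`.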

Fig. 7 schematically illustrates a logic diagram of a streaming statistics method according to an embodiment of the present disclosure.

As shown in Fig. 7, after the time scale is initialized, the committed aging and the time-slice length are determined from the calculated optimal values of the aging and the time slice. The current statistical start point and the current statistical end point can then be calculated from the committed aging, the time-slice length, and so on. Stream data output by other processing stages is then aggregated by the one-time mapping algorithm, and the results of the streaming statistics may be sent on to other processing stages.

In another embodiment, the method further comprises: if the statistical range of the stream data spans at least two variable scales, determining sub-statistical results of the stream data based on each of the at least two variable scales.

Then, a statistical result of the stream data is determined based on the sub-statistical results of the stream data.

Fig. 8 schematically illustrates a principle schematic of a streaming statistics method according to an embodiment of the present disclosure. Fig. 9 schematically illustrates a schematic diagram of a streaming statistics method according to another embodiment of the present disclosure.

Figs. 8 and 9 show schematic diagrams of the streaming statistical calculation method that maps data onto slices of a time scale in an embodiment of the present disclosure. Stream data generated by a distributed server cluster is processed in several stages and then collected into one table, referred to as the stream data table. After the statistical start point P and the statistical end point Q are determined, the stream data in the stream data table is cut and mapped along the time scale of Table 1 at equal precision (for example, precision a). For example, the statistical point Pa(i) = P + i × a, where i is a positive integer. Specifically, the stream data from the data corresponding to a certain scale value of the time scale (the zero scale value in this embodiment) up to the statistical start point P, the statistical points Pa(1), Pa(2), Pa(3), ..., Pa(n), and the statistical end point Q, respectively, is cut and calculated in a single pass to obtain the streaming statistical result. In another embodiment, the length from a certain scale value of the time scale (which may also be a point at a fixed distance from the statistical point Pa(i)) to the statistical point Pa(k), k = 0, 1, 2, 3, ..., is recorded as the statistical slice S(k). In another embodiment, S(k) is a fixed distance. The statistical slices S(k) may therefore form an arithmetic sequence or a constant-length interval, and may even be generalized to a geometric sequence. The stream data is thus mapped once onto the time-scale segment from P to Q through the series of statistical slices S(k) to obtain the streaming statistical result Dat.

Note that Fig. 8 illustrates the statistical principle for the x-th round of data. Data002 and data005 are data that should have arrived by the current statistical time but have not, so the results for data002 and data005 may be missing from the statistical results Dat01 and Dat03, respectively. As stream data continues to arrive and data002 and data005 arrive in turn, subsequent statistical results include them. Fig. 9 illustrates the statistical principle for the z-th round of data. The statistical principle shown in Figs. 8 and 9 is an embodiment in which the zero scale of the time scale is used as the starting point for statistics. In other embodiments, this starting point may be a non-zero graduation on the time scale.

The algorithm is explained below.

Specific pseudo-code references are as follows:

INSERT INTO stream_statistics_values (Dat, Pa(i))

SELECT operator(NVL(stream_data_table.data)), time_scale_segment.time_scale_value FROM stream_data_table

RIGHT JOIN (SELECT time_scale_value FROM table_1 WHERE time_scale_value BETWEEN P AND Q) time_scale_segment

ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value /* Pa(i) */

AND stream_data_table.timestamp > some fixed scale value (or Pa(i) - S(k)))

GROUP BY time_scale_segment.time_scale_value
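The join in the pseudo-code above can be read as the following Python sketch. It is an illustration under stated assumptions rather than the disclosed implementation: records are modeled as `(timestamp, value)` pairs with integer timestamps, the name `map_once` and its parameters are introduced here, and a nested loop stands in for the single database join.

```python
from typing import Callable, Optional, Sequence


def map_once(
    records: Sequence[tuple[int, float]],
    points: Sequence[int],
    operator: Callable[[Sequence[float]], float] = sum,
    fixed_start: int = 0,
    window: Optional[int] = None,
    empty: float = 0,
) -> dict[int, float]:
    """Compute one statistic per statistical point.

    With window=None, each point covers (fixed_start, point], i.e. a
    cumulative value from a fixed scale value. With a window S, each
    point covers (point - S, point].
    """
    result: dict[int, float] = {}
    for point in points:
        lower = fixed_start if window is None else point - window
        values = [v for ts, v in records if lower < ts <= point]
        # Like the RIGHT JOIN, every point is kept even without data;
        # `empty` plays the role of the NVL default.
        result[point] = operator(values) if values else empty
    return result


records = [(1, 10), (4, 20), (7, 30)]
cumulative = map_once(records, [3, 6, 9])
windowed = map_once(records, [3, 6, 9], window=3)
```

With these hypothetical records, the cumulative sums at points 3, 6 and 9 are 10, 30 and 60, and the windowed sums with S = 3 are 10, 20 and 30. If a late record such as `(2, 5)` arrives afterwards, rerunning the mapping includes it in every point from 3 onward, which matches the late-arrival behavior described for Fig. 8.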

The operator may be a statistical operator over a data set, such as SUM (total), AVERAGE (mean), MAX (maximum), or variance or mean square deviation. If the scale has an upper-level period (such as hour, day, month, quarter, or year) and the statistical start point P and the statistical end point Q do not span periods, the mapping is computed directly with the pseudo-code above. If P and Q do span periods, the computation is done in two steps: first the streaming statistical result from the statistical start point P to the zero scale value of the next period is calculated, and then the streaming statistical result from the zero scale value of the next period to the statistical end point Q is calculated. For example,

INSERT INTO stream_statistics_values (Dat, Pa(i))

SELECT operator(NVL(stream_data_table.data)), time_scale_segment.time_scale_value FROM stream_data_table

RIGHT JOIN (SELECT time_scale_value FROM table_1 WHERE time_scale_value BETWEEN P AND end of the preceding period) time_scale_segment

ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value /* Pa(i) */

AND stream_data_table.timestamp > some fixed scale value (or Pa(i) - S(k)))

GROUP BY time_scale_segment.time_scale_value

Then proceed with

INSERT INTO stream_statistics_values (Dat, Pa(i))

SELECT operator(NVL(stream_data_table.data)), time_scale_segment.time_scale_value FROM stream_data_table

RIGHT JOIN (SELECT time_scale_value FROM table_1 WHERE time_scale_value BETWEEN start of the next period AND Q) time_scale_segment

ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value /* Pa(i) */

AND stream_data_table.timestamp > some fixed scale value (or Pa(i) - S(k)))

GROUP BY time_scale_segment.time_scale_value

Whether or not the streaming statistics span periods, the pseudo-code above only illustrates the logic of the key part; embodiments are not limited to this logic, and more tables may be joined.
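The two-step handling of a range that spans periods can be sketched as follows. This is a hedged illustration: the names `split_range` and `sub_results` are introduced here, timestamps are integers measured from the zero scale of the first period, SUM is used as the operator, and the loop generalizes the two-step description to ranges spanning more than two periods, which is an assumption beyond the text.

```python
def split_range(p: int, q: int, period: int) -> list[tuple[int, int]]:
    """Split (p, q] at every zero scale value of a following period."""
    if period <= 0 or p > q:
        raise ValueError("need period > 0 and p <= q")
    segments = []
    start = p
    while True:
        next_zero = (start // period + 1) * period
        if next_zero >= q:
            segments.append((start, q))
            return segments
        segments.append((start, next_zero))
        start = next_zero


def sub_results(records, p: int, q: int, period: int) -> list[float]:
    """One SUM per segment; segments are half-open (lo, hi] to avoid double counting."""
    return [
        sum(v for ts, v in records if lo < ts <= hi)
        for lo, hi in split_range(p, q, period)
    ]


records = [(55, 1), (60, 2), (61, 4), (75, 8), (80, 16)]
parts = sub_results(records, 50, 75, 60)
```

With a 60-minute period, the range from 50 to 75 is split at 60, giving sub-results 3 and 12; adding them gives the same total, 15, as a single pass over the whole range.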

By setting a time scale, the streaming statistical method provided by the embodiments of the present disclosure processes streaming data as efficiently and as accurately as possible, and allows the committed aging, the data precision, the time slice, and the statistical slice or statistical interval to be adjusted flexibly.

The streaming statistical method provided by the embodiments of the present disclosure gives the relationship between the aging and the time slice. The optimal values of the aging and the time slice can be continually recalculated and adjusted from production records, so the aging is determined on a clear, well-founded basis.

With the streaming statistical method provided by the embodiments of the present disclosure, the precision and graduations of the time scale can be set flexibly for different production scenarios, so the method can serve a variety of application scenarios, including those with high precision requirements.

When an application scenario continuously generates data, a statistical value at some stage is usually computed by loop iteration, which is inefficient and resource-intensive. For example, if the application-side cumulative display processing stage E used the conventional loop-iteration approach, processing could not be completed in under 1 minute. The streaming statistical method provided by the embodiments of the present disclosure, using an algorithm based on a variable scale, can shorten the processing time several-fold, and is efficient and light on resources.

Based on the time-scale streaming statistical algorithm, the streaming statistical method provided by the embodiments of the present disclosure can automatically solve for the optimal statistical start point and statistical end point to be processed in real-time calculation under a specified committed aging, so that statistical results that are as accurate and timely as possible are obtained under the specified conditions, closely matching the needs of streaming data applications.

The solution provided by the embodiments of the present disclosure is not limited to a time scale. It can be generalized to other metrics according to the actual scenario: a variable scale is set, and the statistical values corresponding to a series of variable points are computed in a single mapping against that scale, so the approach generalizes well.

Another aspect of the present disclosure provides a streaming statistics apparatus.

Fig. 10 schematically shows a structural diagram of a streaming statistics apparatus according to an embodiment of the present disclosure.

As shown in fig. 10, the streaming statistic apparatus 1000 includes: a scale setting module 1010, a start point and stop point determination module 1020, a statistical point determination module 1030, and a statistical module 1040.

The scale setting module 1010 is configured to set a variable scale. The total length and precision of the variable scale, the scale initialization and other processes can refer to the content of the relevant part of the method, and are not described herein again.

The start point and stop point determining module 1020 is configured to determine a statistical start point and a statistical end point on the variable scale for the specified variable, where the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data.

The statistic point determining module 1030 is configured to determine statistic points, where the statistic points at least include a statistic starting point and a statistic ending point.

The statistics module 1040 is configured to determine statistics of the flow data for the specified variables based on the statistics of the statistics points.

The streaming statistics apparatus 1000 involves the time-slice design algorithm, the shortest-aging calculation method, adjustable time slices, adjustable statistical intervals, the streaming statistical algorithm with a committed aging, and so on; for details, refer to the relevant parts of the method embodiments.

It should be noted that the implementation, solved technical problems, implemented functions, and achieved technical effects of the modules and the like in the embodiment of the apparatus part are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of the corresponding steps in the embodiment of the method part, and are not described in detail here.

Any number of the modules according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules according to the embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of the three implementations of software, hardware, and firmware, or in any suitable combination of them. Alternatively, one or more of the modules according to embodiments of the disclosure may be implemented at least partly as computer program modules which, when executed, may perform corresponding functions.

For example, any number of the scale setting module 1010, the start and stop point determining module 1020, the statistical point determining module 1030, and the statistical module 1040 may be combined in one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the scale setting module 1010, the start and stop point determining module 1020, the statistical point determining module 1030, and the statistical module 1040 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the scale setting module 1010, the start and stop point determining module 1020, the statistical point determining module 1030, and the statistical module 1040 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.

Another aspect of the present disclosure provides an electronic device.

FIG. 11 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in Fig. 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to the embodiments of the present disclosure.

In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, the ROM 1102, and the RAM 1103 are communicatively connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. It is to be noted that the programs may also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.

According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The computer program, when executed by the processor 1101, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 and/or one or more memories other than the ROM 1102 and the RAM 1103 described above.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A streaming statistical method, the streaming data being processed through a data processing stage to complete a data processing process, the method comprising:

setting a variable scale;

determining a statistical starting point and a statistical ending point for a specified variable on the variable scale;

determining statistical points, wherein the statistical points at least comprise the statistical starting point and the statistical ending point; and

determining a statistical result of the stream data for the specified variable based on the statistical result of the statistical point;

wherein the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data.

2. The method of claim 1, wherein the specified variable is a time variable;

the determining a statistical start point and a statistical end point for a given variable on the variable scale comprises:

and determining a statistical starting point and a statistical ending point for the current time point based on the processing aging and/or the duration information of the processing units, wherein each processing unit comprises at least one data processing stage.

3. The method of claim 2, wherein:

the time length information of the processing unit is determined in the following way:

determining a specified number of data processing stages that the streaming data needs to go through,

the processing time duration of each data processing stage is determined,

merging at least part of the data processing stages in the specified number of data processing stages based on a preset rule and the processing duration of each data processing stage to obtain at least one processing unit and respective duration information;

the processing aging is determined in the following manner:

determining duration information for the time slices based on the at least one processing unit,

and determining the processing aging based on the duration information of the time slice and the number of the at least one processing unit.

4. The method of claim 3, wherein the preset rules include at least one of the following: for each data processing stage,

if the total processing duration after the data processing stage is merged into an adjacent data processing stage is less than a set duration threshold, merging the data processing stage into the adjacent data processing stage, and repeating this operation until the total processing duration of the merged data processing stages is greater than or equal to the set duration threshold.

5. The method of claim 3, further comprising: after the processing aging is determined,

determining the fluctuation range of the processing aging;

determining a committed aging based on the processing aging and the fluctuation range;

the determining a statistical start point and a statistical end point for a current time point based on the processing aging and/or the duration information of the processing unit includes: determining a statistical start point and a statistical end point for the current time point based on the committed aging and/or the duration information of the processing unit.

6. The method according to claim 5, wherein the determining a statistical start point and a statistical end point for a current time point based on the committed aging and/or the duration information of the processing unit comprises: for the j-th round of data and the (j-1)-th round of data, wherein j is a positive integer greater than or equal to 1,

when the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is greater than or equal to the committed aging, determining a statistical start point for the j-th round of data based on the system time for the j-th round of data and the committed aging;

when the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is smaller than the committed aging, determining the statistical end point for the j-th round of data based on the system time for the j-th round of data, the duration information of the time slice, and the committed aging;

when the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is greater than or equal to zero and smaller than the committed aging, determining a statistical start point for the j-th round of data based on the statistical end point for the (j-1)-th round of data; and

when the difference between the system time for the j-th round of data and the statistical end point for the (j-1)-th round of data is greater than or equal to zero and smaller than the committed aging, determining the statistical end point for the j-th round of data based on the latest input time for the j-th round of data.

7. The method of claim 5, further comprising: after said determining a committed aging based on said processing aging and said fluctuation range,

updating the committed aging if a deviation between the committed aging and an actual aging exceeds a set deviation threshold.

8. The method of claim 1, wherein the streaming data includes at least one round of data, each round of data including an amount of data processed by a data processing stage having a longest processing duration.

9. The method of claim 1, further comprising: prior to the setting of the variable scale,

initializing the variable scale based on a total length and a preset precision of the variable scale.

10. The method of claim 1, wherein the determining of the statistical points comprises:

dividing the variable scale based on a preset precision to obtain at least one statistical fragment; and

determining, from the at least one statistical fragment, the statistical fragments located between the statistical start point and the statistical end point.
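The initialization of claim 9 and the division of claim 10 can be sketched together. This is a minimal illustration, assuming an integer-valued scale that starts at zero and fragments represented as half-open (start, end) pairs; the function names are hypothetical and the representation is not taken from the claims.

```python
def init_scale(total_length, precision):
    """Divide a scale of total_length into fragments of width precision.

    Returns a list of (start, end) pairs; the last fragment is clipped
    if total_length is not a multiple of precision.
    """
    if total_length <= 0 or precision <= 0:
        raise ValueError("total_length and precision must be positive")
    return [
        (start, min(start + precision, total_length))
        for start in range(0, total_length, precision)
    ]


def fragments_between(fragments, start_point, end_point):
    """Keep the fragments lying entirely within [start_point, end_point]."""
    return [
        (lo, hi) for lo, hi in fragments
        if lo >= start_point and hi <= end_point
    ]


if __name__ == "__main__":
    scale = init_scale(60, 10)           # six fragments of width 10
    print(fragments_between(scale, 20, 50))
```

A smaller precision yields more fragments and finer statistics at the cost of more statistical points to maintain.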

11. The method of claim 1, further comprising:

if the statistical range of the stream data spans at least two variable scales, determining sub-statistical results of the stream data based on each of the at least two variable scales respectively; and

determining the statistical result of the stream data based on the sub-statistical results of the stream data.
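A statistical range that crosses a scale boundary, as in claim 11, can be handled by splitting it per scale and combining the parts. The sketch below assumes consecutive scales of equal length, events given as (position, value) pairs, and summation as the way sub-results are combined; none of these details are fixed by the claim, and the names are hypothetical.

```python
def split_by_scale(start, end, scale_length):
    """Split the half-open range [start, end) at multiples of scale_length."""
    pieces = []
    cursor = start
    while cursor < end:
        # Next scale boundary strictly after the cursor.
        boundary = (cursor // scale_length + 1) * scale_length
        upper = min(boundary, end)
        pieces.append((cursor, upper))
        cursor = upper
    return pieces


def total_over_scales(events, start, end, scale_length):
    """Return (total, sub_results), with one sub-result per scale touched."""
    sub_results = [
        sum(value for pos, value in events if lo <= pos < hi)
        for lo, hi in split_by_scale(start, end, scale_length)
    ]
    # Assumption: sub-results are combined by addition.
    return sum(sub_results), sub_results


if __name__ == "__main__":
    events = [(55, 2), (59, 3), (60, 5), (99, 1), (100, 7)]
    print(total_over_scales(events, 50, 100, 60))
```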

12. The method of claim 1, wherein the parameter of the variable scale comprises any one of: a time parameter, a length parameter, a volume parameter, a weight parameter, a flow parameter and an electric quantity parameter.

13. The method of claim 1, wherein the determining of the statistical result of the stream data for the specified variable based on the statistical results of the statistical points comprises:

establishing a mapping relationship between each statistical point and an operator, so as to determine the statistical result of each statistical point based on the operator.
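One way to picture the mapping of claim 13 is a lookup table from each statistical point to the operator that produces its result. The sketch below assumes operators are plain aggregation functions such as sum or max and that statistical points are identified by string keys; both are illustrative assumptions rather than details from the claim.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical operator type: reduces the values seen at a point to one number.
Operator = Callable[[Iterable[float]], float]


def evaluate_points(
    point_values: Dict[str, List[float]],
    operators: Dict[str, Operator],
) -> Dict[str, float]:
    """Apply the operator mapped to each statistical point to its values."""
    return {
        point: operators[point](values)
        for point, values in point_values.items()
    }


if __name__ == "__main__":
    operators = {"start": sum, "end": max}
    values = {"start": [1, 2, 3], "end": [4, 9, 2]}
    print(evaluate_points(values, operators))
```

Keeping the mapping separate from the data lets a statistical point change its operator without touching the code that collects values.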

14. A statistical apparatus of streaming data, comprising:

a scale setting module configured to set a variable scale;

a start and end point determination module configured to determine a statistical start point and a statistical end point for a specified variable on the variable scale, wherein the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data;

a statistical point determination module configured to determine statistical points, wherein the statistical points at least include the statistical start point and the statistical end point; and

a statistical module configured to determine a statistical result of the stream data for the specified variable based on the statistical results of the statistical points.

15. An electronic device, comprising:

one or more processors; and

a storage device for storing executable instructions which, when executed by the one or more processors, implement the method of any one of claims 1 to 13.

16. A computer readable storage medium having stored thereon instructions which, when executed, implement a method according to any one of claims 1 to 13.