CN116339872A - Data processing method, stream processing system, medium and device based on sliding window - Google Patents

Info

Publication number
CN116339872A
Authority
CN
China
Prior art keywords
sliding window
stream data
data
target sliding
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111583376.6A
Other languages
Chinese (zh)
Inventor
范潇
贾炎
赵俊杰
胡玉婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111583376.6A
Publication of CN116339872A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a sliding window based data processing method, a stream processing system, a sliding window based data processing apparatus, a computer readable storage medium and an electronic device, and belongs to the technical field of data processing. The method is applied to a stream processing system and comprises the following steps: determining, according to a preset dividing rule, a target sliding window of received stream data and a slice position of the stream data in the target sliding window; invoking an aggregation algorithm to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result; updating, with the aggregation result, a tree structure constructed using a cyclic array, wherein leaf nodes and non-leaf nodes of the tree structure respectively store the stream data in the target sliding window and local aggregation results of that stream data; and determining the calculation result of the stream data in the target sliding window through the nodes of the tree structure. The disclosure can improve the real-time performance and efficiency of stream data processing and reduce its resource overhead.

Description

Data processing method, stream processing system, medium and device based on sliding window
Technical Field
The disclosure relates to the technical field of data processing, in particular to a data processing method based on a sliding window, a stream processing system, a data processing device based on the sliding window, a computer readable storage medium and electronic equipment.
Background
With the development of the internet and the widespread use of mobile terminal applications, huge-scale data are generated, and in order to provide better quality services to users and optimize service structures, analysis and processing of the data are often required.
In some data-intensive applications, stream data is generated continuously and at high speed and must be processed promptly. For example, applications such as network monitoring, telecommunication data management, sensor networks, and quantitative trading need to process stream data that is generated in real time. Existing methods can process unbounded stream data in batches, but batch processing cannot meet the real-time requirements of stream data. It is therefore necessary to provide a method that improves the real-time performance of stream data processing.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides a data processing method based on a sliding window, a stream processing system, a data processing device based on the sliding window, a computer readable storage medium and electronic equipment, so that the problem of weak real-time performance of stream data processing in the prior art is solved at least to a certain extent.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a sliding window based data processing method applied to a stream processing system, the method comprising: determining a target sliding window of received stream data and a slicing position of the stream data in the target sliding window according to a preset dividing rule; invoking an aggregation algorithm to aggregate the stream data at each fragment position in the target sliding window to obtain an aggregation result; updating a tree structure constructed by using a cyclic array by utilizing the aggregation result, wherein leaf nodes and non-leaf nodes of the tree structure are respectively used for storing stream data in the target sliding window and local aggregation results of the stream data in the target sliding window; and determining the calculation result of the stream data in the target sliding window through the nodes of the tree structure.
In an exemplary embodiment of the present disclosure, the method further comprises: determining whether the receiving time of the stream data is greater than or equal to a dynamic arrival time, wherein the dynamic arrival time is determined according to a time difference between the maximum event time of the stream data entering a window and a preset delay time; and triggering and determining a target sliding window of the stream data when the receiving time of the stream data is greater than or equal to the dynamic arrival time.
In an exemplary embodiment of the present disclosure, the determining, according to a preset partitioning rule, a target sliding window of received stream data and a slicing position of the stream data within the target sliding window includes: determining a target sliding window of the stream data and a slicing position of the stream data in the target sliding window according to the data parameters of the stream data and the window parameters of each window; the data parameters of the stream data comprise the receiving time and the receiving sequence of the stream data, and the window parameters of each window comprise the window length and the window type of each window.
In an exemplary embodiment of the present disclosure, the determining a target sliding window of the stream data and a slice position of the stream data within the target sliding window according to the data parameters of the stream data and the window parameters of each window includes: when the receiving time of the stream data is greater than the window length of every sliding window, creating a new sliding window and determining the new sliding window as the target sliding window of the stream data; and when the receiving time of the stream data is not greater than the window length of any one sliding window, determining that sliding window as the target sliding window of the stream data.
In an exemplary embodiment of the present disclosure, each slice location within the target sliding window corresponds to a data slice, and the data slice is composed of one or more stream data.
In an exemplary embodiment of the present disclosure, when an aggregation algorithm is invoked to aggregate the stream data at each slice position within the target sliding window, the method further includes: when the moving step of the target sliding window is smaller than the window length, determining repeated slices in the target sliding window; and calculating the aggregation results of the data slices in the target sliding window other than the repeated slices, and determining the aggregation result of the stream data in the target sliding window from the aggregation results of the repeated slices and the aggregation results of the other data slices.
In an exemplary embodiment of the present disclosure, the method further comprises: when stream data at each slicing position in the target sliding window is subjected to aggregation processing, determining whether the stream data at each slicing position in the target sliding window completely arrives; when stream data of any slice position in the target sliding window does not all arrive, receiving new stream data, and adding the new stream data to any slice position according to the preset dividing rule; and when all the stream data of each slicing position in the target sliding window arrives, adding the stream data of each slicing position to a leaf node of the tree structure.
In an exemplary embodiment of the present disclosure, after the aggregating the stream data at each slice position within the target sliding window, the method further includes: updating the data slice at each slice position in the target sliding window according to the moving step of the target sliding window; and calculating an aggregation result of the data slices at the slice positions in the target sliding window after the data slices are updated.
In an exemplary embodiment of the present disclosure, the updating the data slice at each slice position in the target sliding window according to the moving step of the target sliding window includes: deleting expired slices in the target sliding window and deleting the node data of the expired slices from the tree structure, wherein an expired slice is a data slice at a slice position in the target sliding window whose aggregation processing has been completed; and receiving new stream data, and adding the new stream data, according to the preset dividing rule, to the slice position in the target sliding window that corresponds to the expired slice.
In an exemplary embodiment of the present disclosure, the slice range corresponding to the stream data in the target sliding window is located by a head pointer and a tail pointer.
According to a second aspect of the present disclosure, there is provided a stream processing system, the system comprising: a determining node, configured to determine a target sliding window of received stream data and a slice position of the stream data in the target sliding window according to a preset dividing rule; a computing node, configured to invoke an aggregation algorithm to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result, and to update a tree structure constructed using a cyclic array with the aggregation result; and a storage node, configured to store the tree structure; wherein the leaf nodes and the non-leaf nodes of the tree structure are respectively used for storing the stream data in the target sliding window and local aggregation results of the stream data in the target sliding window.
In an exemplary embodiment of the present disclosure, the system further includes a trigger node configured to determine whether a reception time of the stream data is greater than or equal to a dynamic arrival time, and trigger determining a target sliding window of the stream data when the reception time of the stream data is greater than or equal to the dynamic arrival time; wherein the dynamic arrival time is determined according to a time difference between a maximum event time of stream data entering the window and a preset delay time.
According to a third aspect of the present disclosure, there is provided a sliding window based data processing apparatus, the apparatus being applied to a stream processing system, comprising: the first determining module is used for determining a target sliding window of the received stream data and a slicing position of the stream data in the target sliding window according to a preset dividing rule; the aggregation module is used for calling an aggregation algorithm to aggregate the stream data at each fragment position in the target sliding window to obtain an aggregation result; the updating module is used for updating a tree structure constructed by using a cyclic array by utilizing the aggregation result, and leaf nodes and non-leaf nodes of the tree structure are respectively used for storing stream data in the target sliding window and local aggregation results of the stream data in the target sliding window; and the second determining module is used for determining the calculation result of the stream data in the target sliding window through the nodes of the tree structure.
In an exemplary embodiment of the present disclosure, the first determining module is configured to determine whether a receiving time of the stream data is greater than or equal to a dynamic arrival time, where the dynamic arrival time is determined according to a time difference between a maximum event time of the stream data entering a window and a preset delay time, and trigger to determine a target sliding window of the stream data when the receiving time of the stream data is greater than or equal to the dynamic arrival time.
In an exemplary embodiment of the present disclosure, the first determining module is configured to determine a target sliding window of the stream data and a slice position of the stream data within the target sliding window according to a data parameter of the stream data and a window parameter of each window, where the data parameter of the stream data includes a receiving time and a receiving order of the stream data, and the window parameter of each window includes a window length and a window type of each window.
In an exemplary embodiment of the present disclosure, the first determining module is further configured to create a new sliding window when the receiving time of the stream data is greater than a window length of each sliding window, determine the new sliding window as a target sliding window of the stream data, and determine any sliding window as a target sliding window of the stream data when the receiving time of the stream data is not greater than the window length of any sliding window.
In an exemplary embodiment of the present disclosure, each slice location within the target sliding window corresponds to a data slice, and the data slice is composed of one or more stream data.
In an exemplary embodiment of the present disclosure, when an aggregation algorithm is invoked to aggregate stream data at each slice position in the target sliding window, the aggregation module is configured to determine, when a moving step size of the target sliding window is smaller than a window length, a repeated slice in the target sliding window, calculate an aggregate result of other data slices in the target sliding window except for the repeated slice, and determine an aggregate result of the stream data in the target sliding window by using the aggregate result of the repeated slice and the aggregate result of the other data slices.
In an exemplary embodiment of the present disclosure, the aggregation module is further configured to determine, when performing aggregation processing on stream data at each slice location in the target sliding window, whether all stream data at each slice location in the target sliding window arrives, receive new stream data when all stream data at any slice location in the target sliding window does not arrive, and add the new stream data to any slice location according to the preset partitioning rule, and add the stream data at each slice location to a leaf node of the tree structure when all stream data at each slice location in the target sliding window arrives.
In an exemplary embodiment of the present disclosure, after the stream data at each slicing position in the target sliding window is aggregated, the updating module is configured to update, according to a moving step length of the target sliding window, data slices at each slicing position in the target sliding window, and calculate an aggregation result of the data slices at each slicing position in the target sliding window after updating the data slices.
In an exemplary embodiment of the present disclosure, the update module is further configured to delete expired slices in the target sliding window and delete the node data of the expired slices from the tree structure, where an expired slice is a data slice at a slice position in the target sliding window whose aggregation processing has been completed, and to receive new stream data and add the new stream data, according to the preset dividing rule, to the slice position in the target sliding window that corresponds to the expired slice.
In an exemplary embodiment of the present disclosure, the slice range corresponding to the stream data in the target sliding window is located by a head pointer and a tail pointer.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the sliding window based data processing methods described above.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the sliding window based data processing methods described above via execution of the executable instructions.
The present disclosure has the following beneficial effects:
In summary, according to the sliding window based data processing method, the stream processing system, the sliding window based data processing apparatus, the computer readable storage medium and the electronic device of the present exemplary embodiment, a target sliding window of received stream data and a slice position of the stream data in the target sliding window may be determined according to a preset dividing rule; an aggregation algorithm is invoked to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result; a tree structure constructed using a cyclic array is then updated with the aggregation result; and a calculation result of the stream data in the target sliding window is determined through the nodes of the tree structure. On the one hand, aggregating the stream data at each slice position in the target sliding window by invoking an aggregation algorithm allows stream data to be aggregated with sliding windows and allows partial aggregation results to be shared among sliding windows, which reduces redundant calculation and unnecessary data copying. On the other hand, updating the tree structure constructed using the cyclic array with the aggregation result exploits the properties of the tree structure to reduce the complexity of stream data processing and improve data processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely some embodiments of the present disclosure and that other drawings may be derived from these drawings without undue effort.
Fig. 1 shows a flowchart of a stream data processing method in the related art;
Fig. 2 shows a flowchart of a sliding window based data processing method in the present exemplary embodiment;
Fig. 3 shows a sub-flowchart of a data processing method in the present exemplary embodiment;
Fig. 4 shows a schematic view of a target sliding window in the present exemplary embodiment;
Fig. 5 shows a flowchart of calculating an aggregation result in the present exemplary embodiment;
Fig. 6 shows a flowchart of another sliding window based data processing method in the present exemplary embodiment;
Fig. 7 shows a schematic structural diagram of a stream processing system in the present exemplary embodiment;
Fig. 8 shows a schematic structural diagram of another stream processing system in the present exemplary embodiment;
Fig. 9 shows a schematic diagram of the classes contained in a window operator in the present exemplary embodiment;
Fig. 10 shows a structural block diagram of a sliding window based data processing apparatus in the present exemplary embodiment;
Fig. 11 shows a computer-readable storage medium for implementing the above method in the present exemplary embodiment;
Fig. 12 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In one scheme of the related art, a stream computing framework converts stream data into bounded data by dividing it with sliding windows and then processes the data within each window. Specifically, as shown in fig. 1, each piece of data is copied into every window to which it belongs. For a sliding window with a window size of M and a moving step of N, where M and N are numbers greater than 0 and the window size is greater than the moving step, the data must be copied M/N times and the repeated data must be calculated M/N times. Because no window can predict the size of its data, small blocks of memory must be requested frequently to store incoming stream data, which slows the system response. Moreover, because each window processes its stream data independently, the many overlapping portions between different windows are computed separately and their results cannot be shared with other windows, which causes excessive redundant calculation and wastes a large amount of computing resources.
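The M/N copy count described above can be illustrated with a minimal sketch. It assumes integer timestamps, half-open windows [start, start + M) whose start times are multiples of N, and a hypothetical helper name `windows_containing`; none of these details are specified in the disclosure.

```python
def windows_containing(t, size, step):
    """Return the start times of all sliding windows [start, start + size)
    that contain timestamp t, assuming window starts are multiples of step."""
    start = (t // step) * step      # latest window start that is not after t
    starts = []
    while start > t - size:         # the window still covers t
        if start >= 0:              # ignore windows before time zero
            starts.append(start)
        start -= step
    return sorted(starts)


# A record at t=7 with M=4 and N=1 belongs to M/N = 4 windows, so a
# copy-per-window scheme stores it (and recomputes over it) four times.
copies = windows_containing(7, size=4, step=1)
```

With a window size of 10 and a step of 5, the same helper yields two windows per record, which matches M/N = 2.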
In view of one or more of the foregoing problems, exemplary embodiments of the present disclosure first provide a sliding window based data processing method that may be applied to a stream processing system so that it may perform aggregation processing on acquired stream data using a sliding window and determine a calculation result of the stream data from a result of the aggregation processing. For example, in an application program with an information recommendation function, the number of users viewing recommendation information in 10 minutes may be counted by the above method, specifically, a sliding window, such as a time sliding window, may be defined, user click events within the past 10 minutes may be collected, and data in each time sliding window may be counted.
Fig. 2 shows a flow of the present exemplary embodiment, which may include the following steps S210 to S240:
Step S210, determining a target sliding window of the received stream data and a slice position of the stream data in the target sliding window according to a preset dividing rule.
The target sliding window is used for processing stream data. One piece of stream data may have one or more target sliding windows, and different target sliding windows may process the stream data differently. The slice position refers to the position, within the target sliding window, of a data slice formed from stream data. The target sliding window generally includes a plurality of slice positions, each of which may correspond to one data slice, and a data slice may consist of one or more pieces of stream data. For example, if stream data is generated every second, one data slice may include a single piece of stream data, i.e., the stream data generated within 1 second, or several pieces of stream data, e.g., the stream data generated within 10 seconds. In the present exemplary embodiment, the slice positions may be arranged according to the reception time of the stream data, and the data slices at different slice positions may be the same or different in size. The preset dividing rule may be configured in advance by an operator according to the data content and reception time of the stream data and the window type and window length of the window, which is not specifically limited in this exemplary embodiment.
One feature of the stream data is that it arrives continuously, and in order to facilitate processing of the stream data, a target sliding window of each received stream data and a slicing position of the stream data within the target sliding window may be determined according to a corresponding partitioning rule. For example, the target sliding window of the stream data may be determined according to the reception time of the stream data, so that the stream data in the target sliding window has continuous reception time, and the stream data of each slice position in the target sliding window is arranged according to the order of the reception time. In the method, the data of each sliding window can be composed of a plurality of fragments, so that the fragments are used for storing stream data, and the memory occupation can be minimized.
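One possible dividing rule based on reception time can be sketched as follows. The sketch assumes fixed-length slices and integer timestamps, which is only one of the configurations the embodiment allows (slices may also differ in size); the function names `slice_position` and `group_into_slices` are hypothetical.

```python
from collections import defaultdict


def slice_position(receive_time, window_start, window_len, slice_len):
    """Return the index of the slice inside the window that holds receive_time.

    Assumes the window covers [window_start, window_start + window_len)
    and is cut into equal slices of length slice_len.
    """
    if not (window_start <= receive_time < window_start + window_len):
        raise ValueError("stream data does not belong to this window")
    return (receive_time - window_start) // slice_len


def group_into_slices(records, window_start, window_len, slice_len):
    """Group (receive_time, value) records by slice position."""
    slices = defaultdict(list)
    for receive_time, value in records:
        pos = slice_position(receive_time, window_start, window_len, slice_len)
        slices[pos].append(value)
    return dict(slices)


# A 40-second window cut into 10-second slices has 4 slice positions.
grouped = group_into_slices([(3, 1.0), (25, 2.0), (27, 4.0)], 0, 40, 10)
```

Storing only one list (or one partial aggregate) per slice position, rather than one copy per window, is what keeps memory usage low in this arrangement.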
Because the stream data entering the stream processing system often has problems of out-of-order data records and delayed arrival due to delay, backpressure and other factors, in order to restore the sequence of the stream data, in an alternative embodiment, as shown in fig. 3, the following method may be performed:
Step S310, determining whether the reception time of the stream data is greater than or equal to the dynamic arrival time.
The reception time of the stream data may be the time at which the stream data arrives at the stream processing system. The dynamic arrival time is essentially a dynamically changing timestamp determined by the difference between the maximum event time of the stream data that has entered the window and a preset delay time, that is: dynamic arrival time = maximum event time of the stream data that has entered the window - specified delay time. Here, the maximum event time is the largest reception time among the stream data that has entered the window, and the specified delay time is a user-configured duration giving the maximum delay that is tolerated. Stream data whose delay exceeds the specified delay time is regarded as late data or abnormal data.
Step S320, when the receiving time of the stream data is greater than or equal to the dynamic arrival time, triggering to determine the target sliding window of the stream data.
When the reception time of the stream data is greater than or equal to the dynamic arrival time, it is indicated that the stream data belonging to the sliding window has arrived entirely, and at this time, a target sliding window for determining the stream data may be triggered. By the method, the accuracy of the calculation result in the out-of-order data can be ensured, namely, the processing accuracy of the stream data is improved.
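Steps S310 and S320 can be sketched as a small tracker. This is a minimal illustration, not the disclosed implementation: the class name `DynamicArrivalTracker` is hypothetical, and updating the maximum event time before the comparison is an assumption, since the text does not state the order.

```python
class DynamicArrivalTracker:
    """Tracks the dynamic arrival time:
    maximum event time seen so far - specified delay time."""

    def __init__(self, allowed_delay):
        self.allowed_delay = allowed_delay
        self.max_event_time = None

    def observe(self, receive_time):
        """Record one piece of stream data.

        Returns (dynamic_arrival_time, triggers), where triggers is True
        when receive_time >= dynamic_arrival_time (step S320).
        Assumption: the maximum is updated before the comparison.
        """
        if self.max_event_time is None or receive_time > self.max_event_time:
            self.max_event_time = receive_time
        dynamic_arrival_time = self.max_event_time - self.allowed_delay
        return dynamic_arrival_time, receive_time >= dynamic_arrival_time


tracker = DynamicArrivalTracker(allowed_delay=3)
results = [tracker.observe(t) for t in (10, 12, 8, 9)]
```

In this run the element received at time 8 falls behind the dynamic arrival time of 9 and does not trigger window determination, while the element at time 9 is still within the tolerated delay.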
Further, in an alternative embodiment, the target sliding window of the stream data and the slicing position of the stream data within the target sliding window may be determined according to the data parameter of the stream data and the window parameter of each window. The data parameters of the stream data may include a receiving time and a receiving order of the stream data, and the window parameters of each window may include a window length and a window type of each window.
For example, sliding windows suited to the stream data may first be found according to the window type and window length of each window, and then, among those windows, the windows whose data has not yet fully arrived may be found according to the reception time of the stream data, so as to determine the target sliding window of the stream data and its slice position in the target sliding window. For another example, after the sliding windows suited to the stream data are determined, the sliding window to which the preceding stream data belongs may be determined as the target sliding window according to the receiving order of the stream data, and the stream data may be arranged in receiving order so that it is placed at the corresponding slice position.
Specifically, in an alternative embodiment, the following method may be performed to determine a target sliding window for streaming data:
when the receiving time of the stream data is greater than the window length of every sliding window, creating a new sliding window, and determining the new sliding window as the target sliding window of the stream data;
and when the receiving time of the stream data is not greater than the window length of any one sliding window, determining that sliding window as the target sliding window of the stream data.
When the receiving time of the stream data is greater than the window length of every sliding window, the data that has arrived in each sliding window at the current moment has reached the limit of that window, so the newly created sliding window is determined as the target sliding window of the stream data. Conversely, when the receiving time of the stream data is not greater than the window length of some sliding window, that sliding window still has a vacant slice position, so it is determined as the target sliding window of the stream data. In this way, stream data can be assigned to the corresponding sliding window according to its receiving time, which ensures that window calculation is performed correctly on the stream data.
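One reading of this rule is sketched below. It interprets "greater than the window length of every sliding window" as "the reception time lies beyond the end of every open window"; that interpretation, the half-open (start, end) window representation, the choice to start a new window at the reception time, and the function name `target_window` are all assumptions made for illustration.

```python
def target_window(receive_time, windows, window_len):
    """Pick the target sliding window for one piece of stream data.

    windows is a list of (start, end) half-open intervals and is extended
    in place when no existing window can hold receive_time.
    """
    for start, end in windows:
        if start <= receive_time < end:
            # An existing window still has room for this reception time.
            return (start, end)
    # Every existing window is exhausted: create a new one (assumed to
    # start at the reception time itself).
    new_window = (receive_time, receive_time + window_len)
    windows.append(new_window)
    return new_window


open_windows = [(0, 4)]
first = target_window(2, open_windows, window_len=4)
second = target_window(5, open_windows, window_len=4)
```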
S220, calling an aggregation algorithm to aggregate the stream data at each fragment position in the target sliding window to obtain an aggregation result.
The aggregation algorithm can be realized through an aggregation function pre-written by an operator, and stream data in a window can be counted and aggregated by using the aggregation algorithm; the aggregation result is a result obtained by performing aggregation processing on stream data in the target sliding window, and may include a result obtained by directly aggregating the stream data, or may include a result obtained by performing one or more times of aggregation on an aggregation result after the stream data is aggregated.
Specifically, when aggregation processing is performed, an aggregation algorithm may be invoked to perform aggregation processing on stream data at each slice position in the target sliding window. For example, when calculating the average value of data in a period of time, assuming that 4 slicing positions exist in a window, an aggregation algorithm may be invoked to count stream data of two adjacent slicing positions in a target sliding window, calculate an average value of sums of stream data of every two adjacent slicing positions, and further calculate the average value again for the calculated average value, so as to obtain the average value of stream data in the target sliding window. By the method, the aggregation result of stream data in the window can be obtained by calculation, and because the data in the window is formed by data fragments of each fragment position, when the window length is larger than the moving step length of the window, the aggregation result of repeated calculation can be shared among different windows, the influence caused by redundant calculation can be reduced, unnecessary data copying operation is avoided, and the calculation efficiency is improved.
In an alternative embodiment, the slice range corresponding to the stream data in the target sliding window may be located by a head pointer and a tail pointer. For example, in the target sliding window shown in fig. 4, the window size is 4 seconds, the moving step length is 1 second, the moving direction is from left to right, and the window contains 4 data slices; the number in front of each data slice is its array index. In the 0th to 4th seconds the target sliding window is window-1, and the head and tail pointers are array indices 3 and 6 respectively. In the 1st to 5th seconds the target sliding window is window-2; the array indices of the data slices in the window are now 4, 5, 6 and 3, so the head and tail pointers are array indices 4 and 3 respectively. Likewise, the head and tail pointers are 5 and 4 in the 2nd to 6th seconds, and 6 and 5 in the 3rd to 7th seconds. In this way, the range of data slices in the target sliding window can be located effectively, which ensures the accuracy of window calculation.
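The pointer positions in the fig. 4 example follow a simple modular pattern, which can be sketched as below. The function names window_pointers and slice_indices, and the parameter first_leaf_index (the array index of the first slice position, 3 in the example), are illustrative assumptions.

```python
from typing import List, Tuple


def window_pointers(window_number: int, slices_per_window: int = 4,
                    first_leaf_index: int = 3) -> Tuple[int, int]:
    """Return (head, tail) array indices for the 1-based window_number."""
    offset = (window_number - 1) % slices_per_window
    head = first_leaf_index + offset
    tail = first_leaf_index + (offset + slices_per_window - 1) % slices_per_window
    return head, tail


def slice_indices(head: int, tail: int, slices_per_window: int = 4,
                  first_leaf_index: int = 3) -> List[int]:
    """Walk from head to tail, wrapping at the end of the slice positions."""
    indices = [head]
    while indices[-1] != tail:
        step = (indices[-1] - first_leaf_index + 1) % slices_per_window
        indices.append(first_leaf_index + step)
    return indices
```

With the default arguments, window_pointers(2) gives head 4 and tail 3, and slice_indices(4, 3) visits array indices 4, 5, 6 and 3 in that order, matching window-2 in the example.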
In the aggregation process, if the moving step length of the sliding window is smaller than the window length, the aggregation result of the partial stream data in the window is repeatedly calculated, so, in order to improve the calculation efficiency, in an alternative embodiment, in step S230, the following method may be further performed:
When the moving step length of the target sliding window is smaller than the window length, determining repeated fragments in the target sliding window;
and calculating the aggregation results of the other data fragments except the repeated fragments in the target sliding window, and determining the aggregation result of the stream data in the target sliding window by utilizing the aggregation results of the repeated fragments and the aggregation results of the other data fragments.
The repeated slices may be the same data slices in the window before and after the target sliding window moves once, for example, in the target sliding window shown in fig. 4, the repeated slices in the target sliding window corresponding to the 0 th to 4 th seconds and the 1 st to 5 th seconds are data slices of the slicing positions with the subscripts of 4, 5 and 6 in the array.
For repeated fragments in the target sliding window, the aggregation result obtained by calculation in the history moving process can be obtained during each calculation, and for the rest other data fragments, the aggregation result in the target sliding window at the current moment can be calculated to obtain the aggregation result of all the data fragments in the target sliding window. By the method, the aggregation results in the target sliding window can be shared, so that the times of calculating the aggregation results of the data fragments can be reduced, the data processing efficiency can be improved, the occupation and consumption of computer resources and the like caused by repeated calculation can be greatly reduced, and the resource utilization rate can be improved.
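A minimal sketch of this sharing, assuming a summation aggregate and one data slice per moving step, is given below. The class name SharedSliceSum and its counter slice_computations are hypothetical; the sketch keeps one cached partial result per slice so that only the newly arrived slice is aggregated on each move.

```python
from collections import deque
from typing import Deque, List


class SharedSliceSum:
    """Keeps one partial sum per data slice and reuses them across windows."""

    def __init__(self, slices_per_window: int) -> None:
        # A bounded deque drops the oldest partial when a new one is appended.
        self.partials: Deque[float] = deque(maxlen=slices_per_window)
        self.slice_computations = 0  # number of slices actually aggregated

    def slide(self, new_slice: List[float]) -> float:
        """Add the newest slice and return the sum over the slices held so far."""
        # Only the new slice is aggregated from its raw data.
        self.partials.append(sum(new_slice))
        self.slice_computations += 1
        # Repeated slices contribute their cached partial results.
        return sum(self.partials)
```

In this sketch, five calls to slide over a four-slice window aggregate raw data five times, whereas recomputing each full window from scratch would aggregate eight slices for the two complete windows.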
In an alternative embodiment, when the stream data at each slice position in the target sliding window is to be aggregated, it may first be checked whether a data slice exists at every slice position in the target sliding window. When any slice position has no data slice, the stream data has not all arrived; new stream data continues to be received and is added to that slice position until the data slice at that position is completely written. When a data slice exists at every slice position, aggregation of the stream data at each slice position in the target sliding window can be triggered. In this way, all data slices in the target sliding window are fully written before each aggregation, which reduces the number of calculations and improves calculation efficiency.
Further, in an alternative embodiment, the following method may also be performed:
when stream data at each slicing position in the target sliding window is subjected to aggregation treatment, determining whether the stream data at each slicing position in the target sliding window completely arrives;
when stream data of any slice position in the target sliding window does not all arrive, receiving new stream data, and adding the new stream data to the any slice position according to a preset dividing rule;
When all the stream data of each slicing position in the target sliding window arrives, the stream data of each slicing position is added to the leaf nodes of the tree structure.
By the method, all stream data in the target sliding window can be ensured to arrive, errors of the aggregation result due to the loss of part of stream data are avoided, and accuracy of the calculation result is ensured.
After the stream data at each slice position in the target sliding window is aggregated, the target sliding window can move according to the corresponding moving step length, update the stream data in the window, and aggregate the new stream data. Thus, in an alternative embodiment, with reference to FIG. 5, the following method may be performed:
step S510, updating the data fragments of each fragment position in the target sliding window according to the moving step length of the target sliding window.
The moving step length is the length of each sliding of the target sliding window, and may be a time length, such as 1 second, or a data length, such as 1 data slice.
The data slices of the target sliding window are updated according to its moving step length; for example, the data slice from 1 second earlier in the target sliding window is replaced with a data slice formed from the newly received stream data, which completes the update of the data slices in the target sliding window.
Specifically, in an alternative embodiment, step S510 may also be implemented by the following method:
deleting the expired fragments in the target sliding window, and deleting the node data of the expired fragments from the tree structure;
and receiving new stream data, and adding the new stream data to the corresponding slicing positions of the expired slices in the target sliding window according to a preset dividing rule.
An expired slice may be a data slice that has completed aggregation within the target sliding window and is discarded from its slice position after the window moves by one step. After the aggregation of the stream data in the target sliding window is completed, the expired slice and its node data in the tree structure can be deleted, and the new stream data that continues to arrive is then added to the slice position of the expired slice in the target sliding window according to the preset dividing rule. For example, in the target sliding window shown in fig. 4, window-1 is triggered to perform window calculation at the 4th second. After the calculation is triggered, the data slice at array index 3, which holds the data for 0-1 s, is the expired slice and can be discarded directly. The data slice newly created for 4-5 s can be placed at the position with array index 3, and the positions pointed to by the head pointer and the tail pointer can be recorded in window-2. At the 5th second, the range of data slices contained in window-2 can be determined from the head and tail pointers and the aggregation calculation performed, and so on.
By determining the expired slices in the target sliding window and deleting their node data from the tree structure, the data of the corresponding nodes in the tree structure is released; new stream data is then received and assigned to the slice positions of the expired slices according to the preset dividing rule, which completes the data update.
Step S520, calculating the aggregation result of the data fragments of each fragment position in the target sliding window after updating the data fragments.
Specifically, the above aggregation algorithm may be invoked to perform aggregation processing on the data fragments of each fragment position in the target sliding window after updating the data fragments, so as to obtain a new aggregation result. Thereby, processing of stream data based on the sliding window is realized.
S230, updating a tree structure constructed using a cyclic array with the aggregation result.
The leaf nodes and the non-leaf nodes of the tree structure can be used for storing the stream data in the target sliding window and the local aggregation result of the stream data in the target sliding window respectively, and the local aggregation result can comprise a result obtained by aggregating the data fragments of part of the fragment positions in the target sliding window, a result obtained by performing one or more aggregation treatments on the aggregation result of the data fragments, and the like. A circular array is a block of contiguous memory data that can be used to store stream data and its aggregate results.
In this exemplary embodiment, the node of the tree structure may represent the stream data and the aggregation result of the stream data, after the aggregation result of each data slice in the target sliding window is obtained, the node of the tree structure may be updated by using the aggregation result, for example, the aggregation result of each data slice may be a child node of the tree structure, and then the value of the parent node of the child node may be updated according to the value of the child node until the update of the root node of the tree structure is completed.
For example, in the target sliding window shown in fig. 4, assume that 2 items of stream data arrive per second and that the size of the cyclic array is 4; that is, the target sliding window includes four data slices, each containing 2 items of stream data. The memory space occupied by the cyclic array has size 7: the first 3 elements, with array indices 0, 1 and 2, store the aggregation results of partial data slices, with elements "1" and "2" being the child nodes of element "0", and the elements with array indices 3, 4, 5 and 6 may store the stream data. In the 0th to 4th seconds, 4 data slices consisting of 8 items of stream data arrive in turn and are stored at the slice positions with array indices 3, 4, 5 and 6. At the 4th second the cyclic array is full, and window calculation is triggered to obtain the aggregation results of data slices 3 and 4 and of data slices 5 and 6, which can in turn be used to update the aggregation result of the node with array index 0; in a summation, for example, the element with array index 1 can be updated directly to the sum of the data slices with array indices 3 and 4. At the 5th second, the target sliding window moves forward by 1 second and changes from window-1 to window-2: the frontmost data slice [6 2] is replaced by the data slice [2 6] that arrived in the 4th to 5th seconds, and the data slice [7 4] with array index 4 is now at the head of window-2. Window aggregation is then performed again to obtain the aggregation results for the data slices with array indices 4 and 5 and with array indices 6 and 3; the values at array indices 0, 1 and 2 are updated with these results, giving the aggregation result of the stream data of the whole target sliding window for the 1st to 5th seconds.
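The 7-element layout in this example can be sketched as a complete binary tree held in one flat list, assuming a summation aggregate and a slice count that is a power of two. The class name ArrayAggregateTree and its methods are hypothetical, and for brevity each leaf here stores the sum of its data slice rather than the raw stream data.

```python
from typing import List


class ArrayAggregateTree:
    """Complete binary tree in one flat list: internal nodes first, leaves last."""

    def __init__(self, num_slices: int) -> None:
        # Assumes num_slices is a power of two, as in the 4-slice example.
        self.first_leaf = num_slices - 1
        self.sums: List[float] = [0.0] * (2 * num_slices - 1)

    def set_slice(self, array_index: int, data_slice: List[float]) -> None:
        """Store a slice's sum at a leaf and refresh its ancestors up to the root."""
        if not self.first_leaf <= array_index < len(self.sums):
            raise IndexError("array_index must refer to a leaf position")
        self.sums[array_index] = float(sum(data_slice))
        node = array_index
        while node > 0:
            node = (node - 1) // 2  # move to the parent
            self.sums[node] = self.sums[2 * node + 1] + self.sums[2 * node + 2]

    def window_sum(self) -> float:
        """The root (array index 0) holds the aggregate of the whole window."""
        return self.sums[0]
```

With four slices the list has 7 elements, the children of index 0 are indices 1 and 2, and overwriting the leaf at index 3 touches only indices 1 and 0, leaving the partial result at index 2 available for reuse.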
In fact, when the tree structure is updated, its non-leaf nodes hold the aggregation result from before the target sliding window moved, and the stream data that enters after the move can be incorporated on the basis of those pre-move values. In this way, when the target sliding window makes its last move, the aggregation result of all stream data that has passed through the target sliding window can be obtained from the tree structure.
By this method, the cyclic array can be used to store the underlying stream data, so that sliding windows share data slices and frequent requests for small blocks of memory, which reduce memory allocation efficiency, are avoided. Meanwhile, the tree structure makes the calculation hierarchical, which reduces computational complexity, and calculation efficiency can be improved by reusing the data of some nodes.
S240, determining a calculation result of the stream data in the target sliding window through the nodes of the tree structure.
Specifically, the leaf nodes of the tree structure may be utilized to determine the flow data in the target sliding window, and the non-leaf nodes of the tree structure may be utilized to determine the aggregation result of the flow data in the target sliding window, where the root node in the non-leaf nodes may represent the final aggregation result of the flow data, and the child nodes of the root node may represent the local aggregation result of the flow data.
During stream data processing, the nodes of the tree structure can also store the aggregation results produced as the target sliding window moved in the past, so each calculation can start from the aggregation result obtained for the previous move. After the last move, the aggregation result of the stream data in the target sliding window is calculated and the node data of the tree structure is updated, and the aggregation result of all stream data processed by the target sliding window can then be obtained from the node data.
Fig. 6 shows another sliding window based data processing method, which may include the following steps S601 to S611:
step S601, stream data is received.
Step S602, it is determined whether the receiving time of the stream data is greater than or equal to the dynamic arrival time.
When the receiving time of the stream data is greater than or equal to the dynamic arrival time, triggering to determine a target sliding window of the stream data, and executing step S603, otherwise, when the receiving time of the stream data is less than the dynamic arrival time, indicating that the arriving stream data does not reach a corresponding amount, executing step S604, and determining whether the receiving time of the stream data is greater than the window length of the sliding window.
Step S603, determining a target sliding window of stream data.
Specifically, a target sliding window of the stream data may be determined according to a preset division rule. For example, according to the window type of each window, the stream data may be assigned to a sliding window in which the stream data has not yet fully arrived; that window is the target sliding window.
In step S604, it is determined whether the reception time of stream data is longer than the window length of the sliding window.
When the reception time of the stream data is longer than the window length of the sliding window, it is indicated that the stream data in the sliding window has reached the maximum limit, step S605 may be executed to create a new sliding window as the target sliding window, whereas when the reception time of the stream data is not longer than the window length of the sliding window, it is indicated that the stream data in the sliding window is not yet full, step S606 is executed to determine the sliding window as the target sliding window, and to add the stream data to the corresponding slice position in the target sliding window.
Step S605 creates a new sliding window as the target sliding window and creates a new tile position.
The new slice location may be a location where the data slice is located in the target sliding window, and may store each data slice through a cyclic array.
Step S606, the sliding window is determined as a target sliding window, and the stream data is added to the corresponding slice position in the target sliding window. For example, stream data may be sequentially added to each slice position in the target sliding window according to the reception time of the stream data.
In step S607, the partial aggregation result in the target sliding window is updated.
After the stream data is added to the corresponding slice position in the target sliding window, the data in the window has changed, so the corresponding aggregation algorithm can be called to aggregate the stream data in the target sliding window and obtain an aggregation result of the stream data in the target sliding window.
Step S608, determining whether there is a data slice at each slice position in the target sliding window.
When any slice position has no data slice, step S601 may be performed to receive new stream data and add it to that slice position. When every slice position has a data slice, step S609 is executed to add the new data slice to the tree structure, after which step S601 continues to be executed and new stream data is received until determination of the target sliding window of the stream data is triggered.
In step S609, a new data slice is added to the tree structure. For example, new data shards may be added at corresponding leaf nodes of the tree structure, and the aggregate result of the new data shards added to non-leaf nodes of the tree structure.
In step S610, an aggregation result of the data slices at the respective slice positions in the target sliding window is calculated.
In step S611, the expired fragment is deleted, and the node data of the expired fragment is deleted from the tree structure.
After the calculation is completed, the discarded expired fragments and the node data thereof can be deleted, so that the storage resources occupied by the expired fragments in the target sliding window are released. Finally, after the aggregation processing of the stream data in the target sliding window is completed, step S601 may be executed to receive new stream data, update the target sliding window with the new stream data, and recalculate the aggregation result of the stream data in the moved target sliding window.
In summary, according to the sliding window based data processing method in the present exemplary embodiment, the target sliding window of the received stream data and the slicing positions of the stream data in the target sliding window may be determined according to a preset partitioning rule, an aggregation algorithm is invoked to perform aggregation processing on the stream data in each slicing position in the target sliding window, to obtain an aggregation result, then a tree structure constructed using a cyclic array is updated by using the aggregation result, and a calculation result of the stream data in the target sliding window is determined by nodes of the tree structure. On one hand, the stream data at each fragment position in the target sliding window is aggregated by calling an aggregation algorithm, so that the sliding window can be used for realizing the aggregation of the stream data, and partial aggregation results of the stream data can be shared among the sliding windows, thereby reducing the influence caused by redundant calculation and unnecessary data copying operation; on the other hand, by updating the tree structure constructed using the cyclic array with the result of aggregation, the complexity of stream data processing can be reduced and the data processing efficiency can be improved by utilizing the characteristics of the tree structure.
Exemplary embodiments of the present disclosure also provide a stream processing system, which may be a window operator in a stream computation framework, and may aggregate received stream data using sliding windows. Fig. 7 is a schematic diagram of a stream processing system in this exemplary embodiment, where as shown in the drawing, a stream processing system 700 may include a determining node 710, a calculating node 720 and a storage node 730, where the determining node 710 may be configured to determine a target sliding window of received stream data and a slice position of the stream data in the target sliding window according to a preset partitioning rule, the calculating node 720 may be configured to invoke an aggregation algorithm to perform an aggregation process on the stream data at each slice position in the target sliding window to obtain an aggregation result, and update a tree structure constructed using a cyclic array with the aggregation result, and the storage node 730 may be configured to store the tree structure, where leaf nodes and non-leaf nodes of the tree structure may be configured to store the stream data in the target sliding window and a local aggregation result of the stream data in the target sliding window, respectively.
After receiving the stream data, the determining node 710 may extract key information in the stream data, such as information of receiving time and receiving sequence of the stream data, and determine a target sliding window of the stream data according to a preset dividing rule, for example, the target sliding window of the stream data may be determined as a sliding window to which the stream data received at the previous time belongs according to the extracted key information, and divide the stream data to corresponding slicing positions in the target sliding window according to a slicing rule in the dividing rule. The storage node 730 may maintain a tree structure for storing stream data and an aggregation result thereof, which is a core part for implementing window computation, and when performing computation, the computing node 720 may acquire stream data in the storage node 730, aggregate the stream data according to a corresponding aggregation algorithm, and then store the aggregation result obtained by computation in the storage node 730. Over time, the nodes of the tree structure in storage node 730 may update as the aggregate result changes.
In the present exemplary embodiment, for each aggregation algorithm, an aggregation function may be written by an operator; when executed, the aggregation function may be decomposed into four primitive operations: Calculate, Merge, Split and GetRes. Wherein:
Calculate(e: elements): partAgg computes the aggregation result of part of the data in a data slice. A data slice usually contains many elements; Calculate computes the aggregation result of these elements, and the result can then be merged with other aggregation results.
Merge(x: partAgg, y: partAgg) merges two partial aggregation results, with the merged result assigned to x.
Split(x: partAgg, y: partAgg) separates one partial aggregation result from another, with the separated result assigned to x; it is mostly needed when old data is discarded after a window is triggered and the data of a child node has to be stripped from its parent node.
GetRes(x: partAgg): res obtains the final aggregation result.
In a compute node, the following 6 basic operations may be included, which may be implemented based on the four primitive operations above:
update (type) updates the associated parent nodes of the data slice at the specified position.
add (slice) finds the next insertion position in the cyclic array and calls update to update the corresponding values.
remove (num) deletes the specified number of expired slices and calls update to update the corresponding values.
prefix (index) calculates the prefix sum of the cyclic array from its start up to index.
suffix (index) calculates the suffix sum of the cyclic array from index to its end.
aggregation (head, tail) is invoked when a window is triggered and calculates the aggregation result from the head pointer to the tail pointer.
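One possible shape for these six operations is sketched below for a summation aggregate, assuming a power-of-two number of slice positions stored after the internal nodes of a flat list. The class name WindowTree, the attributes next_slot and oldest, and the use of array indices as arguments are assumptions for illustration; the subtraction in aggregation plays the role of Split and is valid for sums but not for every aggregate.

```python
from typing import List


class WindowTree:
    """Sum tree over a circular array of slice aggregates (flat-list layout)."""

    def __init__(self, capacity: int) -> None:
        # Assumes capacity is a power of two; leaves occupy the last positions.
        self.capacity = capacity
        self.first_leaf = capacity - 1
        self.nodes: List[float] = [0.0] * (2 * capacity - 1)
        self.next_slot = 0  # leaf offset that the next add() fills
        self.oldest = 0     # leaf offset of the oldest live slice

    def update(self, index: int) -> None:
        """Recompute every ancestor of the leaf at array index `index`."""
        node = index
        while node > 0:
            node = (node - 1) // 2
            self.nodes[node] = self.nodes[2 * node + 1] + self.nodes[2 * node + 2]

    def add(self, slice_sum: float) -> int:
        """Write a slice aggregate at the next position; return its array index."""
        index = self.first_leaf + self.next_slot
        self.nodes[index] = slice_sum
        self.update(index)
        self.next_slot = (self.next_slot + 1) % self.capacity
        return index

    def remove(self, num: int) -> None:
        """Clear the num oldest slices and refresh their ancestors."""
        for _ in range(num):
            index = self.first_leaf + self.oldest
            self.nodes[index] = 0.0
            self.update(index)
            self.oldest = (self.oldest + 1) % self.capacity

    def prefix(self, index: int) -> float:
        """Sum of the leaves from the first leaf up to `index`, inclusive."""
        node, total = index, self.nodes[index]
        while node > 0:
            parent = (node - 1) // 2
            if node == 2 * parent + 2:  # right child: add the left sibling
                total += self.nodes[2 * parent + 1]
            node = parent
        return total

    def suffix(self, index: int) -> float:
        """Sum of the leaves from `index` to the last leaf, inclusive."""
        node, total = index, self.nodes[index]
        while node > 0:
            parent = (node - 1) // 2
            if node == 2 * parent + 1:  # left child: add the right sibling
                total += self.nodes[2 * parent + 2]
            node = parent
        return total

    def aggregation(self, head: int, tail: int) -> float:
        """Aggregate from head to tail, wrapping past the array end if needed."""
        if head <= tail:
            before = self.prefix(head - 1) if head > self.first_leaf else 0.0
            return self.prefix(tail) - before
        return self.suffix(head) + self.prefix(tail)
```

When the head pointer is past the tail pointer, as for window-2 with head 4 and tail 3, the window wraps around the end of the array, so the result is the suffix from the head merged with the prefix up to the tail.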
In an alternative embodiment, as shown in fig. 8, the stream processing system 700 may further include a trigger node 740 that may be used to determine whether the receive time of the stream data is greater than or equal to the dynamic arrival time, and trigger a determination of a target sliding window for the stream data when the receive time of the stream data is greater than or equal to the dynamic arrival time. Wherein the dynamic arrival time is determined according to a time difference between a maximum event time of stream data entering the window and a preset delay time.
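The trigger condition can be sketched as follows, reading the dynamic arrival time as the maximum event time seen so far minus the preset delay. The class name ArrivalTrigger and its members are hypothetical, and the units are assumed to be seconds.

```python
class ArrivalTrigger:
    """Tracks the dynamic arrival time as max event time minus a preset delay."""

    def __init__(self, preset_delay: float) -> None:
        self.preset_delay = preset_delay
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> None:
        """Record the event time of stream data entering the window."""
        self.max_event_time = max(self.max_event_time, event_time)

    @property
    def dynamic_arrival_time(self) -> float:
        return self.max_event_time - self.preset_delay

    def should_trigger(self, receive_time: float) -> bool:
        """True when the receive time has reached the dynamic arrival time."""
        return receive_time >= self.dynamic_arrival_time
```

Because only the maximum event time is tracked, an out-of-order item with an earlier event time does not move the dynamic arrival time backwards in this sketch.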
After the stream data is input, the stream processing system may process the stream data, extract key information, such as a receiving time, and determine whether the receiving time of the stream data is greater than or equal to a dynamic arrival time, and trigger a determination of a target sliding window of the stream data by the trigger node 740 when the receiving time of the stream data is greater than or equal to the dynamic arrival time, that is, the determination node 710 reads the key information of the stream data from the storage node 730, determines the target sliding window of the stream data according to a preset division rule, and a fragmentation position of the stream data in the target sliding window. After determining the target sliding window, the computing node 720 may call an aggregation algorithm to aggregate the stream data in the target sliding window, where the stream data and an aggregation result obtained by the aggregation process may be stored in an array form in the storage node 730. After the aggregation process is completed, the output result may be output as an output stream.
In addition, as shown in fig. 8, the stream processing system 700 may further include a window node 750 and a function node 760, where the window node 750 may set a window type, a window length, etc. of each window, and the function node 760 may set a specific window function, such as Max, min, count, sum, etc.
In the present exemplary embodiment, the window operator based on the tree structure is the core class of the stream processing system, and in order to implement window calculation, the determining node 710 may allocate the received stream data to the target sliding window through the partition operator. The window operator based on the tree structure comprises member variables such as the window size, the moving step length of the window, the start time of the window, the end time of the window, the window function and the aggregation algorithm of the window.
The window operator based on the tree structure comprises 2 important members, namely a window function and a window aggregation algorithm. The window function may be an interface. Fig. 9 shows a schematic diagram of the classes contained in the window operator. Taking the standard deviation as an example, the standard deviation class implements the four basic primitives of the window function interface, which can be used to update the partial aggregation results in the tree structure. The standard deviation class can contain some basic member variables used for calculation, recording information such as the count, the cumulative sum and the sum of squares of all data in a data slice. A parent node of the tree structure can store the data of its leaf nodes; when a data slice is newly added to a leaf node or removed from the tree structure, the related calculation information of the parent node is updated, and during window calculation the window aggregation result can be calculated directly from this information. The tree structure algorithm implements several basic operations of the window algorithm interface, which update the aggregation tree when new slices are generated and old data is removed, and which calculate the aggregation result of the window. The tree structure algorithm includes a member variable that stores data and represents the underlying data structure of the tree structure. The tree structure algorithm may be used to maintain the constructed tree structure.
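As an illustration of how a standard deviation class might implement the four primitives with a count, a cumulative sum and a sum of squares, a minimal sketch follows. The class name StdDevAgg and the method names calculate, merge, split and get_res are hypothetical stand-ins for the primitives, and the population standard deviation is assumed.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StdDevAgg:
    count: int = 0
    total: float = 0.0     # cumulative sum of the values
    total_sq: float = 0.0  # cumulative sum of the squared values

    @classmethod
    def calculate(cls, elements: Iterable[float]) -> "StdDevAgg":
        """Build the partial aggregate of the elements in one data slice."""
        agg = cls()
        for value in elements:
            agg.count += 1
            agg.total += value
            agg.total_sq += value * value
        return agg

    def merge(self, other: "StdDevAgg") -> None:
        """Fold another partial aggregate into this one."""
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq

    def split(self, other: "StdDevAgg") -> None:
        """Remove a previously merged partial aggregate from this one."""
        self.count -= other.count
        self.total -= other.total
        self.total_sq -= other.total_sq

    def get_res(self) -> float:
        """Return the population standard deviation of the aggregated values."""
        if self.count == 0:
            raise ValueError("no data has been aggregated")
        mean = self.total / self.count
        # Clamp at zero to absorb small negative values from rounding.
        variance = max(self.total_sq / self.count - mean * mean, 0.0)
        return math.sqrt(variance)
```

Because all three fields are additive, a parent node can be brought up to date by merging a new child or splitting off an expired one without revisiting the raw stream data.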
Further, the present exemplary embodiment also provides a data processing apparatus based on a sliding window, referring to fig. 10, the data processing apparatus 1000 based on a sliding window may include: a first determining module 1010, configured to determine a target sliding window of the received stream data and a slice position of the stream data in the target sliding window according to a preset dividing rule; the aggregation module 1020 is configured to invoke an aggregation algorithm to aggregate the stream data at each fragment position in the target sliding window, so as to obtain an aggregation result; the updating module 1030 may be configured to update a tree structure constructed using a cyclic array with an aggregation result, where leaf nodes and non-leaf nodes of the tree structure may be respectively configured to store stream data within a target sliding window and a local aggregation result of stream data within the target sliding window; the second determining module 1040 may be configured to determine, by the nodes of the tree structure, a calculation result of the flow data within the target sliding window.
In an exemplary embodiment of the present disclosure, the first determining module 1010 may be configured to determine whether a receiving time of the streaming data is greater than or equal to a dynamic arrival time, the dynamic arrival time being determined according to a time difference between a maximum event time of the streaming data entering the window and a preset delay time, and trigger to determine a target sliding window of the streaming data when the receiving time of the streaming data is greater than or equal to the dynamic arrival time.
In an exemplary embodiment of the present disclosure, the first determining module 1010 may be configured to determine the target sliding window of the stream data and the slice position of the stream data within the target sliding window according to data parameters of the stream data and window parameters of each window, where the data parameters of the stream data include a receiving time and a receiving order of the stream data, and the window parameters of each window include a window length and a window type of that window.
In an exemplary embodiment of the present disclosure, the first determining module 1010 may be further configured to create a new sliding window when the receiving time of the stream data is greater than the window length of each sliding window and determine the new sliding window as the target sliding window of the stream data, and, when the receiving time of the stream data is not greater than the window length of a sliding window, to determine that sliding window as the target sliding window of the stream data.
In one exemplary embodiment of the present disclosure, each slice position within the target sliding window corresponds to a data slice, which is made up of one or more pieces of stream data.
In an exemplary embodiment of the present disclosure, when an aggregation algorithm is invoked to aggregate the stream data at each slice position in the target sliding window, the aggregation module 1020 may be configured to determine the repeated slices in the target sliding window when the moving step of the target sliding window is smaller than the window length, calculate the aggregation results of the data slices in the target sliding window other than the repeated slices, and determine the aggregation result of the stream data in the target sliding window using the aggregation results of the repeated slices and the aggregation results of the other data slices.
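The reuse of repeated slices can be sketched with a sum aggregate. This is an illustrative sketch only: the function name `sliding_window_sums`, the use of a dictionary as the cache, and the measurement of window length and moving step in whole slices are assumptions.

```python
def sliding_window_sums(slices, window_slices, step_slices):
    """Sum each sliding window while aggregating every data slice only once.

    slices: list of lists of numbers, one inner list per data slice.
    window_slices: window length, measured in slices.
    step_slices: moving step, measured in slices (smaller than the window).
    """
    if not 0 < step_slices < window_slices:
        raise ValueError("expected 0 < step_slices < window_slices")
    cached = {}   # slice index -> aggregation result computed earlier
    results = []
    for start in range(0, len(slices) - window_slices + 1, step_slices):
        end = start + window_slices
        # Repeated slices were already aggregated for the previous window.
        repeated = [i for i in range(start, end) if i in cached]
        fresh = [i for i in range(start, end) if i not in cached]
        for i in fresh:
            cached[i] = sum(slices[i])
        repeated_total = sum(cached[i] for i in repeated)
        fresh_total = sum(cached[i] for i in fresh)
        results.append(repeated_total + fresh_total)
        # Slices before the next window start are no longer needed.
        for i in range(start, start + step_slices):
            cached.pop(i, None)
    return results
```

For example, with five single-record slices `[[1], [2], [3], [4], [5]]`, a window of three slices, and a step of one slice, each later window reuses two cached slice sums and aggregates only one new slice.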
In an exemplary embodiment of the present disclosure, when the stream data at each slice position in the target sliding window is aggregated, the aggregation module 1020 may be further configured to determine whether all of the stream data at each slice position in the target sliding window has arrived. When the stream data at any slice position in the target sliding window has not all arrived, new stream data is received and added to that slice position according to the preset partitioning rule. When all of the stream data at each slice position in the target sliding window has arrived, the stream data at each slice position is added to a leaf node of the tree structure.
In an exemplary embodiment of the present disclosure, after the stream data at each slice position in the target sliding window is aggregated, the updating module 1030 may be configured to update the data slices at each slice position in the target sliding window according to the moving step of the target sliding window, and to calculate an aggregation result of the data slices at each slice position in the target sliding window after the data slices are updated.
In an exemplary embodiment of the present disclosure, the updating module 1030 may be further configured to delete an expired slice in the target sliding window and delete the node data of the expired slice from the tree structure, where the expired slice is a data slice at a slice position in the target sliding window for which aggregation has been completed, and to receive new stream data and add the new stream data, according to the preset partitioning rule, to the slice position in the target sliding window that corresponds to the expired slice.
In one exemplary embodiment of the present disclosure, the slice range corresponding to the stream data in the target sliding window is located by head and tail pointers.
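The expiry and reuse of slice positions, together with locating the live slice range by two pointers, can be sketched with a fixed-size ring. The class `SliceRing` and its method names are hypothetical, and storing one value per slice position is a simplification.

```python
class SliceRing:
    """Fixed-size cyclic array of slices located by head and tail indices."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # position of the oldest live slice
        self.tail = 0   # position where the next slice is written
        self.size = 0

    def append(self, data_slice):
        """Add the newest slice; return the expired slice if one was evicted."""
        expired = None
        if self.size == len(self.slots):
            # The window is full: the oldest slice expires, and because
            # head == tail here, the new slice takes over its position.
            expired = self.slots[self.head]
            self.head = (self.head + 1) % len(self.slots)
            self.size -= 1
        self.slots[self.tail] = data_slice
        self.tail = (self.tail + 1) % len(self.slots)
        self.size += 1
        return expired

    def live(self):
        """Slices currently in the window, oldest first."""
        n = len(self.slots)
        return [self.slots[(self.head + k) % n] for k in range(self.size)]


ring = SliceRing(3)
evicted = [ring.append(s) for s in [1, 2, 3, 4]]
```

After the fourth append, the slice `1` has expired, its position holds the new slice `4`, and the head and tail indices both point at position 1.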
The specific details of each module in the above apparatus have already been described in the method embodiments. For details not disclosed here, reference may be made to the method embodiments, so they are not repeated.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to Fig. 11, a program product 1100 for implementing the above-described method according to an exemplary embodiment of the present disclosure is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto. In this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Program product 1100 may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The exemplary embodiment of the disclosure also provides an electronic device capable of implementing the method. An electronic device 1200 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 12. The electronic device 1200 shown in fig. 12 is merely an example, and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in Fig. 12, the electronic device 1200 may be embodied in the form of a general purpose computing device. Components of the electronic device 1200 may include, but are not limited to: at least one processing unit 1210, at least one storage unit 1220, a bus 1230 connecting the different system components (including the storage unit 1220 and the processing unit 1210), and a display unit 1240.
The storage unit 1220 stores program code that can be executed by the processing unit 1210, so that the processing unit 1210 performs the steps according to various exemplary embodiments of the present disclosure described in the above "exemplary method" section of the present specification. For example, the processing unit 1210 may perform the method steps shown in Figs. 2 to 3 and Figs. 5 to 6.
The storage unit 1220 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 1221 and/or cache memory unit 1222, and may further include Read Only Memory (ROM) 1223.
Storage unit 1220 may also include a program/utility 1224 having a set (at least one) of program modules 1225, such program modules 1225 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1230 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus architectures.
The electronic device 1200 may also communicate with one or more external devices 1300 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1200, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1200 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1250. Also, the electronic device 1200 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through the network adapter 1260. As shown, the network adapter 1260 communicates with other modules of the electronic device 1200 over bus 1230. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 1200, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
From the description of the embodiments above, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the exemplary embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the exemplary embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow, in general, the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims (15)

1. A sliding-window-based data processing method applied to a stream processing system, the method comprising:
determining a target sliding window of received stream data and a slice position of the stream data in the target sliding window according to a preset partitioning rule;
invoking an aggregation algorithm to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result;
updating, by using the aggregation result, a tree structure constructed using a cyclic array, wherein leaf nodes and non-leaf nodes of the tree structure are respectively used for storing the stream data in the target sliding window and local aggregation results of the stream data in the target sliding window; and
determining a calculation result of the stream data in the target sliding window through the nodes of the tree structure.
2. The method according to claim 1, wherein the method further comprises:
determining whether the receiving time of the stream data is greater than or equal to a dynamic arrival time, wherein the dynamic arrival time is determined according to a time difference between the maximum event time of the stream data entering a window and a preset delay time; and
triggering the determination of the target sliding window of the stream data when the receiving time of the stream data is greater than or equal to the dynamic arrival time.
3. The method according to claim 1, wherein the determining a target sliding window of the received stream data and a slice position of the stream data within the target sliding window according to a preset partitioning rule comprises:
determining the target sliding window of the stream data and the slice position of the stream data in the target sliding window according to data parameters of the stream data and window parameters of each window;
wherein the data parameters of the stream data comprise the receiving time and the receiving order of the stream data, and the window parameters of each window comprise the window length and the window type of that window.
4. The method according to claim 3, wherein the determining the target sliding window of the stream data and the slice position of the stream data within the target sliding window according to data parameters of the stream data and window parameters of each window comprises:
when the receiving time of the stream data is greater than the window length of each sliding window, creating a new sliding window, and determining the new sliding window as the target sliding window of the stream data; and
when the receiving time of the stream data is not greater than the window length of a sliding window, determining that sliding window as the target sliding window of the stream data.
5. The method of claim 1, wherein each slice position within the target sliding window corresponds to a data slice, the data slice being made up of one or more pieces of stream data.
6. The method of claim 5, wherein when an aggregation algorithm is invoked to aggregate the stream data at each slice position within the target sliding window, the method further comprises:
when the moving step of the target sliding window is smaller than the window length, determining repeated slices in the target sliding window; and
calculating the aggregation results of the data slices in the target sliding window other than the repeated slices, and determining the aggregation result of the stream data in the target sliding window by using the aggregation results of the repeated slices and the aggregation results of the other data slices.
7. The method according to claim 1, wherein the method further comprises:
when the stream data at each slice position in the target sliding window is aggregated, determining whether all of the stream data at each slice position in the target sliding window has arrived;
when the stream data at any slice position in the target sliding window has not all arrived, receiving new stream data, and adding the new stream data to that slice position according to the preset partitioning rule; and
when all of the stream data at each slice position in the target sliding window has arrived, adding the stream data at each slice position to a leaf node of the tree structure.
8. The method of claim 5, wherein after aggregating the stream data at each slice position within the target sliding window, the method further comprises:
updating the data slices at each slice position in the target sliding window according to the moving step of the target sliding window; and
calculating an aggregation result of the data slices at each slice position in the target sliding window after the data slices are updated.
9. The method of claim 8, wherein updating the data slices at each slice position within the target sliding window according to the moving step of the target sliding window comprises:
deleting an expired slice in the target sliding window, and deleting node data of the expired slice from the tree structure, wherein the expired slice is a data slice at a slice position in the target sliding window for which aggregation has been completed; and
receiving new stream data, and adding the new stream data, according to the preset partitioning rule, to the slice position in the target sliding window that corresponds to the expired slice.
10. The method of claim 1, wherein the slice range corresponding to the stream data in the target sliding window is located by head and tail pointers.
11. A stream processing system, the system comprising:
a determining node, configured to determine a target sliding window of received stream data and a slice position of the stream data in the target sliding window according to a preset partitioning rule;
a computing node, configured to invoke an aggregation algorithm to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result, and to update, by using the aggregation result, a tree structure constructed using a cyclic array; and
a storage node, configured to store the tree structure;
wherein the leaf nodes and the non-leaf nodes of the tree structure are respectively used for storing the stream data in the target sliding window and local aggregation results of the stream data in the target sliding window.
12. The system of claim 11, further comprising a trigger node for determining whether the receive time of the streaming data is greater than or equal to a dynamic arrival time, and triggering a determination of a target sliding window for the streaming data when the receive time of the streaming data is greater than or equal to the dynamic arrival time;
wherein the dynamic arrival time is determined according to a time difference between a maximum event time of stream data entering the window and a preset delay time.
13. A sliding window based data processing apparatus, the apparatus comprising:
a first determining module, configured to determine a target sliding window of received stream data and a slice position of the stream data in the target sliding window according to a preset partitioning rule;
an aggregation module, configured to invoke an aggregation algorithm to aggregate the stream data at each slice position in the target sliding window to obtain an aggregation result;
an updating module, configured to update, by using the aggregation result, a tree structure constructed using a cyclic array, wherein leaf nodes and non-leaf nodes of the tree structure are respectively used for storing the stream data in the target sliding window and local aggregation results of the stream data in the target sliding window; and
a second determining module, configured to determine a calculation result of the stream data in the target sliding window through the nodes of the tree structure.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-10.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-10 via execution of the executable instructions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111583376.6A CN116339872A (en) 2021-12-22 2021-12-22 Data processing method, stream processing system, medium and device based on sliding window

Publications (1)

Publication Number Publication Date
CN116339872A true CN116339872A (en) 2023-06-27

Family

ID=86891667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111583376.6A Pending CN116339872A (en) 2021-12-22 2021-12-22 Data processing method, stream processing system, medium and device based on sliding window

Country Status (1)

Country Link
CN (1) CN116339872A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775199A (en) * 2023-08-23 2023-09-19 中国电信股份有限公司 Method, system and communication equipment for realizing dynamic window
CN116775199B (en) * 2023-08-23 2023-11-24 中国电信股份有限公司 Method, system and communication equipment for realizing dynamic window

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination