CN111210156B - Real-time stream data processing method and device based on stream window - Google Patents

Real-time stream data processing method and device based on stream window


Publication number
CN111210156B
Authority
CN
China
Prior art keywords
data
window
processing process
time
index data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010033093.3A
Other languages
Chinese (zh)
Other versions
CN111210156A (en)
Inventor
夏志富
许巧生
孔垂建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lazas Network Technology Shanghai Co Ltd
Original Assignee
Lazas Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazas Network Technology Shanghai Co Ltd filed Critical Lazas Network Technology Shanghai Co Ltd
Priority to CN202010033093.3A priority Critical patent/CN111210156B/en
Publication of CN111210156A publication Critical patent/CN111210156A/en
Application granted granted Critical
Publication of CN111210156B publication Critical patent/CN111210156B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a real-time stream data processing method and device based on a stream window. The method comprises: acquiring real-time streaming data from at least one data source, wherein the real-time streaming data comprises waybill flow data and distribution facility flow data; performing data connection processing on the waybill flow data and the distribution facility flow data to obtain a data wide table; and receiving a calculation task for single-quantity index data that characterizes the pressure balance state, calling the processing process of the stream window that corresponds to the data characteristics of the single-quantity index data, and having that processing process perform aggregation calculation on the data in the data wide table to obtain the corresponding single-quantity index data. With this scheme, single-quantity index data that are defined differently with respect to time can each be aggregated in their own way by calling the processing process of the corresponding stream window, so that pressure balancing can be carried out on the basis of the calculated single-quantity index data and waybills can be scheduled reasonably.

Description

Real-time stream data processing method and device based on stream window
Technical Field
The invention relates to the technical field of data processing, in particular to a real-time stream data processing method and device based on a stream window.
Background
During waybill distribution, supply and demand may become unbalanced for various reasons, such as slow distribution caused by weather, insufficient capacity caused by a surge in waybill volume, or capacity pressure caused by online sales promotions. An automatic means of regulating the pressure balance is therefore needed, one that can, for example, reduce the distribution range or lengthen the distribution time allowed to distribution facilities, so that waybills are fulfilled on time and the economic losses of merchants and of distribution capacity are reduced. The data that characterize the pressure balance are called single-quantity index data, for example: the number of waybills in each waybill state, the number of waybills rejected by distribution facilities, the order pickup time of distribution facilities, the number of distribution facilities currently delivering, and the like. Different single-quantity index data are defined differently with respect to time, and the prior art has to design a separate data processing flow for each kind of single-quantity index data. How to achieve such individualized calculation when the data volume is very large is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a real-time stream data processing method and apparatus implemented based on a stream window, which overcome or at least partially solve the above problems.
According to an aspect of the embodiments of the present invention, a method for processing real-time stream data based on a stream window is provided, which includes:
acquiring real-time streaming data from at least one data source, wherein the real-time streaming data comprises waybill streaming data and distribution facility streaming data;
carrying out data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table;
and receiving a calculation task of the single amount index data for representing the pressure balance state, calling a corresponding processing process of the stream window according to the data characteristics of the single amount index data, and performing aggregation calculation on the data in the data wide table by using the processing process of the stream window to obtain the corresponding single amount index data.
Optionally, the method further comprises: pre-deploying processing processes for multiple kinds of stream windows, wherein each kind of stream window corresponds to single-quantity index data having a particular data characteristic.
Optionally, invoking a processing process of a corresponding stream window according to the data characteristics of the single amount of index data, and performing aggregation calculation on the data in the data width table by the processing process of the stream window to obtain the corresponding single amount of index data further includes:
establishing a single-quantity index data coordinate system, and mapping data in the data width table into the single-quantity index data coordinate system, wherein the coordinate parameters of each coordinate axis of the coordinate system are respectively as follows: time, single index data, geospatial information;
and calling a corresponding processing process of the stream window according to the data characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the stream window to obtain the corresponding single-quantity index data.
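The mapping into the coordinate system can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the field names (`created_at`, `rejected_at`, `grid_id`), the use of epoch seconds, and the helper names `Point`, `to_points` and `aggregate` are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Point(NamedTuple):
    time: int    # event time in seconds (assumed unit)
    metric: str  # which single-quantity index the point contributes to
    geo: str     # geospatial key, here a hypothetical grid ID


def to_points(row: dict) -> list:
    """Map one wide-table row to points on the (time, metric, geo) axes."""
    points = []
    if row.get("created_at") is not None:
        points.append(Point(row["created_at"], "created", row["grid_id"]))
    if row.get("rejected_at") is not None:
        points.append(Point(row["rejected_at"], "rejected", row["grid_id"]))
    return points


def aggregate(points) -> dict:
    """Count points per (metric, geo) pair, ignoring the time axis."""
    counts = defaultdict(int)
    for p in points:
        counts[(p.metric, p.geo)] += 1
    return dict(counts)


rows = [
    {"waybill_no": "W1", "grid_id": "G1", "created_at": 100, "rejected_at": None},
    {"waybill_no": "W2", "grid_id": "G1", "created_at": 130, "rejected_at": 160},
]
points = [p for r in rows for p in to_points(r)]
result = aggregate(points)
```

Once every row is expressed as points on the three axes, a window process only has to decide which time range of points to aggregate.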
Optionally, invoking a processing process of a corresponding stream window according to the data characteristic of the single amount of index data, and performing aggregation calculation on data in the single amount of index data coordinate system by the processing process of the stream window to obtain corresponding single amount of index data further includes:
and calling a processing process of the global window according to the real-time characteristic of the single index data, and carrying out aggregation calculation on the data in the single index data coordinate system by the processing process of the global window to obtain the single index data at each moment, wherein the data in the single index data coordinate system are in the same global window.
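As a rough illustration of the global-window case, the following Python sketch keeps all data in one window that never closes and emits an updated value on every event. The class name `GlobalWindow`, the metric name and the `delta` parameter are hypothetical and not taken from the patent.

```python
from collections import defaultdict


class GlobalWindow:
    """All events share one never-closing window; state is updated per event."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_event(self, metric: str, geo: str, delta: int = 1) -> int:
        # Update the running aggregate and return the value "at this moment".
        self.counts[(metric, geo)] += delta
        return self.counts[(metric, geo)]


gw = GlobalWindow()
emitted = [
    gw.on_event("in_delivery", "G1"),            # a waybill starts delivery
    gw.on_event("in_delivery", "G1"),            # another one starts
    gw.on_event("in_delivery", "G1", delta=-1),  # one completes
]
```

Because a result is produced for each incoming event, the sketch matches the real-time characteristic in spirit; a production system would also need state persistence, which is omitted here.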
Optionally, invoking a processing process of a corresponding stream window according to the data characteristic of the single amount of index data, and performing aggregation calculation on data in the single amount of index data coordinate system by the processing process of the stream window to obtain corresponding single amount of index data further includes:
calling a processing process of the global window according to the cumulative period-over-period comparison characteristic of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the global window;
and calling a processing process of the rolling (tumbling) window, and carrying out aggregation calculation, by the processing process of the rolling window and according to the window size parameter, on the aggregation result of the processing process of the global window, to obtain cumulative period-over-period single-quantity index data, wherein the window size parameter of the rolling window is determined according to the first aggregation time.
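The two-stage aggregation can be sketched in Python as follows, under stated assumptions: the global stage is modelled as a running total, the rolling window as fixed, non-overlapping buckets of `window_size` seconds, and the comparison as a ratio against a hypothetical `yesterday` series. None of these names or figures come from the patent.

```python
def cumulative_per_window(event_times, window_size):
    """Return {window_start: running total at the end of that window}."""
    running = 0
    snapshots = {}
    for t in sorted(event_times):
        running += 1                              # global stage: running total
        start = (t // window_size) * window_size  # rolling-window assignment
        snapshots[start] = running                # last value in the window wins
    return snapshots


# Window size of 60 s stands in for the "first aggregation time".
today = cumulative_per_window([5, 20, 61, 62, 130], window_size=60)

# Hypothetical historical series for the same windows, used for comparison.
yesterday = {0: 1, 60: 2, 120: 4}
ratio = {w: today[w] / yesterday[w] for w in today if w in yesterday}
```

Each rolling window therefore refreshes the cumulative value once per period, which is what makes it comparable with the value at the same point in a historical period.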
Optionally, invoking a processing process of a corresponding stream window according to the data characteristic of the single amount of index data, and performing aggregation calculation on data in the single amount of index data coordinate system by the processing process of the stream window to obtain corresponding single amount of index data further includes:
and calling a processing process of the sliding window according to the slicing characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the sliding window according to the window size parameter and the sliding step length to obtain the single-quantity index data of the slice, wherein the window size parameter of the sliding window is determined according to the second aggregation time.
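A minimal Python sketch of the sliding-window case is given below. The function name `sliding_counts`, the half-open interval convention and the sample numbers are assumptions chosen for illustration.

```python
def sliding_counts(event_times, size, step, end):
    """Count events in each window [start, start + size), sliding by `step`."""
    out = {}
    start = 0
    while start + size <= end:
        out[start] = sum(1 for t in event_times if start <= t < start + size)
        start += step
    return out


# Window size of 120 s stands in for the "second aggregation time";
# the sliding step is 60 s, so consecutive windows overlap by half.
slices = sliding_counts([5, 20, 61, 62, 130], size=120, step=60, end=240)
```

Unlike the rolling window, consecutive sliding windows overlap, so one event can contribute to several slices.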
Optionally, the waybill flow data contains one or more of the following dimensional data: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID and the site ID;
the distribution facility flow data contains one or more of the following dimensional data: a distribution facility ID, a waybill number, and/or a distribution facility order pickup time.
Optionally, after acquiring the real-time streaming data from the at least one data source, the method further comprises:
and if detecting that the real-time stream data has partial dimension data loss, performing data completion processing on the real-time stream data.
Optionally, after acquiring the real-time streaming data from the at least one data source, the method further comprises:
and if the fact that the real-time streaming data has fraudulent data or data of a pre-ordered order or repeated data is detected, carrying out data cleaning processing on the real-time streaming data.
According to another aspect of the embodiments of the present invention, there is provided a real-time stream data processing apparatus implemented based on a stream window, including:
the acquisition module is suitable for acquiring real-time streaming data from at least one data source, wherein the real-time streaming data comprises waybill streaming data and distribution facility streaming data;
the data connection processing module is suitable for performing data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table;
and the aggregation calculation module is suitable for receiving a calculation task of the single amount index data for representing the pressure balance state, calling a corresponding processing process of the stream window according to the data characteristics of the single amount index data, and performing aggregation calculation on the data in the data wide table by the processing process of the stream window to obtain the corresponding single amount index data.
Optionally, the apparatus further comprises: a deployment module, adapted to pre-deploy processing processes for multiple kinds of stream windows, wherein each kind of stream window corresponds to single-quantity index data having a particular data characteristic.
Optionally, the aggregation calculation module is further adapted to: establishing a single-quantity index data coordinate system, and mapping data in the data width table into the single-quantity index data coordinate system, wherein the coordinate parameters of each coordinate axis of the coordinate system are respectively as follows: time, single index data, geospatial information;
and calling a corresponding processing process of the stream window according to the data characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the stream window to obtain the corresponding single-quantity index data.
Optionally, the aggregation calculation module is further adapted to: and calling a processing process of the global window according to the real-time characteristic of the single index data, and carrying out aggregation calculation on the data in the single index data coordinate system by the processing process of the global window to obtain the single index data at each moment, wherein the data in the single index data coordinate system are in the same global window.
Optionally, the aggregation calculation module is further adapted to: call a processing process of the global window according to the cumulative period-over-period comparison characteristic of the single-quantity index data, and carry out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the global window;
and call a processing process of the rolling (tumbling) window, and carry out aggregation calculation, by the processing process of the rolling window and according to the window size parameter, on the aggregation result of the processing process of the global window, to obtain cumulative period-over-period single-quantity index data, wherein the window size parameter of the rolling window is determined according to the first aggregation time.
Optionally, the aggregation calculation module is further adapted to: and calling a processing process of the sliding window according to the slicing characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the sliding window according to the window size parameter and the sliding step length to obtain the single-quantity index data of the slice, wherein the window size parameter of the sliding window is determined according to the second aggregation time.
Optionally, the waybill flow data contains one or more of the following dimensional data: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID and the site ID;
the distribution facility flow data contains one or more of the following dimensional data: a distribution facility ID, a waybill number, and/or a distribution facility order pickup time.
Optionally, the apparatus further comprises: the data completion processing module is suitable for: and if detecting that the real-time stream data has partial dimension data loss, performing data completion processing on the real-time stream data.
Optionally, the apparatus further comprises: and the data cleaning processing module is suitable for cleaning the real-time streaming data if the real-time streaming data is detected to have fraudulent data or data of a pre-order or repeated data.
According to still another aspect of the embodiments of the present invention, there is provided a computing device including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the real-time stream data processing method realized based on the stream window.
According to another aspect of the embodiments of the present invention, there is provided a computer storage medium, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform an operation corresponding to the above real-time streaming data processing method implemented based on a streaming window.
According to the scheme provided by the embodiment of the invention, real-time streaming data is acquired from at least one data source, wherein the real-time streaming data comprises waybill streaming data and distribution facility streaming data; carrying out data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table; and receiving a calculation task of the single amount index data for representing the pressure balance state, calling a corresponding processing process of the stream window according to the data characteristics of the single amount index data, and performing aggregation calculation on the data in the data wide table by using the processing process of the stream window to obtain the corresponding single amount index data. According to the scheme provided by the embodiment of the invention, for the single-quantity index data defined at different times, the single-quantity index data can be individually aggregated and calculated by calling the processing process of the corresponding flow window, so that the pressure balance can be conveniently carried out according to the calculated single-quantity index data, and the requirement of reasonable scheduling of the waybill can be met.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the description, and so that the foregoing and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments of the present invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the embodiments of the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a flowchart illustrating a real-time stream data processing method implemented based on a stream window according to an embodiment of the present invention;
Fig. 2A is a flowchart illustrating a real-time stream data processing method implemented based on a stream window according to another embodiment of the present invention;
Fig. 2B shows a schematic diagram of real-time streaming data processing based on a global window implementation;
Fig. 2C shows a flow diagram of a real-time streaming data processing method implemented based on a rolling window;
Fig. 2D is a flowchart illustrating a real-time stream data processing method implemented based on a stream window according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a real-time stream data processing apparatus implemented based on a stream window according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computing device provided in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of a real-time stream data processing method implemented based on a stream window according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step S101, acquiring real-time streaming data from at least one data source, wherein the real-time streaming data comprises waybill streaming data and distribution facility streaming data.
Specifically, data must be acquired before it can be processed. Because this embodiment processes real-time streaming data, the real-time streaming data is acquired from at least one data source and includes waybill flow data and distribution facility flow data. For example, both may be acquired from a single data source, or the waybill flow data may be acquired from a waybill data source and the distribution facility flow data from a distribution facility data source. Data may also be acquired from other data sources, which is not limited here.
And step S102, performing data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table.
After the waybill flow data and the distribution facility flow data are acquired, data connection processing (also referred to as a dual-stream join) may be performed on them, and the result of the dual-stream join is a data wide table. The dual-stream join is performed to facilitate the subsequent aggregation calculation, which is carried out on the data wide table to obtain the single-quantity index data that characterizes the pressure balance state. The join may be performed by any data connection method commonly used in the art, and details are not described here.
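One possible shape of such a dual-stream join is sketched below in Python. It is an assumption-laden illustration: records are joined on a hypothetical `waybill_no` key, each side is buffered in memory, and a wide-table row is emitted only once both sides have arrived (an inner join). State expiry, late data and out-of-order handling are omitted.

```python
class DualStreamJoin:
    """Buffer both streams by waybill number and emit a merged row on a match."""

    def __init__(self):
        self.waybills = {}    # latest waybill record per waybill number
        self.facilities = {}  # latest distribution facility record per waybill number

    def on_waybill(self, rec: dict):
        self.waybills[rec["waybill_no"]] = rec
        return self._emit(rec["waybill_no"])

    def on_facility(self, rec: dict):
        self.facilities[rec["waybill_no"]] = rec
        return self._emit(rec["waybill_no"])

    def _emit(self, key):
        w = self.waybills.get(key)
        f = self.facilities.get(key)
        if w is None or f is None:
            return None        # the other side has not arrived yet
        return {**w, **f}      # one row of the data wide table


join = DualStreamJoin()
r1 = join.on_waybill({"waybill_no": "W1", "status": "created", "grid_id": "G1"})
r2 = join.on_facility({"waybill_no": "W1", "facility_id": "F9", "pickup_time": 120})
```

Here `r1` is empty because only one side is known, while `r2` carries the columns of both streams in a single row.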
Step S103, receiving a calculation task of the single amount index data for representing the pressure balance state, calling a corresponding processing process of the stream window according to the data characteristics of the single amount index data, and performing aggregation calculation on the data in the data wide table by the processing process of the stream window to obtain the corresponding single amount index data.
The single-quantity index data characterize the pressure balance state. Once calculated, they can be used to compute a pressure balance coefficient, according to which the distribution of waybills can be regulated.
Different single-quantity index data may have different data characteristics, where a data characteristic is the time requirement placed on the calculated data. For example, the data characteristic may be a real-time characteristic, a cumulative comparison characteristic (cumulative period-over-period comparison), or a slicing characteristic. The real-time characteristic means that the single-quantity index data requires zero delay; the cumulative comparison characteristic means that the single-quantity index data must be refreshed once every period of time (for example, every minute) and must be comparable with the value at a historical time; and the slicing characteristic means that the single-quantity index data must be calculated over a specified period of time.
Specifically, the data wide table obtained in step S102 provides the data basis for calculating the single-quantity index data. A calculation task for the single-quantity index data that characterizes the pressure balance state is received, and the task indicates which single-quantity index data need to be calculated. The processing process of the corresponding stream window is then called according to the data characteristics of that single-quantity index data, and this processing process performs aggregation calculation on the data in the data wide table to obtain the corresponding single-quantity index data.
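The selection of a window process by data characteristic can be sketched as a simple lookup in Python. The enum `Feature`, the table `WINDOW_PIPELINES`, the task dictionary layout and the metric names are all hypothetical; the sketch only shows the dispatch idea, assuming the three characteristics named above.

```python
from enum import Enum


class Feature(Enum):
    REAL_TIME = "real_time"
    CUMULATIVE_COMPARISON = "cumulative_comparison"
    SLICE = "slice"


# Which pre-deployed window processes serve each data characteristic.
WINDOW_PIPELINES = {
    Feature.REAL_TIME: ["global"],
    Feature.CUMULATIVE_COMPARISON: ["global", "rolling"],
    Feature.SLICE: ["sliding"],
}


def select_pipeline(task: dict) -> list:
    """Return the ordered window processes for a calculation task."""
    return WINDOW_PIPELINES[task["feature"]]


task = {"metric": "rejected_count", "feature": Feature.CUMULATIVE_COMPARISON}
pipeline = select_pipeline(task)
```

A lookup table like this means a new index only has to declare its data characteristic, rather than requiring a separately designed processing flow.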
According to the method provided by the embodiment of the invention, real-time streaming data is acquired from at least one data source, and the real-time streaming data comprises waybill streaming data and distribution facility streaming data; carrying out data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table; and receiving a calculation task of the single amount index data for representing the pressure balance state, calling a corresponding processing process of the stream window according to the data characteristics of the single amount index data, and performing aggregation calculation on the data in the data wide table by using the processing process of the stream window to obtain the corresponding single amount index data. According to the scheme provided by the embodiment of the invention, for the single-quantity index data defined at different times, the single-quantity index data can be individually aggregated and calculated by calling the processing process of the corresponding flow window, and the rapid real-time flow data processing process is realized, so that the pressure balance can be conveniently carried out according to the calculated single-quantity index data, and the requirement of reasonable scheduling of the freight notes is realized.
Fig. 2A shows a flowchart of a real-time stream data processing method implemented based on a stream window according to another embodiment of the present invention. As shown in fig. 2A, the method includes the steps of:
step S201, acquiring real-time streaming data from at least one data source, where the real-time streaming data includes waybill streaming data and distribution facility streaming data.
Specifically, data must be acquired before it can be processed. Because this embodiment processes real-time streaming data, the real-time streaming data is acquired from at least one data source and includes waybill flow data and distribution facility flow data. For example, both may be acquired from a single data source, or the waybill flow data may be acquired from a waybill data source and the distribution facility flow data from a distribution facility data source, which is not limited here.
Wherein the waybill flow data comprises one or more of the following dimensional data: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID and the site ID; the distribution facility flow data contains one or more of the following dimensional data: a distribution facility ID, a waybill number, and/or a distribution facility order pickup time. This is by way of example only and is not intended to be limiting.
The waybill status indicates the current status of the waybill, for example: waiting to be accepted, waiting to be delivered, or completed. Business districts, grids, and sites are geospatial representations whose spatial extent decreases in that order; a grid represents a spatial area used for distribution, and a site represents a distribution station.
Step S202, if detecting that the real-time stream data has partial dimension data missing, performing data completion processing on the real-time stream data.
After the real-time stream data is acquired from the at least one data source, it is checked for missing dimension data. Specifically, the dimension fields of standard real-time stream data may be compared with the dimension fields of the real-time stream data acquired from the at least one data source, and whether part of the dimension data is missing is determined from the comparison result. If part of the dimension data is found to be missing, data completion processing is performed on the real-time stream data; for example, the missing dimension data may be obtained from other data sources.
Taking the waybill flow data as an example, dimensions such as the business district ID, the grid ID, and the site ID may be missing. The business district ID, grid ID, site ID, and similar data can therefore be obtained from other data sources to complete the data. Supplementing the missing dimensions facilitates the subsequent calculation of the single-quantity index data.
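The field comparison and completion described above can be sketched in Python as follows. The set `STANDARD_FIELDS`, the field names and the `lookup` dictionary (standing in for "other data sources") are assumptions made for the example.

```python
# Assumed standard dimension fields for a waybill record.
STANDARD_FIELDS = {"waybill_no", "status", "district_id", "grid_id", "site_id"}


def missing_fields(record: dict) -> set:
    """Compare the record's fields with the standard fields."""
    return {f for f in STANDARD_FIELDS if record.get(f) is None}


def complete(record: dict, lookup: dict) -> dict:
    """Fill missing dimensions from a secondary source keyed by waybill number."""
    missing = missing_fields(record)
    if not missing:
        return record
    extra = lookup.get(record.get("waybill_no"), {})
    filled = dict(record)  # leave the input record untouched
    for f in missing:
        if f in extra:
            filled[f] = extra[f]
    return filled


lookup = {"W1": {"district_id": "D1", "grid_id": "G1", "site_id": "S1"}}
rec = {"waybill_no": "W1", "status": "created", "grid_id": "G1"}
gaps = missing_fields(rec)
fixed = complete(rec, lookup)
```

Fields that cannot be found in the secondary source simply stay missing, so a caller can decide separately whether to drop or keep such records.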
Step S203, if it is detected that the real-time stream data contains fraudulent data, reservation order data, or repeated data, performing data cleaning processing on the real-time stream data.
In practical applications, special situations such as fraudulent data or reservation order data are likely to interfere, and data may also be duplicated because of DRC data synchronization. Such data affect the subsequent aggregation calculation: fraudulent data or reservation order data reduce the accuracy of the aggregation calculation, and repeated data increase the calculation load and waste resources. Therefore, if fraudulent data, reservation order data, or repeated data are detected in the real-time stream data, data cleaning processing is performed on the real-time stream data to remove the fraudulent data, the reservation order data, or the repeated data.
It should be noted that the present embodiment does not limit the execution sequence of step S202 and step S203, and step S202 may be executed after step S203 is executed.
Cleaning the real-time stream data improves the accuracy of the aggregation calculation, reduces the calculation load, and reduces the resource consumption caused by repeated calculation.
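A minimal sketch of the cleaning in step S203 is given below. The boolean flags `is_fraud` and `is_reservation`, and the choice of (waybill number, waybill state) as the duplicate key, are assumptions made for illustration; the embodiment does not specify how fraudulent, reservation, or repeated records are recognised.

```python
from typing import Dict, Iterable, List


def clean(records: Iterable[Dict]) -> List[Dict]:
    """Drop fraudulent records, reservation orders, and exact repeats."""
    seen = set()          # duplicate keys already emitted
    kept: List[Dict] = []
    for rec in records:
        # Hypothetical flags assumed to be set by an upstream detector.
        if rec.get("is_fraud") or rec.get("is_reservation"):
            continue
        # Assumed duplicate key: the same waybill reported twice in the same state.
        key = (rec["waybill_no"], rec["waybill_state"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


stream = [
    {"waybill_no": "W001", "waybill_state": "created"},
    {"waybill_no": "W001", "waybill_state": "created"},        # repeated by synchronisation
    {"waybill_no": "W002", "waybill_state": "created", "is_fraud": True},
    {"waybill_no": "W003", "waybill_state": "created", "is_reservation": True},
    {"waybill_no": "W001", "waybill_state": "completed"},
]
cleaned = clean(stream)
```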
Step S204, performing data connection processing on the processed waybill flow data and the distribution facility flow data to obtain a data width table.
After the waybill flow data and the distribution facility flow data are acquired, data connection processing (which may also be referred to as a dual-stream join) may be performed on the waybill flow data and the distribution facility flow data, and a data width table is obtained after the dual-stream join. The dual-stream join is performed to facilitate the subsequent aggregation calculation, which is carried out on the data width table to obtain the single-quantity index data for representing the pressure balance state.
For example, the waybill flow data obtained after step S203 includes: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID and the site ID; the distribution facility flow data includes: the distribution facility ID, the waybill number, and the distribution facility order pickup time. The data width table obtained by performing data connection processing on the processed waybill flow data and the distribution facility flow data then includes: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID, the site ID, the distribution facility ID, and the distribution facility order pickup time. This is by way of example only and is not intended to be limiting.
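The dual-stream join of step S204 can be illustrated with a simple keyed join that buffers each side until its counterpart arrives. This is a sketch under stated assumptions: the waybill number is assumed to be the join key (it is the only field the two example streams share), and the class `DualStreamJoin` and its field names are hypothetical.

```python
from typing import Dict, Optional


class DualStreamJoin:
    """Buffer the latest record from each stream and emit a wide row once both sides exist."""

    def __init__(self) -> None:
        self._waybills: Dict[str, Dict] = {}
        self._facilities: Dict[str, Dict] = {}

    def on_waybill(self, rec: Dict) -> Optional[Dict]:
        self._waybills[rec["waybill_no"]] = rec
        return self._emit(rec["waybill_no"])

    def on_facility(self, rec: Dict) -> Optional[Dict]:
        self._facilities[rec["waybill_no"]] = rec
        return self._emit(rec["waybill_no"])

    def _emit(self, waybill_no: str) -> Optional[Dict]:
        waybill = self._waybills.get(waybill_no)
        facility = self._facilities.get(waybill_no)
        if waybill is None or facility is None:
            return None  # the other side has not arrived yet
        return {**waybill, **facility}  # one row of the data width table


join = DualStreamJoin()
first = join.on_waybill({"waybill_no": "W001", "waybill_state": "to_be_delivered",
                         "grid_id": "G7", "site_id": "S3"})
wide_row = join.on_facility({"waybill_no": "W001", "facility_id": "F42",
                             "pickup_time": "10:05:00"})
```

A production join would also need to expire buffered records; that policy is outside what the description states.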
Step S205, receiving a calculation task of single-quantity index data for representing the pressure balance state.
The single-quantity index data are used for representing the pressure balance state, and the calculated single-quantity index data can be used for calculating a pressure balance coefficient, so that the allocation of waybills can be conveniently regulated and controlled according to the pressure balance coefficient.
After the data width table is obtained, a calculation task of single-quantity index data for representing the pressure balance state is received. The single-quantity index data to be calculated may be, for example: the single quantity in different waybill states, the number of orders rejected by the distribution facilities, the order pickup time of the distribution facilities, the number of distribution facilities currently in distribution, and the like. These are merely illustrative and not limiting.
Step S206, establishing a single-quantity index data coordinate system, and mapping the data in the data width table into the single-quantity index data coordinate system, wherein the coordinate parameters of each coordinate axis of the coordinate system are respectively as follows: time, single index data, geospatial information.
For convenience of statistics, in this embodiment a single-quantity index data coordinate system needs to be established, so that the processing process of the corresponding stream window can perform the aggregation calculation of the single-quantity index data. The coordinate parameters of the coordinate axes of the coordinate system are respectively: time, single-quantity index data, and geospatial information, where the geospatial information is the business circle ID, the grid ID, and the site ID; that is, the geospatial axis identifies which business circle, which grid, or which site the single-quantity index data are to be aggregated for. After the single-quantity index data coordinate system is established, the data in the data width table are mapped into the single-quantity index data coordinate system.
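One way to picture the mapping of step S206 is to turn each row of the data width table into points with a time coordinate, an index coordinate, and a geospatial coordinate. The sketch below is an assumption-laden illustration: the `Point` type, the `event_time` field, and the use of the waybill state as the index coordinate are hypothetical choices, not the embodiment's definition.

```python
from typing import Dict, List, NamedTuple, Tuple


class Point(NamedTuple):
    time: int                 # time axis: event time in epoch seconds (assumed unit)
    metric: str               # index axis: here the waybill state being counted (assumed)
    geo: Tuple[str, str]      # geospatial axis: (level, ID), e.g. ("grid_id", "G7")


GEO_LEVELS = ("business_circle_id", "grid_id", "site_id")


def to_points(row: Dict) -> List[Point]:
    """Map one wide-table row to one point per geospatial level present in the row."""
    return [
        Point(row["event_time"], row["waybill_state"], (level, row[level]))
        for level in GEO_LEVELS
        if row.get(level) is not None
    ]


row = {"event_time": 1_600_000_000, "waybill_state": "to_be_delivered",
       "business_circle_id": "BC1", "grid_id": "G7", "site_id": "S3"}
points = to_points(row)
```

Emitting one point per geospatial level lets a later aggregation group by business circle, grid, or site independently, which matches the statement that any one of them may be aggregated as needed.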
In this embodiment, processing processes of multiple stream windows need to be deployed in advance, where each stream window corresponds to single-quantity index data having a particular data characteristic. For example, a processing process of a global window, a processing process of a rolling window, and a processing process of a sliding window are deployed, where the global window corresponds to single-quantity index data with real-time characteristics; the rolling window corresponds to single-quantity index data with cumulative same-proportion characteristics; and the sliding window corresponds to single-quantity index data with slice characteristics. The data characteristic is closely related to the window and determines which stream window's processing process is invoked.
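The correspondence between data characteristics and stream windows can be expressed as a small dispatch table. The sketch below is illustrative only; the enum `Characteristic`, the table `WINDOW_PIPELINES`, and the example task names are hypothetical. The two-stage entry for the cumulative same-proportion characteristic follows steps S208 and S209, in which a global window is followed by a rolling window.

```python
from enum import Enum
from typing import Tuple


class Characteristic(Enum):
    REAL_TIME = "real_time"
    CUMULATIVE_SAME_PROPORTION = "cumulative_same_proportion"
    SLICE = "slice"


# Which pre-deployed window processing processes are invoked, in order, for each characteristic.
WINDOW_PIPELINES = {
    Characteristic.REAL_TIME: ("global",),
    Characteristic.CUMULATIVE_SAME_PROPORTION: ("global", "rolling"),
    Characteristic.SLICE: ("sliding",),
}


def windows_for(characteristic: Characteristic) -> Tuple[str, ...]:
    """Return the ordered window processing processes for a calculation task."""
    return WINDOW_PIPELINES[characteristic]


# Hypothetical calculation tasks tagged with their data characteristic.
tasks = {
    "waybills_per_state": Characteristic.REAL_TIME,
    "orders_per_minute_vs_yesterday": Characteristic.CUMULATIVE_SAME_PROPORTION,
    "rejections_last_10_minutes": Characteristic.SLICE,
}
plan = {name: windows_for(c) for name, c in tasks.items()}
```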
Specifically, the method in steps S207 to S210 may be utilized to aggregate and calculate the single amount index data:
step S207, calling a processing process of the global window according to the real-time characteristics of the single-quantity index data, and performing aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the global window to obtain the single-quantity index data at each moment, wherein the data in the single-quantity index data coordinate system are in the same global window.
Specifically, some single-quantity index data require real-time performance of zero delay, for example, single quantity under different waybill conditions, the number of distribution facilities in distribution, and the like, and the data characteristics of such single-quantity index data can be considered as real-time characteristics, which requires aggregation calculation of the single-quantity index data at each time.
In order to meet the requirement of aggregating and calculating single-quantity index data in real time, a processing process of a global window is called according to the real-time characteristics of the single-quantity index data, and the processing process of the global window performs aggregation calculation on the data in the single-quantity index data coordinate system to obtain the single-quantity index data at each time. The global window is an unbounded window; as shown in fig. 2B, all data in the single-quantity index data coordinate system are in the same global window. It should be noted that fig. 2B is a schematic projection of the single-quantity index data coordinate system, in which the abscissa represents time and the ordinate represents geospatial information, such as a business circle ID, a grid ID, and a site ID; the aggregation calculation may also be performed for only one kind of geospatial information as needed. In fig. 2B, each point represents one piece of mapped data. To calculate the single-quantity index data at a certain time, the data mapped at that time in the single-quantity index data coordinate system are aggregated, and in this way the single-quantity index data at each time can be obtained.
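As a rough illustration of the global window, the sketch below keeps every event in one unbounded window and records a running count per geospatial ID at each time at which data arrive. The function name `global_window_series` and the choice of a simple count as the aggregate are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def global_window_series(events: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, int]]:
    """events are (time, geo_id) pairs; returns {time: {geo_id: count of events so far}}."""
    totals: Counter = Counter()          # state of the single unbounded window
    series: Dict[int, Dict[str, int]] = {}
    for time, geo_id in sorted(events):  # process in time order
        totals[geo_id] += 1
        series[time] = dict(totals)      # snapshot of the index data at this time
    return series


events = [(2, "G2"), (1, "G1"), (2, "G1")]
series = global_window_series(events)
```

Because the window is never closed, the running totals are the only state that has to be kept, which is what makes a result available at each time without recomputing from scratch.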
Step S208, calling a processing process of the global window according to the cumulative same-proportion characteristic of the single-quantity index data, and performing aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the global window.
Some single-quantity index data need to be refreshed once per period of time (e.g., every minute) and to be compared with the same time in a historical period, e.g., with the same time yesterday. The data characteristic of such single-quantity index data can be regarded as a cumulative same-proportion characteristic, and for such single-quantity index data the calculation can be completed in two processing steps.
Specifically, a processing process of the global window is called according to the cumulative same-proportion characteristic of the single-quantity index data, and the processing process of the global window performs aggregation calculation on the data in the single-quantity index data coordinate system, where the aggregation calculation is similar to that of step S207 and is not described here again.
Step S209, a processing process of the rolling window is called, and the processing process of the rolling window carries out aggregation calculation on the aggregation result of the processing process of the global window according to the window size parameter to obtain accumulated same-proportion single-quantity index data, wherein the window size parameter of the rolling window is determined according to the first aggregation time.
After the aggregation result at each time is obtained according to step S208, a processing process of the rolling window may be invoked according to the cumulative same-proportion characteristic of the single-quantity index data, and the processing process of the rolling window completes the aggregation calculation of the accumulated same-proportion single-quantity index data. Specifically, the processing process of the rolling window performs aggregation calculation on the aggregation result of the processing process of the global window according to a window size parameter. The rolling window is bounded: the window size parameter specifies the window size of the rolling window, and the window size determines the amount of data in the window. The window size parameter is determined according to a first aggregation time; for example, if aggregation is required once per minute, the window size of the rolling window is 1 minute, and if aggregation is required every 5 minutes, the window size of the rolling window is 5 minutes. This is merely illustrative and not limiting. As shown in fig. 2C, the window data in the rolling windows do not overlap, and each point in fig. 2C represents a result of the aggregation calculation of the processing process of the global window. Fig. 2C is a schematic projection of the single-quantity index data coordinate system, in which the abscissa represents time and the ordinate represents geospatial information, such as a business circle ID, a grid ID, and a site ID.
For example, the window size parameter is 1 minute. Five windows are schematically shown in fig. 2C, which are in fact the successive positions of one rolling window, and the data in fig. 2C are the aggregation results of the processing process of the global window. The processing process of the rolling window performs aggregation calculation on the data in window 1 to obtain the single-quantity index data of that minute, then rolls on to the position of window 2 and performs aggregation calculation on the window data in window 2, and so on; details are not repeated here.
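The rolling window calculation and the comparison with the same time in a historical period can be sketched as follows. This is a minimal sketch under assumptions: timestamps are integer seconds, the aggregate is a sum of the values produced by the global window stage, and the comparison is expressed as a relative change against the window one day earlier. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def rolling_window_aggregate(results: Iterable[Tuple[int, int]],
                             size_s: int = 60) -> Dict[int, int]:
    """Sum (time, value) results into non-overlapping windows keyed by window start."""
    buckets: Dict[int, int] = defaultdict(int)
    for time, value in results:
        start = (time // size_s) * size_s   # each result falls into exactly one window
        buckets[start] += value
    return dict(sorted(buckets.items()))


def same_period_change(buckets: Dict[int, int], start: int,
                       period_s: int = 86_400) -> Optional[float]:
    """Relative change of a window against the window one period (default: one day) earlier."""
    previous = buckets.get(start - period_s)
    if not previous:
        return None                          # no comparable historical window
    return (buckets[start] - previous) / previous


per_minute = rolling_window_aggregate([(0, 1), (30, 2), (60, 5), (119, 1), (120, 4)])
change = same_period_change({0: 10, 86_400: 15}, start=86_400)
```

With a 1 minute window the first two results land in the window starting at 0, the next two in the window starting at 60, and the last in the window starting at 120, so no result is counted twice.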
The accumulated same-proportion single-quantity index data are calculated by the processing process of the rolling window rather than by the processing process of the sliding window in order to avoid the calculation delay that aggregation by the sliding window would cause: the larger the window size of the sliding window, the larger the amount of data and the longer the calculation time. Moreover, sliding windows may contain the same data more than once, and the duplicated data would distort the cumulative same-proportion result.
Step S210, calling a processing process of the sliding window according to the slice characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the sliding window according to the window size parameter and the sliding step length to obtain the slice single-quantity index data, wherein the window size parameter of the sliding window is determined according to the second aggregation time.
Some single-quantity index data require calculation over the data within a period of time (for example, 10 minutes). The data characteristic of such single-quantity index data may be regarded as a slice characteristic, and a processing process of a sliding window may be called to perform the aggregation calculation for them. Specifically, a processing process of the sliding window is called according to the slice characteristic of the single-quantity index data, and the processing process of the sliding window performs aggregation calculation on the data in the single-quantity index data coordinate system according to a window size parameter and a sliding step length to obtain the slice single-quantity index data. The window size parameter specifies the window size of the sliding window, and the window size determines the amount of data in the window. The window size parameter is determined according to a second aggregation time; for example, if 10 minutes of data are to be aggregated, the window size of the sliding window is 10 minutes, and if 15 minutes of data are to be aggregated, the window size of the sliding window is 15 minutes. The sliding step length controls how frequently a new sliding window is created. When the sliding step length is smaller than the window size, multiple sliding windows overlap, and part of the data may then be allocated to multiple windows; the processing process of the sliding window performs aggregation calculation on the data allocated to each corresponding sliding window, as shown in fig. 2D.
Each point in fig. 2D represents one piece of mapped data. Fig. 2D is a schematic projection of the single-quantity index data coordinate system, in which the abscissa represents time and the ordinate represents geospatial information, e.g., a business circle ID, a grid ID, and a site ID. Four windows are schematically shown in fig. 2D, which are in fact the successive positions of one sliding window. In fig. 2D there is a certain height difference between window 1 and window 2, between window 2 and window 3, and between window 3 and window 4; this height difference is only for convenience of illustration, and the sliding windows actually slide in parallel.
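How one piece of data can be allocated to several overlapping sliding windows may be sketched as below. The sketch assumes integer timestamps in seconds, windows aligned to multiples of the sliding step length, half-open windows [start, end), and a simple count as the aggregate; these conventions and the function names are assumptions and are not stated in the embodiment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Window = Tuple[int, int]  # (start, end), end exclusive


def assign_sliding_windows(time: int, size_s: int, step_s: int) -> List[Window]:
    """Return every window [start, start + size_s) that contains the timestamp."""
    last_start = time - (time % step_s)  # latest window start not after the timestamp
    return [(start, start + size_s)
            for start in range(last_start, time - size_s, -step_s)]


def sliding_window_counts(timestamps: Iterable[int],
                          size_s: int, step_s: int) -> Dict[Window, int]:
    """Count the data allocated to each sliding window."""
    counts: Counter = Counter()
    for time in timestamps:
        for window in assign_sliding_windows(time, size_s, step_s):
            counts[window] += 1
    return dict(counts)


windows_of_12 = assign_sliding_windows(12, size_s=10, step_s=5)
counts = sliding_window_counts([12, 17], size_s=10, step_s=5)
```

With a window size of 10 and a step length of 5, each piece of data falls into two windows, which is the overlap the description warns about when explaining why sliding windows are not used for cumulative results.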
This embodiment is implemented on a distributed stream processing engine, which is a complete stream data platform integrating a fault tolerance mechanism, state management, data communication, distributed computation, monitoring and alarming, and other features. The aggregation result obtained after each batch of real-time stream data is processed is maintained in the real-time stream data processing system, so the next round of real-time stream data processing can aggregate incrementally on top of the previous aggregation result. This avoids a full calculation every time and achieves zero-delay real-time aggregation calculation. It overcomes the defects of the prior art, in which the daily volume of waybills is very large, each waybill has 10 waybill states, hundreds of millions of waybill state records are generated every day, the resources consumed are huge, and dozens of index results cannot be aggregated on time every minute. In addition, since the data are cached in the distributed stream processing engine, the indexes can be traced by replaying the data.
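The idea of incremental aggregation on maintained state, and of tracing an index by replaying data, can be sketched in a few lines. This is not the engine's implementation: the class `StateCountAggregator`, the rule that a waybill counts only under its latest state, and the replay helper are all hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


class StateCountAggregator:
    """Keep per-state waybill counts as state and update them one event at a time."""

    def __init__(self) -> None:
        self.current_state: Dict[str, str] = {}  # waybill number -> latest state
        self.counts: Counter = Counter()          # state -> number of waybills in it

    def update(self, waybill_no: str, new_state: str) -> None:
        old_state = self.current_state.get(waybill_no)
        if old_state == new_state:
            return                                # repeated event, nothing changes
        if old_state is not None:
            self.counts[old_state] -= 1           # leave the previous state
        self.counts[new_state] += 1               # enter the new state
        self.current_state[waybill_no] = new_state

    def snapshot(self) -> Dict[str, int]:
        return {state: n for state, n in self.counts.items() if n > 0}


def replay(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Rebuild the index from cached events, e.g. to trace how a value arose."""
    aggregator = StateCountAggregator()
    for waybill_no, state in events:
        aggregator.update(waybill_no, state)
    return aggregator.snapshot()


log = [("W001", "created"), ("W002", "created"), ("W001", "completed")]
live = StateCountAggregator()
for waybill_no, state in log:
    live.update(waybill_no, state)
```

Each update touches at most two counters regardless of how many waybills exist, which is the contrast with recomputing over the full set of waybill states.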
According to the method provided by the embodiment of the invention, performing data completion processing and data cleaning processing on the acquired real-time stream data improves the accuracy of the subsequent aggregation calculation, reduces the calculation load, and saves resources. For single-quantity index data with different timing requirements, the single-quantity index data can be aggregated and calculated individually by calling the processing process of the corresponding stream window, so that a fast real-time stream data processing process is realized, pressure balancing can be conveniently performed according to the calculated single-quantity index data, and the requirement of reasonable scheduling of waybills is met.
Fig. 3 is a schematic structural diagram of a real-time stream data processing apparatus implemented based on a stream window according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes: an obtaining module 301, a data connection processing module 302, and an aggregation calculation module 303.
An obtaining module 301 adapted to obtain real-time streaming data from at least one data source, the real-time streaming data including waybill streaming data and distribution facility streaming data;
the data connection processing module 302 is adapted to perform data connection processing on waybill flow data and distribution facility flow data to obtain a data width table;
the aggregation calculation module 303 is adapted to receive a calculation task of the single amount of index data for representing the pressure balance state, invoke a processing process of a corresponding stream window according to data characteristics of the single amount of index data, and perform aggregation calculation on data in the data wide table by the processing process of the stream window to obtain corresponding single amount of index data.
Optionally, the apparatus further comprises: a deployment module adapted to pre-deploy processing processes of a plurality of stream windows, wherein each stream window corresponds to single-quantity index data having a particular data characteristic.
Optionally, the aggregation calculation module is further adapted to: establishing a single-quantity index data coordinate system, and mapping data in the data width table into the single-quantity index data coordinate system, wherein the coordinate parameters of each coordinate axis of the coordinate system are respectively as follows: time, single index data, geospatial information;
and calling a corresponding processing process of the stream window according to the data characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the stream window to obtain the corresponding single-quantity index data.
Optionally, the aggregation calculation module is further adapted to: and calling a processing process of the global window according to the real-time characteristic of the single index data, and carrying out aggregation calculation on the data in the single index data coordinate system by the processing process of the global window to obtain the single index data at each moment, wherein the data in the single index data coordinate system are in the same global window.
Optionally, the aggregation calculation module is further adapted to: calling a processing process of a global window according to the cumulative same-proportion characteristic of the single-quantity index data, and carrying out aggregation calculation on data in a single-quantity index data coordinate system by the processing process of the global window;
and calling a processing process of the rolling window, and carrying out aggregation calculation on the aggregation result of the processing process of the global window by the processing process of the rolling window according to the window size parameter to obtain accumulated same-proportion single-quantity index data, wherein the window size parameter of the rolling window is determined according to the first aggregation time.
Optionally, the aggregation calculation module is further adapted to: and calling a processing process of the sliding window according to the slicing characteristics of the single-quantity index data, and carrying out aggregation calculation on the data in the single-quantity index data coordinate system by the processing process of the sliding window according to the window size parameter and the sliding step length to obtain the single-quantity index data of the slice, wherein the window size parameter of the sliding window is determined according to the second aggregation time.
Optionally, the waybill flow data contains one or more of the following dimensional data: the waybill number, the waybill state, the waybill establishing time, the waybill rejecting time, the waybill completing time, the business district ID, the grid ID and the site ID;
the distribution facility flow data contains one or more of the following dimensional data: a distribution facility ID, a waybill number, and/or a distribution facility order pickup time.
Optionally, the apparatus further comprises: a data completion processing module adapted to perform data completion processing on the real-time stream data if it is detected that part of the dimension data of the real-time stream data is missing.
Optionally, the apparatus further comprises: a data cleaning processing module adapted to perform data cleaning processing on the real-time stream data if it is detected that the real-time stream data contains fraudulent data, reservation order data, or repeated data.
According to the device provided by the embodiment of the invention, the real-time streaming data is acquired from at least one data source, and the real-time streaming data comprises waybill streaming data and distribution facility streaming data; data connection processing is carried out on the waybill flow data and the distribution facility flow data to obtain a data width table; and a calculation task of the single-quantity index data for representing the pressure balance state is received, a processing process of the corresponding stream window is called according to the data characteristics of the single-quantity index data, and the processing process of the stream window performs aggregation calculation on the data in the data width table to obtain the corresponding single-quantity index data. According to the scheme provided by the embodiment of the invention, for single-quantity index data with different timing requirements, the single-quantity index data can be individually aggregated and calculated by calling the processing process of the corresponding stream window, and a fast real-time stream data processing process is realized, so that pressure balancing can be conveniently carried out according to the calculated single-quantity index data, and the requirement of reasonable scheduling of waybills is met.
An embodiment of the present invention provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer executable instruction may execute the real-time stream data processing method implemented based on a stream window in any of the above method embodiments.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
Wherein: the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. The processor 402, configured to execute the program 410, may specifically execute relevant steps in the above embodiment of the real-time streaming data processing method implemented based on the streaming window for the computing device.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
And a memory 406 for storing a program 410. Memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 410 may be specifically configured to enable the processor 402 to execute a real-time stream data processing method implemented based on a stream window in any of the above-described method embodiments. For specific implementation of each step in the program 410, reference may be made to corresponding steps and corresponding descriptions in units in the above real-time stream data processing embodiment implemented based on the stream window, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best modes of embodiments of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. Embodiments of the invention may also be implemented as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (20)

1. A real-time stream data processing method based on stream window implementation includes:
acquiring real-time streaming data from at least one data source, wherein the real-time streaming data comprises waybill streaming data and distribution facility streaming data;
performing data connection processing on the waybill flow data and the distribution facility flow data to obtain a data width table;
receiving a calculation task of single amount of index data for representing a pressure balance state, calling a processing process of a corresponding stream window according to data characteristics of the single amount of index data, and performing aggregation calculation on data in the data wide table by the processing process of the stream window to obtain corresponding single amount of index data, wherein the calculation task comprises: if the data characteristics are real-time characteristics, calling a processing process of a global window to perform aggregation calculation on the data, wherein the data are in the same global window in the established single-quantity index data coordinate system.
2. The method of claim 1, wherein the method further comprises: pre-deploying processing processes of a plurality of stream windows, wherein the stream windows respectively correspond to single-quantity index data with various data characteristics.
3. The method according to claim 1, wherein the invoking of the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and the performing, by the processing process of the stream window, of aggregation calculation on the data in the data wide table to obtain the corresponding single-quantity index data further comprise:
establishing a single-quantity index data coordinate system, and mapping the data in the data wide table into the single-quantity index data coordinate system, wherein the coordinate parameters of the coordinate axes of the coordinate system are, respectively: time, single-quantity index data, and geospatial information;
invoking the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and performing, by the processing process of the stream window, aggregation calculation on the data in the single-quantity index data coordinate system to obtain the corresponding single-quantity index data.
4. The method according to claim 3, wherein the invoking of the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and the performing, by the processing process of the stream window, of aggregation calculation on the data in the single-quantity index data coordinate system to obtain the corresponding single-quantity index data further comprise:
invoking a processing process of a global window according to the real-time characteristic of the single-quantity index data, and performing, by the processing process of the global window, aggregation calculation on the data in the single-quantity index data coordinate system to obtain the single-quantity index data at each moment, wherein the data in the single-quantity index data coordinate system are in the same global window.
5. The method according to claim 3, wherein the invoking of the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and the performing, by the processing process of the stream window, of aggregation calculation on the data in the single-quantity index data coordinate system to obtain the corresponding single-quantity index data further comprise:
invoking a processing process of a global window according to the cumulative same-proportion characteristic of the single-quantity index data, and performing, by the processing process of the global window, aggregation calculation on the data in the single-quantity index data coordinate system;
invoking a processing process of a rolling window, and performing, by the processing process of the rolling window, aggregation calculation on the aggregation result of the processing process of the global window according to a window size parameter to obtain cumulative same-proportion single-quantity index data, wherein the window size parameter of the rolling window is determined according to a first aggregation time.
6. The method according to claim 3, wherein the invoking of the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and the performing, by the processing process of the stream window, of aggregation calculation on the data in the single-quantity index data coordinate system to obtain the corresponding single-quantity index data further comprise:
invoking a processing process of a sliding window according to the slicing characteristic of the single-quantity index data, and performing, by the processing process of the sliding window, aggregation calculation on the data in the single-quantity index data coordinate system according to a window size parameter and a sliding step length to obtain sliced single-quantity index data, wherein the window size parameter of the sliding window is determined according to a second aggregation time.
7. The method of any of claims 1-6, wherein the waybill stream data includes one or more of the following dimensional data: a waybill number, a waybill state, a waybill creation time, a waybill rejection time, a waybill completion time, a business district ID, a grid ID, and a site ID;
the distribution facility stream data includes one or more of the following dimensional data: a distribution facility ID, a waybill number, and a distribution facility order pickup time.
8. The method of claim 7, wherein after acquiring real-time stream data from at least one data source, the method further comprises:
if it is detected that part of the dimensional data of the real-time stream data is missing, performing data completion processing on the real-time stream data.
9. The method of any of claims 1-6, wherein after acquiring real-time stream data from at least one data source, the method further comprises:
if it is detected that fraudulent data, pre-order data, or duplicate data exists in the real-time stream data, performing data cleaning processing on the real-time stream data.
10. A real-time stream data processing apparatus implemented based on a stream window, comprising:
an acquisition module, adapted to acquire real-time stream data from at least one data source, wherein the real-time stream data comprises waybill stream data and distribution facility stream data;
a data connection processing module, adapted to perform data connection processing on the waybill stream data and the distribution facility stream data to obtain a data wide table;
an aggregation calculation module, adapted to receive a calculation task for single-quantity index data used to represent a pressure balance state, invoke a processing process of a corresponding stream window according to a data characteristic of the single-quantity index data, and have the processing process of the stream window perform aggregation calculation on data in the data wide table to obtain the corresponding single-quantity index data, comprising: if the data characteristic is a real-time characteristic, invoking a processing process of a global window to perform aggregation calculation on the data, wherein the data in the established single-quantity index data coordinate system are in the same global window.
11. The apparatus of claim 10, wherein the apparatus further comprises: a deployment module, adapted to pre-deploy processing processes of a plurality of stream windows, wherein the stream windows respectively correspond to single-quantity index data with various data characteristics.
12. The apparatus of claim 10, wherein the aggregation calculation module is further adapted to: establish a single-quantity index data coordinate system, and map the data in the data wide table into the single-quantity index data coordinate system, wherein the coordinate parameters of the coordinate axes of the coordinate system are, respectively: time, single-quantity index data, and geospatial information;
and invoke the processing process of the corresponding stream window according to the data characteristic of the single-quantity index data, and have the processing process of the stream window perform aggregation calculation on the data in the single-quantity index data coordinate system to obtain the corresponding single-quantity index data.
13. The apparatus of claim 12, wherein the aggregation calculation module is further adapted to: invoke a processing process of a global window according to the real-time characteristic of the single-quantity index data, and have the processing process of the global window perform aggregation calculation on the data in the single-quantity index data coordinate system to obtain the single-quantity index data at each moment, wherein the data in the single-quantity index data coordinate system are in the same global window.
14. The apparatus of claim 12, wherein the aggregation calculation module is further adapted to: invoke a processing process of a global window according to the cumulative same-proportion characteristic of the single-quantity index data, and have the processing process of the global window perform aggregation calculation on the data in the single-quantity index data coordinate system;
and invoke a processing process of a rolling window, and have the processing process of the rolling window perform aggregation calculation on the aggregation result of the processing process of the global window according to a window size parameter to obtain cumulative same-proportion single-quantity index data, wherein the window size parameter of the rolling window is determined according to a first aggregation time.
15. The apparatus of claim 12, wherein the aggregation calculation module is further adapted to: invoke a processing process of a sliding window according to the slicing characteristic of the single-quantity index data, and have the processing process of the sliding window perform aggregation calculation on the data in the single-quantity index data coordinate system according to a window size parameter and a sliding step length to obtain sliced single-quantity index data, wherein the window size parameter of the sliding window is determined according to a second aggregation time.
16. The apparatus of any of claims 10-15, wherein the waybill stream data comprises one or more of the following dimensional data: a waybill number, a waybill state, a waybill creation time, a waybill rejection time, a waybill completion time, a business district ID, a grid ID, and a site ID;
the distribution facility stream data includes one or more of the following dimensional data: a distribution facility ID, a waybill number, and a distribution facility order pickup time.
17. The apparatus of claim 16, wherein the apparatus further comprises: a data completion processing module, adapted to perform data completion processing on the real-time stream data if it is detected that part of the dimensional data of the real-time stream data is missing.
18. The apparatus of any one of claims 10-15, wherein the apparatus further comprises: a data cleaning processing module, adapted to perform data cleaning processing on the real-time stream data if it is detected that fraudulent data, pre-order data, or duplicate data exists in the real-time stream data.
19. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute operations corresponding to the real-time stream data processing method implemented based on a stream window according to any one of claims 1 to 9.
20. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the stream window based real-time stream data processing method according to any one of claims 1 to 9.
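
As an informal illustration of the method claims (not part of the claims themselves), the following Python sketch shows one way the join into a wide table and the dispatch to global, rolling, and sliding windows could look. Everything here is an assumption made for illustration: the names `WideRow`, `join_wide_table`, `compute`, the characteristic keys `"real_time"`, `"cumulative"`, and `"slice"`, the use of a simple per-grid count as the index, and timestamps in integer seconds. The "cumulative" branch is simplified to a plain rolling (tumbling) window and does not reproduce the chaining of a global window into a rolling window described in claim 5.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WideRow:
    """One row of the hypothetical wide table (waybill joined with facility)."""
    waybill_no: str
    grid_id: str                 # stands in for the geospatial axis
    created_at: int              # stands in for the time axis, in seconds
    facility_id: Optional[str]   # None when no facility record matched


def join_wide_table(waybills: List[dict], facility_events: List[dict]) -> List[WideRow]:
    """Left-join waybill records with facility records on the waybill number."""
    by_no = {e["waybill_no"]: e for e in facility_events}
    rows = []
    for w in waybills:
        f = by_no.get(w["waybill_no"])
        rows.append(WideRow(w["waybill_no"], w["grid_id"], w["created_at"],
                            f["facility_id"] if f else None))
    return rows


def global_window(rows: List[WideRow]) -> Dict:
    """All rows fall into one window: total count per grid."""
    counts = defaultdict(int)
    for r in rows:
        counts[r.grid_id] += 1
    return dict(counts)


def rolling_window(rows: List[WideRow], size: int) -> Dict:
    """Non-overlapping windows [start, start + size): count per (grid, start)."""
    counts = defaultdict(int)
    for r in rows:
        start = r.created_at - r.created_at % size
        counts[(r.grid_id, start)] += 1
    return dict(counts)


def sliding_window(rows: List[WideRow], size: int, step: int) -> Dict:
    """Overlapping windows starting at multiples of step, each of length size."""
    counts = defaultdict(int)
    for r in rows:
        # Latest window start that is <= the row's timestamp.
        start = r.created_at - r.created_at % step
        # Walk back over every window [start, start + size) containing the row.
        while start > r.created_at - size:
            counts[(r.grid_id, start)] += 1
            start -= step
    return dict(counts)


def compute(rows: List[WideRow], characteristic: str, size: int = 60, step: int = 30) -> Dict:
    """Pick the window handler from the data characteristic of the index."""
    if characteristic == "real_time":
        return global_window(rows)
    if characteristic == "cumulative":
        return rolling_window(rows, size)
    if characteristic == "slice":
        return sliding_window(rows, size, step)
    raise ValueError(f"unknown data characteristic: {characteristic}")
```

A caller would first build the rows with `join_wide_table(...)` and then call, for example, `compute(rows, "slice", size=60, step=30)`. A production system would more likely rely on the window operators of a stream processing engine than on in-memory dictionaries.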

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010033093.3A CN111210156B (en) 2020-01-13 2020-01-13 Real-time stream data processing method and device based on stream window

Publications (2)

Publication Number Publication Date
CN111210156A CN111210156A (en) 2020-05-29
CN111210156B true CN111210156B (en) 2022-04-01

Family

ID=70788157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010033093.3A Active CN111210156B (en) 2020-01-13 2020-01-13 Real-time stream data processing method and device based on stream window

Country Status (1)

Country Link
CN (1) CN111210156B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800146B (en) * 2021-02-02 2024-05-14 北京互金新融科技有限公司 Backtracking method and device of wind control data, storage medium and processor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957832A (en) * 2009-07-16 2011-01-26 Sap股份公司 Unified window support for the flow of event data management
CN106599182A (en) * 2016-12-13 2017-04-26 飞狐信息技术(天津)有限公司 Feature engineering recommendation method and device based on spark streaming real-time streams and video website
CN109271412A (en) * 2018-09-28 2019-01-25 中国-东盟信息港股份有限公司 The real-time streaming data processing method and system of smart city
CN109408347A (en) * 2018-09-28 2019-03-01 北京九章云极科技有限公司 A kind of index real-time analyzer and index real-time computing technique
CN110058977A (en) * 2019-01-14 2019-07-26 阿里巴巴集团控股有限公司 Monitor control index method for detecting abnormality, device and equipment based on Stream Processing
CN110471944A (en) * 2018-05-11 2019-11-19 北京京东尚科信息技术有限公司 Indicator-specific statistics method, system, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005391A1 (en) * 2006-06-05 2008-01-03 Bugra Gedik Method and apparatus for adaptive in-operator load shedding
IN2013CH01044A (en) * 2013-03-12 2015-08-14 Yahoo Inc
US10931854B2 (en) * 2017-10-26 2021-02-23 Facebook, Inc. Aggregating video streams from cameras based on social connections in an online system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Flink Program Guide(6)--窗口(DataStream API编程指导 -- For Java)";张安;《博客园》;20160816;全文 *

Also Published As

Publication number Publication date
CN111210156A (en) 2020-05-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant