CN117009094B - Data oblique scattering method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117009094B
Authority
CN
China
Prior art keywords
service data
data
scattering
sampling
aggregation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311282068.9A
Other languages
Chinese (zh)
Other versions
CN117009094A (en)
Inventor
杨田
韩丰景
韩勇
蒋晓艺
郑朴汉
王永强
夏百川
吴璟
陈国利
沈梦伶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Online Information Technology Co Ltd
Original Assignee
China Unicom Online Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Online Information Technology Co Ltd filed Critical China Unicom Online Information Technology Co Ltd
Priority to CN202311282068.9A priority Critical patent/CN117009094B/en
Publication of CN117009094A publication Critical patent/CN117009094A/en
Application granted granted Critical
Publication of CN117009094B publication Critical patent/CN117009094B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data skew scattering method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to a configured sampling proportion; when the service data contains the data sample, sampling the service data according to the sampling proportion to obtain service data samples; calculating the overall proportion of each service data sample, based on the number of kinds of service data samples, to obtain a scattering factor; generating a scattering configuration composed of the scattering factors and the service data samples; and performing a first aggregation on the service data samples containing the service data of the scattering configuration, removing the scattering factors after the first aggregation, and performing a second aggregation to obtain an aggregation result. By identifying data skew and supplying scattering factors, the method reduces the impact on business services and offers greater flexibility.

Description

Data oblique scattering method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data distributed real-time computing, and in particular to a data skew scattering method, device, electronic device, and storage medium.
Background
Big data distributed real-time computing refers to processing techniques that use multiple computing devices to process data with high concurrency and low latency in large-data-volume scenarios. It has three features: unbounded data, unbounded data processing, and low latency. Unbounded data is an ever-growing, essentially infinite data set, commonly called "streaming data", as opposed to a finite data set. Unbounded data processing is a continuous processing mode in which a processing engine repeatedly processes unbounded data, which overcomes the bottleneck of engines designed for finite data. Low latency has no precise definition of how much delay is acceptable, but the value of data decreases over time, so timeliness must be continually addressed.
Existing big data real-time computing technologies, such as Spark Streaming (a real-time computing framework built on Spark) and Flink (a distributed stream processing framework), are stream processing frameworks implemented on the basis of the MR (MapReduce) computing model. MapReduce applies the idea of "divide and conquer" and splits data processing into two main steps, Map and Reduce, that is, "task decomposition and result summarization". The MR model can use the resources of many devices to process data in parallel, which enables large-scale real-time processing. Map reads the input split data; each input split is handled by one Map task that executes the Map logic. Reduce groups the Map outputs, applies the Reduce logic to each group, and outputs the result. Data is transferred between Map and Reduce by grouping key, and data with the same key value is summarized and calculated together.
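The Map, grouping-by-key, and Reduce flow described above can be illustrated with a minimal single-process sketch. It is not how Spark Streaming or Flink are implemented; the function names `map_reduce` and `count_by_customer` and the `(customer_id, amount)` record layout are assumptions made only for illustration.

```python
from collections import defaultdict

def map_reduce(splits, map_fn, reduce_fn):
    """Run map_fn on each input split, group the outputs by key, then reduce each group."""
    groups = defaultdict(list)
    for split in splits:                       # one Map task per input split
        for key, value in map_fn(split):
            groups[key].append(value)          # grouping: equal keys end up together
    # one Reduce call per grouping key
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def count_by_customer(split):
    # Hypothetical records of the form (customer_id, amount).
    for customer_id, _amount in split:
        yield customer_id, 1

splits = [[("a", 10), ("b", 5)], [("a", 7), ("a", 1)]]
result = map_reduce(splits, count_by_customer, lambda key, values: sum(values))
print(result)
```

Because every value for a given key is routed to the same Reduce call, a key that dominates the input also dominates the work of whichever node handles it.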
In practical data applications, data is not evenly distributed because service characteristics and customer behaviors differ. Particularly in ToB (serving enterprise users) scenarios, the data of a few customers usually accounts for most of the total. The distribution is then extremely unbalanced, and the MR model suffers a serious data skew problem when processing it. Data skew means that, during the allocation and execution of tasks over a batch of data, the task data is insufficiently dispersed and the data volumes assigned to different computing nodes are unbalanced: one or a few computing nodes receive a huge volume and bear enormous pressure, while the other nodes receive little. In theory, a skewed data distribution conforms to the "80/20 principle".
When data is skewed, most of it is processed by a single node of the distributed computation, so that node's computing pressure becomes excessive, which slows the whole computing task and can even cause it to crash and fail. Existing data skew handling methods scatter fixed data fields and therefore lack flexibility; the program must be restarted whenever the scattered fields are modified, which affects business services to some extent.
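The imbalance described above can be made concrete with a small sketch that assigns records to computing nodes by hashing the grouping key. The CRC32-modulo partitioner, the node count, and the 80/20 key mix are assumptions chosen for illustration, not details taken from the invention.

```python
import zlib
from collections import Counter

def partition(key: str, num_nodes: int) -> int:
    """Assumed partitioner: a stable hash of the grouping key, modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def node_loads(keys, num_nodes):
    """Count how many records each node receives."""
    loads = Counter({node: 0 for node in range(num_nodes)})
    for key in keys:
        loads[partition(key, num_nodes)] += 1
    return loads

# Hypothetical workload: one large customer produces 80% of the records.
keys = ["customer_big"] * 80 + [f"customer_{i}" for i in range(20)]
loads = node_loads(keys, num_nodes=4)
hot_node = partition("customer_big", 4)
print(loads, "hot node:", hot_node)
```

Whatever the hash function, all 80 records of the dominant key land on one node, so that node carries at least 80% of the load while the remaining three share the rest.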
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data skew scattering method, apparatus, electronic device, and storage medium that have high flexibility and can reduce the degree of influence on business services.
The invention provides a data skew scattering method, which comprises the following steps:
acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to configured sampling proportion;
when the service data contains the data sample, sampling the service data according to the sampling proportion to obtain a service data sample;
calculating the overall proportion of each service data sample based on the number of kinds of the service data samples so as to obtain a scattering factor;
generating a scattering configuration formed by the scattering factors and the business data samples according to the scattering factors and the business data samples;
and performing primary aggregation on the business data samples containing the business data of the scattering configuration, removing the scattering factors after the primary aggregation, and performing secondary aggregation to obtain an aggregation result.
In one embodiment, the acquiring a data sample and service data to be perceived, where the data sample is obtained by sampling a configured field to be aggregated according to a configured sampling proportion, includes:
acquiring the field to be aggregated and the sampling proportion through configuration;
and sampling the fields to be aggregated according to the sampling proportion to obtain the data sample, wherein the data sample consists of a plurality of first fields and configuration scattering factors.
In one embodiment, the service data to be perceived is composed of a plurality of second fields;
when the service data contains the data sample, sampling the service data according to the sampling proportion to obtain a service data sample, including:
sampling the plurality of second fields according to the sampling proportion to obtain a plurality of third fields;
and acquiring the service data sample based on the third fields, wherein the third fields jointly form the service data sample.
In one embodiment, after the service data is sampled according to the sampling proportion to obtain a service data sample when the service data includes the data sample, the method further includes:
acquiring the sampling time of the service data sample, a first field value corresponding to the third field and the sampling times of the service data sample;
storing the business data samples according to a first storage format;
the first storage format is a storage format of the sampling time, the first field value and the sampling times.
In one embodiment, the calculating, based on the number of kinds of the service data samples, of the overall proportion of each of the service data samples to obtain a scattering factor includes:
calculating the kind number of the service data samples, and sequencing the service data samples to obtain the service data sample with the highest kind number;
and acquiring the proportion of the service data samples with the highest category number to the category number based on the category number and the service data samples with the highest category number.
In one embodiment, the method further comprises:
judging whether the proportion of the service data samples with the highest category number to the category number exceeds a first threshold value; and
if so, identifying the service data sample with the highest category number as a data skew field.
In one embodiment, the performing the first aggregation on the service data samples including the service data of the scattering configuration, removing the scattering factor after the first aggregation, and performing the second aggregation to obtain an aggregation result includes:
associating the service data with the scattering configuration, and adding the scattering factor to the service data when the service data contains the scattering configuration;
performing aggregation calculation on the business data samples containing the scattering factors to obtain a preliminary aggregation result;
and removing the scattering factors in the business data sample based on the preliminary aggregation result, and performing aggregation calculation again to obtain the aggregation result.
The invention also provides a data skew scattering device, which comprises:
the data acquisition module is used for acquiring data samples and service data to be perceived, wherein the data samples are obtained by sampling configured fields to be aggregated according to configured sampling proportions;
the data sampling module is used for sampling the service data according to the sampling proportion when the service data contains the data sample to obtain a service data sample;
the scattering factor analysis module is used for calculating the overall proportion of each business data sample based on the number of kinds of the business data samples so as to obtain scattering factors;
the scattering configuration generation module is used for generating scattering configuration formed by the scattering factors and the business data samples according to the scattering factors and the business data samples;
and the aggregation calculation module is used for carrying out primary aggregation on the business data samples containing the business data of the scattering configuration, removing the scattering factors after the primary aggregation and carrying out secondary aggregation to obtain an aggregation result.
The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the data skew scattering method according to any one of the above when executing the computer program.
The invention also provides a computer storage medium storing a computer program which, when executed by a processor, implements the data skew scattering method according to any one of the above.
According to the data skew scattering method, device, electronic equipment and storage medium described above, a data sample, obtained by sampling the configured fields to be aggregated according to the configured sampling proportion, and the service data to be perceived are acquired; when the service data to be perceived contains the data sample, the service data is sampled according to the configured sampling proportion to obtain service data samples. Next, the overall proportion of each service data sample is calculated based on the number of kinds of service data samples, and the corresponding scattering factor is obtained. A scattering configuration composed of the scattering factors and the service data samples is then generated. Finally, a first aggregation calculation is performed on the service data samples containing the service data of the scattering configuration, the scattering factors are removed after the first aggregation calculation, and a second aggregation is performed to obtain the final aggregation result, which completes the scattering of the skewed data. Through the configured sampling mechanism, the method senses data skew efficiently and in real time, and the sampling configuration can be changed without restarting the computing task, which reduces the impact on business services. In addition, because it identifies data skew and supplies the scattering factors, the method can run as an independent service and offers greater flexibility.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a data skew scattering method according to the present invention;
FIG. 2 is a second flow chart of the data skew scattering method according to the present invention;
FIG. 3 is a third flow chart of the data skew scattering method according to the present invention;
FIG. 4 is a schematic diagram of a data skew scattering method according to the present invention;
FIG. 5 is a flowchart of a data skew scattering method according to the present invention;
FIG. 6 is a flowchart of a data skew scattering method according to the present invention;
FIG. 7 is a schematic diagram of a data skew scattering method according to the present invention;
FIG. 8 is a flow chart of a data skew scattering method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a data oblique scattering device according to the present invention;
FIG. 10 is an internal structural diagram of a computer device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The data skew scattering method, apparatus, electronic device and storage medium of the present invention are described below with reference to fig. 1 to 10.
As shown in fig. 1, in one embodiment, a data skew scattering method includes the steps of:
Step S110, obtaining a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to a configured sampling proportion.
Specifically, the server acquires a data sample obtained by sampling the configured fields to be aggregated according to the configured sampling proportion and current service data to be perceived.
Step S120, when the service data contains data samples, the service data is sampled according to the sampling proportion, and the service data samples are obtained.
Specifically, when the service data to be perceived obtained in step S110 includes the data sample obtained in step S110, the current service data to be perceived is sampled according to the configured sampling proportion, so as to obtain a corresponding service data sample.
Step S130, calculating the overall proportion of each service data sample based on the number of kinds of the service data samples to obtain a scattering factor.
Specifically, the server calculates the overall proportion corresponding to each service data sample based on the number of kinds of the service data samples obtained in step S120, and thereby obtains the corresponding scattering factor.
Step S140, according to the scattering factors and the business data samples, scattering configuration formed by the scattering factors and the business data samples is generated.
Specifically, the server generates a scattering configuration composed of the scattering factor and the service data sample according to the scattering factor obtained in step S130 and the service data sample obtained in step S120.
Step S150, performing a first aggregation on the service data samples containing the service data of the scattering configuration, removing the scattering factors after the first aggregation, and performing a second aggregation to obtain an aggregation result.
Specifically, the server performs a first aggregation calculation on the service data samples containing the service data of the scattering configuration to obtain a preliminary aggregation result, removes the scattering factors from the service data on the basis of that result, and performs the aggregation calculation again to obtain the final aggregation result, which completes the data skew scattering processing.
According to the data skew scattering method described above, a data sample, obtained by sampling the configured fields to be aggregated according to the configured sampling proportion, and the service data to be perceived are acquired; when the service data to be perceived contains the data sample, the service data is sampled according to the configured sampling proportion to obtain service data samples. Next, the overall proportion of each service data sample is calculated based on the number of kinds of service data samples, and the corresponding scattering factor is obtained. A scattering configuration composed of the scattering factors and the service data samples is then generated. Finally, a first aggregation calculation is performed on the service data samples containing the service data of the scattering configuration, the scattering factors are removed after the first aggregation calculation, and a second aggregation is performed to obtain the final aggregation result, which completes the scattering of the skewed data. Through the configured sampling mechanism, the method senses data skew efficiently and in real time, and the sampling configuration can be changed without restarting the computing task, which reduces the impact on business services. In addition, because it identifies data skew and supplies the scattering factors, the method can run as an independent service and offers greater flexibility.
As shown in fig. 2, in one embodiment of the data skew scattering method provided by the present invention, acquiring a data sample and service data to be perceived, where the data sample is obtained by sampling the configured fields to be aggregated according to the configured sampling proportion, includes the following steps:
Step S210, the fields to be aggregated and the sampling proportion are obtained by configuration.
Specifically, the server first obtains the fields to be aggregated and the sampling proportion by configuration before obtaining the data samples.
Step S220, sampling the fields to be aggregated according to the sampling proportion to obtain a data sample, wherein the data sample is composed of a plurality of first fields and configuration scattering factors.
Specifically, the server samples the fields to be aggregated obtained in step S210 according to a sampling ratio to obtain a required data sample, where the data sample is composed of a plurality of first fields and the scattering factors of the original configuration.
As shown in fig. 3, in one embodiment of the data skew scattering method provided by the present invention, sampling the service data according to the sampling proportion to obtain the service data sample, when the service data includes a data sample, specifically includes the following steps:
Step S122, sampling the plurality of second fields according to the sampling proportion to obtain a plurality of third fields.
It should be noted that the service data to be perceived currently is composed of a plurality of second fields.
Specifically, the server samples the plurality of second fields according to the configured sampling proportion to obtain a plurality of third fields.
In step S124, a service data sample is acquired based on the plurality of third fields, and the plurality of third fields together form the service data sample.
Specifically, the server obtains a service data sample of the service data to be perceived currently based on the plurality of third fields obtained in step S122, where the service data sample is jointly formed by the plurality of third fields.
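One possible reading of steps S122 and S124 is sketched below: each service record is kept with a probability equal to the sampling proportion, and the kept record is projected onto the configured fields to be aggregated. The function name `sample_keys`, the dictionary record layout, and the per-record Bernoulli sampling are assumptions; the text does not specify the sampling algorithm.

```python
import random

def sample_keys(records, key_fields, ratio, rng):
    """Keep each record with probability `ratio` and project it onto the key fields."""
    samples = []
    for record in records:
        if rng.random() < ratio:               # assumed: independent per-record sampling
            samples.append(tuple(record[f] for f in key_fields))
    return samples

rng = random.Random(42)                         # seeded only to make the sketch repeatable
records = [
    {"field1": "a", "field2": "b", "field3": "c", "field4": i}
    for i in range(1000)
]
samples = sample_keys(records, ("field1", "field2", "field3"), ratio=0.01, rng=rng)
print(len(samples), "sampled keys")
```

Only the configured key fields survive in the sample, which keeps the data sent to external storage small compared with the original records.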
As shown in fig. 4, in one embodiment, after the service data is sampled according to the sampling proportion to obtain the service data sample when the service data includes a data sample, the data skew scattering method provided by the present invention further includes the following steps:
Step S410, the sampling time of the service data sample, the first field value corresponding to the third field, and the sampling times of the service data sample are obtained.
Specifically, the server acquires the sampling time of the service data sample, the field value corresponding to the third field, that is, the first field value and the sampling times of the service data sample.
Each third field corresponds to a field value, and the field values corresponding to different third fields are different.
Step S420, storing the service data samples according to the first storage format.
The first storage format is a storage format formed by the sampling time, the first field value and the sampling times obtained in step S410.
Specifically, the server stores the service data sample according to a storage format formed by the sampling time, the first field value and the sampling times.
As shown in fig. 5, in one embodiment of the data skew scattering method provided by the present invention, calculating the overall proportion of each service data sample based on the number of kinds of service data samples to obtain a scattering factor specifically includes the following steps:
Step S132, the kind number of the service data samples is calculated, and the service data samples are sorted to obtain the service data sample with the highest kind number.
Specifically, the server obtains the category number of the service data samples through calculation, and sorts the service data samples to obtain the service data sample with the highest category number.
In step S134, based on the number of categories and the service data samples with the highest number of categories, the ratio of the service data samples with the highest number of categories to the number of categories is obtained.
Specifically, the server obtains the proportion of the service data sample with the highest category number to the total category number based on the total category number of the service data samples and the service data sample with the highest category number obtained in step S132.
As shown in fig. 6, in one embodiment, the data skew scattering method provided by the present invention further includes the following steps:
Step S610, it is determined whether the ratio of the service data samples with the highest category number to the category number exceeds the first threshold.
Specifically, the server determines whether the proportion of the service data samples with the highest category number to the total category number of the service data samples exceeds the set threshold.
In step S620, the service data sample with the highest category number is identified as the data skew field.
Specifically, when the judging result in step S610 is that the proportion of the service data samples with the highest corresponding category number to the total category number of the service data samples exceeds the set threshold, the service data sample with the highest category number is identified as the data skew field.
As shown in fig. 7, in one embodiment of the data skew scattering method provided by the present invention, performing a first aggregation on the service data samples containing the service data of the scattering configuration, removing the scattering factors after the first aggregation, and performing a second aggregation to obtain an aggregation result specifically includes the following steps:
Step S152, the service data is associated with the scattering configuration, and a scattering factor is added to the service data when the service data includes the scattering configuration.
Specifically, during the aggregation calculation, the server associates the service data with the scattering configuration generated in advance, and adds the previously determined scattering factor to the service data when the service data contains the scattering configuration.
And step S154, performing aggregation calculation on the business data samples containing the scattering factors to obtain a preliminary aggregation result.
Specifically, the server performs aggregation calculation on the business data samples containing the scattering factors to obtain a preliminary aggregation result.
Step S156, based on the preliminary aggregation result, removing the scattering factors in the service data sample, and performing aggregation calculation again to obtain an aggregation result.
Specifically, the server removes the scattering factor in the service data sample based on the preliminary aggregation result obtained in step S154, and performs aggregation calculation on the service data sample again to obtain a final aggregation result.
Referring to fig. 8, in a specific embodiment, the data skew scattering method provided by the present invention works as follows. First, a real-time skew sensing service is provided: the sampling KEY configuration specifies the aggregation KEY to be sensed and the sampling proportion; while processing service data, each computing node samples the corresponding KEY in proportion and outputs the samples to external storage for further analysis. Second, an intelligent scattering analysis service is provided: scattering factor analysis summarizes the sampling results of all computing nodes into a global data distribution, calculates the data scattering factors, and generates from them a scattering configuration interface that computing tasks can access. Scattering KEY matching: the computing task accesses the scattering interface and queries the scattering factors corresponding to different data applications. Finally, scattered aggregation performs a first aggregation on the data keyed by the aggregation KEY plus the scattering factor, which reduces the data volume and the degree of skew; removal aggregation then strips the scattering factor from the scattered-aggregation result, which still carries it, and performs a second aggregation calculation to form the final result. For specific scenarios in which a single round of scattering is not enough to eliminate the data skew, multi-level scattering applies scattering in several stages.
The service data is the original service data to be processed, consisting of, for example, field 1, field 2, field 3, field 4, field 5, and so on. The sampling KEY is the set of fields to be aggregated, configured according to the service logic, for example: configuration sample [KEY(field 1, field 2, field 3), RATIO(0.01)], where RATIO(0.01) is the configured scattering factor. At data input, the computing task reads the sampling configuration KEY and the original service data.
During data sampling, the computing task extracts the configured proportion of the service data KEY (field 1, field 2, field 3) according to the sampling configuration. The sampled data is then stored in a format such as [TIME, KEY(field 1 value, field 2 value, field 3 value), COUNT]. TIME is the sampling time; sampled data is saved at 1-minute granularity, so the records show how the data skew changes over time. KEY is the combination of field values to be aggregated, for example KEY(field 1-value a, field 2-value b, field 3-value c). COUNT is the number of times the KEY was sampled; counts for the same KEY within the same TIME are accumulated, and they indicate the degree of skew of the different KEY values.
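The [TIME, KEY, COUNT] storage format can be sketched as follows, assuming the sampled keys arrive with a timestamp. The helper names `minute_bucket` and `accumulate`, the timestamp string format, and the example values are illustrative assumptions rather than details given in the text.

```python
from collections import Counter
from datetime import datetime

def minute_bucket(ts: datetime) -> str:
    """Truncate a timestamp to 1-minute granularity (TIME)."""
    return ts.strftime("%Y-%m-%d %H:%M")

def accumulate(sampled):
    """Turn (timestamp, key) samples into rows of [TIME, KEY, COUNT]."""
    counts = Counter((minute_bucket(ts), key) for ts, key in sampled)
    return [[time, key, count] for (time, key), count in sorted(counts.items())]

# Hypothetical sampled keys with their sampling timestamps.
sampled = [
    (datetime(2023, 9, 28, 10, 0, 5), ("a", "b", "c")),
    (datetime(2023, 9, 28, 10, 0, 40), ("a", "b", "c")),
    (datetime(2023, 9, 28, 10, 0, 59), ("x", "y", "z")),
    (datetime(2023, 9, 28, 10, 1, 2), ("a", "b", "c")),
]
rows = accumulate(sampled)
for row in rows:
    print(row)
```

Samples of the same KEY inside the same minute collapse into one row with an accumulated COUNT, so comparing rows across minutes shows how the skew evolves.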
In the scattering factor analysis, the number of distinct sampled KEYs is calculated, denoted DISTINCT COUNT(KEY) = S, and the total count is calculated, denoted SUM(COUNT) = C. The sampled and stored KEYs are sorted by COUNT, and the top N KEYs with the highest COUNT are taken, where N = S × 5%; the sum of their counts is denoted SUM(TOPN.COUNT) = T. If T/C > 60% (a set threshold), these KEYs are identified as data-skewed KEYs. The share of each such KEY is then calculated as TOPN.COUNT/T = F and used as its scattering factor, e.g. F(KEY1, 60%) = 60, F(KEY2, 40%) = 40.
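The analysis above can be sketched as a short Python function. The symbols S, C, N, T and F follow the text; everything else is an assumption made for illustration: the function name `analyze_skew`, rounding N down with a minimum of 1, and expressing F as an integer percentage (which matches the example F(KEY1, 60%) = 60 but is not spelled out in the source).

```python
def analyze_skew(key_counts, top_percent=5, threshold=0.60):
    """Return {KEY: scattering factor} for skewed KEYs, or {} if no skew is found.

    key_counts : dict mapping each sampled KEY to its accumulated COUNT
    """
    s = len(key_counts)           # S: number of distinct sampled KEYs
    c = sum(key_counts.values())  # C: total sampled count
    if s == 0 or c == 0:
        return {}
    # N = S x 5%; flooring with a minimum of 1 is an assumption.
    n = max(1, (s * top_percent) // 100)
    top = sorted(key_counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    t = sum(count for _, count in top)  # T: total count of the top-N KEYs
    if t / c <= threshold:
        return {}  # the top-N KEYs do not dominate: no skew detected
    # F: share of each top KEY within T, as an integer percentage (assumption).
    return {key: max(1, round(100 * count / t)) for key, count in top}
```

With 40 distinct KEYs, of which two have counts 600 and 400 and the rest 10 each, N is 2, T/C is about 72%, and the two hot KEYs receive factors 60 and 40.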
In the scattering KEY configuration, a configuration format such as [KEY(field 1 value, field 2 value, field 3 value), RATIO] is used, where KEY represents the field values to be aggregated, e.g. KEY(field 1-value a, field 2-value b, field 3-value c), and RATIO represents the scattering factor.
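One way to represent such a configuration in memory is sketched below in Python. The class `BreakupEntry`, the helpers `build_config` and `match_factor`, and the exact text rendering are hypothetical; the source only specifies that each entry pairs a KEY (the aggregated field values) with a RATIO (the scattering factor).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakupEntry:
    key: tuple  # aggregation field values, e.g. ("a", "b", "c")
    ratio: int  # scattering factor assigned to this KEY

    def render(self):
        # Text form modelled on the [KEY(...), RATIO] layout in the description.
        return f"[KEY({', '.join(map(str, self.key))}), {self.ratio}]"


def build_config(factors):
    """Turn {KEY: factor} analysis output into a lookup table of entries."""
    return {key: BreakupEntry(key, ratio) for key, ratio in factors.items()}


def match_factor(config, row, key_fields):
    """Return the scattering factor for a service record, or None if its KEY is not skewed."""
    key = tuple(row[f] for f in key_fields)
    entry = config.get(key)
    return entry.ratio if entry else None
```

A computing task would refresh this table from the scattering interface and call `match_factor` for each incoming record; records whose KEY is absent from the table are aggregated without scattering.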
During aggregation KEY calculation, the service data is associated and matched with the scattering configuration; if the service data contains a scattering KEY, the scattering logic is applied to it:
Adding the scattering factor: using the scattering factor matched through the aggregation KEY of the service data, a random algorithm generates a scattered KEY, for example KEY(field 1-value a, field 2-value b, field 3-value c) → KEY(field 1-value a, field 2-value b, field 3-value c, RANDOM(0, RATIO)), so that the original KEY is randomly distributed over the RANDOM(0, RATIO) interval.
Scattering aggregation: an aggregation calculation is performed on the data carrying the scattering factor to obtain a preliminary aggregation result. Because the added scattering factor distributes the data randomly in the preliminary aggregation, the aggregation calculation no longer suffers from data skew, and the data volume is effectively reduced after aggregation.
Scattering removal aggregation: because the result KEYs of the preliminary aggregation carry the scattering factor, the scattering factor is removed from the KEY during the secondary aggregation to restore the original service KEY, i.e. KEY(field 1-value a, field 2-value b, field 3-value c, RANDOM(0, RATIO)) → KEY(field 1-value a, field 2-value b, field 3-value c), and the aggregation calculation is performed again to obtain the aggregation result required by the service logic. Because it operates on the preliminary aggregation result, the secondary aggregation processes a further reduced data volume and effectively avoids data skew.
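The three steps above (adding the factor, aggregating on the scattered KEY, then removing the factor and aggregating again) can be sketched in a single process as follows. This is an illustrative sketch only: the function name `salted_two_stage_sum`, the choice of summation as the aggregate, and the reading of RANDOM(0, RATIO) as an integer drawn from 0 to RATIO - 1 are assumptions, and a real computing task would run each stage as a distributed shuffle rather than a local loop.

```python
import random
from collections import defaultdict


def salted_two_stage_sum(rows, factors, rng=None):
    """Two-stage aggregation with random scattering of skewed KEYs.

    rows    : iterable of (key, value) pairs
    factors : {key: scattering factor} for KEYs identified as skewed
    Returns (preliminary result keyed by (key, salt), final result keyed by key).
    """
    rng = rng or random.Random()
    # Stage 1: append a random salt to skewed KEYs and aggregate per salted KEY.
    partial = defaultdict(int)
    for key, value in rows:
        factor = factors.get(key)
        salt = rng.randrange(factor) if factor else 0  # non-skewed KEYs keep salt 0
        partial[(key, salt)] += value
    # Stage 2: drop the salt to restore the original KEY and aggregate again.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(partial), dict(final)
```

Because the scattered KEY is what the first stage groups on, a hot KEY with factor F is split into at most F groups, and the second stage only has to combine at most F partial results per KEY.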
In this data skew scattering method, data skew sensing provides efficient real-time skew sensing through the configured sampling interface, and the sampling configuration can be changed without restarting the computing task, which greatly improves the availability of the computing task. Scattering factor analysis identifies data skew through a specific analysis method and outputs the scattering factors; it can run as an independent service, which makes the analysis algorithm easy to update. For data skew scattering, the scattering factors are obtained in real time through the configured scattering interface, so different scattering factors can be applied in different periods, which effectively keeps computing tasks stable during peak periods and copes with traffic surges.
The data skew scattering device provided by the present invention is described below; the device described below and the method described above may be referred to in correspondence with each other.
As shown in fig. 9, in one embodiment, a data skew scattering apparatus includes a data acquisition module 910, a data sampling module 920, a scattering factor analysis module 930, a scattering configuration generation module 940, and an aggregation calculation module 950.
The data acquisition module 910 is configured to acquire a data sample and service data to be perceived, where the data sample is obtained by sampling a configured field to be aggregated according to a configured sampling proportion.
The data sampling module 920 is configured to sample the service data according to a sampling ratio when the service data includes a data sample, so as to obtain the service data sample.
The scattering factor analysis module 930 is configured to calculate the overall proportion of each service data sample based on the number of kinds of service data samples, so as to obtain a scattering factor.
The scattering configuration generation module 940 is configured to generate, from the scattering factor and the service data sample, a scattering configuration jointly formed by the two.
The aggregation calculation module 950 is configured to perform primary aggregation on the service data samples containing the service data of the scattering configuration, remove the scattering factor after the primary aggregation, and perform secondary aggregation to obtain an aggregation result.
In this embodiment, the data skew scattering device provided by the present invention further includes a data configuration module, configured to:
acquire the fields to be aggregated and the sampling proportion through configuration;
sample the fields to be aggregated according to the sampling proportion to obtain a data sample, wherein the data sample is composed of a plurality of first fields and configuration scattering factors.
In this embodiment, the data sampling module of the data skew scattering device provided by the present invention is specifically configured to:
sample the plurality of second fields that make up the service data according to the sampling proportion to obtain a plurality of third fields;
acquire a service data sample based on the plurality of third fields, wherein the plurality of third fields jointly form the service data sample.
In this embodiment, the data skew scattering device provided by the present invention further includes a data storage module, configured to:
acquire the sampling time of the service data sample, the first field value corresponding to the third field, and the number of times the service data sample was sampled;
store the service data sample according to a first storage format,
wherein the first storage format consists of the sampling time, the first field value and the sampling times.
In this embodiment, the scattering factor analysis module of the data skew scattering device provided by the present invention is specifically configured to:
calculate the category number of the service data samples, and sort the service data samples to obtain the service data samples with the highest category number;
acquire, based on the category number and the service data samples with the highest category number, the proportion of the service data samples with the highest category number to the category number.
In this embodiment, the data skew scattering device provided by the present invention further includes a data skew identification module, configured to:
determine whether the proportion of the service data samples with the highest category number to the category number exceeds a first threshold; and
if so, identify the service data sample with the highest category number as a data skew field.
In this embodiment, the aggregation calculation module of the data skew scattering device provided by the present invention is specifically configured to:
associate the service data with the scattering configuration, and add the scattering factor to the service data when the service data contains the scattering configuration;
perform an aggregation calculation on the service data samples containing the scattering factor to obtain a preliminary aggregation result;
remove the scattering factor from the service data samples based on the preliminary aggregation result, and perform the aggregation calculation again to obtain the aggregation result.
Fig. 10 illustrates a physical structure diagram of an electronic device, which may be an intelligent terminal, and an internal structure diagram thereof may be as shown in fig. 10. The electronic device includes a processor, an internal memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a data skew breaking method, the method comprising:
acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to configured sampling proportion;
when the service data contains a data sample, sampling the service data according to a sampling proportion to obtain the service data sample;
calculating the overall duty ratio of each service data sample based on the variety number of the service data samples to obtain a scattering factor;
according to the scattering factors and the business data samples, generating scattering configuration formed by the scattering factors and the business data samples together;
and performing primary aggregation on the service data samples containing the scattered service data, removing scattering factors after the primary aggregation, and performing secondary aggregation to obtain an aggregation result.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the electronic device to which the present inventive arrangements are applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In another aspect, the present invention also provides a computer storage medium storing a computer program, which when executed by a processor, implements a data tilting and scattering method, the method including:
acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to configured sampling proportion;
when the service data contains a data sample, sampling the service data according to a sampling proportion to obtain the service data sample;
calculating the overall duty ratio of each service data sample based on the variety number of the service data samples to obtain a scattering factor;
according to the scattering factors and the business data samples, generating scattering configuration formed by the scattering factors and the business data samples together;
and performing primary aggregation on the service data samples containing the scattered service data, removing scattering factors after the primary aggregation, and performing secondary aggregation to obtain an aggregation result.
In yet another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A processor of an electronic device reads the computer instructions from a computer readable storage medium, the processor executing the computer instructions to implement a data tilting and scattering method, the method comprising:
acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to configured sampling proportion;
when the service data contains a data sample, sampling the service data according to a sampling proportion to obtain the service data sample;
calculating the overall duty ratio of each service data sample based on the variety number of the service data samples to obtain a scattering factor;
according to the scattering factors and the business data samples, generating scattering configuration formed by the scattering factors and the business data samples together;
and performing primary aggregation on the service data samples containing the scattered service data, removing scattering factors after the primary aggregation, and performing secondary aggregation to obtain an aggregation result.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory.
By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The foregoing examples express only several embodiments of the invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (8)

1. A method of oblique scattering of data, the method comprising:
acquiring a data sample and service data to be perceived, wherein the data sample is obtained by sampling configured fields to be aggregated according to configured sampling proportion;
when the service data contains the data sample, sampling the service data according to the sampling proportion to obtain a service data sample;
calculating the overall duty ratio of each service data sample based on the variety number of the service data samples so as to obtain a scattering factor;
generating a scattering configuration formed by the scattering factors and the business data samples according to the scattering factors and the business data samples;
performing primary aggregation on service data samples containing the scattered service data, removing the scattered factors after the primary aggregation, and performing secondary aggregation to obtain an aggregation result;
wherein the acquiring of a data sample and service data to be perceived, the data sample being obtained by sampling configured fields to be aggregated according to a configured sampling proportion, comprises:
acquiring the field to be aggregated and the sampling proportion through configuration;
sampling the fields to be aggregated according to the sampling proportion to obtain the data sample, wherein the data sample is composed of a plurality of first fields and configuration scattering factors;
wherein the performing of primary aggregation on the service data samples containing the service data of the scattering configuration, removing the scattering factor after the primary aggregation, and performing secondary aggregation to obtain an aggregation result comprises:
associating the service data with a break-up configuration, and adding the break-up factor to the service data when the service data contains the break-up configuration;
performing aggregation calculation on the business data samples containing the scattering factors to obtain a preliminary aggregation result;
and removing the scattering factors in the business data sample based on the preliminary aggregation result, and performing aggregation calculation again to obtain the aggregation result.
2. The data oblique scattering method as claimed in claim 1, wherein the service data to be perceived is composed of a plurality of second fields together;
when the service data contains the data sample, sampling the service data according to the sampling proportion to obtain a service data sample, including:
sampling the plurality of second fields according to the sampling proportion to obtain a plurality of third fields;
and acquiring the service data sample based on the third fields, wherein the third fields jointly form the service data sample.
3. The data oblique scattering method according to claim 2, wherein after sampling the service data according to the sampling proportion to obtain a service data sample when the service data contains the data sample, the method further comprises:
acquiring the sampling time of the service data sample, a first field value corresponding to the third field and the sampling times of the service data sample;
storing the business data samples according to a first storage format;
the first storage format is a storage format of the sampling time, the first field value and the sampling times.
4. The data oblique scattering method of claim 1, wherein calculating the overall duty ratio of each of the service data samples based on the number of kinds of the service data samples to obtain the scattering factor comprises:
calculating the kind number of the service data samples, and sequencing the service data samples to obtain the service data sample with the highest kind number;
and acquiring the proportion of the service data samples with the highest category number to the category number based on the category number and the service data samples with the highest category number.
5. The method of oblique scattering of data of claim 4, further comprising:
judging whether the proportion of the service data samples with the highest category number to the category number exceeds a first threshold; and
if so, identifying the service data sample with the highest category number as a data skew field.
6. A data skew scattering apparatus, the apparatus comprising:
the data acquisition module is used for acquiring data samples and service data to be perceived, wherein the data samples are obtained by sampling configured fields to be aggregated according to configured sampling proportions;
the data sampling module is used for sampling the service data according to the sampling proportion when the service data contains the data sample to obtain a service data sample;
the scattering factor analysis module is used for calculating the overall duty ratio of each business data sample based on the variety number of the business data samples so as to obtain scattering factors;
the scattering configuration generation module is used for generating scattering configuration formed by the scattering factors and the business data samples according to the scattering factors and the business data samples;
the aggregation calculation module is used for carrying out primary aggregation on the business data samples containing the business data of the scattering configuration, removing the scattering factors after the primary aggregation and carrying out secondary aggregation to obtain an aggregation result;
wherein the acquiring of a data sample and service data to be perceived, the data sample being obtained by sampling configured fields to be aggregated according to a configured sampling proportion, comprises:
acquiring the field to be aggregated and the sampling proportion through configuration;
sampling the fields to be aggregated according to the sampling proportion to obtain the data sample, wherein the data sample is composed of a plurality of first fields and configuration scattering factors;
wherein the performing of primary aggregation on the service data samples containing the service data of the scattering configuration, removing the scattering factor after the primary aggregation, and performing secondary aggregation to obtain an aggregation result comprises:
associating the service data with a break-up configuration, and adding the break-up factor to the service data when the service data contains the break-up configuration;
performing aggregation calculation on the business data samples containing the scattering factors to obtain a preliminary aggregation result;
and removing the scattering factors in the business data sample based on the preliminary aggregation result, and performing aggregation calculation again to obtain the aggregation result.
7. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
8. A computer storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 5.
CN202311282068.9A 2023-10-07 2023-10-07 Data oblique scattering method and device, electronic equipment and storage medium Active CN117009094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311282068.9A CN117009094B (en) 2023-10-07 2023-10-07 Data oblique scattering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117009094A CN117009094A (en) 2023-11-07
CN117009094B true CN117009094B (en) 2024-02-23

Family

ID=88573016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311282068.9A Active CN117009094B (en) 2023-10-07 2023-10-07 Data oblique scattering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117009094B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066612A (en) * 2017-05-05 2017-08-18 郑州云海信息技术有限公司 A kind of self-adapting data oblique regulating method operated based on SparkJoin
CN109471862A (en) * 2018-11-12 2019-03-15 北京懿医云科技有限公司 Data processing method and device, electronic equipment, storage medium
CN109684401A (en) * 2018-12-30 2019-04-26 北京金山云网络技术有限公司 Data processing method, device and system
CN111352924A (en) * 2020-02-28 2020-06-30 中国工商银行股份有限公司 Method and device for solving data tilt problem
CN114064260A (en) * 2020-08-05 2022-02-18 北京金山云网络技术有限公司 Data de-tilting method and device, electronic equipment and storage medium
CN114490160A (en) * 2022-01-29 2022-05-13 中国农业银行股份有限公司 Method, device, equipment and medium for automatically adjusting data tilt optimization factor
CN115562861A (en) * 2022-09-29 2023-01-03 北京京东振世信息技术有限公司 Method and apparatus for data processing for data skew
WO2023045295A1 (en) * 2021-09-27 2023-03-30 北京沃东天骏信息技术有限公司 Data skew processing method, device, storage medium, and program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
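Short-term load forecasting based on Spark and a gradient boosting tree model; 许贤泽; 刘静; 施元; 谭盛煌; Journal of Huazhong University of Science and Technology (Natural Science Edition) (No. 05); full text *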

Also Published As

Publication number Publication date
CN117009094A (en) 2023-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant