CN114546652A - Parameter estimation method and device and electronic equipment - Google Patents


Info

Publication number
CN114546652A
CN114546652A (Application CN202210171505.9A)
Authority
CN
China
Prior art keywords
value
parallelism
data
processed
flink
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210171505.9A
Other languages
Chinese (zh)
Inventor
王际超
周明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202210171505.9A priority Critical patent/CN114546652A/en
Publication of CN114546652A publication Critical patent/CN114546652A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A parameter estimation method, a parameter estimation apparatus, and an electronic device are provided. The method comprises: obtaining N Flink subtasks; processing data to be processed based on the parallelism in each Flink subtask; obtaining, for each Flink subtask, the actual parallelism with which that subtask processes the data to be processed and the resource allocation rate of the subtask, thereby obtaining N resource allocation rates; and screening out, from the N resource allocation rates, a target resource allocation rate and the actual parallelism of the Flink subtask corresponding to the target resource allocation rate. By estimating the number value of the parallelism of each Flink subtask in advance, the method ensures that enough parallel instances are available to process the data to be processed; by screening the target resource allocation rate out of the N resource allocation rates, it ensures that the screened target resource allocation rate is the optimal parameter, which further improves the performance of stream computing and achieves the purpose of optimizing stream computing.

Description

Parameter estimation method and device and electronic equipment
Technical Field
The present application relates to the field of distributed computing technologies, and in particular, to a parameter estimation method and apparatus, and an electronic device.
Background
With the development of big-data processing technology, the traditional data processing flow first collects data and then stores it in a database; users then query the database to obtain answers or related data. However, in some real-time search application environments, the traditional flow must search the database for the data related to a query, and when the amount of data in the database is too large or the number of querying users is too high, query results cannot be returned in real time. The timeliness of data query under the traditional data processing flow is therefore poor.
In order to process data generated in the network in real time, a new data computation paradigm, stream computing, was introduced. Stream computing analyzes large-scale streaming data in real time while the data is constantly in motion, capturing useful information; its response time is at the second level, achieving real-time data processing.
In order to optimize stream computing, a Flink-based stream computing performance optimization system and method are currently adopted.
First, Flink (Apache Flink) is a distributed processing engine. A Flink cluster contains a plurality of task managers, and operations such as data reading and computation run inside the task managers as distributed tasks. When a task manager performs data computation, it returns the obtained resource allocation rate, parallelism, and data computation result to the calling end after the computation is completed.
The principle of distributed computation by the task manager is shown in fig. 1: each task manager contains a plurality of Slots, and a Slot is the minimum unit of computation. The operators to be computed (covering data reading, mapping, grouping, storing, and so on) are divided among the Slots according to the parallelism of each operator, and the Slots execute the computation in order. After several operators have been executed on one Slot, the data is distributed to the next Slot for further execution according to the requirements of data grouping, until the result is finally output or saved. Several threads can execute concurrently in each Slot to ensure the efficiency of Flink's distributed operation.
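The Slot mechanism described above can be illustrated with a toy simulation. This is only a sketch: the round-robin placement and all names here are illustrative assumptions, not Flink's actual scheduler.

```python
# Toy model of how parallel operator instances are spread across Slots.
# Real Flink performs this scheduling internally; names are illustrative.

def assign_to_slots(operators, num_slots):
    """operators: list of (name, parallelism) pairs.
    Returns a dict mapping slot index -> list of (name, instance) pairs."""
    slots = {i: [] for i in range(num_slots)}
    for name, parallelism in operators:
        for instance in range(parallelism):
            # Each parallel instance of an operator lands on one Slot,
            # here simply chosen round-robin by instance index.
            slots[instance % num_slots].append((name, instance))
    return slots

pipeline = [("source", 2), ("map", 4), ("sink", 2)]
layout = assign_to_slots(pipeline, num_slots=4)
for slot_id, instances in layout.items():
    print(slot_id, instances)
```

Note that the higher an operator's parallelism, the more Slots its instances occupy, which is why the number value of parallelism and the number value of Slots are treated as equivalent later in this description.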
Currently, the stream computing performance optimization method collects the performance indexes of a Flink task in real time in the production environment, and dynamically adjusts the parallelism, the number of Slots on each task manager, and the size of the memory allocated to each task manager according to the collected indexes.
In this method, indexes are collected in the production environment, the collected data is analyzed, and the analysis result is used to adjust the running parameters of the task. The configuration of the optimal running parameters therefore lags behind the demand of actual data processing, so the running parameters derived from the analysis result differ considerably from the optimal running parameters of the task at the current moment. In addition, during index collection, when too much data needs to be collected, the analysis result cannot be obtained in time; the task's running parameters then cannot be adjusted, data backlogs build up, feedback is not real-time, and the optimization effect is poor.
Disclosure of Invention
The application provides a parameter estimation method, a parameter estimation apparatus, and an electronic device. N Flink subtasks are set, and the data to be processed is processed with the parallelism of each Flink subtask. The actual number value of the parallelism with which each Flink subtask processes the data to be processed, and the resource allocation rate corresponding to each Flink subtask, are obtained. From the N resource allocation rates, the target resource allocation rate and the actual parallelism of the Flink subtask that processes the data to be processed corresponding to the target resource allocation rate are determined. This ensures that the obtained target resource allocation rate and actual parallelism are the optimal parameters, so that stream computing can be optimized.
In a first aspect, the present application provides a parameter estimation method, including:
obtaining N Flink subtasks, wherein N is a positive integer, each Flink subtask has a plurality of parallelism, and the number values of the parallelism differ between Flink subtasks;
processing data to be processed based on the parallelism in each Flink subtask, obtaining the actual parallelism with which each Flink subtask processes the data to be processed and the resource allocation rate of that Flink subtask, and thereby obtaining N resource allocation rates, wherein the resource allocation rate is the percentage of device resources occupied by all the parallelism used to process the data to be processed in each Flink subtask;
and screening out a target resource allocation rate and the actual parallelism of the Flink subtasks corresponding to the target resource allocation rate from the N resource allocation rates.
In one possible design, screening out a target resource allocation rate and an actual parallelism of a Flink subtask corresponding to the target resource allocation rate from N resource allocation rates includes:
acquiring the resource allocation rates of the N Flink subtasks, and detecting whether each resource allocation rate is within a preset range;
if so, taking the minimum resource allocation rate within a preset range as a target resource allocation rate, and acquiring the actual parallelism of the Flink subtasks corresponding to the target resource allocation rate;
if not, extracting a target endpoint value in the preset range, calculating a difference value between the resource allocation rate and the target endpoint value, taking the resource allocation rate corresponding to the minimum difference value as the target resource allocation rate, and acquiring the actual parallelism of the Flink subtasks corresponding to the target resource allocation rate.
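The screening step of this design can be sketched as follows. The function name, preset range, and sample rates are hypothetical; the rule itself follows the two branches above (smallest rate inside the preset range, otherwise the rate nearest a range endpoint).

```python
def screen_target(rates, lo, hi):
    """Pick the target resource allocation rate from N candidate rates.

    If any rate falls inside the preset range [lo, hi], take the smallest
    such rate; otherwise take the rate with the smallest difference to a
    range endpoint. Returns (index, rate) so the caller can look up the
    actual parallelism of the matching Flink subtask.
    """
    in_range = [(r, i) for i, r in enumerate(rates) if lo <= r <= hi]
    if in_range:
        rate, idx = min(in_range)
    else:
        # Distance to the nearer endpoint of the preset range.
        rate, idx = min(((r, i) for i, r in enumerate(rates)),
                        key=lambda p: min(abs(p[0] - lo), abs(p[0] - hi)))
    return idx, rate

rates = [0.95, 0.62, 0.71, 0.55]
print(screen_target(rates, 0.60, 0.80))  # smallest rate inside [0.60, 0.80]
```

With the sample rates above, 0.62 and 0.71 fall inside the range, so the smaller of the two is selected as the target resource allocation rate.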
In one possible design, obtaining the actual parallelism of the Flink subtasks processing the data to be processed in each Flink subtask includes:
reading the current number value of the parallelism processing the data to be processed in each Flink subtask, and obtaining the backpressure index corresponding to the parallelism at the current number value;
and adjusting the current number values of all the parallelism processing the data to be processed according to the relationship between the backpressure index and a preset threshold until the backpressure index is lower than the preset threshold, and taking the current number value of the parallelism that conforms to a first preset rule as the actual number value.
In one possible design, taking a current quantity value of the parallelism that conforms to the first preset rule as the actual quantity value includes:
reading a first quantity value interval corresponding to each Flink subtask, and updating the first quantity value interval based on the current quantity values of all the parallelism processing the data to be processed;
and when the minimum quantity value of the first quantity value interval is not 1 and the difference value between the maximum quantity value and the minimum quantity value in the first quantity value interval does not exceed 1, calculating the actual parallelism of the Flink subtasks.
In one possible design, taking a current quantity value of the parallelism that conforms to the first preset rule as the actual quantity value includes:
obtaining back pressure indexes of all parallelism degrees of processing data to be processed in different first quantity value intervals;
and arranging the back pressure indexes according to a preset arrangement sequence, selecting a first quantity value interval corresponding to the minimum back pressure index, and calculating the actual parallelism of all the Flink subtasks for processing the data to be processed based on the first quantity value interval.
In one possible design, adjusting the current quantity values of the parallelism according to the relationship between the backpressure index and a preset threshold includes:
detecting whether the back pressure indexes of all parallelism degrees of the data to be processed exceed a preset threshold value or not;
if so, acquiring current quantity values corresponding to all the parallelism degrees of the data to be processed, and adjusting the current quantity values of all the parallelism degrees of the data to be processed based on a second preset rule;
if not, when it is determined that the number values of all the parallelism processing the data to be processed are greater than 1 and the difference between the maximum quantity value and the minimum quantity value in the first quantity value interval is greater than 1, adjusting the current quantity values of all the parallelism processing the data to be processed based on a third preset rule.
In one possible design, adjusting the current quantity values of all the parallelism processing the data to be processed based on the second preset rule includes:
analyzing a maximum quantity value and a minimum quantity value in a first quantity value interval corresponding to the Flink subtask;
covering the minimum quantity value with the current quantity value of the parallelism of the data to be processed;
and adding the maximum quantity value and the current quantity value to obtain a first added quantity value, and taking half of the first added quantity value as the current quantity value.
In one possible design, adding the maximum quantity value to the current quantity value to obtain the first added quantity value includes:
if the maximum quantity value or the current quantity value is an odd number, adding 1 to the odd value and then adding the two values to obtain the first added quantity value;
if the maximum quantity value and the current quantity value are both even numbers, adding the maximum quantity value and the current quantity value to obtain the first added quantity value.
In one possible design, adjusting the current quantity values of all the parallelism processing the data to be processed based on a third preset rule includes:
analyzing a maximum quantity value and a minimum quantity value in a first quantity value interval corresponding to the Flink subtask;
covering the maximum quantity value with the current quantity value of the parallelism of the data to be processed;
and adding the current quantity value and the minimum quantity value to obtain a second added quantity value, and taking half of the second added quantity value as the current quantity value.
In one possible design, adding the current quantity value to the minimum quantity value to obtain the second added quantity value includes:
if the minimum quantity value or the current quantity value is an odd number, adding 1 to the odd value and then adding the two values to obtain the second added quantity value;
and if the minimum quantity value and the current quantity value are both even numbers, adding the minimum quantity value and the current quantity value to obtain the second added quantity value.
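Taken together, the second and third preset rules implement a binary search over the first quantity value interval, driven by the backpressure index. The sketch below is a hedged reconstruction: the backpressure probe is a stand-in for whatever measurement the Flink task exposes, and the loop cap and convergence test are added for safety rather than taken from the claims.

```python
def adjust_parallelism(lo, hi, cur, measure_backpressure, threshold, max_rounds=32):
    """Binary-search the parallelism quantity value inside [lo, hi].

    measure_backpressure(p) is assumed to return the backpressure index
    observed when processing with parallelism p. If backpressure exceeds
    the threshold, parallelism is too low: cover the minimum with the
    current value and move to half of (max + current) (second rule).
    Otherwise cover the maximum and move to half of (current + min)
    (third rule). Odd values are bumped by 1 before summing, as in the
    claims, so the halved result stays an integer.
    """
    def midpoint(a, b):
        a += a % 2  # add 1 to an odd value before summing
        b += b % 2
        return (a + b) // 2

    for _ in range(max_rounds):
        if measure_backpressure(cur) > threshold:
            lo = cur                 # second rule: minimum := current
            new = midpoint(hi, cur)  # half of (maximum + current)
        elif cur > 1 and hi - lo > 1:
            hi = cur                 # third rule: maximum := current
            new = midpoint(cur, lo)  # half of (current + minimum)
        else:
            break
        if new == cur:               # interval can shrink no further
            break
        cur = new
    return lo, hi, cur

# Simulated subtask: backpressure is high whenever parallelism < 10.
result = adjust_parallelism(1, 16, 16, lambda p: 0.9 if p < 10 else 0.1, 0.5)
print(result)
```

Starting from the first quantity value interval [1, 16], the search settles on a current quantity value just above the simulated requirement of 10 parallel instances, illustrating how the rules converge without testing every value.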
In a second aspect, the present application provides a parameter estimation apparatus, the apparatus comprising:
an obtaining module, configured to obtain N Flink subtasks;
the processing module is used for processing the data to be processed based on the parallelism degree in each Flink subtask, obtaining the actual parallelism degree of the Flink subtask processing the data to be processed in each Flink subtask and the resource allocation rate of the Flink subtask, and obtaining N resource allocation rates;
and the screening module is used for screening out the target resource allocation rate and the actual parallelism of the Flink subtask corresponding to the target resource allocation rate from the N resource allocation rates.
In one possible design, the screening module is specifically configured to obtain resource allocation rates of N Flink subtasks, detect whether each resource allocation rate is within a preset range, if so, take a minimum resource allocation rate within the preset range as a target resource allocation rate, and obtain an actual parallelism of the Flink subtasks corresponding to the target resource allocation rate, otherwise, extract a target endpoint value in the preset range, calculate a difference value between the resource allocation rate and the target endpoint value, take a resource allocation rate corresponding to the minimum difference value as the target resource allocation rate, and obtain the actual parallelism of the Flink subtasks corresponding to the target resource allocation rate.
In a possible design, the processing module is specifically configured to read the current quantity value of the parallelism processing the data to be processed in each Flink subtask, obtain the backpressure index corresponding to the parallelism at the current quantity value, adjust the current quantity values of all the parallelism processing the data to be processed according to the relationship between the backpressure index and a preset threshold until the backpressure index is lower than the preset threshold, and take the current quantity value of the parallelism that conforms to a first preset rule as the actual quantity value.
in a possible design, the processing module is further configured to read a first quantity value interval corresponding to each Flink subtask, update the first quantity value interval based on the current quantity values of all parallelism of the data to be processed, and calculate the actual parallelism of the Flink subtasks when the minimum quantity value of the first quantity value interval is not 1 and the difference between the maximum quantity value and the minimum quantity value in the first quantity value interval does not exceed 1.
In a possible design, the processing module is further configured to obtain backpressure indexes of all parallelism degrees of processing the data to be processed in different first quantity value intervals, arrange the backpressure indexes according to a preset arrangement sequence, select the first quantity value interval corresponding to the minimum backpressure index, and calculate the actual parallelism degrees of all Flink subtasks of processing the data to be processed based on the first quantity value interval.
In a possible design, the processing module is further configured to detect whether backpressure indicators of all parallelism degrees of processing data to be processed exceed a preset threshold, if so, obtain current quantity values corresponding to all parallelism degrees of processing the data to be processed, adjust the current quantity values of all parallelism degrees of processing the data to be processed based on a second preset rule, and if not, determine that the quantity values of all parallelism degrees of processing the data to be processed are greater than 1 and a difference between a maximum quantity value and a minimum quantity value in a first quantity value interval is greater than 1, and adjust the current quantity values of all parallelism degrees of processing the data to be processed based on a third preset rule.
In a possible design, the processing module is further configured to parse out the maximum quantity value and the minimum quantity value in the first quantity value interval corresponding to the Flink subtask, cover the minimum quantity value with the current quantity value of the parallelism processing the data to be processed, add the maximum quantity value and the current quantity value to obtain a first added quantity value, and use half of the first added quantity value as the current quantity value.
In a possible design, the processing module is further configured to, if the maximum quantity value or the current quantity value is an odd number, add 1 to the odd value and then add the two values to obtain the first added quantity value, and, if the maximum quantity value and the current quantity value are both even numbers, add the maximum quantity value and the current quantity value to obtain the first added quantity value.
In a possible design, the processing module is further configured to analyze the maximum quantity value and the minimum quantity value in the first quantity value interval corresponding to the Flink subtask, cover the maximum quantity value with the current quantity value of the parallelism processing the data to be processed, add the current quantity value and the minimum quantity value to obtain a second added quantity value, and use half of the second added quantity value as the current quantity value.
In a possible design, the processing module is further configured to, if the minimum quantity value or the current quantity value is an odd number, add 1 to the odd value and then add the two values to obtain the second added quantity value, and, if the minimum quantity value and the current quantity value are both even numbers, add the minimum quantity value and the current quantity value to obtain the second added quantity value.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the parameter estimation method when executing the computer program stored in the memory.
In a fourth aspect, a computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the parameter estimation method described above.
For the second to fourth aspects and the possible technical effects of each aspect, please refer to the description above of the technical effects of the first aspect and of each possible solution of the first aspect; repeated description is not given here.
Drawings
FIG. 1 is a schematic diagram illustrating distributed computing performed by a task manager according to the present application;
FIG. 2 is a flow chart of the steps of a parameter estimation method provided in the present application;
fig. 3 is a schematic structural diagram of a parameter estimation apparatus provided in the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application will be further described in detail with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus embodiments or system embodiments. It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, in the description of the present application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
In the prior art, stream computing is optimized by a Flink-based stream computing performance optimization system and method. After index collection, the collected data is analyzed to obtain an analysis result, and the analysis result is then used to adjust the running parameters of the task. The configuration of the optimal running parameters therefore lags behind the demand of actual data processing, which results in a large difference between the running parameters corresponding to the analysis result and the optimal running parameters of the task at the current moment.
In order to solve the above problem, an embodiment of the present application provides a parameter estimation method, which screens the optimal parameters out of N sets of data, thereby optimizing stream computing. The method and the apparatus in the embodiments of the present application are based on the same technical concept; because the principles by which they solve the problem are similar, the apparatus embodiments and the method embodiments can refer to each other, and repeated parts are not described again.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 2, the present application provides a parameter estimation method that can screen out, from the resource allocation rates corresponding to N groups of Flink subtasks, the optimal resource allocation rate together with the actual parallelism of the Flink subtask corresponding to that resource allocation rate, thereby optimizing stream computing. The implementation flow of the method is as follows:
step S21: n Flink subtasks are obtained.
The embodiment of the present application aims to screen the optimal parameters out of N sets of data so as to optimize stream computing. Therefore, N Flink subtasks need to be set in advance; each Flink subtask has a plurality of parallelism, and the number values of the parallelism differ between Flink subtasks.
Since the number values of the parallelism in the Flink subtasks differ, the resource allocation amounts of the parallelism in the Flink subtasks also differ. The parallelism set in each Flink subtask must be sufficient to process the data to be processed, so the number value of the parallelism in a Flink subtask can be estimated from the total amount of data to be processed.
It should be further noted that the number value of the parallelism in a Flink subtask is equivalent to the number value of Slots. The relationship between the device CPU, the resource allocation of a single Slot, and the number value of Slots is shown in Table 1:
Serial number | CPU cores | Resource allocation of a single Slot | Number value of Slots
1             | 1         | 100%                                 | 1
2             | 0.5       | 50%                                  | 2
3             | 0.25      | 25%                                  | 4
4             | 0.125     | 12.5%                                | 8

Table 1
In Table 1, the total amount of resources is unchanged. When the number value of Slots is 1, the CPU core count is 1, and one Slot can be allocated at most 100% of the CPU's resources. When the number value of Slots is 2, a single Slot can be allocated at most 50% of the CPU's resources and occupies only 0.5 CPU core. When the number value of Slots is 4, a single Slot can be allocated at most 25% of the CPU's resources and occupies only 0.25 CPU core. When the number value of Slots is 8, a single Slot can be allocated at most 12.5% of the CPU's resources and occupies only 0.125 CPU core. Thus, when the resource allocation of a single Slot changes, the number value of Slots changes accordingly.
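The inverse relationship in Table 1 can be reproduced directly. The sketch below assumes a fixed total of one CPU shared evenly across Slots, which is the scenario Table 1 describes:

```python
def slot_allocation(num_slots, total_cpu=1.0):
    """Resource share per Slot when total_cpu is split evenly.

    Returns (CPU cores per Slot, percent of the CPU per Slot)."""
    cpu_per_slot = total_cpu / num_slots
    return cpu_per_slot, cpu_per_slot * 100

for n in (1, 2, 4, 8):  # the four rows of Table 1
    cores, pct = slot_allocation(n)
    print(n, cores, f"{pct:g}%")
```

Running the loop reproduces the four rows of Table 1: 1 Slot at 100%, 2 Slots at 50%, 4 Slots at 25%, and 8 Slots at 12.5%.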
By obtaining N Flink subtasks, more sample data can be obtained, which ensures the accuracy of the final optimal parameters screened out of the N samples.
Step S22: process the data to be processed based on the parallelism in each Flink subtask, obtain the actual parallelism with which each Flink subtask processes the data to be processed and the resource allocation rate of that Flink subtask, and thereby obtain N resource allocation rates.
After the N Flink subtasks are obtained, in order to determine the optimal number value of the parallelism for processing the data to be processed in each Flink subtask, the number value of the parallelism processing the data must be adjusted continuously while each Flink subtask processes the data. The relationship between the Flink subtasks and the parallelism is shown in Table 2:
Flink subtask   | Number value of parallelism | First quantity value interval
Flink subtask 1 | 16                          | [1,16]
Flink subtask 2 | 32                          | [1,32]
......          | ......                      | ......

Table 2
Table 2 records the number value of the parallelism in each Flink subtask and the first quantity value interval corresponding to each Flink subtask. Table 2 only shows the values for Flink subtasks 1 and 2; the number value of the parallelism and the first quantity value interval of the other Flink subtasks follow the same pattern as either subtask shown in Table 2, and are not described here.
It should be further noted that, in each Flink subtask, the number value of the parallelism is the maximum number of parallel instances available for processing the data to be processed, and the first quantity value interval spans from the minimum to the maximum number of parallel instances available in that Flink subtask. For example, in Flink subtask 1, at least 1 and at most 16 parallel instances can be used to process the data to be processed.
Having described the Flink subtasks, in order to determine the optimal number value of the parallelism in each Flink subtask, the number value of the parallelism processing the data to be processed needs to be adjusted multiple times. The specific process of adjusting the number value of the parallelism processing the data to be processed in a Flink subtask is as follows:
After the first quantity value interval corresponding to the current Flink subtask is determined, the parallelism in that Flink subtask processes the data to be processed; in this embodiment the data amount of the data to be processed is kept unchanged. Once the parallelism in the Flink subtask has processed the data, the current quantity value of the parallelism that processed it is counted.
After the current quantity value of the parallelism is obtained, the actual parallelism of the Flink subtask that processes the data to be processed must be determined from the Flink subtasks so that the data is processed with the highest efficiency. Once the parallelism of the Flink subtask has completed processing the data, it must be determined whether that parallelism is in a load state, as follows:
When the current quantity value of the parallelism is counted, the backpressure index corresponding to the parallelism at that quantity value is read. The backpressure index represents the percentage of parallel instances that are queued while processing the data to be processed, out of all the parallel instances processing that data. When the backpressure index exceeds a preset threshold, the parallelism processing the data is in a load state; when the backpressure index is below the preset threshold, the parallelism is in a normal state.
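The load/normal decision described above can be sketched as follows; the function names and the 50% threshold are illustrative assumptions, not part of the claimed method:

```python
def backpressure_index(queued_instances: int, total_instances: int) -> float:
    """Percentage of parallel instances that are queued (backpressured)."""
    return 100.0 * queued_instances / total_instances

def is_load_state(queued_instances: int, total_instances: int,
                  threshold_pct: float = 50.0) -> bool:
    # Above the preset threshold -> load state; below -> normal state.
    return backpressure_index(queued_instances, total_instances) > threshold_pct

print(is_load_state(9, 16))  # 56.25% > 50% -> True  (load state)
print(is_load_state(4, 16))  # 25.00% < 50% -> False (normal state)
```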
The backpressure index thus reveals whether the parallelism being evaluated is backpressured, so that the actual parallelism for the data to be processed can be adjusted, optimizing the parallelism of the Flink subtasks.
As described above, when the parallelism processing the data to be processed is in a load state, the parallelism in the Flink subtask takes longer to process the data. To bring the parallelism back to a normal state, its quantity value in the Flink subtask must be readjusted. The readjustment proceeds as follows:
Case one: when the parallelism of the data to be processed is in a normal state, the current quantity value of the parallelism is adjusted based on a third preset rule.
When the parallelism of the data to be processed is in a normal state, the current quantity value of the parallelism that satisfies a first preset rule is taken as the actual quantity value. The first preset rule ensures that the resources allocated to the parallelism are not wasted: the minimum quantity value of the first quantity value interval corresponding to the Flink subtask must not be 1, and the difference between the maximum and minimum quantity values must not exceed 1.
It should be further noted that when the minimum quantity value of the first quantity value interval corresponding to the Flink subtask is 1, it represents an unadjusted quantity value of the parallelism; and when the difference between the maximum and minimum quantity values of that interval is greater than 1, the parallelism currently allocated by the device is not yet in its most efficient state for processing the data, so the quantity value of the parallelism must be reallocated.
The first quantity value interval corresponding to the Flink subtask is already known. When the parallelism of the data to be processed is in a normal state, the current quantity value corresponding to that normal-state parallelism is obtained in order to reallocate the quantity value. To update the first quantity value interval, the maximum and minimum quantity values are extracted from it; since the parallelism is already in a normal state, its quantity value needs to be reduced.
To reduce the quantity value of the parallelism, the maximum quantity value of the interval is overwritten with the current quantity value obtained in the normal state, and the current quantity value is then added to the minimum quantity value. If the sum is even, it is taken as the second added quantity value and its average is used as the new current quantity value; if the sum is odd, 1 is first added to the minimum quantity value or the current quantity value, and the average of the resulting second added quantity value is used as the new current quantity value. At this point, both the first quantity value interval corresponding to the Flink subtask and the current quantity value have been updated.
For example: the first quantity value interval of Flink subtask 1 is [2, 16] and the preset threshold is 50%. When the parallelism of the data to be processed is in a normal state, the adjustment of its quantity value is shown in Table 3:
Current quantity value of parallelism | State | First quantity value interval
9 | normal | [2, 9]
6 | normal | [2, 6]
4 | normal | [2, 4]
3 | normal | [2, 3]
TABLE 3
Table 3 describes only the adjustment of the quantity value when the parallelism of the data to be processed is in a normal state, so the minimum quantity value of the first quantity value interval of Flink subtask 1 does not change (in an actual process the minimum quantity value would also change). With the first quantity value interval of Flink subtask 1 set to [2, 16], the actual quantity value for processing the data to be processed in Flink subtask 1 is determined to be 3.
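The normal-state adjustment of case one can be sketched as follows. The intermediate probe values are inferred from the stated rules and the final actual quantity value of 3 given above, so they are an assumption rather than a reproduction of the original Table 3:

```python
def halve_up(a: int, b: int) -> int:
    """Average two quantity values, adding 1 to one of them first if the sum is odd."""
    s = a + b
    return (s + 1) // 2 if s % 2 else s // 2

def adjust_normal(lo: int, hi: int, current: int):
    """Third preset rule (normal state): overwrite the maximum quantity value
    with the current one, then average the current value with the minimum."""
    return lo, current, halve_up(lo, current)

lo, hi = 2, 16                    # first quantity value interval of subtask 1
cur = halve_up(lo, hi)            # first probe: 9
probes = []
while lo == 1 or hi - lo > 1:     # first preset rule not yet satisfied
    probes.append(cur)
    lo, hi, cur = adjust_normal(lo, hi, cur)   # every probe is normal-state
print(probes, cur)  # [9, 6, 4, 3] 3
```

The interval shrinks from above on every normal-state probe until it reaches [2, 3], at which point 3 is taken as the actual quantity value.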
Case two: when the parallelism of the data to be processed is in a load state, the current quantity value of the parallelism is adjusted based on a second preset rule.
As in the normal-state case above, to determine the actual parallelism of the Flink subtask that processes the data to be processed, the maximum and minimum quantity values are first extracted from the first quantity value interval.
To increase the quantity value of the parallelism processing the data to be processed, the current quantity value of the parallelism that is in a load state is obtained, the minimum quantity value of the first quantity value interval is overwritten with that current quantity value, and the maximum quantity value is then added to the current quantity value.
After the maximum quantity value and the current quantity value are added, the recalculated quantity value must be an integer, so it is checked whether the sum is even. If the sum is even, it is taken as the first added quantity value and its average is used as the new current quantity value; if the sum is odd, 1 is first added to the maximum quantity value or the current quantity value, and the average of the resulting first added quantity value is used as the new current quantity value. At this point, both the first quantity value interval corresponding to the Flink subtask and the current quantity value have been updated.
For example: the first quantity value interval of Flink subtask 1 is [1, 16] and the preset threshold is 50%. The adjustment of the quantity value of the parallelism is shown in Table 4:
Current quantity value of parallelism | State | First quantity value interval
9 | load | [9, 16]
13 | load | [13, 16]
15 | load | [15, 16]
16 | normal | [15, 16]
TABLE 4
In Table 4, when the quantity value of the parallelism is 9, 13, or 15, the parallelism of the data to be processed is in a load state; when the quantity value is 16, the parallelism is in a normal state. Since the first quantity value interval [15, 16] of the Flink subtask satisfies the first preset rule, the actual parallelism for processing the data to be processed in Flink subtask 1 is determined to be 16.
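The load-state adjustment of case two can be sketched as follows, replaying the backpressured quantity values 9, 13 and 15 stated for Table 4 (the helper names are illustrative assumptions):

```python
def halve_up(a: int, b: int) -> int:
    """Average two quantity values, adding 1 to one of them first if the sum is odd."""
    s = a + b
    return (s + 1) // 2 if s % 2 else s // 2

def adjust_load(lo: int, hi: int, current: int):
    """Second preset rule (load state): overwrite the minimum quantity value
    with the current one, then average the maximum with the current value."""
    return current, hi, halve_up(hi, current)

lo, hi = 1, 16                    # first quantity value interval of subtask 1
cur = halve_up(lo, hi)            # first probe: 9
probes = []
while lo == 1 or hi - lo > 1:     # first preset rule not yet satisfied
    probes.append(cur)
    lo, hi, cur = adjust_load(lo, hi, cur)   # probes 9, 13, 15 backpressured
print(probes, cur)  # [9, 13, 15] 16
```

After the three load-state probes the interval is [15, 16], which satisfies the first preset rule, and the final current quantity value 16 (normal state per Table 4) is the actual parallelism.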
In practical applications of the embodiment, case one and case two alternate irregularly. If the current quantity value of the parallelism has just been adjusted in the manner of case one, the quantity value reallocated by the device is obtained again, and the backpressure index of the parallelism processing the data at the adjusted quantity value is read again. If that backpressure index is above the preset threshold, the quantity value continues to be adjusted according to the second preset rule; if it is below the preset threshold but the parallelism does not yet satisfy the first preset rule, the quantity value is readjusted according to the third preset rule. The specific adjustment processes are those of case one and case two above and are not repeated here.
It should be further noted that, on the basis of cases one and two, the embodiment of the present application may further screen, from the parallelisms whose backpressure indexes are below the preset threshold, the parallelism corresponding to the minimum backpressure index, and take its quantity value as the actual quantity value.
In the embodiment of the present application, the quantity value of the parallelism in each Flink subtask is adjusted multiple times until the first quantity value interval corresponding to the Flink subtask satisfies the condition that its minimum quantity value is not 1 and the difference between its maximum and minimum quantity values is not greater than 1. The current quantity value computed at that point is taken as the actual parallelism of the Flink subtask for processing the data to be processed. After the actual parallelism is obtained, the resource allocation rates corresponding to all parallel instances of the actual quantity value are read; the resource allocation rate represents the percentage of the device's total resources that the device has allocated to the parallelism in the Flink subtask.
It should be further noted that all parallel instances of the actual quantity value also correspond to a memory occupancy rate, which is the percentage of the device's total memory occupied by the memory used by those parallel instances to process the data to be processed.
According to the method described above, the range of the first quantity value interval is continuously narrowed through the second and third preset rules, and the actual parallelism of the Flink subtask for processing the data to be processed is determined through the first preset rule. When the parallelism processing the data is in a load state, its quantity value overwrites the minimum quantity value of the first quantity value interval; when it is in a normal state, its quantity value overwrites the maximum quantity value. The first quantity value interval is thereby updated, the current quantity value computed from the updated interval is guaranteed to be the optimal quantity value, and the performance of the stream computation is further optimized.
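Combining the second and third preset rules, with the first preset rule as the stop condition, the whole adjustment can be sketched as a bisection-style loop. Here `in_load_state` stands in for reading the backpressure index at a given quantity value and comparing it to the preset threshold; the names are assumptions for illustration:

```python
def halve_up(a: int, b: int) -> int:
    """Average two quantity values, adding 1 to one of them first if the sum is odd."""
    s = a + b
    return (s + 1) // 2 if s % 2 else s // 2

def tune_parallelism(lo: int, hi: int, in_load_state) -> int:
    """Narrow the first quantity value interval [lo, hi] until the first
    preset rule holds (lo != 1 and hi - lo <= 1), then return the current
    quantity value as the actual parallelism."""
    cur = halve_up(lo, hi)
    while lo == 1 or hi - lo > 1:
        if in_load_state(cur):
            lo = cur                 # second preset rule: raise the minimum
            cur = halve_up(hi, cur)
        else:
            hi = cur                 # third preset rule: lower the maximum
            cur = halve_up(lo, cur)
    return cur

# Table 3 scenario: interval [2, 16], never backpressured -> actual value 3.
print(tune_parallelism(2, 16, lambda p: False))   # 3
# Table 4 scenario: interval [1, 16], backpressured below 16 -> actual value 16.
print(tune_parallelism(1, 16, lambda p: p < 16))  # 16
```

The loop needs O(log(hi - lo)) probes, which is why the estimation converges in only a few processing rounds per Flink subtask.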
Step S23: and screening out a target resource allocation rate and the actual parallelism of the Flink subtasks corresponding to the target resource allocation rate from the N resource allocation rates.
After the actual parallelism for processing the data to be processed in one Flink subtask and the resource allocation rate corresponding to that subtask are determined, note that the process of processing the data is the same in every Flink subtask; only the quantity values of the parallelism differ. The embodiment of the present application therefore describes the specific process for one Flink subtask only; the processing in the other Flink subtasks follows the same procedure and is not set forth here.
The method described in the above steps is then applied in turn to the remaining N-1 Flink subtasks, yielding, for each Flink subtask, its resource allocation rate, its memory occupancy rate, and the actual parallelism for processing the data to be processed.
To avoid device stalls caused by excessive memory usage, the Flink subtasks whose memory occupancy rate exceeds a preset memory occupancy rate are deleted, and the remaining Flink subtasks are taken as the first Flink subtasks.
To determine the target resource allocation rate and the actual parallelism of the first Flink subtask corresponding to it, it is detected whether the resource allocation rate of each first Flink subtask is within a preset range. For each resource allocation rate within the preset range, the difference between that rate and the resource allocation rate corresponding to the target endpoint value of the preset range is calculated.
It should be noted that, in the embodiment of the present application, the resource allocation rate corresponding to the target endpoint value may be the minimum resource allocation rate of the preset range; the case where the target endpoint corresponds to another resource allocation rate is not described here.
After the differences between the in-range resource allocation rates and the minimum resource allocation rate of the preset range are calculated, the resource allocation rate corresponding to the minimum difference is taken as the target resource allocation rate, and the actual parallelism of the first Flink subtask corresponding to the target resource allocation rate is obtained.
The above describes the case where resource allocation rates fall within the preset range. When no resource allocation rate is within the preset range, the difference between each resource allocation rate and the minimum resource allocation rate of the preset range is still calculated, the resource allocation rate corresponding to the minimum difference is taken as the target resource allocation rate, and the actual parallelism of the first Flink subtask corresponding to it is obtained.
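The memory filtering and rate screening of this step can be sketched as follows; the candidate numbers, the 90% memory limit, and the choice of the range's lower endpoint as the target endpoint value are illustrative assumptions:

```python
def screen_target_rate(candidates, mem_limit_pct, rate_range):
    """candidates: one (resource_allocation_rate_pct, actual_parallelism,
    memory_occupancy_pct) tuple per Flink subtask."""
    lo, hi = rate_range
    # Drop subtasks whose memory occupancy exceeds the preset limit;
    # the remainder are the "first Flink subtasks".
    first = [c for c in candidates if c[2] <= mem_limit_pct]
    in_range = [c for c in first if lo <= c[0] <= hi]
    # Prefer in-range rates; either way, pick the rate closest to the
    # target endpoint value (assumed here to be the range's lower bound).
    pool = in_range if in_range else first
    rate, parallelism, _ = min(pool, key=lambda c: abs(c[0] - lo))
    return rate, parallelism

subtasks = [(35.0, 16, 40.0), (55.0, 32, 45.0), (70.0, 64, 95.0)]
print(screen_target_rate(subtasks, mem_limit_pct=90.0, rate_range=(30.0, 60.0)))
# (35.0, 16): the 70% subtask is dropped for memory; 35% is closest to 30%
```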
Through the method described above, N Flink subtasks are set, and the parallelism in each Flink subtask is guaranteed to completely process the data to be processed, so that the quantity value of the parallelism required to finish processing the data can be estimated in advance. While the parallelism in each Flink subtask processes the data, its quantity value is continuously adjusted through the second and third preset rules, which determines the resource allocation rate of each Flink subtask and the actual parallelism for processing the data, and guarantees that the resource allocation rate and the actual quantity value are the optimal parameters for that Flink subtask. The target resource allocation rate is then screened out from the N resource allocation rates obtained, and the actual parallelism of the Flink subtask corresponding to it is obtained. This multi-layer screening of the resource allocation rates ensures that the final target resource allocation rate and actual quantity value are the optimal parameters for the device, improves the performance of stream computation, and achieves the goal of optimizing the stream computation.
Based on the same inventive concept, the embodiment of the present application further provides a parameter estimation apparatus, which implements the functions of the parameter estimation method. Referring to fig. 3, the apparatus includes:
an obtaining module 301, configured to obtain N Flink subtasks;
the processing module 302 is configured to process the data to be processed based on the parallelism in each Flink subtask, obtain an actual parallelism of the Flink subtask that processes the data to be processed in each Flink subtask and a resource allocation rate of the Flink subtask, and obtain N resource allocation rates;
and the screening module 303 is configured to screen out a target resource allocation rate and an actual parallelism of the Flink subtasks corresponding to the target resource allocation rate from the N resource allocation rates.
In a possible design, the screening module 303 is specifically configured to: obtain the resource allocation rates of the N Flink subtasks; detect whether each resource allocation rate is within a preset range; if so, take the minimum resource allocation rate within the preset range as the target resource allocation rate and obtain the actual parallelism of the Flink subtask corresponding to it; otherwise, extract the target endpoint value of the preset range, calculate the difference between each resource allocation rate and the target endpoint value, take the resource allocation rate corresponding to the minimum difference as the target resource allocation rate, and obtain the actual parallelism of the Flink subtask corresponding to it.
In a possible design, the processing module 302 is specifically configured to read the current quantity value of the parallelism of the data to be processed in each Flink subtask, obtain the backpressure index corresponding to the parallelism at the current quantity value, adjust the current quantity values of all the parallelisms according to the relationship between the backpressure index and the preset threshold, and take the current quantity value of a parallelism that satisfies the first preset rule as the actual quantity value when its backpressure index is below the preset threshold.
in a possible design, the processing module 302 is further configured to read a first quantity value interval corresponding to each Flink subtask, update the first quantity value interval based on the current quantity values of all parallelism of the data to be processed, and calculate the actual parallelism of the Flink subtasks when the minimum quantity value of the first quantity value interval is not 1 and the difference between the maximum quantity value and the minimum quantity value in the first quantity value interval does not exceed 1.
In a possible design, the processing module 302 is further configured to obtain the backpressure indexes of all parallelism degrees of processing the data to be processed in different first quantity value intervals, arrange the backpressure indexes according to a preset arrangement sequence, select the first quantity value interval corresponding to the minimum backpressure index, and calculate the actual parallelism degrees of all Flink subtasks of processing the data to be processed based on the first quantity value interval.
In a possible design, the processing module 302 is further configured to detect whether the backpressure indexes of all parallelisms processing the data to be processed exceed the preset threshold; if so, obtain the current quantity values of all those parallelisms and adjust them based on the second preset rule; if not, and the quantity values of the parallelisms are greater than 1 while the difference between the maximum and minimum quantity values in the first quantity value interval is greater than 1, adjust the current quantity values based on the third preset rule.
In a possible design, the processing module 302 is further configured to parse out the maximum and minimum quantity values of the first quantity value interval corresponding to the Flink subtask, overwrite the minimum quantity value with the current quantity value of the parallelism of the data to be processed, add the maximum quantity value and the current quantity value to obtain a first added quantity value, and use the quantity value corresponding to the average of the first added quantity value as the current quantity value.
In one possible design, the processing module 302 is further configured to, if the sum of the maximum quantity value and the current quantity value is odd, add 1 to the maximum quantity value or the current quantity value before adding them to obtain the first added quantity value, and, if the sum is even, add the maximum quantity value and the current quantity value directly to obtain the first added quantity value.
In a possible design, the processing module 302 is further configured to parse out the maximum and minimum quantity values of the first quantity value interval corresponding to the Flink subtask, overwrite the maximum quantity value with the current quantity value of the parallelism of the data to be processed, add the current quantity value and the minimum quantity value to obtain a second added quantity value, and use the quantity value corresponding to the average of the second added quantity value as the current quantity value.
In a possible design, the processing module 302 is further configured to, if the sum of the minimum quantity value and the current quantity value is odd, add 1 to the minimum quantity value or the current quantity value before adding them to obtain the second added quantity value, and, if the sum is even, add the minimum quantity value and the current quantity value directly to obtain the second added quantity value.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device may implement a function of the foregoing parameter estimation apparatus, and with reference to fig. 4, the electronic device includes:
at least one processor 401 and a memory 402 connected to the at least one processor 401. The specific connection medium between the processor 401 and the memory 402 is not limited in this embodiment; fig. 4 takes the connection between the processor 401 and the memory 402 through a bus 400 as an example. The bus 400 is shown as a thick line in fig. 4; the connection manner between other components is merely illustrative and not limiting. The bus 400 may be divided into an address bus, a data bus, a control bus, etc., and is shown with only one thick line in fig. 4 for ease of illustration, but this does not mean there is only one bus or one type of bus. Alternatively, the processor 401 may also be referred to as a controller; the name is not limiting.
In the embodiment of the present application, the memory 402 stores instructions executable by the at least one processor 401, and the at least one processor 401 can execute the method of parameter estimation discussed above by executing the instructions stored in the memory 402. The processor 401 may implement the functions of the various modules in the apparatus shown in fig. 3.
The processor 401 is a control center of the apparatus, and may connect various parts of the entire control device by using various interfaces and lines, and perform various functions and process data of the apparatus by operating or executing instructions stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the apparatus.
In one possible design, processor 401 may include one or more processing units and processor 401 may integrate an application processor that handles primarily operating systems, user interfaces, application programs, and the like, and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401. In some embodiments, processor 401 and memory 402 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 401 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the parameter estimation method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 402 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 402 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 402 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 401, the code corresponding to a parameter estimation method described in the foregoing embodiment may be fixed in the chip, so that the chip can perform a parameter estimation step of the embodiment shown in fig. 1 when running. How to program the processor 401 is well known to those skilled in the art and will not be described in detail herein.
Based on the same inventive concept, embodiments of the present application further provide a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform a parameter estimation method as discussed above.
In some possible embodiments, various aspects of the parameter estimation method provided in the present application may also be implemented in the form of a program product comprising program code; when the program product runs on an apparatus, the program code causes the control apparatus to perform the steps of the parameter estimation method according to the various exemplary embodiments of the present application described above in this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (13)

1. A method for estimating parameters, the method comprising:
obtaining N Flink subtasks, wherein N is a positive integer, each Flink subtask has a plurality of degrees of parallelism, and the number values of the parallelism are inconsistent between the Flink subtasks;
processing data to be processed based on the parallelism in each Flink subtask, and obtaining the actual parallelism with which each Flink subtask processes the data to be processed and the resource allocation rate of that Flink subtask, so as to obtain N resource allocation rates, wherein the resource allocation rate is the percentage of the resources that the apparatus allocates to all the parallelism for processing the data to be processed in each Flink subtask;
and screening out, from the N resource allocation rates, a target resource allocation rate and the actual parallelism of the Flink subtask corresponding to the target resource allocation rate.
2. The method of claim 1, wherein the step of screening out a target resource allocation rate and an actual parallelism of the Flink subtasks corresponding to the target resource allocation rate from the N resource allocation rates comprises:
acquiring the resource allocation rates of the N Flink subtasks, and detecting whether each resource allocation rate is within a preset range;
if so, taking the minimum resource allocation rate within the preset range as the target resource allocation rate, and acquiring the actual parallelism of the Flink subtask corresponding to the target resource allocation rate;
if not, extracting a target endpoint value of the preset range, calculating the difference value between each resource allocation rate and the target endpoint value, taking the resource allocation rate corresponding to the minimum difference value as the target resource allocation rate, and acquiring the actual parallelism of the Flink subtask corresponding to the target resource allocation rate.
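The screening step of claim 2 can be sketched as follows. This is an illustrative reading of the claim, not code from the patent; the function and parameter names (`select_target`, `low`, `high`, `results`) are assumptions:

```python
def select_target(results, low, high):
    """Sketch of the claim-2 screening step (illustrative names).

    results: one (resource_allocation_rate, actual_parallelism) pair per
    Flink subtask; [low, high] is the preset range of allocation rates.
    """
    in_range = [r for r in results if low <= r[0] <= high]
    if in_range:
        # at least one rate falls in the preset range: take the minimum rate
        return min(in_range, key=lambda r: r[0])
    # otherwise take the rate whose distance to either range endpoint is smallest
    return min(results, key=lambda r: min(abs(r[0] - low), abs(r[0] - high)))
```

For example, with rates 0.9, 0.55 and 0.62 and the preset range [0.5, 0.7], the pair with rate 0.55 is selected.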
3. The method of claim 1, wherein obtaining the actual parallelism of the Flink subtasks in each Flink subtask that process data to be processed comprises:
reading a current number value of the parallelism for processing the data to be processed in each Flink subtask, and obtaining a backpressure index corresponding to the parallelism at the current number value, wherein the backpressure index is the percentage of the number value of the parallelism that is queued when processing the data to be processed in each Flink subtask relative to all the number values of the parallelism for processing the data to be processed;
and adjusting the current number values of all the parallelism for processing the data to be processed according to the relation between the backpressure index and a preset threshold value until the backpressure index is lower than the preset threshold value, and taking the current number value of the parallelism that accords with a first preset rule as an actual number value.
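The backpressure index defined in claim 3 is simply the share of parallel instances that are queuing, expressed as a percentage. A minimal sketch (the function name and the flag-based input are illustrative assumptions):

```python
def backpressure_index(queuing_flags):
    """Backpressure index per claim 3: percentage of parallel instances that
    are queued while processing the data, out of all parallel instances.

    queuing_flags: one boolean per parallel instance, True if it is queuing.
    """
    flags = list(queuing_flags)
    return 100.0 * sum(flags) / len(flags)
```

For instance, if one of four parallel instances is queuing, the index is 25.0, which is then compared against the preset threshold.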
4. The method of claim 3, wherein the step of taking the current number value of the parallelism that accords with the first preset rule as the actual number value comprises:
reading a first number value interval corresponding to each Flink subtask, and updating the first number value interval based on the current number values of all the parallelism for processing the data to be processed;
and when the minimum number value of the first number value interval is not 1 and the difference value between the maximum number value and the minimum number value in the first number value interval does not exceed 1, calculating the actual parallelism of the Flink subtask.
5. The method of claim 3, wherein the step of taking the current number value of the parallelism that accords with the first preset rule as the actual number value comprises:
obtaining the backpressure indexes of all the parallelism for processing the data to be processed in different first number value intervals;
and arranging the backpressure indexes in a preset arrangement order, selecting the first number value interval corresponding to the minimum backpressure index, and calculating, based on the first number value interval, the actual parallelism of all the Flink subtasks for processing the data to be processed.
6. The method of claim 3, wherein adjusting the current number values of the parallelism according to the relation between the backpressure index and the preset threshold value comprises:
detecting whether the backpressure indexes of all the parallelism for processing the data to be processed exceed the preset threshold value;
if so, acquiring the current number values corresponding to all the parallelism for processing the data to be processed, and adjusting the current number values of all the parallelism for processing the data to be processed based on a second preset rule;
if not, when it is determined that the number values of all the parallelism for processing the data to be processed are greater than 1 and the difference value between the maximum number value and the minimum number value in the first number value interval is greater than 1, adjusting the current number values of all the parallelism for processing the data to be processed based on a third preset rule.
7. The method of claim 6, wherein adjusting the current number values of all the parallelism for processing the data to be processed based on the second preset rule comprises:
analyzing the maximum number value and the minimum number value in the first number value interval corresponding to the Flink subtask;
overwriting the minimum number value with the current number value of the parallelism for processing the data to be processed;
and adding the maximum number value and the current number value to obtain a first added number value, and taking the number value obtained after averaging the first added number value as the current number value.
8. The method of claim 7, wherein adding the maximum number value and the current number value to obtain the first added number value comprises:
if the maximum number value or the current number value is an odd number, adding 1 to the odd number value and then performing the addition to obtain the first added number value;
if the maximum number value and the current number value are both even numbers, adding the maximum number value and the current number value to obtain the first added number value.
9. The method of claim 6, wherein adjusting the current number values of all the parallelism for processing the data to be processed based on the third preset rule comprises:
analyzing the maximum number value and the minimum number value in the first number value interval corresponding to the Flink subtask;
overwriting the maximum number value with the current number value of the parallelism for processing the data to be processed;
and adding the current number value and the minimum number value to obtain a second added number value, and taking the number value obtained after averaging the second added number value as the current number value.
10. The method of claim 9, wherein adding the current number value and the minimum number value to obtain the second added number value comprises:
if the minimum number value or the current number value is an odd number, adding 1 to the odd number value and then performing the addition to obtain the second added number value;
and if the minimum number value and the current number value are both even numbers, adding the minimum number value and the current number value to obtain the second added number value.
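Read together, claims 4 and 6-10 describe a binary-search-style adjustment of the parallelism number value over the first number value interval. A minimal sketch of one possible reading, assuming a callable `backpressure(p)` that returns the backpressure index when running with `p` parallel instances (all names, and the even-rounding interpretation of claims 8 and 10, are assumptions):

```python
def tune_parallelism(backpressure, threshold, lo, hi):
    """Sketch of the adjustment loop of claims 4 and 6-10 (illustrative).

    backpressure(p): backpressure index at parallelism p;
    [lo, hi] is the first number value interval.
    """
    def round_even(x):
        # claims 8 and 10: add 1 to an odd operand before averaging
        return x + 1 if x % 2 else x

    cur = hi
    while True:
        if backpressure(cur) > threshold:
            # second preset rule (claims 7-8): overwrite the interval minimum
            # with the current value, then move up toward the maximum
            lo, cur = cur, (round_even(hi) + round_even(cur)) // 2
        elif lo > 1 and hi - lo > 1:
            # third preset rule (claims 9-10): overwrite the interval maximum
            # with the current value, then try a lower parallelism
            hi, cur = cur, (round_even(cur) + round_even(lo)) // 2
        else:
            # claim-4 stop condition: interval minimum is not 1 and the
            # interval width does not exceed 1
            return cur
```

With a backpressure curve that drops below the threshold at parallelism 6, this loop narrows the interval to the smallest number value that stays below the threshold.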
11. An apparatus for parameter estimation, the apparatus comprising:
an obtaining module, configured to obtain N Flink subtasks;
the processing module is used for processing the data to be processed based on the parallelism in each Flink subtask, obtaining the actual parallelism of the Flink subtask processing the data to be processed in each Flink subtask and the resource allocation rate of the Flink subtask, and obtaining N resource allocation rates;
and the screening module is used for screening out, from the N resource allocation rates, the target resource allocation rate and the actual parallelism of the Flink subtask corresponding to the target resource allocation rate.
12. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-10 when executing the computer program stored on the memory.
13. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-10.
CN202210171505.9A 2022-02-24 2022-02-24 Parameter estimation method and device and electronic equipment Pending CN114546652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210171505.9A CN114546652A (en) 2022-02-24 2022-02-24 Parameter estimation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210171505.9A CN114546652A (en) 2022-02-24 2022-02-24 Parameter estimation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114546652A true CN114546652A (en) 2022-05-27

Family

ID=81677761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210171505.9A Pending CN114546652A (en) 2022-02-24 2022-02-24 Parameter estimation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114546652A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115328667A (en) * 2022-10-18 2022-11-11 杭州比智科技有限公司 System and method for realizing task resource elastic expansion based on flink task index monitoring

Similar Documents

Publication Publication Date Title
CN110166282B (en) Resource allocation method, device, computer equipment and storage medium
US20170255496A1 (en) Method for scheduling data flow task and apparatus
CN108595254B (en) Query scheduling method
CN105488134A (en) Big data processing method and big data processing device
CN111143143A (en) Performance test method and device
CN116089051A (en) Task allocation method, device and system
CN110390563A (en) Quantization method, device, computer equipment and the storage medium of user's value
CN114546652A (en) Parameter estimation method and device and electronic equipment
CN110084476B (en) Case adjustment method, device, computer equipment and storage medium
CN111949681A (en) Data aggregation processing device and method and storage medium
CN111586094A (en) File uploading method and device and computer equipment
CN109241511B (en) Electronic report generation method and equipment
CN111813535A (en) Resource configuration determining method and device and electronic equipment
US20020077791A1 (en) Method and apparatus for computing data storage assignments
CN107220166B (en) A kind of statistical method and device of CPU usage
CN107273413B (en) Intermediate table creating method, intermediate table inquiring method and related devices
CN112000478A (en) Job operation resource allocation method and device
CN113792079B (en) Data query method and device, computer equipment and storage medium
JP2012059130A (en) Computer system, data retrieval method and database management computer
CN112764935B (en) Big data processing method and device, electronic equipment and storage medium
CN112667392B (en) Cloud computing resource allocation method and device, computer equipment and storage medium
CN113282405B (en) Load adjustment optimization method and terminal
CN106385385B (en) Resource allocation method and device
CN117575358B (en) Big data-based data processing management method and system
CN113760489B (en) Resource allocation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination