CN111078389A

CN111078389A - Junk data cleaning method and device, electronic equipment and readable storage medium

Info

Publication number: CN111078389A
Application number: CN201811213300.2A
Authority: CN
Inventors: 徐福生; 邓长春
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-10-18
Filing date: 2018-10-18
Publication date: 2020-04-28
Anticipated expiration: 2038-10-18
Also published as: CN111078389B

Abstract

The embodiment of the invention provides a junk data cleaning method. The method is applied to a Driver end in a distributed computing framework and comprises the following steps: calculating the pressure value of the message queue according to the number obtained by monitoring; judging whether the pressure value is greater than a preset pressure threshold value or not; when the pressure value is judged to be larger than the preset pressure threshold value, sending a high-pressure state notification to a Worker end in the distributed computing frame, so that the Worker end triggers a GC program in a virtual machine of the Worker end when monitoring that the index value of the first performance index reaches the first preset threshold value, and sends a trigger instruction to a Driver end when monitoring that the index value of the second performance index meets the first cleaning condition; when a trigger instruction sent by the Worker end is received, a garbage cleaner in the Driver end is triggered to clean garbage data in the Worker end. Compared with the prior art, by applying the scheme provided by the embodiment of the invention, when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the junk data can be cleaned in time, and the memory leakage of the Worker end is avoided.

Description

Junk data cleaning method and device, electronic equipment and readable storage medium

Technical Field

The invention relates to the technical field of big data, in particular to a junk data cleaning method and device, electronic equipment and a readable storage medium.

Background

At present, a Spark distributed computing framework plays an important role in data processing of massive information in practical application because of the characteristics of higher efficiency, higher running speed and the like.

The Spark distributed computing framework can comprise a Master terminal, a Worker terminal and a Driver terminal. The Master terminal is used for monitoring the current task processing condition and the memory use condition of the Worker terminal. The Driver end is used for receiving the tasks to be processed and obtaining a message queue formed by the received tasks to be processed; and distributing tasks to be processed to the Worker terminal according to the monitoring result of the Master terminal. And the Worker end processes the tasks to be processed and registers the garbage data generated in the task processing process according to the types of the tasks to be processed, and the registration information represents the garbage data which are cleaned by which objects. Wherein, the object that can carry out rubbish clearance includes: a GC (garbage collection) program running in a virtual machine at the Worker end, a GC program running in a virtual machine at the Driver end, and a garbage cleaner (ContextCleaner) in the Driver end.

Currently, the garbage data processing flow in the Spark distributed computing framework is as follows: when monitoring the occupancy rate of the memory of the virtual machine, the Worker can trigger the GC program of the virtual machine to clean the garbage data when reaching a preset threshold value, and further trigger the garbage cleaner to clean the garbage data by the GC program.

The inventor discovers that in the process of implementing the invention: when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the Driver end distributes a large number of tasks to be processed to the Worker end, and at the moment, when the Worker end processes the tasks to generate garbage data, the garbage data generated by interaction of the Driver end and the Worker end is continuously increased, obviously, a large number of tasks to be processed and garbage data can be accumulated in the Worker end.

In this case, when the GC program triggers the garbage cleaner to perform garbage cleaning, the garbage cleaner may not be able to clean enough memory space in time to store new to-be-processed tasks and garbage data, so that memory leakage occurs at the Worker end, and the stability and task processing efficiency of the Spark distributed computing framework are affected.

Disclosure of Invention

The embodiment of the invention aims to provide a junk data cleaning method, a junk data cleaning device, electronic equipment and a readable storage medium, so that when the task receiving rate of a Driver end exceeds the task processing rate of a Worker end, junk data can be cleaned in time, memory leakage of the Worker end is avoided, and the stability and the task processing efficiency of Spark are improved.

The specific technical scheme is as follows:

in a first aspect, an embodiment of the present invention provides a method for cleaning junk data, which is applied to a Driver end in a distributed computing framework; the method comprises the following steps:

monitoring the number of tasks to be processed included in the message queue, and calculating the pressure value of the message queue according to the monitored number;

judging whether the pressure value is greater than a preset pressure threshold value or not;

when the pressure value is judged to be larger than the preset pressure threshold value, sending a high-pressure state notification to a Worker end in the distributed computing frame so as to enable the Worker end to monitor a first performance index and a second performance index, triggering a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach the first preset threshold value, and sending a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine is as follows: the index can represent the running state of the Worker end;

and when a triggering instruction sent by the Worker end is received, triggering a garbage cleaner in the Driver end to clean garbage data in the Worker end.

As an implementation of an embodiment of the present invention,

the second performance indicator includes: the occupancy rate of all memories of the Worker end, the first cleaning condition includes: the occupancy rate reaches a first preset occupancy rate; alternatively, the first and second electrodes may be,

the second performance indicator includes: the load of the central processing unit of the Worker end, the first cleaning condition comprises: the load value reaches a first preset load value; alternatively, the first and second electrodes may be,

the second performance indicator includes: the occupancy rate of all the memories of the Worker end and the load of the central processing unit of the Worker end, wherein the first cleaning condition comprises the following steps: the occupancy rate reaches a first preset occupancy rate and the load value reaches a first preset load value.

As an implementation manner of the embodiment of the present invention, the method further includes:

when the pressure value is judged to be larger than the preset pressure threshold value, monitoring a third performance index and a fourth performance index; wherein the third performance index is: the occupancy rate of the memory of the Driver-side virtual machine is as follows: the index can represent the running state of the Driver end;

when the index value of the third performance index is monitored to reach a second preset threshold value, triggering a GC program in the Driver-end virtual machine;

and when the index value of the fourth performance index is monitored to meet a second cleaning condition, triggering the garbage cleaner to clean garbage data in the Driver end.

As an implementation of an embodiment of the present invention,

the fourth performance metric includes: the occupancy rate of all memories of the Driver end, the second cleaning condition includes: the occupancy rate reaches a second preset occupancy rate; alternatively, the first and second electrodes may be,

the fourth performance metric includes: the load of the central processing unit of the Driver end, the second cleaning condition comprises: the load value reaches a second preset load value; alternatively, the first and second electrodes may be,

the fourth performance metric includes: the occupancy rate of all memories of the Driver end and the load of a central processing unit of the Driver end, the second cleaning condition comprises: the occupancy rate reaches a second preset occupancy rate and the load value reaches a second load value.

and when the pressure value is judged to be larger than the preset pressure threshold value, reducing the speed of receiving the task to be processed.

As an implementation manner of the embodiment of the present invention, the step of calculating the pressure value of the message queue according to the monitored number includes:

according to the number obtained by monitoring, calculating the pressure value of the message queue through a preset formula; wherein the preset formula is as follows:

P_t(s)＝P_t(n)*P_t(v)

wherein, P_t(s) is: at time t, the pressure value of the message queue; p_t(n) is: the event fullness rate of the Driver end at the time t; p_t(v) Comprises the following steps: the number change rate of the Driver end at the time t;

P_t(n)＝Num(t)/Num(max)

wherein num (t) is: at the moment t, the number of the tasks to be processed included in the message queue is equal to the number of the tasks to be processed included in the message queue; num (max) is: the number of the tasks to be processed which can be accommodated by the message queue;

P_t(v)＝Num(t-i)/Num(t)

wherein Num (t-i) is: at the time t-i, the number of the tasks to be processed included in the message queue, i, is a preset unit time length.

In a second aspect, an embodiment of the present invention provides a method for cleaning junk data, which is applied to a Worker end in a distributed computing framework; the method comprises the following steps:

receiving a high-voltage state notification sent by a Driver end in the distributed computing framework; wherein the high pressure state notification is: when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold value, the Driver end sends a notification to the Worker end, wherein the pressure value is as follows: the Driver end calculates a pressure value according to the number of the to-be-processed tasks included in the message queue obtained through monitoring;

monitoring the first performance index and the second performance index; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine is as follows: the index can represent the running state of the Worker end;

triggering a GC program in a virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold value;

when the index value of the second performance index is monitored to meet a first cleaning condition, a trigger instruction is sent to the Driver end, so that after the Driver end receives the trigger instruction, a garbage cleaner in the Driver end is triggered to clean garbage data in a Worker end.

As an implementation of an embodiment of the present invention,

In a third aspect, an embodiment of the present invention provides a junk data cleaning apparatus, which is applied to a Driver end in a distributed computing framework; the device comprises:

the pressure value calculation module is used for monitoring the number of tasks to be processed included in the message queue and calculating the pressure value of the message queue according to the number obtained by monitoring;

the pressure value judging module is used for judging whether the pressure value is greater than a preset pressure threshold value or not;

the high-pressure state notification module is used for sending a high-pressure state notification to a Worker end in the distributed computing frame when the pressure value is judged to be larger than the preset pressure threshold value, so that the Worker end monitors a first performance index and a second performance index, when the index value of the first performance index is monitored to reach the first preset threshold value, a GC program in a virtual machine of the Worker end is triggered, and when the index value of the second performance index is monitored to meet a first cleaning condition, a trigger instruction is sent to the Driver end; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine is as follows: the index can represent the running state of the Worker end;

and the first junk data cleaning module is used for triggering a junk cleaner in the Driver end to clean the junk data in the Worker end when a triggering instruction sent by the Worker end is received.

As an implementation of an embodiment of the present invention,

As an implementation manner of the embodiment of the present invention, the apparatus further includes:

the first index monitoring module is used for monitoring a third performance index and a fourth performance index when the pressure value is judged to be larger than the preset pressure threshold value; wherein the third performance index is: the occupancy rate of the memory of the Driver-side virtual machine is as follows: the index can represent the running state of the Driver end;

the first program triggering module is used for triggering a GC program in the Driver-end virtual machine when the index value of the third performance index is monitored to reach a second preset threshold value;

and the second garbage data cleaning module is used for triggering the garbage cleaner to clean the garbage data in the Driver end when the index value of the fourth performance index is monitored to meet a second cleaning condition.

As an implementation of an embodiment of the present invention,

and the speed reduction module is used for reducing the speed of receiving the tasks to be processed when the pressure value is judged to be larger than the preset pressure threshold value.

As an implementation manner of the embodiment of the present invention, the pressure value calculation module is specifically configured to:

P_t(s)＝P_t(n)*P_t(v)

wherein, P_t(s) is: at time t, the pressure value of the message queue; p_t(n) is: the Driver end is at tThe fill-up rate of the carved events; p_t(v) Comprises the following steps: the number change rate of the Driver end at the time t;

P_t(n)＝Num(t)/Num(max)

P_t(v)＝Num(t-i)/Num(t)

In a fourth aspect, an embodiment of the present invention provides a garbage data cleaning device, which is applied to a Worker end in a distributed computing framework; the device comprises:

the high-voltage notification receiving module is used for receiving a high-voltage state notification sent by a Driver end in the distributed computing framework; wherein the high pressure state notification is: when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold value, the Driver end sends a notification to the Worker end, wherein the pressure value is as follows: the Driver end calculates a pressure value according to the number of the to-be-processed tasks included in the message queue obtained through monitoring;

the second index monitoring module is used for monitoring the first performance index and the second performance index; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine is as follows: the index can represent the running state of the Worker end;

the second program triggering module is used for triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold value;

and the trigger instruction sending module is used for sending a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition, so that the Driver end triggers a garbage cleaner in the Driver end to clean garbage data in a Worker end after receiving the trigger instruction.

As an implementation of an embodiment of the present invention,

In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor and the communication interface complete communication between the memory and the processor through the communication bus;

a memory for storing a computer program;

the processor is configured to implement, when executing the program stored in the memory, the method steps of any one of the garbage data cleaning methods provided from the perspective of the Driver end in the first aspect.

In a sixth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor and the communication interface complete communication between the memory and the processor through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing any one of the method steps of the junk data cleaning method provided from the aspect of the Worker end in the second aspect when executing the program stored in the memory.

In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps in any one of the above-mentioned garbage data cleaning methods provided in the first aspect from the perspective of a Driver end are implemented.

In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any one of the method steps in the garbage data cleaning method provided in the second aspect from the standpoint of the Worker end.

As can be seen from the above, in the scheme provided in the embodiment of the present invention, when the Driver end monitors that the pressure value of the message queue is too large, the Driver end can notify the Worker end to monitor the first and second performance indexes. Furthermore, the Worker can trigger the GC program of the virtual machine or trigger a garbage processor in the Driver end to process garbage data in the Worker end according to the specific condition of the monitored first or second performance index. When the pressure of the message queue is too high, the Worker end can accumulate a large amount of tasks to be processed and garbage data, and by applying the method provided by the embodiment of the invention, the Worker end can trigger the object for garbage cleaning in time to clean the garbage data of the Worker end when the pressure of the message queue is too high, so that the phenomenon that the stability and the task processing efficiency of a Spark distributed computing framework are influenced due to memory leakage of the Worker end is avoided.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a garbage data cleaning method applied to a Driver end according to an embodiment of the present invention;

fig. 2 is another schematic flow chart of a garbage data cleaning method applied to a Driver end according to an embodiment of the present invention;

fig. 3 is a schematic flow chart of a garbage data cleaning method applied to a Worker end according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a garbage data cleaning apparatus applied to a Driver end according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a garbage data cleaning device applied to a Worker end according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In a garbage data processing flow in a Spark distributed computing framework applied currently, a garbage cleaner located in a Driver end is triggered by a GC program of a virtual machine of a Worker end, and when a task receiving rate of the Driver end exceeds a task processing rate of the Worker end and the GC program triggers the garbage cleaner to perform garbage cleaning, the garbage cleaner may not be capable of timely cleaning enough memory space to store new tasks to be processed and garbage data, so that memory leakage occurs at the Worker end, and the stability and the task processing efficiency of the Spark distributed computing framework are affected. In order to solve the above problem, embodiments of the present invention provide a garbage data processing method from the perspective of a Driver end and a Worker end in a distributed computing framework, respectively.

It should be noted that the Spark distributed computing framework may include a Driver end, a Master end, and a Worker end, and the three are communicatively connected. The Driver end is used for receiving the tasks to be processed according to a certain speed, and arranging the tasks to be processed according to the receiving time to obtain the message queue. And the Master terminal is used for monitoring the current task processing condition and the memory use condition of the Worker terminal. Therefore, the Driver end can determine which Master end to send the task to be processed to according to the monitoring result of the Master end, so that the Master end can process the received task to be processed, and can register the garbage data generated in the processing process according to the type of the task to be processed, and the registration information represents which objects are used for cleaning the garbage data.

The Driver end, the Master end and the Worker end may be respectively disposed on different electronic devices, such as a tablet computer, a notebook computer, a desktop computer, etc., or two or three ends of the Driver end, the Master end and the Worker end may be disposed on the same electronic device, and on the electronic device, the different ends implement their respective functions through their respective virtual machines. This is all reasonable. Therefore, the garbage data processing method provided by the embodiment of the invention from the perspective of the Driver end in the distributed computing framework is applied to the electronic equipment running with the Driver end; the garbage data processing method provided by the embodiment of the invention from the aspect of the Worker end in the distributed computing framework is applied to the electronic equipment running with the Worker end.

In addition, in a Spark distributed computing framework, a Master end providing services and a plurality of Driver ends and a plurality of Worker ends may exist, the Driver ends and the Worker ends may both establish communication connection with the Master end providing services, one Driver end may be in communication connection with at least one Worker end, and one Worker end may be in communication connection with one Driver end. It should be noted that, in order to ensure the normal operation of the Spark distributed computing framework, a spare Master end may also be present in one Spark distributed computing framework, so as to ensure that when the Master end providing services cannot normally operate, the spare Master end may serve as the Master end providing services, and thus operate the spare Master end to continue to provide services for the Driver end and the Worker end establishing communication connection.

In the Spark distributed computing framework, the garbage data existing in the Worker end can be divided into two types: the first type is the garbage data generated by the interaction of a Driver end and a Worker end, and the second type is the garbage data generated by a Worker end processing task. The first type of garbage data can be cleaned through a GC program of a virtual machine of a Worker end, and the second type of garbage data can be processed through a garbage cleaner located at a Driver end.

First, a garbage data cleaning method provided from the perspective of the Driver end in the embodiment of the present invention is described below.

Fig. 1 is a schematic flow chart of a garbage data cleaning method provided from the perspective of a Driver end in the embodiment of the present invention. As shown in fig. 1, a garbage data cleaning method provided from the perspective of a Driver end may include the following steps:

s101: monitoring the number of tasks to be processed included in the message queue, and calculating the pressure value of the message queue according to the monitored number;

in the operation process of the distributed computing framework, the Driver terminal continuously receives tasks to be processed sent by other terminals, and obtains a message queue formed by arranging the tasks to be processed. According to the sequence of the receiving time of each task to be processed from early to late, the Driver end can continuously distribute the tasks in the message queue to the Worker end of each communication connection. Therefore, when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the number of the tasks to be processed included in the message queue can be increased continuously until the maximum value of the tasks to be processed which can be accommodated by the message queue is reached because the tasks in the message queue cannot be processed in time. In this case, the message queue cannot receive a new to-be-processed task any more, so that the to-be-processed task sent to the Driver end by other terminals is lost, and a series of adverse results are caused.

For example, if a new to-be-processed task is related to garbage data cleaning of the Worker end, when the to-be-processed task is lost, memory leakage of the Worker end due to untimely garbage data cleaning can be easily caused.

Based on this, the Driver end can introduce a monitoring mechanism to monitor the number of the tasks to be processed included in the message queue, and calculate the pressure value of the message queue according to the monitored number.

The monitoring mechanism can be realized by writing a code for monitoring the number of the tasks to be processed included in the message queue at the Driver end, or setting a program interface at the Driver end, and introducing a monitoring program in the Master end in communication connection with the Driver end into the Driver end through the program interface, so that the Driver can monitor the number of the tasks to be processed included in the message queue by using the monitoring program. This is all reasonable.

In addition, in step S101, the Driver end may calculate the pressure value of the message queue in various ways, which is not limited in this embodiment of the present invention.

Optionally, in a specific implementation manner, the manner of calculating the pressure value of the message queue according to the monitored number in step S101 may be: and according to the number obtained by monitoring, calculating the pressure value of the message queue through a preset formula.

Specifically, the preset formula may be:

P_t(s)＝P_t(n)*P_t(v)

wherein, P_t(s) is: at time t, the pressure value of the message queue; p_t(n) is: event fullness rate of a Driver end at t moment; p_t(v) Comprises the following steps: the number change rate of the Driver end at the time t;

P_t(n)＝Num(t)/Num(max)

wherein num (t) is: at the moment t, the number of the tasks to be processed included in the message queue; num (max) is: the number of the tasks to be processed which can be accommodated by the message queue;

P_t(v)＝Num(t-i)/Num(t)

wherein Num (t-i) is: at the time t-i, the number of the tasks to be processed included in the message queue, i is the preset unit time length.

In this implementation manner, the Driver end may obtain the number of to-be-processed tasks included in the message queue at intervals of a preset unit duration, and store the number. And calculating the pressure value of the message queue according to the formula and the obtained number of the tasks to be processed and the number of the tasks to be processed which can be contained in the message queue corresponding to the Driver end. The specific data of the number of to-be-processed tasks that can be accommodated by the message queue may be set according to the requirement of the distributed computing framework in practical application, which is not limited in the embodiment of the present invention.

S102: judging whether the pressure value is greater than a preset pressure threshold value or not;

after the pressure value of the message queue is obtained through calculation, the Driver end can judge whether the calculated pressure value is larger than a preset pressure threshold value. Therefore, the Driver end can determine whether the number of the tasks to be processed included in the current message queue is too large according to the judgment result, so as to determine whether the risk of memory leakage is brought.

The pressure threshold may be set according to a requirement of the distributed computing framework in practical application, and the embodiment of the present invention does not limit a specific value of the pressure threshold.

S103: when the pressure value is judged to be larger than a preset pressure threshold value, sending a high-pressure state notification to a Worker end in the distributed computing frame so as to enable the Worker end to monitor a first performance index and a second performance index, triggering a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach the first preset threshold value, and sending a triggering instruction to a Driver end when the index value of the second performance index is monitored to meet a first cleaning condition;

wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented.

When the pressure value is greater than the preset pressure threshold value, the Driver end can determine that the number of the tasks to be processed included in the current message queue is too large, and if the tasks are not processed, the risk of memory leakage may be brought. Therefore, in order to deal with the current situation with memory leakage risk, the Driver end can send a high-voltage state notification to the Worker end in communication connection, so that the Worker end can know the pressure condition of the current message queue.

Furthermore, after the Worker end receives the high-voltage state notification, the occupancy rate of the memory of the virtual machine of the Worker end and a second performance index capable of representing the running state of the Worker end can be monitored, whether the current Worker end needs to trigger the garbage cleaning object to clean the garbage data or not is determined according to the monitoring result, and the garbage data cleaning object needs to be triggered if the garbage cleaning object needs to be triggered. Therefore, the Worker end can clean the garbage data in time so as to avoid causing memory leakage and influencing the stability and the task processing efficiency of the distributed computing framework.

Specifically, when the occupancy rate of the memory of the virtual machine at the Worker end is monitored to reach a first preset threshold value, the Worker end can trigger a GC program of the virtual machine, and the GC program can clear garbage data generated by interaction between the Driver end and the Worker end. The first preset threshold may be set according to a requirement of a distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.

When the index value of the second performance index is monitored to meet the first cleaning condition, the Worker end can send a trigger instruction to the Driver end, so that after the Driver end receives the trigger instruction, a garbage cleaner at the Driver end can be triggered to clean garbage data generated by processing tasks in the Worker end. That is, after the Driver executes the step S103 and sends the high-voltage state notification to the Worker, the Driver may determine whether to execute the subsequent step S104 according to the information feedback condition of the Worker.

If the Worker end sends a trigger instruction to the Driver end, the Driver end can execute the subsequent step S104, otherwise, the subsequent step S104 is not executed.

S104: when a trigger instruction sent by the Worker end is received, a garbage cleaner in the Driver end is triggered to clean garbage data in the Worker end.

After the Driver triggers the garbage cleaner, the garbage cleaner can start to operate, so that the Worker end can actively send a message to the garbage cleaner by relying on the Remote Procedure Call (RPC) communication between the Worker end and the Driver end, and the message can be used for indicating the garbage cleaner to clean garbage data in the Worker end. Therefore, the Worker end can actively trigger the garbage cleaner to clean the garbage data of the Worker end, and the Worker end does not need to wait for the GC program in the virtual machine to trigger the garbage cleaner, so that the garbage data of the Worker end can be cleaned in time. The garbage data in the Worker end cleaned by the garbage cleaner is as follows: and the Worker registers garbage data which is represented by the information and is cleaned by the garbage cleaner.

It should be noted that, when the Worker receives the high-voltage state notification and monitors a second performance index capable of representing the operating state of the Worker, the second performance index may include multiple situations, and when the situations included in the second performance index are different, the corresponding first cleaning conditions are also different.

Specifically, when the second performance index includes: when the occupancy rate of all memories of the Worker end is high, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; alternatively, the first and second electrodes may be,

when the second performance metric includes: when the Worker is loaded by the cpu, the first cleaning condition may include: the load value reaches a first preset load value; alternatively, the first and second electrodes may be,

when the second performance metric includes: when the occupancy rate of all the memories of the Worker end and the load of the central processing unit of the Worker end are met, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate and the load value reaches a first preset load value.

The first preset occupancy rate and the first preset load value may be set according to a requirement of a distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.

In addition, the above-mentioned garbage data that needs to be cleaned by the garbage cleaner located in the Driver end may include various types. For example, when the storage level of the elastic Distributed data sets (RDD) is storage level, executing garbage data generated by a "call persistence" task; executing garbage data generated by an aggregation calculation task by using an Accumulator; temporary data caused when a "shuffle" operation is performed; spam data generated at "broadcast"; and garbage data generated when the "CheckPoint" operation is performed on the RDD, and the like.

Optionally, when the Driver determines that the pressure value of the message queue is greater than the preset pressure threshold, it may be determined that the number of to-be-processed tasks included in the current message queue is too large, and the Worker cannot process and complete the to-be-processed tasks in time, so that the number of to-be-processed tasks in the message queue cannot be reduced in time. Therefore, the method for cleaning the garbage data provided by the Driver end in this embodiment may further include:

Therefore, when the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold value, the Driver end can slow down the increase speed of the number of the tasks to be processed included in the message queue by the speed of receiving the tasks to be processed, so that the speed of processing the message queue at the Worker end can be equal to the speed of receiving the tasks to be processed by the Driver end, and even exceeds the speed of receiving the tasks to be processed by the Driver end. Therefore, the pressure value of the message queue of the Driver end can be reduced, the number of tasks to be processed distributed to the Worker end by the Driver end is reduced, and the memory burden of the Worker end is reduced.

In addition, when the Driver end sends the task to be processed to the Worker end, the Driver end can also have garbage data generated by analyzing the task to be processed. Therefore, a memory leak may occur at the Driver end due to excessive garbage data. In this way, in order to avoid memory leak at the Driver side due to excessive garbage data, garbage data processing may be performed at the Driver side.

Based on the above requirement, as shown in fig. 2, on the basis of the embodiment shown in fig. 1, the method for processing garbage data provided by the Driver end in the embodiment of the present invention may further include the following steps:

s105: when the pressure value is judged to be larger than the preset pressure threshold value, monitoring a third performance index and a fourth performance index;

wherein the third performance index is: the occupancy rate of the memory of the Driver-side virtual machine is as follows: the index can represent the running state of the Driver end;

s106: when the index value of the third performance index reaches a second preset threshold value, triggering a GC program in the Driver-end virtual machine;

the second preset threshold may be set according to a requirement of a distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.

S107: and when the index value of the fourth performance index is monitored to meet the second cleaning condition, triggering a garbage cleaner to clean garbage data in the Driver end.

The fourth performance index may include a plurality of conditions, and when the conditions included in the fourth performance index are different, the corresponding second cleaning conditions are also different.

Specifically, when the fourth performance index includes: when the occupancy rate of all memories of the Driver end is low, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate; alternatively, the first and second electrodes may be,

when the fourth performance metric includes: when the load of the central processing unit at the Driver end is high, the second cleaning condition may include: the load value reaches a second preset load value; alternatively, the first and second electrodes may be,

when the fourth performance metric includes: when the occupancy rate of all the memories of the Driver end and the load of the central processing unit of the Driver end are met, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate and the load value reaches a second load value.

It should be noted that, the second preset occupancy rate and the second preset load value may be set according to a requirement of the distributed computing framework in practical application, and the embodiment of the present invention is not specifically limited thereto.

When the pressure value is greater than the preset pressure threshold value, the Driver end can determine that the number of the tasks to be processed included in the current message queue is too large, and if the tasks are not processed, the risk of memory leakage may be brought. Therefore, in order to cope with the current situation where there is a risk of memory leak. Therefore, the Driver end may execute the step S105, monitor the occupancy rate of the memory of the virtual machine of the Driver end and a fourth performance index capable of representing the running state of the Driver end, determine whether the current Driver end needs to trigger the garbage cleaning object to clean the garbage data according to the monitoring result, and trigger what garbage data cleaning object is needed if the garbage cleaning object needs to be triggered. Therefore, the Driver end can clear the junk data in time, and the stability and the task processing efficiency of the distributed computing framework are influenced.

Specifically, when it is monitored that the occupancy rate of the memory of the Driver-side virtual machine reaches the second preset threshold, the Driver side may execute the step S106 to trigger the GC program of the virtual machine itself, and the GC program may clean the garbage data generated by interaction between the Driver side and the Worker side. The second preset threshold may be set according to a requirement of a distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.

When the index value of the fourth performance index is monitored to meet the second cleaning condition, the Driver end can trigger the garbage cleaner to clean garbage data generated by the processing task in the Driver end and garbage data generated by analyzing the to-be-processed task.

That is, after the Driver performs the step S103 and determines that the pressure value of the message queue is greater than the preset pressure value, the Driver determines whether to perform the subsequent steps S106 and S107 according to the occupancy rate of the memory of the Driver virtual machine and the monitoring result of the fourth performance index capable of representing the operating state of the Driver.

Next, a garbage data processing method provided by the embodiment of the present invention from the aspect of the Worker end will be described.

Fig. 3 is a schematic flow chart of a garbage data processing method provided from the viewpoint of a Worker end in the embodiment of the present invention. As shown in fig. 3, a garbage data processing method provided from the viewpoint of the Worker end may include the following steps:

s301: receiving a high-voltage state notification sent by a Driver end in a distributed computing framework;

wherein the high pressure state notification is: when the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold, the Driver end sends a notification to the Worker end, wherein the pressure value is as follows: the Driver end calculates the pressure value according to the number of the tasks to be processed included in the monitored message queue;

Therefore, the Driver end can introduce a monitoring mechanism to monitor the number of the tasks to be processed included in the message queue, and calculate the pressure value of the message queue according to the monitored number. And Driver can judge whether the pressure value is larger than a preset pressure threshold value. Therefore, the Driver end can determine whether the number of the tasks to be processed included in the current message queue is too large according to the judgment result, so as to determine whether the risk of memory leakage is brought.

Thus, the Worker end can receive the high-voltage state notification sent by the Driver end of the communication connection.

S302: monitoring the first performance index and the second performance index;

wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented;

s303: triggering a GC program in a virtual machine of a Worker end when monitoring that the index value of the first performance index reaches a first preset threshold value;

s304: when the index value of the second performance index is monitored to meet the first cleaning condition, a trigger instruction is sent to the Driver end, so that after the Driver end receives the trigger instruction, a garbage cleaner in the Driver end is triggered to clean garbage data in the Worker end.

After receiving the high-voltage state notification, the Worker can execute the step S302, monitor the occupancy rate of the memory of the virtual machine of the Worker and the second performance index capable of representing the running state of the Worker, determine whether the current Worker needs to trigger the garbage cleaning object to clean the garbage data according to the monitoring result, and trigger which garbage data cleaning object if the garbage cleaning object needs to be triggered. Therefore, the Worker end can clean the garbage data in time so as to avoid causing memory leakage and influencing the stability and the task processing efficiency of the distributed computing framework.

In the step S302, a monitoring mechanism may be introduced into the Worker end to monitor the occupancy rate of the memory of the virtual machine of the Worker end and a second performance index capable of representing the running state of the Worker end. The monitoring mechanism can be realized by writing a code for monitoring the occupancy rate of the internal memory of the virtual machine at the Worker end and the second performance index capable of representing the running state of the Worker end at the Worker end, or arranging a program interface at the Worker end, and introducing a monitoring program in the Master end in communication connection with the Worker end into the Worker end through the program interface for monitoring the occupancy rate of the internal memory of the virtual machine at the Worker end and the second performance index capable of representing the running state of the Worker end. This is all reasonable.

Specifically, in the step S303, when it is monitored that the occupancy rate of the memory of the Worker-side virtual machine reaches the first preset threshold, the Worker side may trigger the GC program of the virtual machine itself, and the GC program may clean the garbage data generated by the interaction between the Driver side and the Worker side. The first preset threshold may be set according to a requirement of a distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.

In the step S304, when it is monitored that the index value of the second performance index satisfies the first cleaning condition, the Worker end may send a trigger instruction to the Driver end, so that after the Driver end receives the trigger instruction, the garbage cleaner at the Driver end may be triggered to clean the garbage data generated by the processing task in the Worker end.

That is to say, when the Worker executes the step S302, it is determined whether to execute the subsequent steps S303 and S304 according to the occupancy rate of the memory of the Worker virtual machine and the monitoring result of the second performance index capable of representing the running state of the Worker.

Furthermore, the second type of data that needs to be cleaned by the garbage cleaner located in the Driver end may include multiple types. For example, when the storage level of the elastic Distributed data sets (RDD) is storage level, executing garbage data generated by a "call persistence" task; executing garbage data generated by an aggregation calculation task by using an Accumulator; temporary data caused when a "shuffle" operation is performed; spam data generated at "broadcast"; and garbage data generated when the "CheckPoint" operation is performed on the RDD, and the like.

As can be seen from the above, in the scheme provided in the embodiment of the present invention, when the Driver end monitors that the pressure value of the message queue is too large, the Driver end can notify the Worker end to monitor the first and second performance indexes. Furthermore, the Worker can trigger the GC program of the virtual machine or trigger a garbage processor in the Driver end to process garbage data in the Worker end according to the specific condition of the monitored first or second performance index. Due to the fact that when the pressure of the message queue is too large, the Worker end can accumulate a large number of tasks to be processed and garbage data, when the pressure of the message queue is too large, the Worker end can trigger an object for garbage cleaning in time to clean the garbage data of the Worker end by applying the method provided by the embodiment of the invention, and therefore the phenomenon that the stability and the task processing efficiency of a Spark distributed computing framework are affected due to memory leakage of the Worker end is avoided.

Compared with the junk data cleaning method provided from the perspective of the Driver end in the embodiment of the invention, the embodiment of the invention also provides a junk data cleaning device from the perspective of the Driver end.

Fig. 4 is a schematic structural diagram of a garbage data cleaning apparatus provided from the perspective of a Driver end in the embodiment of the present invention. As shown in fig. 4, the apparatus may include the following modules:

the pressure value calculation module 410 is configured to monitor the number of to-be-processed tasks included in the message queue, and calculate a pressure value of the message queue according to the monitored number;

a pressure value judging module 420, configured to judge whether a pressure value is greater than a preset pressure threshold;

the high-pressure state notification module 430 is configured to send a high-pressure state notification to a Worker end in the distributed computing frame when it is determined that the pressure value is greater than the preset pressure threshold, so that the Worker end monitors a first performance index and a second performance index, trigger a GC program in a virtual machine of the Worker end when it is monitored that an index value of the first performance index reaches the first preset threshold, and send a trigger instruction to a Driver end when it is monitored that an index value of the second performance index meets a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented;

the first junk data cleaning module 440 is configured to trigger a junk cleaner in the Driver end to clean junk data in the Worker end when a trigger instruction sent by the Worker end is received.

As an implementation manner provided by the embodiment of the present invention, the second performance index may include: the occupancy rate of all memories of the Worker end, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; alternatively, the first and second electrodes may be,

the second performance index may include: the load of the central processing unit at the Worker end, the first cleaning condition may include: the load value reaches a first preset load value; alternatively, the first and second electrodes may be,

the second performance index may include: the occupancy rate of all the memories of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate and the load value reaches a first preset load value.

As an implementation manner of the embodiment of the present invention, the first garbage data cleaning apparatus applied to a Driver end in a distributed computing framework, provided by the above embodiment of the present invention, may further include:

the first program triggering module is used for triggering a GC program in the virtual machine of the Driver end when the index value of the third performance index reaches a second preset threshold value;

and the second garbage data cleaning module is used for triggering the garbage cleaner to clean the garbage data in the Driver end when the index value of the fourth performance index is monitored to meet the second cleaning condition.

As an implementation manner of the embodiment of the present invention, the fourth performance index may include: the occupancy rate of all memories of the Driver end, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate; alternatively, the first and second electrodes may be,

the fourth performance index may include: the load of the central processing unit at the Driver end, the second cleaning condition may include: the load value reaches a second preset load value; alternatively, the first and second electrodes may be,

the fourth performance index may include: the occupancy rate of all memories of the Driver end and the load of the central processing unit of the Driver end, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate and the load value reaches a second load value.

and the speed reducing module is used for reducing the speed of receiving the tasks to be processed when the pressure value is judged to be larger than the preset pressure threshold value.

As an implementation manner of the embodiment of the present invention, the pressure value calculating module 410 may specifically be configured to: according to the number obtained by monitoring, calculating the pressure value of the message queue through a preset formula; wherein, the preset formula is as follows:

P_t(s)＝P_t(n)*P_t(v)

P_t(n)＝Num(t)/Num(max)

P_t(v)＝Num(t-i)/Num(t)

Compared with the junk data cleaning method provided from the aspect of the Worker end in the embodiment of the invention, the embodiment of the invention also provides a junk data cleaning device from the aspect of the Worker end.

Fig. 5 is a schematic structural diagram of a garbage data cleaning device provided from the viewpoint of a Worker end in the embodiment of the present invention. As shown in fig. 5, the apparatus may include the following modules:

a high-voltage notification receiving module 510, configured to receive a high-voltage status notification sent by a Driver in a distributed computing framework; wherein the high pressure state notification is: when the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold, the Driver end sends a notification to the Worker end, wherein the pressure value is as follows: the Driver end calculates the pressure value according to the number of the tasks to be processed included in the monitored message queue;

a second index monitoring module 520, configured to monitor the first performance index and the second performance index; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented;

the second program triggering module 530 is configured to trigger a GC program in a virtual machine of the Worker end when it is monitored that the index value of the first performance index reaches a first preset threshold;

and the trigger instruction sending module 540 is configured to send a trigger instruction to the Driver end when it is monitored that the index value of the second performance index meets the first cleaning condition, so that the Driver end triggers a garbage cleaner in the Driver end to clean garbage data in the Worker end after receiving the trigger instruction.

An embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication through the communication bus 604,

a memory 603 for storing a computer program;

the processor 601 is configured to implement the garbage data cleaning method provided from the perspective of the Driver end in the embodiment of the present invention when executing the program stored in the memory 603.

Specifically, the garbage data cleaning method includes:

when the pressure value is judged to be larger than a preset pressure threshold value, sending a high-pressure state notification to a Worker end in the distributed computing frame so as to enable the Worker end to monitor a first performance index and a second performance index, triggering a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach the first preset threshold value, and sending a triggering instruction to a Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented;

when a trigger instruction sent by the Worker end is received, a garbage cleaner in the Driver end is triggered to clean garbage data in the Worker end.

It should be noted that other implementation manners of the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which are implemented by the processor 601 executing the program stored in the memory 603, are the same as the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which is provided in the foregoing method embodiment section, and are not described herein again.

Another electronic device is provided in the embodiments of the present invention, as shown in fig. 7, and includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,

a memory 703 for storing a computer program;

the processor 701 is configured to implement the garbage data cleaning method provided from the marker end in the embodiment of the present invention when executing the program stored in the memory 703.

Specifically, the garbage data cleaning method includes:

receiving a high-voltage state notification sent by a Driver end in a distributed computing framework; wherein the high pressure state notification is: when the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold, the Driver end sends a notification to the Worker end, wherein the pressure value is as follows: the Driver end calculates the pressure value according to the number of the tasks to be processed included in the monitored message queue;

monitoring the first performance index and the second performance index; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine, and the second performance index are as follows: the index of the running state of the Worker end can be represented;

triggering a GC program in a virtual machine of a Worker end when monitoring that the index value of the first performance index reaches a first preset threshold value;

when the index value of the second performance index is monitored to meet the first cleaning condition, a trigger instruction is sent to the Driver end, so that after the Driver end receives the trigger instruction, a garbage cleaner in the Driver end is triggered to clean garbage data in the Worker end.

It should be noted that other implementation manners of the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which is implemented by the processor 701 executing the program stored in the memory 703, are the same as the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which is provided in the foregoing method embodiment section, and are not described herein again.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.

The embodiment of the invention also provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the first junk data cleaning method applied to a Driver end in a distributed computing framework, provided by the embodiment of the invention, is implemented.

Specifically, the first garbage data cleaning method applied to the Driver end in the distributed computing framework provided by the embodiment of the present invention includes:

It should be noted that other implementation manners of the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which are implemented when the computer program is executed by the processor, are the same as the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which is provided in the foregoing method embodiment section, and are not described herein again.

The embodiment of the invention also provides another computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the second junk data cleaning method applied to a Worker end in a distributed computing framework provided by the embodiment of the invention is realized.

Specifically, the second garbage data cleaning method applied to the Worker end in the distributed computing framework provided by the embodiment of the present invention includes:

It should be noted that other implementation manners of the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which are implemented when the computer program is executed by the processor, are the same as the embodiments of the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which are provided in the foregoing method embodiment section, and are not described here again.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, the electronic device embodiment and the computer-readable storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A junk data cleaning method is characterized by being applied to a Driver end in a distributed computing framework; the method comprises the following steps:

when the pressure value is judged to be larger than the preset pressure threshold value, sending a high-pressure state notification to a Worker end in the distributed computing frame so as to enable the Worker end to monitor a first performance index and a second performance index, triggering a garbage recovery GC (gas chromatography) program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach the first preset threshold value, and sending a triggering instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the Worker-side virtual machine is as follows: the index can represent the running state of the Worker end;

2. The method of claim 1,

3. The method of claim 1, further comprising:

4. The method of claim 3,

5. The method according to any one of claims 1-4, further comprising:

6. The method according to any one of claims 1 to 4, wherein the step of calculating the pressure value of the message queue based on the monitored number comprises:

P_t(s)＝P_t(n)*P_t(v)

P_t(n)＝Num(t)/Num(max)

P_t(v)＝Num(t-i)/Num(t)

7. A junk data cleaning method is characterized in that the junk data cleaning method is applied to a Worker end in a distributed computing framework; the method comprises the following steps:

8. The method of claim 7,

9. The junk data cleaning device is characterized by being applied to a Driver end in a distributed computing framework; the device comprises:

10. The apparatus of claim 9,

11. The apparatus of claim 9, further comprising:

12. The apparatus of claim 11,

13. The apparatus according to any one of claims 9-12, further comprising:

14. The device according to any one of claims 9 to 12, wherein the pressure value calculation module is specifically configured to:

P_t(s)＝P_t(n)*P_t(v)

P_t(n)＝Num(t)/Num(max)

P_t(v)＝Num(t-i)/Num(t)

15. A junk data cleaning device is characterized by being applied to a Worker end in a distributed computing framework; the device comprises:

16. The apparatus of claim 15,

17. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor and the communication interface are used for realizing mutual communication by the memory through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.

18. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor and the communication interface are used for realizing mutual communication by the memory through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 7 to 8 when executing a program stored in the memory.

19. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.

20. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any of the claims 7-8.