CN111078389B - Junk data cleaning method and device, electronic equipment and readable storage medium - Google Patents

Junk data cleaning method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN111078389B
Authority
CN
China
Prior art keywords
worker
preset
performance index
value
driver
Prior art date
Legal status
Active
Application number
CN201811213300.2A
Other languages
Chinese (zh)
Other versions
CN111078389A (en)
Inventor
徐福生
邓长春
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811213300.2A priority Critical patent/CN111078389B/en
Publication of CN111078389A publication Critical patent/CN111078389A/en
Application granted granted Critical
Publication of CN111078389B publication Critical patent/CN111078389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/45583 Memory management, e.g. access or allocation
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F 9/5022 Mechanisms to release resources
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/3006 Monitoring arrangements where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/302 Monitoring arrangements where the computing system component is a software system
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Refuse-Collection Vehicles (AREA)
  • Memory System (AREA)

Abstract

The embodiment of the invention provides a garbage data cleaning method. The method is applied to the Driver end in a distributed computing framework and comprises the following steps: monitoring the number of tasks to be processed in a message queue, and calculating a pressure value of the message queue according to the monitored number; judging whether the pressure value is larger than a preset pressure threshold; when the pressure value is judged to be larger than the preset pressure threshold, sending a high-pressure state notification to a Worker end in the distributed computing framework, so that the Worker end triggers a GC program in its virtual machine when it monitors that the index value of a first performance index reaches a first preset threshold, and sends a trigger instruction to the Driver end when it monitors that the index value of a second performance index meets a first cleaning condition; and, when the trigger instruction sent by the Worker end is received, triggering a garbage cleaner in the Driver end to clean the garbage data in the Worker end. Compared with the prior art, with the scheme provided by the embodiment of the invention, garbage data can be cleaned in time when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, so that memory leaks at the Worker end are avoided.

Description

Junk data cleaning method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a method and apparatus for cleaning junk data, an electronic device, and a readable storage medium.
Background
Currently, the Spark distributed computing framework plays an important role in the processing of massive data in practical applications, owing to characteristics such as its high efficiency and fast running speed.
The Spark distributed computing framework may comprise a Master end, a Worker end and a Driver end. The Master end is used for monitoring the current task processing situation and memory usage of the Worker end. The Driver end is used for receiving tasks to be processed and forming the received tasks into a message queue, and for distributing the tasks to be processed to the Worker end according to the monitoring result of the Master end. The Worker end processes the tasks to be processed and registers the garbage data generated during task processing according to the type of each task, the registration information characterizing which objects may clean that garbage data. The objects that can perform garbage cleaning include: the GC (Garbage Collection) program running in the virtual machine of the Worker end, the GC program running in the virtual machine of the Driver end, and the garbage cleaner (ContextCleaner) in the Driver end.
At present, the garbage data processing flow in the Spark distributed computing framework is as follows: the Worker end monitors the occupancy rate of the memory of its virtual machine, and when the occupancy rate reaches a preset threshold, the Worker end triggers the GC program of the virtual machine to clean garbage data; the GC program in turn triggers the garbage cleaner to clean garbage data.
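This prior-art chain can be sketched as follows; the threshold value and callback names are illustrative assumptions, not part of Spark's actual API.

```python
def existing_flow_tick(vm_mem_occupancy, preset_threshold, run_vm_gc, trigger_cleaner):
    """Prior-art flow: when VM memory occupancy reaches the preset threshold,
    the Worker's VM GC runs, and the GC in turn triggers the garbage cleaner."""
    if vm_mem_occupancy >= preset_threshold:
        run_vm_gc()        # GC program of the Worker's virtual machine
        trigger_cleaner()  # the GC-driven path then reaches the ContextCleaner
```

Note that both cleanup actions hang off a single condition, the VM memory occupancy, which is exactly the limitation the following paragraphs describe.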
The inventors found, in the course of implementing the present invention, that when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the Driver end distributes a large number of tasks to be processed to the Worker end. While the Worker end processes these tasks and generates garbage data, the garbage data generated by the interaction between the Driver end and the Worker end also keeps increasing; evidently, a large number of tasks to be processed and a large amount of garbage data accumulate in the Worker end.
Under such circumstances, when the GC program triggers the garbage cleaner to clean garbage, the garbage cleaner may not be able to free enough memory space in time to store new tasks to be processed and new garbage data, causing memory leaks at the Worker end and affecting the stability and task processing efficiency of the Spark distributed computing framework.
Disclosure of Invention
The embodiment of the invention aims to provide a garbage data cleaning method, a device, an electronic device and a readable storage medium, so that garbage data can be cleaned in time when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, memory leaks at the Worker end are avoided, and the stability and task processing efficiency of Spark are improved.
The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for cleaning garbage data, applied to a Driver end in a distributed computing framework; the method comprises the following steps:
monitoring the number of tasks to be processed in a message queue, and calculating the pressure value of the message queue according to the monitored number;
judging whether the pressure value is larger than a preset pressure threshold;
when the pressure value is judged to be larger than the preset pressure threshold, sending a high-pressure state notification to a Worker end in the distributed computing framework, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in the virtual machine of the Worker end when it monitors that the index value of the first performance index reaches a first preset threshold, and sends a trigger instruction to the Driver end when it monitors that the index value of the second performance index meets a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
when the trigger instruction sent by the Worker end is received, triggering a garbage cleaner in the Driver end to clean the garbage data in the Worker end.
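A minimal sketch of this Driver-side flow follows; the class, method and callback names are hypothetical, and the pressure computation implements the preset formula given later in this aspect:

```python
class DriverMonitor:
    """Sketch of the Driver-end steps; names and threshold are illustrative."""

    def __init__(self, queue_capacity, pressure_threshold, notify_worker, trigger_cleaner):
        self.queue_capacity = queue_capacity          # Num(max)
        self.pressure_threshold = pressure_threshold  # preset pressure threshold
        self.notify_worker = notify_worker            # sends the high-pressure state notification
        self.trigger_cleaner = trigger_cleaner        # triggers the garbage cleaner (ContextCleaner)
        self.prev_count = 0                           # Num(t - i), count one interval earlier

    def on_monitor_tick(self, task_count):
        """Called once per monitoring interval with the queue's current task count."""
        occupancy = task_count / self.queue_capacity                        # P_t(n)
        change_rate = self.prev_count / task_count if task_count else 0.0   # P_t(v)
        pressure = occupancy * change_rate                                  # P_t(s)
        self.prev_count = task_count
        if pressure > self.pressure_threshold:
            self.notify_worker()

    def on_trigger_instruction(self):
        """Called when the Worker end sends the trigger instruction."""
        self.trigger_cleaner()
```

For example, with a capacity of 100, a first sample of 90 tasks and a second sample of 95 tasks yields a pressure of 0.95 * (90/95) = 0.9, exceeding a threshold of 0.5 and producing the notification.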
As one implementation of the embodiment of the present invention,
the second performance index includes: the occupancy rate of the entire memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaching a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaching a first preset load value; or,
the second performance index includes: the occupancy rate of the entire memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaching the first preset occupancy rate and the load value reaching the first preset load value.
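The three variants of the first cleaning condition can be captured by one check; the function name, parameter names, and default limits below are illustrative assumptions:

```python
from typing import Optional

def meets_first_cleaning_condition(mem_occupancy: Optional[float] = None,
                                   cpu_load: Optional[float] = None,
                                   preset_occupancy: float = 0.8,
                                   preset_load: float = 0.7) -> bool:
    """Pass only mem_occupancy, only cpu_load, or both, matching whichever
    second-performance-index variant is configured; when both are given,
    both limits must be reached (the third variant)."""
    checks = []
    if mem_occupancy is not None:
        checks.append(mem_occupancy >= preset_occupancy)
    if cpu_load is not None:
        checks.append(cpu_load >= preset_load)
    return bool(checks) and all(checks)
```

The same shape fits the second cleaning condition on the Driver end, with the second preset occupancy and load values substituted.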
As an implementation manner of the embodiment of the present invention, the method further comprises:
when the pressure value is judged to be larger than the preset pressure threshold, monitoring a third performance index and a fourth performance index; wherein the third performance index is: the occupancy rate of the memory of the virtual machine of the Driver end, and the fourth performance index is: an index capable of characterizing the running state of the Driver end;
when the index value of the third performance index reaches a second preset threshold, triggering a GC program in the virtual machine of the Driver end;
when the index value of the fourth performance index meets a second cleaning condition, triggering the garbage cleaner to clean the garbage data in the Driver end.
As one implementation of the embodiment of the present invention,
the fourth performance index includes: the occupancy rate of the entire memory of the Driver end, and the second cleaning condition includes: the occupancy rate reaching a second preset occupancy rate; or,
the fourth performance index includes: the load of the central processing unit of the Driver end, and the second cleaning condition includes: the load value reaching a second preset load value; or,
the fourth performance index includes: the occupancy rate of the entire memory of the Driver end and the load of the central processing unit of the Driver end, and the second cleaning condition includes: the occupancy rate reaching the second preset occupancy rate and the load value reaching the second preset load value.
As an implementation manner of the embodiment of the present invention, the method further comprises:
when the pressure value is judged to be larger than the preset pressure threshold, reducing the rate at which tasks to be processed are received.
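One simple way to realize this rate reduction is a back-off on the intake rate; the halving policy below is an illustrative assumption, not something the scheme prescribes:

```python
class ThrottledReceiver:
    """Sketch: lower the task-intake rate while the queue is in the high-pressure state."""

    def __init__(self, normal_rate: float, backoff_factor: float = 0.5):
        self.normal_rate = normal_rate        # tasks accepted per second under normal pressure
        self.backoff_factor = backoff_factor  # illustrative: halve the rate under high pressure
        self.high_pressure = False

    def current_rate(self) -> float:
        return self.normal_rate * (self.backoff_factor if self.high_pressure else 1.0)

    def accept_interval(self) -> float:
        """Delay, in seconds, to insert between accepting two consecutive tasks."""
        return 1.0 / self.current_rate()
```

At a normal rate of 100 tasks per second, entering the high-pressure state lengthens the accept interval from 0.01 s to 0.02 s.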
As an implementation manner of the embodiment of the present invention, the step of calculating the pressure value of the message queue according to the monitored number comprises:
calculating the pressure value of the message queue through a preset formula according to the monitored number; wherein the preset formula is:
P_t(s) = P_t(n) * P_t(v)
wherein P_t(s) is: the pressure value of the message queue at time t; P_t(n) is: the event occupancy rate of the Driver end at time t; P_t(v) is: the quantity change rate of the Driver end at time t;
P_t(n) = Num(t) / Num(max)
wherein Num(t) is: the number of tasks to be processed included in the message queue at time t; Num(max) is: the number of tasks to be processed that the message queue can accommodate;
P_t(v) = Num(t-i) / Num(t)
wherein Num(t-i) is: the number of tasks to be processed included in the message queue at time t-i, i being a preset unit duration.
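Translating the preset formula directly into code (the function name and the empty-queue guard are my additions):

```python
def queue_pressure(num_t: int, num_t_minus_i: int, num_max: int) -> float:
    """P_t(s) = P_t(n) * P_t(v), with
    P_t(n) = Num(t) / Num(max) and P_t(v) = Num(t-i) / Num(t)."""
    if num_t == 0:
        return 0.0  # guard: an empty queue exerts no pressure (assumption)
    occupancy = num_t / num_max          # P_t(n): event occupancy rate
    change_rate = num_t_minus_i / num_t  # P_t(v): quantity change rate
    return occupancy * change_rate       # P_t(s)
```

For example, a queue that can hold 100 tasks, holding 80 now and 40 one unit duration ago, yields a pressure value of 0.8 * 0.5 = 0.4.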
In a second aspect, an embodiment of the present invention provides a garbage data cleaning method, applied to a Worker end in a distributed computing framework; the method comprises the following steps:
receiving a high-pressure state notification sent by a Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent to the Worker end when the Driver end judges that the pressure value of the message queue is larger than a preset pressure threshold, the pressure value being: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
monitoring a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
when the index value of the second performance index meets a first cleaning condition, sending a trigger instruction to the Driver end, so that the Driver end, after receiving the trigger instruction, triggers the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
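A sketch of the Worker-side steps, with hypothetical callbacks standing in for the VM's GC program and the message back to the Driver end:

```python
class WorkerMonitor:
    """Sketch of the Worker-end flow; thresholds and callback names are illustrative."""

    def __init__(self, first_preset_threshold, run_vm_gc, send_trigger_instruction):
        self.first_preset_threshold = first_preset_threshold      # for the first performance index
        self.run_vm_gc = run_vm_gc                                # stands in for the VM's GC program
        self.send_trigger_instruction = send_trigger_instruction  # message to the Driver end
        self.high_pressure = False

    def on_high_pressure_notification(self):
        self.high_pressure = True  # begin monitoring the two performance indexes

    def on_index_sample(self, vm_mem_occupancy, first_cleaning_condition_met):
        if not self.high_pressure:
            return
        if vm_mem_occupancy >= self.first_preset_threshold:
            self.run_vm_gc()                 # first performance index reached its threshold
        if first_cleaning_condition_met:
            self.send_trigger_instruction()  # second performance index met the first cleaning condition
```

The two checks are independent: one sample may trigger the VM GC, the trigger instruction, both, or neither.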
As one implementation of the embodiment of the present invention,
the second performance index includes: the occupancy rate of the entire memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaching a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaching a first preset load value; or,
the second performance index includes: the occupancy rate of the entire memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaching the first preset occupancy rate and the load value reaching the first preset load value.
In a third aspect, an embodiment of the present invention provides a garbage data cleaning device, applied to a Driver end in a distributed computing framework; the device comprises:
a pressure value calculation module, used for monitoring the number of tasks to be processed included in the message queue and calculating the pressure value of the message queue according to the monitored number;
a pressure value judging module, used for judging whether the pressure value is larger than a preset pressure threshold;
a high-pressure state notification module, used for sending a high-pressure state notification to a Worker end in the distributed computing framework when the pressure value is judged to be larger than the preset pressure threshold, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in the virtual machine of the Worker end when it monitors that the index value of the first performance index reaches a first preset threshold, and sends a trigger instruction to the Driver end when it monitors that the index value of the second performance index meets a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
a first garbage data cleaning module, used for triggering the garbage cleaner in the Driver end to clean the garbage data in the Worker end when receiving the trigger instruction sent by the Worker end.
As one implementation of the embodiment of the present invention,
the second performance index includes: the occupancy rate of the entire memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaching a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaching a first preset load value; or,
the second performance index includes: the occupancy rate of the entire memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaching the first preset occupancy rate and the load value reaching the first preset load value.
As an implementation manner of the embodiment of the present invention, the apparatus further comprises:
a first index monitoring module, used for monitoring a third performance index and a fourth performance index when the pressure value is judged to be larger than the preset pressure threshold; wherein the third performance index is: the occupancy rate of the memory of the virtual machine of the Driver end, and the fourth performance index is: an index capable of characterizing the running state of the Driver end;
a first program triggering module, used for triggering a GC program in the virtual machine of the Driver end when the index value of the third performance index reaches a second preset threshold;
a second garbage data cleaning module, used for triggering the garbage cleaner to clean the garbage data in the Driver end when the index value of the fourth performance index meets a second cleaning condition.
As one implementation of the embodiment of the present invention,
the fourth performance index includes: the occupancy rate of the entire memory of the Driver end, and the second cleaning condition includes: the occupancy rate reaching a second preset occupancy rate; or,
the fourth performance index includes: the load of the central processing unit of the Driver end, and the second cleaning condition includes: the load value reaching a second preset load value; or,
the fourth performance index includes: the occupancy rate of the entire memory of the Driver end and the load of the central processing unit of the Driver end, and the second cleaning condition includes: the occupancy rate reaching the second preset occupancy rate and the load value reaching the second preset load value.
As an implementation manner of the embodiment of the present invention, the apparatus further comprises:
a rate reducing module, used for reducing the rate at which tasks to be processed are received when the pressure value is judged to be larger than the preset pressure threshold.
As an implementation manner of the embodiment of the present invention, the pressure value calculation module is specifically used for:
calculating the pressure value of the message queue through a preset formula according to the monitored number; wherein the preset formula is:
P_t(s) = P_t(n) * P_t(v)
wherein P_t(s) is: the pressure value of the message queue at time t; P_t(n) is: the event occupancy rate of the Driver end at time t; P_t(v) is: the quantity change rate of the Driver end at time t;
P_t(n) = Num(t) / Num(max)
wherein Num(t) is: the number of tasks to be processed included in the message queue at time t; Num(max) is: the number of tasks to be processed that the message queue can accommodate;
P_t(v) = Num(t-i) / Num(t)
wherein Num(t-i) is: the number of tasks to be processed included in the message queue at time t-i, i being a preset unit duration.
In a fourth aspect, an embodiment of the present invention provides a garbage data cleaning device, applied to a Worker end in a distributed computing framework; the device comprises:
a high-pressure notification receiving module, used for receiving a high-pressure state notification sent by a Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent to the Worker end when the Driver end judges that the pressure value of the message queue is larger than a preset pressure threshold, the pressure value being: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
a second index monitoring module, used for monitoring a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
a second program triggering module, used for triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
a trigger instruction sending module, used for sending a trigger instruction to the Driver end when the index value of the second performance index meets the first cleaning condition, so that the Driver end, after receiving the trigger instruction, triggers the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
As one implementation of the embodiment of the present invention,
the second performance index includes: the occupancy rate of the entire memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaching a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaching a first preset load value; or,
the second performance index includes: the occupancy rate of the entire memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaching the first preset occupancy rate and the load value reaching the first preset load value.
In a fifth aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing any of the method steps of the garbage data cleaning method provided from the perspective of the Driver end in the first aspect when executing the program stored in the memory.
In a sixth aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing any of the method steps of the garbage data cleaning method provided from the perspective of the Worker end in the second aspect when executing the program stored in the memory.
In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the method steps of the garbage data cleaning method provided from the perspective of the Driver end in the first aspect.
In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the method steps of the garbage data cleaning method provided from the perspective of the Worker end in the second aspect.
In summary, in the solution provided by the embodiment of the present invention, the Driver end can notify the Worker end to monitor the first and second performance indexes when it monitors that the pressure value of the message queue is too large. Furthermore, according to the monitored values of the first and second performance indexes, the Worker end can trigger the GC program of its virtual machine, or trigger the garbage cleaner in the Driver end, to clean the garbage data of the Worker end. When the pressure of the message queue is too high, a large number of tasks to be processed and a large amount of garbage data accumulate at the Worker end; by applying the scheme provided by the embodiment of the present invention, the Worker end can, in this situation, timely trigger the objects for garbage cleaning to clean its garbage data, thereby avoiding memory leaks at the Worker end and preventing them from affecting the stability and task processing efficiency of the Spark distributed computing framework.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a method for cleaning junk data applied to a Driver end according to an embodiment of the present invention;
fig. 2 is another flow chart of a method for cleaning garbage data applied to a Driver end according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a method for cleaning garbage data applied to a Worker end according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a garbage data cleaning device applied to a Driver end according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a garbage data cleaning device applied to a Worker end according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the garbage data processing flow of the Spark distributed computing framework currently in use, the garbage cleaner in the Driver end is triggered by the GC program of the virtual machine of the Worker end. When the task receiving rate of the Driver end exceeds the task processing rate of the Worker end and the GC program triggers the garbage cleaner to clean garbage, the garbage cleaner may not be able to free enough memory space in time to store new tasks to be processed and new garbage data, so that memory leaks occur at the Worker end and the stability and task processing efficiency of the Spark distributed computing framework are affected. To solve these problems, the embodiments of the present invention provide a garbage data processing method from the perspective of the Driver end and from the perspective of the Worker end in a distributed computing framework.
It should be noted that the Spark distributed computing framework may include a Driver end, a Master end, and a Worker end, and the three may be communicatively connected. The Driver end is configured to receive tasks to be processed at a certain rate and arrange them in order of receiving time to obtain a message queue. The Master end is configured to monitor the current task processing condition and memory usage of the Worker end. Thus, the Driver end can determine, according to the monitoring result of the Master end, to which Worker end a task to be processed is sent, so that the Worker end can process the received task to be processed and register the garbage data generated during processing according to the type of the task, the registration information characterizing the objects from which garbage data is to be cleaned.
The Driver end, the Master end, and the Worker end may be respectively disposed on different electronic devices, for example, a tablet computer, a notebook computer, a desktop computer, or the like; or two or three of the Driver end, the Master end, and the Worker end may be disposed on the same electronic device, on which the different ends implement their respective functions through respective virtual machines. Either is reasonable. Therefore, the garbage data processing method provided by the embodiment of the present invention from the perspective of the Driver end in the distributed computing framework is applied to the electronic device running the Driver end; the garbage data processing method provided by the embodiment of the present invention from the perspective of the Worker end in the distributed computing framework is applied to the electronic device running the Worker end.
In addition, in a Spark distributed computing framework, there may be one Master end that provides a service, multiple Driver ends, and multiple Worker ends, where each of the Driver ends and the Worker ends may establish a communication connection with the Master end that provides the service; one Driver end may be communicatively connected with at least one Worker end, and one Worker end may establish a communication connection with one Driver end. It should be noted that, in order to ensure the normal operation of the Spark distributed computing framework, a standby Master end may also exist in one Spark distributed computing framework, so that when the service-providing Master end cannot operate normally, the standby Master end may take over as the service-providing Master end and continue to provide service for the Driver ends and Worker ends that have established communication connections.
In a Spark distributed computing framework, the garbage data present in the Worker end can be divided into two categories: the first category is garbage data generated by interaction between the Driver end and the Worker end, and the second category is garbage data generated by the Worker end processing tasks. The first category of garbage data can be cleaned by the GC program of the virtual machine of the Worker end, and the second category of garbage data can be processed by the garbage cleaner located at the Driver end.
Next, a method for cleaning garbage data provided by the embodiment of the invention from the perspective of the Driver end is first described.
Fig. 1 is a schematic flow chart of a method for cleaning garbage data provided from the perspective of a Driver end in an embodiment of the present invention. As shown in fig. 1, the method for cleaning garbage data provided from the perspective of the Driver end may include the following steps:
s101: monitoring the number of tasks to be processed in the message queue, and calculating the pressure value of the message queue according to the monitored number;
in the running process of the distributed computing framework, the Driver end continuously receives tasks to be processed sent by other terminals and arranges them into a message queue. In order of the receiving time of each task to be processed, from earliest to latest, the Driver end continuously distributes the tasks in the message queue to each communicatively connected Worker end. Therefore, when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the tasks in the message queue cannot be processed in time, and the number of tasks to be processed in the message queue keeps increasing until it reaches the maximum number of tasks to be processed that the message queue can accommodate. In this case, the message queue can no longer receive new tasks to be processed, so tasks to be processed sent to the Driver end by other terminals are lost, which causes a series of adverse effects.
For example, if a new task to be processed relates to garbage data cleaning at the Worker end, losing that task easily causes a memory leak at the Worker end because the garbage data is not cleaned in time.
Based on the above, the Driver end can introduce a monitoring mechanism to monitor the number of tasks to be processed included in the message queue, and calculate the pressure value of the message queue according to the monitored number.
The monitoring mechanism can be realized by writing, at the Driver end, a section of code for monitoring the number of tasks to be processed in the message queue; or by setting a program interface at the Driver end and importing, through that interface, a monitoring program from a Master end communicatively connected with the Driver end, so that the Driver end can monitor the number of tasks to be processed in the message queue using the monitoring program. Either is reasonable.
In addition, in the step S101, the Driver end may calculate the pressure value of the message queue in various manners, which is not specifically limited in the embodiment of the present invention.
Optionally, in a specific implementation manner, the manner of calculating the pressure value of the message queue according to the monitored number in step S101 may be: and calculating the pressure value of the message queue through a preset formula according to the number obtained through monitoring.
Specifically, the preset formula may be:

P_t(s) = P_t(n) * P_t(v)

where P_t(s) is the pressure value of the message queue at time t; P_t(n) is the event occupancy rate of the Driver end at time t; and P_t(v) is the quantity change rate of the Driver end at time t;

P_t(n) = Num(t) / Num(max)

where Num(t) is the number of tasks to be processed included in the message queue at time t, and Num(max) is the maximum number of tasks to be processed that the message queue can accommodate;

P_t(v) = Num(t-i) / Num(t)

where Num(t-i) is the number of tasks to be processed included in the message queue at time t-i, and i is a preset unit duration.
In this implementation manner, the Driver end may acquire, at intervals of the preset unit duration, the number of tasks to be processed included in the message queue, and store that number. The pressure value of the message queue is then calculated through the above formula from the acquired number of tasks to be processed and the number of tasks to be processed that the message queue corresponding to the Driver end can accommodate. The specific value of the number of tasks to be processed that the message queue can accommodate can be set according to the requirement of the distributed computing framework in practical application, and the embodiment of the present invention does not limit this.
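As a minimal illustrative sketch (not the patented implementation), the preset formula above can be computed as follows; the names `num_t`, `num_max`, and `num_t_minus_i` are hypothetical stand-ins for Num(t), Num(max), and Num(t-i):

```python
def queue_pressure(num_t, num_max, num_t_minus_i):
    """Return the pressure value P_t(s) = P_t(n) * P_t(v) for one queue snapshot.

    num_t         -- tasks pending at time t, i.e. Num(t)
    num_max       -- maximum tasks the queue can accommodate, i.e. Num(max)
    num_t_minus_i -- tasks pending one sampling interval earlier, i.e. Num(t-i)
    """
    if num_t == 0:
        return 0.0  # empty queue: no pressure
    occupancy = num_t / num_max          # P_t(n), event occupancy rate
    change_rate = num_t_minus_i / num_t  # P_t(v), quantity change rate
    return occupancy * change_rate       # P_t(s)
```

For example, a queue holding 800 of a maximum 1000 tasks that held 400 tasks one interval ago yields `queue_pressure(800, 1000, 400)`, i.e. 0.8 × 0.5 = 0.4.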
S102: judging whether the pressure value is larger than a preset pressure threshold value or not;
After the pressure value of the message queue is calculated, the Driver end can judge whether the calculated pressure value is larger than a preset pressure threshold value. In this way, the Driver end can determine whether the number of tasks to be processed included in the current message queue is too large according to the judging result, so as to determine whether the risk of memory leakage is brought.
The pressure threshold may be set according to the requirement of the distributed computing framework in practical application, and in this regard, the embodiment of the present invention does not limit the specific value of the pressure threshold.
S103: when the pressure value is judged to be larger than a preset pressure threshold, sending a high-pressure state notification to a Worker end in the distributed computing framework, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach a first preset threshold, and sends a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition;
wherein the first performance index is: the occupancy rate of the memory of the virtual machine at the Worker end; and the second performance index is: an index capable of representing the running state of the Worker end.
When the pressure value is greater than the preset pressure threshold, the Driver end can determine that the number of tasks to be processed included in the current message queue is too large, and if not handled, this brings a risk of memory leakage. Therefore, in order to cope with this situation, the Driver end can send a high-pressure state notification to the communicatively connected Worker end, so that the Worker end learns the pressure condition of the current message queue.
After the Worker end receives the high-pressure state notification, it can monitor the occupancy rate of the memory of the virtual machine at the Worker end and a second performance index capable of representing the running state of the Worker end, and determine according to the monitoring result whether the current Worker end needs to trigger a garbage-cleaning object to clean garbage data, and if so, which garbage data cleaning object needs to be triggered. In this way, the Worker end can clean garbage data in time, avoiding memory leakage and preventing the stability and task processing efficiency of the distributed computing framework from being affected.
Specifically, when the occupancy rate of the memory of the virtual machine at the Worker end reaches a first preset threshold, the Worker end can trigger the GC program of its virtual machine, and the GC program can clean the garbage data generated by interaction between the Driver end and the Worker end. The first preset threshold may be set according to the requirement of the distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.
When the index value of the second performance index is monitored to meet the first cleaning condition, the Worker end can send a trigger instruction to the Driver end, so that after receiving the trigger instruction, the Driver end can trigger the garbage cleaner located at the Driver end to clean the garbage data generated by processing tasks in the Worker end. That is, after executing the step S103 and sending the high-pressure state notification to the Worker end, the Driver end determines whether to execute the subsequent step S104 according to the information fed back by the Worker end.
If the Worker end sends a trigger instruction to the Driver end, the Driver end may execute the subsequent step S104; otherwise, the Driver end may not execute the subsequent step S104.
S104: when a trigger instruction sent by the Worker end is received, triggering a garbage cleaner in the Driver end to clean garbage data in the Worker end.
After being triggered, the garbage cleaner in the Driver end starts to operate, and the Worker end can actively send a message to the garbage cleaner by relying on RPC (Remote Procedure Call Protocol) communication between the Worker end and the Driver end; the message can instruct the garbage cleaner to clean the garbage data in the Worker end. In this way, the Worker end can actively trigger the garbage cleaner to clean its own garbage data without waiting for the GC program in the virtual machine to trigger the garbage cleaner, so the Worker end's garbage data can be cleaned in time. The garbage data in the Worker end cleaned by the garbage cleaner is: the garbage data characterized by the registration information registered in the Worker end as garbage data to be cleaned by the garbage cleaner.
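The trigger path above can be sketched with a toy stand-in for the RPC channel; the class names `GarbageCleaner` and `WorkerEndpoint`, and the registry-based bookkeeping, are hypothetical illustrations, not Spark's actual internal endpoints:

```python
class GarbageCleaner:
    """Driver-side cleaner that cleans only the garbage data whose
    registration information the Worker hands over."""
    def __init__(self):
        self.cleaned = []

    def clean(self, registered_objects):
        self.cleaned.extend(registered_objects)
        return len(registered_objects)


class WorkerEndpoint:
    """Worker-side stand-in holding registration info for generated garbage."""
    def __init__(self, cleaner):
        self.cleaner = cleaner  # stands in for the RPC stub to the Driver end
        self.registry = []      # registration info for garbage produced by tasks

    def register(self, obj):
        self.registry.append(obj)

    def send_trigger(self):
        # "RPC call": actively instruct the Driver-side cleaner to clean
        # the registered garbage, then clear the local registry.
        cleaned_count = self.cleaner.clean(self.registry)
        self.registry = []
        return cleaned_count
```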
It should be noted that, when the Worker end receives the high-pressure state notification and monitors the second performance index capable of representing the running state of the Worker end, the second performance index may include multiple cases, and when the cases included in the second performance index are different, the corresponding first cleaning conditions are also different.
Specifically, when the second performance index includes: the occupancy rate of all the memory of the Worker end, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; or,

when the second performance index includes: the load of the central processing unit (CPU) of the Worker end, the first cleaning condition may include: the load value reaches a first preset load value; or,

when the second performance index includes: the occupancy rate of all the memory of the Worker end and the load of the CPU of the Worker end, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate and the load value reaches a first preset load value.
The first preset occupancy rate and the first preset load value may be set according to a requirement of the distributed computing framework in practical application, which is not particularly limited in the embodiment of the present invention.
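The three cases above can be sketched as a single predicate; the threshold defaults (0.85 for memory occupancy, 0.90 for CPU load) are hypothetical values, since the embodiment deliberately leaves them to the application's requirements:

```python
def first_cleaning_condition_met(mem_occupancy=None, cpu_load=None,
                                 occupancy_threshold=0.85, load_threshold=0.90):
    """Return True if the monitored second performance index satisfies
    the first cleaning condition.

    mem_occupancy -- occupancy rate of all Worker-end memory (0..1), or None if not monitored
    cpu_load      -- load of the Worker end's CPU (0..1), or None if not monitored
    """
    if mem_occupancy is not None and cpu_load is not None:
        # Case 3: both indices monitored -> both must reach their thresholds.
        return mem_occupancy >= occupancy_threshold and cpu_load >= load_threshold
    if mem_occupancy is not None:
        return mem_occupancy >= occupancy_threshold   # case 1: memory only
    if cpu_load is not None:
        return cpu_load >= load_threshold             # case 2: CPU load only
    return False                                      # nothing monitored
```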
In addition, the garbage data that needs to be cleaned by the garbage cleaner located in the Driver end may include various types, for example: persisted data of an RDD (Resilient Distributed Dataset) for which a storage level is set; garbage data generated by performing an "aggregation computation" task using an Accumulator; temporary data produced when a "shuffle" operation is performed; garbage data generated during "broadcast"; garbage data generated when a "CheckPoint" operation is performed on an RDD; and so on.
In summary, in the solution provided by the embodiment of the present invention, the Driver end can notify the Worker end to monitor the first and second performance indexes when it is monitored that the pressure value of the message queue is too large. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its own virtual machine, or trigger the garbage cleaner in the Driver end, to perform garbage data cleaning on the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data can accumulate at the Worker end; by applying the method provided by the embodiment of the present invention, the Worker end can timely trigger the garbage-cleaning object to clean its garbage data when the pressure of the message queue is too high, thereby avoiding memory leakage at the Worker end and preventing the stability and task processing efficiency of the Spark distributed computing framework from being affected.
Optionally, when the Driver end determines that the pressure value of the message queue is greater than the preset pressure threshold, it may be determined that the number of tasks to be processed included in the current message queue is too large, and the Worker end cannot process and complete the tasks to be processed in time, so that the number of tasks to be processed in the message queue cannot be reduced in time. Therefore, the method for cleaning the garbage data provided by the Driver end in this embodiment may further include:
and when the pressure value is judged to be larger than the preset pressure threshold value, reducing the rate of receiving the task to be processed.
Thus, when the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold, by reducing the rate of receiving tasks to be processed, the Driver end can slow down the growth of the number of tasks to be processed included in the message queue, so that the rate at which the Worker end processes the message queue can equal or even exceed the rate at which the Driver end receives tasks to be processed. In this way, the pressure value of the message queue at the Driver end can be reduced, the number of tasks to be processed distributed from the Driver end to the Worker end further decreases, and the memory burden of the Worker end is reduced.
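This back-pressure idea can be sketched as follows; the throttle factor of 0.5 and the rates are illustrative assumptions, not values specified by the embodiment:

```python
def adjust_receive_rate(pressure, pressure_threshold, normal_rate,
                        throttle_factor=0.5):
    """Return the rate (tasks/second) at which the Driver end should accept
    new tasks to be processed: throttled when the queue pressure exceeds
    the preset threshold, unchanged otherwise."""
    if pressure > pressure_threshold:
        return normal_rate * throttle_factor  # slow the queue's growth
    return normal_rate
```

For instance, with a normal rate of 100 tasks/second and a threshold of 0.7, a pressure value of 0.9 would halve the receive rate to 50.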
Just as memory leakage may occur at the Worker end due to excessive garbage data in the embodiment of the present invention, the Driver end may also hold garbage data generated by interaction between the Worker end and the Driver end and garbage data generated by processing tasks; in addition, the Driver end may hold garbage data generated by parsing tasks to be processed when sending them to the Worker end. Therefore, memory leakage may also occur at the Driver end due to excessive garbage data. To avoid this, garbage data processing can be performed on the Driver end as well.
Based on the above requirements, as shown in fig. 2, on the basis of the embodiment shown in fig. 1, the garbage data processing method provided by the embodiment of the invention from the Driver end may further include the following steps:
s105: when the pressure value is judged to be larger than the preset pressure threshold value, monitoring a third performance index and a fourth performance index;
wherein the third performance index is: the occupancy rate of the memory of the Driver-end virtual machine; and the fourth performance index is: an index capable of representing the running state of the Driver end;
s106: when the index value of the third performance index reaches a second preset threshold value, triggering a GC program in the Driver-side virtual machine;
the second preset threshold may be set according to a requirement of the distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.
S107: when the index value of the fourth performance index meets the second cleaning condition, triggering the garbage cleaner to clean the garbage data in the Driver end.
The fourth performance index may include a plurality of situations, and when the situations included in the fourth performance index are different, the corresponding second cleaning conditions are different.
Specifically, when the fourth performance index includes: the occupancy rate of all the memory of the Driver end, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate; or,

when the fourth performance index includes: the load of the central processing unit (CPU) of the Driver end, the second cleaning condition may include: the load value reaches a second preset load value; or,

when the fourth performance index includes: the occupancy rate of all the memory of the Driver end and the load of the CPU of the Driver end, the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate and the load value reaches a second preset load value.
It should be noted that, the second preset occupancy rate and the second preset load value may be set according to a requirement of the distributed computing framework in practical application, which is not limited in particular.
When the pressure value is greater than the preset pressure threshold, the Driver end can determine that the number of tasks to be processed included in the current message queue is too large, and if not handled, this brings a risk of memory leakage. Therefore, in order to cope with this situation, the Driver end can execute the step S105, monitor the occupancy rate of the memory of the virtual machine at the Driver end and the fourth performance index capable of representing the running state of the Driver end, and determine according to the monitoring result whether the current Driver end needs to trigger a garbage-cleaning object to clean garbage data, and if so, which garbage data cleaning object needs to be triggered. In this way, the Driver end can clean garbage data in time, preventing the stability and task processing efficiency of the distributed computing framework from being affected.
Specifically, when it is monitored that the occupancy rate of the memory of the virtual machine at the Driver end reaches the second preset threshold, the Driver end may execute the step S106 to trigger the GC program of the virtual machine, where the GC program may clean the garbage data generated by the interaction between the Driver end and the Worker end. The second preset threshold may be set according to a requirement of the distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.
When the index value of the fourth performance index meets the second cleaning condition, the Driver terminal can trigger the garbage cleaner to clean garbage data generated by processing tasks and garbage data generated by analyzing the tasks to be processed in the Driver terminal.
That is, after the Driver end judges that the pressure value of the message queue is greater than the preset pressure threshold and executes the step S103, it determines whether to execute the subsequent step S106 and step S107 according to the monitoring results of the occupancy rate of the memory of the virtual machine at the Driver end and of the fourth performance index capable of representing the running state of the Driver end.
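Steps S105 to S107 can be sketched together as one decision routine; the callback parameters and the 0.80 threshold default are hypothetical, since the second preset threshold is left to the application's requirements:

```python
def driver_side_cleanup(vm_mem_occupancy, fourth_index_met,
                        trigger_gc, trigger_cleaner, second_threshold=0.80):
    """Decide which Driver-side cleanup objects to trigger.

    vm_mem_occupancy -- third performance index: Driver VM memory occupancy (0..1)
    fourth_index_met -- whether the fourth performance index meets the second
                        cleaning condition
    trigger_gc       -- callback that triggers the GC program in the Driver VM
    trigger_cleaner  -- callback that triggers the garbage cleaner
    """
    actions = []
    if vm_mem_occupancy >= second_threshold:  # S106: third index at threshold
        trigger_gc()
        actions.append("gc")
    if fourth_index_met:                      # S107: second cleaning condition
        trigger_cleaner()
        actions.append("cleaner")
    return actions
```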
The following describes a garbage data processing method provided by the embodiment of the present invention from the perspective of the Worker end.
Fig. 3 is a schematic flow chart of a garbage data processing method provided from the perspective of the Worker end according to an embodiment of the present invention. As shown in fig. 3, the garbage data processing method provided from the perspective of the Worker end may include the following steps:
s301: receiving a high-pressure state notification sent by a Driver end in the distributed computing framework;
wherein the high-pressure state notification is: a notification sent to the Worker end when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold, the pressure value being: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
in the running process of the distributed computing framework, the Driver end continuously receives tasks to be processed sent by other terminals and arranges them into a message queue. In order of the receiving time of each task to be processed, from earliest to latest, the Driver end continuously distributes the tasks in the message queue to each communicatively connected Worker end. Therefore, when the task receiving rate of the Driver end exceeds the task processing rate of the Worker end, the tasks in the message queue cannot be processed in time, and the number of tasks to be processed in the message queue keeps increasing until it reaches the maximum number of tasks to be processed that the message queue can accommodate. In this case, the message queue can no longer receive new tasks to be processed, so tasks to be processed sent to the Driver end by other terminals are lost, which causes a series of adverse effects.
Therefore, the Driver end can introduce a monitoring mechanism to monitor the number of tasks to be processed in the message queue, and calculate the pressure value of the message queue according to the monitored number. The Driver end can further determine whether the pressure value is greater than a preset pressure threshold. In this way, the Driver end can determine, according to the judging result, whether the number of tasks to be processed included in the current message queue is too large, so as to determine whether there is a risk of memory leakage.
When the pressure value is greater than the preset pressure threshold, the Driver end can determine that the number of tasks to be processed included in the current message queue is too large, and if not handled, this brings a risk of memory leakage. Therefore, in order to cope with this situation, the Driver end can send a high-pressure state notification to the communicatively connected Worker end, so that the Worker end learns the pressure condition of the current message queue.
Thus, the Worker end can receive the high-pressure state notification sent by the communicatively connected Driver end.
S302: monitoring the first performance index and the second performance index;
wherein the first performance index is: the occupancy rate of the memory of the virtual machine at the Worker end; and the second performance index is: an index capable of representing the running state of the Worker end;
S303: triggering a GC program in a virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
s304: when the index value of the second performance index meets the first cleaning condition, sending a trigger instruction to the Driver end, so that after receiving the trigger instruction, the Driver end triggers a garbage cleaner in the Driver end to clean the garbage data in the Worker end.
After receiving the high-pressure state notification, the Worker end may execute the step S302, monitor the occupancy rate of the memory of the virtual machine at the Worker end and the second performance index capable of representing the running state of the Worker end, and determine according to the monitoring result whether the current Worker end needs to trigger a garbage-cleaning object to clean garbage data, and if so, which garbage data cleaning object needs to be triggered. In this way, the Worker end can clean garbage data in time, avoiding memory leakage and preventing the stability and task processing efficiency of the Spark distributed computing framework from being affected.
In the step S302, the Worker end may introduce a monitoring mechanism to monitor the occupancy rate of the memory of its virtual machine and the second performance index capable of representing the running state of the Worker end. The monitoring mechanism can be realized by writing, at the Worker end, a section of code for monitoring these two indexes; or by setting a program interface at the Worker end and importing, through that interface, a monitoring program from a Master end communicatively connected with the Worker end, for monitoring the occupancy rate of the memory of the virtual machine at the Worker end and the second performance index capable of representing the running state of the Worker end. Either is reasonable.
Specifically, in the step S303, when it is monitored that the occupancy rate of the memory of the virtual machine at the Worker end reaches the first preset threshold, the Worker end may trigger the GC program of its own virtual machine, and the GC program can clean the garbage data generated by interaction between the Driver end and the Worker end. The first preset threshold may be set according to the requirement of the distributed computing framework in practical application, which is not specifically limited in the embodiment of the present invention.
In the step S304, when it is monitored that the index value of the second performance index meets the first cleaning condition, the Worker end may send a trigger instruction to the Driver end, so that after receiving the trigger instruction, the Driver end may trigger the garbage cleaner located at the Driver end to clean the garbage data generated by processing tasks in the Worker end.
That is, the Worker end executes the step S302, and determines whether to execute the subsequent step S303 and step S304 according to the monitored occupancy rate of the memory of the virtual machine of the Worker end and the monitored second performance index capable of characterizing the running state of the Worker end.
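Steps S302 to S304 at the Worker end can be sketched as one handler invoked upon the high-pressure state notification; the callback parameters and the 0.75 threshold default are hypothetical, since the first preset threshold is left to the application's requirements:

```python
def on_high_pressure_notification(vm_mem_occupancy, second_index_met,
                                  trigger_local_gc, send_trigger_to_driver,
                                  first_threshold=0.75):
    """Worker-end reaction to a high-pressure state notification.

    vm_mem_occupancy       -- first performance index: Worker VM memory occupancy (0..1)
    second_index_met       -- whether the second performance index meets the
                              first cleaning condition
    trigger_local_gc       -- callback that triggers the GC program in the Worker VM
    send_trigger_to_driver -- callback that sends the trigger instruction to the
                              Driver end over RPC
    """
    actions = []
    if vm_mem_occupancy >= first_threshold:  # S303: first index at threshold
        trigger_local_gc()
        actions.append("local_gc")
    if second_index_met:                     # S304: first cleaning condition met
        send_trigger_to_driver()
        actions.append("trigger_sent")
    return actions
```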
It should be noted that, when the Worker end receives the high-pressure state notification and monitors the second performance index capable of representing the running state of the Worker end, the second performance index may include multiple cases, and when the cases included in the second performance index are different, the corresponding first cleaning conditions are also different.
Specifically, when the second performance index includes: the occupancy rate of all the memory of the Worker end, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; or,

when the second performance index includes: the load of the central processing unit (CPU) of the Worker end, the first cleaning condition may include: the load value reaches a first preset load value; or,

when the second performance index includes: the occupancy rate of all the memory of the Worker end and the load of the CPU of the Worker end, the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate and the load value reaches a first preset load value.
The first preset occupancy rate and the first preset load value may be set according to a requirement of the distributed computing framework in practical application, which is not particularly limited in the embodiment of the present invention.
In addition, the second type of data that needs to be cleaned by the garbage cleaner located in the Driver end may include various types, for example: persisted data of an RDD (Resilient Distributed Dataset) for which a storage level is set; garbage data generated by performing an "aggregation computation" task using an Accumulator; temporary data produced when a "shuffle" operation is performed; garbage data generated during "broadcast"; garbage data generated when a "CheckPoint" operation is performed on an RDD; and so on.
In summary, in the solution provided by the embodiment of the present invention, the Driver end can notify the Worker end to monitor the first and second performance indexes when it is monitored that the pressure value of the message queue is too large. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its own virtual machine, or trigger the garbage cleaner in the Driver end, to perform garbage data cleaning on the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data can accumulate at the Worker end; in this way, the Worker end can timely trigger the garbage-cleaning object to clean the garbage data when the pressure of the message queue is too high, thereby avoiding memory leakage at the Worker end and preventing the stability and task processing efficiency of the Spark distributed computing framework from being affected.
Corresponding to the garbage data cleaning method provided by the embodiment of the present invention from the perspective of the Driver end, the embodiment of the present invention further provides a garbage data cleaning device from the perspective of the Driver end.
Fig. 4 is a schematic structural diagram of a garbage data cleaning device provided from the perspective of a Driver end according to an embodiment of the present invention. As shown in fig. 4, the apparatus may include the following modules:
The pressure value calculating module 410 is configured to monitor the number of tasks to be processed included in the message queue, and calculate a pressure value of the message queue according to the monitored number;
the pressure value judging module 420 is configured to judge whether the pressure value is greater than a preset pressure threshold;
the high-pressure state notification module 430 is configured to send a high-pressure state notification to the Worker end in the distributed computing framework when the pressure value is determined to be greater than the preset pressure threshold, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach a first preset threshold, and sends a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
the first garbage data cleaning module 440 is configured to trigger a garbage cleaner in the Driver terminal to clean garbage data in the Worker terminal when receiving a trigger instruction sent by the Worker terminal.
In summary, in the solution provided by the embodiment of the present invention, when the pressure value of the message queue is monitored to be too large, the Driver end may notify the Worker end to monitor the first and second performance indexes. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its virtual machine, or trigger the garbage cleaner in the Driver end to clean the garbage data in the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data may accumulate at the Worker end; by applying the device provided by the embodiment of the present invention, the Worker end can timely trigger the object responsible for garbage cleaning to clean its garbage data when the pressure of the message queue is too high, avoiding memory leakage at the Worker end and the resulting harm to the stability and task processing efficiency of the Spark distributed computing framework.
As an implementation manner provided by the embodiment of the present invention, the second performance index may include: the occupancy rate of all the memory of the Worker end, and the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index may include: the load of the central processing unit of the Worker end, and the first cleaning condition may include: the load value reaches a first preset load value; or,
the second performance index may include: the occupancy rate of all the memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition may include: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
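The three alternatives above can be expressed as a single predicate, as in the following sketch (the function, the `mode` parameter, and the threshold defaults are illustrative assumptions; the patent leaves the preset values to the requirements of practical application):

```python
def first_cleaning_condition_met(mem_occupancy, cpu_load,
                                 preset_occupancy=0.85, preset_load=0.90,
                                 mode="both"):
    """Check the first cleaning condition on the Worker end.

    mode selects which alternative applies:
      "memory" -- total memory occupancy alone must reach its preset value
      "cpu"    -- CPU load alone must reach its preset value
      "both"   -- occupancy AND load must both reach their preset values
    The default thresholds are placeholders, not values from the patent.
    """
    if mode == "memory":
        return mem_occupancy >= preset_occupancy
    if mode == "cpu":
        return cpu_load >= preset_load
    return mem_occupancy >= preset_occupancy and cpu_load >= preset_load
```

In the combined mode, a Worker with high memory occupancy but an idle CPU would not send the trigger instruction, which matches the stricter third alternative.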
As an implementation manner of the embodiment of the present invention, the first garbage data cleaning device applied to the Driver end in the distributed computing framework provided in the foregoing embodiment of the present invention may further include:
the first index monitoring module is used for monitoring a third performance index and a fourth performance index when the pressure value is judged to be greater than the preset pressure threshold; wherein the third performance index is: the occupancy rate of the memory of the Driver-end virtual machine, and the fourth performance index is: an index capable of representing the running state of the Driver end;
The first program triggering module is used for triggering a GC program in the Driver end virtual machine when the index value of the third performance index reaches a second preset threshold value;
and the second garbage data cleaning module is used for triggering the garbage cleaner to clean the garbage data in the Driver end when the index value of the fourth performance index is monitored to meet the second cleaning condition.
As an implementation manner of the embodiment of the present invention, the fourth performance index may include: the occupancy rate of all the memory of the Driver end, and the second cleaning condition may include: the occupancy rate reaches a second preset occupancy rate; or,
the fourth performance index may include: the load of the central processing unit of the Driver end, and the second cleaning condition may include: the load value reaches a second preset load value; or,
the fourth performance index may include: the occupancy rate of all the memory of the Driver end and the load of the central processing unit of the Driver end, and the second cleaning condition may include: the occupancy rate reaches the second preset occupancy rate and the load value reaches a second preset load value.
As an implementation manner of the embodiment of the present invention, the first garbage data cleaning device applied to the Driver end in the distributed computing framework provided in the foregoing embodiment of the present invention may further include:
a rate reducing module, used for reducing the rate of receiving tasks to be processed when the pressure value is judged to be greater than the preset pressure threshold.
As an implementation of the embodiment of the present invention, the pressure value calculating module 410 may specifically be configured to: calculate the pressure value of the message queue through a preset formula according to the number obtained by monitoring; wherein the preset formula is:

P_t(s) = P_t(n) * P_t(v)

wherein P_t(s) is: the pressure value of the message queue at time t; P_t(n) is: the event occupancy rate of the Driver end at time t; P_t(v) is: the quantity change rate of the Driver end at time t;

P_t(n) = Num(t) / Num(max)

wherein Num(t) is: the number of tasks to be processed included in the message queue at time t; Num(max) is: the number of tasks to be processed that the message queue can accommodate;

P_t(v) = Num(t-i) / Num(t)

wherein Num(t-i) is: the number of tasks to be processed included in the message queue at time t-i, and i is a preset unit duration.
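Under the definitions above, the pressure value can be computed as in the following sketch (the function name and the empty-queue guard are illustrative additions, not part of the patented formula):

```python
def queue_pressure(num_t, num_t_minus_i, num_max):
    """Compute the message-queue pressure value P_t(s) = P_t(n) * P_t(v).

    num_t         -- Num(t): tasks to be processed in the queue at time t
    num_t_minus_i -- Num(t-i): tasks in the queue one preset unit duration i earlier
    num_max       -- Num(max): tasks the message queue can accommodate
    """
    if num_t == 0:
        return 0.0  # empty queue: treat pressure as zero (guards the division)
    p_n = num_t / num_max        # event occupancy rate P_t(n)
    p_v = num_t_minus_i / num_t  # quantity change rate P_t(v)
    return p_n * p_v


# Example: the queue holds 800 of 1000 tasks, up from 400 one unit earlier.
# P_t(n) = 0.8, P_t(v) = 0.5, so P_t(s) = 0.4.
pressure = queue_pressure(num_t=800, num_t_minus_i=400, num_max=1000)
```

The Driver end would then compare this value against the preset pressure threshold to decide whether to send the high-pressure state notification.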
Corresponding to the garbage data cleaning method provided by the embodiment of the present invention from the perspective of the Worker end, the embodiment of the present invention further provides a garbage data cleaning device from the perspective of the Worker end.
Fig. 5 is a schematic structural diagram of a garbage data cleaning device provided from the perspective of the Worker end according to an embodiment of the present invention. As shown in fig. 5, the apparatus may include the following modules:
The high-pressure notification receiving module 510 is configured to receive a high-pressure state notification sent by the Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent by the Driver end to the Worker end when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold, and the pressure value is: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
the second index monitoring module 520 is configured to monitor a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
the second program triggering module 530 is configured to trigger a GC program in the virtual machine of the Worker end when it is monitored that the index value of the first performance index reaches a first preset threshold;
and the trigger instruction sending module 540 is configured to send a trigger instruction to the Driver end when it is monitored that the index value of the second performance index meets the first cleaning condition, so that the Driver end, after receiving the trigger instruction, triggers the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
In summary, in the solution provided by the embodiment of the present invention, when the pressure value of the message queue is monitored to be too large, the Driver end may notify the Worker end to monitor the first and second performance indexes. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its virtual machine, or trigger the garbage cleaner in the Driver end to clean the garbage data in the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data may accumulate at the Worker end; by applying the device provided by the embodiment of the present invention, the Worker end can timely trigger the object responsible for garbage cleaning to clean its garbage data when the pressure of the message queue is too high, avoiding memory leakage at the Worker end and the resulting harm to the stability and task processing efficiency of the Spark distributed computing framework.
As an implementation manner provided by the embodiment of the present invention, the second performance index may include: the occupancy rate of all the memory of the Worker end, and the first cleaning condition may include: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index may include: the load of the central processing unit of the Worker end, and the first cleaning condition may include: the load value reaches a first preset load value; or,
the second performance index may include: the occupancy rate of all the memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition may include: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the garbage data cleaning method provided by the above embodiment of the present invention from the perspective of the Driver end when executing the program stored in the memory 603.
Specifically, the garbage data cleaning method includes:
monitoring the number of tasks to be processed in the message queue, and calculating the pressure value of the message queue according to the monitored number;
judging whether the pressure value is larger than a preset pressure threshold value or not;
when the pressure value is judged to be greater than the preset pressure threshold, sending a high-pressure state notification to the Worker end in the distributed computing framework, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach a first preset threshold, and sends a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
when a trigger instruction sent by the Worker end is received, triggering the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
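The Driver-end steps above can be sketched as follows (a minimal illustration; the class and the `notify_worker` / `garbage_cleaner` callbacks are hypothetical stand-ins for the messaging and cleaning mechanisms, which the patent does not prescribe):

```python
class DriverCleaner:
    """Sketch of the Driver-end flow: compare the message-queue pressure
    value with a preset threshold, notify the Worker end on high pressure,
    and run the garbage cleaner when a trigger instruction arrives."""

    def __init__(self, pressure_threshold, notify_worker, garbage_cleaner):
        self.pressure_threshold = pressure_threshold
        self.notify_worker = notify_worker      # placeholder: Driver -> Worker messaging
        self.garbage_cleaner = garbage_cleaner  # placeholder: cleaner in the Driver end

    def on_pressure_value(self, pressure_value):
        # Judge whether the pressure value exceeds the preset threshold;
        # if so, send a high-pressure state notification to the Worker end.
        if pressure_value > self.pressure_threshold:
            self.notify_worker("HIGH_PRESSURE")

    def on_trigger_instruction(self):
        # A trigger instruction from the Worker end starts the garbage
        # cleaner, which cleans the garbage data in the Worker end.
        self.garbage_cleaner()
```

For instance, with a threshold of 0.5, a sampled pressure value of 0.7 would produce one notification, and a subsequent trigger instruction would invoke the cleaner once.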
It should be noted that, other implementation manners of the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which are implemented by the processor 601 executing the program stored in the memory 603, are the same as those of the first garbage data cleaning method applied to the Driver end in the distributed computing framework provided in the foregoing method embodiment, and are not repeated herein.
The embodiment of the present invention also provides another electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, the memory 703 complete communication with each other through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the garbage data cleaning method provided from the perspective of the Worker end according to the embodiment of the present invention when executing the program stored in the memory 703.
Specifically, the garbage data cleaning method includes:
receiving a high-pressure state notification sent by the Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent by the Driver end to the Worker end when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold, and the pressure value is: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
monitoring a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
when the index value of the second performance index meets the first cleaning condition, sending a trigger instruction to the Driver end, so that the Driver end, after receiving the trigger instruction, triggers the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
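Correspondingly, the Worker-end steps may be sketched as below (illustrative only; `trigger_gc` and `send_trigger_instruction` stand in for the virtual-machine garbage collection call and the message back to the Driver end, neither of which is specified by the patent):

```python
class WorkerMonitor:
    """Sketch of the Worker-end flow: after a high-pressure state
    notification, monitor the two performance indexes and react."""

    def __init__(self, first_threshold, cleaning_condition,
                 trigger_gc, send_trigger_instruction):
        self.first_threshold = first_threshold          # first preset threshold
        self.cleaning_condition = cleaning_condition    # predicate on the second index
        self.trigger_gc = trigger_gc                    # placeholder: VM GC program
        self.send_trigger_instruction = send_trigger_instruction  # placeholder: message to Driver
        self.monitoring = False

    def on_high_pressure_notification(self):
        # The notification from the Driver end starts index monitoring.
        self.monitoring = True

    def on_sample(self, vm_memory_occupancy, second_index_value):
        if not self.monitoring:
            return
        # First index: virtual-machine memory occupancy reaching the first
        # preset threshold triggers the GC program in the Worker-end VM.
        if vm_memory_occupancy >= self.first_threshold:
            self.trigger_gc()
        # Second index meeting the first cleaning condition sends a
        # trigger instruction to the Driver end.
        if self.cleaning_condition(second_index_value):
            self.send_trigger_instruction()
```

Note that samples taken before the high-pressure notification are ignored, reflecting that monitoring of the two indexes only begins once the Driver end signals high pressure.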
It should be noted that, other implementation manners of the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which are implemented by the processor 701 executing the program stored in the memory 703, are the same as those of the second garbage data cleaning method applied to the Worker end in the distributed computing framework provided in the foregoing method embodiment, and are not repeated herein.
In summary, in the solution provided by the embodiment of the present invention, when the pressure value of the message queue is monitored to be too large, the Driver end may notify the Worker end to monitor the first and second performance indexes. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its virtual machine, or trigger the garbage cleaner in the Driver end to clean the garbage data in the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data may accumulate at the Worker end; by applying the method provided by the embodiment of the present invention, the Worker end can timely trigger the object responsible for garbage cleaning to clean its garbage data when the pressure of the message queue is too high, avoiding memory leakage at the Worker end and the resulting harm to the stability and task processing efficiency of the Spark distributed computing framework.
The communication bus mentioned for the above electronic devices may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figures, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
The embodiment of the invention also provides a computer readable storage medium, and a computer program is stored in the computer readable storage medium, and when the computer program is executed by a processor, the first garbage data cleaning method applied to the Driver end in the distributed computing framework is realized.
Specifically, the first garbage data cleaning method applied to the Driver end in the distributed computing framework provided by the embodiment of the invention includes:
monitoring the number of tasks to be processed in the message queue, and calculating the pressure value of the message queue according to the monitored number;
judging whether the pressure value is larger than a preset pressure threshold value or not;
when the pressure value is judged to be greater than the preset pressure threshold, sending a high-pressure state notification to the Worker end in the distributed computing framework, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach a first preset threshold, and sends a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
when a trigger instruction sent by the Worker end is received, triggering the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
It should be noted that, other implementation manners of the first garbage data cleaning method applied to the Driver end in the distributed computing framework, which are implemented when the computer program is executed by the processor, are the same as the embodiments of the first garbage data cleaning method applied to the Driver end in the distributed computing framework and provided in the foregoing method embodiment section, and are not repeated herein.
The embodiment of the present invention also provides another computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the second garbage data cleaning method applied to the Worker end in the distributed computing framework is implemented.
Specifically, the second garbage data cleaning method applied to the Worker end in the distributed computing framework provided by the embodiment of the present invention includes:
receiving a high-pressure state notification sent by the Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent by the Driver end to the Worker end when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold, and the pressure value is: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
monitoring a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of representing the running state of the Worker end;
triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
when the index value of the second performance index meets the first cleaning condition, sending a trigger instruction to the Driver end, so that the Driver end, after receiving the trigger instruction, triggers the garbage cleaner in the Driver end to clean the garbage data in the Worker end.
It should be noted that, other implementation manners of the second garbage data cleaning method applied to the Worker end in the distributed computing framework, which are implemented when the computer program is executed by the processor, are the same as those of the second garbage data cleaning method applied to the Worker end in the distributed computing framework provided in the foregoing method embodiment section, and are not repeated herein.
In summary, in the solution provided by the embodiment of the present invention, when the pressure value of the message queue is monitored to be too large, the Driver end may notify the Worker end to monitor the first and second performance indexes. Furthermore, according to the specific conditions of the monitored first or second performance index, the Worker end can trigger the GC program of its virtual machine, or trigger the garbage cleaner in the Driver end to clean the garbage data in the Worker end. When the pressure of the message queue is too high, a large amount of tasks to be processed and garbage data may accumulate at the Worker end; by applying the method provided by the embodiment of the present invention, the Worker end can timely trigger the object responsible for garbage cleaning to clean its garbage data when the pressure of the message queue is too high, avoiding memory leakage at the Worker end and the resulting harm to the stability and task processing efficiency of the Spark distributed computing framework.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, the electronic device embodiments, the computer-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the section of the method embodiments for relevance.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (20)

1. A garbage data cleaning method is characterized by being applied to a Driver end in a distributed computing framework; the method comprises the following steps:
monitoring the number of tasks to be processed in a message queue, and calculating the pressure value of the message queue according to the monitored number;
judging whether the pressure value is larger than a preset pressure threshold value or not;
when the pressure value is judged to be greater than the preset pressure threshold, sending a high-pressure state notification to a Worker end in the distributed computing framework, so that the Worker end monitors a first performance index and a second performance index, triggers a garbage collection (GC) program in a virtual machine of the Worker end when the index value of the first performance index is monitored to reach a first preset threshold, and sends a trigger instruction to the Driver end when the index value of the second performance index is monitored to meet a first cleaning condition; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
when a trigger instruction sent by the Worker end is received, triggering a garbage cleaner in the Driver end to clean garbage data in the Worker end.
2. The method according to claim 1, wherein,
the second performance index includes: the occupancy rate of all the memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaches a first preset load value; or,
the second performance index includes: the occupancy rate of all the memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
3. The method according to claim 1, wherein the method further comprises:
when the pressure value is judged to be greater than the preset pressure threshold, monitoring a third performance index and a fourth performance index; wherein the third performance index is: the occupancy rate of the memory of the Driver-end virtual machine, and the fourth performance index is: an index capable of representing the running state of the Driver end;
When the index value of the third performance index reaches a second preset threshold value, triggering a GC program in the Driver-side virtual machine;
when the index value of the fourth performance index meets a second cleaning condition, triggering the garbage cleaner to clean the garbage data in the Driver end.
4. The method according to claim 3, wherein,
the fourth performance index includes: the occupancy rate of all the memory of the Driver end, and the second cleaning condition includes: the occupancy rate reaches a second preset occupancy rate; or,
the fourth performance index includes: the load of the central processing unit of the Driver end, and the second cleaning condition includes: the load value reaches a second preset load value; or,
the fourth performance index includes: the occupancy rate of all the memory of the Driver end and the load of the central processing unit of the Driver end, and the second cleaning condition includes: the occupancy rate reaches the second preset occupancy rate and the load value reaches a second preset load value.
5. The method according to any one of claims 1-4, further comprising:
and when the pressure value is judged to be greater than the preset pressure threshold, reducing the rate of receiving tasks to be processed.
6. The method according to any of claims 1-4, wherein the step of calculating a pressure value of the message queue based on the monitored number comprises:
calculating the pressure value of the message queue through a preset formula according to the number obtained by monitoring; wherein the preset formula is:

P_t(s) = P_t(n) * P_t(v)

wherein P_t(s) is: the pressure value of the message queue at time t; P_t(n) is: the event occupancy rate of the Driver end at the time t; P_t(v) is: the quantity change rate of the Driver end at the time t;

P_t(n) = Num(t) / Num(max)

wherein Num(t) is: the number of tasks to be processed included in the message queue at time t; Num(max) is: the number of tasks to be processed that can be accommodated by the message queue;

P_t(v) = Num(t-i) / Num(t)

wherein Num(t-i) is: the number of tasks to be processed included in the message queue at time t-i, and i is a preset unit duration.
7. A garbage data cleaning method is characterized by being applied to a Worker end in a distributed computing framework; the method comprises the following steps:
receiving a high-pressure state notification sent by a Driver end in the distributed computing framework; wherein the high-pressure state notification is: a notification sent by the Driver end to the Worker end when the Driver end judges that the pressure value of the message queue is greater than a preset pressure threshold, and the pressure value is: a pressure value calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
monitoring a first performance index and a second performance index; wherein the first performance index is: the occupancy rate of the memory of the virtual machine of the Worker end, and the second performance index is: an index capable of characterizing the running state of the Worker end;
triggering a GC program in the virtual machine of the Worker end when the index value of the first performance index reaches a first preset threshold;
and when the index value of the second performance index meets a first cleaning condition, sending a trigger instruction to the Driver end, so that the Driver end, after receiving the trigger instruction, triggers a garbage cleaner in the Driver end to clean garbage data in the Worker end.
8. The method according to claim 7, wherein,
the second performance index includes: the occupancy rate of all the memory of the Worker end, and the first cleaning condition includes: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index includes: the load of the central processing unit of the Worker end, and the first cleaning condition includes: the load value reaches a first preset load value; or,
the second performance index includes: the occupancy rate of all the memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition includes: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
9. A garbage data cleaning apparatus, characterized in that the apparatus is applied to a Driver end in a distributed computing framework; the apparatus comprises:
a pressure value calculation module, configured to monitor the number of tasks to be processed included in a message queue, and to calculate a pressure value of the message queue according to the monitored number;
a pressure value judging module, configured to judge whether the pressure value is greater than a preset pressure threshold;
a high-pressure state notification module, configured to send a high-pressure state notification to a Worker end in the distributed computing framework when it is judged that the pressure value is greater than the preset pressure threshold, so that the Worker end monitors a first performance index and a second performance index, triggers a GC program in a virtual machine of the Worker end when it is monitored that an index value of the first performance index reaches a first preset threshold, and sends a trigger instruction to the Driver end when it is monitored that an index value of the second performance index meets a first cleaning condition; wherein the first performance index is a memory occupancy rate of the virtual machine at the Worker end, and the second performance index is an index capable of characterizing a running state of the Worker end;
a first garbage data cleaning module, configured to trigger a garbage cleaner in the Driver end to clean garbage data in the Worker end when the trigger instruction sent by the Worker end is received.
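The Driver-side modules of claim 9 can be sketched in the same style. Again, every name here (`DriverMonitor`, the callback wiring, the threshold default) is an illustrative assumption; the claim only fixes the module responsibilities, not their implementation.

```python
# Illustrative sketch of the Driver-side modules of claim 9.
# Class, parameter, and method names are assumptions for illustration only.

class DriverMonitor:
    def __init__(self, pressure_threshold=0.7,
                 notify_worker=None, garbage_cleaner=None):
        self.pressure_threshold = pressure_threshold  # preset pressure threshold
        self.notify_worker = notify_worker            # high-pressure state notification module
        self.garbage_cleaner = garbage_cleaner        # garbage cleaner in the Driver end

    def check_pressure(self, pressure_value):
        """Pressure value judging module: compare against the preset
        threshold and, on high pressure, notify the Worker end."""
        if pressure_value > self.pressure_threshold:
            if self.notify_worker is not None:
                self.notify_worker("high_pressure")
            return True
        return False

    def on_trigger_instruction(self):
        """First garbage data cleaning module: on receiving the Worker's
        trigger instruction, invoke the garbage cleaner."""
        if self.garbage_cleaner is not None:
            self.garbage_cleaner()
```

The pressure value itself would come from the pressure value calculation module (the preset formula of claim 14); this sketch only shows the judging and notification flow around it.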
10. The apparatus according to claim 9, wherein:
the second performance index comprises a memory occupancy rate of all memory of the Worker end, and the first cleaning condition comprises: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index comprises a load of a central processing unit of the Worker end, and the first cleaning condition comprises: the load value reaches a first preset load value; or,
the second performance index comprises the memory occupancy rate of all memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition comprises: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
11. The apparatus according to claim 9, further comprising:
a first index monitoring module, configured to monitor a third performance index and a fourth performance index when it is judged that the pressure value is greater than the preset pressure threshold; wherein the third performance index is a memory occupancy rate of the virtual machine at the Driver end, and the fourth performance index is an index capable of characterizing a running state of the Driver end;
a first program triggering module, configured to trigger a GC program in the virtual machine of the Driver end when an index value of the third performance index reaches a second preset threshold;
a second garbage data cleaning module, configured to trigger the garbage cleaner to clean garbage data in the Driver end when an index value of the fourth performance index meets a second cleaning condition.
12. The apparatus according to claim 11, wherein:
the fourth performance index comprises a memory occupancy rate of all memory of the Driver end, and the second cleaning condition comprises: the occupancy rate reaches a second preset occupancy rate; or,
the fourth performance index comprises a load of a central processing unit of the Driver end, and the second cleaning condition comprises: the load value reaches a second preset load value; or,
the fourth performance index comprises the memory occupancy rate of all memory of the Driver end and the load of the central processing unit of the Driver end, and the second cleaning condition comprises: the occupancy rate reaches the second preset occupancy rate and the load value reaches the second preset load value.
13. The apparatus according to any one of claims 9-12, further comprising:
a rate reduction module, configured to reduce the rate at which tasks to be processed are received when it is judged that the pressure value is greater than the preset pressure threshold.
14. The apparatus according to any one of claims 9-12, wherein the pressure value calculation module is specifically configured to:
calculate the pressure value of the message queue through a preset formula according to the monitored number; wherein the preset formula is:
P_t(s) = P_t(n) × P_t(v)
where P_t(s) is the pressure value of the message queue at time t; P_t(n) is the event occupancy rate of the Driver end at time t; and P_t(v) is the quantity change rate of the Driver end at time t;
P_t(n) = Num(t) / Num(max)
where Num(t) is the number of tasks to be processed included in the message queue at time t, and Num(max) is the maximum number of tasks to be processed that the message queue can accommodate;
P_t(v) = Num(t-i) / Num(t)
where Num(t-i) is the number of tasks to be processed included in the message queue at time t-i, and i is a preset unit duration.
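The preset formula of claim 14 is simple enough to state directly in code. The function name and argument names below are illustrative; the arithmetic follows the claim.

```python
# Sketch of the preset pressure formula of claim 14:
#   P_t(s) = P_t(n) * P_t(v)
# Function and parameter names are illustrative assumptions.

def pressure_value(num_t, num_max, num_t_minus_i):
    """Compute the pressure value P_t(s) of the message queue at time t.

    num_t         -- Num(t): tasks to be processed in the queue at time t
    num_max       -- Num(max): tasks the queue can accommodate
    num_t_minus_i -- Num(t-i): tasks in the queue one preset unit
                     duration i earlier
    """
    p_n = num_t / num_max          # P_t(n): event occupancy rate
    p_v = num_t_minus_i / num_t    # P_t(v): quantity change rate
    return p_n * p_v
```

For example, with a queue capacity of 1000, 800 pending tasks now and 400 one unit duration ago, P_t(n) = 0.8 and P_t(v) = 0.5, giving a pressure value of 0.4. Note that by the claim's definition, a rapidly growing queue (Num(t) much larger than Num(t-i)) pushes P_t(v) below 1.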
15. A garbage data cleaning apparatus, characterized in that the apparatus is applied to a Worker end in a distributed computing framework; the apparatus comprises:
a high-pressure notification receiving module, configured to receive a high-pressure state notification sent by a Driver end in the distributed computing framework; wherein the high-pressure state notification is a notification sent to the Worker end when the Driver end judges that a pressure value of a message queue is greater than a preset pressure threshold, and the pressure value is calculated by the Driver end according to the monitored number of tasks to be processed included in the message queue;
a second index monitoring module, configured to monitor a first performance index and a second performance index; wherein the first performance index is a memory occupancy rate of a virtual machine at the Worker end, and the second performance index is an index capable of characterizing a running state of the Worker end;
a second program triggering module, configured to trigger a GC program in the virtual machine of the Worker end when an index value of the first performance index reaches a first preset threshold;
a trigger instruction sending module, configured to send a trigger instruction to the Driver end when an index value of the second performance index meets a first cleaning condition, so that the Driver end, after receiving the trigger instruction, triggers a garbage cleaner in the Driver end to clean garbage data in the Worker end.
16. The apparatus according to claim 15, wherein:
the second performance index comprises a memory occupancy rate of all memory of the Worker end, and the first cleaning condition comprises: the occupancy rate reaches a first preset occupancy rate; or,
the second performance index comprises a load of a central processing unit of the Worker end, and the first cleaning condition comprises: the load value reaches a first preset load value; or,
the second performance index comprises the memory occupancy rate of all memory of the Worker end and the load of the central processing unit of the Worker end, and the first cleaning condition comprises: the occupancy rate reaches the first preset occupancy rate and the load value reaches the first preset load value.
17. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to carry out the method steps of any one of claims 1-6 when executing the program stored in the memory.
18. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to carry out the method steps of any one of claims 7-8 when executing the program stored in the memory.
19. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-6.
20. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 7-8.
CN201811213300.2A 2018-10-18 2018-10-18 Junk data cleaning method and device, electronic equipment and readable storage medium Active CN111078389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811213300.2A CN111078389B (en) 2018-10-18 2018-10-18 Junk data cleaning method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811213300.2A CN111078389B (en) 2018-10-18 2018-10-18 Junk data cleaning method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111078389A CN111078389A (en) 2020-04-28
CN111078389B true CN111078389B (en) 2023-09-05

Family

ID=70308413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811213300.2A Active CN111078389B (en) 2018-10-18 2018-10-18 Junk data cleaning method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111078389B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281458A (en) * 2008-05-14 2008-10-08 华为技术有限公司 Apparatus, system and method for recycling garbage
CN102457906A (en) * 2010-10-26 2012-05-16 ***通信集团河南有限公司 Load balancing control method and system of message queues
CN103577240A (en) * 2012-07-25 2014-02-12 腾讯科技(深圳)有限公司 Automatic system cleaning method and device and memory medium
CN107153573A (en) * 2016-03-02 2017-09-12 阿里巴巴集团控股有限公司 Distributed task scheduling treating method and apparatus
CN107528922A (en) * 2017-09-29 2017-12-29 深圳市金立通信设备有限公司 A kind of information push method, terminal and computer-readable recording medium
CN107861797A (en) * 2017-12-04 2018-03-30 北京奇艺世纪科技有限公司 A kind of method for early warning and device based on JVM
CN108255582A (en) * 2018-01-16 2018-07-06 携程旅游信息技术(上海)有限公司 Method, system, device and storage medium for Java virtual machine garbage collection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112011103979T5 (en) * 2010-11-30 2013-08-29 International Business Machines Corporation Computer program and system for a method of optimizing memory management of an application running on a virtual machine
US10642663B2 (en) * 2014-09-10 2020-05-05 Oracle International Corporation Coordinated garbage collection in distributed systems
US20160350214A1 (en) * 2015-05-29 2016-12-01 Google Inc. Idle time software garbage collection
US10467152B2 (en) * 2016-05-18 2019-11-05 International Business Machines Corporation Dynamic cache management for in-memory data analytic platforms
US10204175B2 (en) * 2016-05-18 2019-02-12 International Business Machines Corporation Dynamic memory tuning for in-memory data analytic platforms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mechanism and tuning of Java garbage collection; Chi Weicheng; Application Research of Computers (03); pp. 144-148 *

Also Published As

Publication number Publication date
CN111078389A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN106776099B (en) Service fusing isolation system and method
CN109048996B (en) Robot abnormal state processing method and device
CN107204875B (en) Data reporting link monitoring method and device, electronic equipment and storage medium
CN109800204B (en) Data distribution method and related product
CN109450701B (en) Virtual switch switching method and device, host machine and computer readable storage medium
JP2015524122A (en) Method, computer system and apparatus for accessing PCI Express endpoint device
CN111061570B (en) Image calculation request processing method and device and terminal equipment
CN104426696A (en) Fault processing method and device
CN111682977A (en) Method and device for processing exception of network equipment, storage medium and network equipment
CN109766198B (en) Stream processing method, device, equipment and computer readable storage medium
CN114168071B (en) Distributed cluster capacity expansion method, distributed cluster capacity expansion device and medium
CN114461335A (en) Elastic expansion method, device and equipment for virtual machine and container in cloud computing environment
CN110209548B (en) Service control method, system, electronic device and computer readable storage medium
EP3358467A1 (en) Fault processing method, computer system, baseboard management controller and system
CN106933673B (en) Method and device for adjusting number of logical threads of component
CN111078389B (en) Junk data cleaning method and device, electronic equipment and readable storage medium
CN113672471A (en) Software monitoring method, device, equipment and storage medium
CN111181777B (en) Service degradation method, device, computer equipment and storage medium
JP2007249663A (en) Transaction device, delay failure detection device and method, and program
CN106169999A (en) The method and device of session backup
CN113541979B (en) Fault dynamic prediction method and device based on time sequence data and computing equipment
CN104348641A (en) Fault detection method and fault detection device
CN111857689A (en) Framework, function configuration method of framework, terminal and storage medium
CN113778763A (en) Intelligent switching method and system for three-party interface service fault
CN113542001A (en) OSD (on-screen display) fault heartbeat detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant