CN115709200B - High-performance computing cluster system fault prediction device and application method thereof - Google Patents


Info

Publication number
CN115709200B
Authority
CN
China
Prior art keywords
movable
device body
dust
areas
performance computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211493434.0A
Other languages
Chinese (zh)
Other versions
CN115709200A (en
Inventor
龙玉江
甘润东
卫薇
李洵
王杰峰
王策
孙骏
钟掖
卢仁猛
袁捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Application filed by Guizhou Power Grid Co Ltd
Priority to CN202211493434.0A
Publication of CN115709200A
Application granted
Publication of CN115709200B

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a high-performance computing cluster system fault prediction device and a method of using it, and relates to the field of fault prediction devices. The device comprises a device body in which a controller and a fault prediction mechanism are arranged. The inside of the device body is divided by a partition plate into a monitoring chamber and a system chamber, and the system chamber is divided into four areas. The dust fault prediction assembly cleans dust by means of a double-acting cylinder, a first movable member, a second movable member and a movable block, which cooperate with a transfer ball, a pressure pump, an air inlet pipe, a sliding block, an elastic member, a movable ball, a connecting pipe and an output pipe. The device can accurately detect dust and automatically clean it, reducing the probability of system faults caused by dust.

Description

High-performance computing cluster system fault prediction device and application method thereof
Technical Field
The invention relates to a high-performance computing cluster system fault prediction device and a using method thereof, belonging to the technical field of system fault prediction devices.
Background
High-performance computing (HPC) refers to computing systems and environments that typically use many processors, or several computers organized in a cluster. HPC systems come in many types, ranging from large clusters of standard computers to highly specialized hardware.
A method and apparatus for predicting faults of a high-performance computing cluster system are disclosed in Chinese patent publication No. CN 105159815B. The fault prediction method in that patent comprises: acquiring the chip working condition and the power supply output power of each service node in the cluster system, analyzing the working state of each service node according to the chip working condition and the power supply output power, and executing a preset maintenance strategy when the working state of a service node is abnormal. However, during operation the modules in the system are easily affected by dust, which can itself cause system faults; the detection efficiency is low; and the position of the faulty component cannot be accurately predicted, so the detection effect is poor.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a high-performance computing cluster system fault prediction device and a method of using it are provided to solve the problems that, during operation, modules in the system are easily affected by dust, that dust can cause system faults, that detection efficiency is low, and that the position of a faulty component cannot be accurately predicted, resulting in a poor detection effect.
The technical scheme adopted by the invention is as follows: a high-performance computing cluster system fault prediction device comprises a device body, wherein a controller and a fault prediction mechanism are arranged in the device body;
The fault prediction mechanism comprises a dust fault prediction component and a system fault prediction component, the inside of the device body is divided into a monitoring chamber and a system chamber by a partition board, and the system chamber is divided into four areas;
The system fault prediction assembly comprises a moving part and a display part, wherein the display part is arranged on the moving part and is arranged in a monitoring chamber, the moving part comprises a driving motor, a screw rod and a moving block, the driving motor is arranged on one side in the monitoring chamber, the screw rod is connected to the output end of the driving motor, the moving block is movably arranged on the screw rod, the display part comprises a mounting plate, a dismounting piece, a multi-color lamp and a lamp shade, the mounting plate is connected to the moving block through a connecting rod, the multi-color lamp and the lamp shade are arranged on the mounting plate through the dismounting piece, the multi-color lamp is positioned in the lamp shade, three long strips are arranged on the outer side of the device body, the outer side of the device body is divided into four areas, and the four areas correspond to the positions of the four areas of the system chamber;
By providing the driving motor, screw rod and moving block of the moving part together with the mounting plate, dismounting piece, multi-color lamp and lampshade of the display part, system faults are predicted through the different colors of the multi-color lamp. The driving motor drives the multi-color lamp to move, and the four areas on the outer side of the device body match the four areas of the system chamber. Different components of the high-performance computing cluster system are connected to different areas of the system body in the device body, so the area in which a fault will occur can be predicted fairly accurately, improving both the prediction accuracy and the detection effect.
The dust fault prediction assembly comprises a driving part and a cleaning part, wherein the driving part comprises a double-acting air cylinder, a first movable part, a second movable part and a movable block, the double-acting air cylinder is arranged in a monitoring chamber, the first movable part is connected with one end of the double-acting air cylinder, the second movable part is connected with the other end of the double-acting air cylinder, one end of the second movable part, which is far away from the double-acting air cylinder, is connected with the movable block, a plurality of annular grooves are formed in the movable block, one side of the device body is provided with a mounting groove, and the movable block extends out of the device body through the mounting groove;
The cleaning part comprises a transfer ball, a pressure pump, an air inlet pipe, a sliding block, elastic pieces, a movable ball, a connecting pipe and an output pipe, wherein the transfer ball is a hollow ball, the transfer ball is connected to the moving block through a connecting piece, a guide rail is arranged on the partition plate, the sliding block is movably arranged in the guide rail, a diversion trench is arranged in the sliding block, the diversion trench is of a round table-shaped structure, one ends of the elastic pieces are connected to the inner wall of the diversion trench, the other ends of the elastic pieces are connected to the movable ball, one end of the connecting pipe is communicated with the transfer ball, the other end of the connecting pipe is communicated with the diversion trench, the air inlet pipe is arranged on the transfer ball and is communicated with the transfer ball, the pressure pump is arranged on the air inlet pipe, and the output pipe is arranged on the sliding block and is communicated with the diversion trench;
After it is detected that dust is about to affect normal operation of the system, the double-acting cylinder moves toward the first movable member: one end of the double-acting cylinder extends and the other end retracts, so the second movable member retracts and pulls the movable block inward. Different dust contents cause the double-acting cylinder to move different distances, so the movable block moves a different distance into the device body and a different number of ring grooves on the movable block remain exposed outside the device body, which accurately indicates how strongly dust is affecting the system body. In the initial state, the movable ball inside the diversion trench of the sliding block blocks the diversion trench; when the first movable member extends, it drives the movable ball so that it no longer blocks the diversion trench, allowing air to flow and be discharged through the output pipe for cleaning. This improves the self-cleaning capability of the device, eliminates potential faults and prolongs the service life of the device;
Because the double-acting cylinder moves different distances for different dust contents, the higher the dust content, the farther the double-acting cylinder moves toward the first movable member and the farther the movable ball moves from the outlet of the diversion trench. The air flow rate is therefore controlled according to the dust content, which improves cleaning accuracy, saves cleaning time and improves cleaning efficiency;
The system chamber is divided into four areas, so the different dust contents of the four areas can be graded while system fault prediction is carried out. As the driving motor moves, it also carries the dust fault prediction assembly along, so that the assembly operates in a different working state in each area.
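The relationship between dust content and cylinder travel described above can be sketched as a simple threshold lookup. This is only an illustration: the threshold values, the sensor units, the 1 cm step per threshold and the name `cylinder_displacement_cm` are all assumptions and are not given in the description.

```python
from bisect import bisect_right

# Hypothetical dust-content thresholds in arbitrary sensor units (assumption).
DUST_THRESHOLDS = [10.0, 20.0, 30.0, 40.0]
# Hypothetical travel added per threshold reached, in centimetres (assumption).
STEP_CM = 1.0


def cylinder_displacement_cm(dust_reading: float) -> float:
    """Return how far the double-acting cylinder moves toward the first movable member.

    Each threshold that the reading reaches or exceeds adds one step of travel,
    so a higher dust content yields a longer movement.
    """
    thresholds_reached = bisect_right(DUST_THRESHOLDS, dust_reading)
    return thresholds_reached * STEP_CM


if __name__ == "__main__":
    for reading in (5.0, 10.0, 25.0, 100.0):
        print(reading, cylinder_displacement_cm(reading))
```

A stepwise mapping is used here because the description speaks of several discrete thresholds; a continuous mapping would equally satisfy "higher content, longer travel".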
Preferably, the bottom of the device body is provided with an ash discharge groove, and the ash discharge groove is positioned in the system room.
Preferably, a system body and a dust sensing module are arranged in the system chamber, the system body corresponds to the four areas of the system chamber, and the dust sensing module is positioned at the top of the system chamber and corresponds to the position of the system body;
the dust sensing module is provided with an analysis sub-module, so that the dust content can be analyzed, and various operations can be performed.
Preferably, the system room is provided with a connecting assembly, the connecting assembly comprises connecting wires and interfaces, the four interfaces are arranged on the outer side surface of the device body, each interface is connected with one connecting wire, and one end of each connecting wire, far away from the interfaces, is connected with the system body.
Preferably, an air outlet pipe is arranged at one end of the output pipe far away from the sliding block, and an air outlet cover is arranged at one end of the air outlet pipe far away from the output pipe;
Wherein, the air-out cover can increase the air-out area, improves the efficiency of dust removal.
Preferably, an air inlet cover is arranged on the device body and communicated with the air inlet pipe, and a plurality of dustproof holes are formed in the top of the air inlet cover;
wherein outside air is drawn into the air inlet pipe through the air inlet cover, and the dustproof holes effectively prevent dust from entering.
Preferably, the system body comprises an analysis module and an acquisition module. The acquisition module is used for acquiring the chip working condition and the power supply output power of each service node in the high-performance computing cluster system. The analysis module is used for analyzing the working state of each service node according to the chip working condition and the power supply output power, and for transmitting different information to the controller according to the working state. The controller separately controls the multi-color lamp, the driving motor, the double-acting cylinder and the dust sensing module.
A method for using a fault prediction device of a high-performance computing cluster system comprises the following steps:
S1: preparation:
connecting an interface on the device body with a high-performance computing cluster system, so that the system body in a system room is connected with the high-performance computing cluster system, and carrying out information transfer;
S2: predicting system faults:
Three thresholds are set in the analysis module according to the working states of the service nodes: a first threshold representing a low probability of failure, a second threshold representing a medium probability of failure, and a third threshold representing a high probability of failure. The system body corresponds to the four areas of the system chamber and is connected to the respective components of the high-performance computing cluster system, so that each area represents a different location of failure. The driving motor receives a signal from the controller and drives the multi-color lamp to move through the four areas on the outer side of the device body in turn. When the multi-color lamp passes through an area, the controller controls it to display a color according to the information transmitted by the system body. The colors are red, orange, yellow and green: green indicates that the working states of the service nodes in the area are below the first threshold, orange that they lie between the first and second thresholds, yellow that they lie between the second and third thresholds, and red that they are above the third threshold;
S3: dust fault prediction:
The dust sensing module detects the dust content in the system room in real time, a plurality of thresholds of the dust content are arranged in the dust sensing module, when the thresholds are reached, different signals are sent to the controller, the controller controls the double-acting air cylinder to move towards the first movable part, the different thresholds represent different moving distances, and at the moment, the number of the ring grooves on the movable block exposed on the device body represents the dust content level;
S4: self-cleaning:
In the initial state, the movable ball blocks the diversion trench in the sliding block. When the controller controls the double-acting cylinder to move toward the first movable member, the movable ball no longer blocks the diversion trench, and gas is led out through the output pipe and the air outlet pipe, so that the system body is cleaned and the dust is discharged through the ash discharge groove;
S5: ending:
Disconnect the high-performance computing cluster system from the interface on the device body, ending the work of the prediction device.
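The color selection in step S2 can be sketched as a comparison against the three thresholds. The following Python sketch is illustrative only: the function name `lamp_color` is hypothetical, the default values 5%, 15% and 50% are the ones given later in the embodiment, and treating each upper bound as inclusive is an assumption based on that embodiment's table.

```python
def lamp_color(failure_rate: float,
               first: float = 0.05,
               second: float = 0.15,
               third: float = 0.50) -> str:
    """Map an area's failure rate (0.0 to 1.0) to a multi-color lamp color.

    Green: at or below the first threshold.
    Orange: above the first threshold, up to the second.
    Yellow: above the second threshold, up to the third.
    Red: above the third threshold.
    """
    if not first < second < third:
        raise ValueError("thresholds must be strictly increasing")
    if failure_rate <= first:
        return "green"
    if failure_rate <= second:
        return "orange"
    if failure_rate <= third:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for rate in (0.03, 0.10, 0.30, 0.80):
        print(f"{rate:.0%} -> {lamp_color(rate)}")
```

Keeping the thresholds as parameters reflects that the method only requires three ordered thresholds; the specific percentages belong to one embodiment.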
The invention has the beneficial effects that: compared with the prior art, the invention has the following effects:
1) In the invention, the double-acting cylinder, the first movable member, the second movable member and the movable block cooperate with the transfer ball, the pressure pump, the air inlet pipe, the sliding block, the elastic member, the movable ball, the connecting pipe and the output pipe. After it is detected that dust is about to affect normal operation of the system, the double-acting cylinder moves toward the first movable member: one end of the double-acting cylinder extends and the other end retracts, so the second movable member retracts and pulls the movable block inward. Different dust contents cause the double-acting cylinder to move different distances, so the movable block moves a different distance into the device body and a different number of ring grooves on the movable block remain exposed outside the device body, which accurately indicates how strongly dust is affecting the system body. In the initial state, the movable ball inside the diversion trench of the sliding block blocks the diversion trench; when the first movable member extends, it drives the movable ball so that it no longer blocks the diversion trench, allowing air to flow and be discharged through the output pipe for cleaning. This improves the self-cleaning capability of the device, eliminates potential faults and prolongs the service life of the device;
2) In the invention, the driving motor, screw rod and moving block of the moving part are provided together with the mounting plate, dismounting piece, multi-color lamp and lampshade of the display part, and system faults are predicted through the different colors of the multi-color lamp. The driving motor drives the multi-color lamp to move, and the four areas on the outer side of the device body match the four areas of the system chamber. Different components of the high-performance computing cluster system are connected to different areas of the system body in the device body, so the area in which a fault will occur can be predicted fairly accurately, improving both the prediction accuracy and the detection effect;
3) In the invention, the double-acting cylinder moves different distances for different dust contents: the higher the dust content, the farther the double-acting cylinder moves toward the first movable member and the farther the movable ball moves from the outlet of the diversion trench. The air flow rate is therefore controlled according to the dust content, which improves cleaning accuracy, saves cleaning time and improves cleaning efficiency;
4) The system chamber is divided into four areas, so the different dust contents of the four areas can be graded while system fault prediction is carried out. As the driving motor moves, it also carries the dust fault prediction assembly along, so that the assembly operates in a different working state in each area.
Drawings
FIG. 1 is a schematic perspective view of the present invention;
FIG. 2 is a schematic elevational view of the present invention;
FIG. 3 is a cross-sectional view taken at A-A of FIG. 2;
FIG. 4 is a cross-sectional view taken at B-B in FIG. 2;
FIG. 5 is an enlarged view of part C of FIG. 1;
FIG. 6 is a partial enlarged view of portion D of FIG. 2;
FIG. 7 is an enlarged view of portion E of FIG. 3;
FIG. 8 is a block diagram of the system body.
In the drawings: 110, device body; 120, controller; 130, monitoring chamber; 140, system chamber; 150, partition plate; 210, driving motor; 220, screw rod; 230, moving block; 240, mounting plate; 250, dismounting piece; 260, multi-color lamp; 270, lampshade; 280, connecting rod; 310, double-acting cylinder; 320, first movable member; 330, second movable member; 340, movable block; 350, ring groove; 360, guide rail; 370, diversion trench; 410, transfer ball; 420, pressure pump; 430, sliding block; 440, elastic member; 450, movable ball; 460, connecting pipe; 470, output pipe; 480, connecting piece; 490, air inlet pipe; 510, ash discharge groove; 520, system body; 530, connecting wire; 540, interface; 610, air outlet pipe; 620, air outlet cover; 630, air inlet cover; 640, dustproof hole; 650, analysis module; 660, acquisition module; 710, dust sensing module; 720, strip.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Example 1: As shown in FIGS. 1-8, a high-performance computing cluster system fault prediction device comprises a device body 110, and a controller 120 and a fault prediction mechanism are arranged inside the device body 110;
The fault prediction mechanism comprises a dust fault prediction component and a system fault prediction component, wherein the inside of the device body 110 is divided into a monitoring chamber 130 and a system chamber 140 by a partition 150, and the system chamber 140 is divided into four areas;
The system fault prediction assembly comprises a moving part and a display part, wherein the display part is arranged on the moving part, the moving part comprises a driving motor 210, a screw rod 220 and a moving block 230, the driving motor 210 is arranged on one side in the monitoring chamber 130, the screw rod 220 is connected to the output end of the driving motor 210, the moving block 230 is movably arranged on the screw rod 220, the display part comprises a mounting plate 240, a dismounting piece 250, a multi-color lamp 260 and a lampshade 270, the mounting plate 240 is connected to the moving block 230 through a connecting rod 280, the multi-color lamp 260 and the lampshade 270 are arranged on the mounting plate 240 through the dismounting piece 250, the multi-color lamp 260 is positioned in the lampshade 270, three long strips 720 are arranged on the outer side of the device body 110, the outer side of the device body 110 is divided into four areas, and the positions of the four areas correspond to the system chamber 140;
By providing the driving motor 210, screw rod 220 and moving block 230 of the moving part together with the mounting plate 240, dismounting piece 250, multi-color lamp 260 and lampshade 270 of the display part, system faults are predicted through the different colors of the multi-color lamp 260. The driving motor 210 drives the multi-color lamp 260 to move, and the four areas on the outer side of the device body 110 match the four areas of the system chamber 140. Different components of the high-performance computing cluster system are connected to different areas of the system body 520 in the device body 110, so the area in which a fault will occur can be predicted fairly accurately, improving both the prediction accuracy and the detection effect.
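As a rough illustration of how the lamp's position along the screw rod 220 could be matched to one of the four areas, the following Python sketch assumes that the areas are of equal width and that the controller knows the lamp's position. The function name `region_index`, the `travel_cm` parameter and the equal-width layout are assumptions made for illustration and are not stated in the description.

```python
def region_index(position_cm: float, travel_cm: float, regions: int = 4) -> int:
    """Return the zero-based index of the area the multi-color lamp is in.

    position_cm is the lamp's distance from the start of the screw rod and
    travel_cm is the total usable travel; both are hypothetical quantities.
    The three strips are assumed to split the travel into equal areas.
    """
    if travel_cm <= 0 or regions <= 0:
        raise ValueError("travel_cm and regions must be positive")
    if not 0.0 <= position_cm <= travel_cm:
        raise ValueError("position_cm must lie within the travel range")
    width = travel_cm / regions
    # The far end of the travel belongs to the last area.
    return min(int(position_cm // width), regions - 1)


if __name__ == "__main__":
    for pos in (0.0, 9.9, 10.0, 35.0, 40.0):
        print(pos, region_index(pos, travel_cm=40.0))
```

With the area index known, the controller could look up that area's status from the system body 520 and set the lamp color accordingly.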
The dust fault prediction assembly comprises a driving part and a cleaning part, wherein the driving part comprises a double-acting air cylinder 310, a first movable part 320, a second movable part 330 and a movable block 340, the double-acting air cylinder 310 is installed in the monitoring chamber 130, the first movable part 320 is connected to one end of the double-acting air cylinder 310, the second movable part 330 is connected to the other end of the double-acting air cylinder 310, one end, far away from the double-acting air cylinder 310, of the second movable part 330 is connected to the movable block 340, a plurality of annular grooves 350 are formed in the movable block 340, one side of the device body 110 is provided with a mounting groove, and the movable block 340 extends out of the device body 110 through the mounting groove;
The cleaning part comprises a transfer ball 410, a pressure pump 420, an air inlet pipe 490, a sliding block 430, an elastic member 440, a movable ball 450, a connecting pipe 460 and an output pipe 470. The transfer ball 410 is a hollow sphere and is connected to the moving block 230 through the connecting piece 480. A guide rail 360 is arranged on the partition plate 150, and the sliding block 430 is movably arranged in the guide rail 360. A diversion trench 370 with a truncated-cone shape is arranged in the sliding block 430. One end of the elastic member 440 is connected to the inner wall of the diversion trench 370, and the other end is connected to the movable ball 450. One end of the connecting pipe 460 communicates with the transfer ball 410, and the other end communicates with the diversion trench 370. The air inlet pipe 490 is arranged on the transfer ball 410 and communicates with it, the pressure pump 420 is arranged on the air inlet pipe 490, and the output pipe 470 is arranged on the sliding block 430 and communicates with the diversion trench 370;
The double-acting cylinder 310, the first movable member 320, the second movable member 330 and the movable block 340 cooperate with the transfer ball 410, the pressure pump 420, the air inlet pipe 490, the sliding block 430, the elastic member 440, the movable ball 450, the connecting pipe 460 and the output pipe 470. After it is detected that dust is about to affect normal operation of the system, the double-acting cylinder 310 moves toward the first movable member 320: one end of the double-acting cylinder 310 extends and the other end retracts, so the second movable member 330 retracts and pulls the movable block 340 inward. Different dust contents cause the double-acting cylinder 310 to move different distances, so the movable block 340 moves a different distance into the device body 110 and a different number of ring grooves 350 on the movable block 340 remain exposed outside the device body 110, which accurately indicates how strongly dust is affecting the system body 520. In the initial state, the movable ball 450 in the diversion trench 370 of the sliding block 430 blocks the diversion trench 370; when the first movable member 320 extends, it drives the movable ball 450 so that it no longer blocks the diversion trench 370, allowing air to flow and be discharged through the output pipe 470 for cleaning. This improves the self-cleaning capability of the device, eliminates potential faults and prolongs the service life of the device;
Because the double-acting cylinder 310 moves different distances for different dust contents, the higher the dust content, the farther the double-acting cylinder 310 moves toward the first movable member 320 and the farther the movable ball 450 moves from the outlet of the diversion trench 370. The air flow rate is therefore controlled according to the dust content, which improves cleaning accuracy, saves cleaning time and improves cleaning efficiency;
The system chamber 140 is divided into four areas, so the different dust contents of the four areas can be graded while system fault prediction is carried out. As the driving motor 210 moves, it also carries the dust fault prediction assembly along, so that the assembly operates in a different working state in each area.
In this embodiment, the stroke of the double-acting cylinder 310 in each direction is 5 cm; that is, the double-acting cylinder 310 can move 5 cm toward the first movable member 320 and 5 cm toward the second movable member 330. Four ring grooves 350 are provided on the movable block 340, one every 1 cm along the side surface of the movable block 340 outside the device body 110. At the initial position of the double-acting cylinder 310, all four ring grooves 350 of the movable block 340 are exposed. As the double-acting cylinder 310 moves toward the first movable member 320, the number of exposed ring grooves 350 decreases. Four exposed ring grooves 350 indicate a failure probability unaffected by dust, and three exposed ring grooves 350 indicate a failure probability lightly affected by dust; two, one and no exposed ring grooves 350 indicate progressively heavier dust influence.
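The ring-groove indicator of this embodiment can be sketched numerically. The 5 cm stroke, the four ring grooves and the 1 cm spacing come from the text above; the assumption that each full centimetre of inward travel hides exactly one ring groove, and the name `exposed_grooves`, are illustrative only.

```python
import math

TOTAL_GROOVES = 4       # ring grooves 350 on the movable block 340
GROOVE_PITCH_CM = 1.0   # one ring groove every 1 cm
MAX_STROKE_CM = 5.0     # cylinder stroke toward the first movable member


def exposed_grooves(displacement_cm: float) -> int:
    """Return how many ring grooves remain visible outside the device body.

    Assumption: every full groove pitch of inward travel hides one ring groove,
    and all grooves are hidden once four pitches have been travelled.
    """
    if not 0.0 <= displacement_cm <= MAX_STROKE_CM:
        raise ValueError("displacement must lie within the cylinder stroke")
    hidden = min(TOTAL_GROOVES, math.floor(displacement_cm / GROOVE_PITCH_CM))
    return TOTAL_GROOVES - hidden


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.5, 4.0, 5.0):
        print(d, exposed_grooves(d))
```

Reading the indicator is then a matter of counting: fewer visible ring grooves correspond to a larger cylinder movement and therefore a higher dust content.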
The bottom of the device body 110 is provided with an ash discharge groove 510, and the ash discharge groove 510 is located in the system chamber 140.
The system chamber 140 is internally provided with a system body 520 and a dust sensing module 710, the system body 520 corresponds to four areas of the system chamber 140, and the dust sensing module 710 is positioned at the top of the system chamber 140 and corresponds to the position of the system body 520;
The dust sensing module 710 is provided therein with an analysis sub-module capable of analyzing dust content and performing various operations.
The system room 140 is provided with a connecting assembly, the connecting assembly comprises connecting wires 530 and interfaces 540, four interfaces 540 are arranged on the outer side face of the device body 110, each interface 540 is connected with one connecting wire 530, and one end, far away from the interfaces 540, of each connecting wire 530 is connected with the system body 520.
An air outlet pipe 610 is arranged at one end of the output pipe 470 away from the sliding block 430, and an air outlet cover 620 is arranged at one end of the air outlet pipe 610 away from the output pipe 470;
wherein, the air outlet cover 620 can increase the air outlet area, and improve the dust removal efficiency.
The device body 110 is provided with an air inlet cover 630, the air inlet cover 630 is communicated with an air inlet pipe 490, and the top of the air inlet cover 630 is provided with a plurality of dustproof holes 640;
wherein outside air is drawn into the air inlet pipe 490 through the air inlet cover 630, and the dustproof holes 640 effectively prevent dust from entering.
The system body 520 includes an analysis module 650 and an acquisition module 660. The acquisition module 660 is configured to acquire the chip working condition and the power supply output power of each service node in the high-performance computing cluster system. The analysis module 650 is configured to analyze the working state of each service node according to the chip working condition and the power supply output power, and to transmit different information to the controller 120 according to the working state. The controller 120 separately controls the multi-color lamp 260, the driving motor 210, the double-acting cylinder 310 and the dust sensing module 710.
Example 2: a method for using a fault prediction device of a high-performance computing cluster system comprises the following steps:
S1: preparation:
Connecting an interface 540 on the device body 110 with the high-performance computing cluster system, so that the system body in the system room 140 is connected with the high-performance computing cluster system for information transmission;
S2: predicting system faults:
Three thresholds are set in the analysis module 650 according to the working states of the service nodes: a first threshold representing a low probability of failure, a second threshold representing a medium probability of failure, and a third threshold representing a high probability of failure. The system body 520 corresponds to the four areas of the system chamber 140 and is connected to the respective components of the high-performance computing cluster system, so that each area represents a different fault area. The driving motor 210 receives signals from the controller 120 and drives the multi-color lamp 260 to move. As the lamp passes each area, the controller 120 controls the multi-color lamp 260 to display one of four colors (red, orange, yellow or green) according to the information transmitted by the system body 520: green indicates that the working state of the service nodes in the area is below the first threshold, orange that it lies between the first and second thresholds, yellow that it lies between the second and third thresholds, and red that it is above the third threshold;
In this embodiment, the working state of each service node is converted into a failure rate. The first threshold is set to a failure rate of 5%, the second threshold to 15% and the third threshold to 50%, as follows:
Level | Failure rate | Color of multi-color lamp | Handling priority
Fourth level | ≤5% | Green | No handling required
Third level | 5% to 15% | Orange | Handling required, no priority
Second level | 15% to 50% | Yellow | Priority handling
First level | >50% | Red | Handle as soon as possible
As shown in the table above, when the failure rate is 5% or less, the fourth level applies and the multi-color lamp is green; the probability of failure is low and no handling is needed. When the failure rate is 5% to 15%, the third level applies and the lamp is orange; the probability of failure is relatively low, so handling is needed but not with priority. When the failure rate is 15% to 50%, the second level applies and the lamp is yellow; the probability of failure is relatively high and priority handling is needed. When the failure rate is greater than 50%, the first level applies and the lamp is red; the probability of failure is high and the fault must be handled as soon as possible.
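The threshold table above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the names `LampState`, `classify_failure_rate` and `sweep_regions` are hypothetical, failure rates are expressed as fractions, and the shared boundary values (15% and 50%) are assumed to belong to the lower-risk level, which the table leaves open for 15%.

```python
from dataclasses import dataclass

# Thresholds from this embodiment, expressed as failure-rate fractions.
FIRST_THRESHOLD = 0.05
SECOND_THRESHOLD = 0.15
THIRD_THRESHOLD = 0.50


@dataclass(frozen=True)
class LampState:
    """Hypothetical record of one table row: level, lamp color, handling priority."""
    level: str
    color: str
    priority: str


def classify_failure_rate(rate: float) -> LampState:
    """Map a failure rate in [0, 1] to the matching row of the table."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("failure rate must lie between 0 and 1")
    # Assumption: upper bounds are inclusive, so 0.15 is orange and 0.50 is yellow.
    if rate <= FIRST_THRESHOLD:
        return LampState("fourth", "green", "no handling required")
    if rate <= SECOND_THRESHOLD:
        return LampState("third", "orange", "handling required, no priority")
    if rate <= THIRD_THRESHOLD:
        return LampState("second", "yellow", "priority handling")
    return LampState("first", "red", "handle as soon as possible")


def sweep_regions(region_rates):
    """Return the colors shown as the lamp passes the areas in order."""
    return [classify_failure_rate(rate).color for rate in region_rates]
```

For instance, `sweep_regions([0.01, 0.10, 0.30, 0.90])` gives one color per area, in the order in which the multi-color lamp 260 would pass the four areas.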
S3: dust fault prediction:
The dust sensing module 710 detects the dust content in the system chamber 140 in real time. A plurality of dust-content thresholds are set in the dust sensing module 710, and a different signal is sent to the controller 120 when each threshold is reached. The controller 120 then controls the double-acting cylinder 310 to move towards the first movable member 320, with different thresholds corresponding to different moving distances, so that the number of ring grooves 350 on the movable block 340 exposed outside the device body 110 indicates the dust content level;
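The relationship between dust thresholds and exposed ring grooves in step S3 can be sketched as follows. The source gives no threshold values or units, so `DUST_THRESHOLDS` and the function name `exposed_ring_grooves` are purely hypothetical, and the sketch assumes that each threshold reached exposes exactly one further ring groove 350.

```python
from bisect import bisect_right

# Hypothetical dust-content thresholds in ascending order (units assumed, e.g. mg/m^3).
DUST_THRESHOLDS = (0.5, 1.0, 2.0)


def exposed_ring_grooves(dust_content: float, thresholds=DUST_THRESHOLDS) -> int:
    """Count the thresholds reached, taken here as the number of exposed ring grooves."""
    if dust_content < 0:
        raise ValueError("dust content cannot be negative")
    # bisect_right returns how many thresholds are less than or equal to the reading.
    return bisect_right(thresholds, dust_content)
```

With these assumed values, a reading below the lowest threshold leaves no groove exposed, and a reading at or above the highest threshold exposes all three.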
S4: self-cleaning:
In the initial state, the movable ball 450 blocks the flow guide groove 370 in the sliding block 430. When the controller 120 controls the double-acting cylinder 310 to move towards the first movable member 320, the movable ball 450 no longer blocks the flow guide groove 370, and gas is discharged through the output pipe 470 and the air outlet pipe 610, cleaning the system body 520; the dust is discharged through the ash discharge groove 510;
S5: ending:
The high-performance computing cluster system is disconnected from the interface 540 on the device body 110, and the operation of the prediction device ends.
The foregoing is merely illustrative of the present invention, and the scope of protection of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within its scope of protection, which shall therefore be defined by the appended claims.

Claims (7)

1. A high-performance computing cluster system fault prediction device, characterized by comprising a device body (110), and a controller (120) and a fault prediction mechanism arranged inside the device body (110);
the fault prediction mechanism comprises a dust fault prediction component and a system fault prediction component, wherein the interior of the device body (110) is divided into a monitoring chamber (130) and a system chamber (140) through a partition plate (150), and the system chamber (140) is divided into four areas;
The system fault prediction assembly comprises a moving part and a display part, wherein the display part is arranged on the moving part and is installed in a monitoring chamber (130);
The dust fault prediction assembly comprises a driving part and a cleaning part, wherein the driving part comprises a double-acting air cylinder (310), a first movable part (320), a second movable part (330) and a movable block (340), the double-acting air cylinder (310) is installed in a monitoring chamber (130), the first movable part (320) is connected to one end of the double-acting air cylinder (310), the second movable part (330) is connected to the other end of the double-acting air cylinder (310), one end, far away from the double-acting air cylinder (310), of the second movable part (330) is connected to the movable block (340), a plurality of annular grooves (350) are formed in the movable block (340), a mounting groove is formed in one side of the device body (110), and the movable block (340) extends out of the inside of the device body (110) through the mounting groove;
The cleaning part comprises a transfer ball (410), a pressure pump (420), an air inlet pipe (490), a sliding block (430), elastic pieces (440), a movable ball (450), a connecting pipe (460) and an output pipe (470), wherein the transfer ball (410) is a hollow ball, the transfer ball (410) is connected to a moving block (230) of the moving part through a connecting piece (480), a guide rail (360) is arranged on the partition board (150), the sliding block (430) is movably arranged in the guide rail (360), a guide groove (370) is arranged in the sliding block (430), the guide groove (370) is in a conical table structure, one end of each elastic piece (440) is connected to the inner wall of the guide groove (370), the other end of each elastic piece (440) is connected to the movable ball (450), one end of each connecting pipe (460) is communicated with the transfer ball (410), the other end of each connecting pipe (460) is communicated with the guide groove (370), the air inlet pipe (490) is arranged on the transfer ball (410) and is communicated with the guide groove (470), and the pressure pump (420) is arranged on the guide groove (470) and is communicated with the sliding block (370).
A system body (520) and a dust sensing module (710) are arranged in the system chamber (140), the system body (520) corresponds to the four areas of the system chamber (140), and the dust sensing module (710) is positioned at the top of the system chamber (140) at a position corresponding to the system body (520);
The system body (520) comprises an analysis module (650) and an acquisition module (660), wherein the acquisition module (660) is used for acquiring the chip working condition and the power supply output power of each service node in the high-performance computing cluster system, the analysis module (650) is used for analyzing the working state of each service node according to the chip working condition and the power supply output power and transmitting different information to the controller (120) according to the working state, and the controller (120) respectively controls the multi-color lamp (260), the driving motor (210), the double-acting air cylinder (310) and the dust sensing module (710);
In the initial state, the movable ball (450) sits in the guide groove (370) of the sliding block (430) and blocks it; when the first movable part (320) extends, it drives the movable ball (450) so that the movable ball no longer blocks the guide groove (370) and air can flow through.
2. The high-performance computing cluster system fault prediction device of claim 1, wherein: the moving part comprises a driving motor (210), a lead screw (220) and a moving block (230), the driving motor (210) is arranged on one side inside the monitoring chamber (130), the lead screw (220) is connected to a motor shaft of the driving motor (210), and the moving block (230) is threaded onto the lead screw (220) through a lead screw nut; the display part comprises a mounting plate (240), a dismounting piece (250), a multi-color lamp (260) and a lampshade (270), the mounting plate (240) is connected to the moving block (230) through a connecting rod (280), the multi-color lamp (260) and the lampshade (270) are mounted on the mounting plate (240) through the dismounting piece (250), and the multi-color lamp (260) is located inside the lampshade (270); three long strips (720) are arranged on the outer side of the device body (110) and divide the outer side of the device body (110) into four areas, and the four areas correspond to the four areas of the system chamber (140).
3. The high performance computing cluster system failure prediction apparatus of claim 1, wherein: an ash discharge groove (510) is formed in the bottom of the device body (110), and the ash discharge groove (510) is located in the system chamber (140).
4. The high-performance computing cluster system fault prediction device of claim 3, wherein: the system chamber (140) is provided with a connecting assembly comprising connecting wires (530) and interfaces (540), four interfaces (540) are arranged on the outer side face of the device body (110), each interface (540) is connected to one connecting wire (530), and the end of each connecting wire (530) remote from its interface (540) is connected to the system body (520).
5. The high performance computing cluster system failure prediction apparatus of claim 1, wherein: one end of the output pipe (470) far away from the sliding block (430) is provided with an air outlet pipe (610), and one end of the air outlet pipe (610) far away from the output pipe (470) is provided with an air outlet cover (620).
6. The high-performance computing cluster system fault prediction device of claim 5, wherein: an air inlet cover (630) is arranged on the device body (110), the air inlet cover (630) communicates with the air inlet pipe (490), and a plurality of dustproof holes (640) are formed in the top of the air inlet cover (630).
7. A method for using the high-performance computing cluster system fault prediction device according to any one of claims 1-6, wherein: the moving part of the high-performance computing cluster system fault prediction device used in the method comprises a driving motor (210), a lead screw (220) and a moving block (230), the driving motor (210) is installed at one side inside the monitoring chamber (130), the lead screw (220) is connected to a motor shaft of the driving motor (210), and the moving block (230) is threaded onto the lead screw (220) through a lead screw nut; the display part comprises a mounting plate (240), a dismounting piece (250), a multi-color lamp (260) and a lampshade (270), the mounting plate (240) is connected to the moving block (230) through a connecting rod (280), the multi-color lamp (260) and the lampshade (270) are installed on the mounting plate (240) through the dismounting piece (250), and the multi-color lamp (260) is positioned inside the lampshade (270); three long strips (720) are arranged on the outer side of the device body (110) and divide the outer side of the device body (110) into four areas whose positions correspond to the four areas of the system chamber (140); the system chamber (140) is provided with a connecting assembly comprising connecting wires (530) and interfaces (540), four interfaces (540) are arranged on the outer side face of the device body (110), each interface (540) is connected to one connecting wire (530), and the end of each connecting wire (530) remote from its interface (540) is connected to the system body (520);
The using method comprises the following steps:
S1: preparation:
Connecting an interface (540) on the device body (110) to a high-performance computing cluster system, so that the system body (520) in the system chamber (140) is connected to the high-performance computing cluster system for information transmission;
S2: predicting system faults:
Three thresholds are set in the analysis module (650) according to the working states of the service nodes, namely a first threshold representing a low-probability fault, a second threshold representing a medium-probability fault, and a third threshold representing a high-probability fault; the system body (520) corresponds to the four areas of the system chamber (140) and is connected to the respective components of the high-performance computing cluster system, so that each area represents a different fault area; the driving motor (210) receives a signal from the controller (120) and drives the multi-color lamp (260) to move so that it passes in turn through the four areas on the outer side of the device body (110); as the lamp passes each area, the controller (120) controls the multi-color lamp (260) to display one of four colors, namely red, orange, yellow and green, according to the information transmitted by the system body (520), wherein green indicates that the working state of the service nodes in the area is below the first threshold, orange indicates that it lies between the first threshold and the second threshold, yellow indicates that it lies between the second threshold and the third threshold, and red indicates that it is above the third threshold;
S3: dust fault prediction:
The dust sensing module (710) detects the dust content in the system chamber (140) in real time; a plurality of dust-content thresholds are set in the dust sensing module (710), and a different signal is sent to the controller (120) when each threshold is reached; the controller (120) controls the double-acting air cylinder (310) to move towards the first movable part (320), different thresholds corresponding to different moving distances, so that the number of annular grooves (350) on the movable block (340) exposed outside the device body (110) indicates the dust content level;
S4: self-cleaning:
In the initial state, the movable ball (450) blocks the guide groove (370) in the sliding block (430); when the controller (120) controls the double-acting air cylinder (310) to move towards the first movable part (320), the movable ball (450) no longer blocks the guide groove (370), and gas is discharged through the output pipe (470) and the air outlet pipe (610), cleaning the system body (520), the dust being discharged through the ash discharge groove (510);
S5: ending:
Disconnecting the high-performance computing cluster system from the interface (540) on the device body (110), and ending the operation of the prediction device.
CN202211493434.0A 2022-11-25 2022-11-25 High-performance computing cluster system fault prediction device and application method thereof Active CN115709200B (en)

Publications (2)

Publication Number Publication Date
CN115709200A CN115709200A (en) 2023-02-24
CN115709200B true CN115709200B (en) 2024-06-14

Family

ID=85234798

Country Status (1)

Country Link
CN (1) CN115709200B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant