CN114281173A - Reliable heat dissipation control method and device for server - Google Patents


Info

Publication number
CN114281173A
Authority
CN
China
Prior art keywords
temperature
temperature point
point
fan
abnormal
Legal status
Pending
Application number
CN202111454833.1A
Other languages
Chinese (zh)
Inventor
岳永恒
吕书朋
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111454833.1A
Publication of CN114281173A

Landscapes

  • Cooling Or The Like Of Electrical Apparatus (AREA)

Abstract

The invention provides a reliable heat dissipation control method and device for a server, belonging to the technical field of server system heat dissipation control. The method comprises the following steps. The BMC periodically acquires the temperature value of each temperature point and the fan speed given by the speed regulation strategy. The BMC then analyzes the temperature values and determines whether the temperature value of any temperature point has remained unchanged over a set number of readings. If so, abnormality detection is started for that temperature point: the fan speed is adjusted by a set ratio, and the BMC checks whether the temperature value of the point changes accordingly. A temperature point whose temperature value changes in step with the fan speed is determined to be normal, and one whose temperature value does not is determined to be abnormal. When an abnormal temperature point exists, the fans are set to full speed and a prompt is issued to check the temperature abnormality. By adjusting the fan speed to identify abnormal temperature points, the invention raises alarms in time and switches speed regulation to full speed. This reduces downtime caused by speed regulation failures resulting from abnormal sensors or reading links.

Description

Reliable heat dissipation control method and device for server
Technical Field
The invention belongs to the technical field of server system heat dissipation control, and in particular relates to a reliable heat dissipation control method and device for a server.
Background
In server hardware, the detection of each temperature point is subject to various unstable factors. In particular, some abnormalities leave the temperature value unrefreshed without any warning, so fan speed regulation does not respond in time. For example, on an x86 server the CPU temperature is generally read from the CPU's internal temperature sensor by the Intel Management Engine (ME) and then passed to the BMC, which calculates the corresponding fan speed from it. When special operations such as testing or debugging are performed on the CPU, the CPU may execute a hash instruction or another abnormal operation. Normal services can then no longer run, and the CPU temperature can no longer be passed to the ME. The CPU temperature obtained by the BMC is therefore no longer refreshed, and the fan speed cannot be adjusted normally.
Memory temperature is likewise obtained by the ME from the memory over an I2C link and then passed to the BMC. Suppose the memory temperature sensor becomes inaccessible because the memory I2C link is abnormal. The ME's policy in that case is to keep the last temperature value without refreshing it. The BMC can still read the value from the ME normally, but it always obtains the value from before the link failed. Other temperature points on the mainboard are read by the BMC directly over I2C, so link abnormalities there can be detected. However, if such a sensor is damaged and its internal temperature value stops refreshing, the BMC still obtains an abnormal temperature value.
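The caching behaviour described above can be modelled in a few lines. This is a toy sketch and not Intel ME firmware. The class `MeRelay`, its methods, and the use of `None` to represent a failed I2C transaction are all hypothetical. The model shows why the BMC sees a value that looks healthy but is frozen.

```python
from typing import Callable, Optional


class MeRelay:
    """Toy model of an ME that relays a DIMM temperature to the BMC (hypothetical)."""

    def __init__(self, read_dimm: Callable[[], Optional[float]]):
        self.read_dimm = read_dimm          # None models a failed I2C read of the DIMM sensor
        self.cached: Optional[float] = None

    def refresh(self) -> None:
        value = self.read_dimm()
        if value is not None:
            self.cached = value             # on failure the old value is silently kept

    def bmc_read(self) -> Optional[float]:
        return self.cached                  # the BMC-to-ME read itself always succeeds
```

If the DIMM link fails after the first read, every later `bmc_read()` still returns the first value, and no error is reported.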
The value may stop refreshing for several reasons. The ME may stop refreshing because the CPU is executing a hash instruction or because the memory I2C link is abnormal, or a sensor fault may stop the value from refreshing. In every case, the temperature value the BMC reads from the ME or the sensor stays at a stale value. This creates a heat dissipation control risk and allows the fault to spread.
Therefore, it is necessary to provide a reliable heat dissipation control method and device for a server that overcome the above drawbacks of the prior art.
Disclosure of Invention
The prior art has the following drawback. A CPU executing a hash instruction, an abnormal memory I2C link, or a failed sensor can leave the temperature value that the BMC reads from the ME or the sensor permanently unrefreshed. This creates a heat dissipation control risk and allows the fault to spread. To solve this technical problem, the invention provides a reliable heat dissipation control method and device for a server.
In a first aspect, the present invention provides a reliable heat dissipation control method for a server, including the following steps:
S1. The BMC periodically acquires the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
S2. The BMC analyzes the temperature value of each temperature point and determines whether the temperature value of any temperature point has remained unchanged over a set number of readings; if so, abnormality detection is started for that temperature point;
S3. For each temperature point under abnormality detection, the fan speed is adjusted by a set ratio and the BMC determines whether the temperature value of the point changes accordingly; a temperature point whose temperature value changes in step with the fan speed is determined to be normal, and a temperature point whose temperature value does not is determined to be abnormal;
S4. When an abnormal temperature point exists, the fans are set to full speed and a prompt is issued to check the temperature abnormality.
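The four steps can be read as a small per-point state machine. The sketch below is an interpretation rather than part of the patent. The state names, event strings and the `next_state` helper are hypothetical. The ABNORMAL to NORMAL transition follows the recovery rule given in embodiment 2, where a value that changes in full-speed mode clears the fault.

```python
from enum import Enum, auto


class PointState(Enum):
    """Hypothetical per-temperature-point states for steps S1-S4."""
    NORMAL = auto()     # S1/S2: value refreshing, normal speed regulation
    DETECTING = auto()  # S3: value frozen, fans boosted to probe the point
    ABNORMAL = auto()   # S4: probing failed, fans at full speed, alarm raised


# Allowed transitions per state: event name -> next state.
TRANSITIONS = {
    PointState.NORMAL: {"frozen": PointState.DETECTING},
    PointState.DETECTING: {"changed": PointState.NORMAL,
                           "probes_exhausted": PointState.ABNORMAL},
    PointState.ABNORMAL: {"changed": PointState.NORMAL},
}


def next_state(state: PointState, event: str) -> PointState:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS[state].get(event, state)
```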
Further, step S1 specifically includes the following steps:
S11. At intervals of a set time period, the BMC reads the temperature value of each temperature point, as collected by a temperature sensor or held in an ME register;
S12. The BMC acquires the temperature value of the temperature point and the fan speed n corresponding to the speed regulation strategy for that temperature point. The temperature points include the CPU and memory temperature values stored in ME (Intel Management Engine) registers, and the temperature values of heat-generating components collected by temperature sensors on the server mainboard.
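The patent does not define the speed regulation strategy that yields n. A common form is a fan curve that maps temperature to fan duty. The sketch below assumes a piecewise-linear curve with made-up breakpoints (`FAN_CURVE`). It also assumes the fans follow the demand of the hottest point. Both are illustrative assumptions, and `fan_speed_for` and `fan_speed` are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical fan curve: (temperature in degrees C, fan duty in %), sorted by temperature.
FAN_CURVE = [(30.0, 20.0), (50.0, 40.0), (70.0, 70.0), (85.0, 100.0)]


def fan_speed_for(temp_c: float, curve=FAN_CURVE) -> float:
    """Linearly interpolate the fan duty n requested by one temperature point."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]
    if temp_c >= temps[-1]:
        return curve[-1][1]
    i = bisect_right(temps, temp_c)  # curve[i-1] temp <= temp_c < curve[i] temp
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


def fan_speed(readings: dict[str, float]) -> float:
    """Assumed policy: the fans follow the most demanding temperature point."""
    return max(fan_speed_for(t) for t in readings.values())
```

With a frozen reading of 60 °C, `fan_speed_for` keeps returning 55% even if the real temperature climbs, which is the hazard described in the Background.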
Further, step S2 specifically includes the following steps:
S21. Acquire the set number m;
S22. Starting from the current time point, the BMC takes m consecutive temperature values of each temperature point and determines whether the m values are identical;
If yes, go to step S24;
If not, go to step S23;
S23. The temperature values obtained in the m readings of each temperature point have changed, so each temperature point is determined to be normal; return to step S1;
S24. Start abnormality detection on each temperature point whose m temperature values are identical. Under normal conditions, the m values taken from a temperature point vary slightly. If all m values are identical, the sensor of that temperature point or the link that reads its value may have failed, and abnormality detection is required.
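Steps S21 to S24 amount to checking whether the last m readings of a point are all equal. The sketch below assumes a sliding window over the most recent m polls, whereas the patent only says that m values are taken starting from the current time point. `FrozenValueDetector` is a hypothetical name. Exact equality mirrors the patent's wording, and a real implementation might compare raw register values instead.

```python
from collections import deque


class FrozenValueDetector:
    """Keep the last m readings of one temperature point (steps S21-S24)."""

    def __init__(self, m: int):
        if m < 2:
            raise ValueError("m must be at least 2 to compare readings")
        self.m = m
        self.window = deque(maxlen=m)  # the oldest reading is dropped automatically

    def add(self, value: float) -> bool:
        """Record a reading; return True when the m most recent readings are identical."""
        self.window.append(value)
        return len(self.window) == self.m and len(set(self.window)) == 1
```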
Further, step S3 specifically includes the following steps:
S31. Acquire the number of abnormality detection rounds and the preset fan speed increase ratio for each round;
S32. For the temperature point under abnormality detection, increase the fan speed by the ratio set for the current round and determine whether the temperature value of the point decreases;
If yes, go to step S33;
If not, go to step S34;
S33. The temperature value of the temperature point matches the change in fan speed, so the temperature point is determined to be normal or its fault cleared; return to step S1;
S34. Determine whether all abnormality detection rounds have been completed;
If yes, go to step S35;
If not, go to step S36;
S35. The temperature value of the temperature point does not match the change in fan speed, so the temperature point is determined to be abnormal; go to step S4;
S36. Move to the next abnormality detection round, acquire the ratio set for that round, and return to step S32. The number of abnormality detection rounds is set in advance. If the temperature does not change after one fan speed adjustment, the adjustment ratio may have been too small, so the fan speed is raised further. More rounds give higher accuracy, and the number of rounds is set according to test requirements. Under normal conditions, raising the fan speed lowers the temperature value of the temperature point. If the value decreases, the sensor of the temperature point and the link that reads it are both normal. If it does not decrease, there are three possible causes: the fan speed increase was insufficient, the sensor of the temperature point has failed, or the link that reads the temperature value is abnormal.
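The round structure of S31 to S36 can be sketched as a loop over the preset ratios. `probe_point`, `set_fan_speed` and `value_decreased` are hypothetical hooks that BMC firmware would have to supply. The patent does not specify how long to wait after each speed change before sampling (the embodiment uses m further polls), so that decision is left to `value_decreased`.

```python
from typing import Callable, Sequence


def probe_point(base_speed: float,
                boost_ratios: Sequence[float],
                set_fan_speed: Callable[[float], None],
                value_decreased: Callable[[], bool]) -> bool:
    """Steps S31-S36: return True if the point responds (normal), False if abnormal."""
    for ratio in boost_ratios:                   # one entry per abnormality detection round
        set_fan_speed(base_speed * (1 + ratio))  # S32: raise n by this round's preset ratio
        if value_decreased():                    # S32: did the reading drop?
            return True                          # S33: matches the fan change, so normal
    return False                                 # S34/S35: all rounds used, so abnormal
```

With ratios `[0.10, 0.20]` and a base speed of 50, the first round commands about 55 and the second about 60 before the point is declared abnormal.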
Further, step S4 specifically includes the following steps:
S41. Determine that the current server system has an abnormal temperature point;
S42. Set the fans to full speed;
S43. Prompt a temperature abnormality check of the server system. Once an abnormal temperature point is confirmed, the fans must run at full speed to meet the server's cooling requirements, and a warning is issued so that the temperature abnormality is investigated.
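The following is a minimal sketch of S41 to S43. It assumes that the fan command is a duty percentage and that the alarm is a log warning. `FailSafe`, `trip` and the `thermal` logger name are hypothetical and are not a real BMC API. A real BMC would use its own event or alert mechanism.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("thermal")


@dataclass
class FailSafe:
    """Steps S41-S43 with hypothetical hooks."""
    max_duty: float = 100.0
    duty: float = 0.0
    alarms: set[str] = field(default_factory=set)

    def trip(self, point: str) -> None:
        self.alarms.add(point)     # S41: record the abnormal temperature point
        self.duty = self.max_duty  # S42: run the fans at full speed
        # S43: prompt a check of the sensor and its reading link
        log.warning("temperature point %s is abnormal; check sensor and read link", point)
```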
In a second aspect, the present invention provides a reliable heat dissipation control device for a server, including:
a temperature value sampling module, used by the BMC to periodically acquire the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
a temperature value analysis module, used by the BMC to analyze the temperature values of the temperature points and determine whether the temperature value of any temperature point has remained unchanged over the set number of readings; if so, the module starts abnormality detection on that temperature point;
a temperature point abnormality detection module, used to adjust the fan speed by a set ratio for each temperature point under abnormality detection and determine whether its temperature value changes accordingly; a temperature point whose value changes in step with the fan speed is determined to be normal, and one whose value does not is determined to be abnormal; and
an abnormal temperature point checking module, used to set the fans to full speed and prompt a temperature abnormality check when an abnormal temperature point exists.
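One way to map the four modules onto code is to treat each module as a pluggable function and chain them once per polling cycle. The class and field names below are hypothetical. The sketch only illustrates the data flow from sampling, through analysis and detection, to handling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeatDissipationDevice:
    """Hypothetical wiring of the four modules of the device."""
    sample: Callable[[], dict[str, float]]            # sampling: read every temperature point
    analyze: Callable[[dict[str, float]], list[str]]  # analysis: points whose value is frozen
    detect: Callable[[str], bool]                     # detection: True if the point responds
    handle_abnormal: Callable[[list[str]], None]      # checking: full speed plus alarm

    def run_once(self) -> list[str]:
        """Run one polling cycle and return the points judged abnormal."""
        readings = self.sample()
        suspects = self.analyze(readings)
        abnormal = [p for p in suspects if not self.detect(p)]
        if abnormal:
            self.handle_abnormal(abnormal)
        return abnormal
```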
Further, the temperature value sampling module includes:
a temperature value acquisition unit, used by the BMC to read, at intervals of a set time period, the temperature value of each temperature point collected by a temperature sensor or held in an ME register; and
a fan speed acquisition unit, used by the BMC to acquire the temperature value of the temperature point and the fan speed n corresponding to the speed regulation strategy for that point. The temperature points include the CPU and memory temperature values stored in ME (Intel Management Engine) registers, and the temperature values of heat-generating components collected by temperature sensors on the server mainboard.
Further, the temperature value analysis module includes:
a polling count acquisition unit, used to acquire the set number m;
a temperature value analysis unit, used by the BMC to take m consecutive temperature values of each temperature point starting from the current time point and determine whether the m values are identical;
a temperature point normality determination unit, used when the m temperature values differ to determine that the values of the temperature point have changed over the m readings and that the temperature point is normal; and
a temperature point abnormality detection starting unit, used to start abnormality detection on each temperature point whose m temperature values are identical. Under normal conditions, the m values taken from a temperature point vary slightly. If all m values are identical, the sensor of that temperature point or the link that reads its value may have failed, and abnormality detection is required.
Further, the temperature point abnormality detection module includes:
an abnormality detection parameter acquisition unit, used to acquire the number of abnormality detection rounds and the preset fan speed increase ratio for each round;
an abnormality detection temperature determination unit, used to increase the fan speed by the corresponding preset ratio for the temperature point under abnormality detection and determine whether the temperature value of the point decreases;
a temperature normality or fault clearance determination unit, used when the temperature value of the temperature point decreases to determine that the value matches the change in fan speed and that the temperature point is normal or its fault cleared;
an abnormality detection round completion determination unit, used when the temperature value of the temperature point does not decrease to determine whether all abnormality detection rounds have been completed;
a temperature point abnormality determination unit, used when all abnormality detection rounds have been completed to determine that the temperature value of the temperature point does not match the change in fan speed and that the temperature point is abnormal; and
a next abnormality detection locating unit, used when the rounds have not all been completed to move to the next abnormality detection round and acquire the preset ratio for that round. The number of abnormality detection rounds is set in advance. If the temperature does not change after one fan speed adjustment, the adjustment ratio may have been too small, so the fan speed is raised further. More rounds give higher accuracy, and the number of rounds is set according to test requirements. Under normal conditions, raising the fan speed lowers the temperature value of the temperature point. If the value decreases, the sensor of the temperature point and the link that reads it are both normal. If it does not decrease, there are three possible causes: the fan speed increase was insufficient, the sensor of the temperature point has failed, or the link that reads the temperature value is abnormal.
Further, the abnormal temperature point checking module includes:
an abnormal temperature point existence determination unit, used to determine that the current server system has an abnormal temperature point;
a fan full-speed adjustment unit, used to set the fans to full speed; and
a temperature abnormality check prompting unit, used to prompt a temperature abnormality check of the server system. Once an abnormal temperature point is confirmed, the fans must run at full speed to meet the server's cooling requirements, and a warning is issued so that the temperature abnormality is investigated.
The beneficial effects of the invention are as follows:
The reliable heat dissipation control method and device for a server provided by the invention probe temperature points whose temperature values never change by adjusting the fan speed. They check whether the acquired temperature value refreshes normally when the fan speed changes, and from this they determine whether the sensor or the temperature reading is abnormal. They then raise a timely alarm for an abnormal sensor or reading link and switch speed regulation to full speed. This reduces downtime caused by speed regulation failures due to abnormal sensors or reading links and improves the stability of the server system.
In addition, the invention has a reliable design principle, a simple structure, and broad application prospects.
Therefore, compared with the prior art, the invention has prominent substantive features and represents notable progress, and the beneficial effects of its implementation are evident.
Drawings
The drawings used in describing the embodiments or the prior art are briefly introduced below, to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly. A person of ordinary skill in the art can obviously derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of embodiment 1 of the reliable heat dissipation control method for a server according to the present invention.
Fig. 2 is a schematic flow chart of embodiment 2 of the reliable heat dissipation control method for a server according to the present invention.
Fig. 3 is a schematic diagram of the reliable heat dissipation control device for a server according to the present invention.
In the figures: 1-temperature value sampling module; 1.1-temperature value acquisition unit; 1.2-fan speed acquisition unit; 2-temperature value analysis module; 2.1-polling count acquisition unit; 2.2-temperature value analysis unit; 2.3-temperature point normality determination unit; 2.4-temperature point abnormality detection starting unit; 3-temperature point abnormality detection module; 3.1-abnormality detection parameter acquisition unit; 3.2-abnormality detection temperature determination unit; 3.3-temperature normality or fault clearance determination unit; 3.4-abnormality detection round completion determination unit; 3.5-temperature point abnormality determination unit; 3.6-next abnormality detection locating unit; 4-abnormal temperature point checking module; 4.1-abnormal temperature point existence determination unit; 4.2-fan full-speed adjustment unit; 4.3-temperature abnormality check prompting unit.
Detailed Description
To help those skilled in the art better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
ME is short for Intel Management Engine. The Intel ME is a microprocessor in Intel chips that operates independently of the CPU and the operating system.
Example 1:
As shown in Fig. 1, the present invention provides a reliable heat dissipation control method for a server, including the following steps:
S1. The BMC periodically acquires the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
S2. The BMC analyzes the temperature value of each temperature point and determines whether the temperature value of any temperature point has remained unchanged over a set number of readings; if so, abnormality detection is started for that temperature point;
S3. For each temperature point under abnormality detection, the fan speed is adjusted by a set ratio and the BMC determines whether the temperature value of the point changes accordingly; a temperature point whose temperature value changes in step with the fan speed is determined to be normal, and a temperature point whose temperature value does not is determined to be abnormal;
S4. When an abnormal temperature point exists, the fans are set to full speed and a prompt is issued to check the temperature abnormality.
The reliable heat dissipation control method for a server probes temperature points whose temperature values never change by adjusting the fan speed. It checks whether the acquired temperature value refreshes normally when the fan speed changes, and from this it determines whether the sensor or the temperature reading is abnormal. It then raises a timely alarm for an abnormal sensor or reading link and switches speed regulation to full speed. This reduces downtime caused by speed regulation failures due to abnormal sensors or reading links and improves the stability of the server system.
Example 2:
As shown in Fig. 2, the present invention provides a reliable heat dissipation control method for a server, including the following steps:
S1. The BMC periodically acquires the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy. This step specifically includes:
S11. At intervals of a set time period, the BMC reads the temperature value of each temperature point, as collected by a temperature sensor or held in an ME register;
S12. The BMC acquires the temperature value of the temperature point and the fan speed n corresponding to the speed regulation strategy for that temperature point. The temperature points include the CPU and memory temperature values stored in ME (Intel Management Engine) registers, and the temperature values of heat-generating components collected by temperature sensors on the server mainboard;
S2. The BMC analyzes the temperature value of each temperature point and determines whether the temperature value of any temperature point has remained unchanged over a set number of readings; if so, abnormality detection is started for that temperature point. This step specifically includes:
S21. Acquire the set number m;
S22. Starting from the current time point, the BMC takes m consecutive temperature values of each temperature point and determines whether the m values are identical;
If yes, go to step S24;
If not, go to step S23;
S23. The temperature values obtained in the m readings of each temperature point have changed, so each temperature point is determined to be normal; return to step S1;
S24. Start abnormality detection on each temperature point whose m temperature values are identical. Under normal conditions, the m values taken from a temperature point vary slightly. If all m values are identical, the sensor of that temperature point or the link that reads its value may have failed, and abnormality detection is required;
S3. For each temperature point under abnormality detection, the fan speed is adjusted by a set ratio and the BMC determines whether the temperature value of the point changes accordingly. A temperature point whose temperature value changes in step with the fan speed is determined to be normal, and a temperature point whose temperature value does not is determined to be abnormal. This step specifically includes:
S31. Acquire the number of abnormality detection rounds and the preset fan speed increase ratio for each round;
S32. For the temperature point under abnormality detection, increase the fan speed by the ratio set for the current round and determine whether the temperature value of the point decreases;
If yes, go to step S33;
If not, go to step S34;
S33. The temperature value of the temperature point matches the change in fan speed, so the temperature point is determined to be normal or its fault cleared; return to step S1;
S34. Determine whether all abnormality detection rounds have been completed;
If yes, go to step S35;
If not, go to step S36;
S35. The temperature value of the temperature point does not match the change in fan speed, so the temperature point is determined to be abnormal; go to step S4;
S36. Move to the next abnormality detection round, acquire the ratio set for that round, and return to step S32. The number of abnormality detection rounds is set in advance. If the temperature does not change after one fan speed adjustment, the adjustment ratio may have been too small, so the fan speed is raised further. More rounds give higher accuracy, and the number of rounds is set according to test requirements. Under normal conditions, raising the fan speed lowers the temperature value of the temperature point. If the value decreases, the sensor of the temperature point and the link that reads it are both normal. If it does not decrease, there are three possible causes: the fan speed increase was insufficient, the sensor of the temperature point has failed, or the link that reads the temperature value is abnormal;
S4. When an abnormal temperature point exists, the fans are set to full speed and a prompt is issued to check the temperature abnormality. This step specifically includes:
S41. Determine that the current server system has an abnormal temperature point;
S42. Set the fans to full speed;
S43. Prompt a temperature abnormality check of the server system. Once an abnormal temperature point is confirmed, the fans must run at full speed to meet the server's cooling requirements, and a warning is issued so that the temperature abnormality is investigated.
As an illustration of embodiment 2, take the set number m as 10 and the number of abnormality detection rounds as 2. The fan speed is raised by 10% in the first round and by 20% in the second. Under normal conditions, the BMC collects the temperature value of each temperature point and sets the fan speed according to the speed regulation strategy for that value; let the fan speed at this time be n. The temperature value of a temperature point obtained by the BMC normally varies slightly. If the value of some temperature point stays unchanged for 10 polls, its sensor or link may be abnormal, so abnormality detection is started for that point and the fan speed is raised from n to n × (1 + 10%). Because the fans are now faster, the temperature value of every sensor should decrease, and the temperature point is polled 10 more times. If its value still does not change, a second round of abnormality detection is started to prevent a false alarm. In this round the fan speed is raised further to n × (1 + 20%), which lowers the temperature inside the chassis further. If the value still has not changed after another 10 polls, the temperature point is determined to be abnormal, since its sensor may have failed or its reading link may be abnormal. To avoid over-temperature, the fans are then switched to full-speed mode, and alarm information is reported to prompt that the temperature abnormality needs to be checked.
If the polled temperature value of the temperature point changes during abnormality detection or in full-speed mode, the temperature point is considered normal or its fault cleared. The fan speed is then regulated again according to the original speed regulation strategy, and the corresponding alarm is cleared.
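The worked example and the recovery rule can be replayed with a small per-point monitor. This sketch rests on assumptions that the patent does not state:
- the base speed n is captured when detection starts;
- each round lasts m polls;
- boosted speeds are capped at `max_speed`;
- any change in the reading (not only a decrease, following the recovery paragraph) ends detection or full-speed mode.

`PointMonitor` and its parameters are hypothetical.

```python
class PointMonitor:
    """Per-point sketch of the embodiment 2 flow, including recovery (hypothetical design)."""

    def __init__(self, m: int = 10, boost_ratios=(0.10, 0.20), max_speed: float = 100.0):
        self.m = m
        self.boost_ratios = list(boost_ratios)
        self.max_speed = max_speed
        self.round = 0      # 0 = normal, 1..k = detection round, k + 1 = full speed
        self.alarm = False
        self.base = 0.0     # fan speed n captured when detection starts
        self.last = None    # previous reading
        self.same = 0       # polls in the current stage with an unchanged reading

    def poll(self, reading: float, policy_speed: float) -> float:
        """Consume one reading and return the fan speed to command."""
        if reading != self.last:
            self.last, self.same = reading, 1
            if self.round:                       # changed during detection or full speed
                self.round, self.alarm = 0, False
        else:
            self.same += 1
        k = len(self.boost_ratios)
        if self.same >= self.m and self.round <= k:
            if self.round == 0:
                self.base = policy_speed         # remember n when detection starts
            self.round += 1
            self.same = 0                        # poll m more times at the new speed
            self.alarm = self.round > k          # all rounds failed: abnormal point
        return self._speed(policy_speed)

    def _speed(self, policy_speed: float) -> float:
        k = len(self.boost_ratios)
        if self.round == 0:
            return policy_speed
        if self.round > k:
            return self.max_speed
        return min(self.max_speed, self.base * (1 + self.boost_ratios[self.round - 1]))
```

The following trace assumes 30 identical readings and a policy speed of 50:
- polls 1 to 9 command 50;
- polls 10 to 19 command about 55;
- polls 20 to 29 command about 60;
- poll 30 commands 100 (full speed) and sets the alarm.

A changed reading on the next poll restores the policy speed and clears the alarm.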
Example 3:
As shown in Fig. 3, the present invention provides a reliable heat dissipation control device for a server, including:
a temperature value sampling module 1, used by the BMC to periodically acquire the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
a temperature value analysis module 2, used by the BMC to analyze the temperature values of the temperature points and determine whether the temperature value of any temperature point has remained unchanged over the set number of readings; if so, the module starts abnormality detection on that temperature point;
a temperature point abnormality detection module 3, used to adjust the fan speed by a set ratio for each temperature point under abnormality detection and determine whether its temperature value changes accordingly; a temperature point whose value changes in step with the fan speed is determined to be normal, and one whose value does not is determined to be abnormal; and
an abnormal temperature point checking module 4, used to set the fans to full speed and prompt a temperature abnormality check when an abnormal temperature point exists.
The reliable heat dissipation control device for a server of this embodiment checks temperature points whose values never change by adjusting the fan speed and judging whether the obtained temperature values refresh normally when the fan speed changes. This reveals whether the sensor or the temperature reading is abnormal; an abnormal sensor or reading link triggers a timely alarm, and the speed regulation mode is switched to full speed. This reduces downtime caused by speed regulation failure due to an abnormal sensor or reading link and improves the stability of the server system.
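Of the four modules, the analysis module 2 holds the rule that triggers everything else. A minimal Python sketch of it follows, assuming that readings arrive as one dictionary per polling round keyed by temperature point name; the class name `StuckReadingDetector` and the point names are hypothetical. It keeps a sliding window of the last m values for each point and flags a point once the window is full and contains a single distinct value.

```python
from collections import deque
from typing import Deque, Dict, List


class StuckReadingDetector:
    """Flags temperature points whose last m polled values are all identical."""

    def __init__(self, m: int = 10):
        self.m = m
        self.history: Dict[str, Deque[float]] = {}

    def record(self, readings: Dict[str, float]) -> List[str]:
        """Add one polling round; return the points that now need abnormality detection."""
        flagged: List[str] = []
        for point, value in readings.items():
            # One fixed-length window per point; the oldest value drops out automatically.
            window = self.history.setdefault(point, deque(maxlen=self.m))
            window.append(value)
            # Judge only once m values exist; one distinct value means "no change".
            if len(window) == self.m and len(set(window)) == 1:
                flagged.append(point)
        return flagged


if __name__ == "__main__":
    detector = StuckReadingDetector(m=10)
    for i in range(10):
        # CPU0 fluctuates slightly; DIMM_A is frozen at one value.
        flagged = detector.record({"CPU0_Temp": 60.0 + i % 3, "DIMM_A_Temp": 41.0})
        print(i + 1, flagged)
```

Only the tenth round flags `DIMM_A_Temp`. The source does not say whether a point's window is cleared once it has been handed to the abnormality detection module; an implementation would likely need to decide this so that the same point is not re-flagged on every poll.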
Example 4:
As shown in Fig. 3, the present invention provides a reliable heat dissipation control device for a server, comprising:
a temperature point temperature value sampling module 1, used by the BMC to periodically obtain the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy; the temperature point temperature value sampling module 1 comprises:
a temperature value acquisition unit 1.1, used by the BMC to read, at intervals of a set time period, the temperature value of each temperature point collected by a temperature sensor or stored in the ME register;
a fan speed acquisition unit 1.2, used by the BMC to obtain the temperature value of each temperature point and the fan speed n corresponding to that point's speed regulation strategy; the temperature points include CPU (central processing unit) or memory temperature values stored in the ME (management entity) register, and temperature values of heat-generating components collected by temperature sensors on the server mainboard;
a temperature point temperature value analysis module 2, used by the BMC to analyze the temperature values of the temperature points, judge whether the temperature value of any temperature point has remained unchanged over the set number of polls, and, if so, start abnormality detection for that temperature point; the temperature point temperature value analysis module 2 comprises:
a polling count acquisition unit 2.1, used to obtain the set number m;
a temperature value analysis unit 2.2, used by the BMC to take, starting from the current time point, m successive temperature values of each temperature point and judge whether the m values are identical;
a temperature point normality determination unit 2.3, used to determine, when the m temperature values are not all identical, that the values obtained over the m polls have changed and the temperature point is normal;
a temperature point abnormality detection starting unit 2.4, used to start abnormality detection for each temperature point whose m temperature values are identical; under normal conditions the m values taken for a temperature point vary slightly, so if all m values are identical, the sensor of that point or the link that reads its temperature value may have failed, and abnormality detection is needed;
a temperature point abnormality detection module 3, used to adjust the fan speed by a set proportion for each temperature point requiring abnormality detection and judge whether the temperature value of that point changes correspondingly; a temperature point whose temperature value matches the change in fan speed is judged normal, and a temperature point whose temperature value does not match it is judged abnormal; the temperature point abnormality detection module 3 comprises:
an abnormality detection parameter acquisition unit 3.1, used to obtain the number of abnormality detections and the set proportion of fan speed adjustment for each detection;
an abnormality detection temperature judging unit 3.2, used to increase the fan speed by the corresponding set proportion for the temperature point requiring abnormality detection and judge whether the temperature value of that point decreases;
a temperature normality or fault clearance determination unit 3.3, used to determine, when the temperature value of the point decreases, that it matches the change in fan speed and that the point is normal or its fault has been cleared;
an abnormality detection count completion judging unit 3.4, used to judge, when the temperature value of the point does not decrease, whether all abnormality detections have been completed;
a temperature point abnormality determination unit 3.5, used to determine, when all abnormality detections have been completed, that the temperature value of the point does not match the change in fan speed and the point is abnormal;
a next abnormality detection positioning unit 3.6, used to move to the next abnormality detection when not all detections have been completed, and to obtain the set proportion for that detection. The number of abnormality detections is set in advance: if the temperature does not change after one fan speed adjustment, the adjustment proportion may have been too small and the fan speed must be raised further; more detections give higher accuracy, and the number is set according to test requirements. Under normal conditions, raising the fan speed lowers the temperature value of the point. A decrease indicates that both the sensor of the point and the link that reads its temperature are normal; no decrease indicates either that the fan speed increase was insufficient, or that the sensor has failed or the reading link is abnormal;
an abnormal temperature point checking module 4, used to run the fans at full speed and prompt a temperature abnormality check when an abnormal temperature point exists; the abnormal temperature point checking module 4 comprises:
an abnormal temperature point existence judging unit 4.1, used to determine that an abnormal temperature point exists in the current server system;
a fan full-speed adjusting unit 4.2, used to run the fans at full speed;
a temperature abnormality troubleshooting prompting unit 4.3, used to prompt a temperature abnormality check of the server system. Once an abnormal temperature point is determined, the fans must run at full speed to meet the cooling requirements of the server, and an alarm is issued at the same time so that the temperature abnormality can be investigated.
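Module 4, together with the recovery rule of Embodiment 2 (a point whose reading changes again is treated as normal, the original speed regulation strategy resumes and the alarm is cleared), can be sketched as a small state holder. This Python sketch assumes a single policy fan speed computed elsewhere and a hypothetical full speed of 12000 rpm; the `alarms` list merely stands in for whatever event log or alert channel the BMC actually uses, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set

FULL_SPEED_RPM = 12000.0  # hypothetical maximum fan speed


@dataclass
class AbnormalPointHandler:
    """Full-speed override and alarm bookkeeping while any temperature point is abnormal."""

    abnormal: Set[str] = field(default_factory=set)
    alarms: List[str] = field(default_factory=list)  # stand-in for the BMC's real alert channel

    def mark_abnormal(self, point: str) -> None:
        # Units 4.1 and 4.3: record the abnormal point and raise one alarm for it.
        if point not in self.abnormal:
            self.abnormal.add(point)
            self.alarms.append(f"ASSERT {point}: temperature abnormal, check sensor and read link")

    def mark_recovered(self, point: str) -> None:
        # Embodiment 2 recovery: the reading changed again, so clear the alarm.
        if point in self.abnormal:
            self.abnormal.remove(point)
            self.alarms.append(f"DEASSERT {point}: temperature reading recovered")

    def fan_command(self, policy_rpm: float) -> float:
        # Unit 4.2: full speed while any point is abnormal, otherwise the policy speed.
        return FULL_SPEED_RPM if self.abnormal else policy_rpm


if __name__ == "__main__":
    handler = AbnormalPointHandler()
    print(handler.fan_command(6000.0))
    handler.mark_abnormal("DIMM_A_Temp")
    print(handler.fan_command(6000.0))
    handler.mark_recovered("DIMM_A_Temp")
    print(handler.fan_command(6000.0), handler.alarms)
```

Keeping the abnormal points in a set means the override stays in force until every flagged point has recovered, and a repeated abnormal verdict for the same point does not raise a duplicate alarm.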
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, it is not limited thereto. Those skilled in the art may make various equivalent modifications or substitutions to the embodiments without departing from the spirit and scope of the present invention; such modifications or substitutions, and any changes or substitutions readily conceivable by a person skilled in the art within the technical scope of the invention, fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A reliable heat dissipation control method for a server, characterized by comprising the following steps:
S1: the BMC periodically obtains the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
S2: the BMC analyzes the temperature value of each temperature point, judges whether the temperature value of any temperature point has remained unchanged over the set number of polls, and, if so, starts abnormality detection for that temperature point;
S3: for each temperature point requiring abnormality detection, the fan speed is adjusted by a set proportion and it is judged whether the temperature value of the point changes correspondingly; a temperature point whose temperature value matches the change in fan speed is judged normal, and a temperature point whose temperature value does not match it is judged abnormal;
S4: when an abnormal temperature point exists, the fans are run at full speed and a temperature abnormality check is prompted.
2. The reliable heat dissipation control method for a server according to claim 1, wherein step S1 comprises the following steps:
S11: the BMC reads, at intervals of a set time period, the temperature value of each temperature point collected by a temperature sensor or stored in the ME register;
S12: the BMC obtains the temperature value of each temperature point and the fan speed n corresponding to that point's speed regulation strategy.
3. The reliable heat dissipation control method for a server according to claim 1, wherein step S2 comprises the following steps:
S21: obtaining the set number m;
S22: the BMC takes, starting from the current time point, m successive temperature values of each temperature point and judges whether the m values are identical;
if yes, go to step S24;
if no, go to step S23;
S23: determining that the values obtained over the m polls of each temperature point have changed and the temperature point is normal, and returning to step S1;
S24: starting abnormality detection for each temperature point whose m temperature values are identical.
4. The reliable heat dissipation control method for a server according to claim 1, wherein step S3 comprises the following steps:
S31: obtaining the number of abnormality detections and the set proportion of fan speed adjustment for each detection;
S32: increasing the fan speed by the corresponding set proportion for the temperature point requiring abnormality detection, and judging whether the temperature value of that point decreases;
if yes, go to step S33;
if no, go to step S34;
S33: determining that the temperature value of the point matches the change in fan speed and that the point is normal or its fault has been cleared, and returning to step S1;
S34: judging whether all abnormality detections have been completed;
if yes, go to step S35;
if no, go to step S36;
S35: determining that the temperature value of the point does not match the change in fan speed and the point is abnormal, and proceeding to step S4;
S36: moving to the next abnormality detection, obtaining its set proportion, and returning to step S32.
5. The reliable heat dissipation control method for a server according to claim 1, wherein step S4 comprises the following steps:
S41: determining that an abnormal temperature point exists in the current server system;
S42: running the fans at full speed;
S43: prompting a temperature abnormality check of the server system.
6. A reliable heat dissipation control device for a server, comprising:
a temperature point temperature value sampling module (1), used by the BMC to periodically obtain the temperature value of each temperature point and the fan speed corresponding to the speed regulation strategy;
a temperature point temperature value analysis module (2), used by the BMC to analyze the temperature values of the temperature points, judge whether the temperature value of any temperature point has remained unchanged over the set number of polls, and, if so, start abnormality detection for that temperature point;
a temperature point abnormality detection module (3), used to adjust the fan speed by a set proportion for each temperature point requiring abnormality detection, judge whether the temperature value of that point changes correspondingly, judge a temperature point whose temperature value matches the change in fan speed to be normal, and judge a temperature point whose temperature value does not match it to be abnormal;
and an abnormal temperature point checking module (4), used to run the fans at full speed and prompt a temperature abnormality check when an abnormal temperature point exists.
7. The reliable heat dissipation control device for a server according to claim 6, wherein the temperature point temperature value sampling module (1) comprises:
a temperature value acquisition unit (1.1), used by the BMC to read, at intervals of a set time period, the temperature value of each temperature point collected by a temperature sensor or stored in the ME register;
and a fan speed acquisition unit (1.2), used by the BMC to obtain the temperature value of each temperature point and the fan speed n corresponding to that point's speed regulation strategy.
8. The reliable heat dissipation control device for a server according to claim 6, wherein the temperature point temperature value analysis module (2) comprises:
a polling count acquisition unit (2.1), used to obtain the set number m;
a temperature value analysis unit (2.2), used by the BMC to take, starting from the current time point, m successive temperature values of each temperature point and judge whether the m values are identical;
a temperature point normality determination unit (2.3), used to determine, when the m temperature values are not all identical, that the values obtained over the m polls of each temperature point have changed and the temperature point is normal;
and a temperature point abnormality detection starting unit (2.4), used to start abnormality detection for each temperature point whose m temperature values are identical.
9. The reliable heat dissipation control device for a server according to claim 6, wherein the temperature point abnormality detection module (3) comprises:
an abnormality detection parameter acquisition unit (3.1), used to obtain the number of abnormality detections and the set proportion of fan speed adjustment for each detection;
an abnormality detection temperature judging unit (3.2), used to increase the fan speed by the corresponding set proportion for the temperature point requiring abnormality detection and judge whether the temperature value of that point decreases;
a temperature normality or fault clearance determination unit (3.3), used to determine, when the temperature value of the point decreases, that it matches the change in fan speed and that the point is normal or its fault has been cleared;
an abnormality detection count completion judging unit (3.4), used to judge, when the temperature value of the point does not decrease, whether all abnormality detections have been completed;
a temperature point abnormality determination unit (3.5), used to determine, when all abnormality detections have been completed, that the temperature value of the point does not match the change in fan speed and the point is abnormal;
and a next abnormality detection positioning unit (3.6), used to move to the next abnormality detection when not all detections have been completed, and to obtain the set proportion for that detection.
10. The reliable heat dissipation control device for a server according to claim 6, wherein the abnormal temperature point checking module (4) comprises:
an abnormal temperature point existence judging unit (4.1), used to determine that an abnormal temperature point exists in the current server system;
a fan full-speed adjusting unit (4.2), used to run the fans at full speed;
and a temperature abnormality troubleshooting prompting unit (4.3), used to prompt a temperature abnormality check of the server system.
CN202111454833.1A 2021-11-29 2021-11-29 Reliable heat dissipation control method and device for server Pending CN114281173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111454833.1A CN114281173A (en) 2021-11-29 2021-11-29 Reliable heat dissipation control method and device for server

Publications (1)

Publication Number Publication Date
CN114281173A true CN114281173A (en) 2022-04-05

Family

ID=80870470

Country Status (1)

Country Link
CN (1) CN114281173A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108457888A (en) * 2018-03-01 2018-08-28 郑州云海信息技术有限公司 A kind of server fan fault detection method, apparatus and system
CN110594180A (en) * 2019-07-19 2019-12-20 苏州浪潮智能科技有限公司 Control method and system of server heat dissipation controller
CN111734667A (en) * 2020-05-29 2020-10-02 苏州浪潮智能科技有限公司 Method and device for regulating and controlling rotating speed of server fan
CN113049142A (en) * 2019-12-27 2021-06-29 华能如东八仙角海上风力发电有限责任公司 Temperature sensor alarm method, device, equipment and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN114756108A (en) * 2022-04-26 2022-07-15 深圳市研控科技有限公司 Temperature control method and system of computer case
CN117310241A (en) * 2023-11-30 2023-12-29 天津瑞芯源智能科技有限责任公司 Ammeter with fire safety function
CN117310241B (en) * 2023-11-30 2024-02-02 天津瑞芯源智能科技有限责任公司 Ammeter with fire safety function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination