CN111026252B - Method and device for server temperature redundancy control - Google Patents

Method and device for server temperature redundancy control

Info

Publication number
CN111026252B
Authority
CN
China
Prior art keywords
controller
temperature
control
channel
bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911243108.2A
Other languages
Chinese (zh)
Other versions
CN111026252A (en)
Inventor
程鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911243108.2A
Publication of CN111026252A
Application granted
Publication of CN111026252B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16: Constructional details or arrangements
    • G06F1/20: Cooling means
    • G06F1/206: Cooling means comprising thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a method and a device for server temperature redundancy control, wherein the method comprises the following steps: controlling a channel selector to transmit a collected first temperature to a first controller through a first channel, and performing first temperature control based on the first temperature by the first controller, a second controller and a third controller that are communicatively connected in series; monitoring a first bus and a dog-feeding (watchdog) signal between the second controller and the third controller; in response to the first bus being idle for longer than a threshold time while the dog-feeding signal is normal, controlling the channel selector to transmit the first temperature to the second controller through a second channel, and performing second temperature control based on the first temperature by the second controller and the third controller; in response to the first bus being idle for longer than the threshold time while the dog-feeding signal is abnormal, controlling the channel selector to transmit the first temperature to the third controller through a third channel, and performing third temperature control based on the first temperature by the third controller. This embodiment solves the problem of uncontrolled heat dissipation within the server.

Description

Method and device for server temperature redundancy control
Technical Field
The invention relates to the technical field of servers. The invention further relates to a server temperature redundancy control method and device.
Background
Temperature control has always been a major concern in the field of servers. Currently, the x86 servers used in the industry rely largely on Intel for technical support: most CPUs come from Intel, and aspects such as the architecture and the board-level buses are also designed by Intel. Each server design company therefore chooses a heat dissipation mode, such as air cooling, water cooling or hybrid cooling, according to the Intel temperature specification. Among the components of an x86 platform, the CPU (Central Processing Unit) typically consumes the most power and runs at the highest temperature, and there are different ways of handling its temperature control. Specifically, in existing server motherboard designs, a PCH (Platform Controller Hub) generally obtains the CPU temperature information through a PECI (Platform Environment Control Interface) bus and then passes it to a BMC (Baseboard Management Controller) through an SMLINK bus (or SMBus). After receiving the CPU temperature information, the BMC directly controls a fan PWM (pulse width modulation) signal to control the fan speed.
Generally, one PCH and one BMC are placed on a motherboard. Taking a two-socket server as an example, a typical motherboard carries 2 CPUs, 1 PCH, 1 BMC and 1 CPLD (Complex Programmable Logic Device), which are the main devices involved in fan control. When the PCH and/or the BMC on the motherboard has a problem, for example when the BMC is down and cannot apply the current settings, how to keep fan control working needs to be considered. Current temperature control schemes do not consider what to do when the SMLINK of the PCH hangs or when the BMC hangs, so fan control is left in an uncontrolled state under these two abnormal conditions.
Therefore, a solution to the problem of uncontrolled heat dissipation in the server when a chip failure occurs in the PCH and/or BMC or a firmware problem occurs is needed, so as to better regulate the temperature in the server and further protect the safe operation of the server.
Disclosure of Invention
In one aspect, the present invention provides a method for server temperature redundancy control based on the above object, wherein the method comprises the following steps:
controlling a channel selector to transmit a collected first temperature to a first controller through a first channel, and performing first temperature control based on the first temperature by the first controller, a second controller and a third controller that are communicatively connected in series;
monitoring a first bus and a dog-feeding (watchdog) signal between the second controller and the third controller;
in response to the first bus being idle for longer than a threshold time while the dog-feeding signal is normal, controlling the channel selector to transmit the first temperature to the second controller through a second channel, and performing second temperature control based on the first temperature by the second controller and the third controller;
in response to the first bus being idle for longer than the threshold time while the dog-feeding signal is abnormal, controlling the channel selector to transmit the first temperature to the third controller through a third channel, and performing third temperature control based on the first temperature by the third controller.
According to an embodiment of the server temperature redundancy control method of the present invention, controlling the channel selector to transmit the collected first temperature to the first controller through the first channel, and performing the first temperature control based on the first temperature by the first controller, the second controller and the third controller that are communicatively connected in series, further includes:
the first controller analyzes the first temperature, regulates and controls the working state of the CPU according to the analyzed first temperature, and forwards the analyzed first temperature to the second controller through the second bus;
the second controller generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
and the third controller outputs a fan control signal according to the control instruction.
According to an embodiment of the server temperature redundancy control method of the present invention, in response to the first bus being idle for longer than the threshold time while the dog-feeding signal is normal, controlling the channel selector to transmit the first temperature to the second controller through the second channel, and performing the second temperature control based on the first temperature by the second controller and the third controller, further includes:
the second controller analyzes the first temperature, generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
and the third controller outputs a fan control signal according to the control instruction.
According to an embodiment of the server temperature redundancy control method of the present invention, in response to the first bus being idle for longer than the threshold time while the dog-feeding signal is abnormal, controlling the channel selector to transmit the first temperature to the third controller through the third channel, and performing the third temperature control based on the first temperature by the third controller, further includes:
the third controller analyzes the first temperature and outputs a preset fan control signal according to the analyzed first temperature.
According to the embodiment of the server temperature redundancy control method, the first controller is a PCH chip, the second controller is a BMC, and the third controller is a CPLD.
According to an embodiment of the method for server temperature redundancy control of the present invention, the first bus is an I2C bus.
According to an embodiment of the method for server temperature redundancy control of the present invention, the second bus is an SMLINK bus.
According to the embodiment of the server temperature redundancy control method, the channel selector is a GPIO channel switching chip and is controlled by the third controller to gate the first channel, the second channel or the third channel.
According to an embodiment of the method for server temperature redundancy control of the present invention, the dog-feeding signal, when normal, is a constant-frequency square wave.
In another aspect, the present invention further provides a device for controlling temperature redundancy of a server, wherein the device includes:
at least one processor; and
a memory storing processor executable program instructions that, when executed by the processor, perform the steps of any of the foregoing embodiments of a method of server temperature redundancy control.
By adopting the above technical scheme, the invention has at least the following beneficial effects: multiple temperature regulation strategies and a redundancy control mechanism are added to the conventional server temperature control strategy. By monitoring the working state of the bus and the dog-feeding signal and switching the channel of the channel selector accordingly, different temperature control schemes are executed, so that the temperature in the server remains under control when the first controller is abnormal or when both the first controller and the second controller are abnormal.
The present invention provides aspects of embodiments, which should not be used to limit the scope of the present invention. Other embodiments are contemplated in accordance with the techniques described herein, as will be apparent to one of ordinary skill in the art upon study of the following figures and detailed description, and are intended to be included within the scope of the present application.
Embodiments of the invention are explained and described in more detail below with reference to the drawings, but they should not be construed as limiting the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the description of the prior art and the embodiments will be briefly described below, components in the drawings are not necessarily drawn to scale, and related elements may be omitted, or in some cases the scale may have been enlarged so as to emphasize and clearly show the features described herein. In addition, the structural order may be arranged differently, as is known in the art.
FIG. 1 shows a schematic block diagram of an embodiment of a method of server temperature redundancy control according to the present invention;
FIG. 2 shows a hardware link diagram of an embodiment of the method of server temperature redundancy control according to the present invention.
Detailed Description
While the present invention may be embodied in various forms, there is shown in the drawings and will hereinafter be described some exemplary and non-limiting embodiments, with the understanding that the present disclosure is to be considered an exemplification of the invention and is not intended to limit the invention to the specific embodiments illustrated.
It should be noted that the steps mentioned in the following description of the embodiments of the present invention are only numbered for convenience and clarity of indicating the steps without specific description, and do not limit the sequence of the steps.
In order to solve the problem of uncontrolled heat dissipation in a server when a chip failure or a firmware problem occurs in the PCH (Platform Controller Hub) and/or the BMC (Baseboard Management Controller), the present invention provides a method for server temperature redundancy control. FIG. 1 shows a schematic block diagram of an embodiment of a method of server temperature redundancy control according to the present invention. In the embodiment shown in FIG. 1, the method comprises at least the following steps:
S1: controlling a channel selector to transmit a collected first temperature to a first controller through a first channel, and performing first temperature control based on the first temperature by the first controller, a second controller and a third controller that are communicatively connected in series;
S2: monitoring a first bus and a dog-feeding (watchdog) signal between the second controller and the third controller;
S3: in response to the first bus being idle for longer than a threshold time while the dog-feeding signal is normal, controlling the channel selector to transmit the first temperature to the second controller through a second channel, and performing second temperature control based on the first temperature by the second controller and the third controller;
S4: in response to the first bus being idle for longer than the threshold time while the dog-feeding signal is abnormal, controlling the channel selector to transmit the first temperature to the third controller through a third channel, and performing third temperature control based on the first temperature by the third controller.
An embodiment of the method is described in detail below with reference to FIG. 2. First, by default, that is, when the server system is operating normally, step S1 controls the channel selector 10 to transmit the first temperature to the first controller 20 through the first channel 11, and the first temperature control is performed based on the first temperature by the first controller 20, the second controller 30 and the third controller 40, which are communicatively connected in series. The first temperature is preferably the collected temperature of the server CPU. In some embodiments, this step implements a conventional server temperature control strategy. In a preferred embodiment, the first temperature control additionally applies several temperature regulation strategies on top of the conventional strategy, as explained further below. During server operation, step S2 monitors the first bus 50 and the dog-feeding signal 60 between the second controller 30 and the third controller 40. Preferably, the first bus 50 and the dog-feeding signal 60 (WDT, an abbreviation for watchdog) are monitored by the third controller 40. When the first bus 50 is idle for longer than the threshold time and the dog-feeding signal 60 is normal, the first controller 20 is considered abnormal and unable to operate, so step S3 controls the channel selector 10 to transmit the first temperature to the second controller 30 through the second channel 12, and the second controller 30 and the third controller 40 perform the second temperature control based on the first temperature. In a preferred embodiment, the second temperature control likewise applies several temperature regulation strategies on top of the conventional strategy, as explained further below.
When the first bus 50 is idle for longer than the threshold time and the dog-feeding signal 60 is abnormal, either both the first controller 20 and the second controller 30, or the second controller 30 alone, are considered abnormal and unable to operate, so step S4 controls the channel selector 10 to transmit the first temperature to the third controller 40 through the third channel 13, and the third controller 40 performs the third temperature control based on the first temperature. Because the first controller 20, the second controller 30 and the third controller 40 used in the server operate at relatively high frequencies, the threshold time is usually set short to keep server temperature control stable: preferably on the order of milliseconds, and in any case no longer than the order of seconds.
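The decision made in steps S2 to S4 can be sketched as a small function. This is only an illustrative sketch in Python, not the patented implementation: the names `Channel` and `select_channel` and the 500 ms default threshold are assumptions, and the case of an active bus with an abnormal dog-feeding signal, which the description does not address, is simply treated here as staying on the first channel.

```python
from enum import Enum


class Channel(Enum):
    CH1_FIRST_CONTROLLER = 1   # first channel 11 -> first controller 20
    CH2_SECOND_CONTROLLER = 2  # second channel 12 -> second controller 30
    CH3_THIRD_CONTROLLER = 3   # third channel 13 -> third controller 40


def select_channel(bus_idle_ms: float, watchdog_normal: bool,
                   threshold_ms: float = 500.0) -> Channel:
    """Choose a channel from the two monitored conditions (steps S2 to S4).

    threshold_ms is a hypothetical value; the description only says the
    threshold should be short, preferably on the order of milliseconds.
    """
    if bus_idle_ms <= threshold_ms:
        # The first bus still carries control commands: keep the default path (S1).
        return Channel.CH1_FIRST_CONTROLLER
    if watchdog_normal:
        # Bus idle but the second controller is alive: bypass the first controller (S3).
        return Channel.CH2_SECOND_CONTROLLER
    # Bus idle and dog-feeding signal abnormal: the third controller takes over (S4).
    return Channel.CH3_THIRD_CONTROLLER
```

For example, `select_channel(1000, True)` selects the second channel, while `select_channel(1000, False)` selects the third.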
In some embodiments of the server temperature redundancy control method of the present invention, the step S1 controlling the channel selector 10 to transfer the collected first temperature to the first controller 20 in the first channel 11, and performing, by the first controller 20, the second controller 30, and the third controller 40 connected in serial communication, the first temperature control based on the first temperature further comprises:
S11: the first controller analyzes the first temperature, regulates and controls the working state of the CPU according to the analyzed first temperature, and forwards the analyzed first temperature to the second controller through the second bus;
S12: the second controller generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
S13: the third controller outputs a fan control signal according to the control instruction.
The specific control strategy for the first temperature control proposed in these embodiments is specifically described below, i.e., a temperature regulation strategy is added to the conventional server temperature control strategy in many aspects. After controlling the channel selector 10 to transmit the first temperature to the first controller 20 through the first channel 11, first, the first controller 20 parses the first temperature, and adjusts the CPU operating state according to the parsed first temperature, and forwards the parsed first temperature to the second controller 30 through the second bus 70 in step S11. Based on the parsed first temperature, the first controller 20 issues a corresponding instruction to adjust the operating state of the CPU, such as reducing the operating frequency of the CPU, adjusting the data processing intensity of the CPU, and so on, thereby fundamentally controlling the temperature of the CPU. At the same time, the first controller 20 forwards the parsed first temperature to the second controller 30 through the second bus 70, thereby allowing the second controller 30 to further control the server temperature. Then, in step S12, the second controller 30 generates a control command based on the analyzed first temperature and the collected second temperature, and transmits the control command to the third controller 40 via the first bus 50. From another perspective, in the first temperature control, the second controller 30 is a core control part of the server temperature control, and the second controller 30 generates a corresponding control command according to the resolved first temperature forwarded by the first controller 20 and comprehensively considering a second temperature collected by the second controller 30 through another interface (for example, the general purpose input output interface GPIO). 
The second temperature referred to here covers the temperatures of elements and locations in the server other than the CPU. After the second controller 30 generates the control command, it transmits the command to the third controller 40 through the first bus 50. Finally, in step S13, the third controller 40 generates a fan control signal, preferably a PWM fan control signal, according to the control command and outputs it directly to the fan 80 to control the server temperature in real time. Seen from another angle, the third controller 40 essentially acts as an actuator for the second controller 30 in the first temperature control.
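As an illustration of step S12, the following Python sketch shows one way the second controller could combine the first temperature with the second temperatures to produce a fan command. The description does not specify the regulation curve, so the function name `bmc_fan_duty`, the "hottest reading wins" policy and the 40 °C / 85 °C and 30% / 100% end points are all assumptions made for this example.

```python
from typing import Sequence


def bmc_fan_duty(cpu_temp_c: float, other_temps_c: Sequence[float]) -> int:
    """Return a fan PWM duty cycle in percent (hypothetical policy).

    cpu_temp_c    : the parsed first temperature (CPU)
    other_temps_c : the second temperatures (other elements and locations)
    """
    # Assumed policy: drive the fan from the hottest reading.
    hottest = max([cpu_temp_c, *other_temps_c])
    low_c, high_c = 40.0, 85.0      # assumed temperature range
    min_duty, max_duty = 30, 100    # assumed duty-cycle range
    if hottest <= low_c:
        return min_duty
    if hottest >= high_c:
        return max_duty
    # Linear interpolation between the two end points.
    fraction = (hottest - low_c) / (high_c - low_c)
    return round(min_duty + fraction * (max_duty - min_duty))
```

In the first temperature control the resulting value would be sent to the third controller over the first bus, and the third controller would turn it into the PWM signal for the fan.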
In some embodiments of the method of server temperature redundancy control of the present invention, the step S3 controlling the channel selector 10 to communicate the first temperature to the second controller 30 in the second channel 12 in response to the first bus 50 being idle for more than a threshold time and the dog feeding signal 60 being normal, and performing, by the second controller 30 and the third controller 40, the second temperature control based on the first temperature further comprises:
S31: the second controller analyzes the first temperature, generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
S32: the third controller outputs a fan control signal according to the control instruction.
The specific control strategy of the second temperature control proposed in these embodiments is described below. At this point the first controller 20 is considered abnormal and unable to operate, that is, it cannot provide the parsing and forwarding of the first temperature described above. The channel selector 10 is therefore controlled to transmit the first temperature to the second controller 30 through the second channel 12, and the second controller 30 takes over the parsing: in step S31 it parses the first temperature, generates a control command according to the parsed first temperature and the collected second temperature, and transmits the control command to the third controller 40 through the first bus 50. In the second temperature control, the second controller 30 is the only control part of the server temperature control. It generates the control command from the first temperature that it parsed itself, together with the second temperature that it collects through other interfaces (for example, the general purpose input/output interface, GPIO). Finally, in step S32, the third controller 40 generates a fan control signal, preferably a PWM fan control signal, according to the control command and outputs it directly to the fan 80 to control the server temperature in real time. Seen from another angle, the third controller 40 again essentially acts as an actuator for the second controller 30 in the second temperature control.
In some embodiments of the method of server temperature redundancy control of the present invention, the step S4 controlling the channel selector 10 to communicate the first temperature to the third controller 40 in the third channel 13 in response to the first bus 50 being idle for more than a threshold time and the dog feed signal 60 being abnormal, and performing, by the third controller 40, a third temperature control based on the first temperature further comprises:
the third controller analyzes the first temperature and outputs a preset fan control signal according to the analyzed first temperature.
The specific control strategy of the third temperature control proposed in these embodiments is described below. At this point either both the first controller 20 and the second controller 30, or the second controller 30 alone, are considered abnormal and unable to operate, so the parsing, forwarding and substantive control functions cannot be completed. The third controller 40 therefore receives and parses the first temperature and outputs a preset fan control signal according to the parsed first temperature. In other words, in the third temperature control the third controller 40 takes on both the substantive control work and the execution work. However, because rich functionality and high stability cannot both be had in one device, a control device with sufficiently high stability and somewhat limited functionality is generally chosen for the third controller. For this reason a few specific control signals, such as full-speed PWM, 90%-speed PWM and 80%-speed PWM, are preset in the third controller 40. When the parsed first temperature falls within a given range, the corresponding PWM control signal is output to the fan 80 so that the fan runs at a relatively high speed. This keeps the temperature in the server from rising too high, although real-time temperature control of the server is not possible in this mode.
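The preset behaviour of the third controller can be pictured as a lookup table. The Python sketch below is purely illustrative (a real CPLD would implement this in hardware description logic); the 100%, 90% and 80% presets come from the description, while the 80 °C and 70 °C range boundaries and the names `PRESET_DUTY` and `cpld_preset_duty` are assumptions.

```python
# Hypothetical preset table: (lower bound in degrees C, PWM duty in percent),
# ordered from the hottest range to the coolest.
PRESET_DUTY = [(80.0, 100), (70.0, 90)]
FALLBACK_DUTY = 80  # lowest preset, still a relatively high fan speed


def cpld_preset_duty(cpu_temp_c: float) -> int:
    """Map the parsed first temperature to one of a few preset duty cycles."""
    for lower_bound_c, duty in PRESET_DUTY:
        if cpu_temp_c >= lower_bound_c:
            return duty
    # Below every listed range: keep the fan at the lowest preset.
    return FALLBACK_DUTY
```

Because only a handful of fixed outputs exist, the fan always runs fast enough to protect the server, at the cost of the fine-grained regulation available in the first and second temperature control.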
In some embodiments of the server temperature redundancy control method of the present invention, the first controller 20 is a PCH (Platform Controller Hub) chip, the second controller 30 is a BMC (Baseboard Management Controller), and the third controller 40 is a CPLD (Complex Programmable Logic Device). The PCH can parse the first temperature and can regulate the operating state of the CPU while forwarding the parsed first temperature. The BMC is well suited to carrying the actual control work as the second controller. The CPLD offers sufficient stability together with basic control functions. The BMC is generally an Aspeed AST2500 or a similar management chip; in this scheme it mainly parses information such as PECI and SMLINK data and outputs the dog-feeding signal to the CPLD, and the I2C interface of the BMC is mainly used to transmit fan control information to the CPLD.
In some embodiments of the server temperature redundancy control method of the present invention, the first Bus 50 is an I2C Bus (Inter-Integrated Circuit Bus). That is, the second controller 30 and the third controller 40 are communicatively connected via an I2C bus to transmit control commands. Additionally, in some embodiments, the second bus is a SMLINK bus (or SMBus, system management bus). That is, the first controller 20 and the second controller 30 are communicatively connected via the SMLINK bus to communicate the resolved first temperature.
In some embodiments of the server temperature redundancy control method of the present invention, the channel selector 10 is a GPIO (General Purpose Input/Output) channel switching chip, and the channel selector 10 is controlled by the third controller 40 to gate the first channel 11, the second channel 12 or the third channel 13. The GPIO channel switching chip, also called an I/O-SW (Input/Output Switch) chip, generally provides 2 pins for channel selection: setting the level states of these 2 pins selects a channel, which is enough to switch among 3 channels. Because the third controller 40 is the most stable of the three controllers, it is the one that controls the channel selector 10, gating the first channel 11, the second channel 12 or the third channel 13 through the Select signal. The Select signal is the channel switching control input of the I/O-SW chip, generally 2 GPIOs, used to switch the I/O-SW chip between channels.
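Two select pins give four level combinations, of which three are needed. The Python sketch below illustrates one possible assignment; the actual pin encoding of the I/O-SW chip is not given in the description, so the table `SELECT_LEVELS` and the function `select_pins` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical encoding of the two Select pins as (SEL1, SEL0) logic levels.
# The fourth combination, (1, 1), is left unused in this sketch.
SELECT_LEVELS: Dict[int, Tuple[int, int]] = {
    1: (0, 0),  # CH1: first controller (PCH) side
    2: (0, 1),  # CH2: second controller (BMC) side
    3: (1, 0),  # CH3: third controller (CPLD) side
}


def select_pins(channel: int) -> Tuple[int, int]:
    """Return the levels the third controller would drive for a channel."""
    if channel not in SELECT_LEVELS:
        raise ValueError(f"unknown channel: {channel}")
    return SELECT_LEVELS[channel]
```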
In some embodiments of the method of server temperature redundancy control of the present invention, the dog feeding signal is normally a constant frequency square wave. The BMC generally outputs a square wave with fixed frequency to the CPLD for feeding back whether the working state of the BMC is normal or not. Preferably, a constant frequency of 1Hz is selected to facilitate monitoring by the third controller 40.
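One way to judge whether the dog-feeding signal is normal is to check the spacing of its rising edges against the nominal period. The Python sketch below illustrates the idea; the function name `watchdog_is_normal`, the use of rising-edge timestamps and the 20% tolerance are assumptions, and only the 1 Hz nominal frequency comes from the description.

```python
from typing import Sequence


def watchdog_is_normal(rising_edges_s: Sequence[float], now_s: float,
                       nominal_hz: float = 1.0,
                       tolerance: float = 0.2) -> bool:
    """Check that a square wave keeps a constant frequency near nominal_hz.

    rising_edges_s : timestamps (seconds) of recent rising edges, in order
    now_s          : current time, used to detect a signal that has stopped
    tolerance      : assumed allowed deviation of each period (20%)
    """
    period_s = 1.0 / nominal_hz
    min_period_s = period_s * (1.0 - tolerance)
    max_period_s = period_s * (1.0 + tolerance)
    if len(rising_edges_s) < 2:
        return False  # not enough edges to measure a period
    # A stuck signal shows up as no edge for longer than one allowed period.
    if now_s - rising_edges_s[-1] > max_period_s:
        return False
    # Every measured period must stay close to the nominal period.
    periods = [b - a for a, b in zip(rising_edges_s, rising_edges_s[1:])]
    return all(min_period_s <= p <= max_period_s for p in periods)
```

A low nominal frequency such as 1 Hz keeps this check cheap for the third controller, since only a coarse timer is needed.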
The hardware link and the elements in the hardware link in fig. 2 are taken as examples to further illustrate the method of the present invention.
Taking a two-socket server motherboard as an example, the PECI buses of the two CPUs are connected to an I/O-SW chip (a GPIO channel switching chip), and the PCH, the BMC and the CPLD are each connected to the I/O-SW chip as well. CPU temperature information is passed from the PCH to the BMC over SMLINK, a dog-feeding signal WDT (an abbreviation of watchdog) runs between the BMC and the CPLD, and an I2C channel between the BMC and the CPLD carries fan control information.
The normal fan control flow is as follows. CPU1 and CPU2 each transmit their temperature information over the PECI bus, and the I/O-SW chip is provided with a select control pin for switching its channel. The channels of the I/O-SW chip are generally assigned as CH1: PCH side, CH2: BMC side, CH3: CPLD side. By default the I/O-SW chip is switched to CH1, that is, connected to the PCH side, so the PECI bus of the CPU delivers temperature information to the PCH. The PCH receives and parses the PECI bus data and passes the corresponding data to the BMC over the SMLINK bus, which completes the transfer of temperature information from the CPU to the BMC. After receiving the temperature information, the BMC sends a control instruction to the CPLD according to the configured temperature and speed settings, and the CPLD performs PWM control, that is, fan speed control. This is the fan control flow under normal conditions.
PECI data is delivered mainly to the ME module of the PCH, which is also responsible for managing the power supply and other functions of the PCH. When this module fails, the PECI signal cannot be parsed, so the CPU temperature information cannot be passed on. In the normal flow the BMC obtains temperature information from the PCH over the SMLINK bus; in this case it cannot obtain the temperature information and therefore cannot send fan control information to the CPLD over I2C, so fan control fails and the fan is left in an uncontrolled state.
When the PCH is normal but the BMC has a fault, the BMC is hung even though SMLINK delivers temperature information normally. The BMC cannot send information to the CPLD, and the fan is again in an uncontrolled state.
When the CPLD detects that the I2C bus has carried no data for a set threshold time, and detects that the WDT (dog feeding signal) is normal, it can conclude that the PCH has failed while the BMC is working, and that the fans are currently uncontrolled. The CPLD then drives the select pin of the I/O-SW chip to switch to channel 2 (CH2), the BMC side. The BMC is now responsible for parsing the PECI data; after parsing, it sends control information to the CPLD over I2C according to the fan regulation strategy, and the CPLD outputs PWM to control the fan speed.
When the CPLD detects that the I2C bus has carried no data for a set threshold time, and detects that the WDT (dog feeding signal) is abnormal, it can conclude that the normal path through the PCH and the BMC is unavailable, and that the fans are currently uncontrolled. The CPLD then drives the select pin of the I/O-SW chip to switch to channel 3 (CH3), the CPLD side. The CPLD is now responsible for parsing the PECI data itself; after parsing, it directly outputs PWM according to the fan regulation strategy to control the fan speed.
In this way, the fans can still be regulated whether the PCH fails, the BMC fails, or both fail, so temperature control inside the server is maintained.
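The decision made by the CPLD in the cases above can be summarized in a short sketch. It assumes a monitor that can timestamp edges of the dog feeding signal and measure how long the I2C bus has been idle; the 20% tolerance, the parameter names, and the rule of keeping the current channel while the bus is active are assumptions, since the source does not describe how the check is implemented or how the system recovers. The claims state only that a normal dog feed signal is a square wave of constant frequency.

```python
from enum import Enum
from typing import Sequence


class Channel(Enum):
    CH1_PCH = 1   # default: PECI routed to the PCH
    CH2_BMC = 2   # I2C idle, WDT normal: BMC parses PECI
    CH3_CPLD = 3  # I2C idle, WDT abnormal: CPLD parses PECI


def wdt_is_normal(edge_times_s: Sequence[float], expected_period_s: float,
                  tolerance: float = 0.2) -> bool:
    """Treat the feed signal as normal if successive edges are evenly spaced.

    edge_times_s holds timestamps of observed edges (assumed interface);
    expected_period_s is the nominal spacing between edges.
    """
    if len(edge_times_s) < 3:
        return False  # too few edges to judge: signal stuck or missing
    gaps = [b - a for a, b in zip(edge_times_s, edge_times_s[1:])]
    return all(abs(g - expected_period_s) <= tolerance * expected_period_s
               for g in gaps)


def next_channel(current: Channel, i2c_idle_s: float,
                 idle_threshold_s: float, wdt_normal: bool) -> Channel:
    """Pick the I/O-SW channel from I2C idle time and WDT state."""
    if i2c_idle_s <= idle_threshold_s:
        return current  # instructions still arriving: keep the channel (assumed)
    return Channel.CH2_BMC if wdt_normal else Channel.CH3_CPLD
```

For example, with a threshold of 5 seconds, an idle time of 6 seconds and a normal WDT, `next_channel(Channel.CH1_PCH, 6.0, 5.0, True)` selects `Channel.CH2_BMC`; with an abnormal WDT it selects `Channel.CH3_CPLD`.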
In another aspect, the present invention further provides a device for server temperature redundancy control, wherein the device includes: at least one processor; and a memory storing program instructions executable by the processor, the program instructions, when executed by the processor, performing the steps of the method for server temperature redundancy control of any one of the foregoing embodiments.
The devices and apparatuses disclosed in the embodiments of the present invention may be various electronic terminal apparatuses, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal apparatus, such as a server, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of device and apparatus. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
The computer-readable storage media (e.g., memory) described herein may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
By adopting the above technical scheme, the invention has at least the following beneficial effects: several temperature regulation strategies and a redundancy control mechanism are added to the conventional server temperature control strategy. The working state of the bus and the dog feeding signal are monitored to control the channel of the channel selector, so that different temperature control schemes are executed, and the temperature inside the server remains under control when the first controller is abnormal or when both the first controller and the second controller are abnormal.
It is to be understood that the features listed above for the different embodiments may be combined with each other to form further embodiments within the scope of the invention, where technically feasible. Furthermore, the specific examples and embodiments described herein are non-limiting, and various modifications of the structure, steps and sequence set forth above may be made without departing from the scope of the invention.
In this application, the use of the disjunctive is intended to include the conjunctive. The use of definite or indefinite articles is not intended to indicate cardinality. In particular, references to "the" object or "a" and "an" object are intended to also denote one of a possible plurality of such objects. Although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated. Furthermore, the conjunction "or" may be used to convey features that are simultaneously present, rather than mutually exclusive alternatives. In other words, the conjunction "or" should be understood to include "and/or". The term "including" is inclusive and has the same scope as "comprising".
The above-described embodiments, particularly any "preferred" embodiments, are possible examples of implementations, and are presented merely for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the technology described herein. All such modifications are intended to be included within the scope of this disclosure.

Claims (10)

1. A method for server temperature redundancy control, the method comprising the steps of:
controlling a channel selector to pass a collected first temperature to a first controller through a first channel, and performing first temperature control based on the first temperature jointly by the first controller, a second controller and a third controller, which are connected in sequence by serial communication;
monitoring a first bus and a dog feed signal between the second controller and the third controller;
in response to the first bus being idle for more than a threshold time and the dog feed signal being normal, controlling the channel selector to pass the first temperature to the second controller through a second channel, and performing second temperature control based on the first temperature jointly by the second controller and the third controller;
in response to the first bus being idle for more than the threshold time and the dog feed signal being abnormal, controlling the channel selector to pass the first temperature to the third controller through a third channel, and performing third temperature control based on the first temperature by the third controller.
2. The method of claim 1, wherein controlling the channel selector to pass the collected first temperature to the first controller through the first channel, and performing the first temperature control based on the first temperature jointly by the first controller, the second controller and the third controller connected in sequence by serial communication, further comprises:
the first controller analyzes the first temperature, regulates and controls the working state of the CPU according to the analyzed first temperature, and forwards the analyzed first temperature to the second controller through a second bus;
the second controller generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
and the third controller outputs a fan control signal according to the control instruction.
3. The method of claim 1, wherein, in response to the first bus being idle for more than the threshold time and the dog feed signal being normal, controlling the channel selector to pass the first temperature to the second controller through the second channel, and performing the second temperature control based on the first temperature jointly by the second controller and the third controller, further comprises:
the second controller analyzes the first temperature, generates a control instruction according to the analyzed first temperature and the collected second temperature, and sends the control instruction to the third controller through the first bus;
and the third controller outputs a fan control signal according to the control instruction.
4. The method of claim 1, wherein, in response to the first bus being idle for more than the threshold time and the dog feed signal being abnormal, controlling the channel selector to pass the first temperature to the third controller through the third channel, and performing the third temperature control based on the first temperature by the third controller, further comprises:
the third controller analyzes the first temperature and outputs a preset fan control signal according to the analyzed first temperature.
5. The method of claim 1, wherein the first controller is a PCH chip, the second controller is a BMC, and the third controller is a CPLD.
6. The method of claim 1, wherein the first bus is an I2C bus.
7. The method of claim 2, wherein the second bus is a SMLINK bus.
8. The method of claim 1, wherein the channel selector is a GPIO channel switching chip, and wherein the channel selector is controlled by the third controller to gate the first channel, the second channel, or the third channel.
9. The method of claim 1, wherein the dog feed signal, when normal, is a square wave of constant frequency.
10. An apparatus for redundant control of server temperature, the apparatus comprising:
at least one processor; and
a memory storing processor-executable program instructions which, when executed by the processor, perform the steps of the method of server temperature redundancy control of any of the preceding claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911243108.2A CN111026252B (en) 2019-12-06 2019-12-06 Method and device for server temperature redundancy control

Publications (2)

Publication Number Publication Date
CN111026252A CN111026252A (en) 2020-04-17
CN111026252B true CN111026252B (en) 2021-08-24

Family

ID=70207434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911243108.2A Active CN111026252B (en) 2019-12-06 2019-12-06 Method and device for server temperature redundancy control

Country Status (1)

Country Link
CN (1) CN111026252B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527714B (en) 2020-11-13 2023-03-28 苏州浪潮智能科技有限公司 PECI signal interconnection method, system, equipment and medium of server
CN113064479B (en) * 2021-03-03 2023-05-23 山东英信计算机技术有限公司 Power redundancy control system, method and medium of GPU server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103306800A (en) * 2012-03-09 2013-09-18 铃木株式会社 Cooling apparatus of internal combustion engine for vehicle
CN104272889A (en) * 2012-05-11 2015-01-07 E3计算有限公司 Method of operating a data center with an efficient cooling device
CN106598814A (en) * 2016-12-26 2017-04-26 郑州云海信息技术有限公司 Design method for realizing overheating protection on server system
US10390464B1 (en) * 2018-12-03 2019-08-20 Aic Inc. Server heat dissipation channel structure
CN110362175A (en) * 2019-06-29 2019-10-22 苏州浪潮智能科技有限公司 A kind of control method for fan and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4199101A (en) * 1979-01-26 1980-04-22 Johnson Controls, Inc. Multiple load integrated fluid control units
US7988063B1 (en) * 2008-06-30 2011-08-02 Emc Corporation Method for controlling cooling in a data storage system
US10139873B2 (en) * 2013-08-29 2018-11-27 International Business Machines Corporation Electronics enclosure with redundant thermal sensing architecture
CN106445780A (en) * 2016-09-26 2017-02-22 英业达科技有限公司 Server, hardware monitor system and the method of the same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
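"Fault-Tolerant TSV Design with Thermal Optimization" (《兼顾热优化的TSV容错设计》); Zhang Amin (张阿敏); Journal of Electronic Measurement and Instrumentation (《电子测量与仪器学报》); 20180731; full text *
"Research on the Application of Cooling Tower Cooling *** in Data Centers" (《冷却塔供冷***在数据中心的应用研究》); She Jianli (折建利); China Master's Theses Full-text Database, Engineering Science and Technology II (《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》); 20181031; full text *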

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant