CN116680101A - Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system

Info

Publication number
CN116680101A
Authority
CN
China
Prior art keywords
management controller
downtime
operating system
baseboard management
upgrading
Prior art date
Legal status
Pending
Application number
CN202310529448.1A
Other languages
Chinese (zh)
Inventor
刘美欣
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310529448.1A
Publication of CN116680101A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0793 Remedial or corrective actions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the field of detecting operating system downtime during a baseboard management controller upgrade, and discloses a method and a device for detecting operating system downtime and a method and a device for eliminating operating system downtime. In response to the upgrade preparation process and the upgrade execution process of the baseboard management controller, a system information acquisition loop command is executed and system information is acquired repeatedly. In response to completion of the upgrade, the acquired system information is checked against a preset condition. If all of it satisfies the condition, no operating system downtime has occurred; if any of it does not, an operating system downtime anomaly exists, and the anomaly and the corresponding system information are recorded. The invention thus detects whether the operating system goes down during a baseboard management controller upgrade so that downtime is discovered promptly, and it eliminates the downtime, transparently to the operating system, by modifying the pin function configuration register of the baseboard management controller, without affecting the normal operation and maintenance of the machine's operating system.

Description

Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system
Technical Field
The invention relates to the field of detecting operating system downtime during a baseboard management controller upgrade, and in particular to a method and a device for detecting operating system downtime and a method and a device for eliminating operating system downtime.
Background
In the research, development, testing and application of servers, the BMC (Baseboard Management Controller) serves as the monitoring and management system of the server, and the BMC can be upgraded in several ways, such as through the BMC WEB page, a RESTful interface or curl. When the BMC is upgraded or restarted, whether the machine is abnormal is judged by observing the machine status lights and checking the BMC logs and OS logs.
However, this way of testing is not comprehensive. When the BMC is upgraded or restarted, a brief OS downtime is hard for a tester to notice: the server running-state information shown on the BMC WEB page appears normal, the BMC generates no abnormal alarm log, and no downtime log is generated in the tmp files under the var path in the OS. As a result, checking the input and output functions under the OS is easily overlooked during a BMC upgrade, and short OS downtime caused by a BMC upgrade or restart is hard to discover. The problem then escapes into later stages and is analyzed only after it appears on the production line or on a customer's field machine, which affects operation and maintenance. Moreover, when short OS downtime during a BMC upgrade or restart occurs on the production line or at a customer site, it is usually solved by a method that is not transparent to the OS, for example upgrading the BIOS to bring in a fix, which further affects the operation and maintenance of the OS on customer-site machines.
Disclosure of Invention
To solve these problems, the invention provides a method and a device for detecting operating system downtime, and a method and a device for eliminating operating system downtime.
In a first aspect, the present invention provides a method for detecting downtime of an operating system, including the following steps:
executing a system information acquisition loop command in response to the baseboard management controller upgrade preparation process and upgrade execution process, and repeatedly acquiring system information;
detecting, in response to completion of the baseboard management controller upgrade, whether the acquired system information meets a preset condition;
if all the acquired system information meets the preset condition, determining that no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller;
if any of the system information does not meet the preset condition, determining that an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and recording the downtime anomaly and the corresponding system information.
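The decision rule of the first aspect can be sketched in Python as below. This is a minimal sketch, not the patent's implementation: the names DetectionResult and evaluate_samples are hypothetical, and the preset condition is passed in as a caller-supplied predicate because the text leaves it open at this level.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class DetectionResult(Generic[T]):
    """Outcome of checking every sample collected during the BMC upgrade."""
    downtime: bool
    violations: List[T] = field(default_factory=list)  # samples to be recorded


def evaluate_samples(samples: List[T],
                     meets_condition: Callable[[T], bool]) -> DetectionResult[T]:
    """Apply the (hypothetical) preset-condition predicate to every sample.

    No downtime is reported only if all samples satisfy the predicate; every
    sample that fails it is returned so it can be recorded with the anomaly.
    """
    violations = [s for s in samples if not meets_condition(s)]
    return DetectionResult(downtime=bool(violations), violations=violations)
```

The violations list is what would be recorded together with the downtime anomaly.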
In an alternative embodiment, the system information is the operating system time, and the method specifically includes the following steps:
executing the system information acquisition loop command to repeatedly acquire the operating system time;
detecting whether the difference between the currently acquired operating system time and the previously acquired operating system time is smaller than a preset threshold;
if all the differences are smaller than the preset threshold, determining that no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller;
if any difference is greater than or equal to the preset threshold, determining that an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and recording the downtime anomaly and the corresponding operating system time.
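Writing t_1, ..., t_n for the operating system times in acquisition order and τ for the preset threshold (notation introduced here for illustration, not used in the original), the condition of this embodiment can be stated compactly; the set A collects the intervals that would be recorded as downtime anomalies.

```latex
\text{no downtime} \iff \max_{2 \le k \le n}\,\bigl(t_k - t_{k-1}\bigr) < \tau,
\qquad
\mathcal{A} = \{\, k \in \{2,\dots,n\} : t_k - t_{k-1} \ge \tau \,\}
```

With illustrative readings t_1 = 100 s, t_2 = 101 s, t_3 = 108 s and τ = 3 s, the differences are 1 s and 7 s, so A = {3} and the pair (t_2, t_3) would be recorded as a downtime anomaly.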
In an alternative embodiment, the method further comprises the steps of:
continuously exchanging data with the test device in which the baseboard management controller is located, in response to the upgrade preparation process and upgrade execution process of the baseboard management controller;
detecting whether the data interaction is successful.
In an alternative embodiment, the method specifically comprises the steps of:
if all the acquired system information meets the preset condition and the data interaction is successful, determining that no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller;
if any of the system information does not meet the preset condition or the data interaction is unsuccessful, determining that an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and recording the downtime anomaly and the corresponding system information.
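The combined rule can be sketched as follows, assuming the operating system times are in seconds and each data-interaction attempt is reduced to a True/False outcome. The function and field names are hypothetical, and the 3 s default threshold is the example value given later in the description.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class CombinedResult:
    time_gaps: List[Tuple[float, float]] = field(default_factory=list)
    failed_interactions: List[int] = field(default_factory=list)

    @property
    def downtime(self) -> bool:
        # A failure of either check is enough to flag OS downtime.
        return bool(self.time_gaps or self.failed_interactions)


def combined_check(times: Sequence[float], interactions: Sequence[bool],
                   threshold: float = 3.0) -> CombinedResult:
    """Combine the OS-time check with the data-interaction check.

    times: OS times in acquisition order (seconds).
    interactions: True/False per data exchange attempt (e.g. ping replies).
    """
    gaps = [(prev, cur) for prev, cur in zip(times, times[1:])
            if cur - prev >= threshold]
    failed = [i for i, ok in enumerate(interactions) if not ok]
    return CombinedResult(time_gaps=gaps, failed_interactions=failed)
```

Keeping both lists, rather than a single flag, makes it possible to record which check failed and when.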
In a second aspect, the present invention provides an operating system downtime detection apparatus, including:
a system information acquisition module, configured to execute a system information acquisition loop command in response to the baseboard management controller upgrade preparation process and upgrade execution process, and to repeatedly acquire system information;
a system information detection module, configured to detect, in response to completion of the baseboard management controller upgrade, whether the acquired system information meets a preset condition;
a downtime detection module, configured to determine that no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller if all the acquired system information meets the preset condition, and otherwise to determine that an operating system downtime anomaly occurred and record the downtime anomaly and the corresponding system information.
In a third aspect, the present invention provides a method for eliminating downtime of an operating system, including the following steps:
performing baseboard management controller upgrade preparation operations, including logging in to the baseboard management controller system to be upgraded and modifying the pin function configuration register of that baseboard management controller to its default value;
performing the baseboard management controller upgrade operation;
performing the detection method of any one of the above in response to the baseboard management controller upgrade preparation process and upgrade execution process.
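Because detection must run throughout preparation and upgrade, one way to arrange the third aspect is to sample the operating system time in a background thread while the preparation and upgrade steps execute. The sketch below assumes caller-supplied hypothetical callables prepare, upgrade and sample; it shows the orchestration only and performs no real BMC operation.

```python
import threading
import time
from typing import Callable, List


def run_upgrade_with_monitoring(prepare: Callable[[], None],
                                upgrade: Callable[[], None],
                                sample: Callable[[], float],
                                interval: float = 1.0) -> List[float]:
    """Run preparation and upgrade while a background thread keeps sampling."""
    samples: List[float] = []
    stop = threading.Event()

    def monitor() -> None:
        while not stop.is_set():
            samples.append(sample())
            stop.wait(interval)  # sleep up to `interval`, wake early once stopped

    worker = threading.Thread(target=monitor, daemon=True)
    worker.start()
    try:
        prepare()  # e.g. log in and reset the pin function configuration register
        upgrade()  # the BMC upgrade itself
    finally:
        stop.set()      # stop sampling even if a step raised
        worker.join()
    return samples


if __name__ == "__main__":
    # Toy run: fake 50 ms steps, sample a monotonic clock every 10 ms.
    taken = run_upgrade_with_monitoring(lambda: time.sleep(0.05),
                                        lambda: time.sleep(0.05),
                                        time.monotonic, interval=0.01)
    print(len(taken), "samples")
```

The returned samples can then be passed to a threshold check such as the one described in the second embodiment.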
In an alternative embodiment, modifying the pin function configuration register of the baseboard management controller to be upgraded to a default value specifically includes:
executing a pin function configuration register unlock command to unlock the pin function configuration register;
executing a pin function configuration register modification command to set the pin function configuration register to its default value;
executing a pin function configuration register lock command to lock the pin function configuration register;
executing a modification result check command, and determining from its return value whether the pin function configuration register was modified successfully.
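The unlock, modify, lock and verify sequence can be modelled as below. The register class is a hypothetical in-memory stand-in: the patent does not name the BMC chip, the register address, the default value or the commands used, so none are assumed here. The sketch only shows that the write is bracketed by unlock and lock and that success is judged by reading the value back.

```python
class PinFunctionRegister:
    """Hypothetical in-memory model of a lockable pin function configuration register."""

    def __init__(self, value: int, default: int) -> None:
        self.value = value        # current (possibly modified) setting
        self.default = default    # default value the elimination method restores
        self.locked = True        # writes are rejected while locked

    def unlock(self) -> None:
        self.locked = False

    def lock(self) -> None:
        self.locked = True

    def write(self, value: int) -> None:
        if self.locked:
            raise PermissionError("pin function configuration register is locked")
        self.value = value

    def read(self) -> int:
        return self.value


def restore_default(reg: PinFunctionRegister) -> bool:
    """Unlock, write the default value, lock again, then verify by read-back."""
    reg.unlock()
    try:
        reg.write(reg.default)
    finally:
        reg.lock()  # re-lock even if the write fails
    return reg.read() == reg.default
```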
In an alternative embodiment, after the upgrade operation of the baseboard management controller is completed, the method further includes the steps of:
downgrading the current baseboard management controller version to the old version, the old version being the version before the baseboard management controller upgrade;
modifying the pin function configuration register under the baseboard management controller system to simulate the operating system fault state;
repeating the elimination method;
repeating the above steps a plurality of times; if no operating system downtime occurs during upgrade preparation and upgrade execution of the baseboard management controller in any repetition, the elimination method passes the reliability test.
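The reliability test can be sketched as a loop over hypothetical hooks: downgrade rolls back to the old BMC version, inject_fault modifies the pin function configuration register to recreate the fault state, and eliminate_and_detect runs the elimination method and reports whether downtime was seen. These names and the stop-at-first-failure policy are assumptions made for illustration.

```python
from typing import Callable


def reliability_test(rounds: int,
                     downgrade: Callable[[], None],
                     inject_fault: Callable[[], None],
                     eliminate_and_detect: Callable[[], bool]) -> bool:
    """Return True if no round reports OS downtime.

    eliminate_and_detect returns True when downtime WAS detected.
    """
    for round_no in range(1, rounds + 1):
        downgrade()     # roll back to the pre-upgrade BMC version
        inject_fault()  # modify the pin register to reproduce the fault state
        if eliminate_and_detect():
            print(f"round {round_no}: OS downtime detected")
            return False
    return True
```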
In an alternative embodiment, logging in the baseboard management controller system to be upgraded specifically includes:
connecting to the baseboard management controller address through a remote connection tool;
logging in to the baseboard management controller system with a user name and a password;
checking whether the login to the baseboard management controller system succeeded.
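The login check could be scripted with an SSH client library. The sketch below assumes the third-party paramiko package and password authentication; the function names, the echo probe and the automatic acceptance of unknown host keys (suitable only for an isolated test bench) are illustrative choices, not details taken from the patent. The client_factory parameter lets the logic be exercised without a real BMC.

```python
from typing import Any, Callable, Optional


def _paramiko_client() -> Any:
    import paramiko  # third-party SSH library (pip install paramiko)
    client = paramiko.SSHClient()
    # Accept unknown host keys automatically: acceptable only on an isolated test bench.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    return client


def check_bmc_login(host: str, username: str, password: str,
                    client_factory: Optional[Callable[[], Any]] = None,
                    timeout: float = 10.0) -> bool:
    """Log in to the BMC over SSH with a user name and password.

    Returns True only if the connection succeeds and a trivial command runs.
    """
    client = (client_factory or _paramiko_client)()
    try:
        client.connect(host, username=username, password=password, timeout=timeout)
        _, stdout, _ = client.exec_command("echo ok")
        return stdout.read().strip() == b"ok"
    except Exception:  # authentication, network and SSH errors all mean "not logged in"
        return False
    finally:
        client.close()
```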
In a fourth aspect, the present invention provides an operating system downtime elimination apparatus, including:
an upgrade preparation operation module, configured to perform baseboard management controller upgrade preparation operations, including logging in to the baseboard management controller system to be upgraded and modifying the pin function configuration register of that baseboard management controller to its default value;
an upgrade operation module, configured to perform the baseboard management controller upgrade operation;
a detection triggering module, configured to trigger execution of the detection method of any one of the above in response to the baseboard management controller upgrade preparation process and upgrade execution process.
Compared with the prior art, the method and device for detecting operating system downtime have the following beneficial effects: system information is acquired during the upgrade preparation process and upgrade execution process of the baseboard management controller and used to judge whether the operating system went down, so operating system downtime during a baseboard management controller upgrade is detected and discovered promptly. In addition, before the baseboard management controller upgrade is executed, operating system downtime is eliminated by modifying the pin function configuration register of the baseboard management controller, a change that is transparent to the operating system and does not affect the normal operation and maintenance of the machine's operating system.
Drawings
For a clearer description of embodiments of the invention or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from them without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a baseboard management controller upgrade architecture.
Fig. 2 is a first schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention.
Fig. 3 is a second schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention.
Fig. 4 is a third schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention.
Fig. 5 is a schematic block diagram of an operating system downtime detection apparatus according to an embodiment of the present invention.
Fig. 6 is a schematic flow chart of a method for eliminating operating system downtime according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an operating system downtime elimination apparatus according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The following explains key terms appearing in the present invention.
BMC: Baseboard Management Controller.
OS: Operating System.
SSH: Secure Shell, a remote connection tool.
For ease of understanding, the baseboard management controller upgrade architecture to which the present invention applies is described first. The methods for detecting and eliminating operating system downtime provided by the invention can be applied to the baseboard management controller upgrade architecture shown in Fig. 1, which comprises a test terminal 800 and a plurality of test devices 900; the test terminal 800 upgrades the baseboard management controller in each test device 900.
Fig. 2 is a first schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention. The method of Fig. 2 may be executed by an operating system downtime detection apparatus running on a computer device. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.
As shown in fig. 2, the method includes the following steps.
S101, in response to the upgrade preparation process and the upgrade execution process of the baseboard management controller, executing a system information acquisition loop command to acquire system information in a loop.
Operating system downtime is reflected in the system information. In this embodiment, the system information acquisition loop command is executed during the upgrade preparation process and upgrade execution process of the baseboard management controller, so that system information is acquired periodically and the operating system is continuously monitored for downtime.
S102, in response to completion of the baseboard management controller upgrade, detecting whether the acquired system information meets the preset condition.
After the baseboard management controller upgrade is completed, all the acquired system information is checked against the preset condition to determine whether operating system downtime has occurred.
S103, if all the acquired system information meets the preset condition, no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller.
S104, if any of the system information does not meet the preset condition, an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and the downtime anomaly and the corresponding system information are recorded.
If all the acquired system information meets the preset condition, the operating system has no downtime fault. If any system information fails the preset condition, an operating system downtime anomaly has occurred; the anomaly information and the corresponding system information are then recorded and reported to the relevant staff so that they learn of the downtime promptly.
With this method for detecting operating system downtime, system information is acquired during the upgrade preparation process and upgrade execution process of the baseboard management controller and used to judge whether the operating system went down, so that downtime during the baseboard management controller upgrade is detected and discovered promptly.
Fig. 3 is a second schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention. The method of Fig. 3 may be executed by an operating system downtime detection apparatus running on a computer device. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.
As shown in fig. 3, the method includes the following steps.
S201, in response to the upgrade preparation process and the upgrade execution process of the baseboard management controller, a system information acquisition loop command is executed to loop the acquisition of the operating system time.
In this embodiment, the acquired system information is the operating system time: the operating system time is printed continuously, and whether the operating system went down is judged from it.
In one embodiment, the shell command "for i in {1..1000}; do date; sleep 1; done" is executed to print the operating system time about once per second, 1000 times.
S202, detecting whether the difference between the current acquired operating system time and the last acquired operating system time is smaller than a preset threshold value.
Whether the operating system went down is judged from the difference between two successive operating system times, and the preset threshold can be set as required, for example to 3 s.
S203, if all the differences are smaller than the preset threshold, no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller.
S204, if any difference is greater than or equal to the preset threshold, an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and the downtime anomaly and the corresponding operating system time are recorded.
For example, if the difference between every two adjacent operating system time readings is less than 3 s, there is no operating system downtime; if the difference between any two adjacent readings is 3 s or more, an operating system downtime anomaly is determined to exist.
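The log produced by the loop can be checked offline once the upgrade finishes. Default date output includes a time zone abbreviation that is awkward to parse, so this sketch assumes a variant of the loop that prints Unix epoch seconds with GNU date's +%s format, for example "for i in {1..1000}; do date +%s; sleep 1; done > os_time.log" (the file name is hypothetical). The function name gaps_from_log is likewise illustrative.

```python
from typing import Iterable, List, Tuple


def gaps_from_log(lines: Iterable[str], threshold: int = 3) -> List[Tuple[int, int, int]]:
    """Parse one epoch-seconds value per line and report suspicious gaps.

    Returns (previous, current, gap) tuples for every adjacent pair whose
    gap is >= threshold seconds; blank lines are ignored.
    """
    stamps = [int(line) for line in lines if line.strip()]
    report = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap >= threshold:
            report.append((prev, cur, gap))
    return report


# Typical use: with open("os_time.log") as f: anomalies = gaps_from_log(f)
```

With a 1 s sleep, normal gaps are about 1 s. Because the values are truncated to whole seconds, a gap may occasionally read as 2 s, which is still below the 3 s example threshold, so the threshold leaves a margin against false alarms.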
With the method of this embodiment, the operating system time is acquired during the upgrade preparation process and upgrade execution process of the baseboard management controller and used to judge whether the operating system went down, so that downtime during the baseboard management controller upgrade is detected and discovered promptly.
Fig. 4 is a third schematic flow chart of a method for detecting operating system downtime according to an embodiment of the present invention. The method of Fig. 4 may be executed by an operating system downtime detection apparatus running on a computer device. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.
As shown in fig. 4, the method includes the following steps.
S301, in response to the upgrade preparation process and upgrade execution process of the baseboard management controller, executing the system information acquisition loop command to repeatedly acquire the operating system time, while continuously exchanging data with the test device in which the baseboard management controller is located.
In this embodiment, the acquired system information is the operating system time: the operating system time is printed continuously, and whether the operating system went down is judged from it.
In one embodiment, the shell command "for i in {1..1000}; do date; sleep 1; done" is executed to print the operating system time about once per second. In addition, another machine, for example a host computer, continuously pings the machine under test to check whether it responds normally.
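The ping check from the second machine can be sketched as below. It assumes a Linux host with iputils ping, where -c 1 sends one echo request and -W 1 waits up to one second for a reply, and it treats exit status 0 as a successful interaction. The function names are hypothetical; the run and probe parameters exist only so the logic can be tested without a network.

```python
import subprocess
import time
from typing import Callable, List


def ping_once(host: str, run: Callable = subprocess.run) -> bool:
    """Send a single ICMP echo request; True if ping exits with status 0."""
    result = run(["ping", "-c", "1", "-W", "1", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def ping_monitor(host: str, rounds: int, interval: float = 1.0,
                 probe: Callable[[str], bool] = ping_once) -> List[bool]:
    """Ping the machine under test `rounds` times and collect the outcomes."""
    outcomes: List[bool] = []
    for _ in range(rounds):
        outcomes.append(probe(host))
        time.sleep(interval)
    return outcomes
```

Any False in the returned list corresponds to an unsuccessful data interaction in S302 and S304.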
S302, detecting whether the difference value between the current acquired operating system time and the last acquired operating system time is smaller than a preset threshold value or not, and detecting whether data interaction is successful or not.
Whether the operating system went down is judged from the difference between two successive operating system times, and the preset threshold can be set as required, for example to 3 s.
S303, if all the differences are smaller than the preset threshold and the data interaction is successful, no operating system downtime occurred during upgrade preparation and upgrade execution of the baseboard management controller.
S304, if any difference is greater than or equal to the preset threshold or the data interaction is unsuccessful, an operating system downtime anomaly occurred during upgrade preparation and upgrade execution of the baseboard management controller, and the downtime anomaly and the corresponding operating system time are recorded.
In this embodiment, operating system downtime during upgrade preparation and upgrade execution of the baseboard management controller is checked in two ways, and downtime flagged by either way is treated as operating system downtime, which improves the effectiveness of detection.
With this method for detecting operating system downtime, system information is acquired during the upgrade preparation process and upgrade execution process of the baseboard management controller and used to judge whether the operating system went down, and data interaction is used as a second indicator. Downtime during the baseboard management controller upgrade is therefore detected and discovered promptly, and checking in two ways improves detection effectiveness.
The embodiments of the method for detecting operating system downtime are described in detail above. Based on that method, an embodiment of the invention further provides a corresponding device for detecting operating system downtime.
Fig. 5 is a schematic block diagram of an operating system downtime detection apparatus according to an embodiment of the present invention. In this embodiment, the operating system downtime detection apparatus 500 may be divided into a plurality of functional modules according to the functions it performs, as shown in Fig. 5. The functional modules may include a system information acquisition module 510, a system information detection module 520, a downtime detection module 530, a data interaction module 540 and a data interaction detection module 550. A module in the present invention is a series of computer program segments, stored in a memory, that can be executed by at least one processor and perform a fixed function.
System information acquisition module 510: executing a system information acquisition cycle command in response to the substrate management controller upgrade preparation process and the upgrade execution process, and circularly acquiring the system information;
system information detection module 520: and detecting whether the acquired system information meets a preset condition or not in response to the completion of upgrading of the baseboard management controller.
Downtime detection module 530: if the acquired system information meets the preset conditions, no operation system downtime exists in the upgrading preparation and upgrading processes of the substrate management controller; if the system information does not meet the preset conditions, the operating system downtime abnormality exists in the upgrading preparation and upgrading process of the substrate management controller, and the operating system downtime abnormality and the corresponding system information are recorded.
In an alternative embodiment, the system information is an operating system time, and the system information obtaining module 510 is specifically configured to perform a system information obtaining cycle command in response to the upgrade preparation process and the upgrade execution process of the baseboard management controller, and cycle to obtain the operating system time. The system information detection module 520 is specifically configured to detect whether the difference between the current acquired operating system time and the last acquired operating system time is smaller than a preset threshold. The downtime detection module 530 is specifically configured to, if all the differences are smaller than a preset threshold, prevent an operating system from being downtime in the upgrade preparation and upgrade processes of the baseboard management controller; if the difference value is greater than or equal to the preset threshold value, the operation system downtime abnormality exists in the upgrading preparation and upgrading process of the substrate management controller, and the operation system downtime abnormality and the corresponding operation system time are recorded.
In an alternative embodiment, the device 500 further comprises a data interaction module 540 and a data interaction detection module 550. The data interaction module 540 is configured to continuously exchange data with the test equipment on which the baseboard management controller is located, in response to the baseboard management controller upgrade preparation process and upgrade execution process. The data interaction detection module 550 is configured to detect whether the data interaction is successful.
In an alternative embodiment, the downtime detection module 530 is specifically configured to determine that no operating system downtime occurred during the baseboard management controller upgrade preparation and upgrade processes if all the acquired system information meets the preset condition and the data interaction is successful; if any system information does not meet the preset condition or the data interaction is unsuccessful, it determines that an operating system downtime anomaly occurred during those processes and records the anomaly together with the corresponding system information.
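The combined rule (time condition satisfied AND data interaction successful) reduces to a logical conjunction. The sketch below assumes each interaction attempt with the test equipment has already been reduced to a boolean, since the text does not specify how the interaction is performed; the function name and the reason labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def upgrade_verdict(time_samples: Sequence[float],
                    threshold: float,
                    interaction_results: Sequence[bool]) -> Tuple[bool, List[str]]:
    """Combine the time-gap check with the data interaction check.

    Returns (True, []) when every time difference is below the threshold
    and every interaction succeeded; otherwise (False, reasons).
    """
    reasons: List[str] = []
    gaps = [b - a for a, b in zip(time_samples, time_samples[1:])]
    if any(g >= threshold for g in gaps):
        reasons.append("os_time_gap")          # some difference >= threshold
    if not all(interaction_results):
        reasons.append("interaction_failed")   # at least one exchange failed
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict is a choice of this sketch; it makes it easy to record which condition triggered the anomaly.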
The operating system downtime detection device of this embodiment implements the operating system downtime detection method described above, and its functions correspond to those of the method. For its specific implementation, refer to the description of the corresponding method embodiments; details are not repeated here.
Fig. 6 is a flowchart of a method for eliminating operating system downtime according to an embodiment of the present invention. The method of Fig. 6 may be executed by an operating system downtime elimination device running on a computer device. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
The BMC chip contains many registers. The SCU84 register mainly configures the functions of the BMC pins, which input and output electrical levels. Writing data into the register causes the BMC controller to switch pin functions according to that data. When the machine is in its normal state, bit 21 of the value read back from the SCU84 register (corresponding to GPIOL5) defaults to 0. When the operating system downtime problem occurs, this bit of SCU84 becomes 1: executing devmem 0x1e6e2084 returns 0x9ffff000, and converting this value to binary shows that bit 21 is 1. Restoring the SCU84 register to its default value before performing the BMC upgrade avoids the operating system downtime problem.
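The bit test described above can be checked with a few lines of Python. This sketch assumes the devmem read-back is available as a 0x-prefixed hexadecimal string; the helper names are hypothetical, and only the bit position and the two register values come from the text.

```python
GPIOL5_BIT = 21  # bit of SCU84 that the text associates with GPIOL5


def parse_devmem(output: str) -> int:
    """Parse a devmem read such as '0x9FFFF000' into an integer."""
    return int(output.strip(), 16)


def scu84_fault_bit(value: int) -> int:
    """Return bit 21 of an SCU84 value: 0 = normal state, 1 = downtime-prone state."""
    return (value >> GPIOL5_BIT) & 1


if __name__ == "__main__":
    for raw in ("0x9FDFF000", "0x9FFFF000"):
        value = parse_devmem(raw)
        # Print the 32-bit binary form so bit 21 can be inspected by eye.
        print(raw, format(value, "032b"), "bit21 =", scu84_fault_bit(value))
```

The XOR of the two values, 0x9ffff000 ^ 0x9fdff000, is 0x00200000, i.e. 1 << 21, which confirms that the default value and the fault value differ only in bit 21.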
As shown in fig. 6, the method includes the following steps.
S401, performing a baseboard management controller upgrade preparation operation, including: logging in to the baseboard management controller system to be upgraded, and modifying the pin function configuration register of that baseboard management controller to its default value.
Logging in to the baseboard management controller system to be upgraded specifically comprises: connecting to the baseboard management controller address through a remote connection tool (SSH); logging in to the baseboard management controller system with a user name and password; and checking whether the login to the baseboard management controller system succeeded.
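The login check can be sketched with the standard OpenSSH client. Note that `BatchMode=yes` disables password prompts, so this sketch assumes key-based authentication has been set up for the BMC account; the text itself uses a user name and password, which would instead require an interactive client or an SSH library. The function names, the example address, and the use of the `sysadmin` account here are placeholders.

```python
import subprocess
from typing import Callable, List


def login_check_command(host: str, user: str, timeout_s: int = 10) -> List[str]:
    """Build an ssh command that logs in and immediately runs 'true'."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # fail instead of prompting
        "-o", f"ConnectTimeout={timeout_s}",   # bound the connection attempt
        f"{user}@{host}",
        "true",                                # exit status 0 means logged in
    ]


def bmc_login_ok(host: str, user: str,
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if the ssh login to the BMC succeeds (exit status 0)."""
    result = run(login_check_command(host, user), capture_output=True)
    return result.returncode == 0
```

The `run` parameter exists only so the check can be exercised without a real BMC; in normal use it defaults to `subprocess.run`.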
It should be noted that the user can be granted permission to modify the pin function configuration register in advance by configuring the privileges of the sysadmin user through an IPMI command.
Modifying the pin function configuration register of the baseboard management controller to be upgraded to its default value comprises the following steps.
Step one: execute the unlock command to unlock the pin function configuration register.
Unlock command: devmem 0x1e6e2000 32 0x1688A8A8.
Step two: execute the modification commands to set the pin function configuration register to its default value.
Modification commands: devmem 0x1e6e2084 32 0x9fdff000 and devmem 0x1e780074 32 0.
Step three: execute the lock command to lock the pin function configuration register.
Lock command: devmem 0x1e6e2000 32 0.
Step four: execute the result check commands and determine from the return values whether the pin function configuration register was modified successfully.
Result check commands: devmem 0x1e6e2084 and devmem 0x1e780074.
If the modification succeeded, the return values are 0x9fdff000 and 0, respectively.
In binary, bit 21 of 0x9fdff000 is 0, indicating that no operating system downtime anomaly is present.
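Steps one to four can be scripted as follows. This sketch assumes a hypothetical `run_cmd` callable that executes one shell command on the BMC (for example over the SSH session opened during login) and returns its standard output as a hexadecimal string; the command strings and register values are copied from the text. Passing `SCU84_FAULT_SIM` instead of the default reproduces the fault-simulation write used in S405.

```python
from typing import Callable, List

UNLOCK_CMD = "devmem 0x1e6e2000 32 0x1688A8A8"   # step one: unlock
LOCK_CMD = "devmem 0x1e6e2000 32 0"               # step three: lock
SCU84_DEFAULT = 0x9FDFF000                        # default value, bit 21 = 0
SCU84_FAULT_SIM = 0x9FFFF000                      # simulated fault value, bit 21 = 1


def write_commands(scu84_value: int) -> List[str]:
    """Steps one to three: unlock, write both registers, lock."""
    return [
        UNLOCK_CMD,
        f"devmem 0x1e6e2084 32 {scu84_value:#010x}",
        "devmem 0x1e780074 32 0",
        LOCK_CMD,
    ]


def apply_and_verify(run_cmd: Callable[[str], str],
                     scu84_value: int = SCU84_DEFAULT) -> bool:
    """Run steps one to three, then step four: read both registers back."""
    for cmd in write_commands(scu84_value):
        run_cmd(cmd)
    scu84 = int(run_cmd("devmem 0x1e6e2084").strip(), 16)
    gpio = int(run_cmd("devmem 0x1e780074").strip(), 16)
    return scu84 == scu84_value and gpio == 0
```

Keeping the command list separate from its execution makes it possible to review or log the exact writes before they touch a real BMC.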
S402, performing the baseboard management controller upgrade operation.
The baseboard management controllers are upgraded in batches using curl.
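The text only states that the upgrade is performed in batches with curl; the BMC endpoint, authentication, and firmware image format are not given. The sketch below therefore covers only the batching logic, assuming a hypothetical `upgrade_one(host)` callable that wraps the actual curl invocation for one BMC and returns True on success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def upgrade_batch(hosts: Iterable[str],
                  upgrade_one: Callable[[str], bool],
                  max_workers: int = 8) -> Dict[str, bool]:
    """Upgrade several BMCs concurrently and report success per host.

    An exception raised by upgrade_one is treated as a failed upgrade
    so that one bad host does not abort the whole batch.
    """
    host_list = list(hosts)

    def safe(host: str) -> bool:
        try:
            return bool(upgrade_one(host))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe, host_list))  # order matches host_list
    return dict(zip(host_list, results))
```

The worker count of 8 is an arbitrary default for this sketch; a real batch size would depend on network bandwidth and how many machines may be upgraded at once.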
S403, executing the operating system downtime detection method in response to the baseboard management controller upgrade preparation process and upgrade execution process.
The operating system downtime detection method is the one described in the above embodiment and is not described again here.
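One way to arrange S401 to S403 so that sampling covers both the preparation and the upgrade is to run the acquisition loop in a background thread. In this sketch `prepare`, `upgrade`, and `read_os_time` are hypothetical callables; in a real test `read_os_time` would query the operating system of the server under test (the default `time.time` reads the local clock and is only a placeholder), and a hang of that operating system would appear as a large gap between samples.

```python
import threading
import time
from typing import Callable, List


def run_with_sampling(prepare: Callable[[], None],
                      upgrade: Callable[[], None],
                      read_os_time: Callable[[], float] = time.time,
                      interval_s: float = 1.0) -> List[float]:
    """Run S401 and S402 while a background thread samples the OS time (S403).

    Returns the collected samples so that the time-difference rule can be
    applied once the upgrade has finished.
    """
    samples: List[float] = []
    stop = threading.Event()

    def sampler() -> None:
        while not stop.is_set():
            samples.append(read_os_time())
            stop.wait(interval_s)  # sleep, but wake immediately when stopped

    worker = threading.Thread(target=sampler, daemon=True)
    worker.start()
    try:
        prepare()
        upgrade()
    finally:
        stop.set()     # stop sampling even if prepare/upgrade raised
        worker.join()
    return samples
```

The returned samples can then be checked pairwise against the preset threshold, as in the time-difference rule of the detection method.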
In this embodiment, before the baseboard management controller upgrade is executed, the operating system downtime problem is eliminated by modifying the pin function configuration register of the baseboard management controller in a non-disruptive way. After the upgrade finishes, the operating system downtime detection method checks whether any downtime occurred during the process. This verifies the effectiveness of the elimination method and ensures that no operating system downtime occurs during the baseboard management controller upgrade.
To verify the feasibility of the operating system downtime elimination method and improve the accuracy of this verification, the method further comprises the following steps after the baseboard management controller upgrade operation is completed.
S404, downgrading the current baseboard management controller version to the old version, i.e., the version before the baseboard management controller was upgraded.
S405, modifying the pin function configuration register under the baseboard management controller system to simulate the operating system fault state.
Modifying the pin function configuration register to simulate the operating system fault state specifically comprises the following steps.
Step one: execute the unlock command to unlock the pin function configuration register.
Unlock command: devmem 0x1e6e2000 32 0x1688A8A8.
Step two: execute the modification commands to set the pin function configuration register to the simulated operating system fault state.
Modification commands: devmem 0x1e6e2084 32 0x9ffff000 and devmem 0x1e780074 32 0.
Step three: execute the lock command to lock the pin function configuration register.
Lock command: devmem 0x1e6e2000 32 0.
Step four: execute the result check commands and determine from the return values whether the pin function configuration register was modified successfully.
Result check commands: devmem 0x1e6e2084 and devmem 0x1e780074.
If the modification succeeded, the return values are 0x9ffff000 and 0, respectively.
In binary, bit 21 of 0x9ffff000 is 1, indicating an operating system downtime anomaly.
S406, re-executing steps S401-S403.
S407, repeating steps S401-S406 several times; if no downtime occurs in any of the baseboard management controller upgrade preparation and upgrade processes, the elimination method passes the reliability test.
The pin function configuration register is first modified to simulate the operating system fault state. Before the baseboard management controller upgrade, the register is modified again to eliminate the fault; the upgrade is then executed and checked to confirm that the operating system downtime problem no longer occurs. Repeating this cycle several times verifies the reliability of the operating system downtime elimination method.
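The loop structure of S404 to S407 is sketched below, assuming each stage is wrapped in a hypothetical callable and that the initial S401-S403 pass has already completed. The number of rounds is not specified in the text, so it is left as a parameter. Returning the failing round number rather than a plain boolean is a design choice of this sketch that makes the recorded anomaly easier to locate.

```python
from typing import Callable, Optional


def reliability_test(rounds: int,
                     downgrade: Callable[[], None],
                     simulate_fault: Callable[[], None],
                     upgrade_cycle: Callable[[], bool]) -> Optional[int]:
    """Repeat S404-S406 and return the first failing round, or None if all pass.

    upgrade_cycle runs S401-S403 and returns True when no downtime was detected.
    """
    for round_no in range(1, rounds + 1):
        downgrade()               # S404: return to the pre-upgrade BMC version
        simulate_fault()          # S405: write the simulated fault value to SCU84
        if not upgrade_cycle():   # S406: restore default, upgrade, detect
            return round_no
    return None
```

A result of None corresponds to the elimination method passing the reliability test in S407.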
The embodiments of the operating system downtime elimination method have been described in detail above. Based on that method, an embodiment of the invention further provides a corresponding operating system downtime elimination device.
Fig. 7 is a schematic block diagram of an operating system downtime elimination device according to an embodiment of the present invention. In this embodiment, the operating system downtime elimination device 700 may be divided into several functional modules according to the functions it performs, as shown in Fig. 7. The functional modules may include: an upgrade preparation operation module 710, an upgrade operation module 720, a detection trigger module 730, a version downgrade module 740, and a reliability test control module 750. A module in the present invention refers to a series of computer program segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function.
Upgrade preparation operation module 710: performs the baseboard management controller upgrade preparation operation, including logging in to the baseboard management controller system to be upgraded and modifying the pin function configuration register of that baseboard management controller to its default value.
Upgrade operation module 720: performs the baseboard management controller upgrade operation.
Detection trigger module 730: triggers the operating system downtime detection device in response to the baseboard management controller upgrade preparation process and upgrade execution process.
In an alternative embodiment, the upgrade preparation operation module 710 modifies the pin function configuration register of the baseboard management controller to be upgraded to its default value by: executing the unlock command to unlock the pin function configuration register; executing the modification command to set the register to its default value; executing the lock command to lock the register; and executing the result check command and determining from the return value whether the register was modified successfully.
In an alternative embodiment, the device 700 further comprises a version downgrade module 740 configured to downgrade the current baseboard management controller version to the old version, i.e., the version before the baseboard management controller was upgraded, and to modify the pin function configuration register under the baseboard management controller system to simulate the operating system fault state. The device 700 further comprises a reliability test control module 750 configured to trigger the upgrade preparation operation module 710 after the version downgrade module 740 finishes, and to run the upgrade preparation operation module 710, the upgrade operation module 720, the detection trigger module 730, and the version downgrade module 740 repeatedly several times; if no downtime occurs in any of the baseboard management controller upgrade preparation and upgrade processes, the elimination method passes the reliability test.
In an alternative embodiment, the upgrade preparation operation module 710 logs in to the baseboard management controller system to be upgraded by: connecting to the baseboard management controller address through a remote connection tool; logging in to the baseboard management controller system with a user name and password; and checking whether the login to the baseboard management controller system succeeded.
The operating system downtime elimination device of this embodiment implements the operating system downtime elimination method described above, and its functions correspond to those of the method. For its specific implementation, refer to the description of the corresponding method embodiments; details are not repeated here.
Fig. 8 is a schematic structural diagram of a test terminal 800 according to an embodiment of the present invention, comprising a processor 810, a memory 820, and a communication unit 830. When executing the operating system downtime elimination program stored in the memory 820, the processor 810 implements the following steps:
performing the baseboard management controller upgrade preparation operation, including: logging in to the baseboard management controller system to be upgraded, and modifying the pin function configuration register of that baseboard management controller to its default value;
performing the baseboard management controller upgrade operation;
executing the operating system downtime detection method in response to the baseboard management controller upgrade preparation process and upgrade execution process.
The invention acquires system information during the baseboard management controller upgrade preparation and upgrade execution processes and determines from it whether the operating system went down, so that operating system downtime during a baseboard management controller upgrade is detected and the problem is found in time. In addition, before the upgrade is executed, the operating system downtime problem is eliminated by modifying the pin function configuration register of the baseboard management controller in a non-disruptive way, without affecting the normal operation and maintenance of the machine's operating system.
The test terminal 800 includes a processor 810, a memory 820, and a communication unit 830. These components may communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the drawing does not limit the invention: it may be a bus structure or a star structure, may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
The memory 820 may be used to store instructions executed by the processor 810, and may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. When the instructions in the memory 820 are executed by the processor 810, the test terminal 800 can perform some or all of the steps of the above method embodiments.
The processor 810 is the control center of the test terminal. It connects the parts of the whole electronic test terminal through various interfaces and lines, and performs the functions of the electronic test terminal and/or processes data by running or executing software programs and/or modules stored in the memory 820 and invoking data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC, or several connected packaged ICs with the same or different functions. For example, the processor 810 may include only a central processing unit (CPU). In the embodiments of the invention, the CPU may have a single computing core or multiple computing cores.
The communication unit 830 is used to establish a communication channel so that the test terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.
The invention further provides a computer storage medium, which may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The computer storage medium stores an operating system downtime elimination program which, when executed by a processor, implements the following steps:
performing the baseboard management controller upgrade preparation operation, including: logging in to the baseboard management controller system to be upgraded, and modifying the pin function configuration register of that baseboard management controller to its default value;
performing the baseboard management controller upgrade operation;
executing the operating system downtime detection method in response to the baseboard management controller upgrade preparation process and upgrade execution process.
The invention acquires system information during the baseboard management controller upgrade preparation and upgrade execution processes and determines from it whether the operating system went down, so that operating system downtime during a baseboard management controller upgrade is detected and the problem is found in time. In addition, before the upgrade is executed, the operating system downtime problem is eliminated by modifying the pin function configuration register of the baseboard management controller in a non-disruptive way, without affecting the normal operation and maintenance of the machine's operating system.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, and includes several instructions that cause a computer test terminal (which may be a personal computer, a server, another test terminal, a network test terminal, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing discloses merely preferred embodiments of the invention, and the invention is not limited thereto, since those skilled in the art may make modifications and variations without departing from the principles of the invention.

Claims (10)

1. A method for detecting operating system downtime, characterized by comprising the following steps:
executing a system information acquisition cycle command in response to a baseboard management controller upgrade preparation process and upgrade execution process, and acquiring the system information repeatedly;
detecting, in response to completion of the baseboard management controller upgrade, whether the acquired system information meets a preset condition;
if all the acquired system information meets the preset condition, determining that no operating system downtime occurred during the baseboard management controller upgrade preparation and upgrade processes;
if any system information does not meet the preset condition, determining that an operating system downtime anomaly occurred during the baseboard management controller upgrade preparation and upgrade processes, and recording the operating system downtime anomaly and the corresponding system information.
2. The method for detecting operating system downtime of claim 1, wherein the system information is the operating system time, and the method specifically comprises the following steps:
executing the system information acquisition cycle command, and acquiring the operating system time repeatedly;
detecting whether the difference between the currently acquired operating system time and the previously acquired operating system time is smaller than a preset threshold;
if all the differences are smaller than the preset threshold, determining that no operating system downtime occurred during the baseboard management controller upgrade preparation and upgrade processes;
if any difference is greater than or equal to the preset threshold, determining that an operating system downtime anomaly occurred during the baseboard management controller upgrade preparation and upgrade processes, and recording the operating system downtime anomaly and the corresponding operating system time.
3. The method for detecting operating system downtime of claim 2, further comprising the following steps:
continuously exchanging data with the test equipment on which the baseboard management controller is located, in response to the baseboard management controller upgrade preparation process and upgrade execution process;
detecting whether the data interaction is successful.
4. The method for detecting operating system downtime of claim 3, comprising the following steps:
if all the acquired system information meets the preset condition and the data interaction is successful, determining that no operating system downtime occurred during the baseboard management controller upgrade preparation and upgrade processes;
if any system information does not meet the preset condition or the data interaction is unsuccessful, determining that an operating system downtime anomaly occurred during the baseboard management controller upgrade preparation and upgrade processes, and recording the operating system downtime anomaly and the corresponding system information.
5. An operating system downtime detection device, characterized by comprising:
a system information acquisition module, configured to execute a system information acquisition cycle command in response to a baseboard management controller upgrade preparation process and upgrade execution process, and acquire the system information repeatedly;
a system information detection module, configured to detect, in response to completion of the baseboard management controller upgrade, whether the acquired system information meets a preset condition;
a downtime detection module, configured to determine that no operating system downtime occurred during the baseboard management controller upgrade preparation and upgrade processes if all the acquired system information meets the preset condition, and, if any system information does not meet the preset condition, to determine that an operating system downtime anomaly occurred during those processes and record the operating system downtime anomaly and the corresponding system information.
6. A method for eliminating operating system downtime, characterized by comprising the following steps:
performing a baseboard management controller upgrade preparation operation, including: logging in to the baseboard management controller system to be upgraded, and modifying the pin function configuration register of that baseboard management controller to its default value;
performing the baseboard management controller upgrade operation;
executing the detection method of any one of claims 1-4 in response to the baseboard management controller upgrade preparation process and upgrade execution process.
7. The method for eliminating operating system downtime of claim 6, wherein modifying the pin function configuration register of the baseboard management controller to be upgraded to its default value comprises:
executing an unlock command to unlock the pin function configuration register;
executing a modification command to set the pin function configuration register to its default value;
executing a lock command to lock the pin function configuration register;
executing a result check command, and determining from the return value whether the pin function configuration register was modified successfully.
8. The method for eliminating operating system downtime of claim 7, further comprising, after the baseboard management controller upgrade operation is completed, the following steps:
downgrading the current baseboard management controller version to an old version, the old version being the version before the baseboard management controller was upgraded;
modifying the pin function configuration register under the baseboard management controller system to simulate an operating system fault state;
re-executing the elimination method of claim 6;
repeating the above steps several times, wherein if no downtime occurs in any of the baseboard management controller upgrade preparation and upgrade processes, the elimination method passes the reliability test.
9. The method for eliminating operating system downtime of any one of claims 6-8, wherein logging in to the baseboard management controller system to be upgraded specifically comprises:
connecting to the baseboard management controller address through a remote connection tool;
logging in to the baseboard management controller system with a user name and password;
checking whether the login to the baseboard management controller system succeeded.
10. An operating system downtime elimination device, characterized by comprising:
an upgrade preparation operation module, configured to perform a baseboard management controller upgrade preparation operation, including: logging in to the baseboard management controller system to be upgraded, and modifying the pin function configuration register of that baseboard management controller to its default value;
an upgrade operation module, configured to perform the baseboard management controller upgrade operation;
a detection trigger module, configured to trigger execution of the detection method of any one of claims 1-4 in response to the baseboard management controller upgrade preparation process and upgrade execution process.
CN202310529448.1A 2023-05-11 2023-05-11 Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system Pending CN116680101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310529448.1A CN116680101A (en) 2023-05-11 2023-05-11 Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310529448.1A CN116680101A (en) 2023-05-11 2023-05-11 Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system

Publications (1)

Publication Number Publication Date
CN116680101A true CN116680101A (en) 2023-09-01

Family

ID=87781580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310529448.1A Pending CN116680101A (en) 2023-05-11 2023-05-11 Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system

Country Status (1)

Country Link
CN (1) CN116680101A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873771A (en) * 2024-03-11 2024-04-12 Inspur Computer Technology Co., Ltd. System downtime processing method, device, equipment, storage medium and server
CN117873771B (en) * 2024-03-11 2024-06-07 Inspur Computer Technology Co., Ltd. System downtime processing method, device, equipment, storage medium and server

Similar Documents

Publication Publication Date Title
US9569325B2 (en) Method and system for automated test and result comparison
CN111274077A (en) Disk array reliability testing method, system, terminal and storage medium
CN111327490A (en) Byzantine fault-tolerant detection method of block chain and related device
CN116680101A (en) Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system
CN112506758A (en) Firmware refreshing method and device, computer equipment and storage medium
CN114003445B (en) BMC I2C monitoring function test method, system, terminal and storage medium
CN111858201A (en) BMC (baseboard management controller) comprehensive test method, system, terminal and storage medium
US6457145B1 (en) Fault detection in digital system
CN113076210B (en) Server fault diagnosis result notification method, system, terminal and storage medium
CN106909382B (en) Method and device for outputting different types of system starting information
CN112559266A (en) Solid state disk testing method and device, readable storage medium and electronic equipment
CN111078476B (en) Network card drive firmware stability test method, system, terminal and storage medium
CN115934446A (en) Self-checking method, server, equipment and storage medium
CN112463504B (en) Double-control storage product testing method, system, terminal and storage medium
CN115757099A (en) Automatic test method and device for platform firmware protection recovery function
CN112231170B (en) Data interaction card supervision method, system, terminal and storage medium
CN114138574A (en) Controller testing method, device, server and storage medium
CN114510381A (en) Fault injection method, device, equipment and storage medium
CN112069009A (en) Method and device for pressure test in Recovery mode and terminal equipment
CN115629931A (en) HBA card stability testing method, device, terminal and storage medium
CN116382968B (en) Fault detection method and device for external equipment
CN116719712B (en) Processor serial port log output method and device, electronic equipment and storage medium
US20240159812A1 (en) Method for monitoring in a distributed system
CN116893928A (en) Supervision method, system, terminal and storage medium for fault memory
CN115408217A (en) Method, device, terminal and storage medium for testing PG signal abnormity of Riser card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination