CN116737430A - BMC control method and device, electronic equipment and storage medium

Publication number: CN116737430A
Authority: CN (China)
Legal status: Pending
Application number: CN202211536120.4A
Other languages: Chinese (zh)
Inventors: 展晓洁, 王宪臻, 张昊
Current assignee: Nettrix Information Industry Beijing Co Ltd
Original assignee: Nettrix Information Industry Beijing Co Ltd
Application filed by Nettrix Information Industry Beijing Co Ltd
Priority to CN202211536120.4A
Publication of CN116737430A

Classifications

    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0656 Data buffering arrangements
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a BMC control method and device, electronic equipment, and a storage medium. The method includes: when a kernel error is detected in a target BMC, storing at least one restart reason currently corresponding to the target BMC, together with the class information of each restart reason, in a cache space; after detecting that the target BMC has started restarting, acquiring target operation data from a target memory of the target BMC according to each restart reason stored in the cache space and its class information; transferring the target operation data to a nonvolatile memory, so that the error cause of the target BMC can be determined from the target operation data; and controlling the kernel of the target BMC to start and controlling the boot loader uboot in the target BMC to complete initialization. The technical solution of the embodiments of the invention reduces the BMC storage resources occupied and lowers the development difficulty of the BMC control method.

Description

BMC control method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technology, and in particular to a BMC control method and apparatus, an electronic device, and a storage medium.
Background
The baseboard management controller (BMC) is a server management controller that runs a Linux-based system and is responsible for core server functions such as hardware status management, operating system management, health status management, and power management.
After a kernel error (kernel panic) occurs in an existing BMC, the kdump tool typically uses the kexec mechanism to boot a capture kernel (i.e., a second kernel) from memory reserved by the working kernel (i.e., the first kernel) of the BMC. The capture kernel then collects the operation data of the system from before the kernel error, and the cause of the BMC error is analyzed from that data.
However, the existing BMC control method requires kdump to be installed in the embedded environment of the BMC in advance, which makes development difficult; in addition, it requires a large reserved memory, which makes the control method less operable and practical.
Disclosure of Invention
The invention provides a BMC control method and device, electronic equipment, and a storage medium, which reduce the storage resources occupied on the target BMC and lower the development difficulty of the BMC control method.
According to one aspect of the present invention, a BMC control method applied to the dump data tool kdump is provided. The method includes:
when a kernel error is detected in the target BMC, storing at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in a preset cache space;
after detecting that the target BMC has started restarting, acquiring target operation data from a target memory corresponding to the target BMC according to each restart reason stored in the cache space and its class information;
wherein the target operation data is the operation data of the target BMC at the time the kernel error occurred;
transferring the target operation data to a nonvolatile memory, so that the error cause of the target BMC can be determined from the target operation data;
and controlling the kernel of the target BMC to start, and controlling the boot loader uboot in the target BMC to complete initialization.
Optionally, acquiring the target operation data from the target memory corresponding to the target BMC according to each restart reason stored in the cache space and its class information includes:
determining, according to each restart reason stored in the cache space and its class information, whether a target restart reason exists in the cache space;
if the target restart reason exists in the cache space, determining whether the class information corresponding to the target restart reason is target class information;
and if it is, acquiring from the target memory the target operation data matching the target restart reason and the target class information.
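The two-stage check above (first the target restart reason, then its class information) can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: the codes `SYS_PANIC` and `SYS_WDT_RESET` borrow the examples given in the detailed description, while `DUMP_REGIONS`, `select_query`, and `acquire_target_data` are hypothetical names. The description does not say how a query maps to regions of the target memory, so a fixed offset/length table is assumed.

```python
from __future__ import annotations

from typing import Dict, Iterable, List, Optional, Tuple

# Names follow the examples in the description (sys_panic, sys_wdt_reset);
# the string values and the region table below are illustrative assumptions.
SYS_PANIC = "SYS_PANIC"
SYS_WDT_RESET = "SYS_WDT_RESET"

Query = Tuple[str, str]  # (restart reason, class information)

# Hypothetical table: (offset, length) regions of the target memory per query.
DUMP_REGIONS: Dict[Query, List[Tuple[int, int]]] = {
    (SYS_PANIC, SYS_WDT_RESET): [(0x1000, 0x100), (0x8000, 0x40)],
}


def select_query(records: Iterable[Query],
                 target_reason: str = SYS_PANIC,
                 target_class: str = SYS_WDT_RESET) -> Optional[Query]:
    """Return the query to dump for, or None when no dump is needed."""
    cached = list(records)
    # Check 1: does the cache hold the target restart reason at all?
    if not any(reason == target_reason for reason, _ in cached):
        return None
    # Check 2: is that reason recorded with the target class information?
    if (target_reason, target_class) in cached:
        return (target_reason, target_class)
    return None


def acquire_target_data(memory: bytes, query: Query) -> bytes:
    """Concatenate every target-memory region registered for the query."""
    regions = DUMP_REGIONS.get(query, [])
    return b"".join(memory[off:off + length] for off, length in regions)
```

If `select_query` returns `None`, no data is acquired and the restart simply proceeds to kernel start and uboot initialization.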
Optionally, before storing at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in the preset cache space when a kernel error is detected in the target BMC, the method further includes:
setting the cache space with a preset capacity in a standby memory corresponding to the target BMC;
and storing at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in the preset cache space includes:
determining the memory address of the standby memory according to the configuration file corresponding to the target BMC;
and locating the standby memory according to the memory address, and storing at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in the cache space within the standby memory.
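A minimal Python sketch of a cache in the standby memory, assuming a plain `key=value` configuration file with a hypothetical `reserved_mem_addr` key and simulating physical memory with a `bytearray` (so the physical address doubles as a buffer offset). The record layout (reason code, class code, timestamp), the `MAGIC` marker, and all names are assumptions; the timestamp field corresponds to the optional embodiment in which the current time is also cached. A real BMC would access the physical reserved region directly rather than a Python buffer.

```python
from __future__ import annotations

import struct
import time
from typing import List, Optional, Tuple

MAGIC = 0x52535452               # hypothetical marker meaning "cache initialised"
HEADER = struct.Struct("<II")    # magic, number of records (8 bytes)
RECORD = struct.Struct("<HHQ")   # reason code, class code, UNIX timestamp (12 bytes)


def parse_reserved_address(config_text: str) -> int:
    """Read the standby-memory address from an assumed 'key=value' config file."""
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "reserved_mem_addr":
            return int(value.strip(), 0)  # accepts decimal or 0x-prefixed hex
    raise KeyError("reserved_mem_addr not found")


class RestartReasonCache:
    """Fixed-capacity restart-reason cache inside a (simulated) memory region."""

    def __init__(self, memory: bytearray, base: int, capacity: int) -> None:
        self.memory = memory
        self.base = base
        self.max_records = (capacity - HEADER.size) // RECORD.size

    def _count(self) -> int:
        magic, count = HEADER.unpack_from(self.memory, self.base)
        return count if magic == MAGIC else 0  # unmarked memory counts as empty

    def store(self, reason: int, cls: int, timestamp: Optional[int] = None) -> None:
        count = self._count()
        if count >= self.max_records:
            raise MemoryError("restart-reason cache is full")
        ts = int(time.time()) if timestamp is None else timestamp
        offset = self.base + HEADER.size + count * RECORD.size
        RECORD.pack_into(self.memory, offset, reason, cls, ts)
        HEADER.pack_into(self.memory, self.base, MAGIC, count + 1)

    def records(self) -> List[Tuple[int, int, int]]:
        start = self.base + HEADER.size
        return [RECORD.unpack_from(self.memory, start + i * RECORD.size)
                for i in range(self._count())]

    def clear(self) -> None:
        """Empty the cache so the next kernel error starts from a clean state."""
        HEADER.pack_into(self.memory, self.base, MAGIC, 0)
```

The magic marker lets the code treat uninitialized standby memory as an empty cache instead of trusting whatever count happens to be there after power-on; because the records live in memory rather than in the Python object, a fresh `RestartReasonCache` over the same region sees them again, which mimics reading the cache after a restart.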
Optionally, after controlling the kernel of the target BMC to start and controlling the boot loader uboot in the target BMC to complete initialization, the method further includes:
copying the target operation data in the nonvolatile memory to a preset virtual machine in which a kernel debugging tool is pre-installed;
and analyzing, by the virtual machine using the kernel debugging tool, the target operation data against the original kernel running file corresponding to the target BMC, to obtain the error cause of the target BMC.
Optionally, transferring the target operation data to the nonvolatile memory, so that the error cause of the target BMC is determined from the target operation data, includes:
generating a target exception log matching the target operation data according to a preset log file format;
and transferring the target exception log to the nonvolatile memory, so that the error cause of the target BMC is determined from the target exception log.
Optionally, before transferring the target exception log to the nonvolatile memory, the method further includes:
counting the historical exception logs in the nonvolatile memory and determining whether their number exceeds a preset threshold;
if so, determining the target historical exception logs to be retained according to the storage time of each historical exception log and the preset threshold;
and deleting from the nonvolatile memory the remaining historical exception logs other than the target historical exception logs.
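The retention rule in the last three steps can be sketched as below. The source only says that the retained logs are chosen from their storage time and the threshold; this sketch assumes the `threshold` most recent logs are kept, approximates storage time by file modification time, and assumes the logs are `*.log` files in a single directory on the nonvolatile memory. `prune_exception_logs` is a hypothetical name.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import List


def prune_exception_logs(log_dir: Path, threshold: int, pattern: str = "*.log") -> List[Path]:
    """Keep the `threshold` most recently stored logs and delete the rest.

    Storage time is approximated by the file modification time (an assumption).
    Returns the deleted paths, oldest first.
    """
    logs = sorted(log_dir.glob(pattern), key=lambda p: p.stat().st_mtime)
    if len(logs) <= threshold:
        return []  # count does not exceed the threshold: nothing to delete
    stale = logs[:len(logs) - threshold]
    for path in stale:
        path.unlink()
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        for i in range(5):
            p = d / f"exc_{i}.log"
            p.write_text("dump")
            os.utime(p, (1000 + i, 1000 + i))  # fake, strictly increasing storage times
        print([p.name for p in prune_exception_logs(d, 3)])  # ['exc_0.log', 'exc_1.log']
```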
According to another aspect of the present invention, a BMC control apparatus applied to the dump data tool kdump is provided. The apparatus includes:
an information storage module, configured to store, when a kernel error is detected in the target BMC, at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in a preset cache space;
a data acquisition module, configured to acquire, after detecting that the target BMC has started restarting, target operation data from a target memory corresponding to the target BMC according to each restart reason stored in the cache space and its class information;
wherein the target operation data is the operation data of the target BMC at the time the kernel error occurred;
a data transfer module, configured to transfer the target operation data to a nonvolatile memory, so that the error cause of the target BMC can be determined from the target operation data;
and a kernel starting module, configured to control the kernel of the target BMC to start and control the boot loader uboot in the target BMC to complete initialization.
According to another aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the BMC control method according to any of the embodiments of the present invention.
According to another aspect of the present invention, a computer-readable storage medium is provided, storing computer instructions that, when executed by a processor, implement the BMC control method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a BMC control method according to any embodiment of the present invention.
With the technical solution of the embodiments of the present invention, when a kernel error is detected in the target BMC, at least one restart reason currently corresponding to the target BMC and the class information of each restart reason are stored in the cache space; after the target BMC is detected to have started restarting, target operation data is acquired from the target memory of the target BMC according to each cached restart reason and its class information; the target operation data is transferred to a nonvolatile memory so that the error cause of the target BMC can be determined from it; and the kernel of the target BMC is started and the uboot program in the target BMC completes initialization. These technical means reduce the BMC storage resources occupied, lower the development difficulty of the BMC control method, improve the stability of the target BMC during operation, automate the determination of the error cause, improve the efficiency of the target BMC, improve the operability and practicality of the BMC control method, ensure the accuracy of the error-cause determination, and preserve the crash scene of the target BMC kernel, which makes the error cause easier to determine.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a BMC control method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another BMC control method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another BMC control method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a BMC control device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device implementing a BMC control method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a BMC control method according to an embodiment of the present invention. The method may be performed by a BMC control apparatus, which may be implemented in hardware and/or software and may be configured in the dump data tool kdump that handles kernel errors of the BMC. As shown in Fig. 1, the method includes:
Step 110: when a kernel error is detected in the target BMC, store at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in a preset cache space.
In this embodiment, kdump may be a tool that dumps in-memory operation data when the target BMC suffers a system crash or deadlock. When kdump detects a kernel panic (i.e., a kernel error) in the target BMC, it can store the restart reasons currently corresponding to the target BMC and the class information of each restart reason in the preset cache space.
In a specific embodiment, optionally, a restart reason may be a system event that causes a kernel error in the target BMC, for example, the reset button of the target BMC being pressed, the target BMC losing power, or the flash chip of the target BMC being switched. The class information may be the specific event type of the restart reason. The preset cache space may be a small, dedicated storage space used to store the restart reasons and class information.
In this step, optionally, when a kernel error is detected in the target BMC, the restart reasons and their class information may be obtained from the event the target BMC is currently executing.
Step 120: after detecting that the target BMC has started restarting, acquire target operation data from the target memory corresponding to the target BMC according to each restart reason stored in the cache space and its class information.
In this embodiment, the target operation data is the operation data of the target BMC at the time the kernel error occurred. After kdump detects that the target BMC has started restarting, it can read each restart reason and its class information from the cache space.
In a specific embodiment, after each restart reason and its class information are obtained, each restart reason together with its class information may be used as a query condition, and the operation data matching each query condition is then acquired from the memory of the target BMC (i.e., the target memory) as the target operation data.
Optionally, the target memory may be synchronous dynamic random-access memory (SDRAM).
Step 130: transfer the target operation data to a nonvolatile memory, so that the error cause of the target BMC can be determined from the target operation data.
In this embodiment, kdump may transfer the target operation data from the target memory to a preset nonvolatile memory, so that developers can determine the error cause of the target BMC from the target operation data stored there.
In a specific embodiment, optionally, the error cause may be the defect in the system code that makes the target BMC raise a kernel error. After determining the error cause, developers can fix the system service logic code of the target BMC.
In this step, optionally, the nonvolatile memory may be an embedded MultiMediaCard (eMMC).
Step 140: control the kernel of the target BMC to start, and control the boot loader (Universal Boot Loader, uboot) in the target BMC to complete initialization.
In this embodiment, the target operation data is transferred to the nonvolatile memory before the target BMC kernel starts and before the uboot program completes initialization. This prevents the target operation data in the target memory from being overwritten by new operation data, preserves the crash scene of the target BMC kernel, and makes it easier for developers to debug the code.
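The ordering argument can be illustrated with a toy Python simulation in which a `bytearray` stands in for SDRAM, a list stands in for the eMMC, and `fake_boot` models initialization reusing (here, zero-filling) memory. All names are hypothetical and this is not firmware code; the point is only that the copy must happen before boot code touches the region.

```python
from __future__ import annotations

from typing import Callable, List, Tuple


def handle_restart(sdram: bytearray, region: Tuple[int, int], nvm: List[bytes],
                   dump_needed: bool, boot: Callable[[bytearray], None]) -> None:
    """Save the crash region to nonvolatile storage, then let boot reuse the memory."""
    offset, length = region
    if dump_needed:
        # Copy first: once boot runs, this region may be overwritten.
        nvm.append(bytes(sdram[offset:offset + length]))
    boot(sdram)


def fake_boot(sdram: bytearray) -> None:
    """Stand-in for uboot initialization and kernel start clobbering SDRAM."""
    sdram[:] = bytes(len(sdram))
```

Calling `fake_boot` before taking the copy would leave only zeros in the saved region, which is exactly the overwrite the step is designed to avoid.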
In a specific embodiment, after detecting that the transfer of the target operation data has completed, the restart reasons and class information in the cache space may be deleted, so that the cache space can store the restart reasons and class information of the next kernel error.
In a specific embodiment, when a kernel error is detected in the target BMC, the current time may also be stored in the cache space, and the target operation data is then acquired from the target memory according to each restart reason stored in the cache space, its class information, and the stored time.
In this embodiment, because the restart reasons and class information occupy little storage space, presetting a cache space for them reduces the storage resources occupied on the target BMC and improves the operability and practicality of the BMC control method. Moreover, the target operation data is acquired through the restart reasons and class information rather than through the capture kernel of the kexec mechanism, so the step of installing kdump into the BMC embedded environment can be omitted, which lowers the development difficulty of the BMC control method.
With the technical solution of this embodiment, when a kernel error is detected in the target BMC, at least one restart reason currently corresponding to the target BMC and the class information of each restart reason are stored in the cache space; after the target BMC is detected to have started restarting, target operation data is acquired from the target memory of the target BMC according to each cached restart reason and its class information; the target operation data is transferred to a nonvolatile memory so that the error cause of the target BMC can be determined from it; and the kernel of the target BMC is started and the uboot program in the target BMC completes initialization. These technical means reduce the storage resources occupied on the target BMC and lower the development difficulty of the BMC control method.
Fig. 2 is a flowchart of another BMC control method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
Step 210: when a kernel error is detected in the target BMC, store at least one restart reason currently corresponding to the target BMC and the class information of each restart reason in a preset cache space.
Step 220: after detecting that the target BMC has started restarting, determine whether a target restart reason exists in the cache space according to each restart reason stored in the cache space and its class information.
In this embodiment, the target restart reason may be a preset important system event that triggers a kernel error in the target BMC. Specifically, before a kernel error is detected in the target BMC, the restart reason with the highest importance level may be selected as the target restart reason from the restart reasons recorded for the target BMC during historical operation, according to the importance level of each. The importance level can be understood as the severity of the error corresponding to the restart reason. For example, the target restart reason may be a fatal error (sys_panic) of the target BMC.
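Selecting the target restart reason from the restart reasons seen during historical operation amounts to taking a maximum by importance level. The sketch below assumes importance levels are integers where a larger value means a more severe error and that reasons without a level rank lowest; the reason names other than `SYS_PANIC`, and the function name `pick_target_reason`, are hypothetical.

```python
from __future__ import annotations

from typing import Dict, Iterable


def pick_target_reason(history: Iterable[str], importance: Dict[str, int]) -> str:
    """Return the historically observed restart reason with the highest importance.

    Reasons missing from `importance` default to level 0; ties are broken by
    name so that the result is deterministic.
    """
    seen = set(history)
    if not seen:
        raise ValueError("no restart reasons recorded")
    return max(seen, key=lambda reason: (importance.get(reason, 0), reason))


history = ["RESET_KEY", "POWER_LOSS", "SYS_PANIC", "RESET_KEY"]
levels = {"RESET_KEY": 1, "POWER_LOSS": 2, "FLASH_SWITCH": 1, "SYS_PANIC": 5}
print(pick_target_reason(history, levels))  # SYS_PANIC
```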
Step 230: if the target restart reason exists in the cache space, determine whether the class information corresponding to the target restart reason is the target class information.
In this embodiment, the target class information may be a preset important event type that triggers a kernel error in the target BMC. For example, the target class information may be a watchdog reset (sys_wdt_reset).
In this step, if there is a target restart reason in the cache space, it is determined whether the class information corresponding to the target restart reason in the cache space is target class information.
Step 240: if the class information corresponding to the target restart reason is the target class information, acquire from the target memory the target operation data matching the target restart reason and the target class information.
In this embodiment, if both the target restart reason and the target class information exist in the cache space, they may be used as the query condition, and the operation data matching that query condition is acquired from the target memory as the target operation data.
In a specific embodiment, if the target restart reason or the target class information does not exist in the cache space, the method may proceed directly to step 260, i.e., control the kernel of the target BMC to start and control the uboot program in the target BMC to complete initialization.
The advantage of this arrangement is that acquiring and transferring operation data for high-severity errors allows kernel errors of the target BMC to be handled promptly, improving the stability of the target BMC in subsequent operation; conversely, low-severity errors have little impact on the operation of the target BMC and can therefore be ignored.
Step 250: transfer the target operation data to the nonvolatile memory.
Step 260: control the kernel of the target BMC to start, and control the uboot program in the target BMC to complete initialization.
Step 270: copy the target operation data in the nonvolatile memory to a preset virtual machine in which a kernel debugging tool is pre-installed.
In this embodiment, the kernel debugging tool may be a tool for analyzing the target operation data, for example Crash.
Step 280: the virtual machine uses the kernel debugging tool to analyze the target operation data against the original kernel running file of the target BMC, and obtains the error cause of the target BMC.
In this embodiment, the original kernel running file may be the vmlinux file. Specifically, the virtual machine may use Crash to analyze the target operation data against the vmlinux file of the target BMC and derive the error cause of the target BMC from the analysis result.
The advantage of this arrangement is that the error-cause determination can be automated, improving the efficiency of the target BMC; moreover, because Crash embeds the GNU Debugger (GDB), analyzing the target operation data with Crash improves the accuracy of the error-cause determination.
With the technical solution of this embodiment, when a kernel error is detected in the target BMC, at least one restart reason currently corresponding to the target BMC and the class information of each restart reason are stored in the preset cache space. After the target BMC is detected to have started restarting, it is determined whether the target restart reason exists in the cache space and, if so, whether its class information is the target class information; if both hold, the target operation data matching the target restart reason and the target class information is acquired from the target memory and transferred to the nonvolatile memory. The kernel of the target BMC is then started and the uboot program in the target BMC completes initialization, after which the target operation data in the nonvolatile memory is copied to the virtual machine and analyzed there with the kernel debugging tool against the original kernel running file of the target BMC. This reduces the storage resources occupied on the target BMC and lowers the development difficulty of the BMC control method.
Fig. 3 is a flowchart of another BMC control method according to an embodiment of the present invention. As shown in Fig. 3, the method includes:
Step 310, setting a cache space according to a preset capacity in a standby memory corresponding to the target BMC.
In this step, a cache space of the preset capacity may be set in the reserved memory (that is, the standby memory) corresponding to the target BMC. Optionally, the capacity may be 1 MB. The specific value may be preset according to actual conditions and is not limited in this embodiment.
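The cache space can be pictured as a small fixed-size byte region. It holds a header followed by an array of (restart reason, class information) records. The sketch below rests on several assumptions. The capacity is 1 MiB. Reason and class codes are 16-bit values. A hypothetical magic value marks valid contents. A `bytearray` stands in for the reserved memory. None of this reflects a layout specified by the patent.

```python
import struct
from typing import List, Sequence, Tuple

CACHE_CAPACITY = 1 << 20              # assumed 1 MiB preset capacity
CACHE_MAGIC = 0x424D4352              # hypothetical marker for valid contents
HEADER = struct.Struct("<II")         # magic, number of records
RECORD = struct.Struct("<HH")         # restart-reason code, class-information code

Record = Tuple[int, int]


def write_restart_reasons(cache: bytearray, records: Sequence[Record]) -> None:
    """Serialize (reason, class) pairs into the cache space, refusing to overflow it."""
    needed = HEADER.size + RECORD.size * len(records)
    if needed > len(cache):
        raise ValueError(f"{len(records)} records need {needed} bytes, "
                         f"cache holds {len(cache)}")
    HEADER.pack_into(cache, 0, CACHE_MAGIC, len(records))
    for i, (reason, cls) in enumerate(records):
        RECORD.pack_into(cache, HEADER.size + i * RECORD.size, reason, cls)


def read_restart_reasons(cache: bytes) -> List[Record]:
    """Return the stored records, or an empty list if the cache holds no valid data."""
    magic, count = HEADER.unpack_from(cache, 0)
    if magic != CACHE_MAGIC or HEADER.size + count * RECORD.size > len(cache):
        return []
    return [RECORD.unpack_from(cache, HEADER.size + i * RECORD.size)
            for i in range(count)]
```

Even at 1 MiB, the region holds over 260,000 such records: 1 MiB minus the 8-byte header, divided by 4 bytes per record. This shows why a small reserved area is sufficient here.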
The advantage of this arrangement is that a small cache space reduces the storage resources occupied on the target BMC. This improves the operability and practicability of the BMC control method.
Step 320, when detecting that the target BMC has a kernel error, determining a memory address corresponding to the standby memory according to a configuration file corresponding to the target BMC.
In a specific embodiment, the memory address of the standby memory is recorded in the configuration file in advance.
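The patent does not specify the file format. The following minimal sketch reads the pre-recorded address from a plain `key = value` configuration file with hypothetical keys `spare_mem_base` and `spare_mem_size`. On embedded Linux, the same information could also come from a device-tree `reserved-memory` node. The flat file is used here only to keep the example short.

```python
from typing import Dict, Tuple


def parse_spare_memory(config_text: str) -> Tuple[int, int]:
    """Return (base address, size) of the standby memory from key = value text."""
    values: Dict[str, str] = {}
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        values[key] = value
    # int(..., 0) accepts both hexadecimal ("0x...") and decimal values;
    # a missing key raises KeyError so a broken configuration fails loudly
    base = int(values["spare_mem_base"], 0)
    size = int(values["spare_mem_size"], 0)
    return base, size
```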
Step 330, locating the standby memory according to the memory address, and storing at least one restart reason currently corresponding to the target BMC and class information corresponding to each restart reason in a cache space in the standby memory.
The advantage of this arrangement is that the memory address can be obtained quickly from the configuration file corresponding to the target BMC. The restart reasons and class information can then be written into the cache space, which improves the control efficiency of the BMC.
Step 340, after detecting that the target BMC has started restarting, acquiring the target operation data from the target memory corresponding to the target BMC according to each restart reason stored in the cache space and the class information corresponding to each restart reason.
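Step 340 applies the two judgments described for the previous embodiment. First, is the target restart reason in the cache? Second, is its class information the target class information? The filter below sketches both checks. Several names are hypothetical: the reason and class codes, and the `regions` table that maps each (reason, class) pair to an offset and length in the target memory. A `bytes` object stands in for the target memory.

```python
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[int, int]


def select_operation_data(cached: Iterable[Record], target_reason: int,
                          target_class: int, target_memory: bytes,
                          regions: Dict[Record, Tuple[int, int]]) -> Optional[bytes]:
    """Return the target operation data, or None if either judgment fails."""
    classes = [cls for reason, cls in cached if reason == target_reason]
    if not classes:
        return None                      # target restart reason not in the cache
    if target_class not in classes:
        return None                      # reason cached, but with other class information
    offset, length = regions[(target_reason, target_class)]
    return target_memory[offset:offset + length]
```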
Step 350, generating a target exception log matching the target operation data according to a preset log file format.
In this embodiment, optionally, the log file format predefines a standard field for each data type. After the target operation data is obtained, it can be converted according to these standard fields to obtain the target exception log.
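The patent does not say which fields or encoding it uses. The minimal sketch below assumes the preset format is one JSON object per log entry. It uses a hypothetical table of standard field names.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical log format: each data type maps to one standard field name.
STANDARD_FIELDS = {
    "reason": "restart_reason",
    "class": "class_info",
    "registers": "cpu_registers",
    "dmesg": "kernel_log",
}


def build_exception_log(operation_data: Dict[str, Any],
                        timestamp: Optional[datetime] = None) -> str:
    """Map each piece of operation data onto its standard field and emit one JSON line."""
    ts = timestamp if timestamp is not None else datetime.now(timezone.utc)
    entry: Dict[str, Any] = {"timestamp": ts.isoformat()}
    for data_type, value in operation_data.items():
        field = STANDARD_FIELDS.get(data_type)
        if field is None:
            # reject data the preset format does not describe
            raise ValueError(f"no standard field for data type {data_type!r}")
        entry[field] = value
    return json.dumps(entry, sort_keys=True)
```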
The advantage of this arrangement is that a target exception log matching the target operation data makes the subsequent analysis easier. This helps ensure the accuracy of the determined error cause.
Step 360, transferring the target exception log to the nonvolatile memory, so that the error cause corresponding to the target BMC is determined according to the target exception log.
In one implementation of this embodiment, before the target exception log is transferred to the nonvolatile memory, the method further includes the following steps. First, counting the number of historical exception logs in the nonvolatile memory and judging whether the number is greater than a preset threshold. If so, determining the target historical exception logs to be retained according to the storage time of each historical exception log and the preset threshold. Finally, deleting the remaining historical exception logs other than the target historical exception logs from the nonvolatile memory.
In this embodiment, a historical exception log is an exception log generated by the target BMC during earlier operation. Insufficient space in the nonvolatile memory could cause the transfer of the target exception log to fail. To avoid this, this embodiment dynamically deletes historical exception logs before the target exception log is transferred.
In a specific embodiment, the number of historical exception logs in the nonvolatile memory may exceed the preset threshold. In that case, the logs whose storage times are closest to the current time may be retained as the target historical exception logs, and the remaining historical exception logs may be deleted. The number of target historical exception logs equals the preset threshold.
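The following sketch implements the retention rule above: keep the `threshold` most recently stored logs and delete the rest. It assumes that each historical exception log is a `*.log` file in a single directory of the nonvolatile storage. It also assumes that file modification time reflects storage time. Both assumptions are for illustration only.

```python
from pathlib import Path
from typing import List


def prune_history_logs(log_dir: str, threshold: int,
                       pattern: str = "*.log") -> List[Path]:
    """Keep the `threshold` newest logs in log_dir and delete the rest.

    Returns the deleted paths (empty if the count does not exceed threshold).
    """
    logs = list(Path(log_dir).glob(pattern))
    if len(logs) <= threshold:
        return []
    # newest first, so the first `threshold` entries are the logs to retain
    logs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    stale = logs[threshold:]
    for path in stale:
        path.unlink()
    return stale
```

Because exactly `threshold` logs survive, the directory briefly holds `threshold + 1` entries once the new target exception log is written. An implementation that must never exceed the threshold could retain `threshold - 1` instead.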
The advantage of this arrangement is that the target exception log no longer fails to transfer for lack of space in the nonvolatile memory. The state of the target BMC at the time of the kernel crash can therefore be saved reliably, which facilitates the subsequent determination of the error cause.
Step 370, controlling the kernel of the target BMC to start, and controlling the boot loader uboot in the target BMC to complete initialization.
In the technical solution provided by this embodiment of the invention, a cache space of a preset capacity is set in the standby memory corresponding to the target BMC. When a kernel error of the target BMC is detected, the method proceeds as follows:

1. The memory address of the standby memory is determined from the configuration file corresponding to the target BMC.
2. The standby memory is located according to that address.
3. At least one restart reason currently corresponding to the target BMC is stored in the cache space, together with the class information of each restart reason.

After the target BMC is detected to have started restarting, the method proceeds as follows:

1. The target operation data is acquired from the target memory corresponding to the target BMC, according to the restart reasons and class information stored in the cache space.
2. A target exception log is generated according to the preset log file format.
3. The log is transferred to the nonvolatile memory, so that the error cause of the target BMC can be determined from it.
4. The kernel of the target BMC is controlled to start, and the boot loader uboot in the target BMC is controlled to complete initialization.

These technical means reduce the storage resources occupied on the target BMC and lower the difficulty of developing the control method for the target BMC.
Fig. 4 is a schematic structural diagram of a BMC control device according to an embodiment of the present invention. The device is applied to the dump data tool kdump. As shown in Fig. 4, the device includes: an information storage module 410, a data acquisition module 420, a data transfer module 430, and a kernel starting module 440.
The information storage module 410 is configured to store, when detecting that a kernel error occurs in the target BMC, at least one restart reason currently corresponding to the target BMC and class information corresponding to each restart reason to a preset cache space;
the data acquisition module 420 is configured to acquire target operation data after detecting that the target BMC has started restarting. The data is acquired from a target memory corresponding to the target BMC, according to each restart reason stored in the cache space and the class information corresponding to each restart reason;
the target operation data is the operation data of the target BMC at the time the kernel error occurred;
the data transfer module 430 is configured to transfer the target operation data to a nonvolatile memory, so as to determine an error cause corresponding to the target BMC according to the target operation data;
the kernel starting module 440 is configured to control the kernel of the target BMC to start, and control the boot loader uboot in the target BMC to complete initialization.
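The division of labor among modules 410 to 440 can be sketched as a small class whose four modules are injected as callables. This is a hypothetical structural illustration in Python, not the patent's implementation. In practice, the modules would live inside kdump and the BMC firmware.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Record = Tuple[int, int]  # (restart-reason code, class-information code)


@dataclass
class BmcControlDevice:
    """Hypothetical wiring of modules 410-440; each module is an injected callable."""
    store_info: Callable[[List[Record]], None]   # information storage module 410
    acquire_data: Callable[[], bytes]            # data acquisition module 420
    transfer_data: Callable[[bytes], None]       # data transfer module 430
    start_kernel: Callable[[], None]             # kernel starting module 440

    def on_kernel_error(self, records: List[Record]) -> None:
        """Kernel error detected: record restart reasons and class information."""
        self.store_info(records)

    def on_restart(self) -> None:
        """Restart detected: collect the dump data, persist it, then boot normally."""
        data = self.acquire_data()
        self.transfer_data(data)
        self.start_kernel()
```

Injecting the modules keeps the ordering testable without BMC hardware. On an error, the device stores the records. On restart, it acquires, transfers, and then starts the kernel.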
In the technical solution provided by this embodiment of the invention, the device operates in the following stages:

1. When a kernel error of the target BMC is detected, at least one restart reason currently corresponding to the target BMC is stored in the cache space, together with the class information of each restart reason.
2. After the target BMC is detected to have started restarting, the target operation data is acquired from the target memory corresponding to the target BMC, according to the restart reasons and class information stored in the cache space.
3. The target operation data is transferred to the nonvolatile memory, so that the error cause of the target BMC can be determined from it.
4. The kernel of the target BMC is controlled to start, and the uboot program in the target BMC is controlled to complete initialization.

These technical means reduce the storage resources occupied on the target BMC and lower the difficulty of developing the BMC control method.
On the basis of the above embodiment, the apparatus further includes:
a cache space setting module, configured to set the cache space of the preset capacity in the standby memory corresponding to the target BMC.
The information storage module 410 includes:
the address determining unit is used for determining a memory address corresponding to the standby memory according to the configuration file corresponding to the target BMC;
and the memory positioning unit is used for locating the standby memory according to the memory address. It is also used for storing at least one restart reason currently corresponding to the target BMC, and the class information corresponding to each restart reason, into the cache space in the standby memory.
The data acquisition module 420 includes:
the reason judging unit is used for judging whether the target restarting reason exists in the cache space according to each restarting reason stored in the cache space and class information corresponding to each restarting reason;
the class information judging unit is used for judging whether class information corresponding to the target restarting reason is target class information or not if the target restarting reason exists in the cache space;
and the operation data acquisition unit is used for acquiring target operation data matched with the target restarting reason and the target class information from a target memory if the class information corresponding to the target restarting reason is the target class information.
The data transfer module 430 includes:
the target log generating unit is used for generating a target exception log matching the target operation data according to a preset log file format;
the log transfer unit is used for transferring the target exception log to the nonvolatile memory, so that the error cause corresponding to the target BMC is determined according to the target exception log;
the quantity counting unit is used for counting the number of historical exception logs in the nonvolatile memory and judging whether the number is greater than a preset threshold. If so, it determines the target historical exception logs to be retained according to the storage time of each historical exception log and the preset threshold;
and the log deleting unit is used for deleting the remaining historical exception logs other than the target historical exception logs from the nonvolatile memory.
The kernel starting module 440 includes:
the data copying unit is used for copying the target operation data in the nonvolatile memory into a preset virtual machine; the virtual machine is provided with a kernel debugging tool in advance;
the data analysis unit is used for analyzing the target operation data through the virtual machine, using the kernel debugging tool. The analysis is performed against the original kernel running file corresponding to the target BMC, to obtain the error cause corresponding to the target BMC.
The device can execute the method provided by any embodiment of the invention. It has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the methods provided by the foregoing embodiments of the invention.
Fig. 5 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only. They are not meant to limit the implementations of the invention described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the BMC control method.
In some embodiments, the BMC control method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the BMC control method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the BMC control method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system. The programmable system includes at least one programmable processor, which may be special-purpose or general-purpose. The processor can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes any of the following:
- a back-end component (e.g., a data server);
- a middleware component (e.g., an application server);
- a front-end component (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here);
- any combination of such back-end, middleware, or front-end components.

The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host. It is a host product in a cloud computing service system. Cloud servers overcome the difficult management and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A BMC control method, applied to a dump data tool kdump, the method comprising:
when detecting that the target BMC has a kernel error, storing at least one restarting reason currently corresponding to the target BMC and class information corresponding to each restarting reason into a preset cache space;
after the target BMC is detected to start restarting, acquiring target operation data from a target memory corresponding to the target BMC according to each restarting reason stored in the cache space and class information corresponding to each restarting reason;
the target operation data is the operation data of the target BMC at the time the kernel error occurred;
the target operation data are transferred to a nonvolatile memory, so that an error reason corresponding to the target BMC is determined according to the target operation data;
and controlling the kernel of the target BMC to start, and controlling the boot loader uboot in the target BMC to finish initialization.
2. The method of claim 1, wherein obtaining the target operation data from the target memory corresponding to the target BMC according to each restart reason stored in the cache space and the class information corresponding to each restart reason, comprises:
judging whether a target restart reason exists in the cache space according to each restart reason stored in the cache space and class information corresponding to each restart reason;
if the target restarting reason exists in the cache space, judging whether class information corresponding to the target restarting reason is target class information or not;
and if the class information corresponding to the target restarting reason is the target class information, acquiring target operation data matched with the target restarting reason and the target class information from a target memory.
3. The method of claim 1, wherein when detecting that the target BMC has a kernel error, storing at least one restart reason currently corresponding to the target BMC and class information corresponding to each restart reason in a preset cache space, further comprises:
setting a cache space according to a preset capacity in a standby memory corresponding to a target BMC;
storing at least one restart reason currently corresponding to the target BMC and class information corresponding to each restart reason into a preset cache space comprises:
determining a memory address corresponding to the standby memory according to the configuration file corresponding to the target BMC;
and positioning the standby memory according to the memory address, and storing at least one restarting reason corresponding to the target BMC currently and class information corresponding to each restarting reason into a cache space in the standby memory.
4. The method of claim 1, further comprising, after controlling the kernel of the target BMC to start and controlling the boot loader uboot in the target BMC to complete initialization:
copying the target operation data in the nonvolatile memory into a preset virtual machine; the virtual machine is provided with a kernel debugging tool in advance;
and analyzing the target operation data through the virtual machine, using the kernel debugging tool, against an original kernel running file corresponding to the target BMC, to obtain an error cause corresponding to the target BMC.
5. The method of claim 1, wherein the step of transferring the target operation data to a nonvolatile memory to determine an error cause corresponding to the target BMC according to the target operation data comprises:
generating a target abnormal log matched with the target operation data according to a preset log file format;
and the target exception log is transferred to a nonvolatile memory, so that an error reason corresponding to the target BMC is determined according to the target exception log.
6. The method of claim 5, further comprising, prior to the transferring the target exception log into non-volatile memory:
counting the number of historical abnormal logs in the nonvolatile memory, and judging whether the number is larger than a preset threshold value or not;
if yes, determining a target historical abnormal log to be reserved according to the storage time corresponding to each historical abnormal log and the preset threshold;
and deleting the residual historical exception logs except the target historical exception log in the nonvolatile memory.
7. A BMC control device for use in a dump data tool kdump, the device comprising:
the information storage module is used for storing, when it is detected that a kernel error occurs in the target BMC, at least one restart reason currently corresponding to the target BMC and class information corresponding to each restart reason into a preset cache space;
the data acquisition module is used for acquiring target operation data from a target memory corresponding to the target BMC according to each restarting reason stored in the cache space and class information corresponding to each restarting reason after the target BMC starts restarting;
the target operation data are operation data corresponding to the target BMC when kernel errors occur;
the data transfer module is used for transferring the target operation data to a nonvolatile memory so as to determine an error reason corresponding to the target BMC according to the target operation data;
the kernel starting module is used for controlling the kernel of the target BMC to start and controlling the boot loader uboot in the target BMC to finish initialization.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the BMC control method of any of claims 1-6.
9. A computer readable storage medium storing computer instructions for causing a processor to implement the BMC control method of any of claims 1-6 when executed.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the BMC control method according to any of claims 1-6.
CN202211536120.4A 2022-12-01 2022-12-01 BMC control method and device, electronic equipment and storage medium Pending CN116737430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211536120.4A CN116737430A (en) 2022-12-01 2022-12-01 BMC control method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116737430A true CN116737430A (en) 2023-09-12

Family

ID=87901793

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination