CN111488233A - Method and system for processing bandwidth loss problem of PCIe device - Google Patents

Method and system for processing bandwidth loss problem of PCIe device

Info

Publication number
CN111488233A
Authority
CN
China
Prior art keywords
pcie
bandwidth
equipment
restarting
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010254405.3A
Other languages
Chinese (zh)
Inventor
孙一心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010254405.3A
Publication of CN111488233A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy, by exceeding limits
    • G06F11/076 Error or fault detection not based on redundancy, by exceeding limits, by exceeding a count or rate limit, e.g. word- or bit count limit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The system comprises a negotiation result acquisition module, a judgment module, a POST module, a restart module, a restart count storage module, a stop module and a fault locating module. Another system comprises a BIOS, a PCH, an EEPROM, a BMC and a CPLD, wherein the EEPROM is arranged in the server, the PCH is in communication connection with the BMC, and the PCH is connected with the CPLD through a GPIO signal. The fault-handling efficiency and the stability of the server can be effectively improved through the system.

Description

Method and system for processing bandwidth loss problem of PCIe device
Technical Field
The present application relates to the technical field of server information transmission, and in particular, to a method and system for handling a bandwidth drop problem of a PCIe (peripheral component interconnect express, a high-speed serial computer expansion bus standard) device.
Background
The PCIe protocol is an important peripheral protocol for servers and is widely used on X86, ARM, PowerPC and other platforms to meet the different functional requirements of a server. However, the high signaling rate of PCIe devices makes them prone to a common type of failure, namely the bandwidth-drop fault. A bandwidth-drop fault generally covers two cases: a lane drop, in which the link width falls from X16 to X8, from X8 to X4, and so on; and a rate drop, in which the PCIe rate falls from Gen3 to Gen2, from Gen3 to Gen1, and so on. How to handle the bandwidth-drop problem of PCIe devices, and thereby ensure the operating stability of both the PCIe device and the server, is therefore an important problem.
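To make the cost of the two cases concrete, the following Python sketch estimates the one-directional throughput of a link from its generation and lane count. The per-lane transfer rates and line encodings used here (2.5 GT/s and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b) are taken from the PCIe specifications rather than from this application, the function name is illustrative, and protocol overhead above the line encoding is ignored.

```python
# Raw transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
# These figures come from the PCIe specifications, not from the application.
PCIE_GEN = {
    1: (2.5, 8 / 10),      # Gen1: 8b/10b encoding
    2: (5.0, 8 / 10),      # Gen2: 8b/10b encoding
    3: (8.0, 128 / 130),   # Gen3: 128b/130b encoding
}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-directional link throughput in GB/s."""
    rate_gt, efficiency = PCIE_GEN[gen]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


# Compare a healthy Gen3 X16 link with a lane drop and two rate drops.
for gen, lanes in [(3, 16), (3, 8), (2, 16), (1, 16)]:
    print(f"Gen{gen} X{lanes}: {link_bandwidth_gbytes(gen, lanes):.2f} GB/s")
```

Under these assumptions a lane drop from X16 to X8 halves the available throughput, and a rate drop from Gen3 to Gen2 on the same width costs nearly as much.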
At present, the bandwidth-drop problem of PCIe devices is generally handled as follows: after a bandwidth drop occurs, possible causes are predicted from experience in combination with the bandwidth-drop information, the hardware is then verified item by item, and finally the cause of the drop is determined and the fault is handled.
However, in the current method the fault-handling procedure is started, and the hardware is examined, whenever a bandwidth drop occurs, so hardware troubleshooting is frequent. Moreover, for any bandwidth drop the hardware has to be checked item by item, and the actual cause cannot be determined in a short time, so fault-handling efficiency is low.
Disclosure of Invention
The application provides a method and a system for handling the bandwidth-drop problem of PCIe devices, aiming to solve the low efficiency of handling PCIe device bandwidth-drop faults in the prior art.
In order to solve the technical problem, the embodiment of the application discloses the following technical scheme:
A method of handling a PCIe device bandwidth-drop problem, the method comprising:
S1: obtaining a PCIe port rate negotiation result;
S2: judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe devices, and the mapping relation between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate;
S3: if the bandwidth of the current PCIe device is normal, executing the normal boot process;
S4: if the bandwidth of the current PCIe device is abnormal, restarting the power-on sequence of the server and recording the number of restarts;
S5: returning to steps S2-S4 and counting the number of restarts;
S6: when the number of restarts is greater than or equal to the set number of restarts, judging that the bandwidth drop of the PCIe device is a hard fault and stopping restarting;
S7: executing the normal boot process and recording the fault location of the PCIe device.
Optionally, the method further comprises:
when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is normal, clearing the number of restarts;
and when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is abnormal, returning to step S4.
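The decision made on each power-on pass through steps S2-S7, including the optional clearing of the counter, can be sketched as a small pure function. This is an illustrative sketch only: the names `Action`, `decide` and `SET_RESTART_COUNT` are hypothetical, and leaving the counter unchanged once a hard fault is declared is an assumption, since the application does not say what happens to the counter in that case.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    NORMAL_BOOT = "normal boot"                     # S3
    RESTART = "restart power-on sequence"           # S4
    BOOT_AND_LOG_FAULT = "normal boot, log fault"   # S6 + S7


SET_RESTART_COUNT = 3  # the optional value named in the application


def decide(bandwidth_ok: bool, restart_count: int,
           limit: int = SET_RESTART_COUNT) -> Tuple[Action, int]:
    """Return (action, new restart count) for one pass through S2-S7."""
    if bandwidth_ok:
        # Bandwidth is normal: boot normally and clear the counter.
        return Action.NORMAL_BOOT, 0
    if restart_count >= limit:
        # S6: limit reached, treat as a hard fault and stop restarting.
        # Assumption: the counter is left untouched here.
        return Action.BOOT_AND_LOG_FAULT, restart_count
    # S4: bandwidth abnormal and limit not reached, restart and count it.
    return Action.RESTART, restart_count + 1


# A persistent fault: the bandwidth is abnormal on every power-on.
count = 0
for power_on in range(4):
    action, count = decide(False, count)
    print(power_on, action.value, count)
```

With a limit of 3, a persistent fault triggers three restarts and is declared a hard fault on the fourth power-on, while a fault that clears after any restart ends in a normal boot with the counter back at zero.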
Optionally, restarting the power-on sequence of the server and recording the number of restarts comprises:
using the BIOS (Basic Input Output System) to pull down a GPIO (General-Purpose Input/Output) signal of the PCH (Platform Controller Hub, Intel's integrated south bridge) so as to generate a low-level GPIO signal;
sending the low-level GPIO signal to the CPLD (Complex Programmable Logic Device), and recording the number of restarts in the EEPROM (Electrically Erasable Programmable Read-Only Memory) of the server;
and the CPLD terminating the current power-on process according to the low-level GPIO signal and powering on the server again.
Optionally, executing the normal boot process and recording the fault location of the PCIe device comprises:
the BIOS executing the normal boot process until the operating system is entered; and
recording the BDF (Bus/Device/Function, the identifier of each function on the PCIe bus) of the slot of the failed PCIe device in the BMC (Baseboard Management Controller).
Optionally, the set restart number is 3.
A system for handling a PCIe device bandwidth-drop problem, the system comprising:
a negotiation result obtaining module, configured to obtain a PCIe port rate negotiation result;
a judging module, configured to judge whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, where the PCIe configuration information includes: PCIe ports, PCIe devices, and the mapping relation between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate;
a POST module, configured to execute the normal boot process when the bandwidth of the current PCIe device is normal;
a restart module, configured to restart the power-on sequence of the server when the bandwidth of the current PCIe device is abnormal;
a restart count storage module, configured to record and count the number of restarts;
a stopping module, configured to judge that the bandwidth drop of the PCIe device is a hard fault and stop restarting when the number of restarts is greater than or equal to the set number of restarts;
the POST module being further configured to execute the normal boot process when the bandwidth drop of the PCIe device is judged to be a hard fault and restarting is stopped;
and a fault locating module, configured to record the fault location of the PCIe device.
Optionally, the system further includes a reset module, configured to clear the restart times when the restart times are smaller than the set restart times and a bandwidth of the current PCIe device is normal.
Optionally, the set restart number is 3.
A system for handling the bandwidth-drop problem of PCIe devices comprises a BIOS, a PCH, an EEPROM, a BMC and a CPLD, wherein the EEPROM is arranged in the server, the PCH is in communication connection with the BMC, and the PCH is connected with the CPLD through a GPIO signal;
the BIOS is used for obtaining a PCIe port rate negotiation result and judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe device bandwidths, PCIe device rates, and the mapping relations between PCIe ports and PCIe device bandwidths and between PCIe ports and PCIe device rates, each PCIe port being matched with one PCIe device bandwidth and one PCIe device rate;
the BIOS is also used for executing the normal boot process when the bandwidth of the current PCIe device is normal, and for triggering the CPLD through the PCH when the bandwidth of the current PCIe device is abnormal;
the CPLD is used for restarting the power-on sequence of the server when the bandwidth of the current PCIe device is abnormal;
the EEPROM is used for recording and counting the number of restarts;
the BIOS is also used for judging that the bandwidth drop of the PCIe device is a hard fault and stopping restarting when the number of restarts is greater than or equal to the set number of restarts;
the BMC is used for recording the fault location of the PCIe device.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
The method first obtains a PCIe port rate negotiation result and then judges, according to the negotiation result and the PCIe configuration information, whether the bandwidth of the current PCIe device is normal. If the bandwidth is abnormal, the power-on sequence is restarted and the number of restarts is recorded; the judgment is repeated and the restarts are counted. When the number of restarts is greater than or equal to the set number of restarts, the bandwidth drop of the PCIe device is judged to be a hard fault, restarting is stopped, the normal boot process continues, and the fault location of the PCIe device is recorded. In this embodiment, restarting the power-on sequence resolves bandwidth drops that are not caused by a hard fault, since a stable link can be re-established by the restart; this avoids high-frequency hardware troubleshooting and improves fault-handling efficiency. In addition, this embodiment uses a set number of restarts, that is, a target value for the restart count: when the number of restarts is not less than the set number, the normal boot process continues. The set number of restarts effectively distinguishes bandwidth drops caused by hard faults from those caused by non-hard faults, so that the quick restart-based remedy is fully used while a large number of repeated restarts is avoided, which improves fault-handling efficiency.
The present application further provides a system for handling the bandwidth-drop problem of a PCIe device. The system mainly includes: a negotiation result acquisition module, a judgment module, a POST module, a restart module, a restart count storage module, a stop module and a fault locating module. With the restart module, bandwidth drops of the PCIe device caused by non-hard faults can be resolved by restarting, which improves the efficiency of handling the bandwidth-drop problem. The judgment module and the restart count storage module make full use of the set number of restarts: limiting the number of restarts avoids repeated ineffective restarts and allows bandwidth drops caused by hard faults and by non-hard faults to be distinguished relatively accurately, so that different handling is applied to different causes and fault-handling efficiency is improved.
The application also provides another system for handling the bandwidth-drop problem of a PCIe device, which comprises a BIOS, a PCH, an EEPROM, a BMC and a CPLD, wherein the EEPROM is arranged in the server, the PCH is in communication connection with the BMC, and the PCH is connected with the CPLD through a GPIO signal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for handling a bandwidth drop problem of a PCIe device according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a system for handling a bandwidth drop problem of a PCIe device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of another system for handling a bandwidth drop problem of a PCIe device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For a better understanding of the present application, embodiments of the present application are explained in detail below with reference to the accompanying drawings.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for handling a bandwidth drop problem of a PCIe device according to an embodiment of the present application. As shown in fig. 1, the method for processing the bandwidth drop problem of the PCIe device in this embodiment mainly includes the following steps:
S1: obtaining a PCIe port rate negotiation result.
In this embodiment, the PCIe port rate negotiation is PCIe port training. Each time the CPU is powered on, the BIOS controls the PCIe port of the CPU to perform rate negotiation with the PCIe device as specified by the PCIe protocol. After the PCIe port rate negotiation process is finished, the PCIe port rate negotiation result is obtained.
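The application does not state where the BIOS reads the negotiation result from. One common source, assumed here, is the Link Status register of the PCI Express Capability structure, in which bits 3:0 hold the current link speed and bits 9:4 hold the negotiated link width. The sketch below decodes a raw 16-bit register value that is assumed to have been read already; the function name is hypothetical, and the speed code is treated as the generation number, which holds for the usual encodings of Gen1 to Gen3.

```python
from typing import Tuple


def decode_link_status(link_status: int) -> Tuple[int, int]:
    """Split a 16-bit Link Status value into (generation, lane count)."""
    speed_code = link_status & 0xF        # bits 3:0, current link speed
    width = (link_status >> 4) & 0x3F     # bits 9:4, negotiated link width
    return speed_code, width


# Example register value: speed code 3, width field 16.
gen, lanes = decode_link_status(0x0103)
print(f"Gen{gen} X{lanes}")
```

The resulting pair is what step S2 would compare against the stored PCIe configuration information.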
The PCIe devices in this embodiment mainly include RAID cards, SAS cards, network cards, GPU cards, FPGA cards, and the like.
S2: judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and the PCIe configuration information stored in the server.
When the PCIe port rate negotiation result is consistent with the PCIe configuration information stored in the server, the bandwidth of the current PCIe device is judged to be normal; otherwise it is judged to be abnormal. In this embodiment, normal bandwidth means that the negotiation result shows both that the rate of the current PCIe device is consistent with the rate in the PCIe configuration information and that the bandwidth of the current PCIe device is consistent with the bandwidth in the PCIe configuration information. Abnormal bandwidth means that the negotiation result shows that the rate of the current PCIe device is inconsistent with the rate in the PCIe configuration information, and/or that the bandwidth of the current PCIe device is inconsistent with the bandwidth in the PCIe configuration information.
An EEPROM is provided in the server platform to store FRU (Field Replaceable Unit) information, which includes the manufacturer, model, SN (serial number), PCIe configuration name, and so on. The PCIe configuration of the server is uniquely determined by the configuration name; that is, from the configuration name it can be determined what bandwidth and rate the PCIe device connected to a given PCIe port has. The PCIe configuration information in this embodiment includes: PCIe ports, PCIe devices, and the mapping relation between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate. Here, the mapping relation specifies which PCIe device, with which bandwidth and rate, is connected to each PCIe port. The PCIe bandwidth is also referred to as the PCIe lane width, for example PCIe X16, PCIe X8, PCIe X4, and so on.
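The comparison in step S2 can be sketched as a lookup keyed by the configuration name followed by a per-port check. The table contents, the configuration name `config_A`, the port labels and the function name are all hypothetical; the application only says that the configuration name read from the FRU information identifies the expected bandwidth and rate per port.

```python
from typing import Dict, List, NamedTuple, Tuple


class Link(NamedTuple):
    gen: int     # PCIe rate, as a generation number
    lanes: int   # PCIe bandwidth, as a lane width


# Hypothetical PCIe configuration information: configuration name -> port -> expected link.
PCIE_CONFIGS: Dict[str, Dict[str, Link]] = {
    "config_A": {"port0": Link(3, 16), "port1": Link(3, 8)},
}


def find_abnormal_ports(config_name: str,
                        negotiated: Dict[str, Link]) -> List[Tuple[str, str]]:
    """Return (port, problem) for every port that deviates from the stored configuration."""
    faults = []
    for port, expected in PCIE_CONFIGS[config_name].items():
        actual = negotiated[port]
        if actual.lanes != expected.lanes:
            faults.append((port, "lane width mismatch"))
        if actual.gen != expected.gen:
            faults.append((port, "rate mismatch"))
    return faults


# port0 trained at X8 instead of the expected X16; port1 is as configured.
result = find_abnormal_ports("config_A", {"port0": Link(3, 8), "port1": Link(3, 8)})
print(result)
```

An empty list corresponds to "bandwidth normal" and leads to step S3; any entry corresponds to "bandwidth abnormal" and leads to step S4.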
With continued reference to Fig. 1, if the bandwidth of the current PCIe device is normal, step S3 is executed: executing the normal boot process. The normal boot process in this embodiment generally refers to the POST (Power-On Self-Test) process.
If the bandwidth of the current PCIe device is abnormal, step S4 is executed: restarting the power-on sequence of the server and recording the number of restarts.
Specifically, step S4 includes:
S41: pulling down the GPIO signal of the PCH by using the BIOS to generate a low-level GPIO signal.
S42: sending the low-level GPIO signal to the CPLD, and at the same time recording the number of restarts in the EEPROM of the server.
S43: the CPLD terminates the current power-on process according to the low-level GPIO signal and powers on the server again.
As steps S41-S43 show, when the bandwidth of the PCIe device is abnormal, that is, when a bandwidth drop occurs, the BIOS pulls the GPIO signal of the PCH low and the signal is sent to the CPLD, while the number of restarts is recorded in the EEPROM. When the CPLD detects the low-level GPIO signal, it terminates the current power-on process and powers the server on again.
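Steps S41-S43 can be modeled in software to show the division of work between the BIOS, the EEPROM and the CPLD. The classes below are stand-ins, not real hardware interfaces, and all names are hypothetical. The application describes the signal and the counter update as simultaneous; writing the counter just before signalling the CPLD is an assumption made here so that the count is stored before power is cycled.

```python
class Eeprom:
    """Stand-in for the non-volatile store that keeps the restart count."""

    def __init__(self) -> None:
        self.restart_count = 0


class Cpld:
    """Stand-in for the CPLD that controls the power-on sequence."""

    def __init__(self) -> None:
        self.power_cycles = 0

    def on_gpio(self, level: int) -> None:
        # S43: a low level aborts the current power-on and powers on again.
        if level == 0:
            self.power_cycles += 1


def request_restart(eeprom: Eeprom, cpld: Cpld) -> None:
    """Model of what the BIOS does when the bandwidth is abnormal."""
    gpio_level = 0               # S41: BIOS pulls the PCH GPIO low
    eeprom.restart_count += 1    # S42: record the restart (assumed to happen first)
    cpld.on_gpio(gpio_level)     # S42/S43: the CPLD sees the low level


eeprom, cpld = Eeprom(), Cpld()
request_restart(eeprom, cpld)
print(eeprom.restart_count, cpld.power_cycles)
```

Keeping the counter in the EEPROM rather than in BIOS memory is what lets it survive the power cycle that the CPLD performs.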
S5: returning to steps S2-S4, and counting the number of restarts.
Steps S2-S4 form a loop. If the bandwidth of the current PCIe device is normal, the loop is exited and the remainder of the power-on procedure continues. If the bandwidth of the current PCIe device is abnormal, the server is restarted, the number of restarts is recorded, and steps S2-S4 are executed again, with the restarts counted according to step S5.
When the number of restarts is greater than or equal to the set number of restarts, step S6 is executed: judging that the bandwidth drop of the PCIe device is a hard fault and stopping restarting.
The hard fault in this embodiment refers to a series of failures caused by aging, failure or damage of hardware devices. These mainly include mechanical faults, hardware faults, software faults, and the like.
S7: executing the normal boot process and recording the fault location of the PCIe device.
Specifically, step S7 includes:
S71: the BIOS executes the normal boot process until the operating system is entered; at the same time,
S72: the BDF of the slot of the failed PCIe device is recorded in the BMC.
Operation and maintenance personnel can subsequently troubleshoot according to the BDF of the faulty slot.
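A BDF consists of an 8-bit bus number, a 5-bit device number and a 3-bit function number. The sketch below shows the conventional `bb:dd.f` text form and the packed 16-bit form; the function names are hypothetical, and the application does not specify in which form the BMC stores the value.

```python
def format_bdf(bus: int, device: int, function: int) -> str:
    """Render a BDF in the conventional hexadecimal bb:dd.f notation."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("BDF field out of range")
    return f"{bus:02x}:{device:02x}.{function:x}"


def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack a BDF into 16 bits: bus[15:8], device[7:3], function[2:0]."""
    return (bus << 8) | (device << 3) | function


# Example slot: bus 0x3b, device 0, function 0.
print(format_bdf(0x3B, 0, 0), hex(pack_bdf(0x3B, 0, 0)))
```

Either form identifies the faulty slot uniquely within one PCIe segment, which is what maintenance personnel need when they read the record from the BMC.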
As steps S6 and S7 show, when the number of restarts reaches the set number of restarts, it can be determined that the current bandwidth drop of the PCIe device cannot be solved by restarting, and it is judged to be a hard fault. Restarting then stops, the normal boot process continues, and the fault location of the PCIe device is recorded. Step S6 prevents steps S2-S4 from being executed indefinitely, so that the next stage of fault handling can proceed according to the determined cause, which helps improve fault-handling efficiency. Setting the number of restarts makes it possible to classify the cause of the bandwidth drop as a hard fault or a non-hard fault, which avoids repeated ineffective restarts and improves the efficiency of fault diagnosis.
Further, the set number of restarts in this embodiment is 3: when the number of restarts is greater than or equal to 3, the bandwidth drop of the PCIe device is judged to be a hard fault and restarting is stopped. In other words, if the PCIe device bandwidth is still abnormal after 3 restarts, a hard fault is determined and restarting stops.
Accordingly, when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is normal, step S8 is executed: clearing the number of restarts. This saves storage space, avoids counting errors when a later bandwidth-drop problem of the PCIe device is handled, and improves the accuracy of fault handling.
When the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is abnormal, the method returns to step S4, restarts the power-on sequence of the server, and records the number of restarts.
Example two
Referring to Fig. 2 on the basis of the embodiment shown in Fig. 1, Fig. 2 is a schematic structural diagram of a system for handling a bandwidth drop problem of a PCIe device according to an embodiment of the present application. As can be seen from Fig. 2, the system in this embodiment mainly includes: a negotiation result acquisition module, a judgment module, a POST module, a restart module, a restart count storage module, a stop module and a fault locating module.
The negotiation result acquisition module is used for obtaining a PCIe port rate negotiation result. The judgment module is used for judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe devices, and the mapping relation between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate. The POST module is used for executing the normal boot process when the bandwidth of the current PCIe device is normal. The restart module is used for restarting the power-on sequence of the server when the bandwidth of the current PCIe device is abnormal. The restart count storage module is used for recording and counting the number of restarts. The stop module is used for judging that the bandwidth drop of the PCIe device is a hard fault and stopping restarting when the number of restarts is greater than or equal to the set number of restarts. The POST module is also used for executing the normal boot process when the bandwidth drop of the PCIe device is judged to be a hard fault and restarting is stopped. The fault locating module is used for recording the fault location of the PCIe device. The set number of restarts in this embodiment is 3.
Further, the system includes a reset module, configured to clear the number of restarts when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is normal. The reset module helps save storage space on the server. It also prevents the restart count from being confused when the system handles a later bandwidth-drop problem of the PCIe device, which helps improve the accuracy of fault handling.
The working principle and working method of the system for handling the bandwidth-dropping problem of the PCIe device in this embodiment have been described in detail in the embodiment shown in fig. 1, and are not described herein again.
Example three
Referring to Fig. 3 on the basis of the embodiments shown in Fig. 1 and Fig. 2, Fig. 3 is a schematic structural diagram of another system for handling the bandwidth-drop problem of PCIe devices according to an embodiment of the present application. As can be seen from Fig. 3, the system mainly includes a BIOS, a PCH, an EEPROM, a BMC and a CPLD, wherein the EEPROM is disposed in the server, the PCH is communicatively connected to the BMC, and the PCH is connected to the CPLD through a GPIO signal.
The BIOS is used for obtaining a PCIe port rate negotiation result and judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises PCIe ports, PCIe device bandwidths, PCIe device rates, and the mapping relations between PCIe ports and PCIe device bandwidths and between PCIe ports and PCIe device rates, each PCIe port being matched with one PCIe device bandwidth and one PCIe device rate.
Further, the BIOS in this embodiment mainly includes: a negotiation result acquisition module, a judgment module, a POST module and a stop module. The negotiation result acquisition module is used for obtaining a PCIe port rate negotiation result. The judgment module is used for judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe devices, and the mapping relation between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate. The POST module is used for executing the normal boot process when the bandwidth of the current PCIe device is normal. The stop module is used for judging that the bandwidth drop of the PCIe device is a hard fault and stopping restarting when the number of restarts is greater than or equal to the set number of restarts.
The parts not described in detail in this embodiment can be referred to the embodiments shown in fig. 1-2, and the three embodiments can be referred to each other, and are not described again here.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method for handling a PCIe device bandwidth loss problem, the method comprising:
S1: obtaining a PCIe port rate negotiation result;
S2: judging whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe devices, and the mapping relationships between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate;
S3: if the bandwidth of the current PCIe device is normal, executing the normal boot process;
S4: if the bandwidth of the current PCIe device is abnormal, restarting the server power-on sequence and recording the number of restarts;
S5: returning to steps S2-S4 and counting the number of restarts;
S6: when the number of restarts is greater than or equal to the set number of restarts, determining that the bandwidth loss of the PCIe device is a hard fault and stopping restarting;
S7: executing the normal boot process and recording the fault location of the PCIe device.
2. The method for handling a PCIe device bandwidth loss problem according to claim 1, the method further comprising:
when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is normal, clearing the number of restarts;
when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is abnormal, returning to step S4.
3. The method of claim 1, wherein restarting the server power-on sequence and recording the number of restarts comprises:
pulling down a GPIO signal of the PCH by means of the BIOS to generate a low-level GPIO signal;
sending the low-level GPIO signal to a CPLD while recording the number of restarts in an EEPROM of the server;
terminating, by the CPLD, the current power-on process according to the low-level GPIO signal and powering on the server again.
4. The method of claim 1, wherein executing the normal boot process and recording the fault location of the PCIe device comprises:
executing, by the BIOS, the normal boot process until the operating system is entered; and
recording the BDF (bus/device/function) slot location of the faulty PCIe device in the BMC.
5. The method for handling a PCIe device bandwidth loss problem according to any one of claims 1-4, wherein the set number of restarts is 3.
6. A system for handling a PCIe device bandwidth loss problem, the system comprising:
a negotiation result obtaining module, configured to obtain a PCIe port rate negotiation result;
a judging module, configured to judge whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe devices, and the mapping relationships between the PCIe ports and the PCIe devices, each PCIe device having a specific PCIe bandwidth and PCIe rate;
a POST module, configured to execute the normal boot process when the bandwidth of the current PCIe device is normal;
a restarting module, configured to restart the server power-on sequence when the bandwidth of the current PCIe device is abnormal;
a restart count storage module, configured to record and count the number of restarts;
a stopping module, configured to determine that the bandwidth loss of the PCIe device is a hard fault and to stop restarting when the number of restarts is greater than or equal to the set number of restarts;
the POST module being further configured to execute the normal boot process when the bandwidth loss of the PCIe device is determined to be a hard fault and restarting is stopped; and
a fault locating module, configured to record the fault location of the PCIe device.
7. The system for handling a PCIe device bandwidth loss problem according to claim 6, further comprising a reset module, configured to clear the number of restarts when the number of restarts is less than the set number of restarts and the bandwidth of the current PCIe device is normal.
8. The system for handling a PCIe device bandwidth loss problem according to claim 6 or 7, wherein the set number of restarts is 3.
9. A system for handling a PCIe device bandwidth loss problem, characterized by comprising a BIOS, a PCH, an EEPROM, a BMC and a CPLD, wherein the EEPROM is arranged in a server, the PCH is communicatively connected to the BMC, and the PCH is connected to the CPLD through a GPIO signal;
the BIOS is configured to obtain a PCIe port rate negotiation result and to judge whether the bandwidth of the current PCIe device is normal according to the PCIe port rate negotiation result and PCIe configuration information stored in the server, wherein the PCIe configuration information comprises: PCIe ports, PCIe device bandwidths, PCIe device rates, and the mapping relationships between the PCIe ports and the PCIe device bandwidths and between the PCIe ports and the PCIe device rates, each PCIe port being matched with one PCIe device bandwidth and one PCIe device rate;
the BIOS is further configured to execute the normal boot process when the bandwidth of the current PCIe device is normal, and to activate the CPLD through the PCH when the bandwidth of the current PCIe device is abnormal;
the CPLD is configured to restart the server power-on sequence when the bandwidth of the current PCIe device is abnormal;
the EEPROM is configured to record and count the number of restarts;
the BIOS is further configured to determine that the bandwidth loss of the PCIe device is a hard fault and to stop restarting when the number of restarts is greater than or equal to the set number of restarts;
the BMC is configured to record the fault location of the PCIe device.
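The restart logic recited in claims 1, 2 and 5 can be sketched as a small simulation. This is an illustrative model only, not firmware from this application: the `Platform` class stands in for the EEPROM restart counter and the BMC fault log, the function names are hypothetical, and the BDF string is an example value. The claims do not state whether the counter is cleared after a hard fault is declared, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass, field

MAX_RESTARTS = 3  # the set number of restarts given in claims 5 and 8


@dataclass
class Platform:
    """Hypothetical stand-in for the EEPROM counter and the BMC fault log."""
    restart_count: int = 0                          # persisted in the EEPROM
    fault_log: list = field(default_factory=list)   # BDFs recorded in the BMC


def boot_once(platform, degraded_bdfs):
    """Run one power-on pass and return "boot" or "restart".

    degraded_bdfs lists the BDFs of devices whose bandwidth is abnormal
    on this pass; it is empty when every device is normal.
    """
    if not degraded_bdfs:
        platform.restart_count = 0    # claim 2: clear the counter
        return "boot"                 # S3: normal boot
    if platform.restart_count >= MAX_RESTARTS:
        # S6/S7: hard fault; stop restarting, record the location, boot anyway
        platform.fault_log.extend(degraded_bdfs)
        return "boot"
    platform.restart_count += 1       # S4: record the restart, then power-cycle
    return "restart"


def power_on(platform, passes):
    """Feed successive simulated passes to boot_once; return restarts performed."""
    restarts = 0
    for degraded in passes:
        if boot_once(platform, degraded) == "boot":
            return restarts
        restarts += 1
    raise RuntimeError("simulation ran out of passes before booting")
```

With a device that stays degraded on every pass, the model restarts three times, then boots normally and logs the device; with a device that recovers after one restart, it boots with the counter cleared and nothing logged.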
CN202010254405.3A 2020-04-02 2020-04-02 Method and system for processing bandwidth loss problem of PCIe device Withdrawn CN111488233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010254405.3A CN111488233A (en) 2020-04-02 2020-04-02 Method and system for processing bandwidth loss problem of PCIe device

Publications (1)

Publication Number Publication Date
CN111488233A true CN111488233A (en) 2020-08-04

Family

ID=71794561

Country Status (1)

Country Link
CN (1) CN111488233A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015597B (en) * 2020-10-26 2021-04-13 苏州浪潮智能科技有限公司 Fault isolation method, device, equipment and computer readable storage medium
CN112015597A (en) * 2020-10-26 2020-12-01 苏州浪潮智能科技有限公司 Fault isolation method, device, equipment and computer readable storage medium
WO2022111048A1 (en) * 2020-11-30 2022-06-02 苏州浪潮智能科技有限公司 Power supply control method and apparatus, and server and non-volatile storage medium
CN113391631A (en) * 2021-05-11 2021-09-14 北京迈格威科技有限公司 Operation control method and device for mobile device, storage medium and mobile device
CN113448785A (en) * 2021-05-28 2021-09-28 山东英信计算机技术有限公司 Method, device and equipment for processing bandwidth state exception and readable medium
CN113703850B (en) * 2021-07-16 2023-08-04 苏州浪潮智能科技有限公司 BIOS program starting method, system and related components
CN113703850A (en) * 2021-07-16 2021-11-26 苏州浪潮智能科技有限公司 BIOS program starting method, system and related components
CN113590511A (en) * 2021-10-08 2021-11-02 苏州浪潮智能科技有限公司 Bandwidth deceleration repairing method and device and electronic equipment
WO2023056744A1 (en) * 2021-10-08 2023-04-13 苏州浪潮智能科技有限公司 Reduced bandwidth repair method and apparatus, electronic device and storage medium
CN114003535A (en) * 2021-10-14 2022-02-01 苏州浪潮智能科技有限公司 Equipment bandwidth configuration method and system, electronic equipment and storage medium
CN114003535B (en) * 2021-10-14 2023-07-14 苏州浪潮智能科技有限公司 Device bandwidth configuration method and system, electronic device and storage medium
CN113688087A (en) * 2021-10-25 2021-11-23 苏州浪潮智能科技有限公司 PCIE (peripheral component interface express) device enumeration method, system, storage medium and device
CN115080490A (en) * 2022-06-17 2022-09-20 苏州浪潮智能科技有限公司 Self-adaptive optimization SPI communication method and system
CN115080490B (en) * 2022-06-17 2023-07-18 苏州浪潮智能科技有限公司 SPI communication method and system with self-adaptive tuning
CN115756941A (en) * 2023-01-09 2023-03-07 苏州浪潮智能科技有限公司 Automatic repair method and device for equipment, electronic equipment and storage medium
CN115909674A (en) * 2023-02-13 2023-04-04 成都秦川物联网科技股份有限公司 Alarm and gas meter linkage method based on intelligent gas and Internet of things system
US11989007B2 (en) 2023-02-13 2024-05-21 Chengdu Qinchuan Iot Technology Co., Ltd. Methods for linkage between alarm based on gas and gas meter and internet of things systems thereof

Similar Documents

Publication Publication Date Title
CN111488233A (en) Method and system for processing bandwidth loss problem of PCIe device
CN112948157B (en) Server fault positioning method, device and system and computer readable storage medium
CN108228374B (en) Equipment fault processing method, device and system
US20230333621A1 (en) Server firmware self-recovery system and server
US8954629B2 (en) Adapter and debugging method using the same
CN114116280B (en) Interactive BMC self-recovery method, system, terminal and storage medium
CN110928719A (en) SSD low-power-consumption mode exception handling method and device, computer equipment and storage medium
CN113360347A (en) Server and control method thereof
CN111338698A (en) Method and system for accurately booting server by BIOS (basic input output System)
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
WO2024124862A1 (en) Server-based memory processing method and apparatus, processor and an electronic device
CN113220324B (en) CPLD remote updating method, system and medium
WO2021169476A1 (en) Server expansion system, and power control method for same
CN114510374A (en) Automatic recovery system and method for peripheral mounting failure
CN108388481B (en) Intelligent watchdog circuit system of OLT equipment
CN111949462A (en) Method and system for determining CPU over-frequency range
TWI715005B (en) Monitor method for demand of a bmc
CN113312214B (en) Method, apparatus, electronic device and storage medium for operating computer
TWI726434B (en) Control method for solving abnormal operation of me
CN112463446B (en) PCIe device recovery method and system, electronic device and storage medium
TWI734357B (en) Mainboard and assisting test method of thereof
CN112115000B (en) Remote resetting method and system of system component power supply and BMC remote device
CN117389819B (en) Hot plug error reporting method, processor architecture, equipment and storage medium
CN116893938A (en) Method, device, equipment and medium for testing pressure of PCIe slot of server
CN118245292A (en) Server restarting method and device, storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200804