CN113852502A

CN113852502A - Fault diagnosis method, device and equipment of intelligent network card and readable medium

Info

Publication number: CN113852502A
Application number: CN202111112126.4A
Authority: CN
Inventors: 孙崇雨; 高磊; 刘齐
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2021-09-18
Filing date: 2021-09-18
Publication date: 2021-12-28

Abstract

The invention discloses a fault diagnosis method for an intelligent network card, comprising the following steps: in response to receiving a fault diagnosis instruction, a host end diagnoses a PCIE link and an NCSI channel and determines whether either is abnormal; if neither the PCIE link nor the NCSI channel is abnormal, the intelligent network card end performs self-diagnosis; and if self-diagnosis identifies abnormal information, the abnormal information is recorded and compared with a local fault library, and a troubleshooting suggestion is provided. The invention also discloses a fault diagnosis device for the intelligent network card, computer equipment and a readable storage medium. The invention allows the intelligent network card to be diagnosed while it is operating, achieving online diagnosis without powering down or disassembling the machine, which improves operation and maintenance efficiency; it can also distinguish whether a fault is a hardware fault or a software fault, thereby decoupling software and hardware.

Description

Fault diagnosis method, device and equipment of intelligent network card and readable medium

Technical Field

The invention relates to the technical field of fault diagnosis, in particular to a fault diagnosis method, a fault diagnosis device, fault diagnosis equipment and a readable medium for an intelligent network card.

Background

In cloud computing environments, an intelligent network card offloads network packet processing from the server host CPU onto the card itself, freeing the computing resources of the host CPU. The intelligent network card is highly integrated and its components are tightly coupled; it must not only run customer-specific programs but also support software and firmware upgrades.

After the intelligent network card goes into service, if a fault occurs during operation, quickly locating the fault type without disassembling the machine or powering it down is important to operation and maintenance work. The intelligent network card is a relatively new type of device that must operate together with a server, and front-line operation and maintenance engineers have not yet established complete fault diagnosis rules for it.

Traditional troubleshooting of an intelligent network card usually collects the logs generated by the card in-band or out-of-band, determines the fault by checking error information in the logs, and gives a suggested solution. Some hardware faults are investigated by powering off and disassembling the server and removing the intelligent network card for fault analysis. Analyzing system logs cannot locate fault points comprehensively, quickly, and accurately; powering off and disassembling the server affects the operating efficiency of the data center; moreover, because the intelligent network card operates together with a server, the log information it generates cannot cover the fault information of the whole server system.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a method, an apparatus, a device, and a readable medium for diagnosing a fault of an intelligent network card, so that the intelligent network card can be diagnosed while it is operating, achieving online diagnosis without powering down or disassembling the machine and improving operation and maintenance efficiency; meanwhile, whether a fault is a hardware fault or a software fault can be distinguished, thereby decoupling software and hardware; finally, fault points can be located accurately and troubleshooting suggestions are given based on a local fault library, which lowers the demands placed on operation and maintenance personnel.

Based on the above object, an aspect of the embodiments of the present invention provides a method for diagnosing a fault of an intelligent network card, including the following steps: in response to receiving a fault diagnosis instruction, a host end diagnoses a PCIE link and an NCSI channel and judges whether the PCIE link and the NCSI channel are abnormal or not; if the PCIE link and the NCSI channel are not abnormal, the intelligent network card end carries out self diagnosis; and if the abnormal information is identified through self-diagnosis, recording the abnormal information, comparing the abnormal information with a local fault library, and providing a fault removal suggestion.

In some embodiments, the self-diagnosis performed by the intelligent network card end includes: the intelligent network card end receives the diagnosis instruction and performs a rapid check; if the rapid check identifies an abnormality, a detailed diagnosis is performed and the specific abnormality information for the failing points is recorded.

In some embodiments, the self-diagnosis performed by the intelligent network card end includes: health diagnosis and stress testing of attached devices, link connectivity diagnosis, and firmware version checking.

In some embodiments, diagnosing the PCIE link and the NCSI channel and determining whether the PCIE link and the NCSI channel are abnormal includes: diagnosing the PCIE link and determining whether the PCIE link is abnormal; if the PCIE link is not abnormal, further diagnosing the NCSI channel and determining whether the NCSI channel is abnormal.

In some embodiments, diagnosing the PCIE link includes: acquiring PCIE slot information; checking the intelligent network card ID, link rate, link width, and UE/CE errors; and performing a DMA stress test. Further diagnosing the NCSI channel includes: checking the network communication state, NCSI-USB, and NCSI-UART, and performing a stress test on the NCSI network channel.

In some embodiments, the method further comprises: if the PCIE link and/or the NCSI channel are abnormal, reporting abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In some embodiments, the method further comprises: if self-diagnosis does not identify any abnormal information, generating a diagnosis report based on the self-diagnosis data and uploading the diagnosis report to operation and maintenance personnel.

In another aspect of the embodiments of the present invention, a fault diagnosis apparatus for an intelligent network card is further provided, including: the first module is configured to respond to a received fault diagnosis instruction, diagnose a PCIE link and an NCSI channel by a host end, and judge whether the PCIE link and the NCSI channel are abnormal or not; the second module is configured to perform self-diagnosis by the intelligent network card end if the PCIE link and the NCSI channel are not abnormal; and a third module configured to record and compare the abnormal information with a local fault library and provide a fault removal suggestion if the abnormal information is identified through self-diagnosis.

In some embodiments, the second module is further configured so that the intelligent network card end receives the diagnosis instruction and performs a rapid check; if the rapid check identifies an abnormality, a detailed diagnosis is performed and the specific abnormality information for the failing points is recorded.

In some embodiments, the second module is further configured to perform: health diagnosis and stress testing of attached devices, link connectivity diagnosis, and firmware version checking.

In some embodiments, the first module is further configured to: diagnosing a PCIE link, and judging whether the PCIE link is abnormal or not; if the PCIE link is not abnormal, the NCSI channel is further diagnosed, and whether the NCSI channel is abnormal or not is judged.

In some embodiments, the first module is further configured to: acquire PCIE slot information; check the intelligent network card ID, link rate, link width, and UE/CE errors; perform a DMA stress test; check the network communication state, NCSI-USB, and NCSI-UART; and perform a stress test on the NCSI network channel.

In some embodiments, the second module is further configured to: if the PCIE link and/or the NCSI channel are abnormal, reporting abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In some embodiments, the third module is further configured to: and if the abnormal information is not identified through self-diagnosis, generating a diagnosis report based on the self-diagnosis data, and uploading the diagnosis report to operation and maintenance personnel.

In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing steps of the method comprising: in response to receiving a fault diagnosis instruction, a host end diagnoses a PCIE link and an NCSI channel and judges whether the PCIE link and the NCSI channel are abnormal or not; if the PCIE link and the NCSI channel are not abnormal, the intelligent network card end carries out self diagnosis; and if the abnormal information is identified through self-diagnosis, recording the abnormal information, comparing the abnormal information with a local fault library, and providing a fault removal suggestion.

In some embodiments, the steps further comprise: if the PCIE link and/or the NCSI channel are abnormal, reporting abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In some embodiments, the steps further comprise: and if the abnormal information is not identified through self-diagnosis, generating a diagnosis report based on the self-diagnosis data, and uploading the diagnosis report to operation and maintenance personnel.

In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that implements the above method steps when executed by a processor.

The invention has at least the following beneficial technical effects: the intelligent network card is subjected to fault diagnosis in a working state, the purpose of online diagnosis is achieved, power failure disassembly is not needed, and operation and maintenance efficiency is improved; meanwhile, whether the fault type belongs to a hardware fault or a software fault can be distinguished, and the purpose of decoupling software and hardware is achieved; finally, fault points can be accurately positioned, fault removal suggestions are given based on a local fault library, and requirements on operation and maintenance personnel are reduced.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

Fig. 1 is a schematic diagram of an embodiment of a fault diagnosis method for an intelligent network card provided by the present invention;

Fig. 2 is a schematic diagram of an embodiment of a fault diagnosis device for an intelligent network card provided by the present invention;

Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention;

Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities that share a name but are not the same entity, or two parameters that are not the same parameter. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.

If the intelligent network card develops a fault during operation, quickly determining on site whether the equipment has a software fault or a hardware fault is a problem that urgently needs to be solved.

Based on the above purpose, the first aspect of the embodiments of the present invention provides an embodiment of a method for diagnosing a fault of an intelligent network card. Fig. 1 is a schematic diagram illustrating an embodiment of a fault diagnosis method for an intelligent network card provided by the present invention. As shown in fig. 1, the method for diagnosing a fault of an intelligent network card according to an embodiment of the present invention includes the following steps:

001. In response to receiving the fault diagnosis instruction, the host end diagnoses the PCIE link and the NCSI channel and determines whether the PCIE link and the NCSI channel are abnormal;

002. If neither the PCIE link nor the NCSI channel is abnormal, the intelligent network card end performs self-diagnosis; and

003. If self-diagnosis identifies abnormal information, the abnormal information is recorded and compared with a local fault library, and a troubleshooting suggestion is provided.

In this embodiment, when operation and maintenance personnel notice that the network card is behaving abnormally, they issue a diagnosis instruction at the host end. After receiving the diagnosis instruction, the host end actively checks the functional components related to the intelligent network card: it first executes the network card's PCIe link diagnostic program and NCSI channel diagnostic program, and if neither check reports an error, the host end issues a diagnosis instruction to the attached intelligent network card. When the intelligent network card receives the diagnosis instruction during operation, it starts a self-diagnosis program and compares the recorded fault information with a local fault knowledge base to give a troubleshooting suggestion.
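The comparison against a local fault knowledge base can be pictured as a simple keyed lookup. The following Python sketch is illustrative only: the patent does not specify the format of the fault library, so the `FaultEntry` type, the `suggest` function, the symptom keys, and the suggestion texts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FaultEntry:
    component: str   # e.g. "pcie", "ddr", "firmware" (hypothetical keys)
    symptom: str     # short machine-readable symptom name (hypothetical)
    suggestion: str  # human-readable troubleshooting advice (hypothetical)


# Hypothetical local fault library; a real one would likely be stored on the card.
FAULT_LIBRARY = (
    FaultEntry("pcie", "link_width_degraded",
               "Reseat the card and inspect the slot contacts."),
    FaultEntry("firmware", "version_mismatch",
               "Upgrade the firmware to the expected version."),
)

NO_MATCH = "No matching entry in the local fault library; report to operation and maintenance personnel."


def suggest(component: str, symptom: str,
            library: Iterable[FaultEntry] = FAULT_LIBRARY) -> str:
    """Return the suggestion for a recorded fault, or a fallback message."""
    for entry in library:
        if entry.component == component and entry.symptom == symptom:
            return entry.suggestion
    return NO_MATCH
```

A recorded fault such as `("pcie", "link_width_degraded")` would return the matching advice, while an unknown fault falls through to the fallback message rather than raising an error.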

In this embodiment, taking the analysis of a failed intelligent network card running online as an example:

First, operation and maintenance personnel find that the intelligent network card of a server may be faulty, for example low network packet send/receive efficiency, abnormal voltage or current values on the card, or an abnormal card temperature;

Second, operation and maintenance personnel issue a diagnosis instruction;

Third, after receiving the diagnosis instruction, the host executes the network card diagnostic program and checks whether the PCIe link and the NCSI channel have problems;

Fourth, if neither check reports an error, the host end issues a diagnosis instruction to the attached intelligent network card;

Fifth, if the checks report errors, the errors are recorded and reported to operation and maintenance personnel, who decide whether to issue a diagnosis instruction to the attached intelligent network card through the host end;

Sixth, the intelligent network card starts a self-diagnosis program when it receives the diagnosis instruction during operation;

Seventh, the intelligent network card performs, in order: health diagnosis and stress testing of the devices attached to the SOC, link connectivity diagnosis, firmware version checking, stress testing of the devices attached to the FPGA, and FPGA program self-diagnosis;

Eighth, the diagnosis items are first run as a rapid check; if the rapid check identifies abnormal information, the card automatically switches to a detailed diagnosis program, reports the error points more specifically, and records the errors; after the fault information is recorded, it is compared with the local fault knowledge base to give a troubleshooting suggestion;

Ninth, if no error is detected in the above items, a diagnosis report is sent to operation and maintenance personnel, who then analyze faults caused by the software running on the network card.
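The branching in this example can be summarized as a small control-flow sketch. The Python below is a minimal illustration under stated assumptions, not the patented implementation: the function `run_diagnosis`, the `Outcome` type, the stage names, and the four callables are hypothetical stand-ins for the host-side checks, the card's self-diagnosis, and the operator's decision.

```python
from typing import Callable, List, NamedTuple


class Outcome(NamedTuple):
    stage: str            # where the flow ended (hypothetical stage names)
    findings: List[str]   # anomalies recorded along the way


def run_diagnosis(
    check_pcie: Callable[[], List[str]],
    check_ncsi: Callable[[], List[str]],
    self_diagnose: Callable[[], List[str]],
    operator_approves: Callable[[List[str]], bool],
) -> Outcome:
    """Host-side checks first, then card self-diagnosis, as in the example above."""
    host_findings = list(check_pcie())
    if not host_findings:
        # The NCSI channel is diagnosed only when the PCIe link is clean.
        host_findings.extend(check_ncsi())

    if host_findings and not operator_approves(host_findings):
        # Errors are reported; the operator chose not to run self-diagnosis.
        return Outcome("host_fault_reported", host_findings)

    card_findings = list(self_diagnose())
    findings = host_findings + card_findings
    if card_findings:
        # These would next be compared with the local fault library.
        return Outcome("card_fault_found", findings)
    # Nothing found on the card: a diagnosis report goes to the operator.
    return Outcome("report_sent", findings)
```

Passing the checks in as callables keeps the sketch independent of any particular hardware interface; each callable returns a list of anomaly descriptions, with an empty list meaning no error.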

In some embodiments of the present invention, the self-diagnosis performed by the intelligent network card end includes: the intelligent network card end receives the diagnosis instruction and performs a rapid check; if the rapid check identifies an abnormality, a detailed diagnosis is performed and the specific abnormality information for the failing points is recorded.

In this embodiment, the intelligent network card performs a rapid check after receiving the diagnosis instruction; if the rapid check program identifies abnormal information, the card automatically switches to a detailed diagnosis program, which reports the error points more specifically and records the errors.

In some embodiments of the present invention, the self-diagnosis performed by the intelligent network card end includes: health diagnosis and stress testing of attached devices, link connectivity diagnosis, and firmware version checking.
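The escalation from a rapid check to a detailed diagnosis can be sketched as follows. This is an illustrative Python outline only; the function `two_stage_diagnosis` and the shape of the check tables (names mapped to callables) are assumptions, since the patent does not describe how the checks are registered.

```python
from typing import Callable, Dict, List


def two_stage_diagnosis(
    quick_checks: Dict[str, Callable[[], bool]],
    detailed_checks: Dict[str, Callable[[], List[str]]],
) -> Dict[str, List[str]]:
    """Run every quick check; run the detailed routine only where a quick check fails.

    quick_checks maps an item name to a callable returning True when healthy.
    detailed_checks maps the same names to callables returning error descriptions.
    The result maps each failing item to its recorded error points.
    """
    records: Dict[str, List[str]] = {}
    for name, quick in quick_checks.items():
        if quick():
            continue  # healthy: no detailed diagnosis needed
        detailed = detailed_checks.get(name)
        if detailed is None:
            records[name] = ["quick check failed; no detailed routine registered"]
        else:
            records[name] = detailed()
    return records
```

The detailed routines are typically slower, so running them only for items that fail the rapid check keeps the common healthy case fast.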

In this embodiment, when the intelligent network card receives a diagnosis instruction during the operation process, the intelligent network card starts a self-diagnosis program, and sequentially operates the following diagnosis contents:

health diagnosis of the devices attached to the intelligent network card SOC, including CPU information (core count, thread count, operating frequency, arithmetic test, register read/write test, and PCIe information), DIMM inspection, and disk bad-sector inspection;

stress testing of the devices attached to the intelligent network card SOC, including CPU stress, memory stress, disk stress, and network channel stress;

link connectivity diagnosis of the intelligent network card, where the channels checked include but are not limited to USB, UART, I2C, SPI, LPC, and GPIO;

firmware version checking, where the items checked include but are not limited to the versions of the intelligent network card operating system, software development kit, test software, BIOS, BMC, FPGA, CPLD, and other hardware;

stress testing of the devices attached to the intelligent network card FPGA, including but not limited to an FPGA-attached DDR stress test, an FPGA-attached ROM stress test, an FPGA-attached SPI FLASH stress test, and an FPGA external network port traffic test; to support this testing, the FPGA must expose an interface to PCIe user space when the IP core is instantiated, and the SOC initiates the test;

self-diagnosis of the intelligent network card FPGA program; to support this, the self-diagnosis program must be designed during the FPGA development stage and built into the FPGA program.
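The fixed ordering of the six self-diagnosis items can be expressed as an ordered list of named steps. The Python below is a hedged sketch: the item names in `SELF_DIAGNOSIS_ORDER`, the `run_self_diagnosis` function, and the "pass" / "fail" / "skipped" result strings are hypothetical, and the handlers stand in for the real tests, which the patent does not detail.

```python
from typing import Callable, Dict

# Order taken from the list above; the identifiers themselves are hypothetical.
SELF_DIAGNOSIS_ORDER = [
    "soc_device_health",
    "soc_device_stress",
    "link_connectivity",
    "firmware_version",
    "fpga_device_stress",
    "fpga_program_self_test",
]


def run_self_diagnosis(handlers: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run the registered handlers in the fixed order and record each result."""
    results: Dict[str, str] = {}
    for item in SELF_DIAGNOSIS_ORDER:
        handler = handlers.get(item)
        if handler is None:
            results[item] = "skipped"  # no handler registered for this item
            continue
        try:
            results[item] = "pass" if handler() else "fail"
        except Exception as exc:  # a crashing test is recorded, not propagated
            results[item] = f"error: {exc}"
    return results
```

Recording a crashing handler as an error, instead of aborting the whole run, lets the remaining items still execute so that one broken test does not hide other faults.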

In some embodiments of the present invention, diagnosing the PCIE link and the NCSI channel and determining whether the PCIE link and the NCSI channel are abnormal includes: diagnosing the PCIE link and determining whether the PCIE link is abnormal; if the PCIE link is not abnormal, further diagnosing the NCSI channel and determining whether the NCSI channel is abnormal.

In this embodiment, the network card's PCIe link diagnostic program is executed first; if no error is detected on the PCIe link, NCSI channel diagnosis is executed; and if neither check reports an error, the host issues a diagnosis instruction to the attached intelligent network card.

In some embodiments of the present invention, diagnosing the PCIE link includes: acquiring PCIE slot information; checking the intelligent network card ID, link rate, link width, and UE/CE errors; and performing a DMA stress test. Further diagnosing the NCSI channel includes: checking the network communication state, NCSI-USB, and NCSI-UART, and performing a stress test on the NCSI network channel.

In this embodiment, the network card PCIe link diagnosis includes PCIe slot information acquisition, network card device ID check, Link Speed check, Link Width check, UE/CE (uncorrectable/correctable) Error check, and a PCIe DMA stress test; the NCSI channel diagnosis includes network communication state detection, an NCSI network channel stress test, NCSI-USB detection, and NCSI-UART detection.
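As one possible illustration of the Link Speed and Link Width checks, the sketch below reads the `current_link_speed`, `current_link_width`, `max_link_speed`, and `max_link_width` attributes that Linux exposes for a PCI device under sysfs. The patent does not say how these values are obtained, so the use of sysfs, the function names `read_link_state` and `link_anomalies`, and the rule "current below maximum means degraded" are assumptions; the sketch also assumes the speed files begin with a number (for example `8.0 GT/s PCIe`), which is not the case when the kernel reports an unknown speed.

```python
from pathlib import Path
from typing import Dict, List, Union

Number = Union[int, float]


def read_link_state(device_dir: str) -> Dict[str, Number]:
    """Read negotiated and maximum link speed/width from a sysfs-style directory.

    device_dir would typically be /sys/bus/pci/devices/<domain:bus:dev.fn>.
    """
    base = Path(device_dir)

    def read(name: str) -> str:
        return (base / name).read_text().strip()

    def speed(text: str) -> float:
        # Assumes the first token is numeric, e.g. "8.0 GT/s PCIe".
        return float(text.split()[0])

    return {
        "current_speed": speed(read("current_link_speed")),
        "max_speed": speed(read("max_link_speed")),
        "current_width": int(read("current_link_width")),
        "max_width": int(read("max_link_width")),
    }


def link_anomalies(state: Dict[str, Number]) -> List[str]:
    """Flag a link that trained below the device's maximum speed or width."""
    problems: List[str] = []
    if state["current_speed"] < state["max_speed"]:
        problems.append(
            f"link speed {state['current_speed']} GT/s below maximum {state['max_speed']} GT/s"
        )
    if state["current_width"] < state["max_width"]:
        problems.append(
            f"link width x{state['current_width']} below maximum x{state['max_width']}"
        )
    return problems
```

A link that runs below the device maximum is not always a fault, since the slot itself may support less than the card; a fuller check would compare against the value expected for the specific slot.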

In some embodiments of the invention, the method further comprises: if the PCIE link and/or the NCSI channel are abnormal, reporting the abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In this embodiment, the network card's PCIe link diagnostic program is executed first, and NCSI channel diagnosis is executed if no error is detected on the PCIe link; if the checks report errors, the errors are recorded and reported to operation and maintenance personnel, who decide whether to issue a diagnosis instruction to the attached intelligent network card through the host end.

In some embodiments of the invention, the method further comprises: if the abnormality information is not recognized by self-diagnosis, a diagnosis report is generated based on the self-diagnosis data, and the diagnosis report is uploaded to the operation and maintenance personnel.

In this embodiment, if no error is detected in the above items, a diagnosis report is sent to operation and maintenance personnel, who then analyze faults caused by the software running on the network card.
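A diagnosis report built from the self-diagnosis data could look like the following. This is a sketch under assumptions: the patent does not define a report format, so the JSON layout, the field names, the `build_report` function, and the "pass" result convention are all hypothetical.

```python
import json
from typing import Dict


def build_report(card_id: str, results: Dict[str, str]) -> str:
    """Serialize self-diagnosis results into a JSON report for the operator.

    results maps each diagnosis item to a result string; anything other
    than "pass" is treated as an anomaly (a convention assumed here).
    """
    anomalies = sorted(item for item, outcome in results.items() if outcome != "pass")
    report = {
        "card_id": card_id,
        "results": results,
        "anomaly_identified": bool(anomalies),
        "anomalous_items": anomalies,
        # With every hardware-facing check passing, the software running
        # on the card is the next place to look.
        "next_step": "compare with local fault library" if anomalies
                     else "analyze software running on the card",
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

When every item passes, the report still carries the full result set, which gives operation and maintenance personnel the evidence that hardware checks were clean before they turn to the software.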

It should be particularly noted that the steps in the foregoing embodiments of the fault diagnosis method for the intelligent network card may be interleaved, replaced, added, and deleted, so fault diagnosis methods for the intelligent network card obtained through such reasonable permutation, combination, and transformation also fall within the protection scope of the present invention, and the protection scope of the present invention should not be limited to the embodiments.

In view of the above, a second aspect of the embodiments of the present invention provides a fault diagnosis device for an intelligent network card. Fig. 2 is a schematic diagram illustrating an embodiment of a fault diagnosis apparatus for an intelligent network card provided by the present invention. As shown in fig. 2, the fault diagnosis apparatus for an intelligent network card according to the embodiment of the present invention includes the following modules: the first module 011 is configured to respond to the received fault diagnosis instruction, diagnose the PCIE link and the NCSI channel by the host end, and determine whether the PCIE link and the NCSI channel are abnormal; a second module 012, configured to perform self-diagnosis by the intelligent network card end if neither the PCIE link nor the NCSI channel is abnormal; and a third module 013 configured to, if the anomaly information is identified by self-diagnosis, record the anomaly information and compare it with the local fault library, and provide a troubleshooting recommendation.

In some embodiments of the invention, the second module 012 is further configured so that the intelligent network card end receives the diagnosis instruction and performs a rapid check; if the rapid check identifies an abnormality, a detailed diagnosis is performed and the specific abnormality information for the failing points is recorded.

In some embodiments of the invention, the second module 012 is further configured to perform: health diagnosis and stress testing of attached devices, link connectivity diagnosis, and firmware version checking.

In some embodiments of the invention, the first module 011 is further configured for: diagnosing a PCIE link, and judging whether the PCIE link is abnormal or not; if the PCIE link is not abnormal, the NCSI channel is further diagnosed, and whether the NCSI channel is abnormal or not is judged.

In some embodiments of the invention, the first module 011 is further configured to: acquire PCIE slot information; check the intelligent network card ID, link rate, link width, and UE/CE errors; perform a DMA stress test; check the network communication state, NCSI-USB, and NCSI-UART; and perform a stress test on the NCSI network channel.

In some embodiments of the invention, the second module 012 is further configured to: if the PCIE link and/or the NCSI channel are abnormal, reporting the abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In some embodiments of the invention, the third module 013 is further configured to: if the abnormality information is not recognized by self-diagnosis, a diagnosis report is generated based on the self-diagnosis data, and the diagnosis report is uploaded to the operation and maintenance personnel.

In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 3, the computer apparatus of the embodiment of the present invention includes the following means: at least one processor 021; and a memory 022, the memory 022 storing computer instructions 023 executable on the processor, the instructions when executed by the processor implementing steps of the method comprising: in response to receiving the fault diagnosis instruction, the host end diagnoses the PCIE link and the NCSI channel and judges whether the PCIE link and the NCSI channel are abnormal or not; if the PCIE link and the NCSI channel are not abnormal, the intelligent network card end carries out self diagnosis; and if the abnormal information is identified through self-diagnosis, recording the abnormal information, comparing the abnormal information with a local fault library, and providing a fault removal suggestion.

In some embodiments of the invention, the steps further comprise: if the PCIE link and/or the NCSI channel are abnormal, reporting the abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

In some embodiments of the invention, the steps further comprise: if the abnormality information is not recognized by self-diagnosis, a diagnosis report is generated based on the self-diagnosis data, and the diagnosis report is uploaded to the operation and maintenance personnel.

The invention also provides a computer readable storage medium. FIG. 4 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer readable storage medium 031 stores a computer program 032 which, when executed by a processor, performs the method as described above.

Finally, it should be noted that, as those skilled in the art can understand, all or part of the processes in the methods of the above embodiments may be implemented by a computer program to instruct related hardware, and the program of the method for diagnosing a failure of an intelligent network card may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.

Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. When executed by a processor, the computer program performs the above-described functions defined in the methods disclosed in embodiments of the invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are all included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The serial numbers of the embodiments of the present invention disclosed above are for description only and do not indicate the relative merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A fault diagnosis method of an intelligent network card is characterized by comprising the following steps:

in response to receiving a fault diagnosis instruction, a host end diagnoses a PCIE link and an NCSI channel and judges whether the PCIE link and the NCSI channel are abnormal or not;

if the PCIE link and the NCSI channel are not abnormal, the intelligent network card end carries out self diagnosis; and

if abnormal information is identified through the self-diagnosis, recording the abnormal information, comparing the abnormal information with a local fault library, and providing a troubleshooting suggestion.

2. The method as claimed in claim 1, wherein the self-diagnosing performed by the intelligent network card end includes:

receiving, by the intelligent network card end, the diagnosis instruction and carrying out rapid detection; and

if an abnormality is identified by the rapid detection, carrying out detailed diagnosis and recording the specific abnormal information of the error reporting points.

3. The method as claimed in claim 1, wherein the self-diagnosing performed by the intelligent network card end includes:

health diagnosis and stress testing of the devices attached to the intelligent network card, link connectivity diagnosis, and firmware version checking.

4. The method according to claim 1, wherein diagnosing the PCIE link and the NCSI channel and determining whether the PCIE link and the NCSI channel are abnormal comprises:

diagnosing a PCIE link, and judging whether the PCIE link is abnormal or not;

if the PCIE link is not abnormal, the NCSI channel is further diagnosed, and whether the NCSI channel is abnormal or not is judged.

5. The method according to claim 4, wherein the diagnosing the PCIE link includes: acquiring PCIE slot information, checking the intelligent network card ID, the link rate, the link bandwidth and UE/CE errors, and performing a DMA stress test;

and the further diagnosing the NCSI channel includes: checking the network communication state, the NCSI universal serial bus (USB) and the NCSI universal asynchronous receiver-transmitter (UART), and carrying out a stress test on the NCSI network channel.

6. The method for diagnosing the failure of the intelligent network card according to claim 1, further comprising:

if the PCIE link and/or the NCSI channel are abnormal, reporting abnormal information to operation and maintenance personnel to determine whether the intelligent network card end needs to carry out self diagnosis.

7. The method for diagnosing the failure of the intelligent network card according to claim 1, further comprising:

if no abnormal information is identified through the self-diagnosis, generating a diagnosis report based on the self-diagnosis data, and uploading the diagnosis report to operation and maintenance personnel.

8. A fault diagnosis device of an intelligent network card is characterized by comprising:

a first module configured to, in response to receiving a fault diagnosis instruction, cause a host end to diagnose a PCIE link and an NCSI channel and judge whether the PCIE link and the NCSI channel are abnormal or not;

a second module configured to cause the intelligent network card end to perform self-diagnosis if the PCIE link and the NCSI channel are not abnormal; and

a third module configured to, if abnormal information is identified through the self-diagnosis, record the abnormal information, compare the abnormal information with a local fault library, and provide a troubleshooting suggestion.

9. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
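
By way of illustration only, and not as part of the claims, the staged flow recited in claims 1, 4, 6 and 7 may be sketched as follows. This is a minimal sketch under stated assumptions: the names `run_fault_diagnosis` and `DiagnosisOutcome`, the stage labels, the abnormality codes, and the dictionary form of the local fault library are all hypothetical and do not appear in the disclosure. The host-end checks and the card-end self-diagnosis are passed in as callables, because their concrete implementation depends on the platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: each check returns a list of abnormality codes,
# and an empty list means "no abnormality".
Check = Callable[[], List[str]]


@dataclass
class DiagnosisOutcome:
    stage: str  # hypothetical label of the stage that ended the run
    abnormalities: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    next_step: str = ""


def run_fault_diagnosis(
    check_pcie: Check,
    check_ncsi: Check,
    self_diagnose: Check,
    fault_library: Dict[str, str],
) -> DiagnosisOutcome:
    # Host end: the PCIE link is diagnosed first (claim 4).
    pcie_errors = check_pcie()
    if pcie_errors:
        # Abnormal link: report to operation and maintenance personnel (claim 6).
        return DiagnosisOutcome("host_pcie", pcie_errors,
                                next_step="report_to_operators")

    # The NCSI channel is diagnosed only if the PCIE link is not abnormal.
    ncsi_errors = check_ncsi()
    if ncsi_errors:
        return DiagnosisOutcome("host_ncsi", ncsi_errors,
                                next_step="report_to_operators")

    # Both host-side checks passed: the card end performs self-diagnosis.
    found = self_diagnose()
    if not found:
        # Nothing identified: upload a diagnosis report (claim 7).
        return DiagnosisOutcome("card_self_diagnosis",
                                next_step="upload_diagnosis_report")

    # Compare each abnormality with the local fault library. The fallback
    # text for unknown codes is an assumption made for this sketch.
    suggestions = [
        fault_library.get(code, "no matching entry in local fault library")
        for code in found
    ]
    return DiagnosisOutcome("card_self_diagnosis", found, suggestions,
                            next_step="present_suggestions")
```

Passing the checks as callables keeps the ordering logic separate from the hardware access, so the same flow can be exercised with stub checks before it is connected to real PCIE, NCSI, or on-card diagnostics.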