CN115220972A - Equipment fault detection method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN115220972A
Authority
CN
China
Prior art keywords
data
query acceleration
data query
fault detection
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210640768.XA
Other languages
Chinese (zh)
Inventor
王子奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202210640768.XA priority Critical patent/CN115220972A/en
Publication of CN115220972A publication Critical patent/CN115220972A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0769 Readable error formats, e.g. cross-platform generic formats, human understandable formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2268 Logging of test results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a device fault detection method, apparatus, device, and computer-readable storage medium. The method comprises: in response to a fault detection instruction from a user, issuing test data to a data query acceleration device; determining a fault detection result of the data query acceleration device based on state information of the data query acceleration device when it processes the test data; and feeding the fault detection result back to the user. The method automatically performs the detection requested by the user's fault detection instruction and feeds the corresponding result back to the user in real time, so that the user can quickly and accurately locate a fault in the bus or in the data query acceleration device, which improves the efficiency and accuracy of device fault detection.

Description

Equipment fault detection method, device, equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of big data acceleration, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for detecting a device failure.
Background
The KPU (kernel processing unit) data query acceleration device is connected to the host through a PCIe (Peripheral Component Interconnect Express) bus. It offloads compute-intensive operations of a relational database on the host to a board card in the KPU data query acceleration device for execution, which reduces the load on the host CPU (central processing unit) in high-throughput scenarios and speeds up data processing.
When the PCIe bus or a module in the KPU data query acceleration device is damaged and cannot work normally, data transmission errors occur and data query tasks cannot be performed normally, yet the user cannot quickly and accurately determine the cause of the fault.
Disclosure of Invention
To solve the above technical problem, the present disclosure provides a device fault detection method, apparatus, device, and computer-readable storage medium for quickly and accurately troubleshooting faults of a PCIe bus or a data query acceleration device.
In a first aspect, an embodiment of the present disclosure provides a device fault detection method, including:
in response to a fault detection instruction from a user, issuing test data to a data query acceleration device;
determining a fault detection result of the data query acceleration device based on state information of the data query acceleration device when it processes the test data;
and feeding back the fault detection result to the user.
In some embodiments, issuing test data to the data query acceleration device in response to a fault detection instruction from a user includes:
in response to a path connectivity detection instruction from the user, issuing first test data to the data query acceleration device;
and determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when it processes the test data includes:
monitoring a status register of the data query acceleration device, and if a first flag bit in the status register indicates that the data query acceleration device has not received the first test data, determining that the path between the data query acceleration device and the host device is not connected.
In some embodiments, issuing test data to the data query acceleration device in response to a fault detection instruction from a user includes:
in response to a path integrity detection instruction from the user, calculating a first check code of data to be calculated;
issuing the data to be calculated to the data query acceleration device, wherein the data query acceleration device is used for calculating a second check code of the data to be calculated;
and determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when it processes the test data includes:
acquiring the second check code from the data query acceleration device, and if the first check code is different from the second check code, determining that the path between the data query acceleration device and the host device is incomplete.
In some embodiments, issuing test data to the data query acceleration device in response to a fault detection instruction from a user includes:
in response to a data register state detection instruction and/or an instruction register state detection instruction from the user, issuing second test data to the data query acceleration device, wherein the data query acceleration device is used for storing the second test data into a data register and/or an instruction register;
and determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when it processes the test data includes:
reading the data currently stored in the data register and/or the instruction register, and comparing the second test data with the data currently stored in the data register and/or the instruction register;
and if the second test data is different from the data currently stored in the data register and/or the instruction register, determining that the data register and/or the instruction register is abnormal.
In some embodiments, the method further comprises:
resetting the data query acceleration device in response to a reset instruction from the user.
In some embodiments, resetting the data query acceleration device comprises:
issuing a reset code to the data query acceleration device, wherein the data query acceleration device is used for storing the reset code into a preset register;
and monitoring the status register of the data query acceleration device, and if a second flag bit in the status register indicates that the data query acceleration device has been reset successfully, ending the reset operation of the data query acceleration device.
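The reset handshake described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the bit position of the second flag bit, the value of the reset code, the timeout, and the two callables `write_reset_register` and `read_status_register` are all hypothetical stand-ins for whatever register access the real device exposes.

```python
import time
from typing import Callable

RESET_DONE_FLAG = 0x2  # assumption: the "second flag bit" is bit 1
RESET_CODE = 0xDEAD    # assumption: placeholder reset code value


def reset_device(
    write_reset_register: Callable[[int], None],
    read_status_register: Callable[[], int],
    timeout_s: float = 1.0,        # assumption: the source names no timeout
    poll_interval_s: float = 0.01,
) -> bool:
    """Issue the reset code, then poll the status register for completion."""
    write_reset_register(RESET_CODE)  # device stores the code in its preset register
    deadline = time.monotonic() + timeout_s
    while True:
        if read_status_register() & RESET_DONE_FLAG:
            return True               # second flag bit set: reset succeeded
        if time.monotonic() >= deadline:
            return False              # flag never appeared within the timeout
        time.sleep(poll_interval_s)
```

The status register is polled at least once even with a zero timeout, so a device that completes the reset immediately is still reported as reset.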
In a second aspect, an embodiment of the present disclosure provides an apparatus for detecting a device failure, including:
the issuing module is used for responding to a fault detection instruction of a user and issuing test data to the data query acceleration equipment;
the determining module is used for determining a fault detection result of the data query acceleration equipment based on state information of the data query acceleration equipment when the test data is processed;
and the feedback module is used for feeding back the fault detection result to a user.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
In a fifth aspect, the disclosed embodiments also provide a computer program product comprising a computer program or instructions, which when executed by a processor, implement the device failure detection method as described above.
With the device fault detection method, apparatus, device, and computer-readable storage medium provided by the present disclosure, the detection requested by the user's fault detection instruction is performed automatically on the data query acceleration device, and the corresponding fault detection result is fed back to the user in real time, so that the user can quickly and accurately locate a fault in the bus or in the data query acceleration device, which improves the efficiency and accuracy of device fault detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a device fault detection method provided in an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
Fig. 3 is a schematic view of a visualization interface of a fault detection device provided in an embodiment of the present disclosure;
Fig. 4 is a flowchart of a device fault detection method according to another embodiment of the present disclosure;
Fig. 5 is a flowchart of a device fault detection method according to another embodiment of the present disclosure;
Fig. 6 is a flowchart of a device fault detection method according to another embodiment of the present disclosure;
Fig. 7 is a flowchart of a device fault detection method according to another embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a device fault detection apparatus provided in an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments of the present disclosure may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The KPU data query acceleration system is a data query acceleration scheme based on a heterogeneous computing system, that is, a system composed of computing units that use different instruction sets and architectures. The KPU data query acceleration system mainly comprises three components: a host device, a KPU data query acceleration device, and a data transmission line connecting the two. The KPU data query acceleration device may be a hardware device based on a Field Programmable Gate Array (FPGA) or a hardware device based on an Application Specific Integrated Circuit (ASIC). When a module in the KPU data query acceleration device is damaged and the device is FPGA-based, the relevant signal waveforms can only be viewed through third-party software, and the user must then analyze those waveforms further, which makes hardware problems difficult to diagnose. When the device is ASIC-based, no software for problem analysis is currently available on the market, because an ASIC is an integrated circuit designed and manufactured for the requirements of a specific user and a specific electronic system. The connection between the host device and the KPU data query acceleration device is generally implemented over a PCIe high-speed serial computer expansion bus, and during data transmission, external factors such as temperature, humidity, and electromagnetic interference may cause transmission errors. Therefore, when the KPU data query acceleration device or the data transmission link fails, the user cannot quickly and accurately locate the fault.
In addition, when the KPU data query acceleration device has a problem, no software can currently reset it. The common remedies are to restart the computer, reinstall the driver, and so on, which are time-consuming, cumbersome, and hard on the hardware.
In view of the foregoing problems, embodiments of the present disclosure provide a method for detecting an equipment fault, which is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a device fault detection method provided in an embodiment of the present disclosure. The method can be applied to the application scenario shown in Fig. 2, which includes a host device 21, a data query acceleration device 22, and a fault detection device 24. The fault detection device 24 is configured with a visualization module for human-computer interaction with the user, and with an instruction issuing module and an information collecting module, which are used respectively for issuing fault detection instructions and for collecting relevant information and fault detection results. The host device 21, the data query acceleration device 22, and the fault detection device 24 are all connected by a bus 23, which may be a PCIe high-speed serial computer expansion bus. It can be understood that the device fault detection method provided by the embodiments of the present disclosure may also be applied in other scenarios.
Fig. 3 is a schematic view of a visualization interface of a fault detection device according to an embodiment of the present disclosure. It is to be understood that this interface diagram is merely an example of an embodiment of the present disclosure and is not intended as a limitation of the present disclosure.
The device fault detection method shown in Fig. 1 is described below with reference to the application scenario shown in Fig. 2 and the interface diagram shown in Fig. 3. The method includes the following steps:
S101, in response to a fault detection instruction from a user, issuing test data to the data query acceleration device.
The user configures the fault detection instruction through the visual interface of the fault detection device shown in Fig. 3. After setting the number of detection rounds and checking the detection items on the interface, the user can start the detection. Once the user activates the start-detection control, the fault detection device responds to the user's fault detection instruction and issues the corresponding test data to the data query acceleration device through the bus. For example, when the user sets the number of detection rounds to 2, checks the path connectivity and path integrity options, and clicks the start-detection button, the fault detection device issues the test data for path connectivity detection and the test data for path integrity detection to the data query acceleration device through the bus, and issues both again after the first round of detection is completed.
S102, determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when it processes the test data.
The data query acceleration device responds differently to the test data of different detection items, including but not limited to: changing the state of the corresponding flag bit in the status register, or performing a calculation on the test data to obtain a corresponding result. The fault detection device determines the fault detection result of the data query acceleration device according to the state information of the data query acceleration device when it processes the test data.
S103, feeding back the fault detection result to the user.
After determining the fault detection result of the data query acceleration device, the fault detection device displays the result on the interface shown in Fig. 3, for example by marking each detection item option with a symbol that distinguishes the different fault detection results, and displays the detailed information corresponding to the fault detection result in the detection log display area.
In this embodiment of the present disclosure, test data is issued to the data query acceleration device in response to the user's fault detection instruction, a fault detection result is determined based on the state information of the data query acceleration device when it processes the test data, and the result is fed back to the user. The detection requested by the user is thus performed automatically and its result is fed back in real time, so that the user can quickly and accurately locate a fault in the bus or in the data query acceleration device, which improves the efficiency and accuracy of device fault detection.
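The overall flow of S101 to S103, including the configurable number of detection rounds, can be sketched as a simple loop. This is only an illustrative sketch: the `DetectionItem` type, the function name `run_fault_detection`, and the shape of the log entries are assumptions introduced here and do not come from the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a detection item issues its own test data, inspects the
# device state, and returns (passed, detail).
DetectionItem = Callable[[], Tuple[bool, str]]


def run_fault_detection(
    items: Dict[str, DetectionItem],
    selected: List[str],
    rounds: int,
) -> List[Tuple[int, str, bool, str]]:
    """Run every selected detection item once per round and collect a log."""
    log: List[Tuple[int, str, bool, str]] = []
    for round_no in range(1, rounds + 1):
        for name in selected:
            passed, detail = items[name]()  # S101 and S102 for one item
            log.append((round_no, name, passed, detail))  # S103: result to report
    return log
```

With two rounds and the connectivity and integrity items selected, the loop runs both items in the first round and then both again in the second, matching the example given for the interface in Fig. 3.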
Fig. 4 is a flowchart of a device fault detection method according to another embodiment of the present disclosure. As shown in Fig. 4, the method includes the following steps:
S401, in response to a path connectivity detection instruction from a user, issuing first test data to the data query acceleration device.
When a user needs to detect the path connectivity between the data query acceleration device and the host device, the user initiates a path connectivity detection instruction to the fault detection device, for example by checking the path connectivity detection option in the visual interface of the fault detection device and confirming the start of detection. The fault detection device responds to the user's path connectivity detection instruction and issues first test data to the data query acceleration device. The first test data may be any piece of data configured in advance in the memory of the fault detection device by a device developer, or a piece of data generated by the fault detection device according to a certain rule or at random, which is not limited in this disclosure. Optionally, before issuing the first test data, the fault detection device may also issue a detection start signal to the data query acceleration device, so that the data query acceleration device starts receiving the first test data after it receives the detection start signal.
S402, judging whether the data query acceleration device has received the first test data. If yes, executing S403; if not, executing S404.
Specifically, whether the data query acceleration device has received the first test data is judged by monitoring a status register of the data query acceleration device.
The data query acceleration device is provided with a status register, in which each flag bit represents the status of a different function of the data query acceleration device. The first flag bit in the status register indicates whether the data query acceleration device has currently received data. For example, a first flag bit of 1 indicates that the data query acceleration device has received data, and a first flag bit of 0 indicates that it has not. Therefore, whether the data query acceleration device has received the first test data can be judged by monitoring the first flag bit of its status register.
S403, determining that the path between the data query acceleration device and the host device is connected.
When the status register of the data query acceleration device indicates that the data query acceleration device has received the first test data, it is determined that the path between the data query acceleration device and the host device is connected.
S404, determining that the path between the data query acceleration device and the host device is not connected.
When the status register of the data query acceleration device indicates that the data query acceleration device has not received the first test data, it is determined that the path between the data query acceleration device and the host device is not connected. Alternatively, a time limit may be preset, and when the status register indicates that the data query acceleration device has not received the first test data within the preset time, it is determined that the path between the data query acceleration device and the host device is not connected.
S405, feeding back the path connectivity detection result to the user.
In this embodiment, first test data is issued to the data query acceleration device in response to the user's path connectivity detection instruction, whether the data query acceleration device has received the first test data is judged in order to detect whether the path between the data query acceleration device and the host device is connected, and the path connectivity detection result is fed back to the user. The user can thus quickly determine whether data can be transmitted over the path between the data query acceleration device and the host device, which improves the efficiency of path connectivity detection.
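Steps S401 to S404, including the optional preset time, can be sketched as a flag-bit poll with a timeout. This is a minimal sketch under stated assumptions: the first flag bit is taken to be bit 0, and `send_test_data` and `read_status_register` are hypothetical callables standing in for the real bus access, which the disclosure does not specify.

```python
import time
from typing import Callable

RECEIVED_FLAG = 0x1  # assumption: the "first flag bit" is bit 0


def path_connected(
    send_test_data: Callable[[], None],
    read_status_register: Callable[[], int],
    timeout_s: float = 1.0,        # the "preset time"; the value is illustrative
    poll_interval_s: float = 0.01,
) -> bool:
    """Issue the first test data, then watch the first flag bit."""
    send_test_data()                                 # S401
    deadline = time.monotonic() + timeout_s
    while True:
        if read_status_register() & RECEIVED_FLAG:   # S402: flag is 1
            return True                              # S403: path connected
        if time.monotonic() >= deadline:
            return False                             # S404: path not connected
        time.sleep(poll_interval_s)
```

Masking with `RECEIVED_FLAG` isolates the one bit of interest, so other flag bits in the same status register do not affect the result.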
Fig. 5 is a flowchart of a device fault detection method according to another embodiment of the present disclosure. As shown in Fig. 5, the method includes the following steps:
S501, in response to a path integrity detection instruction from a user, calculating a first check code of data to be calculated.
When a user needs to detect the integrity of the path between the data query acceleration device and the host device, the user initiates a path integrity detection instruction to the fault detection device, for example by checking the path integrity detection option in the visual interface of the fault detection device and confirming the start of detection. In response to the user's path integrity detection instruction, the fault detection device first determines a piece of data to be calculated. The data to be calculated may be any piece of data configured in advance in the memory of the fault detection device by a device developer, or a piece of data generated by the fault detection device according to a certain rule or at random, which is not limited in this disclosure. After the data to be calculated is determined, the fault detection device processes it according to a preset calculation rule to obtain the first check code of the data to be calculated.
S502, issuing the data to be calculated to the data query acceleration device.
The fault detection device issues the data to be calculated to the data query acceleration device. Optionally, before issuing the data to be calculated, the fault detection device may also issue a detection start signal to the data query acceleration device, so that the data query acceleration device starts receiving the data to be calculated after it receives the detection start signal. The fault detection device may also issue the data to be calculated together with the first check code, which is not limited in this disclosure.
S503, acquiring a second check code from the data query acceleration device.
After the data query acceleration device receives the data to be calculated, it processes the data according to a preset calculation rule to obtain a second check code of the data to be calculated. The preset calculation rule in the data query acceleration device is the same as the preset calculation rule in the fault detection device, and a device developer can change the preset calculation rule as needed.
S504, judging whether the first check code is the same as the second check code. If yes, executing S505; if not, executing S506.
After the data query acceleration device obtains the second check code, it stores the second check code at a preset memory location in the data query acceleration device. The fault detection device reads the second check code from that preset memory location and judges whether the first check code is the same as the second check code.
S505, determining that the path between the data query acceleration device and the host device is complete.
If the first check code is the same as the second check code, the data to be calculated is considered to have been transmitted correctly from the fault detection device to the data query acceleration device over the path between the data query acceleration device and the host device, and the path is determined to be complete.
S506, determining that the path between the data query acceleration device and the host device is incomplete.
If the first check code is different from the second check code, an error is considered to have occurred while the data to be calculated was transmitted from the fault detection device to the data query acceleration device over the path between the data query acceleration device and the host device, so that the data received by the data query acceleration device differs from the data sent by the fault detection device. The path between the data query acceleration device and the host device is therefore determined to be incomplete.
S507, feeding back the path integrity detection result to the user.
In this embodiment of the present disclosure, the data query acceleration device and the fault detection device each process the data to be calculated according to the same calculation rule, and the two results are compared to judge whether the data can be transmitted correctly over the path, thereby determining whether the path between the data query acceleration device and the host device is complete.
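The comparison in S501 to S506 can be sketched as follows. The disclosure does not name the preset calculation rule, so CRC-32 from Python's standard `zlib` module is used here purely as an assumed stand-in; `send_to_device` and `read_device_check_code` are likewise hypothetical callables representing the bus transfer and the read of the preset memory location.

```python
import zlib
from typing import Callable


def check_code(data: bytes) -> int:
    """Assumed stand-in for the 'preset calculation rule' (CRC-32 here)."""
    return zlib.crc32(data)


def path_intact(
    data: bytes,
    send_to_device: Callable[[bytes], None],
    read_device_check_code: Callable[[], int],
) -> bool:
    """Compare the host-side check code with the one computed on the device."""
    first_code = check_code(data)            # S501: computed before sending
    send_to_device(data)                     # S502: data crosses the path
    second_code = read_device_check_code()   # S503: read from the device
    return first_code == second_code         # S504: equal means path complete
```

The first check code is computed before the data leaves the fault detection device, so any corruption introduced on the path affects only the second check code and shows up as a mismatch.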
Fig. 6 is a flowchart of a device fault detection method according to another embodiment of the present disclosure. As shown in Fig. 6, the method includes the following steps:
S601, in response to a data register state detection instruction and/or an instruction register state detection instruction from a user, issuing second test data to the data query acceleration device.
The data query acceleration device is used for storing the second test data into the data register and/or the instruction register.
The data register and the instruction register are registers specially designed in the data query acceleration device for storing the storage addresses of data or instructions in the memory of the data query acceleration device. When a data register or an instruction register is abnormal, the data query acceleration device cannot correctly locate the data or instructions in its memory, which causes data query errors or interruptions.
When a user needs to detect the state of the data register and/or the instruction register, the user initiates a data register state detection instruction and/or an instruction register state detection instruction to the fault detection device, for example by checking the data register state detection option and/or the instruction register state detection option in the visual interface of the fault detection device and confirming the start of detection. The fault detection device responds to the instruction and issues second test data to the data query acceleration device, and the data query acceleration device stores the second test data into the data register and/or the instruction register. Specifically, since the data query acceleration device is provided with a plurality of data registers and/or instruction registers, one or more specified registers may be detected according to the user's instruction, or all data registers and/or instruction registers may be detected in sequence.
When one or more specified data registers and/or instruction registers are detected, the second test data includes the corresponding register number information, and the data query acceleration device stores the second test data into the corresponding registers according to that information. When all data registers and/or instruction registers are detected, the data query acceleration device stores the second test data into each data register and/or instruction register in sequence.
To ensure that the data register and/or the instruction register can still be used normally, the data they currently hold needs to be backed up to the host device before the second test data is issued to the data query acceleration device.
S602, reading the data currently stored in the data register and/or the instruction register.
The fault detection device reads the currently stored data in each data register and/or instruction register to be detected in turn through the bus.
S603, judging whether the data currently stored in the data register and/or the instruction register is the same as the second test data. If yes, go to S604; if not, go to S605.
S604, determining that the data register and/or the instruction register are normal.
When the data currently stored in the data register and/or the instruction register is the same as the second test data, the data register and/or the instruction register is considered normal.
S605, determining that the data register and/or the instruction register are abnormal.
When the data currently stored in the data register and/or the instruction register differs from the second test data, it is considered that an error occurred either while the data query acceleration device was storing the second test data into the data register and/or the instruction register, or while the register was holding the second test data, and it is determined that the data register and/or the instruction register is abnormal.
S606, restoring the data in the data register and/or the instruction register.
The original data of the data register and/or the instruction register that was backed up to the host device in S601 is restored to the data register and/or the instruction register.
S607, feeding back the detection result of the data register state and/or the detection result of the instruction register state to the user.
The embodiment of the present disclosure determines whether the data register and/or the instruction register is working normally by reading back the data that the data query acceleration device stored in the register and checking whether it is the same as the data issued by the fault detection device. A plurality of data registers and/or instruction registers can be detected automatically in one run, which further improves the efficiency of the device fault detection method.
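The write, read-back and restore sequence of S601 to S606 can be sketched as follows. This is only an illustration under stated assumptions: `SimulatedRegisterFile` is a hypothetical in-memory model of the device registers (a real implementation would access them over the bus), and the `stuck` parameter is an invented way to simulate a faulty register.

```python
from typing import Dict, Iterable, Optional


class SimulatedRegisterFile:
    """Hypothetical model of the device's data and/or instruction registers."""

    def __init__(self, count: int, stuck: Optional[Dict[int, int]] = None):
        self._values = [0] * count
        # Registers listed in `stuck` ignore writes and always hold a fixed value.
        self._stuck = dict(stuck or {})

    def __len__(self) -> int:
        return len(self._values)

    def write(self, index: int, value: int) -> None:
        if index not in self._stuck:
            self._values[index] = value

    def read(self, index: int) -> int:
        return self._stuck.get(index, self._values[index])


def check_registers(regs: SimulatedRegisterFile,
                    test_value: int,
                    indices: Optional[Iterable[int]] = None) -> Dict[int, bool]:
    """Return {register number: True if normal, False if abnormal}."""
    # Either the specified registers or all registers in sequence.
    targets = range(len(regs)) if indices is None else indices
    results = {}
    for i in targets:
        backup = regs.read(i)                    # back up the original content first
        regs.write(i, test_value)                # S601: store the second test data
        results[i] = regs.read(i) == test_value  # S602 to S605: read back and compare
        regs.write(i, backup)                    # S606: restore the original content
    return results
```

Passing `indices` corresponds to detecting one or more specified registers by number, while omitting it corresponds to detecting all registers in sequence.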
On the basis of the above embodiments, the device fault detection method provided by the present disclosure further includes: resetting the data query acceleration device in response to a reset instruction of a user.
Specifically, resetting the data query acceleration device in response to a reset instruction of the user may be implemented by a flow shown in fig. 7. As shown in fig. 7, the method includes the following steps:
S701, in response to a reset instruction of a user, issuing a reset code to the data query acceleration device.
The data query acceleration device is used for storing the reset code into a preset register.
When a user needs to reset the data query acceleration device, the user initiates a reset instruction to the fault detection device. In response to the reset instruction, the fault detection device issues a reset code to the data query acceleration device, and the data query acceleration device stores the reset code into a preset register. For example, the reset code 0x01 may be issued to the data query acceleration device, and the data query acceleration device stores the reset code in the 0x00 register. Alternatively, the reset code may consist of multiple code segments: for example, the code 0x01 is issued to the data query acceleration device, followed by the code 0x02, and the data query acceleration device stores the code 0x01 and the code 0x02 in the 0x00 register in sequence. The present disclosure does not limit this.
S702, judging whether the data stored in the preset register is the reset code. If yes, go to S703; if not, return to S701.
After issuing the reset code to the data query acceleration device, the fault detection device reads the data stored in the preset register and judges whether it is the reset code. If the data stored in the preset register is not the reset code, the reset code is issued to the data query acceleration device again, and the data query acceleration device stores the reset code into the preset register. If the reset code consists of multiple code segments, the judgment is performed once for each segment. For example, after the code 0x01 is issued, it is judged whether the data stored in the 0x00 register is 0x01; if so, the code 0x02 is issued, and it is judged whether the data stored in the 0x00 register is 0x02; if so, the next step is executed.
S703, monitoring a second flag bit of a status register of the data query acceleration device.
When the data in the preset register of the data query acceleration device is the reset code, the data query acceleration device executes the reset operation. The state of the second flag bit of the status register of the data query acceleration device is used for indicating whether the data query acceleration device is reset or not.
S704, judging whether the data query acceleration device is reset successfully. If yes, go to S705; if not, go to S706.
S705, the reset operation of the data query acceleration device is completed.
S706, determining that the data query acceleration device fails to reset.
If the second flag bit of the status register of the data query acceleration device indicates that the data query acceleration device is successfully reset, completing the reset operation of the data query acceleration device; otherwise, determining that the data query acceleration device fails to reset.
According to the embodiment of the present disclosure, the data query acceleration device can be reset quickly by writing the reset code into the preset register of the data query acceleration device, without restarting the computer or reinstalling the driver. This improves the reset efficiency of the data query acceleration device and reduces hardware wear.
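The reset flow of S701 to S706 can be sketched as below. The register address 0x00 and the codes 0x01 and 0x02 come from the example in the text. Everything else is an assumption made for illustration: the `SimulatedDevice` class, the `RESET_DONE_FLAG` bit mask, and the bounded retry and polling counts (the text itself simply returns to S701 without stating a limit).

```python
RESET_REGISTER = 0x00    # preset register from the example in the text
RESET_DONE_FLAG = 0x02   # hypothetical mask for the second flag bit of the status register


class SimulatedDevice:
    """Hypothetical model of the data query acceleration device."""

    def __init__(self, reset_works: bool = True, expected=(0x01, 0x02)):
        self._regs = {RESET_REGISTER: 0x00}
        self._status = 0
        self._reset_works = reset_works
        self._expected = tuple(expected)
        self._received = []

    def write(self, addr: int, value: int) -> None:
        self._regs[addr] = value
        if addr == RESET_REGISTER:
            self._received.append(value)
            tail = tuple(self._received[-len(self._expected):])
            # The device resets once the full code sequence has been written.
            if tail == self._expected and self._reset_works:
                self._status |= RESET_DONE_FLAG

    def read(self, addr: int) -> int:
        return self._regs[addr]

    def read_status(self) -> int:
        return self._status


def reset_device(device, reset_codes=(0x01, 0x02), max_writes=3, max_polls=5) -> bool:
    for code in reset_codes:
        for _ in range(max_writes):
            device.write(RESET_REGISTER, code)       # S701: issue the reset code
            if device.read(RESET_REGISTER) == code:  # S702: verify the preset register
                break
        else:
            return False  # the code was never stored correctly (retry limit is an assumption)
    for _ in range(max_polls):                       # S703: monitor the second flag bit
        if device.read_status() & RESET_DONE_FLAG:   # S704: reset succeeded?
            return True                              # S705: reset operation completed
    return False                                     # S706: reset failed
```

A single-segment reset code is covered by passing a one-element `reset_codes` tuple.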
Fig. 8 is a schematic structural diagram of a device fault detection apparatus provided in an embodiment of the present disclosure. The device fault detection apparatus may be the fault detection device described in the above embodiments, or a component or assembly of the fault detection device. The device fault detection apparatus provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiments of the device fault detection method. As shown in Fig. 8, the device fault detection apparatus 80 includes an issuing module 81, a determining module 82 and a feedback module 83. The issuing module 81 is configured to issue test data to the data query acceleration device in response to a fault detection instruction of a user; the determining module 82 is configured to determine a fault detection result of the data query acceleration device based on state information of the data query acceleration device when processing the test data; the feedback module 83 is configured to feed back the fault detection result to the user.
Optionally, the issuing module 81 includes a first issuing unit 811, configured to issue first test data to the data query acceleration device in response to a path connectivity detection instruction of a user; the determining module 82 includes a first determining unit 821, configured to monitor a status register of the data query acceleration device, and determine that the path between the data query acceleration device and the host device is not connected if a first flag bit in the status register indicates that the data query acceleration device has not received the first test data.
Optionally, the issuing module 81 includes a second issuing unit 812, configured to calculate a first check code of the data to be calculated in response to a path integrity detection instruction of a user, and to issue the data to be calculated to the data query acceleration device, where the data query acceleration device is configured to calculate a second check code of the data to be calculated; the determining module 82 includes a second determining unit 822, configured to obtain the second check code from the data query acceleration device, and determine that the path between the data query acceleration device and the host device is incomplete if the first check code differs from the second check code.
Optionally, the issuing module 81 includes a third issuing unit 813, configured to issue second test data to the data query acceleration device in response to a data register state detection instruction and/or an instruction register state detection instruction of a user, where the data query acceleration device is configured to store the second test data in a data register and/or an instruction register; the determining module 82 comprises a third determining unit 823, configured to read the currently stored data in the data register and/or the instruction register, and compare the second test data with the currently stored data in the data register and/or the instruction register; and if the second test data is different from the data currently stored in the data register and/or the instruction register, determining that the data register and/or the instruction register is abnormal.
Optionally, the device fault detection apparatus 80 further includes a reset module 84, configured to reset the data query acceleration device in response to a reset instruction of a user.
Optionally, the reset module 84 is further configured to issue a reset code to the data query acceleration device, where the data query acceleration device is configured to store the reset code in a preset register; and to monitor a status register of the data query acceleration device and, if a second flag bit in the status register indicates that the data query acceleration device is successfully reset, complete the reset operation of the data query acceleration device.
The device fault detection apparatus in the embodiment shown in Fig. 8 may be used to implement the technical solutions of the above method embodiments. Its implementation principle and technical effect are similar and are not described here again.
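The division into an issuing module, a determining module and a feedback module can be pictured with the following minimal sketch. It only illustrates how the three modules cooperate. The class name, the callable-based wiring and the result strings are assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeviceFaultDetectionApparatus:
    issue: Callable[[bytes], None]    # plays the role of issuing module 81
    determine: Callable[[], str]      # plays the role of determining module 82
    feed_back: Callable[[str], None]  # plays the role of feedback module 83

    def run(self, test_data: bytes) -> str:
        self.issue(test_data)         # issue test data to the device
        result = self.determine()     # derive the fault detection result from device state
        self.feed_back(result)        # report the result to the user
        return result


# Usage with in-memory stand-ins for the device and the user interface.
sent, shown = [], []
apparatus = DeviceFaultDetectionApparatus(
    issue=sent.append,
    determine=lambda: "normal" if sent else "not connected",
    feed_back=shown.append,
)
outcome = apparatus.run(b"\x01")
```

Keeping the three roles as separate callables mirrors the optional units described above, since a connectivity, integrity or register check could each supply its own issuing and determining logic.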
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device may be the fault detection device described in the above embodiments. The electronic device provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiments of the device fault detection method. As shown in Fig. 9, the electronic device 90 includes a memory 91, a processor 92, a computer program and a communication interface 93, where the computer program is stored in the memory 91 and is configured to be executed by the processor 92 to implement the device fault detection method described above.
In addition, the embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the device fault detection method described in the above embodiment.
Furthermore, the embodiments of the present disclosure also provide a computer program product, which includes a computer program or instructions; when the computer program or instructions are executed by a processor, the device fault detection method described above is implemented.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of device fault detection, the method comprising:
responding to a fault detection instruction of a user, and issuing test data to the data query acceleration equipment;
determining a fault detection result of the data query acceleration equipment based on state information of the data query acceleration equipment when the data query acceleration equipment processes the test data;
and feeding back the fault detection result to a user.
2. The method of claim 1, wherein issuing test data to the data query acceleration device in response to a fault detection instruction of a user comprises:
responding to a path connectivity detection instruction of a user, and issuing first test data to the data query acceleration equipment;
determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when processing the test data, including:
monitoring a status register of the data query acceleration equipment, and if a first flag bit in the status register indicates that the data query acceleration equipment does not receive the first test data, determining that a path between the data query acceleration equipment and host equipment is not connected.
3. The method of claim 1, wherein issuing test data to the data query acceleration device in response to a fault detection instruction of a user comprises:
responding to a path integrity detection instruction of a user, and calculating a first check code of data to be calculated;
issuing the data to be calculated to the data query acceleration equipment, wherein the data query acceleration equipment is used for calculating a second check code of the data to be calculated;
determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when processing the test data, including:
and acquiring a second check code in the data query acceleration equipment, and if the first check code is different from the second check code, determining that a path between the data query acceleration equipment and the host equipment is incomplete.
4. The method of claim 1, wherein issuing test data to the data query acceleration device in response to a fault detection instruction from a user comprises:
responding to a data register state detection instruction and/or an instruction register state detection instruction of a user, and issuing second test data to the data query acceleration equipment, wherein the data query acceleration equipment is used for storing the second test data into a data register and/or an instruction register;
determining a fault detection result of the data query acceleration device based on the state information of the data query acceleration device when processing the test data, including:
reading currently stored data in the data register and/or the instruction register, and comparing the second test data with the currently stored data in the data register and/or the instruction register;
and if the second test data is different from the data currently stored in the data register and/or the instruction register, determining that the data register and/or the instruction register is abnormal.
5. The method of claim 1, further comprising:
and resetting the data query acceleration device in response to a reset instruction of a user.
6. The method of claim 5, wherein resetting the data query acceleration device comprises:
issuing a reset code to the data query acceleration equipment, wherein the data query acceleration equipment is used for storing the reset code into a preset register;
and monitoring a status register of the data query acceleration equipment, and if a second flag bit in the status register indicates that the data query acceleration equipment is successfully reset, finishing the reset operation of the data query acceleration equipment.
7. A device fault detection apparatus, comprising:
the issuing module is used for responding to a fault detection instruction of a user and issuing test data to the data query acceleration equipment;
the determining module is used for determining a fault detection result of the data query acceleration equipment based on state information of the data query acceleration equipment when the data query acceleration equipment processes the test data;
and the feedback module is used for feeding back the fault detection result to a user.
8. The apparatus of claim 7, further comprising:
and the resetting module is used for responding to a resetting instruction of a user and resetting the data query acceleration equipment.
9. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202210640768.XA 2022-06-07 2022-06-07 Equipment fault detection method, device, equipment and computer readable storage medium Pending CN115220972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210640768.XA CN115220972A (en) 2022-06-07 2022-06-07 Equipment fault detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210640768.XA CN115220972A (en) 2022-06-07 2022-06-07 Equipment fault detection method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115220972A true CN115220972A (en) 2022-10-21

Family

ID=83608746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210640768.XA Pending CN115220972A (en) 2022-06-07 2022-06-07 Equipment fault detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115220972A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination