CN114780323A - Fault detection method, device and equipment for memory in server - Google Patents

Info

Publication number
CN114780323A
Authority
CN
China
Prior art keywords
memory
pressure measurement
fault
memory block
target
Prior art date
Legal status
Pending
Application number
CN202210685982.7A
Other languages
Chinese (zh)
Inventor
闫剑锋
卢双堂
傅先刚
陈昊
Current Assignee
New H3C Information Technologies Co Ltd
Original Assignee
New H3C Information Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by New H3C Information Technologies Co Ltd
Priority to CN202210685982.7A
Publication of CN114780323A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The method maps the memory addresses in a server into a memory space and divides the memory space into memory blocks. When each memory block is fault-tested, a target memory address read/write mode is selected from a plurality of different read/write modes associated with the block, and a target stress test algorithm is selected from a plurality of different associated stress test algorithms. The block is stress-tested according to the target read/write mode and the target stress test algorithm, and if a fault is detected, the number of memory faults of the block is recorded. When the accumulated fault records of the memory blocks reach the configured memory fault reporting threshold, stress testing of the memory blocks ends and the fault information of the faulty memory block is reported. The embodiments can therefore improve both the efficiency and the detection rate of memory fault detection.

Description

Fault detection method, device and equipment for memory in server
Technical Field
The present disclosure relates to fault detection technologies, and in particular, to a method, an apparatus, and a device for detecting a fault in a memory of a server.
Background
In a server, the memory, the mainboard and the CPU are the three major components, and of the three, memory accounts for the highest percentage of server faults. A memory fault is highly likely to bring the server down, interrupting business operations and causing large economic losses. When a server sold abroad fails, repair is especially difficult because of the distance, which puts great pressure on the manufacturer's after-sales service. Improving the detection rate of faulty memory at the production stage is therefore important for server quality.
At present, memory is usually tested by running memtest algorithms over all virtual addresses allocated to the memory in the server. In the prior art, the memory detection rate is generally improved by adding more stress test algorithms. However, although many memory stress test algorithms exist, the underlying techniques are mature and a breakthrough is difficult. As a result, the only way to raise the detection rate for production servers has been to keep lengthening the memory stress test time, which makes memory detection inefficient and greatly increases production cost.
Disclosure of Invention
The application provides a memory fault detection method, apparatus and device for improving detection efficiency and the fault detection rate.
The technical solution provided by the application is as follows:
In a first aspect, an embodiment of the present application provides a method for detecting faults in a memory of a server, including:
mapping the memory addresses in the server into a memory space, and dividing the memory space to obtain N memory blocks;
when fault testing is performed on the N memory blocks, performing the following stress test operations on each memory block: selecting a target memory address read/write mode from a plurality of different memory address read/write modes associated with the memory block, and selecting a target stress test algorithm from a plurality of different stress test algorithms associated with the memory block; stress-testing the memory block according to the target memory address read/write mode and the target stress test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block;
and when the accumulated fault records corresponding to the memory blocks reach a configured memory fault reporting threshold, ending the stress testing of the memory blocks and reporting the fault information of the faulty memory block.
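The three steps of the first aspect can be sketched as a single loop. This is a minimal Python simulation under stated assumptions, not the applicant's implementation: the mode and algorithm labels, the stress_pass placeholder (which only flags blocks in a simulated faulty set), and the reading that the threshold applies to each block's own fault count are all illustrative. Blocks are visited round-robin here for brevity, whereas the description allows them to be tested in parallel.

```python
import random
from dataclasses import dataclass

# Hypothetical labels; the patent names the modes and algorithms only in prose.
ADDRESS_MODES = ["high_to_low", "low_to_high", "ends_to_middle", "middle_to_ends"]
ALGORITHMS = ["checkerboard", "multiplication", "division", "walking_1", "walking_0"]


@dataclass
class Block:
    index: int
    fault_count: int = 0


def stress_pass(block: Block, mode: str, algorithm: str, faulty: set) -> bool:
    """Placeholder for one stress pass: flags a fault only for blocks in the
    simulated `faulty` set, regardless of the chosen mode and algorithm."""
    return block.index in faulty


def run_fault_detection(n_blocks: int, threshold: int, faulty: set,
                        max_rounds: int = 100, seed: int = 0) -> list:
    rng = random.Random(seed)
    blocks = [Block(i) for i in range(n_blocks)]            # step 101: N blocks
    for _ in range(max_rounds):
        for block in blocks:                                # step 102: each block
            mode = rng.choice(ADDRESS_MODES)                # target read/write mode
            algorithm = rng.choice(ALGORITHMS)              # target stress test algorithm
            if stress_pass(block, mode, algorithm, faulty):
                block.fault_count += 1                      # record the fault
                if block.fault_count >= threshold:          # step 103
                    # Step 104: stop testing and report every block with faults.
                    return [b.index for b in blocks if b.fault_count > 0]
    return []


print(run_fault_detection(n_blocks=8, threshold=2, faulty={3, 5}))  # [3, 5]
```

With a threshold of 2, block 3 reaches its second fault in the second round, at which point block 5 has already recorded one fault, so both are reported.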
In a second aspect, an embodiment of the present application provides an apparatus for detecting faults in a memory of a server, including:
a blocking unit, configured to map the memory addresses in the server into a memory space and divide the memory space to obtain N memory blocks;
a fault detection unit, configured to perform the following stress test operations on each memory block when fault testing is performed on the N memory blocks: selecting a target memory address read/write mode from a plurality of different memory address read/write modes associated with the memory block, and selecting a target stress test algorithm from a plurality of different stress test algorithms associated with the memory block; stress-testing the memory block according to the target memory address read/write mode and the target stress test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block;
and a fault reporting unit, configured to end the stress testing of the memory blocks and report the fault information of the faulty memory block when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold.
According to the above technical solution, the server memory is divided into N memory blocks. When the N memory blocks are fault-tested, a target memory address read/write mode is selected for each memory block from the multiple different read/write modes associated with it, and a target stress test algorithm is selected from the multiple different associated stress test algorithms. The memory block is stress-tested according to the target read/write mode and the target stress test algorithm, and the number of memory faults of the block is recorded if a fault is detected. When the accumulated fault records reach the configured memory fault reporting threshold, stress testing of the memory blocks ends and the fault information of the faulty memory block is reported. The embodiments of the application can therefore stress-test the memory blocks in parallel, and the stress test algorithm and read/write mode used for each block may differ. Memory blocks can thus be accessed in several different address orders and stress-tested with several algorithms, combining the strengths of multiple stress test algorithms and multiple read/write modes. This improves detection efficiency and the memory fault detection rate while reducing production cost.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a method for detecting faults in a memory of a server according to the present application;
Fig. 2(a) is a schematic diagram of a first memory address read/write mode provided in the present application;
Fig. 2(b) is a schematic diagram of a second memory address read/write mode provided in the present application;
Fig. 2(c) is a schematic diagram of a third memory address read/write mode provided in the present application;
Fig. 2(d) is a schematic diagram of a fourth memory address read/write mode provided in the present application;
Fig. 3 is a schematic structural diagram of a fault detection apparatus for a memory in a server according to the present application;
Fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In a server, the memory, the mainboard and the CPU are the three major components, and of the three, memory causes the highest percentage of server failures. A memory fault is highly likely to bring the server down, interrupting business operations and causing large economic losses. In particular, when a server sold abroad fails, maintenance is extremely difficult because of the distance, which puts great pressure on the manufacturer's after-sales service. Improving the detection rate of faulty memory at the production stage is therefore important for server quality.
At present, memory is usually tested by running memtest algorithms over all virtual addresses allocated to the memory in the server. However, existing stress testing achieves a low memory detection rate at the production stage. For memory with batch defects in particular, the low detection rate allows large numbers of faulty memory modules to reach the market, causing heavy economic losses. In addition, because servers are used in complex environments, memory manufacturing requires ever higher process precision, and the spacing between storage cells in memory chips keeps shrinking. The proportion of coupling-related faults rises accordingly, and this high proportion results in a current memory fault reproduction rate of 40%-50%, a phenomenon that will become increasingly pronounced. The detection rate of faulty memory in server production therefore needs to be improved to address these problems.
However, in the prior art, the memory detection rate is generally improved by adding more stress test algorithms, and although many memory stress test algorithms exist, the underlying techniques are mature and a breakthrough is difficult. To improve the memory detection rate for production servers, the memory stress test time can therefore only be lengthened again and again, which makes memory detection inefficient and greatly increases production cost.
To solve the foregoing technical problem, an embodiment of the present application provides a method for detecting faults in a memory of a server, including: mapping the memory addresses in the server into a memory space and dividing the memory space to obtain N memory blocks; when fault testing is performed on the N memory blocks, performing the following stress test operations on each memory block: selecting a target memory address read/write mode from a plurality of different read/write modes associated with the memory block, and selecting a target stress test algorithm from a plurality of different stress test algorithms associated with the memory block; stress-testing the memory block according to the target read/write mode and the target stress test algorithm, and recording the number of memory faults of the memory block if a fault is detected; and when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, ending the stress testing of the memory blocks and reporting the fault information of the faulty memory block.
It can be seen that the embodiments of the application can stress-test the memory blocks in parallel, and the stress test algorithm and read/write mode used for each memory block may differ. Each memory block can therefore be accessed in a variety of address orders and stress-tested with a variety of algorithms, combining the advantages of multiple stress test algorithms with those of multiple read/write modes, which improves detection efficiency and the memory fault detection rate while reducing production cost.
Based on the above description, the flow shown in Fig. 1 of the present application is described below.
Referring to Fig. 1, Fig. 1 is a flowchart of a method for detecting faults in a memory of a server according to the present application. The method can be applied to an electronic device that is connected to a server and detects the memory in the connected server, or it can be applied to the server itself to detect its own memory.
As shown in Fig. 1, the flow may include the following steps:
Step 101: map the memory addresses in the server into a memory space, and divide the memory space to obtain N memory blocks.
In practical applications, a server is generally equipped with multiple memory modules, each corresponding to a different range of physical addresses. In this embodiment, the physical addresses of the memory modules are mapped into one large memory space, and N is a positive integer; the memory space thus consists of the physical addresses of the memory. The memory blocks should be neither too small nor too large. If a block is too small, data is read back almost immediately after it is written, which impairs the detection rate of faulty memory. If a block is too large, each pass over the block takes too long, so data written to the block is read back only after a long interval, which also impairs the detection rate of faulty memory. Based on this, as an embodiment, dividing the memory space in step 101 includes: dividing the memory space such that the access rate of each resulting memory block falls within a preset rate range. Since physical addresses are all multiples of 8 and 1024 × 4 (4096) bytes is the smallest unit of memory access, in some embodiments N is 4096 multiplied by a power of 2.
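A sketch of the division in step 101, assuming the literal reading that N itself is 4096 × 2^k and that the memory space divides evenly into N blocks. The base address, the 64 GiB size and the function name partition are hypothetical, and the preset access-rate range is not modelled: the caller chooses k.

```python
PAGE = 1024 * 4  # smallest memory access unit named in the description (4 KiB)


def partition(base: int, total_size: int, k: int) -> list:
    """Divide [base, base + total_size) into N = 4096 * 2**k equal blocks.

    Returns (low, high) address pairs, with high exclusive."""
    n = 4096 * 2 ** k
    if base % 8:
        raise ValueError("physical addresses must be multiples of 8")
    if total_size % n:
        raise ValueError("memory space does not divide evenly into N blocks")
    block_size = total_size // n
    if block_size % PAGE:
        raise ValueError("each block must be a whole number of 4 KiB units")
    return [(base + i * block_size, base + (i + 1) * block_size) for i in range(n)]


blocks = partition(base=0x1000_0000, total_size=64 * 1024 ** 3, k=0)
print(len(blocks), hex(blocks[0][1] - blocks[0][0]))  # 4096 0x1000000
```

With 64 GiB and k = 0, each of the 4096 blocks is 16 MiB; increasing k halves the block size each time.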
Step 102: when fault testing is performed on the N memory blocks, perform the following stress test operations on each memory block: select a target memory address read/write mode from a plurality of different memory address read/write modes associated with the memory block, and select a target stress test algorithm from a plurality of different stress test algorithms associated with the memory block; stress-test the memory block according to the target read/write mode and the target stress test algorithm, and record the number of memory faults of the memory block if a fault is detected.
The associated memory address read/write modes may be configured in advance in a memory address read/write mode file, so that when the N memory blocks are fault-tested, one read/write mode is selected from that file. Alternatively, the associated read/write modes may be those that correspond to the memory block under a preconfigured mapping between memory blocks and read/write modes, one of which is then selected. Likewise, the associated stress test algorithms may be configured in advance in a stress test algorithm file, so that one algorithm is selected from that file when the N memory blocks are fault-tested, or they may be those that correspond to the memory block under a mapping between memory blocks and stress test algorithms. How the target read/write mode and the target stress test algorithm are selected for a memory block is described in detail below.
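One way to hold the associated candidates is a configuration with global lists plus optional per-block entries, which covers both the file-based and the mapping-based variants described above. The JSON layout, the key names and the candidates/pick_targets helpers are assumptions for illustration only, and random picking stands in for whichever selection policy is configured.

```python
import json
import random

# Hypothetical configuration; layout and key names are illustrative only.
CONFIG = json.loads("""
{
  "address_modes": ["high_to_low", "low_to_high", "ends_to_middle", "middle_to_ends"],
  "algorithms": ["checkerboard", "multiplication", "division", "walking_1", "walking_0"],
  "per_block": {
    "2": {"address_modes": ["low_to_high"], "algorithms": ["walking_1", "walking_0"]}
  }
}
""")


def candidates(block_index: int, config: dict = CONFIG) -> tuple:
    """Return the (read/write modes, stress test algorithms) associated with a block:
    a per-block mapping entry if one exists, otherwise the global lists."""
    override = config["per_block"].get(str(block_index), {})
    return (override.get("address_modes", config["address_modes"]),
            override.get("algorithms", config["algorithms"]))


def pick_targets(block_index: int, rng: random.Random) -> tuple:
    """Pick one target mode and one target algorithm (randomly, in this sketch)."""
    modes, algorithms = candidates(block_index)
    return rng.choice(modes), rng.choice(algorithms)


print(candidates(2))  # (['low_to_high'], ['walking_1', 'walking_0'])
```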
When the physical addresses of a memory block are sorted in order, the first physical address in the sequence is called the low address and the last is called the high address. As an embodiment, the memory address read/write modes in the file may be: from the high address to the low address, as indicated by the arrow in Fig. 2(a); from the low address to the high address, as indicated by the arrow in Fig. 2(b); from the addresses at both ends toward the middle address, as indicated by the arrows in Fig. 2(c); or from the middle address toward the addresses at both ends, as indicated by the arrows in Fig. 2(d).
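The four read/write modes of Fig. 2(a) to Fig. 2(d) can be expressed as orderings over a block's 8-byte-aligned addresses. This sketch assumes that "from both ends to the middle" alternates between the low and high ends, and that "from the middle to both ends" is the reverse of that sequence; the figures may intend a different interleaving. Building the full list is only practical for small demonstration blocks.

```python
def address_order(low: int, high: int, mode: str, step: int = 8) -> list:
    """Addresses in [low, high) in the order given by one read/write mode."""
    addrs = list(range(low, high, step))          # sorted: low address first
    if mode == "low_to_high":                     # Fig. 2(b)
        return addrs
    if mode == "high_to_low":                     # Fig. 2(a)
        return addrs[::-1]
    # Fig. 2(c), assumed interleaving: low, high, low+1, high-1, ... toward the middle.
    ends_to_middle = []
    i, j = 0, len(addrs) - 1
    while i <= j:
        ends_to_middle.append(addrs[i])
        if i != j:
            ends_to_middle.append(addrs[j])
        i, j = i + 1, j - 1
    if mode == "ends_to_middle":
        return ends_to_middle
    if mode == "middle_to_ends":                  # Fig. 2(d): the reverse sequence
        return ends_to_middle[::-1]
    raise ValueError(f"unknown read/write mode: {mode}")


for mode in ("high_to_low", "low_to_high", "ends_to_middle", "middle_to_ends"):
    print(mode, address_order(0, 48, mode))
```

For a six-word block at addresses 0 to 40, ends_to_middle yields [0, 40, 8, 32, 16, 24] and middle_to_ends yields [24, 16, 32, 8, 40, 0]; every mode visits each address exactly once.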
As another embodiment, the stress test algorithm, also called a memory fault test algorithm, may be a checkerboard algorithm, a multiplication algorithm, a division algorithm, a walking-1 algorithm, or a walking-0 algorithm.
For example, for memory block a1, the checkerboard algorithm is run in the high-to-low read/write mode, and for memory block a2, the walking-1 algorithm is run in the low-to-high read/write mode.
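The example above (checkerboard on a1 from high to low, walking 1 on a2 from low to high) can be simulated on a toy block of 64-bit words with an injected stuck-at-0 bit. The SimBlock class, the fault model and the exact bit patterns (0x55.../0xAA... words for the checkerboard, one set bit per pass for walking 1) are common textbook choices assumed here; the description does not specify them, and the multiplication and division algorithms are not sketched.

```python
MASK64 = (1 << 64) - 1


class SimBlock:
    """Hypothetical memory block of 64-bit words with optional stuck-at-0 bits."""

    def __init__(self, low: int, high: int, stuck_zero=None):
        self.words = {addr: 0 for addr in range(low, high, 8)}
        self.stuck_zero = stuck_zero or {}        # address -> bit index forced to 0

    def write(self, addr: int, value: int) -> None:
        bit = self.stuck_zero.get(addr)
        if bit is not None:
            value &= MASK64 ^ (1 << bit)          # the faulty cell cannot hold a 1
        self.words[addr] = value

    def read(self, addr: int) -> int:
        return self.words[addr]


def checkerboard(mem: SimBlock, order: list) -> int:
    """Write alternating 0x55.../0xAA... words, verify, then repeat inverted."""
    errors = 0
    for pattern in (0x5555_5555_5555_5555, 0xAAAA_AAAA_AAAA_AAAA):
        def expected(addr):
            return pattern if (addr // 8) % 2 == 0 else pattern ^ MASK64
        for addr in order:
            mem.write(addr, expected(addr))
        for addr in order:
            if mem.read(addr) != expected(addr):
                errors += 1
    return errors


def walking_ones(mem: SimBlock, order: list) -> int:
    """For each bit position, write a word with only that bit set, then verify."""
    errors = 0
    for bit in range(64):
        for addr in order:
            mem.write(addr, 1 << bit)
        for addr in order:
            if mem.read(addr) != 1 << bit:
                errors += 1
    return errors


a1 = SimBlock(0, 64, stuck_zero={16: 3})   # bit 3 of the word at address 16 is stuck at 0
a2 = SimBlock(64, 128)                      # healthy block
print(checkerboard(a1, list(range(0, 64, 8))[::-1]))   # high -> low: 1 error
print(walking_ones(a2, list(range(64, 128, 8))))       # low -> high: 0 errors
```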
Step 103: when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, execute step 104.
The fault records of each memory block are accumulated, and step 104 is executed when the accumulated records reach the configured memory fault reporting threshold. The threshold may be configured in the BIOS (Basic Input Output System). If the threshold is set to 1, step 104 is executed as soon as one fault is recorded; if it is set to 2, step 104 is executed once the accumulated fault records of a memory block reach 2. The threshold value depends on the actual requirement on the memory fault rate: the stricter the requirement, the lower the threshold should be set, and the looser the requirement, the higher it can be set.
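The accumulation in step 103 reduces to a per-block counter compared against the configured threshold. In this sketch the threshold is passed in directly (in practice it might come from a BIOS setup option), and FaultRecorder is a hypothetical name.

```python
from collections import Counter


class FaultRecorder:
    """Accumulates fault records per memory block and signals when a block's
    count reaches the configured memory fault reporting threshold."""

    def __init__(self, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.counts = Counter()

    def record(self, block: int) -> bool:
        """Record one fault; True means step 104 (stop and report) should run."""
        self.counts[block] += 1
        return self.counts[block] >= self.threshold


strict = FaultRecorder(threshold=1)
print(strict.record(7))                       # True: report on the first fault
lenient = FaultRecorder(threshold=2)
print(lenient.record(7), lenient.record(7))   # False True
```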
Step 104: end the stress testing of the memory blocks, and report the fault information of the faulty memory block.
In this step, as an embodiment, the ECC (Error Correcting Code, error checking and correction) checking tool in the server may be triggered to report the fault information of the faulty memory block, both directly to the operating system and to the baseboard management controller (BMC).
This completes the description of the flow shown in Fig. 1.
As can be seen from the above technical solutions, the method divides the server memory into N memory blocks. When the N memory blocks are fault-tested, a target memory address read/write mode and a target stress test algorithm are selected for each memory block from the multiple different read/write modes and stress test algorithms associated with it; the block is stress-tested according to the selected mode and algorithm, and its number of memory faults is recorded if a fault is detected; and when the accumulated fault records reach the configured memory fault reporting threshold, stress testing ends and the fault information of the faulty memory block is reported. Because the memory blocks can be stress-tested in parallel with possibly different algorithms and read/write modes, each block can be accessed in multiple address orders and exercised by multiple stress test algorithms. Combining these strengths improves detection efficiency and the memory fault detection rate while reducing production cost.
Building on the flow shown in Fig. 1, as an embodiment, the stress testing of the memory block in step 102 according to the target read/write mode and the target stress test algorithm is performed when the stress test start time allocated to the memory block arrives. Whether or not a fault is detected in the memory block, the method further includes: checking whether the current time is the stress test end time allocated to the memory block, and if not, returning to perform the stress test operations for the memory block again.
In this embodiment, the stress test duration, that is, the end time minus the start time, is a duration commonly used in the art. The start and end times may be allocated uniformly, so that all memory blocks share the same start time and end time, or allocated individually, so that different memory blocks may have different start and end times.
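The time-window behaviour of this embodiment can be sketched as a loop that waits for the block's allocated start time, runs a stress pass, and repeats until the end time is reached, checking the end time after each pass whether or not a fault was found. The clock and sleep functions are injectable so the loop can be exercised without real waiting; Window and stress_until_end are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Window:
    start: float  # stress test start time allocated to the block
    end: float    # stress test end time allocated to the block


def stress_until_end(window: Window, one_pass: Callable[[], bool],
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Wait for the start time, then repeat stress passes until the end time.

    Returns how many passes detected a fault."""
    while (remaining := window.start - clock()) > 0:
        sleep(min(remaining, 0.01))
    faults = 0
    while True:
        if one_pass():             # one stress pass with the chosen mode and algorithm
            faults += 1
        if clock() >= window.end:  # end time reached: stop testing this block
            return faults
```

With uniform allocation every block receives the same Window; with individual allocation each block gets its own, and the per-block loops can run in separate threads or processes.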
It can be seen that the technical solution of this embodiment can, within the configured stress test period, access each memory block from multiple directions using multiple read/write modes and stress-test it with multiple stress test algorithms. Combining the advantages of multiple stress test algorithms and multiple read/write modes further improves the memory fault detection rate along with detection efficiency, and greatly reduces production cost.
As an embodiment, before the memory addresses in the server are mapped to the memory space in step 101, the method further includes: under the operating system, performing address mapping on the memory in the server so that the virtual addresses of the memory correspond one-to-one to its physical addresses. In practice, this can be done by calling the mmap function to map virtual addresses to physical addresses; reading and writing a mapped virtual address is then equivalent to reading and writing the corresponding physical address.
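On Linux, one common way to obtain such a mapping is to mmap the /dev/mem device at a physical offset. The sketch below assumes a Unix-like system (it uses mmap.MAP_SHARED and os.O_SYNC); access to /dev/mem normally requires root, and kernels built with CONFIG_STRICT_DEVMEM restrict which physical ranges may be mapped. map_region is a hypothetical helper, and it accepts any file path so that it can be tried on an ordinary file.

```python
import mmap
import os


def map_region(path: str, offset: int, length: int) -> mmap.mmap:
    """Map `length` bytes of `path`, starting at `offset`, for reading and writing.

    With path="/dev/mem", offset is a physical address and the returned object
    is a virtual view of that physical range (Unix-like systems only)."""
    if offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be a multiple of mmap.ALLOCATIONGRANULARITY")
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor
```

For example, map_region("/dev/mem", 0x1000_0000, 4096) would attempt to expose one 4 KiB page at physical address 0x10000000, subject to the privilege and kernel restrictions above; slice reads and writes on the returned object then access that memory.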
In some embodiments, the target memory address read/write mode may be randomly selected from the associated different read/write modes.
In other embodiments, the target stress test algorithm may be randomly selected from the associated different stress test algorithms.
In other embodiments, the target memory address read/write mode is the mode, selected from the associated different read/write modes, that is the same as the historical read/write mode.
In this embodiment, the historical read/write mode is a read/write mode that was selected previously.
Repeated experiments show that selecting the same read/write mode for a memory block several times in succession yields a higher detection rate.
In other embodiments, the target stress test algorithm is the algorithm, selected from the associated different stress test algorithms, that is the same as the historical stress test algorithm.
In this embodiment, the historical stress test algorithm is a stress test algorithm that was selected previously.
Repeated experiments show that selecting the same stress test algorithm for a memory block several times in succession yields a higher detection rate.
In other embodiments, the target memory address read/write mode may be a historical read/write mode, selected from the associated different read/write modes, that differs from the most recently selected read/write mode.
In this embodiment, the historical read/write mode is one that was selected before the most recently selected read/write mode.
In other embodiments, the target stress test algorithm is a historical stress test algorithm, selected from the associated different stress test algorithms, that differs from the most recently selected stress test algorithm.
In this embodiment, the historical stress test algorithm is one that was selected before the most recently selected stress test algorithm.
Building on the foregoing embodiments, in one embodiment, selecting the target memory address reading and writing mode from the multiple different memory address reading and writing modes associated with the memory block, and selecting the target pressure measurement algorithm from the multiple different pressure measurement algorithms associated with the memory block, can be implemented in any of the following manners:
First implementation manner: randomly selecting the target memory address reading and writing mode from the multiple different memory address reading and writing modes associated with the memory block, and randomly selecting the target pressure measurement algorithm from the associated multiple different pressure measurement algorithms.
Second implementation manner: randomly selecting the target memory address reading and writing mode from the multiple different memory address reading and writing modes associated with the memory block, and selecting, from the associated multiple different pressure measurement algorithms, a historical pressure measurement algorithm that differs from the most recently selected pressure measurement algorithm.
Third implementation manner: selecting, from the associated multiple different memory address reading and writing modes, a memory address reading and writing mode that is the same as the historical memory address reading and writing mode, and randomly selecting the target pressure measurement algorithm from the associated multiple different pressure measurement algorithms.
Fourth implementation manner: randomly selecting the target memory address reading and writing mode from the multiple different memory address reading and writing modes associated with the memory block, and selecting, from the associated multiple different pressure measurement algorithms, a pressure measurement algorithm that is the same as the historical pressure measurement algorithm.
Fifth implementation manner: selecting, from the associated multiple different memory address reading and writing modes, a memory address reading and writing mode that is the same as the historical memory address reading and writing mode, and selecting, from the associated multiple different pressure measurement algorithms, a historical pressure measurement algorithm that differs from the most recently selected pressure measurement algorithm.
Sixth implementation manner: selecting, from the associated multiple different memory address reading and writing modes, a historical memory address reading and writing mode that differs from the most recently selected memory address reading and writing mode, and selecting, from the associated multiple different pressure measurement algorithms, a pressure measurement algorithm that is the same as the historical pressure measurement algorithm.
Seventh implementation manner: selecting, from the associated multiple different memory address reading and writing modes, a historical memory address reading and writing mode that differs from the most recently selected memory address reading and writing mode, and randomly selecting the target pressure measurement algorithm from the associated multiple different pressure measurement algorithms.
Eighth implementation manner: selecting, from the associated multiple different memory address reading and writing modes, a historical memory address reading and writing mode that differs from the most recently selected memory address reading and writing mode, and selecting, from the associated multiple different pressure measurement algorithms, a historical pressure measurement algorithm that differs from the most recently selected pressure measurement algorithm.
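The eight implementation manners combine three selection strategies ("random", "same as historical", "historical but different from the most recent") across the two dimensions. The following is a minimal Python sketch of that combination, not the applicant's implementation. The option names in `ADDRESS_MODES` and `ALGORITHMS` are hypothetical placeholders, "historical" is assumed to mean "previously selected for this memory block", and the fallback used when no qualifying historical choice exists is likewise an assumption.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical option names: the text does not enumerate concrete modes or algorithms.
ADDRESS_MODES = ["ascending", "descending", "random_order"]
ALGORITHMS = ["walking_ones", "checkerboard", "inversion"]

Strategy = Callable[[Sequence[str], List[str], random.Random], str]


def pick_random(options: Sequence[str], history: List[str], rng: random.Random) -> str:
    """Randomly select one of the associated options."""
    return rng.choice(list(options))


def pick_same_as_historical(options: Sequence[str], history: List[str], rng: random.Random) -> str:
    """Reuse the previously selected option (assumed: random if nothing was selected yet)."""
    return history[-1] if history else pick_random(options, history, rng)


def pick_historical_not_latest(options: Sequence[str], history: List[str], rng: random.Random) -> str:
    """Pick an option selected before the latest one that differs from the latest one."""
    latest = history[-1] if history else None
    earlier = sorted({h for h in history[:-1] if h != latest})
    if earlier:
        return rng.choice(earlier)
    # Assumed fallback: any associated option other than the latest one.
    others = [o for o in options if o != latest] or list(options)
    return rng.choice(others)


STRATEGIES: Dict[str, Strategy] = {
    "random": pick_random,
    "same": pick_same_as_historical,
    "prior": pick_historical_not_latest,
}

# (address-mode strategy, algorithm strategy) for the first to eighth implementation manners.
IMPLEMENTATIONS: Dict[int, Tuple[str, str]] = {
    1: ("random", "random"),
    2: ("random", "prior"),
    3: ("same", "random"),
    4: ("random", "same"),
    5: ("same", "prior"),
    6: ("prior", "same"),
    7: ("prior", "random"),
    8: ("prior", "prior"),
}


def select_targets(manner: int, mode_history: List[str], algo_history: List[str],
                   rng: random.Random) -> Tuple[str, str]:
    """Select the target mode and algorithm for one block and record them as history."""
    mode_key, algo_key = IMPLEMENTATIONS[manner]
    mode = STRATEGIES[mode_key](ADDRESS_MODES, mode_history, rng)
    algo = STRATEGIES[algo_key](ALGORITHMS, algo_history, rng)
    mode_history.append(mode)
    algo_history.append(algo)
    return mode, algo
```

For example, with a mode history of `["ascending", "descending", "ascending"]` and an algorithm history of `["checkerboard"]`, the sixth manner returns `("descending", "checkerboard")`. The eight listed manners cover eight of the nine possible strategy pairs; reusing the historical choice in both dimensions at once is not among them.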
This completes the description of the method embodiments.
The following describes the apparatus provided in the present application:
Referring to FIG. 3, FIG. 3 shows a device 300 for detecting memory faults in a server according to the present application, which includes:
a partitioning unit 301, configured to map memory addresses in the server into a memory space and partition the memory space to obtain N memory blocks;
a fault detection unit 302, configured to perform the following pressure measurement operation on each memory block when fault testing is performed on the N memory blocks: selecting a target memory address reading and writing mode from multiple different memory address reading and writing modes associated with the memory block, and selecting a target pressure measurement algorithm from multiple different pressure measurement algorithms associated with the memory block; performing pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the number of memory faults of the memory block if the memory block is detected to have a fault;
a fault determining unit 303, configured to trigger the fault reporting unit 304 when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold;
the fault reporting unit 304, configured to end the pressure measurement detection on each memory block and report fault information of the faulty memory block.
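The flow through units 301 to 304 can be sketched in Python against a simulated memory. This is an illustrative sketch under stated assumptions, not the patented implementation: "pressure measurement" is read as a memory stress test, the three data patterns in `PATTERNS` stand in for the unspecified pressure measurement algorithms, `SimulatedMemory` is a hypothetical test double with bits stuck at 0, and the reporting threshold is assumed to apply per block.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

MODES = ["ascending", "descending", "random_order"]

# Hypothetical data patterns standing in for "pressure measurement algorithms".
PATTERNS: Dict[str, Callable[[int], int]] = {
    "checkerboard": lambda addr: 0x55 if addr % 2 == 0 else 0xAA,
    "walking_ones": lambda addr: 1 << (addr % 8),
    "all_ones": lambda addr: 0xFF,
}


class SimulatedMemory:
    """Byte-addressable test double whose listed bits are stuck at 0."""

    def __init__(self, size: int, stuck_zero: Optional[Dict[int, int]] = None):
        self.cells = bytearray(size)
        self.stuck_zero = stuck_zero or {}  # address -> mask of bits forced to 0

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & ~self.stuck_zero.get(addr, 0) & 0xFF

    def read(self, addr: int) -> int:
        return self.cells[addr]


def address_order(mode: str, start: int, end: int, rng: random.Random) -> List[int]:
    """Memory address reading and writing mode: the order in which a block is visited."""
    addrs = list(range(start, end))
    if mode == "descending":
        addrs.reverse()
    elif mode == "random_order":
        rng.shuffle(addrs)
    return addrs


def stress_block(mem: SimulatedMemory, start: int, end: int, mode: str, algo: str,
                 rng: random.Random) -> bool:
    """One write/read-back pass; returns True if the block passes."""
    pattern = PATTERNS[algo]
    order = address_order(mode, start, end, rng)
    for addr in order:
        mem.write(addr, pattern(addr))
    return all(mem.read(addr) == pattern(addr) for addr in order)


def detect_faults(mem: SimulatedMemory, n_blocks: int, threshold: int, rounds: int,
                  rng: random.Random) -> Dict[int, int]:
    """Partition, stress each block, count faults, stop once a block hits the threshold."""
    size = len(mem.cells)
    bounds: List[Tuple[int, int]] = [(i * size // n_blocks, (i + 1) * size // n_blocks)
                                     for i in range(n_blocks)]
    faults = [0] * n_blocks
    for _ in range(rounds):
        for block, (start, end) in enumerate(bounds):
            mode = rng.choice(MODES)              # target read/write mode
            algo = rng.choice(list(PATTERNS))     # target pressure measurement algorithm
            if not stress_block(mem, start, end, mode, algo, rng):
                faults[block] += 1                # record one fault for this block
            if faults[block] >= threshold:        # assumed per-block threshold
                return {b: c for b, c in enumerate(faults) if c}  # stop and report
    return {b: c for b, c in enumerate(faults) if c}
```

With a single bit stuck at 0 at address 10, the walking-ones pattern writes `0x04` there and misses the fault, while the checkerboard pattern writes `0x55` and catches it. This illustrates why the text varies the algorithm per block.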
In one embodiment, the pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm is performed once the pressure measurement start time allocated to the memory block is reached;
after the detection, whether or not the memory block is detected to have a fault, the apparatus further includes:
a pressure measurement end determining unit, configured to detect whether the current time has reached the pressure measurement end time allocated to the memory block and, if not, to return to performing the pressure measurement operation on the memory block.
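The time-window behaviour can be sketched as a loop that waits for the block's allocated start time, runs one pressure measurement operation, and repeats until the allocated end time. In this hypothetical sketch, `stress_once` stands for one operation on the block, and the clock and wait functions are injectable (defaulting to `time.monotonic` and `time.sleep`) so the loop can be tested without real delays.

```python
import time
from typing import Callable, List


def run_block_window(start_time: float, end_time: float, stress_once: Callable[[], bool],
                     now: Callable[[], float] = time.monotonic,
                     wait: Callable[[float], None] = time.sleep) -> List[bool]:
    """Repeat the pressure measurement operation on one block within its time window."""
    current = now()
    if current < start_time:
        wait(start_time - current)        # wait for the allocated start time
    results: List[bool] = []
    while True:
        results.append(stress_once())     # True = pass, False = fault detected
        # Fault or no fault, check the allocated end time before looping again.
        if now() >= end_time:
            return results
```

Each block can be given its own window, so blocks with different start and end times can be scheduled independently.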
In one embodiment, the apparatus further comprises:
an address mapping unit, configured to perform address mapping on the memory in the server under the operating system, so that the virtual addresses of the memory in the server are mapped one-to-one to physical addresses.
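A one-to-one mapping means that a fault observed at a virtual address corresponds to exactly one physical location, and vice versa. The sketch below models this with a toy page table. `PAGE_SIZE` and the page table contents are assumptions. A real tool would have to obtain frame numbers from the operating system; on Linux, for example, this is possible through `/proc/<pid>/pagemap`, which typically requires elevated privileges.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes


def is_one_to_one(page_table: Dict[int, int]) -> bool:
    """True if no two virtual pages map to the same physical frame."""
    return len(set(page_table.values())) == len(page_table)


def virt_to_phys(vaddr: int, page_table: Dict[int, int]) -> int:
    """Translate a virtual address through the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


def phys_to_virt(paddr: int, page_table: Dict[int, int]) -> int:
    """Inverse translation; only well defined when the mapping is one-to-one."""
    if not is_one_to_one(page_table):
        raise ValueError("mapping is not one-to-one")
    inverse = {pfn: vpn for vpn, pfn in page_table.items()}
    pfn, offset = divmod(paddr, PAGE_SIZE)
    return inverse[pfn] * PAGE_SIZE + offset
```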
In one embodiment, the partitioning unit 301 is specifically configured to:
partition the memory space such that the access rate of each resulting memory block falls within a preset rate range.
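The text does not define "access rate". One possible reading is the sweep rate: how many full passes over a block the tester can make per second at a given memory bandwidth, i.e. `bandwidth / block_size`. Under that assumption, a partition can be chosen as in the following sketch. The parameter names are hypothetical, and the blocks differ in size by at most one byte.

```python
import math
from typing import List, Tuple


def plan_blocks(total_bytes: int, bandwidth_bps: float, min_rate: float,
                max_rate: float) -> List[Tuple[int, int]]:
    """Split [0, total_bytes) so each block's sweep rate lies in [min_rate, max_rate].

    Sweep rate (assumed metric) = bandwidth_bps / block_size, in full passes per second.
    """
    if total_bytes <= 0 or bandwidth_bps <= 0 or not 0 < min_rate <= max_rate:
        raise ValueError("invalid parameters")
    # Fewest blocks whose size could keep the sweep rate at or above min_rate.
    n = max(1, math.ceil(total_bytes * min_rate / bandwidth_bps))
    while n <= total_bytes:
        largest = -(-total_bytes // n)   # ceil(total / n): slowest block
        smallest = total_bytes // n      # floor(total / n): fastest block
        if bandwidth_bps / largest >= min_rate:
            if bandwidth_bps / smallest <= max_rate:
                return [(i * total_bytes // n, (i + 1) * total_bytes // n) for i in range(n)]
            break  # more blocks would only raise the rate further
        n += 1
    raise ValueError("no partition satisfies the preset rate range")
```

Starting from the fewest blocks and increasing the count keeps blocks as large as the range allows, which limits scheduling overhead. If the range is too narrow for any integer block count, the function reports that rather than silently violating it.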
In one embodiment, the target memory address reading and writing mode is a memory address reading and writing mode randomly selected from the associated multiple different memory address reading and writing modes.
In one embodiment, the target pressure measurement algorithm is a pressure measurement algorithm randomly selected from the associated multiple different pressure measurement algorithms.
In one embodiment, the target memory address reading and writing mode is a memory address reading and writing mode, selected from the associated multiple different memory address reading and writing modes, that is the same as the historical memory address reading and writing mode.
In one embodiment, the target pressure measurement algorithm is a pressure measurement algorithm, selected from the associated multiple different pressure measurement algorithms, that is the same as the historical pressure measurement algorithm.
In one embodiment, the target memory address reading and writing mode is a historical memory address reading and writing mode, selected from the associated multiple different memory address reading and writing modes, that differs from the most recently selected memory address reading and writing mode.
In one embodiment, the target pressure measurement algorithm is a historical pressure measurement algorithm, selected from the associated multiple different pressure measurement algorithms, that differs from the most recently selected pressure measurement algorithm.
Therefore, in the technical solution of the embodiments of the present application, the server memory is divided into N memory blocks. When fault testing is performed on the N memory blocks, for each memory block a target memory address reading and writing mode is selected from the multiple different memory address reading and writing modes associated with that block, and a target pressure measurement algorithm is selected from the multiple different associated pressure measurement algorithms; pressure measurement detection is performed on the block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and the number of memory faults of the block is recorded if a fault is detected; when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, pressure measurement detection on the memory blocks ends and fault information of the faulty memory blocks is reported. The embodiments of the present application can thus perform pressure measurement detection on the memory blocks in parallel, and the pressure measurement algorithm and memory address reading and writing mode used for each block may differ. Memory blocks can therefore be accessed in multiple memory address reading and writing modes and tested with multiple pressure measurement algorithms, combining the advantages of both. This improves detection efficiency while also raising the memory fault detection rate and reducing production cost.
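The parallel aspect can be sketched with one worker per memory block sharing fault counters and a stop flag. Once any block reaches the reporting threshold, every worker ends its pressure measurement. In this hypothetical sketch, `stress_once(block)` stands for one pressure measurement operation, returning `False` on a detected fault, and the threshold is assumed to apply per block. Python threads only demonstrate the coordination logic here; because of the global interpreter lock they do not provide truly parallel memory traffic, so a real tester would more likely use native threads pinned to cores.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def parallel_detect(n_blocks: int, stress_once: Callable[[int], bool], threshold: int,
                    max_rounds: int) -> Dict[int, int]:
    """Stress-test blocks concurrently; stop all workers once any block reaches the threshold."""
    stop = threading.Event()
    lock = threading.Lock()
    faults: List[int] = [0] * n_blocks

    def worker(block: int) -> None:
        for _ in range(max_rounds):
            if stop.is_set():                   # another block already hit the threshold
                return
            if not stress_once(block):          # False means a fault was detected
                with lock:
                    faults[block] += 1
                    if faults[block] >= threshold:
                        stop.set()              # end pressure measurement on every block

    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        list(pool.map(worker, range(n_blocks)))  # list() re-raises worker exceptions
    return {b: c for b, c in enumerate(faults) if c}  # fault information to report
```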
For the implementation of the functions and roles of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
For the electronic device provided in the embodiments of the present application, a schematic diagram of its hardware architecture is shown in FIG. 4. At the hardware level, the device includes a machine-readable storage medium and a processor, wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to perform the memory fault detection operations in a server disclosed in the above examples.
An embodiment of the present application further provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the memory fault detection operations in a server disclosed in the above examples.
Here, the machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard disk drive), a solid-state drive, any type of storage disc (e.g., an optical disc or a DVD), a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solution of the present application. A person of ordinary skill in the art can understand and implement this without inventive effort.
This completes the description of the device shown in FIG. 4.
The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method for detecting faults in memory in a server, characterized by comprising the following steps:
mapping memory addresses in the server into a memory space, and dividing the memory space to obtain N memory blocks;
when fault testing is performed on the N memory blocks, performing the following pressure measurement operation on each memory block: selecting a target memory address reading and writing mode from multiple different memory address reading and writing modes associated with the memory block, and selecting a target pressure measurement algorithm from multiple different pressure measurement algorithms associated with the memory block; performing pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the number of memory faults of the memory block if the memory block is detected to have a fault;
when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, ending the pressure measurement detection on the memory blocks and reporting the fault information of the faulty memory blocks.
2. The method according to claim 1, wherein the performing of the pressure measurement detection on the memory block according to the target memory address reading and writing manner and the target pressure measurement algorithm is performed when a pressure measurement start time allocated to the memory block is reached;
after the detection, whether or not the memory block is detected to have a fault, the method further includes:
detecting whether the current time has reached the pressure measurement end time allocated to the memory block, and if not, returning to perform the pressure measurement operation on the memory block.
3. The method of claim 1, wherein before mapping the memory addresses in the server into the memory space, the method further comprises:
under an operating system, performing address mapping on the memory in the server so that the virtual addresses of the memory in the server are mapped one-to-one to physical addresses.
4. The method according to claim 2, wherein the partitioning the memory space includes:
dividing the memory space such that the access rate of each resulting memory block falls within a preset rate range.
5. The method of claim 1, wherein the target memory address reading and writing mode is a memory address reading and writing mode randomly selected from the associated multiple different memory address reading and writing modes, and/or
the target pressure measurement algorithm is a pressure measurement algorithm randomly selected from the associated multiple different pressure measurement algorithms.
6. The method according to claim 1, wherein the target memory address reading and writing mode is a memory address reading and writing mode, selected from the associated multiple different memory address reading and writing modes, that is the same as the historical memory address reading and writing mode, and/or
the target pressure measurement algorithm is a pressure measurement algorithm, selected from the associated multiple different pressure measurement algorithms, that is the same as the historical pressure measurement algorithm.
7. A failure detection device for a memory in a server, comprising:
a blocking unit, configured to map memory addresses in the server into a memory space and divide the memory space to obtain N memory blocks;
a fault detection unit, configured to perform the following pressure measurement operation on each memory block when fault testing is performed on the N memory blocks: selecting a target memory address reading and writing mode from multiple different memory address reading and writing modes associated with the memory block, and selecting a target pressure measurement algorithm from multiple different pressure measurement algorithms associated with the memory block; performing pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the number of memory faults of the memory block if the memory block is detected to have a fault;
a fault determining unit, configured to trigger the fault reporting unit when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold;
the fault reporting unit, configured to end the pressure measurement detection on each memory block and report fault information of the faulty memory block.
8. The apparatus according to claim 7, wherein the performing of the pressure measurement detection on the memory block according to the target memory address reading and writing manner and the target pressure measurement algorithm is performed when a pressure measurement start time allocated to the memory block is reached;
after the detection, whether or not the memory block is detected to have a fault, the apparatus further includes:
a pressure measurement end determining unit, configured to detect whether the current time has reached the pressure measurement end time allocated to the memory block and, if not, to return to performing the pressure measurement operation on the memory block.
9. The apparatus of claim 7, further comprising:
an address mapping unit, configured to perform address mapping on the memory in the server under an operating system so that the virtual addresses of the memory in the server are mapped one-to-one to physical addresses.
10. The apparatus according to claim 7, wherein the blocking unit is specifically configured to:
divide the memory space such that the access rate of each resulting memory block falls within a preset rate range.
11. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to perform the method steps of any of claims 1-6.
CN202210685982.7A 2022-06-17 2022-06-17 Fault detection method, device and equipment for memory in server Pending CN114780323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210685982.7A CN114780323A (en) 2022-06-17 2022-06-17 Fault detection method, device and equipment for memory in server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210685982.7A CN114780323A (en) 2022-06-17 2022-06-17 Fault detection method, device and equipment for memory in server

Publications (1)

Publication Number Publication Date
CN114780323A true CN114780323A (en) 2022-07-22

Family

ID=82421167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210685982.7A Pending CN114780323A (en) 2022-06-17 2022-06-17 Fault detection method, device and equipment for memory in server

Country Status (1)

Country Link
CN (1) CN114780323A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118034991A (en) * 2024-04-11 2024-05-14 北京开源芯片研究院 Memory data access method and device, electronic equipment and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050047229A1 (en) * 2002-12-18 2005-03-03 Benoit Nadeau-Dostie Method and circuit for collecting memory failure information
CN103208314A (en) * 2013-03-04 2013-07-17 深圳市硅格半导体有限公司 Internal memory test method of embedded system and embedded system
CN103902419A (en) * 2014-03-28 2014-07-02 华为技术有限公司 Method and device for testing caches
CN106030544A (en) * 2014-12-24 2016-10-12 华为技术有限公司 Random access memory detection method of computer device and computer device
WO2020114937A1 (en) * 2018-12-07 2020-06-11 Koninklijke Philips N.V. A computing device with increased resistance against address probing
CN111739577A (en) * 2020-07-20 2020-10-02 成都智明达电子股份有限公司 DSP-based efficient DDR test method
CN113157509A (en) * 2021-04-25 2021-07-23 中国科学院微电子研究所 Memory security detection method and system on chip
CN113961478A (en) * 2021-09-28 2022-01-21 新华三云计算技术有限公司 Memory fault recording method and device
CN114116355A (en) * 2021-11-30 2022-03-01 新华三技术有限公司合肥分公司 Memory test method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孟庆昌 (Meng Qingchang): "《操作系统》" (Operating Systems), 30 April 2016 *

Similar Documents

Publication Publication Date Title
US8560922B2 (en) Bad block management for flash memory
US9465537B2 (en) Memory system and method of controlling memory system
CN103959258B (en) Background reorders --- the preventative wear control mechanism with limited expense
US20130318418A1 (en) Adaptive error correction for phase change memory
JP2012532372A (en) System and method for tracking error data in a storage device
US9535611B2 (en) Cache memory for hybrid disk drives
CN101937721A (en) Method for testing memory device
TW201021045A (en) Reliability test method for solid storage medium
CN101377748B (en) Method for checking reading and writing functions of memory device
CN106959912B (en) Disk detection method and device
JP5105351B2 (en) Nonvolatile semiconductor memory device
CN105469834A (en) Testing method for embedded flash memory
CN108039190A (en) A kind of test method and device
CN102981969A (en) Method for deleting repeated data and solid hard disc thereof
CN114780323A (en) Fault detection method, device and equipment for memory in server
CN102486938A (en) Method for rapid detection of memory and device
CN111816239B (en) Disk detection method and device, electronic equipment and machine-readable storage medium
CN114924923A (en) Method, system, equipment and medium for verifying correctness of hard disk write-in point
CN114283868A (en) Method and device for testing reliability of flash memory chip, electronic equipment and storage medium
CN105304140A (en) Method and apparatus for testing memory performance of electronic equipment
CN105302679A (en) Detection method and system for intelligent terminal storage stability
CN104794061A (en) Wear-leveling method for phase change storage system
CN111739574B (en) Static Random Access Memory (SRAM) verification method based on random binary sequence
CN107481764A (en) A kind of 3D Nand Flash scanning detection methods and system
US9262264B2 (en) Error correction code seeding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220722
