CN114780323A

CN114780323A - Fault detection method, device and equipment for memory in server

Info

Publication number: CN114780323A
Application number: CN202210685982.7A
Authority: CN
Inventors: 闫剑锋; 卢双堂; 傅先刚; 陈昊
Original assignee: New H3C Information Technologies Co Ltd
Current assignee: New H3C Information Technologies Co Ltd
Priority date: 2022-06-17
Filing date: 2022-06-17
Publication date: 2022-07-22

Abstract

The method maps the memory addresses in a server into a memory space and divides the memory space into memory blocks. When fault testing is performed on the divided memory blocks, for each memory block a target memory address read-write mode is selected from a plurality of different memory address read-write modes associated with the memory block, and a target stress-test algorithm is selected from a plurality of different associated stress-test algorithms. Stress testing is performed on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and if a fault is detected in the memory block, the number of memory faults of the memory block is recorded. When the accumulated fault records of the memory blocks reach the configured memory fault reporting threshold, stress testing of the memory blocks ends and the fault information of the faulty memory blocks is reported. The embodiment can thereby improve both the efficiency and the detection rate of memory fault detection.

Description

Fault detection method, device and equipment for memory in server

Technical Field

The present disclosure relates to fault detection technologies, and in particular, to a method, an apparatus, and a device for detecting a fault in a memory of a server.

Background

For a server, the memory, the mainboard and the CPU are the three major components, and of the three, the memory accounts for the highest percentage of server faults. A memory fault is also highly likely to bring the server down, interrupting running services and causing large economic losses. In particular, when a server sold overseas fails, the long distance makes maintenance extremely difficult and puts great pressure on the manufacturer's after-sales service. Improving the detection rate of faulty memory at the production end is therefore of great significance for server quality.

At present, all virtual addresses allocated to the memory in a server are usually tested with a memtest algorithm for testing memory performance. In the prior art, the memory fault detection rate is generally improved by adding stress-test algorithms for testing the memory. However, although many such algorithms exist, the underlying techniques are mature and a breakthrough is difficult. Consequently, the only way to raise the detection rate of faulty memory in production servers is to keep lengthening the memory stress-test time, which lowers detection efficiency and greatly increases production cost.

Disclosure of Invention

The application provides a fault detection method, a fault detection device and fault detection equipment, which are used for improving the detection efficiency and improving the fault detection rate.

The technical solution provided by the application includes the following:

In a first aspect, an embodiment of the present application provides a method for detecting a fault of a memory in a server, including:

mapping memory addresses in a server into a memory space, and dividing the memory space to obtain N memory blocks;

when fault testing is performed on the N memory blocks, performing the following stress-test operation on each memory block: selecting a target memory address read-write mode from a plurality of different memory address read-write modes associated with the memory block, and selecting a target stress-test algorithm from a plurality of different stress-test algorithms associated with the memory block; performing stress testing on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block;

and when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, ending the stress testing of the memory blocks and reporting the fault information of the faulty memory blocks.

In a second aspect, an embodiment of the present application provides an apparatus for detecting a fault of a memory in a server, including:

a partitioning unit, configured to map memory addresses in the server into a memory space and divide the memory space to obtain N memory blocks;

a fault detection unit, configured to perform the following stress-test operation on each memory block when fault testing is performed on the N memory blocks: selecting a target memory address read-write mode from a plurality of different memory address read-write modes associated with the memory block, and selecting a target stress-test algorithm from a plurality of different stress-test algorithms associated with the memory block; performing stress testing on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block;

and a fault reporting unit, configured to end the stress testing of each memory block and report the fault information of the faulty memory block when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold.

According to the above technical solution, the server memory is divided into N memory blocks. When fault testing is performed on the N memory blocks, for each memory block a target memory address read-write mode is selected from multiple different memory address read-write modes associated with the memory block, and a target stress-test algorithm is selected from multiple different associated stress-test algorithms. Stress testing is performed on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and the number of memory faults of the memory block is recorded if a fault is detected. When the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, the stress testing of the memory blocks ends and the fault information of the faulty memory blocks is reported. The embodiment can therefore stress test the memory blocks in parallel, and the stress-test algorithm and the memory address read-write mode used may differ from block to block. The memory blocks can thus be accessed in multiple directions according to multiple memory address read-write modes and tested with multiple stress-test algorithms, which exploits the advantages of combining the two. As a result, detection efficiency is improved, the memory fault detection rate is improved as well, and production cost is reduced.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flowchart of a method for detecting a failure of a memory in a server according to the present disclosure;

Fig. 2(a) is a schematic diagram of a first memory address read-write mode provided in the present application;

Fig. 2(b) is a schematic diagram of a second memory address read-write mode provided in the present application;

Fig. 2(c) is a schematic diagram of a third memory address read-write mode provided in the present application;

Fig. 2(d) is a schematic diagram of a fourth memory address read-write mode provided in the present application;

Fig. 3 is a schematic structural diagram of a fault detection apparatus for a memory in a server according to the present application;

Fig. 4 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

For a server, the memory, the mainboard and the CPU are the three major components, and of the three, the memory accounts for the highest percentage of server faults. A memory fault is also highly likely to bring the server down, interrupting running services and causing large economic losses. In particular, when a server sold abroad fails, the long distance makes maintenance extremely difficult and puts great pressure on the manufacturer's after-sales service. Improving the detection rate of faulty memory at the production end is therefore of great significance for server quality.

At present, all virtual addresses allocated to the memory in a server are usually tested with a memtest algorithm for testing memory performance. However, existing stress testing has a low memory fault detection rate at the production end. Particularly for memory with batch problems, the low detection rate lets a large amount of faulty memory reach the market, causing large economic losses. In addition, because user environments are complex and the process precision demanded of memory keeps rising, the spacing between the storage cells of memory chips keeps shrinking and the proportion of coupling faults rises accordingly. This high proportion of coupling faults leaves the current memory fault reproduction rate at 40% to 50%, and the phenomenon will become more pronounced in the future. For these reasons, the detection rate of faulty memory in server production needs to be raised in order to mitigate the above problems.

In the prior art, however, the memory fault detection rate is generally improved by adding stress-test algorithms for testing the memory. Although many such algorithms exist, the underlying techniques are mature and a breakthrough is difficult. Consequently, the only way to raise the detection rate of faulty memory in production servers is to keep lengthening the memory stress-test time, which lowers detection efficiency and greatly increases production cost.

In order to solve the foregoing technical problem, an embodiment of the present application provides a method for detecting a fault of a memory in a server, including: mapping memory addresses in a server into a memory space, and dividing the memory space to obtain N memory blocks; when fault testing is performed on the N memory blocks, performing the following stress-test operation on each memory block: selecting a target memory address read-write mode from a plurality of different memory address read-write modes associated with the memory block, and selecting a target stress-test algorithm from a plurality of different stress-test algorithms associated with the memory block; performing stress testing on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block; and when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, ending the stress testing of the memory blocks and reporting the fault information of the faulty memory blocks.
The embodiment can therefore stress test the memory blocks in parallel, and the stress-test algorithm and the memory address read-write mode used may differ from block to block. The memory blocks can thus be accessed in multiple directions according to multiple memory address read-write modes and tested with multiple stress-test algorithms, which exploits the advantages of combining the two. As a result, detection efficiency is improved, the memory fault detection rate is improved as well, and production cost is reduced.

Based on the above description, the following describes the flow shown in fig. 1 provided in the present application:

Referring to Fig. 1, Fig. 1 is a flowchart of a method for detecting a fault of a memory in a server according to the present application. The method can be applied to an electronic device that is connected to a server and is used to test the memory in the connected server. The method can also be applied to the server itself to test the memory in the server.

As shown in fig. 1, the process may include the following steps:

Step 101, mapping memory addresses in a server into a memory space, and dividing the memory space to obtain N memory blocks.

In practical applications, a server is generally provided with a plurality of memory modules that correspond to different physical addresses. In this embodiment, the physical addresses corresponding to the memory modules are mapped into one large memory space; that is, the memory space refers to the physical addresses of the memory. N is a positive integer. The divided memory blocks can be neither too small nor too large. If the memory blocks are too small, data is read back almost immediately after it is written, which affects the detection rate of faulty memory. If the memory blocks are too large, access to each memory block is too slow, and data written into a memory block can be read out only after a long interval, which also affects the detection rate of faulty memory. Based on this, as an embodiment, dividing the memory space in step 101 includes: dividing the memory space according to the principle that the access rate of each divided memory block falls within a preset rate range. Since the physical addresses are all multiples of 8 and 1024 x 4 is the smallest unit of memory access, in some embodiments N is 4096 multiplied by a power of 2.
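The division in step 101 can be illustrated with a minimal Python sketch. It assumes that N counts the memory blocks and equals 4096 multiplied by 2 to the power k, and that the blocks are of equal size. The function name partition_memory and the parameter k are hypothetical, and the preset access-rate range mentioned above is not modelled.

```python
def partition_memory(base, size, k=0):
    """Split the address range [base, base + size) into N equal blocks.

    Assumption: N = 4096 * 2**k, following the statement that N is
    4096 multiplied by a power of 2. Returns a list of (start, end)
    tuples, with end exclusive.
    """
    n_blocks = 4096 * (2 ** k)
    if size % n_blocks != 0:
        raise ValueError("size must be divisible by the number of blocks")
    block_size = size // n_blocks
    if base % 8 != 0 or block_size % 8 != 0:
        # The text states that physical addresses are multiples of 8.
        raise ValueError("block boundaries must be multiples of 8")
    return [(base + i * block_size, base + (i + 1) * block_size)
            for i in range(n_blocks)]
```

For a 64 GiB memory space and k = 0, this yields 4096 blocks of 16 MiB each. A larger k gives smaller blocks, which is the knob a real implementation would presumably tune to keep the access rate within the preset range.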

Step 102, when fault testing is performed on the N memory blocks, performing the following stress-test operation on each memory block: selecting a target memory address read-write mode from a plurality of different memory address read-write modes associated with the memory block, and selecting a target stress-test algorithm from a plurality of different stress-test algorithms associated with the memory block; and performing stress testing on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and recording the number of memory faults of the memory block if a fault is detected in the memory block.

The multiple associated memory address read-write modes may be configured in advance in a memory address read-write mode file, so that when fault testing is performed on the N memory blocks, one memory address read-write mode is selected from that file. Alternatively, a memory address read-write mode may be selected from the multiple memory address read-write modes corresponding to the memory block according to a mapping relationship between the memory block and those modes. Likewise, the multiple associated stress-test algorithms may be configured in advance in a stress-test algorithm file, so that when fault testing is performed on the N memory blocks, one stress-test algorithm is selected from that file. Alternatively, a stress-test algorithm may be selected from the multiple stress-test algorithms corresponding to the memory block according to a mapping relationship between the memory block and those algorithms. How the target memory address read-write mode and the target stress-test algorithm are selected is described in detail below and is not elaborated here.

When the physical addresses corresponding to a memory block are sorted, the physical address at the front of the sequence is called the low address, and the physical address at the end of the sequence is called the high address. As an embodiment, the memory address read-write mode may be a mode from the high address to the low address, as indicated by the arrow direction in Fig. 2(a); a mode from the low address to the high address, as indicated by the arrow direction in Fig. 2(b); a mode from the addresses at both ends to the middle address, as indicated by the arrow direction in Fig. 2(c); or a mode from the middle address to the addresses at both ends, as indicated by the arrow direction in Fig. 2(d).
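The four read-write modes can be expressed as orderings of the offsets within a block. The sketch below is illustrative only: the function names are hypothetical, and for the two-ended modes it assumes that the two ends are visited alternately, which the text does not specify.

```python
def low_to_high(n):
    """Offsets 0, 1, ..., n-1 (the direction of Fig. 2(b))."""
    return list(range(n))


def high_to_low(n):
    """Offsets n-1, ..., 1, 0 (the direction of Fig. 2(a))."""
    return list(range(n - 1, -1, -1))


def ends_to_middle(n):
    """Alternate between the low end and the high end, moving inward.

    Assumption: the ends are visited alternately (Fig. 2(c) direction).
    """
    order = []
    lo, hi = 0, n - 1
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo += 1
        hi -= 1
    return order


def middle_to_ends(n):
    """Reverse of ends_to_middle: start in the middle, move outward."""
    return ends_to_middle(n)[::-1]


# Hypothetical registry, standing in for the read-write mode file.
ADDRESS_ORDERS = {
    "high_to_low": high_to_low,
    "low_to_high": low_to_high,
    "ends_to_middle": ends_to_middle,
    "middle_to_ends": middle_to_ends,
}
```

Each function returns every offset exactly once, so any stress-test algorithm can iterate over the returned list without knowing which mode produced it.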

As another embodiment, the above stress-test algorithm, which may also be called a memory fault testing algorithm, may be a checkerboard algorithm, a multiplication algorithm, a division algorithm, a walking 1 algorithm, or a walking 0 algorithm.

For example, for memory block a1, the checkerboard algorithm is run on the memory block in the read-write mode from the high address to the low address, and for memory block a2, the walking 1 algorithm is run on the memory block in the read-write mode from the low address to the high address.
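To make the pairing of an algorithm with an address order concrete, the following sketch runs simplified, byte-wise versions of the checkerboard and walking 1 algorithms over a simulated block. These are illustrative reductions, not the implementations used by the application or by any specific memtest tool. The class StuckBitMemory is a hypothetical stand-in for faulty hardware.

```python
def checkerboard(mem, order):
    """Write alternating 0x55/0xAA by offset parity, read back, then invert.

    Returns the number of mismatching reads (the fault count).
    """
    faults = 0
    for pattern in (0x55, 0xAA):
        for i in order:
            mem[i] = pattern if i % 2 == 0 else pattern ^ 0xFF
        for i in order:
            expected = pattern if i % 2 == 0 else pattern ^ 0xFF
            if mem[i] != expected:
                faults += 1
    return faults


def walking_ones(mem, order):
    """Write a single set bit to every cell, read back, then shift the bit."""
    faults = 0
    for bit in range(8):
        pattern = 1 << bit
        for i in order:
            mem[i] = pattern
        for i in order:
            if mem[i] != pattern:
                faults += 1
    return faults


class StuckBitMemory:
    """Simulated block in which one bit of one cell is stuck at 0."""

    def __init__(self, size, bad_index, bad_bit):
        self._cells = bytearray(size)
        self._bad_index = bad_index
        self._mask = 0xFF & ~(1 << bad_bit)

    def __setitem__(self, i, value):
        if i == self._bad_index:
            value &= self._mask  # the stuck bit can never be set
        self._cells[i] = value

    def __getitem__(self, i):
        return self._cells[i]
```

Mirroring the example above, block a1 would call checkerboard(mem, order) with a high-to-low order such as list(range(n - 1, -1, -1)), and block a2 would call walking_ones(mem, order) with list(range(n)).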

Step 103, when the accumulated fault records corresponding to each memory block reach the configured memory fault reporting threshold, step 104 is executed.

The fault records corresponding to the memory blocks are accumulated, and step 104 is executed when the accumulated fault records reach the configured memory fault reporting threshold. The memory fault reporting threshold may be configured in the BIOS (Basic Input Output System). The threshold may be set to 1, which means that step 104 is executed as soon as one fault record is found. The threshold may also be set to 2, so that step 104 is executed when the accumulated fault records of the memory blocks reach 2. The value of the fault reporting threshold depends on the actual requirement on the memory fault rate: if the requirement is strict, the threshold is set lower, and if the requirement is lenient, the threshold is set higher.
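The threshold check in step 103 can be sketched as follows. The sketch assumes that the fault records are summed across all memory blocks and compared against a single threshold, which is one reading of the text. The function name run_until_threshold and the (block_id, fault_count) result format are hypothetical.

```python
def run_until_threshold(block_results, threshold):
    """Accumulate fault counts and stop once the total reaches the threshold.

    block_results: iterable of (block_id, fault_count) pairs, one per
    completed stress-test operation.
    Returns (reached, faults_per_block), where faults_per_block holds
    only the blocks in which at least one fault was recorded.
    """
    faults_per_block = {}
    total = 0
    for block_id, count in block_results:
        if count:
            faults_per_block[block_id] = faults_per_block.get(block_id, 0) + count
            total += count
        if total >= threshold:
            return True, faults_per_block  # corresponds to entering step 104
    return False, faults_per_block
```

With a threshold of 1, the loop returns at the first recorded fault. A higher threshold lets testing continue until more faults have been observed.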

Step 104, ending the stress testing of each memory block, and reporting the fault information of the faulty memory block.

In this step, as an embodiment, an ECC (Error Checking and Correction) checking tool in the server may be triggered to report the fault information of the faulty memory block. On the one hand, the fault information of the faulty memory block is reported directly to the operating system; on the other hand, it is reported to the baseboard management controller (BMC).

This completes the description of the flow shown in Fig. 1.

As can be seen from the above technical solution, in the method the server memory is divided into N memory blocks. When fault testing is performed on the N memory blocks, for each memory block a target memory address read-write mode is selected from multiple different memory address read-write modes associated with the memory block, and a target stress-test algorithm is selected from multiple different associated stress-test algorithms. Stress testing is performed on the memory block according to the target memory address read-write mode and the target stress-test algorithm, and the number of memory faults of the memory block is recorded if a fault is detected. When the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, the stress testing of the memory blocks ends and the fault information of the faulty memory blocks is reported. The embodiment can therefore stress test the memory blocks in parallel, and the stress-test algorithm and the memory address read-write mode used may differ from block to block. The memory blocks can thus be accessed in multiple directions according to multiple memory address read-write modes and tested with multiple stress-test algorithms, which exploits the advantages of combining the two. As a result, detection efficiency is improved, the memory fault detection rate is improved as well, and production cost is reduced.

Building on the flow shown in Fig. 1, as an embodiment, the stress testing of the memory block according to the target memory address read-write mode and the target stress-test algorithm in step 102 is performed when the stress-test start time allocated to the memory block arrives. Whether or not a fault is detected in the memory block, the method further includes: detecting whether the current time is the stress-test end time allocated to the memory block, and if not, returning to perform the stress-test operation on the memory block.

In this embodiment, the stress-test duration, obtained by subtracting the stress-test start time from the stress-test end time, is a duration commonly used in the art. The stress-test start time and end time may be allocated uniformly, in which case every memory block has the same start time and the same end time. They may also be allocated to each memory block individually, in which case the start time and end time may differ from block to block.
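The start-time and end-time handling can be sketched as a loop that waits for the start time, runs one stress-test operation, and then checks the end time before repeating. The function name stress_block and the injectable clock and sleep parameters are hypothetical. Treating "the current time is the end time" as "the current time has reached or passed the end time" is an assumption.

```python
import time


def stress_block(run_once, start_time, end_time,
                 clock=time.monotonic, sleep=time.sleep):
    """Repeat one stress-test operation within an allocated time window.

    run_once: callable performing one stress-test operation on the block
              and returning the number of faults it detected.
    Returns (rounds, faults): how many operations ran and the fault total.
    """
    # Wait until the stress-test start time allocated to the block arrives.
    while clock() < start_time:
        sleep(0.001)
    rounds = 0
    faults = 0
    while True:
        faults += run_once()
        rounds += 1
        # After each operation, fault or no fault, check the end time.
        if clock() >= end_time:
            return rounds, faults
```

In a real run, run_once would select a read-write mode and an algorithm afresh on each round, so a longer window gives more mode and algorithm combinations per block.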

It can be seen that, within the set stress-test duration, the technical solution provided by this embodiment can select multiple memory address read-write modes to access the memory blocks in multiple directions and can test each memory block with multiple stress-test algorithms. This exploits the advantages of combining multiple stress-test algorithms with multiple memory address read-write modes, so that the memory fault detection rate is further improved while detection efficiency is improved, and production cost is greatly reduced.

As an embodiment, before the memory addresses in the server are mapped into the memory space in step 101, the method further includes: performing, under an operating system, address mapping on the memory in the server so that the virtual addresses of the memory in the server are mapped to the physical addresses in one-to-one correspondence. In practical applications, the mmap function can be called to map the virtual addresses to the physical addresses of the memory to implement this step, so that a read or write operation on a mapped virtual address is equivalent to a read or write operation on the corresponding physical address.
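As a rough illustration of the mmap step, the sketch below uses Python's standard mmap module. By default it creates an anonymous mapping as a stand-in, because mapping real physical memory requires elevated privileges and a platform-specific source. The helper name map_region is hypothetical, and the suggestion that path could be a device file such as /dev/mem on Linux is an assumption that does not come from the text.

```python
import mmap


def map_region(size, path=None, offset=0):
    """Map `size` bytes and return the mapping object.

    With path=None an anonymous mapping is created (safe for experiments).
    With a path, the file is mapped starting at `offset`, which must be a
    multiple of mmap.ALLOCATIONGRANULARITY. Mapping a physical-memory
    device this way is an assumption, not a detail given in the text.
    """
    if path is None:
        return mmap.mmap(-1, size)
    with open(path, "r+b") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), size, offset=offset)
```

The returned object supports item assignment and indexing by offset, so it can be passed directly to a stress-test routine that reads and writes cells by index.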

In some embodiments, the target memory address read-write mode may be a memory address read-write mode randomly selected from the multiple associated different memory address read-write modes.

In other embodiments, the target stress-test algorithm may be a stress-test algorithm randomly selected from the multiple associated different stress-test algorithms.

In other embodiments, the target memory address read-write mode is a memory address read-write mode, selected from the multiple associated different memory address read-write modes, that is the same as the historical memory address read-write mode.

The historical memory address read-write mode of this embodiment is a memory address read-write mode that was selected previously.

Multiple experiments have verified that selecting the same memory address read-write mode several times in succession for one memory block yields a higher detection rate.

In other embodiments, the target stress-test algorithm is a stress-test algorithm, selected from the multiple associated different stress-test algorithms, that is the same as the historical stress-test algorithm.

The historical stress-test algorithm of this embodiment is a stress-test algorithm that was selected previously.

Multiple experiments have verified that selecting the same stress-test algorithm several times in succession for one memory block yields a higher detection rate.

In other embodiments, the target memory address read-write mode may be a historical memory address read-write mode, selected from the multiple associated different memory address read-write modes, that is different from the most recently selected memory address read-write mode.

The historical memory address read-write mode of this embodiment is a memory address read-write mode that was selected before the most recently selected memory address read-write mode.

In other embodiments, the target stress-test algorithm is a historical stress-test algorithm, selected from the multiple associated different stress-test algorithms, that is different from the most recently selected stress-test algorithm.

The historical stress-test algorithm of this embodiment is a stress-test algorithm that was selected before the most recently selected stress-test algorithm.

Based on the foregoing embodiments, as an embodiment, selecting a target memory address read-write mode from the multiple different memory address read-write modes associated with the memory block and selecting a target stress-test algorithm from the multiple different stress-test algorithms associated with the memory block may be implemented in the following manners.

The first implementation manner is as follows: randomly selecting a target memory address read-write mode from the multiple different memory address read-write modes associated with the memory block, and randomly selecting a target stress-test algorithm from the multiple associated different stress-test algorithms.

The second implementation manner is as follows: randomly selecting a target memory address read-write mode from the multiple different memory address read-write modes associated with the memory block, and selecting, from the multiple associated different stress-test algorithms, a historical stress-test algorithm different from the most recently selected stress-test algorithm.

The third implementation manner is as follows: selecting, from the multiple associated different memory address read-write modes, a memory address read-write mode that is the same as the historical memory address read-write mode, and randomly selecting a target stress-test algorithm from the multiple associated different stress-test algorithms.

The fourth implementation manner is as follows: randomly selecting a target memory address read-write mode from the multiple different memory address read-write modes associated with the memory block, and selecting, from the multiple associated different stress-test algorithms, a stress-test algorithm that is the same as the historical stress-test algorithm.

The fifth implementation manner is as follows: selecting, from the multiple associated different memory address read-write modes, a memory address read-write mode that is the same as the historical memory address read-write mode, and selecting, from the multiple associated different stress-test algorithms, a historical stress-test algorithm different from the most recently selected stress-test algorithm.

The sixth implementation manner is as follows: selecting, from the multiple associated different memory address read-write modes, a historical memory address read-write mode different from the most recently selected memory address read-write mode, and selecting, from the multiple associated different stress-test algorithms, a stress-test algorithm that is the same as the historical stress-test algorithm.

The seventh implementation manner is as follows: selecting, from the multiple associated different memory address read-write modes, a historical memory address read-write mode different from the most recently selected memory address read-write mode, and randomly selecting a target stress-test algorithm from the multiple associated different stress-test algorithms.

The eighth implementation manner is as follows: selecting, from the multiple associated different memory address read-write modes, a historical memory address read-write mode different from the most recently selected memory address read-write mode, and selecting, from the multiple associated different stress-test algorithms, a historical stress-test algorithm different from the most recently selected stress-test algorithm.
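The eight implementation manners combine three selection policies (random, same as the history, different from the most recent) for the mode and for the algorithm. The sketch below shows one way such policies could be composed. All function names are hypothetical, and the fallbacks used when the history is empty or offers no suitable entry are assumptions that the text does not address.

```python
import random


def pick_random(options, history, rng=random):
    """Choose any option at random."""
    return rng.choice(options)


def pick_same_as_history(options, history, rng=random):
    """Repeat the previously selected option; random on the first round."""
    return history[-1] if history else rng.choice(options)


def pick_different_from_recent(options, history, rng=random):
    """Choose an earlier selection that differs from the most recent one.

    Assumed fallback: if no such earlier selection exists, choose any
    option other than the most recent one.
    """
    if not history:
        return rng.choice(options)
    recent = history[-1]
    earlier = [h for h in history[:-1] if h != recent]
    candidates = earlier or [o for o in options if o != recent] or list(options)
    return rng.choice(candidates)


def select_pair(modes, algorithms, mode_history, algo_history,
                pick_mode=pick_random, pick_algo=pick_random, rng=random):
    """Select a (mode, algorithm) pair and record both in the histories."""
    mode = pick_mode(modes, mode_history, rng)
    algo = pick_algo(algorithms, algo_history, rng)
    mode_history.append(mode)
    algo_history.append(algo)
    return mode, algo
```

For instance, the fifth implementation manner corresponds to calling select_pair with pick_mode=pick_same_as_history and pick_algo=pick_different_from_recent.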

This completes the description of the method embodiments.

The following describes the apparatus provided in the present application:

Referring to fig. 3, fig. 3 shows a device 300 for detecting faults of a memory in a server provided by the present application, including:

a blocking unit 301, configured to map a memory address in a server into a memory space, and divide the memory space to obtain N memory blocks;

a fault detection unit 302, configured to, when performing a fault test on the N memory blocks, perform the following pressure measurement operation on each memory block: selecting a target memory address reading and writing mode from a plurality of different memory address reading and writing modes associated with the memory block, and selecting a target pressure measuring algorithm from a plurality of different pressure measuring algorithms associated with the memory block; carrying out pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the memory fault times of the memory block if the memory block is detected to have faults;

a fault determining unit 303, configured to trigger the fault reporting unit 304 when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold;

a fault reporting unit 304, configured to end the pressure measurement detection on each memory block, and report fault information of the faulty memory block.
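The cooperation of the four units can be illustrated with a small simulation. The sketch below is not the applicant's implementation: it replaces real server memory with a simulated byte array that can contain a stuck-at-zero cell, it treats "accumulated fault records" as the sum of fault counts over all blocks, and the mode names, pattern names, and the `max_rounds` guard are hypothetical.

```python
import random
from typing import Dict, Iterable, List


class SimulatedMemory:
    """Stand-in for mapped memory; cells in stuck_zero always store 0."""

    def __init__(self, size: int, stuck_zero: Iterable[int] = ()):
        self._cells = bytearray(size)
        self._stuck = set(stuck_zero)

    def __setitem__(self, addr: int, value: int) -> None:
        self._cells[addr] = 0 if addr in self._stuck else value

    def __getitem__(self, addr: int) -> int:
        return self._cells[addr]


def partition(size: int, n: int) -> List[range]:
    """Blocking unit: divide [0, size) into n contiguous blocks."""
    bounds = [size * i // n for i in range(n + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n)]


# Hypothetical read-write modes (address orders) and data patterns.
ADDRESS_MODES = {
    "ascending": lambda block: list(block),
    "descending": lambda block: list(reversed(block)),
}
ALGORITHMS = {
    "all_ones": lambda addr: 0xFF,
    "checkerboard": lambda addr: 0xAA if addr % 2 == 0 else 0x55,
}


def stress_block(mem, block: range, mode: str, algo: str) -> bool:
    """Fault detection unit: write a pattern, read it back, report a fault."""
    order = ADDRESS_MODES[mode](block)
    pattern = ALGORITHMS[algo]
    for addr in order:
        mem[addr] = pattern(addr)
    return any(mem[addr] != pattern(addr) for addr in order)


def run_detection(mem, size: int, n_blocks: int, threshold: int,
                  max_rounds: int, rng: random.Random) -> Dict[int, int]:
    """Return {block index: fault count} for the faulty blocks."""
    blocks = partition(size, n_blocks)
    counts = {i: 0 for i in range(len(blocks))}
    for _ in range(max_rounds):
        for i, block in enumerate(blocks):
            mode = rng.choice(sorted(ADDRESS_MODES))
            algo = rng.choice(sorted(ALGORITHMS))
            if stress_block(mem, block, mode, algo):
                counts[i] += 1
                # Fault determining unit: stop once the threshold is reached.
                if sum(counts.values()) >= threshold:
                    return {k: c for k, c in counts.items() if c}
    return {k: c for k, c in counts.items() if c}
```

With a 64-cell memory split into four blocks and a stuck cell at address 37, the report names block 2 once the threshold is reached; a healthy memory yields an empty report.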

As an embodiment, the pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm is performed when the pressure measurement start time allocated to the memory block is reached;

whether the memory block is detected to have a fault or to have no fault, the apparatus further includes:

a pressure measurement ending judging unit, configured to detect whether the current time is the pressure measurement ending time allocated to the memory block, and if not, to return to executing the pressure measurement operation on the memory block.
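The start-time and end-time behaviour described in this embodiment amounts to a run-then-check loop. The following is a hedged sketch: the function name, the polling interval, and the injectable `now` and `sleep` callables are assumptions made so that the loop can be exercised without waiting on a real clock.

```python
import time
from typing import Callable


def run_until_end_time(stress_once: Callable[[], bool],
                       start_time: float,
                       end_time: float,
                       now: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep,
                       poll: float = 0.01) -> int:
    """Run stress_once repeatedly inside [start_time, end_time].

    stress_once performs one pressure measurement operation on a block and
    returns True when a fault was detected. Returns the number of faults.
    """
    # Wait until the pressure measurement start time allocated to the block.
    while True:
        remaining = start_time - now()
        if remaining <= 0:
            break
        sleep(min(remaining, poll))

    faults = 0
    while True:
        if stress_once():
            faults += 1  # record the memory fault times of this block
        # Checked after every operation, fault or no fault.
        if now() >= end_time:
            return faults
```

Because the end time is checked after each operation rather than before it, every block is tested at least once, which matches the order of steps in the text.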

As an embodiment, the apparatus further comprises:

an address mapping unit, configured to perform address mapping on the memory in the server under the operating system, so as to map the virtual addresses of the memory in the server to the physical addresses in a one-to-one correspondence manner.

As an embodiment, the blocking unit 301 is specifically configured to:

divide the memory space according to the principle that the access rate of each divided memory block is within a preset rate range.

As an embodiment, the target memory address reading and writing mode is a memory address reading and writing mode randomly selected from a plurality of associated different memory address reading and writing modes.

As an embodiment, the target pressure measurement algorithm is a pressure measurement algorithm randomly selected from an associated plurality of different pressure measurement algorithms.

As an embodiment, the target memory address reading and writing mode is a memory address reading and writing mode that is the same as the historical memory address reading and writing mode selected from multiple associated different memory address reading and writing modes.

As an embodiment, the target pressure measurement algorithm is a pressure measurement algorithm selected from a plurality of different associated pressure measurement algorithms that is the same as the historical pressure measurement algorithm.

As an embodiment, the target memory address reading and writing mode is a historical memory address reading and writing mode, selected from multiple associated different memory address reading and writing modes, that is different from the most recently selected memory address reading and writing mode.

As an embodiment, the target pressure measurement algorithm is a historical pressure measurement algorithm, selected from an associated plurality of different pressure measurement algorithms, that is different from the most recently selected pressure measurement algorithm.

Therefore, in the technical solution of the embodiments of the present application, the server memory is divided into N memory blocks. When fault testing is performed on the N memory blocks, for each memory block, a target memory address reading and writing mode is selected from the multiple different memory address reading and writing modes associated with the memory block, and a target pressure measurement algorithm is selected from the multiple different associated pressure measurement algorithms; pressure measurement detection is performed on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and the memory fault times of the memory block are recorded if the memory block is detected to have a fault; and when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, the pressure measurement detection on the memory blocks is ended and the fault information of the faulty memory blocks is reported. It can be seen that the embodiments of the present application can perform pressure measurement detection on the memory blocks in parallel, and that the pressure measurement algorithm and the memory address reading and writing mode used may differ from one memory block to another. Each memory block can thus be accessed in multiple memory address reading and writing modes and tested with multiple pressure measurement algorithms, which exploits the advantage of combining multiple pressure measurement algorithms with multiple memory address reading and writing modes. This improves detection efficiency, raises the memory fault detection rate, and reduces production cost.
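The parallel aspect summarized above can be sketched with a thread pool in which every memory block has its own worker and all workers share one fault threshold. This is an illustrative sketch only: `block_tests` stands for one pressure measurement operation per block (mode and algorithm selection included), and that name, the `max_rounds` guard, and the choice of threads rather than processes or per-core pinning are assumptions, not details from the application.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def detect_in_parallel(block_tests: List[Callable[[], bool]],
                       threshold: int,
                       max_rounds: int) -> Dict[int, int]:
    """Test all blocks concurrently until the shared threshold is reached.

    block_tests[i]() runs one pressure measurement operation on block i and
    returns True when a fault is detected. Returns {block index: faults}.
    """
    stop = threading.Event()   # set once the reporting threshold is reached
    lock = threading.Lock()    # protects the shared fault counters
    counts = {i: 0 for i in range(len(block_tests))}

    def worker(index: int) -> None:
        for _ in range(max_rounds):
            if stop.is_set():
                return
            if block_tests[index]():
                with lock:
                    counts[index] += 1
                    if sum(counts.values()) >= threshold:
                        stop.set()  # end detection on every block

    with ThreadPoolExecutor(max_workers=max(1, len(block_tests))) as pool:
        # Consume the iterator so worker exceptions surface here.
        list(pool.map(worker, range(len(block_tests))))

    return {i: c for i, c in counts.items() if c}
```

The shared event is what lets a fault in one block end the testing of all the others, which is where the efficiency gain over a fixed-length sequential test would come from.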

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the electronic device provided in the embodiments of the present application, a schematic diagram of its hardware architecture is shown in fig. 4. The electronic device includes a machine-readable storage medium and a processor, wherein: the machine-readable storage medium stores machine-executable instructions executable by the processor; the processor is configured to execute the machine-executable instructions to perform the fault detection operations of the memory in the server disclosed in the above examples.

A machine-readable storage medium is provided in an embodiment of the present application that stores machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the fault detection operations of memory in a server disclosed in the above examples.

Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and so forth. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Since the device embodiments substantially correspond to the method embodiments, reference may be made to the corresponding parts of the description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative: the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.

So far, the description of the apparatus shown in fig. 4 is completed.

The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

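1. A method for detecting faults of a memory in a server, characterized by comprising the following steps:

mapping a memory address in a server into a memory space, and dividing the memory space to obtain N memory blocks;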

when fault testing is performed on the N memory blocks, the following pressure measurement operation is performed on each memory block: selecting a target memory address reading and writing mode from a plurality of different memory address reading and writing modes associated with the memory block, and selecting a target pressure measuring algorithm from a plurality of different pressure measuring algorithms associated with the memory block; carrying out pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the memory fault times of the memory block if the memory block is detected to have faults;

and when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold, ending the pressure measurement detection on the memory blocks and reporting the fault information of the fault memory blocks.

2. The method according to claim 1, wherein the performing of the pressure measurement detection on the memory block according to the target memory address reading and writing manner and the target pressure measurement algorithm is performed when a pressure measurement start time allocated to the memory block is reached;

if it is detected that the memory block has a fault, or if it is detected that the memory block has no fault, the method further includes:

and detecting whether the current time is the pressure measurement ending time allocated to the memory block, and if not, returning to execute the pressure measurement operation on the memory block.

3. The method of claim 1, wherein before said mapping the memory address in the server to the memory space, the method further comprises:

and under an operating system, performing address mapping on the memory in the server to map the virtual addresses of the memory in the server to the physical addresses in a one-to-one correspondence manner.

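4. The method according to claim 2, wherein the partitioning the memory space includes:

dividing the memory space according to the principle that the access rate of each divided memory block is within a preset rate range.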

5. The method of claim 1, wherein the target memory address read/write mode is a memory address read/write mode selected randomly from a plurality of associated different memory address read/write modes, and/or

the target pressure measurement algorithm is a pressure measurement algorithm randomly selected from an associated plurality of different pressure measurement algorithms.

6. The method according to claim 1, wherein the target memory address reading/writing mode is a memory address reading/writing mode, selected from multiple associated different memory address reading/writing modes, that is the same as the historical memory address reading/writing mode, and/or

the target pressure measuring algorithm is a pressure measuring algorithm, selected from a plurality of associated different pressure measuring algorithms, that is the same as the historical pressure measuring algorithm.

7. A fault detection device for a memory in a server, comprising:

a blocking unit, configured to map a memory address in a server into a memory space and divide the memory space to obtain N memory blocks;

a fault detection unit, configured to execute the following pressure measurement operation for each memory block when fault testing is performed on the N memory blocks: selecting a target memory address read-write mode from a plurality of different memory address read-write modes associated with the memory block, and selecting a target pressure measurement algorithm from a plurality of different pressure measurement algorithms associated with the memory block; carrying out pressure measurement detection on the memory block according to the target memory address reading and writing mode and the target pressure measurement algorithm, and recording the memory fault times of the memory block if the memory block is detected to have faults;

a fault determining unit, configured to trigger the fault reporting unit when the accumulated fault records corresponding to the memory blocks reach the configured memory fault reporting threshold;

a fault reporting unit, configured to end the pressure measurement detection on each memory block and report the fault information of the faulty memory block.

8. The apparatus according to claim 7, wherein the performing of the pressure measurement detection on the memory block according to the target memory address reading and writing manner and the target pressure measurement algorithm is performed when a pressure measurement start time allocated to the memory block is reached;

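if it is detected that the memory block has a fault, or if it is detected that the memory block has no fault, the apparatus further includes:

a pressure measurement ending judging unit, configured to detect whether the current time is the pressure measurement ending time allocated to the memory block, and if not, to return to executing the pressure measurement operation on the memory block.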

9. The apparatus of claim 7, further comprising:

an address mapping unit, configured to perform address mapping on the memory in the server under an operating system, so as to map the virtual addresses of the memory in the server to the physical addresses in a one-to-one correspondence manner.

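10. The apparatus according to claim 7, wherein the blocking unit is specifically configured to:

divide the memory space according to the principle that the access rate of each divided memory block is within a preset rate range.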

11. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to perform the method steps of any of claims 1-6.