CN118034986A - Memory fault processing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN118034986A
Authority
CN
China
Prior art keywords
memory
fault
virtual machine
information
sharing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410275282.XA
Other languages
Chinese (zh)
Inventor
张兆阁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202410275282.XA priority Critical patent/CN118034986A/en
Publication of CN118034986A publication Critical patent/CN118034986A/en
Pending legal-status Critical Current

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention discloses a memory fault processing method and device, a storage medium, and electronic equipment. The method comprises the following steps: performing memory fault detection on a physical server and, when a memory fault is detected, acquiring failed memory information; determining, based on the failed memory information, a first process that depends on the failed memory; removing the shared process from the first process to obtain a second process; and ending the virtual machine process corresponding to the failed memory together with the second process. Because the shared process is filtered out of the first process before any process is ended, the processes affected by the failed memory are still ended while the shared process keeps running. This avoids the impact that ending the shared process would have on other, non-faulty processes, reduces the failures of other processes caused by ending the shared process, and lowers the overall failure rate of the system.

Description

Memory fault processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for processing a memory failure, a storage medium, and an electronic device.
Background
Cloud computing has been widely used as a new computing model due to its high availability and high scalability. By creating a plurality of virtual machines on the physical server, sharing of cloud computing infrastructure is realized, and resource utilization rate is improved.
In the current memory fault handling flow of a physical server, handling a memory fault can cause functional failures in other virtual machines, which increases the overall failure rate of the physical server.
Disclosure of Invention
The invention provides a memory fault processing method, a memory fault processing device, a storage medium and electronic equipment, so as to reduce the overall fault rate of a system.
According to one aspect of the present invention, a memory fault processing method is provided, and the memory fault processing method is applied to a physical server, where the physical server includes a plurality of virtual machines and a sharing process, at least one virtual machine corresponds to one sharing process, and the sharing process performs data interaction with the at least one virtual machine respectively;
The method comprises the following steps:
performing memory fault detection on the physical server, and acquiring fault memory information under the condition that a memory fault is detected;
determining a first process dependent on a failed memory based on the failed memory information;
removing the sharing process from the first process to obtain a second process;
and ending the virtual machine process and the second process corresponding to the fault memory.
Optionally, ending the virtual machine process corresponding to the failed memory includes: removing the failed memory corresponding to the virtual machine process from the shared memory of the shared process.
Optionally, the failed memory information includes a failed memory address;
determining the first process dependent on the failed memory based on the failed memory information includes: determining a process that calls the failed memory address as the first process depending on the failed memory.
Optionally, after determining the first process depending on the failed memory based on the failed memory information, the method further includes: matching preset information of the shared process against the process information of the first process to determine the shared process in the first process.
Optionally, the shared process includes one or more of the following: a virtual switch process and a storage process; the preset information of the shared process includes name information of the shared process.
Optionally, after determining the first process depending on the failed memory based on the failed memory information, the method further includes:
determining, based on the first process, whether the failed memory is virtual machine memory, wherein the first process corresponding to virtual machine memory includes a process of a set type.
Optionally, when a memory fault is detected, the method further includes: triggering a hardware interrupt of the physical server; and, after ending the virtual machine process corresponding to the failed memory and the second process, the method further includes: performing interrupt recovery on the physical server.
According to another aspect of the present invention, there is provided a memory failure processing apparatus, including:
the fault detection module is used for detecting memory faults of the physical server;
the information acquisition module is used for acquiring the failure memory information under the condition that the memory failure is detected;
The process determining module is used for determining a first process depending on the fault memory based on the fault memory information; removing the sharing process from the first process to obtain a second process;
and the process processing module is used for ending the virtual machine process and the second process corresponding to the fault memory.
According to another aspect of the present invention, there is provided an electronic apparatus including:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the memory fault handling method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the memory fault handling method according to any of the embodiments of the present invention when executed.
According to this technical scheme, when a memory fault is detected, the first process using the failed memory is determined, and the shared process is filtered out of the first process to obtain the second process. The processes affected by the failed memory are still ended, but the shared process is not, which avoids the impact that ending the shared process would have on other, non-faulty processes, reduces the failures of other processes caused by ending the shared process, and lowers the overall failure rate of the system.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a cloud host framework in a physical server according to an embodiment of the present invention;
FIG. 2 is a flowchart of a memory failure processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a memory failure processing method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a memory failure processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device implementing a memory failure processing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In a cloud host virtualization environment, the virtio protocol standard is generally used, and shared memory improves the efficiency of device virtualization. For example, referring to fig. 1, fig. 1 is a schematic diagram of a cloud host framework in a physical server according to an embodiment of the present invention. virtio-net serves as the network card driver inside the virtual machine, and the virtual switch (Open vSwitch, OVS) interacts with the virtual machine through the vhost-user protocol to implement the back-end device of the virtio network card. The OVS reads and writes network card device data by sharing all of the memory of the virtual machine.
In fig. 1, VM0, VM1, and VM2 are virtual machines. The memory page in VM0 is VM0:mem page0, the memory page in VM1 is VM1:mem page0, and the memory page in VM2 is VM2:mem page0, where VM0:mem page0 denotes memory page 0 in virtual machine VM0. The three virtual machines each perform data interaction with the OVS, and the OVS is the shared process corresponding to VM0, VM1, and VM2.
If a memory page of a virtual machine triggers a memory fault, the physical server ends all processes that use the memory of that virtual machine, which avoids losses caused by using erroneous data. However, the processes that use this memory include processes that also exchange data with other virtual machines, so this fault handling flow causes functional failures in those other virtual machines and increases the overall failure rate of the physical server. For example, if virtual machine VM0 triggers a memory fault, then because the memory pages of VM0 are shared with the OVS, that is, the OVS uses the memory pages of VM0, the physical server ends the OVS process together with the VM0 process. The OVS handles the network requests of VM1 and VM2 in addition to those of VM0, so once the VM0 process and the OVS process are ended, the network functions of VM1 and VM2 fail.
Similarly, when virtual machine VM1 or virtual machine VM2 triggers a memory fault, the OVS process is ended, which likewise causes the network functions of the other virtual machines to fail.
It is understood that the OVS process in fig. 1 is only one example of a shared process; the shared process may include a virtual switch process (OVS process) and a storage process. For example, virtual machines VM0, VM1, and VM2 may correspond to the same storage process, which stores the data of each of them. If any one of the virtual machines triggers a memory fault, the storage process is ended, which affects the data storage of the other virtual machines.
Aiming at the technical problems, the embodiment of the invention provides a memory fault processing method for reducing the overall fault rate. Referring to fig. 2, fig. 2 is a flowchart of a memory failure processing method provided in an embodiment of the present invention, where the method may be applicable to a failure processing case when a memory failure exists in a cloud host, and the method may be performed by a memory failure processing apparatus, where the memory failure processing apparatus may be implemented in a form of hardware and/or software, and the memory failure processing apparatus may be configured in an electronic device such as a computer or a server. It will be appreciated that the electronic device may be a physical server that includes a plurality of virtual machines and sharing processes created in advance therein, such as shown in fig. 1. The number of virtual machines, the types and the number of sharing processes, and the correspondence between the virtual machines and the sharing processes are not limited, and may be set according to the service requirements of the physical server.
As shown in fig. 2, the method includes:
S110, detecting a memory fault of the physical server, and acquiring fault memory information under the condition that the memory fault is detected.
S120, determining a first process depending on the fault memory based on the fault memory information.
S130, removing the sharing process from the first process to obtain a second process.
And S140, ending the virtual machine process and the second process corresponding to the fault memory.
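Steps S120 to S140 can be illustrated with a minimal user-space sketch. It is not the implementation described in this embodiment: the `Proc` record, the `SHARED_NAMES` preset list, the page addresses, and the process names other than `ovs-vswitchd` are hypothetical, and the detection of S110 is assumed to have already produced a failed address and an owning process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    name: str
    pages: frozenset  # page addresses the process maps (hypothetical model)


# Hypothetical preset name information for shared processes.
SHARED_NAMES = ("ovs-vswitchd",)


def handle_memory_fault(procs, fault_addr, owner_pid):
    """Return the pids to end for a fault at fault_addr owned by owner_pid."""
    # S120: first process = every other process that uses the failed page.
    first = [p for p in procs if fault_addr in p.pages and p.pid != owner_pid]
    # S130: drop shared processes to obtain the second process.
    second = [p for p in first if not any(n in p.name for n in SHARED_NAMES)]
    # S140: end the owning virtual machine process and the second process.
    owner = [p for p in procs if p.pid == owner_pid]
    return [p.pid for p in owner + second]


procs = [
    Proc(100, "qemu-vm0", frozenset({0x1000})),
    Proc(101, "qemu-vm1", frozenset({0x2000})),
    Proc(200, "ovs-vswitchd", frozenset({0x1000, 0x2000})),
    Proc(300, "vm0-helper", frozenset({0x1000})),
]
ended = handle_memory_fault(procs, fault_addr=0x1000, owner_pid=100)
print(ended)
```

In this toy data set the switch process maps the failed page but is left running, so the process for VM1 is unaffected.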
In this embodiment, a plurality of virtual machines are created in advance in the physical server, or the physical server creates a virtual machine in response to a virtual machine creation request, and the service handled by the virtual machine is not limited here.
While the physical server is running, memory fault detection can be performed on it according to a preset detection period to obtain a memory fault detection result. For example, the physical server is provided with a memory fault detection subsystem that performs memory fault detection on the physical server, for example on the Linux kernel of the physical server. Optionally, a memory detection script is configured in the memory fault detection subsystem, and the memory fault detection task is executed according to the preset detection period. Optionally, the memory fault detection subsystem may be a neural network model for performing the memory fault detection task.
The memory fault detection result of the memory fault detection subsystem is then acquired. The result may include a memory fault identifier; for example, an identifier of 1 indicates that failed memory exists, and an identifier of 0 indicates that no failed memory exists. If no failed memory exists, the next memory fault detection is awaited; if a memory fault exists, the fault handling flow is executed. Optionally, the memory fault detection result may further include failed memory information, which includes, but is not limited to, a failed memory address and a process identifier corresponding to the failed memory.
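The shape of such a detection result can be sketched as follows. The `DetectionResult` type, its field names, and the returned action strings are assumptions made for illustration; the embodiment only specifies the 1/0 identifier and the optional failed memory information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionResult:
    fault_flag: int                    # 1: failed memory exists, 0: none
    fault_addr: Optional[int] = None   # failed memory address, if reported
    owner_pid: Optional[int] = None    # process corresponding to the failed memory


def next_action(result: DetectionResult) -> str:
    """Decide what to do after one detection cycle."""
    if result.fault_flag == 1:
        return "handle_fault"
    return "wait_next_cycle"


healthy = DetectionResult(fault_flag=0)
faulty = DetectionResult(fault_flag=1, fault_addr=0x1000, owner_pid=100)
print(next_action(healthy), next_action(faulty))
```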
It can be understood that the processes affected by the failed memory include the own process corresponding to the failed memory and the process using the failed memory. The process using the failed memory is the first process depending on the failed memory, where the first process depending on the failed memory may be at least one and may be determined according to the dependency relationship between the processes in the physical server.
Optionally, a process dependency relationship is obtained. The process dependency relationship may take the form of a tree, that is, a process dependency tree, which consists of process identifiers and the connecting lines between them; each connecting line represents a dependency. For example, a first identifier of process A is a parent node, a second identifier of process B is a child node of process A, and the connecting line between the two identifiers indicates that process B depends on process A, that is, process B uses the memory data of process A.
Under the condition that the fault memory information comprises the process identifier corresponding to the fault memory, the process identifier corresponding to the fault memory is matched in the process dependency relationship, and the child node of the process identifier corresponding to the fault memory, namely the first process depending on the process corresponding to the fault memory, is determined.
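A minimal sketch of this lookup, assuming the dependency tree is stored as a mapping from a parent process identifier to the identifiers of its child nodes. The identifiers `vm0`, `vm1`, `ovs`, and `storage` are hypothetical.

```python
# Parent identifier -> child identifiers. A child depends on its parent,
# that is, the child uses the memory data of the parent.
dependency_tree = {
    "vm0": ["ovs", "storage"],
    "vm1": ["ovs"],
    "ovs": [],
    "storage": [],
}


def first_processes(tree, faulty_process_id):
    """Return the child nodes of the process that owns the failed memory."""
    return list(tree.get(faulty_process_id, []))


deps_vm0 = first_processes(dependency_tree, "vm0")
deps_unknown = first_processes(dependency_tree, "vm9")
print(deps_vm0, deps_unknown)
```

An identifier that is absent from the tree yields an empty list, which corresponds to a failed memory that no other process depends on.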
In some embodiments, each process identifier in the process dependency relationship is associated with corresponding address information. Accordingly, matching the failed memory address from the failed memory information against the process dependency relationship yields both the process identifier corresponding to the failed memory address and the process identifier of the first process depending on the failed memory. For example, a process identifier may include one or more of a process name and process type information.
Optionally, determining the first process that depends on the failed memory based on the failed memory information includes: determining a process that calls the failed memory address as the first process depending on the failed memory. The processes in the physical server are traversed, the address information called by each process is determined, and the failed memory address is compared with that address information; if the address information called by a process includes the failed memory address, that process is determined to be a first process depending on the failed memory.
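The traversal can be sketched as below, under the assumption that the address information of each process is available as a list of half-open address ranges. The pids and ranges are made up for the example, and how the ranges are obtained on a real system is outside the scope of the sketch.

```python
def find_first_processes(mappings, fault_addr):
    """Return the pids whose address ranges contain fault_addr.

    mappings: dict of pid -> list of (start, end) half-open ranges.
    """
    return sorted(
        pid
        for pid, ranges in mappings.items()
        if any(start <= fault_addr < end for start, end in ranges)
    )


mappings = {
    100: [(0x1000, 0x3000)],
    101: [(0x8000, 0x9000)],
    200: [(0x0000, 0x2000), (0x8000, 0x9000)],
}
users = find_first_processes(mappings, 0x1800)
print(users)
```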
Optionally, performing type judgment on the fault memory, and determining whether the fault memory is a virtual machine memory. If the failed memory is the virtual machine memory, step S130 is continuously executed, and if the failed memory is not the virtual machine memory, the corresponding process of the failed memory and the first process using the failed memory are ended.
In some embodiments, the judging manner of the type of the fault memory may be: and determining whether the fault memory is a virtual machine memory or not based on the first process, wherein the first process corresponding to the virtual machine memory comprises a process of a set type. Traversing a first process determined based on the failed memory address, and determining whether the first process comprises a set type process. The first process comprises a set type process which indicates that the fault memory is virtual machine memory, and the first process does not comprise the set type process which indicates that the fault memory is not virtual machine memory.
Optionally, the set type process includes an OVS process and a virtual machine process, and in the case that the first process includes both the OVS process and the virtual machine process, determining that the failed memory is a virtual machine memory; in the case where the first process does not include an OVS process and/or a virtual machine process, it is determined that the failed memory is not virtual machine memory. It can be appreciated that the virtual machine process in the first process is another virtual machine process different from the virtual machine process corresponding to the failed memory.
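The set-type check can be written as a subset test. In this sketch the type labels `"ovs"` and `"vm"` are hypothetical stand-ins for however process types are recorded.

```python
# Set-type processes whose joint presence marks the memory as VM memory.
SET_TYPES = {"ovs", "vm"}


def is_vm_memory(first_process_types):
    """True only if the first process contains every set-type process."""
    return SET_TYPES <= set(first_process_types)


both = is_vm_memory(["vm", "ovs", "other"])
only_ovs = is_vm_memory(["ovs"])
neither = is_vm_memory(["other"])
print(both, only_ovs, neither)
```

The memory is treated as virtual machine memory only when both types are present; a missing OVS process, a missing virtual machine process, or both gives a negative result.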
In some embodiments, the type of the failed memory may also be judged as follows: the process identifier of the failed memory is obtained, which may be, for example, a process name. Whether the process identifier includes a virtual machine identifier is then judged; if it does, the failed memory is virtual machine memory. It can be appreciated that when a process corresponding to virtual machine memory (i.e., a virtual machine process) is created, a process identifier that includes the virtual machine identifier is generated.
When the failed memory is virtual machine memory, the shared process in the first process is determined. The shared process handles the data requests of the virtual machine process corresponding to the virtual machine memory and, at the same time, the data requests of other processes; see the OVS process shown in fig. 1. To avoid causing other processes to fail while the failed memory is being handled, the shared process is filtered out of the first process, which avoids the impact that ending the shared process would have on the other processes.
The identification mode of the sharing process comprises the following steps: and matching the preset information of the sharing process with the process information of the first process to determine the sharing process in the first process. Preset information of various types of sharing processes is preset, and the preset information can be name information of the sharing processes. The preset information of the sharing process can be stored in advance and can be obtained in a calling mode. Optionally, the sharing process includes one or more of the following: virtual switch processes and storage processes.
Traversing the first process, and if the process information of a certain first process is successfully matched with the preset information of the sharing process, namely the process information of the first process comprises the preset information of the sharing process, determining that the first process is the sharing process. And removing the sharing process from the first process to obtain a second process. It will be appreciated that the second process may or may not be empty.
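The name matching and removal can be sketched as follows. The match is a substring test, following the statement that the process information "comprises" the preset information; the preset name `storage-agent` and the first-process names other than `ovs-vswitchd` are hypothetical.

```python
def split_shared(first_names, preset_names):
    """Split first-process names into (shared, second) by preset-name match."""
    shared = [n for n in first_names if any(p in n for p in preset_names)]
    second = [n for n in first_names if n not in shared]
    return shared, second


preset = ["ovs-vswitchd", "storage-agent"]
shared, second = split_shared(["ovs-vswitchd", "vm0-helper"], preset)
only_shared, empty_second = split_shared(["ovs-vswitchd"], preset)
print(shared, second, empty_second)
```

The second call shows the case where the second process is empty because every first process is a shared process.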
Ending the virtual machine process corresponding to the failed memory and the second process ends both the process that owns the failed memory and the other processes that use it, which prevents the failed memory from affecting the operation of the physical server.
Optionally, the virtual machine process and the second process are ended by killing them. Ending the virtual machine process corresponding to the failed memory includes: removing the failed memory corresponding to the virtual machine process from the shared memory of the shared process. Removing the failed memory from the shared process prevents it from adversely affecting the subsequent operation of the shared process. Specifically, the virtual machine process responds to the process-ending instruction, executes its exit flow, and sends a memory removal request to the shared process, so that the memory shared with the shared process is removed through the vhost-user protocol. The shared process here may be a virtual switch process and/or a storage process.
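The effect of the removal request can be modelled with a small simulation. This is not the vhost-user message format: the `SharedProcess` class, its methods, and the page addresses are hypothetical, and the sketch only shows that the regions of the exiting virtual machine disappear from the shared process while the regions of other virtual machines remain.

```python
class SharedProcess:
    """Toy model of a shared process that holds memory shared by several VMs."""

    def __init__(self, name):
        self.name = name
        self.shared_regions = {}  # VM name -> set of shared page addresses

    def share(self, vm_name, pages):
        self.shared_regions[vm_name] = set(pages)

    def handle_remove_request(self, vm_name):
        # Drop every region shared by the exiting VM; ignore unknown VMs.
        self.shared_regions.pop(vm_name, None)


def end_vm_process(vm_name, shared_processes):
    """Exit flow of a VM process: ask each shared process to drop its memory."""
    for proc in shared_processes:
        proc.handle_remove_request(vm_name)


ovs = SharedProcess("ovs")
ovs.share("vm0", [0x1000])
ovs.share("vm1", [0x2000])
end_vm_process("vm0", [ovs])
remaining = sorted(ovs.shared_regions)
print(remaining)
```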
On the basis of the above embodiment, a hardware interrupt of the physical server is triggered when a memory fault is detected; and, after the virtual machine process corresponding to the failed memory and the second process are ended, interrupt recovery is performed on the physical server. Triggering the interrupt prevents the handling of the failed memory from affecting the normal operation of the physical server. The interrupt flow of the memory fault handling is implemented on the interrupt handling framework of the physical server: saving and restoring the interrupt context is done by that framework, and the memory fault handling flow is part of the kernel interrupt handling framework of the physical server. After the memory fault handling flow has executed, the kernel interrupt handling framework restores the context and exits the interrupt flow. For example, the physical server may run a Linux system, and the kernel of the physical server may be a Linux kernel.
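The ordering of context saving, fault handling, and context restoring can be shown with a user-space analogy. This is only an analogy and not kernel code: the `interrupt_context` helper and the log entries are hypothetical, and a real kernel performs these steps in its interrupt entry and exit paths.

```python
from contextlib import contextmanager

log = []


@contextmanager
def interrupt_context():
    """Stand-in for the framework that saves and restores the context."""
    log.append("save context")
    try:
        yield
    finally:
        # Runs even if the handler raises, mirroring interrupt recovery.
        log.append("restore context")


with interrupt_context():
    log.append("handle memory fault")

print(log)
```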
According to this technical scheme, when a memory fault is detected, the first process using the failed memory is determined, the shared process is filtered out of the first process to obtain the second process, and the virtual machine process corresponding to the failed memory and the second process are ended. The processes affected by the failed memory are still ended, but the shared process is not, which avoids the impact that ending the shared process would have on other, non-faulty processes, reduces the failures of other processes caused by ending the shared process, and lowers the overall failure rate of the system.
On the basis of the above embodiments, the present invention further provides a preferred example of the memory fault processing method. Referring to fig. 3, fig. 3 is a flowchart of a memory fault processing method provided by the present invention. A memory detection task is executed on the physical machine (i.e., the physical server) to obtain its memory fault detection result. When a memory fault is detected in the physical machine memory, the physical machine kernel triggers an MCE (machine check exception, a hardware interrupt triggered when a memory fault occurs) interrupt. The failed memory information is acquired, which includes the failed memory address and the like. A first process using the failed memory is determined based on the failed memory information, and the process information of the first process is obtained, which may include a process name and the like. Whether the failed memory is virtual machine memory is determined based on the process information of the first process. If it is, process filtering is performed on the first process by the hwpoison filter function module, that is, the shared process in the first process, such as an OVS process and/or a storage process, is removed to obtain a second process. The second process may or may not be empty. The virtual machine process corresponding to the failed memory and the second process are then ended, for example by killing them. While the virtual machine process is exiting, the memory it shares with the shared process (for example, the OVS process) is removed through the vhost-user protocol, so that the failed memory of the virtual machine has no effect on the functions of the shared process. For example, the OVS process can continue to handle the network requests of other virtual machines without causing functional failures in them. Once the fault handling flow is complete, the physical machine performs interrupt recovery through a callback function and continues to execute the original program.
Fig. 4 is a schematic structural diagram of a memory failure processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes:
The fault detection module 210 is configured to perform memory fault detection on the physical server;
the information obtaining module 220 is configured to obtain faulty memory information when a memory fault is detected;
a process determining module 230, configured to determine a first process that depends on the failed memory based on the failed memory information; removing the sharing process from the first process to obtain a second process;
And the process processing module 240 is configured to end the virtual machine process and the second process corresponding to the failed memory.
According to this technical scheme, when a memory fault is detected, the first process using the failed memory is determined, the shared process is filtered out of the first process to obtain the second process, and the virtual machine process corresponding to the failed memory and the second process are ended. The processes affected by the failed memory are still ended, but the shared process is not, which avoids the impact that ending the shared process would have on other, non-faulty processes, reduces the failures of other processes caused by ending the shared process, and lowers the overall failure rate of the system.
Based on the above embodiment, optionally, the process processing module 240 is further configured to: and removing the fault memory corresponding to the virtual machine process from the shared memory of the shared process.
On the basis of the above embodiment, optionally, the faulty memory information includes a faulty memory address;
The process determination module 230 is configured to: and determining the process calling the fault memory address as a first process depending on the fault memory.
On the basis of the above embodiment, optionally, the process determining module 230 is further configured to:
after a first process depending on the fault memory is determined based on the fault memory information, preset information of the shared process is matched with process information of the first process, and the shared process in the first process is determined.
Optionally, the shared process includes one or more of the following: a virtual switch process and a storage process; the preset information of the shared process includes name information of the shared process.
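Matching by name can be sketched as a simple set lookup. The names in `PRESET_SHARED_NAMES` and the `split_shared` function are hypothetical; the source only states that the preset information includes the names of shared processes such as virtual switch and storage processes, and exact name equality is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical preset names for a virtual switch process and a storage process.
PRESET_SHARED_NAMES: FrozenSet[str] = frozenset({"vswitchd", "storaged"})


def split_shared(
    first_processes: Dict[int, str],
    preset_names: FrozenSet[str] = PRESET_SHARED_NAMES,
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Split {pid: name} into (shared processes, second processes) by name."""
    shared = {pid: n for pid, n in first_processes.items() if n in preset_names}
    second = {pid: n for pid, n in first_processes.items() if n not in preset_names}
    return shared, second


# Usage: only the virtual switch process is recognised as shared.
shared, second = split_shared({101: "vm-1", 7: "vswitchd", 202: "vm-1-io"})
```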
On the basis of the above embodiment, optionally, the process determining module 230 is further configured to:
determine, based on the first process, whether the faulty memory is virtual machine memory, wherein the first process corresponding to virtual machine memory includes a process of a set type.
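The check can be sketched as testing whether any first process belongs to the set type. The type label `"hypervisor"` and the `is_virtual_machine_memory` function are hypothetical; the source does not name the set type or say how a process's type is recorded.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical "set type": process types that indicate virtual machine memory.
VM_PROCESS_TYPES: FrozenSet[str] = frozenset({"hypervisor"})


def is_virtual_machine_memory(
    first_processes: Iterable[Tuple[int, str]],
    vm_types: FrozenSet[str] = VM_PROCESS_TYPES,
) -> bool:
    """Return True if any (pid, type) pair has one of the set types."""
    return any(ptype in vm_types for _pid, ptype in first_processes)


# Usage: PID 101 is of the set type, so the faulty memory is treated as VM memory.
vm_memory = is_virtual_machine_memory([(101, "hypervisor"), (7, "switch")])
```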
On the basis of the above embodiment, optionally, the apparatus further includes:
an interrupt module, configured to trigger a hardware interrupt of the physical server when a memory fault is detected;
and an interrupt recovery module, configured to perform interrupt recovery on the physical server after the virtual machine process and the second process corresponding to the faulty memory are ended.
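The ordering of these two modules around the process-ending step can be sketched with a context manager. This only simulates the sequence by recording events in a list; it does not trigger a real hardware interrupt, and the `server_interrupted` name is hypothetical. Recovering in a `finally` clause, so that recovery also runs if ending a process raises an error, is a design choice of this sketch and is not stated in the source.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def server_interrupted(events: List[str]) -> Iterator[None]:
    """Record an interrupt on entry and a recovery on exit."""
    events.append("interrupt")
    try:
        yield
    finally:
        # Recovery runs after the body, even if the body raises.
        events.append("recover")


# Usage: the processes are ended while the server is interrupted.
events: List[str] = []
with server_interrupted(events):
    events.append("end processes")
```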
The memory fault processing device provided by the embodiment of the invention can execute the memory fault processing method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 5, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the memory fault processing method.
In some embodiments, the memory failure handling method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the memory failure handling method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the memory failure handling method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the memory fault processing method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
The embodiment of the invention also provides a computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a processor to execute a memory fault processing method, and the method comprises the following steps:
performing memory fault detection on the physical server, and acquiring faulty memory information when a memory fault is detected; determining, based on the faulty memory information, a first process that depends on the faulty memory; removing the shared process from the first process to obtain a second process; and ending the virtual machine process and the second process corresponding to the faulty memory.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A memory fault processing method, characterized in that the method is applied to a physical server, wherein the physical server comprises a plurality of virtual machines and shared processes, at least one virtual machine corresponds to one shared process, and each shared process exchanges data with its corresponding at least one virtual machine;
the method comprises the following steps:
performing memory fault detection on the physical server, and acquiring faulty memory information when a memory fault is detected;
determining, based on the faulty memory information, a first process that depends on the faulty memory;
removing the shared process from the first process to obtain a second process;
and ending the virtual machine process and the second process corresponding to the faulty memory.
2. The method of claim 1, wherein the ending of the virtual machine process corresponding to the faulty memory comprises:
removing the faulty memory corresponding to the virtual machine process from the shared memory of the shared process.
3. The method of claim 1, wherein the faulty memory information comprises a faulty memory address;
the determining, based on the faulty memory information, of a first process that depends on the faulty memory comprises:
determining a process that calls the faulty memory address as a first process that depends on the faulty memory.
4. The method of claim 1, further comprising, after determining the first process that depends on the faulty memory based on the faulty memory information:
matching preset information of the shared process against process information of the first process to determine the shared process in the first process.
5. The method of claim 4, wherein the shared process comprises one or more of: a virtual switch process and a storage process; the preset information of the shared process comprises name information of the shared process.
6. The method of claim 1, further comprising, after determining the first process that depends on the faulty memory based on the faulty memory information:
determining, based on the first process, whether the faulty memory is virtual machine memory, wherein the first process corresponding to virtual machine memory comprises a process of a set type.
7. The method of claim 1, further comprising, when a memory fault is detected: triggering a hardware interrupt of the physical server;
and further comprising, after ending the virtual machine process and the second process corresponding to the faulty memory: performing interrupt recovery on the physical server.
8. A memory fault processing apparatus, comprising:
a fault detection module, configured to perform memory fault detection on a physical server;
an information obtaining module, configured to obtain faulty memory information when a memory fault is detected;
a process determining module, configured to determine, based on the faulty memory information, a first process that depends on the faulty memory, and to remove the shared process from the first process to obtain a second process;
and a process processing module, configured to end the virtual machine process and the second process corresponding to the faulty memory.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the memory fault processing method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the memory fault handling method of any one of claims 1-7 when executed.
CN202410275282.XA 2024-03-11 2024-03-11 Memory fault processing method and device, storage medium and electronic equipment Pending CN118034986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410275282.XA CN118034986A (en) 2024-03-11 2024-03-11 Memory fault processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410275282.XA CN118034986A (en) 2024-03-11 2024-03-11 Memory fault processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN118034986A true CN118034986A (en) 2024-05-14

Family

ID=91000416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410275282.XA Pending CN118034986A (en) 2024-03-11 2024-03-11 Memory fault processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN118034986A (en)

Similar Documents

Publication Publication Date Title
CN114328098B (en) Slow node detection method and device, electronic equipment and storage medium
CN111181780A (en) HA cluster-based host pool switching method, system, terminal and storage medium
CN116010220A (en) Alarm diagnosis method, device, equipment and storage medium
CN115145769A (en) Intelligent network card and power supply method, device and medium thereof
CN117033058A (en) Analysis method, device, equipment and medium for software crash data
CN111737055A (en) Service processing method, device, equipment and computer readable storage medium
CN116645082A (en) System inspection method, device, equipment and storage medium
CN118034986A (en) Memory fault processing method and device, storage medium and electronic equipment
CN114936106A (en) Method, device and medium for processing host fault
CN115130112A (en) Quick start-stop method, device, equipment and storage medium
CN114817075A (en) Inter-process heartbeat detection method and device
CN111857689A (en) Framework, function configuration method of framework, terminal and storage medium
CN116579914B (en) Execution method and device of graphic processor engine, electronic equipment and storage medium
CN115629918B (en) Data processing method, device, electronic equipment and storage medium
CN116450120B (en) Method, device, equipment and medium for analyzing kernel of real-time operating system
CN116719621B (en) Data write-back method, device, equipment and medium for mass tasks
CN116185695A (en) Log processing method and device, electronic equipment and storage medium
CN114691404A (en) Service process monitoring method and device, electronic equipment, storage medium and product
CN117931526A (en) Application program exception handling method and device, electronic equipment and storage medium
CN115390992A (en) Virtual machine creating method, device, equipment and storage medium
CN118069294A (en) Inter-core interrupt injection method and device, electronic equipment and storage medium
CN118377677A (en) Multi-server process monitoring method and device, electronic equipment and storage medium
CN117234779A (en) Exception recovery method, device, equipment and medium for distributed database
CN116302796A (en) Process monitoring method and device, electronic equipment and storage medium
CN117493291A (en) Log acquisition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination