CN116860508B

CN116860508B - Distributed system software defect continuous self-healing method, device, equipment and medium

Info

Publication number: CN116860508B
Application number: CN202311112671.2A
Authority: CN
Inventors: 李�杰; 张卫; 赵楠; 张希; 刘文杰
Original assignee: Shenzhen Huarui Distributed Technology Co ltd
Current assignee: Shenzhen Huarui Distributed Technology Co ltd
Priority date: 2023-08-31
Filing date: 2023-08-31
Publication date: 2023-12-26
Anticipated expiration: 2043-08-31
Also published as: CN116860508A

Abstract

The invention relates to the field of computer technology and provides a method, device, equipment and medium for continuous self-healing of software defects in a distributed system. A real-time process of the distributed system is replicated based on a Fork mechanism to obtain a trial calculation process, and messages input to the distributed system are processed by the real-time process and the trial calculation process. When it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, that message is determined to be an abnormal message, and the real-time process marks the abnormal message so that it cannot interfere with subsequent processing. The real-time process is then replicated again based on the Fork mechanism to obtain a new trial calculation process, and messages input to the distributed system continue to be processed by the real-time process and the new trial calculation process.

Description

Distributed system software defect continuous self-healing method, device, equipment and medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for continuously self-healing a software defect of a distributed system.

Background

In the prior art, the replicated state machine is a replication architecture frequently used in distributed systems: it replicates the external input of the system and has multiple replicas perform the same processing on that input, thereby replicating state for backup purposes.

However, because the replicas in a replicated state machine architecture process external input synchronously, a specific input that triggers a code defect in the software may crash all replica processes at the same time, so that the service becomes unavailable.
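The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the handler `apply`, the `"poison"` input and the function `run_replicas` are hypothetical names introduced for illustration, and a raised exception stands in for a process crash.

```python
def apply(state, message):
    """Hypothetical deterministic handler with a defect on one specific input."""
    if message == "poison":
        raise RuntimeError("defect triggered")
    return state + [message]


def run_replicas(n, inputs):
    """Feed the same inputs to n replicas and return how many are still alive."""
    replicas = [[] for _ in range(n)]
    alive = [True] * n
    for message in inputs:
        for i in range(n):
            if not alive[i]:
                continue
            try:
                replicas[i] = apply(replicas[i], message)
            except RuntimeError:
                alive[i] = False  # models a crash of replica i
    return sum(alive)
```

Because every replica runs the same deterministic code on the same input, adding more replicas does not help against this kind of defect.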

Disclosure of Invention

In view of the foregoing, it is necessary to provide a method, an apparatus, a device and a medium for continuously self-healing defects of distributed system software, which aim to solve the problem of defects of distributed system software.

A distributed system software defect continuous self-healing method, the distributed system software defect continuous self-healing method comprising:

responding to a software defect self-healing instruction of a distributed system, and acquiring a real-time process of the distributed system;

copying the real-time process based on a Fork mechanism to obtain a trial calculation process;

processing the messages input to the distributed system by utilizing the real-time process and the trial calculation process;

when it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, determining that message to be an abnormal message;

when the real-time process detects the failure of the trial calculation process, marking the abnormal message by utilizing the real-time process;

copying the real-time process again based on the Fork mechanism to obtain a new trial calculation process;

and continuing to process the message input to the distributed system based on the real-time process and the new trial calculation process.

According to the preferred embodiment of the invention, the trial calculation process and the real-time process both store the complete memory state of each software in the distributed system.

According to a preferred embodiment of the present invention, the processing of the messages input to the distributed system by using the real-time process and the trial calculation process includes:

when any message is input to the distributed system, submitting the message to the trial calculation process for processing;

and after the trial calculation process has finished processing the message, passing the message to the real-time process for subsequent processing.

According to a preferred embodiment of the invention, the method further comprises:

and skipping marked abnormal messages in the process of continuing to process the messages input to the distributed system based on the real-time process and the new trial calculation process.

A distributed system software defect continuous self-healing device, the distributed system software defect continuous self-healing device comprising:

the acquisition unit is used for responding to the software defect self-healing instruction of the distributed system and acquiring the real-time process of the distributed system;

the copying unit is used for copying the real-time process based on a Fork mechanism to obtain a trial calculation process;

the processing unit is used for processing the messages input to the distributed system by utilizing the real-time process and the trial calculation process;

the determining unit is used for determining, when it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, that message to be an abnormal message;

the marking unit is used for marking the abnormal message by utilizing the real-time process when the real-time process detects the failure of the trial calculation process;

the copying unit is further used for copying the real-time process again based on the Fork mechanism to obtain a new trial calculation process;

and the processing unit is further used for continuously processing the message input to the distributed system based on the real-time process and the new trial calculation process.

According to a preferred embodiment of the present invention, the processing unit processing the messages input to the distributed system by using the real-time process and the trial calculation process includes:

According to a preferred embodiment of the invention, the device further comprises:

and the skipping unit is used for skipping the marked abnormal message in the process of continuing to process the message input to the distributed system based on the real-time process and the new trial calculation process.

A computer device, the computer device comprising:

a memory storing at least one instruction; and

a processor that executes the instructions stored in the memory to implement the distributed system software defect continuous self-healing method.

A computer readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the distributed system software defect continuous self-healing method.

According to the above technical scheme, the real-time process of the distributed system can be replicated based on the Fork mechanism to obtain a trial calculation process, and the messages input to the distributed system are processed by the real-time process and the trial calculation process. When it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, that message is determined to be an abnormal message and is marked by the real-time process, so that it cannot interfere with subsequent processing. The real-time process is then replicated again based on the Fork mechanism to obtain a new trial calculation process, and messages input to the distributed system continue to be processed by the real-time process and the new trial calculation process. This flow is repeated each time a defect is triggered.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of a method for continuous self-healing of distributed system software defects according to the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of a distributed system software defect continuous self-healing device according to the present invention.

FIG. 3 is a schematic diagram of a computer device for implementing a method for continuously self-healing defects in distributed system software according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flow chart of a method for continuously self-healing defects in distributed system software according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.

The continuous self-healing method for distributed system software defects is applied to one or more computer devices, wherein a computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.

The computer device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.

The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of hosts or network servers.

The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.

S10, responding to a software defect self-healing instruction of the distributed system, and acquiring a real-time process of the distributed system.

The software defect self-healing instruction may be triggered automatically when the distributed system is started, or may be triggered at the request of a user; the invention is not limited in this respect.

The real-time process is the original process of the distributed system and is used for processing the messages received by the distributed system.

S11, copying the real-time process based on a Fork mechanism to obtain a trial calculation process.

The trial calculation process is obtained by copying the real-time process, so that the trial calculation process and the real-time process both store the complete memory state of each software in the distributed system.

Namely: the real-time process is a process of the distributed system before copying, and the trial calculation process is a process which is generated after copying and is the same as the real-time process.
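As a minimal sketch of the copy step, assuming a POSIX system on which the Fork mechanism corresponds to the `fork()` system call (exposed in Python as `os.fork`), the following shows that the forked process starts with the parent's in-memory state and that later changes in the child do not affect the parent. The `state` dictionary and the function name `fork_and_report` are illustrative and are not taken from the patent.

```python
import os


def fork_and_report(state):
    """Fork, let the child modify its private copy of state, and return what the child saw."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: plays the role of the trial calculation process
        try:
            os.close(read_fd)
            state["counter"] += 100  # changes only the child's copy
            os.write(write_fd, str(state["counter"]).encode())
        finally:
            os._exit(0)  # never return into the parent's code path
    # parent: plays the role of the real-time process
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode())
```

For example, with `state = {"counter": 7}`, the child reports 107 while the parent's `state["counter"]` remains 7.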

S12, processing the messages input to the distributed system by utilizing the real-time process and the trial calculation process.

In this embodiment, the processing of the messages input to the distributed system by using the real-time process and the trial calculation process includes:

In the above embodiment, when any message is input to the distributed system, the trial calculation process processes the message first; after that processing is completed, the message is passed to the real-time process for subsequent processing.

S13, when it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, determining that message to be an abnormal message.

In this embodiment, when a message triggers a software defect, the trial calculation process crashes because it is the process handling the message, whereas the real-time process does not crash because it has not yet started to process the message. The real-time process is thereby effectively protected.
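One way the failure of a forked process can be observed is sketched below, assuming a POSIX system. The patent does not specify the detection mechanism; here the parent simply waits on the child and inspects its exit status. The handler `handle`, the `b"poison"` payload and the per-message fork are simplifications introduced for illustration, and a self-directed SIGKILL stands in for a crash.

```python
import os
import signal


def handle(message):
    """Hypothetical message handler; b"poison" stands in for an input that hits a defect."""
    if message == b"poison":
        os.kill(os.getpid(), signal.SIGKILL)  # simulate a crash of the current process
    return len(message)


def trial_outcome(message):
    """Run handle() in a forked trial process and classify how that process ended."""
    pid = os.fork()
    if pid == 0:  # trial calculation process
        code = 1
        try:
            handle(message)
            code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)  # the real-time process observes the trial process
    if os.WIFSIGNALED(status):
        return ("crashed", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))
```

The parent never calls `handle` on the message itself in this sketch, so a crash in the child leaves the parent untouched.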

S14, when the real-time process detects the failure of the trial calculation process, marking the abnormal message by using the real-time process.

Specifically, an identification field may be added to the abnormal message, so that every message that causes a software exception is marked.
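The marking step could look like the following sketch. The `Message` type, its field names and the helper functions are assumptions made for illustration; the patent only states that an identification field is added to the abnormal message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    seq: int                # position in the input stream
    payload: bytes
    abnormal: bool = False  # hypothetical identification field


def mark_abnormal(log, seq):
    """Return a copy of the log in which the message with the given seq is flagged."""
    return [replace(m, abnormal=True) if m.seq == seq else m for m in log]


def pending(log):
    """Messages still eligible for processing, i.e. not flagged as abnormal."""
    return [m for m in log if not m.abnormal]
```

Keeping the flag on the message itself means any later reader of the log can skip the message without consulting a separate list.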

S15, copying the real-time process again based on the Fork mechanism to obtain a new trial calculation process.

The new trial-computing process is obtained by copying the real-time process, so that the state of the new trial-computing process is consistent with that of the real-time process, and no additional state synchronization or replay is needed.

S16, continuing to process the messages input to the distributed system based on the real-time process and the new trial calculation process.

In this embodiment, the method further includes:

In the above embodiment, after the abnormal message is marked, the real-time process and the new trial calculation process skip all marked abnormal messages during subsequent message processing and continue to process new messages according to the normal message flow. This prevents the abnormal message from causing a software exception again, further protects the software in the distributed system, and ensures stable operation of the system.
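Steps S11 to S16 can be sketched end to end as follows, assuming a POSIX system. This is an illustrative sketch rather than the patented implementation: the pipe-based acknowledgement protocol, the function names, the use of a list index as the mark, and the `b"poison"` payload are all assumptions, and each message is assumed to be non-empty and at most 4096 bytes so that one pipe read returns one whole message.

```python
import os
import signal


def apply(state, msg):
    """Hypothetical handler; b"poison" stands in for an input that triggers a defect."""
    if msg == b"poison":
        os.kill(os.getpid(), signal.SIGKILL)  # simulate a crash
    state.append(msg)


def spawn_trial(state):
    """Fork a trial process that starts from the caller's current state."""
    req_r, req_w = os.pipe()  # real-time -> trial: messages
    ack_r, ack_w = os.pipe()  # trial -> real-time: one byte per survived message
    pid = os.fork()
    if pid == 0:  # trial calculation process
        try:
            os.close(req_w)
            os.close(ack_r)
            while True:
                msg = os.read(req_r, 4096)
                if not msg:  # real-time process closed the pipe
                    break
                apply(state, msg)  # may kill this process
                os.write(ack_w, b"\x01")
        finally:
            os._exit(0)
    os.close(req_r)
    os.close(ack_w)
    return pid, req_w, ack_r


def stop_trial(trial):
    """Close the pipes to a trial process and reap it, whether alive or dead."""
    pid, req_w, ack_r = trial
    os.close(req_w)
    os.close(ack_r)
    os.waitpid(pid, 0)


def run(log):
    """Process log with trial-first ordering; return (state, marked indices, fork count)."""
    state, marked, forks = [], set(), 1
    trial = spawn_trial(state)
    i = 0
    while i < len(log):
        if i in marked:  # skip marked abnormal messages
            i += 1
            continue
        os.write(trial[1], log[i])
        if os.read(trial[2], 1):  # acknowledgement: the trial process survived
            apply(state, log[i])  # real-time process handles the message
            i += 1
        else:  # end of file: the trial process died
            marked.add(i)  # mark the abnormal message
            stop_trial(trial)
            trial = spawn_trial(state)  # new trial process, no replay needed
            forks += 1
    stop_trial(trial)
    return state, marked, forks
```

In this sketch the real-time process only applies a message after the trial process has acknowledged it, so at every re-fork its state reflects exactly the messages that were processed successfully, and the new trial process inherits that state directly.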

By replicating an identical application process with the Fork mechanism and performing trial calculation on every external input message of the distributed system, abnormal messages that would crash the application are accurately filtered out according to the trial calculation results, which guarantees the safety of the real-time process.

In realizing continuous self-healing of distributed system software defects based on the Fork mechanism, this embodiment requires no human or external intervention, can repeat the exception handling flow, and can complete the self-healing of a software defect within an extremely short time (on the order of milliseconds to hundreds of milliseconds, depending on the time required for process replication).

FIG. 2 is a functional block diagram of a preferred embodiment of the distributed system software defect continuous self-healing device according to the present invention. The distributed system software defect continuous self-healing device 11 comprises an acquisition unit 110, a copying unit 111, a processing unit 112, a determining unit 113, a marking unit 114 and a skipping unit 115. A module/unit as referred to in the present invention is a series of computer program segments that are stored in a memory, can be executed by a processor, and perform a fixed function. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.

The obtaining unit 110 is configured to obtain a real-time process of the distributed system in response to a software defect self-healing instruction of the distributed system.

The copying unit 111 is configured to copy the real-time process based on a Fork mechanism, and obtain a trial calculation process.

The processing unit 112 is configured to process a message input to the distributed system by using the real-time process and the trial calculation process.

In this embodiment, the processing unit 112 processing the messages input to the distributed system by using the real-time process and the trial calculation process includes:

The determining unit 113 is configured to determine, when it is detected that a message triggers a defect of any software in the distributed system and causes the trial calculation process to fail, that message to be an abnormal message.

The marking unit 114 is configured to mark the abnormal message by using the real-time process when the real-time process detects the failure of the trial calculation process.

The copying unit 111 is further configured to copy the real-time process again based on the Fork mechanism, so as to obtain a new trial calculation process.

The processing unit 112 is further configured to continue processing the message input to the distributed system based on the real-time process and the new trial-computing process.

In this embodiment, the apparatus further includes:

a skipping unit 115, configured to skip the marked abnormal message in a process of continuing to process the message input to the distributed system based on the real-time process and the new trial calculation process.

The computer device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program stored in the memory 12 and executable on the processor 13, such as a distributed system software defect continuous self-healing program.

It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the computer device 1 and does not limit it. The computer device 1 may have a bus-type structure or a star-type structure, and may comprise more or fewer hardware or software components than illustrated, or a different arrangement of components; for example, the computer device 1 may further comprise an input-output device, a network access device, etc.

It should be noted that the computer device 1 is only an example; other existing or future electronic products that are applicable to the present invention are also included in the scope of the present invention and are incorporated herein by reference.

The memory 12 includes at least one type of readable storage medium including flash memory, a removable hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the computer device 1, such as a removable hard disk of the computer device 1. The memory 12 may in other embodiments also be an external storage device of the computer device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the computer device 1. The memory 12 may be used not only for storing application software installed in the computer device 1 and various types of data, such as code of a distributed system software defect continuous self-healing program, etc., but also for temporarily storing data that has been output or is to be output.

The processor 13 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 13 is the control unit (Control Unit) of the computer device 1; it connects the components of the entire computer device 1 using various interfaces and lines, runs or executes programs or modules stored in the memory 12 (for example, the distributed system software defect continuous self-healing program), and invokes data stored in the memory 12 to perform the various functions of the computer device 1 and to process data.

The processor 13 executes the operating system of the computer device 1 and various types of applications installed. The processor 13 executes the application program to implement the steps of the various embodiments of the distributed system software defect continuous self-healing method described above, such as the steps shown in fig. 1.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the computer device 1. For example, the computer program may be divided into an acquisition unit 110, a copying unit 111, a processing unit 112, a determination unit 113, a marking unit 114, a skipping unit 115.

The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute parts of the distributed system software defect continuous self-healing method according to the various embodiments of the present invention.

The modules/units integrated in the computer device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the present invention may also be implemented by a computer program for instructing a relevant hardware device to implement all or part of the procedures of the above-mentioned embodiment method, where the computer program may be stored in a computer readable storage medium and the computer program may be executed by a processor to implement the steps of each of the above-mentioned method embodiments.

The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory, or the like.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one straight line is shown in fig. 3, but not only one bus or one type of bus. The bus is arranged to enable a connection communication between the memory 12 and at least one processor 13 or the like.

Although not shown, the computer device 1 may further comprise a power source (such as a battery) for powering the various components. Preferably, the power source may be logically connected to the at least one processor 13 via a power management means, so that charge management, discharge management and power consumption management are achieved by the power management means. The power supply may also include one or more direct current or alternating current power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The computer device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail herein.

Further, the computer device 1 may also comprise a network interface, optionally comprising a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the computer device 1 and other computer devices.

The computer device 1 may optionally further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and optionally a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the computer device 1 and for displaying a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited by this configuration.

Fig. 3 shows only a computer device 1 with components 12-13, it being understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the computer device 1 and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.

In connection with fig. 1, the memory 12 in the computer device 1 stores a plurality of instructions to implement a distributed system software defect continuous self-healing method, and the processor 13 can execute the plurality of instructions to implement:

Specifically, the specific implementation method of the above instructions by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.

The data in this case were obtained legally.

In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The units or means stated in the invention may also be implemented by one unit or means, either by software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. The continuous self-healing method for the defects of the distributed system software is characterized by comprising the following steps of:

continuing to process the message input to the distributed system based on the real-time process and the new trial calculation process; the state of the new trial calculation process is consistent with that of the real-time process, and additional state synchronization or replay is not needed;

skipping marked abnormal messages in the process of continuing to process the messages input to the distributed system based on the real-time process and the new trial calculation process;

the trial calculation process and the new trial calculation process perform trial calculation processing on each external input message in the distributed system to obtain a trial calculation processing result;

and filtering the abnormal message according to the trial calculation processing result.

2. The method of claim 1, wherein the trial calculation process and the real-time process each store a complete memory state of each software in the distributed system.

3. The method for continuously self-healing software defects in a distributed system according to claim 1, wherein said processing messages input to said distributed system using said real-time process and said trial calculation process comprises:

4. A distributed system software defect continuous self-healing device, characterized in that the distributed system software defect continuous self-healing device comprises:

the processing unit is further used for continuing to process the messages input to the distributed system based on the real-time process and the new trial calculation process; the state of the new trial calculation process is consistent with that of the real-time process, and additional state synchronization or replay is not needed;

a skipping unit, configured to skip the marked abnormal message in a process of continuing to process a message input to the distributed system based on the real-time process and the new trial calculation process;

5. The device for continuously self-healing software defects in a distributed system according to claim 4, wherein the trial calculation process and the real-time process both store complete memory states of each software in the distributed system.

6. The continuous self-healing device of software defects in a distributed system according to claim 4, wherein the processing unit processing messages input to the distributed system using the real-time process and the trial calculation process comprises:

7. A computer device, the computer device comprising:

a memory storing at least one instruction; and

a processor executing instructions stored in the memory to implement the distributed system software defect continuous self-healing method according to any one of claims 1 to 3.

8. A computer-readable storage medium, characterized by: the computer readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the distributed system software defect continuous self-healing method according to any one of claims 1 to 3.