CN110941503A

CN110941503A - Fault processing method and device and electronic equipment

Info

Publication number: CN110941503A
Application number: CN201911138964.1A
Authority: CN
Inventors: 曹原; 时磊; 李运喜; 梅涛
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2020-03-31

Abstract

The invention provides a fault processing method, a fault processing device and electronic equipment, applied to a multi-core partition system. The method comprises: acquiring attribute information of an injected target fault, the attribute information comprising the fault type of the target fault and the target core and target partition where the target fault is located; judging whether the target fault occurred while a partition was running on the target core; if it did, executing the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core; and if it did not, executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table. With this scheme, the various faults of a multi-core partition operating system can be handled flexibly.

Description

Fault processing method and device and electronic equipment

Technical Field

The invention belongs to the field of computer applications, and in particular relates to a fault processing method, a fault processing device and electronic equipment.

Background

While a computer system is running, application programs and the operating system can generate many types of errors and faults. When a traditional operating system detects an error, for example by checking input parameters and return values, it can only take simple, fixed processing actions; it cannot carry out configurable, targeted handling and recovery based on the current system state and the specific details of the fault. Accordingly, ARINC 653 sets out implementation requirements for health monitoring: embedded real-time operating systems with partitioning require a health monitoring mechanism that provides safety functions such as fault response, handling and recovery. The operating system must not only let the user freely select the processing action for certain faults, but also support user-defined fault processing actions.

In addition, traditional operating systems run on single-core processors, and their fault handling processes and methods are limited to the single-core scenario. On a multi-core processor, many tasks of the operating system run in parallel and fault states are more complex, so single-core fault handling covers fewer situations and is less efficient.

Existing fault handling schemes therefore suffer from a narrow range of application and poor processing efficiency.

Disclosure of Invention

To solve the problems in the background art, embodiments of the present invention provide a fault handling method, a fault handling apparatus and an electronic device, as follows.

In a first aspect, an embodiment of the present invention provides a fault handling method applied to a multi-core partition operating system, the method comprising:

acquiring attribute information of an injected target fault, the attribute information comprising the fault type of the target fault and the target core and target partition where the target fault is located;

judging whether the target fault occurred while a partition was running on the target core;

if the target fault occurred while a partition was running on the target core, executing the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core;

and if the target fault did not occur while a partition was running on the target core, executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table.
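The top-level branch of the first aspect can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `FaultAttributes` record, its field names, the `during_partition_run` flag and the returned table labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultAttributes:
    """Hypothetical record of the attribute information of an injected fault."""
    fault_type: str
    target_core: int
    target_partition: str
    # Assumed flag: was a partition running on the target core at fault time?
    during_partition_run: bool


def select_monitoring_table(fault: FaultAttributes) -> str:
    """Return which monitoring table governs recovery for this fault."""
    if fault.during_partition_run:
        # Fault arose while a partition ran: use the core's multi-partition table.
        return f"multi-partition table of core {fault.target_core}"
    # Otherwise the core's monitoring task consults the core monitoring table.
    return f"core table of core {fault.target_core}"
```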

Optionally, the step of executing the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core, if the target fault occurred while a partition was running on the target core, comprises:

judging, according to the fault type of the target fault and the target partition, whether the first target level of the target fault in the target core is the module level;

if the first target level corresponding to the target fault is the module level, executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the multi-partition monitoring table for the module corresponding to the first target level;

and if the first target level corresponding to the target fault is not the module level, executing the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition.
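A minimal sketch of the first-level decision, assuming the multi-partition monitoring table can be modelled as a dictionary from fault type to handling level. The table contents, the fault type names and the choice to treat unlisted fault types as partition level are assumptions of this sketch, not details from the source.

```python
from typing import Dict

# Hypothetical multi-partition monitoring table of one core:
# fault type -> handling level ("module" or "partition").
MULTI_PARTITION_TABLE: Dict[str, str] = {
    "POWER_FAIL": "module",
    "MEMORY_VIOLATION": "partition",
}


def first_level_dispatch(fault_type: str, partition: str) -> str:
    """Decide who handles a fault raised while a partition was running."""
    # Assumption: fault types missing from the table are treated as partition level.
    level = MULTI_PARTITION_TABLE.get(fault_type, "partition")
    if level == "module":
        # Module level: the core monitoring task applies the multi-partition table.
        return "core monitoring task: multi-partition table"
    # Not module level: defer to the target partition's monitoring table.
    return f"{partition}: partition monitoring table"
```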

Optionally, the step of executing the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition, if the first target level corresponding to the target fault is not the module level, comprises:

judging, according to the partition monitoring table corresponding to the target fault and the fault type and target partition corresponding to the target fault, whether the second target level of the target fault is the partition level;

if the second target level of the target fault is the partition level, executing, by the monitoring task corresponding to the target partition, the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition;

and if the second target level of the target fault is the process level, executing the user-defined fault recovery scheme corresponding to the target partition.

Optionally, after the step of judging whether the second target level of the target fault is the partition level, the method further comprises:

if the second target level of the target fault is the process level and no user-defined fault recovery scheme is provided for the target partition, executing, by the monitoring task corresponding to the target partition, the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition.
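The second-level decision, including the fallback to the partition monitoring table when no user-defined scheme exists, might look like the following sketch. The table layout (fault type mapped to a level and a default action), the fault type names and the `user_handler` parameter are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical partition monitoring table: fault type -> (level, default action).
PARTITION_TABLE: Dict[str, Tuple[str, str]] = {
    "STACK_OVERFLOW": ("partition", "restart fault partition"),
    "APPLICATION_ERROR": ("process", "stop fault partition"),
}


def second_level_dispatch(
    fault_type: str,
    user_handler: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the recovery action chosen for a partition-level dispatch."""
    level, default_action = PARTITION_TABLE[fault_type]
    if level == "process" and user_handler is not None:
        # Process level with a user-defined recovery scheme: run the user's scheme.
        return user_handler(fault_type)
    # Partition level, or process level without a user scheme: the partition
    # monitoring task executes the action from the partition monitoring table.
    return default_action
```

The fallback branch corresponds to the optional step above: a process-level fault without a user scheme is still handled by the partition monitoring task.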

Optionally, the step of executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table, if the target fault did not occur while a partition was running on the target core, comprises:

if the target fault did not occur while a partition was running on the target core, interrupting the other tasks running on the target core and executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table.

In a second aspect, an embodiment of the present invention provides a fault handling apparatus applied to a multi-core partition operating system, comprising:

an obtaining module, configured to obtain attribute information of an injected target fault, the attribute information comprising the fault type of the target fault and the target core and target partition where the target fault is located;

a judging module, configured to judge whether the target fault occurred while a partition was running on the target core;

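and an execution module, configured to: if the target fault occurred while a partition was running on the target core, execute the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core on the target partition; and if it did not, execute the fault recovery scheme indicated in the core monitoring table corresponding to the target core on the target core.

Optionally, the execution module is further configured to carry out the level-by-level handling (module level, partition level and process level) described for the first aspect.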

In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the fault handling method of any implementation of the first aspect.

The fault handling scheme provided by the embodiments of the invention is applied to a multi-core partition operating system and comprises: acquiring attribute information of the injected target fault, the attribute information comprising the fault type of the target fault and the target core and target partition where the target fault is located; judging whether the target fault occurred while a partition was running on the target core; if it did, executing the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core; and if it did not, executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table. The scheme provides effective fault handling for the complex fault states of a multi-core partition system, broadens the range of situations to which fault handling applies, and improves the processing efficiency of the operating system.

Drawings

Fig. 1 is a schematic flow chart of a fault handling method according to an embodiment of the present invention;

Fig. 2 is another schematic flow chart of a fault handling method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an operating system related to the fault handling method according to the embodiment of the present invention.

Detailed Description

Referring to Fig. 1, a schematic flow chart of a fault handling method according to an embodiment of the present invention is shown. The method is applied to the multi-core partition operating system shown in Fig. 2 and mainly comprises the following steps:

S101: acquire the attribute information of the injected target fault, the attribute information comprising the fault type of the target fault and the target core and target partition where the target fault is located;

S102: judge whether the target fault occurred while a partition was running on the target core;

if it did, execute step S103: execute the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core;

if it did not, execute step S104: the monitoring task corresponding to the target core executes the fault recovery scheme indicated in the core monitoring table.

The fault handling method of this embodiment is applied to a multi-core partition operating system, that is, an operating system with partitioning. A modern embedded operating system that supports multi-core processors consists of several layers. In different system states, different types of faults can occur in the functions of different layers; the health monitoring function of the operating system detects faults occurring on different cores, in different states and at different layers, and invokes different processing methods, as configured by the user, to handle and recover from them.

As shown in Fig. 3, the fault handling scheme adds dedicated handling for faults that occur while a partition is running. If the first target level of such a fault is not the module level, the fault recovery scheme indicated in the partition monitoring table of the target partition is executed, and the second target level decides between partition-level handling and a user-defined process-level scheme. After the second target level has been judged, a process-level fault for which no user-defined scheme exists falls back to the partition monitoring table. On the basis of the foregoing embodiment, if the target fault did not occur while a partition was running, the other tasks on the target core are interrupted and the monitoring task corresponding to the target core executes the fault recovery scheme indicated in the core monitoring table.

In the fault handling scheme of this embodiment, the health monitoring of the multi-core partition operating system comprises health monitoring configuration, health monitoring tasks, fault dispatch and handling logic, and health monitoring actions. Each is explained below.

First, the health monitoring configuration comprises the health monitoring tables at each level and the attribute configuration of the health monitoring tasks.

The health monitoring tables are organized into several levels. They configure the relationships among specific fault types, system states, fault handling levels and fault handling actions, and thus provide the basis for the execution logic of the handling process when a fault occurs.

Each core of the multi-core processor can have a core health monitoring table, and one core health monitoring table can be shared by several cores. The core health monitoring table defines, for each specific system state, which fault handling action corresponds to a specific fault type. If a fault is dispatched to the core-level health monitoring task, that task queries the core health monitoring table of its core and executes the specified action according to the fault type and the current system state.
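As an illustration, a core health monitoring table keyed by system state and fault type, with one table shared by two cores, could be modelled as below. The state names, fault types and the behaviour for unconfigured entries are assumptions; the action names follow the default actions the operating system is described as providing.

```python
from typing import Dict, Tuple

# Hypothetical core health monitoring table: (system state, fault type) -> action.
CoreTable = Dict[Tuple[str, str], str]

shared_core_table: CoreTable = {
    ("INIT", "CONFIG_ERROR"): "restart fault module",
    ("OPERATIONAL", "CONFIG_ERROR"): "display fault information",
    ("OPERATIONAL", "HARDWARE_FAULT"): "restart fault module",
}

# Cores 0 and 1 share one table; core 2 has its own (sharing is optional).
core_tables: Dict[int, CoreTable] = {
    0: shared_core_table,
    1: shared_core_table,
    2: {("OPERATIONAL", "HARDWARE_FAULT"): "no processing"},
}


def core_action(core: int, state: str, fault_type: str) -> str:
    """Look up the action the core health monitoring task should execute."""
    table = core_tables[core]
    # Assumption of this sketch: unconfigured pairs only display the fault.
    return table.get((state, fault_type), "display fault information")
```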

The partitions running on each core correspond to a multi-partition health monitoring table, which may be shared and places no limit on the number of partitions. The multi-partition health monitoring table defines, for each fault type, whether the fault is handled at the multi-partition level or handed over to the partition level and, if it is handled at the multi-partition level, which processing action to take. When the fault is to be handled at the multi-partition level, the core health monitoring task looks up and executes the processing action for the fault type in the multi-partition health monitoring table. When the fault is to be handled at the partition level, the corresponding partition health monitoring task goes on to query the partition health monitoring table.

In addition, each partition can have a partition health monitoring table, which may also be shared. It defines, for each fault type, whether the fault is handled at the partition level or handed over to the process level and, if it is handled at the partition level, which processing action to take. When the fault is to be handled at the partition level, the partition health monitoring task looks up and executes the processing action for the fault type in the partition health monitoring table. When the fault is to be handled at the process level, the operating system invokes the user-defined process-level fault handling action.

The health monitoring task attribute configuration specifies the attributes of each health monitoring task, including its stack size, the depth of its message queue, the size of its log space, fault statistics, whether faults are logged automatically, and a fault type mask word. One attribute configuration can be shared by several health monitoring tasks.
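The attribute configuration can be pictured as a small record. In this hypothetical sketch the field names, default values and bit layout of the fault type mask word are all assumptions, and a set bit is assumed to mean that the fault type is masked; the source does not specify the mask semantics.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical bit position of each fault type in the mask word.
FAULT_BITS: Dict[str, int] = {"DEADLINE_MISSED": 0, "MEMORY_VIOLATION": 1, "NUMERIC_ERROR": 2}


@dataclass
class HmTaskAttributes:
    """Sketch of per-task health monitoring attributes (all names assumed)."""
    stack_size: int = 8192       # bytes
    queue_depth: int = 16        # message queue entries
    log_size: int = 4096         # bytes of log space
    auto_log: bool = True        # log faults automatically
    fault_mask: int = 0          # set bit -> fault type masked (assumed meaning)
    fault_counts: Dict[str, int] = field(default_factory=dict)  # fault statistics

    def accepts(self, fault_type: str) -> bool:
        """True if the fault type is not masked out by the mask word."""
        return not (self.fault_mask >> FAULT_BITS[fault_type]) & 1

    def record(self, fault_type: str) -> None:
        """Update the fault statistics for one occurrence."""
        self.fault_counts[fault_type] = self.fault_counts.get(fault_type, 0) + 1
```

Because the record is an ordinary object, several health monitoring tasks could reference one instance, which mirrors the statement that an attribute configuration can be shared; in this sketch they would then also share the fault statistics.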

Health monitoring tasks are high-priority tasks dedicated to handling the faults of application programs and the operating system. They are divided into core health monitoring tasks and partition health monitoring tasks; their relationship to the processors, partitions and other tasks is shown in Fig. 2.

Each processor core runs a core health monitoring task, which handles all faults dispatched to the core level and the multi-partition level on that core. The core health monitoring task has the highest priority of all tasks on the core. It is suspended while the system runs normally; when a fault occurs it is activated and interrupts all other tasks (including the faulty task), and once the fault handling action is complete it returns to the suspended state and the other tasks resume.
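The life cycle of the core health monitoring task can be illustrated with a toy priority scheduler. This is a simplified simulation, not an RTOS implementation: the `Task` and `Core` classes, the convention that a larger number means higher priority, and the trace strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    priority: int  # larger value = higher priority (convention of this sketch)
    state: str = "ready"


class Core:
    """Toy single-core scheduler showing the core HM task's life cycle."""

    def __init__(self, tasks: List[Task]) -> None:
        top = max(t.priority for t in tasks)
        # The core HM task outranks every other task and starts suspended.
        self.hm_task = Task("core_hm", top + 1, "suspended")
        self.tasks = tasks

    def running(self) -> Task:
        """The highest-priority ready task is the one that runs."""
        ready = [t for t in self.tasks + [self.hm_task] if t.state == "ready"]
        return max(ready, key=lambda t: t.priority)

    def handle_fault(self, faulty: Task) -> List[str]:
        trace = [f"running: {self.running().name}"]
        self.hm_task.state = "ready"  # the fault activates the HM task
        trace.append(f"running: {self.running().name}")  # it preempts all tasks
        trace.append(f"action for {faulty.name}")
        self.hm_task.state = "suspended"  # suspended again after the action
        trace.append(f"running: {self.running().name}")  # other tasks resume
        return trace
```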

Each partition runs a partition health monitoring task, which handles all faults dispatched to the partition level within that partition. The partition health monitoring task has the highest priority within its partition. It is suspended while the partition runs normally; when a fault occurs it is activated and interrupts the other tasks in the partition (including the faulty task), and once the fault handling action is complete it returns to the suspended state and the other tasks resume. The partition health monitoring task can run only within its own partition's time window.
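The time-window restriction on partition health monitoring tasks can be sketched with a fixed cyclic schedule. The 100 ms major frame, the partition names and the window boundaries are invented for illustration; in an ARINC 653 system the partition windows come from the system configuration.

```python
from typing import List, Tuple

# Hypothetical cyclic schedule: (partition, window start, window end) in ms.
SCHEDULE: List[Tuple[str, int, int]] = [("P1", 0, 40), ("P2", 40, 70), ("P3", 70, 100)]
MAJOR_FRAME_MS = 100


def active_partition(t_ms: int) -> str:
    """Return the partition whose time window contains time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS
    for name, start, end in SCHEDULE:
        if start <= offset < end:
            return name
    raise ValueError("time not covered by the schedule")


def partition_hm_may_run(partition: str, t_ms: int) -> bool:
    """A partition HM task may run only inside its own partition's window."""
    return active_partition(t_ms) == partition
```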

When an application program or the operating system fails, the operating system injects the fault information into health monitoring. Health monitoring then combines the health monitoring tables and health monitoring tasks to determine the fault handling level and processing action according to the standard-compliant fault dispatch and handling logic, and finally executes the specified action. The fault dispatch and handling logic is shown in Fig. 3.

The operating system provides a set of fault handling actions, including displaying fault information, restarting the faulty partition, stopping the faulty partition, restarting the faulty module, and taking no action. Users can also implement custom fault handling actions to suit the needs of their applications.

Referring to Fig. 3, the fault handling flow of this embodiment is described in detail below with a specific example.

1. Fault injection

When a fault occurs while application program or operating system code is running, the fault injection interface can be called to inject the corresponding fault; its parameters are the fault code, a custom fault message and the message length.

2. Fault handling flow

After the fault is injected, control switches to the fault handling flow, which mainly comprises the following steps:

2.1 Check the validity of the fault code, custom fault message and message length parameters; if any is invalid, return an error.

2.2 Check whether operating system health monitoring has been initialized; if so, inject the fault into health monitoring, which takes over the handling; otherwise, return an error.

2.3 Acquire the fault-related information and build the fault information structure.

2.4 Log the fault information.

2.5 Judge whether a partition was running when the fault occurred; if so, go to step 2.7; otherwise, go to step 2.6.

2.6 Judge whether the fault occurred in a task running on the core; if so, look up the fault handling action for the fault type and system state in the core health monitoring table and execute it; otherwise, return an error.

2.7 Look up the handling level for the fault type in the multi-partition health monitoring table and judge whether the fault is handled at the multi-partition level; if so, go to step 2.8; otherwise, go to step 2.9.

2.8 Look up the processing action for the fault type in the multi-partition health monitoring table and execute it.

2.9 Judge whether the fault occurred in a task running in the partition; if not, return an error. If so, look up the handling level for the fault type in the partition health monitoring table and judge whether the fault is handled at the partition level; if so, go to step 2.10; otherwise, go to step 2.11.

2.10 Look up the processing action for the fault type in the partition health monitoring table and execute it.

2.11 Check whether the user has defined a process-level fault handling action; if so, execute the user-defined action; otherwise, go to step 2.10.
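Steps 2.1 to 2.11 can be condensed into a single dispatch routine. The Python sketch below is a hedged reading of the flow, not the operating system's code: the `HealthMonitor` class, the fault code mapping, `MAX_MSG_LEN`, the table layouts, the level names and the error strings are assumptions, and the fault context (whether a partition was running, and whether the fault arose in a task) is passed in explicitly rather than detected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from fault code to fault type.
FAULT_TYPES = {1: "DEADLINE_MISSED", 2: "MEMORY_VIOLATION", 3: "APPLICATION_ERROR"}
MAX_MSG_LEN = 128  # hypothetical upper bound on the custom message length

Entry = Tuple[str, str]  # (handling level, processing action)


@dataclass
class HealthMonitor:
    core_table: Dict[Tuple[str, str], str]         # (state, fault type) -> action
    multi_partition_table: Dict[str, Entry]        # level "multi-partition"/"partition"
    partition_tables: Dict[str, Dict[str, Entry]]  # level "partition"/"process"
    user_handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    initialized: bool = True
    log: List[str] = field(default_factory=list)

    def inject(self, code: int, msg: str, length: int, state: str,
               partition: Optional[str], in_task: bool) -> str:
        """Return the action health monitoring would execute for this fault."""
        # 2.1 Validate the fault code, custom message and message length.
        if code not in FAULT_TYPES or not 0 <= length <= min(len(msg), MAX_MSG_LEN):
            return "error: invalid parameters"
        # 2.2 Health monitoring must be initialized to take over the fault.
        if not self.initialized:
            return "error: health monitoring not initialized"
        # 2.3 and 2.4 Build the fault information and log it.
        fault_type = FAULT_TYPES[code]
        info = f"{fault_type}: {msg[:length]}"
        self.log.append(info)
        # 2.5 No partition running: 2.6 core-level handling.
        if partition is None:
            if not in_task:
                return "error: fault not in a core task"
            return self.core_table.get((state, fault_type), "error: no core table entry")
        # 2.7 and 2.8 Multi-partition level handling.
        level, action = self.multi_partition_table[fault_type]
        if level == "multi-partition":
            return action
        # 2.9 Partition-level handling requires a fault in a partition task.
        if not in_task:
            return "error: fault not in a partition task"
        p_level, p_action = self.partition_tables[partition][fault_type]
        if p_level == "partition":
            return p_action  # 2.10
        # 2.11 Process level: run the user's action if hooked, else step 2.10.
        handler = self.user_handlers.get(partition)
        return handler(info) if handler else p_action
```

Returning the selected action as a string keeps the sketch easy to check; a real health monitor would execute the action instead.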

In addition, fault handling actions can be customized by the user. Specifically, health monitoring provides default fault handling actions, and users can hook in custom fault handling actions according to their own needs.

If the user's application scenario should not use health monitoring's default actions of restarting the module, suspending all operations or ignoring the error, the user can define custom fault handling actions and attach them to the fault handling flow through a mechanism provided by health monitoring; health monitoring then replaces the default restart, suspend and ignore actions with the user's actions.

Through the hook parameters, the fault handling flow gives the user all the details of the fault event, such as the fault type, address and description text, so the user can implement custom fault handling logic from this information and execute the corresponding processing.
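A hook mechanism of this kind can be sketched as a registry in which user actions replace the defaults. The `FaultEvent` fields mirror the details named above (fault type, address, description), but the class names, action keys and the decision to reject unknown action names are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FaultEvent:
    """Hook parameter carrying the fault details (field names assumed)."""
    fault_type: str
    address: int
    description: str


Hook = Callable[[FaultEvent], str]

# Default actions provided by health monitoring (illustrative bodies only).
DEFAULT_ACTIONS: Dict[str, Hook] = {
    "restart_module": lambda ev: "module restarted",
    "suspend": lambda ev: "all operations suspended",
    "ignore": lambda ev: "fault ignored",
}


class ActionRegistry:
    def __init__(self) -> None:
        # Each registry starts from its own copy of the defaults.
        self._actions: Dict[str, Hook] = dict(DEFAULT_ACTIONS)

    def hook(self, name: str, action: Hook) -> None:
        """Replace a default action with a user-defined one."""
        if name not in self._actions:
            raise KeyError(f"unknown action {name!r}")
        self._actions[name] = action

    def run(self, name: str, event: FaultEvent) -> str:
        return self._actions[name](event)
```

Rejecting unknown names keeps a misspelled hook from silently leaving the default action in place.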

Furthermore, an embodiment of the present invention provides a fault handling apparatus applied to a multi-core partition operating system, comprising:

an execution module, configured to:

if the target fault occurred while a partition was running on the target core, execute the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core on the target partition;

and if the target fault did not occur while a partition was running on the target core, execute the fault recovery scheme indicated in the core monitoring table corresponding to the target core on the target core.

Optionally, the execution module is configured to:

determine, according to the fault type of the target fault and the target partition, a first target level of the target fault in the target core, the first target level being either the module level or the partition level;

if the first target level corresponding to the target fault is the module level, execute the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target partition on the module corresponding to the first target level;

and if the first target level corresponding to the target fault is the partition level, execute the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition on the target partition.

Optionally, the execution module is configured to:

determine a second target level of the target fault according to the partition monitoring table corresponding to the target fault and the fault type and target partition corresponding to the target fault, the second target level being either the partition level or the process level;

if the second target level of the target fault is the partition level, execute the fault recovery action indicated in the partition monitoring table corresponding to the target partition on the target process corresponding to the target partition;

and if the second target level of the target fault is the process level, execute the fault recovery action indicated in the partition monitoring table corresponding to the target partition on the next process of the target process corresponding to the target fault.

Optionally, the execution module is configured to:

if the second target level of the target fault is the process level and no fault recovery action is provided for the next process of the target process corresponding to the target fault, execute the fault recovery action indicated in the partition monitoring table corresponding to the target partition on the target process corresponding to the target partition.

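In addition, an embodiment of the present invention further provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to enable the at least one processor to perform the fault handling method described above.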

The fault handling scheme provided by the embodiments of the invention provides effective fault handling for the complex fault states of a multi-core partition system, broadens the range of situations to which fault handling applies, and improves the processing efficiency of the operating system. For the specific implementation of the fault handling apparatus and the electronic device, refer to the method embodiments above; it is not repeated here.

The above are only specific embodiments of the present disclosure, and the scope of protection of the present disclosure is not limited to them. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope of the present disclosure shall fall within its scope of protection. Therefore, the scope of protection of the present disclosure shall be defined by the claims.

Claims

1. A fault handling method applied to a multi-core partition operating system, comprising the following steps:

2. The method according to claim 1, wherein the step of executing the fault recovery scheme indicated in the multi-partition monitoring table corresponding to the target core, if the target fault occurs while a partition is running on the target core, comprises:

3. The method according to claim 2, wherein the step of executing the fault recovery scheme indicated in the partition monitoring table corresponding to the target partition, if the first target level corresponding to the target fault is not the module level, comprises:

4. The method according to claim 3, wherein after the step of judging whether the second target level of the target fault is the partition level, the method further comprises:

5. The method according to any one of claims 1 to 4, wherein the step of executing, by the monitoring task corresponding to the target core, the fault recovery scheme indicated in the core monitoring table, if the target fault does not occur while a partition is running on the target core, comprises:

6. A fault handling apparatus applied to a multi-core partition operating system, comprising:

an execution module, configured to:

7. The apparatus according to claim 6, wherein the execution module is configured to:

8. The apparatus according to claim 7, wherein the execution module is configured to:

9. The apparatus according to claim 8, wherein the execution module is configured to:

10. An electronic device, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the fault handling method of any one of claims 1 to 5.