CN113868076A - Method for processing multi-core fault in partition

Info

Publication number: CN113868076A
Application number: CN202111052046.4A
Authority: CN
Inventors: 曹原
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2021-12-31

Abstract

The invention provides a method for processing multi-core faults in a partition, comprising the following steps. S1: fault injection, collecting the basic fault information and the operating system running state information. S2: fault scene protection: a health monitoring process runs on each core in the partition and has the highest priority on that core, so that the fault scene is protected by the health monitoring processes running preferentially. S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process, which is responsible for handling all faults in the partition and does so using the basic fault information and the operating system running state information obtained in S1. The method supplements the health monitoring system of a traditional single-core processor operating system and solves the problem of fault handling when multiple cores run in parallel.

Description

Method for processing multi-core fault in partition

Technical Field

The invention belongs to the field of computer software application, and particularly relates to a method for processing multi-core faults in a partition.

Background

While a computer system is running, application programs and the operating system can produce many kinds of errors and faults. A traditional operating system detects an error only by checking inputs, return values and the like, and then responds with simple, fixed actions; it cannot perform configurable, targeted handling and recovery based on the current system state and the specific details of the fault. For this reason, ARINC 653 proposes a health monitoring mechanism that provides security functions such as fault response, handling and recovery. The operating system must not only let the user freely select the handling action for a given fault, but also support user-defined fault handling actions.

In addition, traditional operating systems run on single-core processors, and their fault handling processes and methods are limited to the single-core scenario. On a multi-core processor, multiple operating system tasks run in parallel and the fault state of the operating system is more complex. The operating system must handle an error on one core without affecting the normal operation of tasks on the other cores, and fault handling functions such as the handling logic, methods and log recording must support the multi-core runtime environment.

Disclosure of Invention

In order to solve the problems, the invention provides a method for processing multi-core faults in a partition, which is used as a supplement to a health monitoring system of a traditional single-core processor operating system and solves the problem of fault processing in a multi-core parallel mode.

The invention aims to provide a method for processing multi-core faults in a partition, comprising the following steps. S1: fault injection, collecting the basic fault information and the operating system running state information. S2: fault scene protection: a health monitoring process runs on each core in the partition and has the highest priority on that core, so that the fault scene is protected by the health monitoring processes running preferentially. S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process, which is responsible for handling all faults in the partition and does so using the basic fault information and the operating system running state information obtained in S1.

The method for processing multi-core faults in a partition provided by the invention is further characterized in that the basic fault information comprises a fault code, a fault sub-code, the fault time and related text.

The method for processing the multi-core fault in the partition provided by the invention is also characterized in that the S3 comprises the following steps:

S3.1: obtain the user-defined fault handling action; if such an action exists, go to S3.2, and if it does not, go to S3.4;

S3.2: call the corresponding fault handling action and proceed according to its return value: if it returns TRUE, fault handling is complete, and if it returns FALSE, go to S3.3;

S3.3: change the fault type in the fault event data structure to a health monitoring fault, save the original type in the fault type history item, and execute S1 again;

S3.4: return to the next higher level and hand the fault over to that level for handling.
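
Steps S3.1 to S3.4 can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the dictionary layout of the fault event, the `HM_FAULT` constant, the function name `handle_fault` and the three returned strings are all hypothetical names chosen for illustration.

```python
HM_FAULT = "HM_FAULT"  # hypothetical marker for a "health monitoring fault"


def handle_fault(event, user_actions):
    """Return 'handled', 'reinject' or 'escalate' for one fault event.

    event is a dict with the keys 'type' and 'type_history' (assumed layout).
    user_actions maps a fault type to a callable that returns True or False.
    """
    action = user_actions.get(event["type"])      # S3.1: look up the user action
    if action is None:
        return "escalate"                         # S3.4: hand over to the higher level
    if action(event):                             # S3.2: call the action
        return "handled"                          # TRUE: handling is complete
    event["type_history"].append(event["type"])   # S3.3: keep the original type
    event["type"] = HM_FAULT                      # S3.3: mark as health monitoring fault
    return "reinject"                             # the caller executes S1 again
```

A caller would loop on the result: on "reinject" it runs fault injection again with the modified event, and on "escalate" it passes the event to the higher-level health monitoring task.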

The method for processing multi-core faults in a partition provided by the invention is further characterized in that the fault handling process is the health monitoring process of the core with the smallest core ID value in each partition.
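
The selection rule can be shown in a few lines of Python. This is only a sketch: the function name `select_fault_handler` and the representation of a partition as a name mapped to a list of core IDs are assumptions made for the example.

```python
def select_fault_handler(partition_cores):
    """Map each partition to the core whose health monitoring process
    acts as that partition's fault handling process (smallest core ID)."""
    return {name: min(cores) for name, cores in partition_cores.items()}
```

For a partition occupying cores 2 and 3, the health monitoring process on core 2 would handle all faults of that partition.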

The method for processing multi-core faults in a partition provided by the invention is further characterized in that, in S2, the health monitoring process is suspended on a designated semaphore while the partition runs normally; when a fault occurs, the core on which the fault occurred releases the semaphore to the health monitoring processes of all cores of the partition during fault injection, so that all cores of the partition run their health monitoring processes and the processes that were running normally are suspended, thereby protecting the fault scene.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a method for processing multi-core faults in a partition, which supplements the health monitoring system of a traditional single-core processor operating system and solves the problem of fault handling when multiple cores run in parallel.

Drawings

To illustrate the technical solution of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 shows the operating system fault handling flow for a multi-core partition;

FIG. 2 shows the relationship between the health monitoring processes and the multi-core processor cores, partitions and processes.

Detailed Description

In order to make the technical means, the creation features, the achievement purposes and the effects of the invention easy to understand, the following embodiments are combined with the drawings to specifically describe the processing method provided by the invention.

In the description of the embodiments of the present invention, it should be understood that the terms "central", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only used for convenience in describing and simplifying the description of the present invention, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, "a plurality" means two or more unless otherwise specified.

The terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, be fixedly coupled, detachably coupled, or integrally coupled; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the creation of the present invention can be understood by those of ordinary skill in the art through specific situations.

As shown in FIGS. 1 and 2, the method for processing multi-core faults in a partition includes the following steps:

S1: fault injection, collecting the basic fault information and the operating system running state information. When a process in an application partition encounters a fault, it calls the fault injection function through the fault injection interface provided by the operating system, requesting the operating system to intervene and starting the fault dispatch and handling flow. The fault injection procedure collects the basic information of the fault, including the fault code (type), the fault sub-code, the time of the fault and a related text prompt. At the same time it collects the operating system running state information, and together these form the fault event data structure. When an application in the partition fails, the fault information is sent to the partition-level health monitoring task, which queries the health monitoring tables at each level and determines from the result the level to which handling is dispatched.
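
The fault event data structure and the table lookup described in S1 might be sketched in Python as follows. The field names, the `core_id` field, the contents of `HM_TABLE` and the fallback to the module level for unlisted fault codes are all assumptions made for illustration; the text above only states which information is collected and that a health monitoring table determines the dispatch level.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class DispatchLevel(Enum):
    PROCESS = "process"
    PARTITION = "partition"
    MODULE = "module"


@dataclass
class FaultEvent:
    code: int          # fault code (type)
    subcode: int       # fault sub-code
    timestamp: float   # time of the fault
    text: str          # related text prompt
    core_id: int       # core that raised the fault (assumed field)
    os_state: dict = field(default_factory=dict)      # OS running state snapshot
    type_history: list = field(default_factory=list)  # earlier fault types


# Hypothetical health monitoring table: fault code -> handling level.
HM_TABLE = {
    1: DispatchLevel.PROCESS,
    2: DispatchLevel.PARTITION,
    3: DispatchLevel.MODULE,
}


def inject_fault(code, subcode, text, core_id, os_state):
    """Collect the basic fault information and the OS state into one record."""
    return FaultEvent(code, subcode, time.time(), text, core_id, dict(os_state))


def dispatch_level(event, table=HM_TABLE):
    """Look up the handling level; unlisted codes go to the module level
    (an assumption of this sketch, not something stated in the text)."""
    return table.get(event.code, DispatchLevel.MODULE)
```

Copying `os_state` into the record at injection time reflects the intent of capturing the state as it was when the fault occurred, rather than a reference that later changes.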

S2: fault scene protection: a health monitoring process runs on each core in the partition and has the highest priority on that core, so that the fault scene is protected by the health monitoring processes running preferentially;

Specifically, each partition contains a multi-core runtime environment, each core of the partition runs a health monitoring process, and the health monitoring process has the highest priority in the partition. While the partition runs normally, the health monitoring process is suspended on a designated semaphore. When a fault occurs, the core on which the fault occurred releases the semaphore to the health monitoring processes of all cores of the partition during fault injection. Because the health monitoring processes have the highest priority, all cores of the partition then run their health monitoring processes and the processes that were running normally are suspended, which protects the fault scene. After the fault handling health monitoring process has executed the fault handling action, the health monitoring processes return to the suspended state and the other tasks continue to run.
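
The pend-and-release pattern of S2 can be imitated with Python threads, as in the sketch below. It is only an analogy: Python threads do not model priority preemption, so the sketch shows health monitoring processes waking on every core but not the suspension of the application processes. Using one semaphore per core, and the names `Partition`, `hm_process`, `inject_fault` and `run_demo`, are assumptions of this example.

```python
import threading


class Partition:
    def __init__(self, core_ids):
        self.core_ids = list(core_ids)
        # One semaphore per core; each health monitoring process pends on its own.
        self.sems = {c: threading.Semaphore(0) for c in self.core_ids}
        self.woken = []
        self.lock = threading.Lock()

    def hm_process(self, core_id):
        self.sems[core_id].acquire()    # suspended while the partition runs normally
        with self.lock:
            self.woken.append(core_id)  # the health monitoring process now runs

    def inject_fault(self):
        # The faulting core releases the semaphore of every core in the partition.
        for sem in self.sems.values():
            sem.release()


def run_demo(core_ids):
    """Start one health monitoring thread per core, inject a fault,
    and return the sorted list of cores whose monitor woke up."""
    part = Partition(core_ids)
    threads = [threading.Thread(target=part.hm_process, args=(c,))
               for c in part.core_ids]
    for t in threads:
        t.start()
    part.inject_fault()
    for t in threads:
        t.join()
    return sorted(part.woken)
```

In a real-time operating system the same effect would come from the scheduler: once released, the highest-priority health monitoring process on each core preempts whatever was running there.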

S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process, which is responsible for handling all faults in the partition; it handles them according to the result of the partition-level health monitoring task's query of the health monitoring table. In each partition of this embodiment, the core with the smallest core ID value is responsible for all fault handling tasks of that partition, and the health monitoring process of that core becomes the fault handling process. The partition health monitoring task notifies the fault handling process in the partition through a virtual interrupt so that it starts working. The fault handling process handles the fault using the basic fault information and the operating system running state information obtained in S1, and executes the following steps:

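S3.1: obtain the user-defined fault handling action; if such an action exists, go to S3.2, and if it does not, go to S3.4;

S3.2: call the corresponding fault handling action and proceed according to its return value: if it returns TRUE, fault handling is complete, and if it returns FALSE, go to S3.3;

S3.3: change the fault type in the fault event data structure to a health monitoring fault, save the original type in the fault type history item, and execute S1 again;

S3.4: return to the next higher level and hand the fault over to that level for handling.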

Regarding user-defined fault handling actions: the operating system provides a set of fault handling actions, whose functions include displaying fault information, restarting the faulty partition, stopping the faulty partition, restarting the faulty module, taking no action, and so on. These actions, however, can be invoked only by the partition-level and module-level health monitoring tasks. According to the ARINC 653 standard, process-level fault handling actions are custom-implemented by the user according to the actual needs of the application.
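
The split between operating-system-provided actions and user-defined actions can be sketched in Python as below. The enumeration mirrors the actions listed above, but the names `HmAction` and `resolve_action`, the table keyed by level and fault code, and the choice to return `None` when nothing is configured are assumptions of this sketch rather than part of the described method or of ARINC 653.

```python
from enum import Enum


class HmAction(Enum):
    SHOW_FAULT_INFO = "display fault information"
    RESTART_PARTITION = "restart the faulty partition"
    STOP_PARTITION = "stop the faulty partition"
    RESTART_MODULE = "restart the faulty module"
    IGNORE = "no processing"


def resolve_action(level, fault_code, os_table, user_handlers):
    """Pick the action for a fault at the given handling level.

    os_table maps (level, fault_code) to an HmAction (hypothetical layout).
    user_handlers maps a fault code to a user-implemented callable.
    """
    if level == "process":
        # Process-level actions are implemented by the user.
        return user_handlers.get(fault_code)
    if level in ("partition", "module"):
        # Only these levels may use the actions provided by the operating system.
        return os_table.get((level, fault_code))
    raise ValueError(f"unknown handling level: {level}")
```

A `None` result at the process level corresponds to the case in which no user-defined action exists and the fault is handed to the higher level.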

The above description covers only preferred embodiments of the present invention and is not to be construed as limiting it. Any modifications, equivalents, improvements and variations that those skilled in the art make within the spirit and technical principle of the present invention are intended to fall within its scope of protection.

Claims

1. A method for processing multi-core faults in a partition, characterized by comprising the following steps:

S1: fault injection, collecting the basic fault information and the operating system running state information;

S2: fault scene protection: a health monitoring process runs on each core in the partition and has the highest priority on that core, so that the fault scene is protected by the health monitoring processes running preferentially;

S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process, which is responsible for handling all faults in the partition and does so using the basic fault information and the operating system running state information obtained in S1.

2. The intra-partition multi-core fault handling method according to claim 1, wherein the basic fault information includes a fault code, a fault sub-code, the fault time and related text.

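3. The intra-partition multi-core fault handling method according to claim 1, wherein S3 includes the following steps:

S3.1: obtain the user-defined fault handling action; if such an action exists, go to S3.2, and if it does not, go to S3.4;

S3.2: call the corresponding fault handling action and proceed according to its return value: if it returns TRUE, fault handling is complete, and if it returns FALSE, go to S3.3;

S3.3: change the fault type in the fault event data structure to a health monitoring fault, save the original type in the fault type history item, and execute S1 again;

S3.4: return to the next higher level and hand the fault over to that level for handling.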

4. The intra-partition multi-core fault handling method according to claim 1, wherein the fault handling process is the health monitoring process of the core with the smallest core ID value in each partition.

5. The intra-partition multi-core fault handling method according to claim 1, wherein in S2, the health monitoring process is suspended on a designated semaphore while the partition runs normally; when a fault occurs, the core on which the fault occurred releases the semaphore to the health monitoring processes of all cores of the partition during fault injection, so that all cores of the partition run their health monitoring processes and the processes that were running normally are suspended, thereby protecting the fault scene.