CN113868076A - Method for processing multi-core fault in partition - Google Patents

Publication number
CN113868076A
Authority
CN
China
Prior art keywords
fault
partition
core
health monitoring
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111052046.4A
Other languages
Chinese (zh)
Inventor
曹原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2021-09-08
Publication date
2021-12-31
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202111052046.4A
Publication of CN113868076A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method for handling multi-core faults within a partition, comprising the following steps. S1: fault injection, in which basic fault information and operating system running state information are collected. S2: fault site protection, in which a health monitoring process runs on each core in the partition with the highest priority on that core, so that the fault site is protected by the health monitoring processes running ahead of all other processes. S3: fault handling, in which one of the health monitoring processes in each partition is selected as the fault handling process; it is responsible for handling all faults in the partition and does so using the basic fault information and operating system running state information obtained in S1. The method supplements the health monitoring system of a traditional single-core operating system and addresses fault handling when multiple cores run in parallel.

Description

Method for processing multi-core fault in partition
Technical Field
The invention belongs to the field of computer software application, and particularly relates to a method for processing multi-core faults in a partition.
Background
While a computer system is running, application programs and the operating system can produce many kinds of errors and faults. A traditional operating system can only detect an error by checking inputs, return values, and the like, and then respond with simple, fixed actions; it cannot perform configurable, targeted handling and recovery based on the current system state and the details of the specific fault. For this reason, ARINC 653 specifies a health monitoring mechanism that provides safety functions such as fault response, handling, and recovery. The operating system must not only let the user freely choose the handling action for a given fault, but also support user-defined fault handling actions.
Moreover, traditional operating systems run on single-core processors, so their fault handling procedures and methods are limited to the single-core case. On a multi-core processor, several operating system tasks run in parallel and the fault states are more complex. The operating system must handle an error on one core without disturbing the normal operation of tasks on the other cores, and the fault handling logic, methods, logging, and related functions must all support the multi-core environment.
Disclosure of Invention
To solve these problems, the invention provides a method for handling multi-core faults within a partition. It supplements the health monitoring system of a traditional single-core operating system and addresses fault handling when multiple cores run in parallel.
The method comprises the following steps. S1: fault injection, in which basic fault information and operating system running state information are collected. S2: fault site protection, in which a health monitoring process runs on each core in the partition with the highest priority on that core, so that the fault site is protected by the health monitoring processes running ahead of all other processes. S3: fault handling, in which one of the health monitoring processes in each partition is selected as the fault handling process; it is responsible for handling all faults in the partition and does so using the basic fault information and operating system running state information obtained in S1.
The method is further characterized in that the basic fault information comprises a fault code, a fault subcode, the time of the fault, and related text.
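The fields listed above can be pictured as a single record. The following is a minimal sketch, not the patented data layout: the class name `FaultEvent`, the helper `make_fault_event`, the field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
import time


@dataclass
class FaultEvent:
    """Hypothetical fault event record; the field names are illustrative assumptions."""
    fault_code: int          # fault code (type)
    fault_subcode: int       # fault subcode
    fault_time: float        # time at which the fault was reported
    text: str                # related text prompt
    os_state: dict = field(default_factory=dict)      # OS running-state snapshot
    type_history: list = field(default_factory=list)  # earlier fault types (see S3.3)


def make_fault_event(code, subcode, text, os_state=None):
    """Gather the basic fault information into one record."""
    return FaultEvent(code, subcode, time.time(), text, dict(os_state or {}))


# Example values are invented for the demonstration.
event = make_fault_event(0x10, 0x02, "stack overflow in process P1", {"core_id": 2})
print(event.fault_code, event.fault_subcode, event.text)
```

Keeping the history list inside the record means a later change of fault type does not lose the original classification.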
The method is further characterized in that S3 comprises the following steps:
S3.1: obtain the user-defined fault handling action; if one exists, go to S3.2, otherwise go to S3.4;
S3.2: call the corresponding fault handling action and proceed according to its return value: if it returns TRUE, fault handling is complete; if it returns FALSE, go to S3.3;
S3.3: change the fault type in the fault event data structure to a health monitoring fault, store the original type in the fault type history entry, and execute S1 again;
S3.4: return to the upper level and hand the fault over to it for handling.
The method is further characterized in that the fault handling process is the health monitoring process of the core with the smallest core ID in each partition.
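The selection rule stated above reduces to taking a minimum over the core IDs owned by a partition. A minimal sketch follows; the function name `select_fault_handler` and the partition layout are assumptions made for the example.

```python
def select_fault_handler(partition_cores):
    """Return the core ID whose health monitoring process acts as fault handler."""
    if not partition_cores:
        raise ValueError("a partition must own at least one core")
    return min(partition_cores)


# Hypothetical layout: partition name -> core IDs assigned to it.
partitions = {"P_A": [2, 3], "P_B": [0, 1, 4]}
handlers = {name: select_fault_handler(cores) for name, cores in partitions.items()}
print(handlers)
```

Because the rule depends only on static configuration, every core can compute the same answer without any coordination at fault time.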
The method is further characterized in that, in S2, while the partition is running normally, each health monitoring process pends on a designated semaphore. When a fault occurs, the core on which it occurred releases the semaphore to the health monitoring processes of all cores of the partition during fault injection. All cores of the partition then run their health monitoring processes and the processes that were running normally are suspended, which protects the fault site.
Compared with the prior art, the invention has the following beneficial effect: the method supplements the health monitoring system of a traditional single-core operating system and addresses fault handling when multiple cores run in parallel.
Drawings
To illustrate the technical solution of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of operating system fault handling for a multi-core partition;
FIG. 2 shows the relationship between the health monitoring processes and the cores of the multi-core processor, the partitions, and the processes.
Detailed Description
To make the technical means, inventive features, objectives, and effects of the invention easy to understand, the handling method provided by the invention is described in detail below with reference to the embodiments and the drawings.
In the description of the embodiments of the present invention, it should be understood that the terms "central", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only used for convenience in describing and simplifying the description of the present invention, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, "a plurality" means two or more unless otherwise specified.
The terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, be fixedly coupled, detachably coupled, or integrally coupled; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the creation of the present invention can be understood by those of ordinary skill in the art through specific situations.
As shown in FIGS. 1 and 2, a method for handling multi-core faults within a partition comprises the following steps:
S1: fault injection, in which basic fault information and operating system running state information are collected. When a process in an application partition encounters a fault, it calls the fault injection function through the fault injection interface provided by the operating system, requesting the operating system to intervene and starting the fault dispatch and handling procedure. The fault injection procedure collects the basic information of the fault, including the fault code (type), fault subcode, time of the fault, and related text prompt. It also collects the running state information of the operating system, and together these form the fault event data structure. When an application in the partition fails, the fault information is sent to the partition-level health monitoring task, which queries the health monitoring tables at each level and determines from the result the level to which handling is dispatched.
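The table query at the end of S1 can be sketched as a lookup from fault code to handling level. The table contents, the `Level` names, and the choice to send unknown codes to the module level are assumptions made for this example; the text does not specify the table layout.

```python
from enum import Enum


class Level(Enum):
    PROCESS = "process"
    PARTITION = "partition"
    MODULE = "module"


# Hypothetical health monitoring table: fault code -> level that handles it.
HM_TABLE = {
    0x01: Level.PROCESS,    # e.g. an application error
    0x02: Level.PARTITION,  # e.g. a partition configuration error
    0x03: Level.MODULE,     # e.g. a hardware fault
}


def dispatch_level(fault_code, table=HM_TABLE, default=Level.MODULE):
    """Look up the handling level; unknown codes fall back to `default` (an assumption)."""
    return table.get(fault_code, default)


print(dispatch_level(0x01).value, dispatch_level(0x99).value)
```

A real system would likely key the table on more than the fault code, for instance on the current system state as well, but a single key is enough to show the dispatch idea.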
S2: fault site protection: a health monitoring process runs on each core in the partition with the highest priority on that core, and the fault site is protected by the health monitoring processes running ahead of all other processes.
Specifically, each partition is a multi-core operating environment, each core of the partition runs a health monitoring process, and that process has the highest priority in the partition. While the partition is running normally, each health monitoring process pends on a designated semaphore. When a fault occurs, the core on which it occurred releases the semaphore to the health monitoring processes of all cores of the partition during fault injection. Because the health monitoring processes have the highest priority, all cores of the partition switch to running them, the processes that were running normally are suspended, and the fault site is protected. After the fault handling process has executed the fault handling action, the health monitoring processes return to the pending state and the other tasks resume.
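The wake-up pattern described above can be imitated with ordinary threads and semaphores. This is only a toy model under stated assumptions: Python threads stand in for cores, there is no real priority or preemption, each thread exits after one fault instead of pending again, and the class name `Partition` and its members are hypothetical.

```python
import threading


class Partition:
    """Toy model: one thread per core plays that core's health monitoring (HM) process."""

    def __init__(self, core_ids):
        self.core_ids = sorted(core_ids)
        self.sems = {c: threading.Semaphore(0) for c in self.core_ids}
        self.active = set()                  # cores whose HM thread is currently running
        self.lock = threading.Lock()
        self.all_active = threading.Event()  # set once every core runs its HM thread
        self.handled = threading.Event()     # set once the fault has been handled

    def hm_thread(self, core_id):
        self.sems[core_id].acquire()         # pend while the partition is healthy
        with self.lock:
            self.active.add(core_id)
            if len(self.active) == len(self.core_ids):
                self.all_active.set()        # every core now holds its fault site
        self.handled.wait()                  # keep the core until handling completes

    def inject_fault(self):
        """Run on the faulting core: release the semaphore of every core's HM thread."""
        for sem in self.sems.values():
            sem.release()


part = Partition([0, 1, 2])
threads = [threading.Thread(target=part.hm_thread, args=(c,)) for c in part.core_ids]
for t in threads:
    t.start()

part.inject_fault()
frozen = part.all_active.wait(timeout=5)     # wait until all HM threads are running
snapshot = sorted(part.active)
part.handled.set()                           # fault handled, HM threads may finish
for t in threads:
    t.join()
print(frozen, snapshot)
```

On a real partitioned operating system the suspension of normal processes follows from the scheduler preferring the highest-priority health monitoring process; the sketch only shows the semaphore handshake.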
S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process and is responsible for handling all faults in the partition; the health monitoring processes in the partitions handle faults according to the result of the partition-level health monitoring task's query of the health monitoring table. In each partition of this embodiment, the core with the smallest core ID is responsible for all fault handling in its partition, and the health monitoring process of that core becomes the fault handling process. The partition-level health monitoring task notifies the fault handling process in the partition through a virtual interrupt so that it starts working. The fault handling process handles the fault using the basic fault information and operating system running state information obtained in S1, and executes the following steps:
S3.1: obtain the user-defined fault handling action; if one exists, go to S3.2, otherwise go to S3.4;
S3.2: call the corresponding fault handling action and proceed according to its return value: if it returns TRUE, fault handling is complete; if it returns FALSE, go to S3.3;
S3.3: change the fault type in the fault event data structure to a health monitoring fault, store the original type in the fault type history entry, and execute S1 again;
S3.4: return to the upper level and hand the fault over to it for handling.
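Steps S3.1 to S3.4 amount to a small decision procedure, sketched below. The function `handle_fault`, the dictionary-based event, the constant `HM_FAULT`, and the `escalate` and `reinject` callbacks are hypothetical stand-ins for the upper-level hand-over and for re-running S1.

```python
HM_FAULT = "HEALTH_MONITOR_FAULT"  # hypothetical type for a fault of the handler itself


def handle_fault(event, user_actions, escalate, reinject):
    """Walk through S3.1 to S3.4 for one fault event (a dict in this sketch)."""
    action = user_actions.get(event["type"])     # S3.1: look up the user-defined action
    if action is None:
        escalate(event)                          # S3.4: hand over to the upper level
        return "escalated"
    if action(event):                            # S3.2: TRUE means handling is complete
        return "handled"
    event["type_history"].append(event["type"])  # S3.3: remember the original type,
    event["type"] = HM_FAULT                     # mark it as a health monitoring fault,
    reinject(event)                              # and execute S1 again
    return "reinjected"


escalated, reinjected = [], []
ok = handle_fault({"type": "DIV_ZERO", "type_history": []},
                  {"DIV_ZERO": lambda e: True}, escalated.append, reinjected.append)
failed = {"type": "DIV_ZERO", "type_history": []}
bad = handle_fault(failed, {"DIV_ZERO": lambda e: False},
                   escalated.append, reinjected.append)
none = handle_fault({"type": "UNKNOWN", "type_history": []}, {},
                    escalated.append, reinjected.append)
print(ok, bad, none)
```

Recording the original type before overwriting it lets the next pass through S1 see both that the handler failed and what it was trying to handle.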
Regarding fault handling actions, the operating system provides a set of built-in actions, including displaying fault information, restarting the faulty partition, stopping the faulty partition, restarting the faulty module, and taking no action, but these can be called only by the partition-level and module-level health monitoring tasks. According to the ARINC 653 standard, process-level fault handling actions are implemented by the user according to the actual needs of the application.
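The split between built-in and user-defined actions can be sketched as follows, assuming the restriction reads as "only partition-level and module-level health monitoring tasks may call the built-in actions". The enum `HmAction`, the set `ALLOWED_CALLERS`, and the function `invoke_os_action` are hypothetical names, not an operating system API.

```python
from enum import Enum, auto


class HmAction(Enum):
    """Built-in actions named in the text; the identifiers are illustrative."""
    DISPLAY_FAULT_INFO = auto()
    RESTART_PARTITION = auto()
    STOP_PARTITION = auto()
    RESTART_MODULE = auto()
    IGNORE = auto()


# Assumption: only these health monitoring levels may call the built-in actions.
ALLOWED_CALLERS = {"partition", "module"}


def invoke_os_action(action, caller_level):
    """Return a description of the action, or refuse a caller that is not allowed."""
    if caller_level not in ALLOWED_CALLERS:
        raise PermissionError(f"{caller_level}-level code must use a user-defined action")
    return f"{caller_level} HM task performs {action.name}"


print(invoke_os_action(HmAction.RESTART_PARTITION, "partition"))
```

Process-level handlers stay outside this enum because the application supplies them.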
The above describes only preferred embodiments of the invention and is not to be construed as limiting it. Any modification, equivalent substitution, or improvement made within the spirit and principle of the invention falls within its scope of protection. Those skilled in the art may make further modifications and variations without departing from the technical principle of the invention, and these should likewise be regarded as within its scope of protection.

Claims (5)

1. A method for handling multi-core faults within a partition, characterized by comprising the following steps:
S1: fault injection, in which basic fault information and operating system running state information are collected;
S2: fault site protection: a health monitoring process runs on each core in the partition and has the highest priority on that core, and the fault site is protected by the health monitoring processes running preferentially;
S3: fault handling: one of the health monitoring processes in each partition is selected as the fault handling process, which is responsible for handling all faults in the partition and handles each fault using the basic fault information and operating system running state information obtained in S1.
2. The method according to claim 1, wherein the basic fault information comprises a fault code, a fault subcode, the time of the fault, and related text.
3. The method according to claim 1, wherein S3 comprises the following steps:
S3.1: obtaining the user-defined fault handling action; if one exists, going to S3.2, otherwise going to S3.4;
S3.2: calling the corresponding fault handling action and proceeding according to its return value: if it returns TRUE, fault handling is complete; if it returns FALSE, going to S3.3;
S3.3: changing the fault type in the fault event data structure to a health monitoring fault, storing the original type in the fault type history entry, and executing S1 again;
S3.4: returning to the upper level and handing the fault over to it for handling.
4. The method according to claim 1, wherein the fault handling process is the health monitoring process of the core with the smallest core ID in each partition.
5. The method according to claim 1, wherein in S2, while the partition is running normally, each health monitoring process pends on a designated semaphore; when a fault occurs, the core on which the fault occurred releases, during fault injection, the semaphore to the health monitoring processes of all cores of the partition, so that all cores of the partition run their health monitoring processes and the processes that were running normally are suspended, thereby protecting the fault site.
CN202111052046.4A 2021-09-08 2021-09-08 Method for processing multi-core fault in partition Pending CN113868076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111052046.4A CN113868076A (en) 2021-09-08 2021-09-08 Method for processing multi-core fault in partition

Publications (1)

Publication Number Publication Date
CN113868076A (en) 2021-12-31

Family

ID=78994976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111052046.4A Pending CN113868076A (en) 2021-09-08 2021-09-08 Method for processing multi-core fault in partition

Country Status (1)

Country Link
CN (1) CN113868076A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040078562A1 (en) * 2002-10-17 2004-04-22 Maarten Koning Health monitoring system for a partitioned architecture
CN110941503A (en) * 2019-11-20 2020-03-31 中国航空工业集团公司西安航空计算技术研究所 Fault processing method and device and electronic equipment
CN112115022A (en) * 2020-08-27 2020-12-22 北京航空航天大学 AADL-based IMA system health monitoring test method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination