CN103678013A - Redundancy detection system of multi-core processor operating system level process - Google Patents

Info

Publication number
CN103678013A
Authority
CN
China
Prior art keywords
module
operating system
detection
fault recovery
processor operating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310696234.XA
Other languages
Chinese (zh)
Inventor
季振洲
廉晓洋
吴昊
苏雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201310696234.XA
Publication of CN103678013A
Legal status: Pending

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention provides a redundancy detection system for operating-system-level processes on a multi-core processor. The system comprises a cache module, a synchronization module, a detection module and a fault recovery interface module. The cache module caches the key data of system calls made under the multi-core processor's operating system. The synchronization module performs horizontal and vertical time synchronization among the subprocesses. The detection module compares whether the data cached for the subprocesses' calls are identical and, if they are not, generates a fault flag. The fault recovery interface module provides the fault flag and the key data to external components: if the current detection result is normal, it transmits a restore-point creation flag together with the key data needed to create the restore point; otherwise it transmits a fault recovery flag. The system combines the advantages of the different levels of detection and is a relatively flexible detection scheme that does not sacrifice performance.

Description

Redundancy detection system for operating-system-level processes on a multi-core processor
Technical field
The present invention relates to the field of computer software and hardware, and in particular to a redundancy detection technique and a reliability detection system that uses it.
Background
At present, research on transient fault detection reaches into every level of computer architecture. Existing transient fault detection techniques mainly comprise fault detection at the processor layer, the operating system layer, the compiler layer and the application layer. Detection at the processor layer requires detailed knowledge of the processor itself and additional hardware structures; it carries a hardware cost and, although transparent to application software, is the most expensive and the hardest to implement. Detection at the compiler layer does not depend on the underlying hardware and so needs no extra hardware; nor does it depend on a particular application, so it need not be changed for each program. It does, however, require recompiling the program's source code, and many applications have no source code available, in which case the technique cannot be used. Detection at the application layer is the most flexible, but it has to be built on top of the operating system, which makes it less efficient.
Summary of the invention
To solve the problems of existing transient fault detection techniques, namely that they are hard to implement, require recompiling the program's source code, or have to be built on top of the operating system, the present invention provides a redundancy detection system for operating-system-level processes on a multi-core processor.
The present invention is achieved by the following technical solutions:
A redundancy detection system for operating-system-level processes on a multi-core processor, comprising: a cache module, a synchronization module, a detection module and a fault recovery interface module;
The multi-core processor is a quad-core processor of the MIPS architecture;
The operating system is the Linux operating system;
The cache module is used to cache the key data of the system calls of the multi-core processor's operating system;
The synchronization module is used to perform horizontal and vertical time synchronization among the subprocesses;
The detection module is used to compare whether the data cached for the subprocesses' calls are identical, and generates a fault flag if they are not;
The fault recovery interface module is used to provide the fault flag and the key data to external components; if the current detection result is normal, it transmits a restore-point creation flag and provides the key data needed to create the restore point; otherwise it transmits a fault recovery flag.
Beneficial effect of the present invention: it combines the advantages of the different levels of detection and is a relatively flexible detection scheme that does not sacrifice performance.
 
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the redundancy detection system for operating-system-level processes on a multi-core processor provided by the invention;
Fig. 2 is a schematic diagram of the redundant process execution system and the voting system provided by the invention.
 
Embodiment
In order to illustrate the features and basic working principle of the invention more clearly, the invention is described below with reference to the drawings and embodiments.
This embodiment provides a redundancy detection system for operating-system-level processes on a multi-core processor which, as shown in Fig. 1, comprises: cache module 1, synchronization module 2, detection module 3 and fault recovery interface module 4;
Cache module 1 is used to cache the key data of the system calls of the multi-core processor's operating system;
Synchronization module 2 is used to perform horizontal and vertical time synchronization among the subprocesses;
Detection module 3 is used to compare whether the data cached for the subprocesses' calls are identical, and generates a fault flag if they are not;
Fault recovery interface module 4 is used to provide the fault flag and the key data to external components; if the current detection result is normal, it transmits a restore-point creation flag and provides the key data needed to create the restore point; otherwise it transmits a fault recovery flag.
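The data flow through the cache, detection and fault recovery interface modules can be sketched in user-space Python. This is only an illustration under stated assumptions, not the kernel-level implementation described in this application: the record layout (SyscallRecord), the flag names (Flag) and the function names are hypothetical, and returning no key data alongside the fault recovery flag is a simplification chosen for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):
    # Hypothetical names for the two flags handed to the external recovery mechanism.
    CREATE_RESTORE_POINT = "create_restore_point"
    FAULT_RECOVERY = "fault_recovery"


@dataclass(frozen=True)
class SyscallRecord:
    """Key data cached for one system call made by one subprocess (assumed layout)."""
    name: str
    args: bytes
    result: bytes


def detect_fault(records):
    """Detection module: True (fault flag set) if the cached records are not all identical."""
    first = records[0]
    return any(record != first for record in records[1:])


def recovery_interface(records):
    """Fault recovery interface: return (flag, key data) for the external recovery mechanism."""
    if detect_fault(records):
        # Simplification: no key data accompanies the fault recovery flag here.
        return Flag.FAULT_RECOVERY, None
    # All records agree, so any one of them serves as the restore-point data.
    return Flag.CREATE_RESTORE_POINT, records[0]
```

Comparing whole records with != stands in for the memory comparison that the detection module may use.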
The redundancy detection system of this embodiment is described taking as an example a Linux operating system running on a quad-core processor of the MIPS architecture. The Linux kernel-level redundancy module is implemented by modifying the system calls inside the open-source Linux kernel, and it contains cache module 1, synchronization module 2, detection module 3 and fault recovery interface module 4.
Specifically, cache module 1 is responsible for caching the key data of the system calls made inside all subprocesses; the cached data comprise the parameters passed in, the execution result and other data related to the system call. Synchronization module 2 ensures that the three subprocesses run in step both horizontally and vertically: horizontal synchronization is synchronization of the same system call across different processes, and vertical synchronization is synchronization between system calls within the same process. Together the two mechanisms guarantee that the data obtained for detection are correct. Detection module 3 checks the data held in the cache of the cache module; the correctness of the data being checked is guaranteed by synchronization module 2 and cache module 1, and the check can be performed by memory comparison. Fault recovery interface module 4 provides the outside with the appropriate flag and the necessary key data according to the result from detection module 3: if the detection result is normal, it transmits a restore-point creation flag and the necessary related data; otherwise it transmits a fault recovery flag. This module mainly serves to cooperate with a fault recovery technique.
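One way to picture the two synchronization directions is the following Python sketch, which uses threads in place of the three subprocesses. It is a simulation under assumptions rather than the kernel mechanism: the class SyscallSync and the function run are hypothetical, horizontal synchronization is modeled with a barrier that every replica must reach at the same system call, and vertical synchronization is modeled with a per-call sequence number that each replica must follow in order. All replicas are assumed to issue the same number of calls.

```python
import threading


class SyscallSync:
    """Simulated cache, synchronization and detection for one group of replicas."""

    def __init__(self, replicas=3):
        self.slots = [None] * replicas   # cache: one slot per replica for the current call
        self.seq = 0                     # index of the call the group is currently on
        self.faults = []                 # sequence numbers at which the replicas disagreed
        self.barrier = threading.Barrier(replicas, action=self._compare)

    def _compare(self):
        # Runs once, in a single thread, after every replica has reached the barrier.
        if any(slot != self.slots[0] for slot in self.slots[1:]):
            self.faults.append(self.seq)
        self.seq += 1

    def syscall(self, replica_id, seq, record):
        # Vertical check: a replica may only submit the call the group is currently on.
        if seq != self.seq:
            self.barrier.abort()
            raise RuntimeError("replica is out of step with the group")
        self.slots[replica_id] = record
        # Horizontal synchronization: wait until all replicas have submitted this call.
        self.barrier.wait()


def run(traces):
    """Replay one list of call records per replica; return the indices of mismatched calls."""
    sync = SyscallSync(len(traces))

    def worker(replica_id):
        for seq, record in enumerate(traces[replica_id]):
            sync.syscall(replica_id, seq, record)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(traces))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sync.faults
```

Because the comparison runs as the barrier action, it only ever sees records that belong to the same call, which is the property the two synchronization mechanisms are meant to guarantee.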
The technical solution of this embodiment guarantees the correctness of the detection results and, because it is based on the Linux operating system kernel, the efficiency of the detection system.
As shown in Fig. 2, this embodiment also provides a redundant process execution system and a voting system as an application example of the redundancy detection system for operating-system-level processes on a multi-core processor.
The redundant process execution system comprises one master process and three slave processes and handles the execution and management of a concrete application program. The four processes each run on a separate core of the quad-core processor so as to make full use of its multiple cores. The master process is responsible for initialization, which comprises passing the relevant initialization data to the detection module, initializing the slave process contexts and creating the slave processes. The slave processes it creates must be placed on different processor cores, which can be achieved through the corresponding settings in the Linux kernel. Each slave process is an instance of the concrete application.
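The master and slave arrangement can be approximated from user space as in the sketch below. It is an assumption-laden stand-in for the kernel settings mentioned above: it uses the Linux-only calls os.sched_getaffinity and os.sched_setaffinity to pin each slave to one core, the function names run_slave and master are hypothetical, the work callable is assumed to return an integer, and the master itself is left unpinned. On a machine with fewer available cores than slaves, cores are reused.

```python
import os


def run_slave(core, work):
    """Fork a slave, pin it to `core`, run `work`, and return (pid, read_fd)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is the slave process.
        try:
            os.close(read_fd)
            os.sched_setaffinity(0, {core})            # pin the calling process to one core
            pinned = sorted(os.sched_getaffinity(0))
            os.write(write_fd, f"{pinned[0]}:{work()}".encode())
        finally:
            os._exit(0)                                # never fall back into the master's code
    os.close(write_fd)
    return pid, read_fd


def master(work, n_slaves=3):
    """Create the slaves on distinct cores where possible; collect (core, result) pairs."""
    cores = sorted(os.sched_getaffinity(0))
    # Skip the first available core so that it stays free for the master when there are enough cores.
    slaves = [run_slave(cores[(i + 1) % len(cores)], work) for i in range(n_slaves)]
    results = []
    for pid, fd in slaves:
        core, value = os.read(fd, 4096).decode().split(":")
        os.close(fd)
        os.waitpid(pid, 0)
        results.append((int(core), int(value)))
    return results
```

A call such as master(lambda: sum(range(1000))) returns one (core, result) pair per slave, and those results are what a voting step would then compare.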
The voting system votes on the final results to determine the final outcome of the run. It votes on the execution results of the three slave processes; because a transient fault is a low-probability event, the final result can be decided by vote. A voting module is responsible for voting on the slave processes' final results. The voting system plays only an auxiliary role: working together with the detection system, it ensures the reliable operation of the detection system.
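A majority vote over the three slave results can be written in a few lines. The sketch below is one possible reading of the voting step, not a description of the applicant's voting module: the function name vote is hypothetical, results are assumed to be hashable values, and raising an error when no strict majority exists is a choice made for the sketch.

```python
from collections import Counter


def vote(results):
    """Return (winner, unanimous); raise ValueError if no strict majority exists."""
    winner, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among the slave results")
    return winner, count == len(results)
```

With three slaves, a single transient fault leaves two matching results, so the vote still yields the correct value and the unanimous field reports that a disagreement occurred.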
The above are only preferred embodiments of the present invention. These embodiments are different implementations of the same general idea, and the scope of protection of the invention is not limited to them. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (1)

1. A redundancy detection system for operating-system-level processes on a multi-core processor, characterized in that it comprises: a cache module, a synchronization module, a detection module and a fault recovery interface module;
The multi-core processor is a quad-core processor of the MIPS architecture;
The operating system is the Linux operating system;
The cache module is used to cache the key data of the system calls of the multi-core processor's operating system;
The synchronization module is used to perform horizontal and vertical time synchronization among the subprocesses;
The detection module is used to compare whether the data cached for the subprocesses' calls are identical, and generates a fault flag if they are not;
The fault recovery interface module is used to provide the fault flag and the key data to external components; if the current detection result is normal, it transmits a restore-point creation flag and provides the key data needed to create the restore point; otherwise it transmits a fault recovery flag.
CN201310696234.XA 2013-12-18 2013-12-18 Redundancy detection system of multi-core processor operating system level process Pending CN103678013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310696234.XA CN103678013A (en) 2013-12-18 2013-12-18 Redundancy detection system of multi-core processor operating system level process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310696234.XA CN103678013A (en) 2013-12-18 2013-12-18 Redundancy detection system of multi-core processor operating system level process

Publications (1)

Publication Number Publication Date
CN103678013A true CN103678013A (en) 2014-03-26

Family

ID=50315665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310696234.XA Pending CN103678013A (en) 2013-12-18 2013-12-18 Redundancy detection system of multi-core processor operating system level process

Country Status (1)

Country Link
CN (1) CN103678013A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030088744A1 (en) * 2001-11-06 2003-05-08 Infineon Technologies Aktiengesellschaft Architecture with shared memory
US20110093661A1 (en) * 2008-06-17 2011-04-21 Nxp B.V. Multiprocessor system with mixed software hardware controlled cache management
CN102246155A (en) * 2008-12-10 2011-11-16 飞思卡尔半导体公司 Error detection in a multi-processor data processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHYE A, ET AL: "《37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks》", 31 December 2007, article "Using process-level redundancy to exploit multiple cores for transient fault tolerance" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045531A (en) * 2015-07-01 2015-11-11 山东超越数控电子有限公司 Buffer synchronization mechanism between double storage controllers
CN105045531B (en) * 2015-07-01 2018-01-02 山东超越数控电子有限公司 Cache synchronization mechanism between one kind storage dual controller
CN113672377A (en) * 2020-05-13 2021-11-19 株式会社日立制作所 Program generating device, parallel computing device, and computer-readable recording medium

Similar Documents

Publication Publication Date Title
EP3690657B1 (en) Computer-based interlocking system and redundancy switching method thereof
US8990617B2 (en) Fault-tolerant computer system, fault-tolerant computer system control method and recording medium storing control program for fault-tolerant computer system
US20200019543A1 (en) Method, apparatus and device for updating data, and medium
US8447921B2 (en) Recovering failed writes to vital product data devices
CN105700907A (en) Leverage offload programming model for local checkpoints
US10831616B2 (en) Resilient programming frameworks for iterative computations
CN103984768B (en) A kind of data-base cluster manages method, node and the system of data
CN111581003B (en) Full-hardware dual-core lock-step processor fault-tolerant system
CN102708027B (en) A kind of method and system avoiding outage of communication device
CN111190766A (en) HBase database-based cross-machine-room cluster disaster recovery method, device and system
CN102591736A (en) Method for error detection during execution of a real-time operating system
CN104570831A (en) Process control systems and methods
CN108228391B (en) LockStep processor and management method
CN103154846A (en) Processor power management based on class and content of instructions
WO2014125606A1 (en) Control device
CN114416435A (en) Microprocessor architecture and microprocessor fault detection method
CN103019655B (en) Towards memory copying accelerated method and the device of multi-core microprocessor
CN100511167C (en) Method and device for monitoring memory cell of multiprocessor system
CN105094840A (en) Atomic operation implementation method and device based on cache consistency principle
CN103678013A (en) Redundancy detection system of multi-core processor operating system level process
CN104699550A (en) Error recovery method based on lockstep architecture
US20120185724A1 (en) Parity-based vital product data backup
CN201163399Y (en) Double-CPU protection information shared system based on double-port RAM
CN101241484B (en) Double CPU protection information shared processing method based on double port RAM
EP4170519A1 (en) Data synchronization method and device

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140326