CN104050051B - Fault diagnosis method for a spaceborne computer - Google Patents

Fault diagnosis method for a spaceborne computer

Info

Publication number
CN104050051B
Authority
CN
China
Prior art keywords
fault
logic
state
hardware
spaceborne computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410301310.7A
Other languages
Chinese (zh)
Other versions
CN104050051A (en)
Inventor
花秋琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aerospace Electronic Communication Equipment Research Institute
Original Assignee
Shanghai Aerospace Electronic Communication Equipment Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aerospace Electronic Communication Equipment Research Institute
Priority to CN201410301310.7A
Publication of CN104050051A
Application granted
Publication of CN104050051B
Legal status: Active
Anticipated expiration

Landscapes

  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a fault diagnosis method for a spaceborne computer in which software and hardware cooperate to diagnose faults. The method comprises assertion-based fault detection and hardware fault event driving. In assertion-based fault detection, the hardware system of the spaceborne computer provides an operation interface for, and the numerical range of, the hardware drive parameters, which software reads back and checks; the software checks the input parameters and returned status of function interfaces; and the working range of each input parameter is asserted, so that when an input parameter is judged to exceed its threshold an exception is raised through a software trap or a callback function, and fault diagnosis and recovery are completed in the processor's exception handling flow. In hardware fault event driving, a synchronous state feedback scheme for the control flow and data flow is used; the processor's current operation is interrupted by one of three triggers (the wait signal of a bus access, an error signal, or an interrupt signal), and the fault is identified and recovered according to the event source and the feedback information.

Description

Fault diagnosis method for a spaceborne computer
Technical field
The present invention relates to the technical field of fault diagnosis, and in particular to a fault diagnosis method for a spaceborne computer.
Background art
In the space environment, adverse factors such as high-energy particle radiation, solar flares, and large temperature differences make the logic resources and storage media of a computer prone to transient or permanent faults of all kinds. Spacecraft must be highly autonomous, must adapt to a complex space environment, and must keep working without interruption during critical events. Onboard electronic equipment, and the core control and computing units in particular, therefore need autonomous fault diagnosis and fault tolerance capabilities.
Fault diagnosis and fault tolerance techniques mainly add redundancy and use backup, coding, pattern recognition, and similar means to diagnose equipment faults and recover from them. Existing onboard electronic equipment is constrained by factors such as scale and component selection, and the methods mainly used today include autonomous switching among multiple computers, cold and hot redundancy, Hamming coding of storage resources, and two-out-of-three voting. These approaches effectively preserve the maximum service capability of a unit after a fault and give good real-time error detection and correction for transient faults in storage resources. In modern onboard electronic equipment, however, rising product performance, integration, and design scale, together with heavy use of large-scale integrated circuits, mean that the existing diagnosis and fault tolerance approaches can no longer meet the application requirements of onboard electronic equipment, spaceborne computers in particular.
Summary of the invention
To address the above deficiencies of the prior art, the present invention provides a fault diagnosis method for a spaceborne computer, implemented through the following technical solution:
A fault diagnosis method for a spaceborne computer, in which software and hardware cooperate to diagnose faults of the spaceborne computer, comprising: assertion-based fault detection, and hardware fault event driving;
The assertion-based fault detection comprises:
The hardware system of the spaceborne computer provides an operation interface for, and the numerical range of, the hardware drive parameters, which software reads back and checks; the software checks the input parameters and returned status of function interfaces; the working range of each input parameter is asserted, and when an input parameter is judged to exceed its threshold, an exception is raised through a software trap or a callback function, and fault diagnosis and recovery are completed in the processor's exception handling flow;
The hardware fault event driving comprises:
A synchronous state feedback scheme for the control flow and data flow is used; the processor's current operation is interrupted by one of three triggers (the wait signal of a bus access, an error signal, or an interrupt signal), and the fault is identified and recovered according to the event source and the feedback information.
Preferably, the synchronous state feedback scheme for the control flow and data flow comprises:
The data flow of the spaceborne computer is partitioned by functional domain or clock domain into a number of functional units, and a state machine is established for each functional unit. The state machine has three states: idle, working, and confirm. A control flow synchronization handshake is performed at each partition point; the handshake comprises confirmation of the state machine states and verification of the data communicated between the two functional units. When a functional unit fails, so that synchronization between two functional units fails or the communication check is incorrect, the control flow synchronization handshake between the two units fails and the data flow can no longer be linked, until the processor bus access times out and the bus access exception flow is entered.
Preferably, the functional units obtained by partitioning comprise: processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
Preferably, the state machine has three states (idle, working, and confirm), and the transition conditions among them are as follows. In the idle state, the current logic checks whether the start-flow flag of the upper-level logic is valid and whether the lower-level logic is in the idle state; if both hold it enters the working state, otherwise it does not change state. In the working state, the current logic checks whether its own work flow has ended; if so it enters the confirm state, otherwise it does not change state. In the confirm state, the current logic checks whether the start-flow flag of the upper-level logic is invalid and whether the lower-level logic is in the working state; if both hold it enters the idle state, otherwise it does not change state.
Preferably, for a processor that has no bus wait flag, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
Brief description of the drawings
Fig. 1 is a flow chart of the control-flow-based synchronous state feedback scheme of the present invention;
Fig. 2 is the state machine diagram of the present invention;
Fig. 3 is a flow chart of the assertion-based fault detection algorithm of the present invention;
Fig. 4 is a schematic diagram of transaction-level assertion-based fault detection in a multi-data-source acquisition system according to the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described here are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
To aid understanding of the embodiments of the present invention, specific embodiments are explained further below with reference to the accompanying drawings; none of these embodiments limits the embodiments of the present invention.
A fault diagnosis method for a spaceborne computer, in which software and hardware cooperate to diagnose faults of the spaceborne computer, comprising: assertion-based fault detection, and hardware fault event driving.
Assertion-based fault detection means that when the processor accesses a peripheral, functional operations are completed in an interactive request-response manner: the management software accesses the hardware function interface through port operations, the hardware system responds to the software operation and feeds back the control state in real time, and the management software checks the hardware feedback state against a threshold. When a value exceeds the threshold, the management software actively raises an exception. For simple functional interfaces of the spaceborne computer such as device interfaces and communication logic, the hardware system of the spaceborne computer can provide the operation interface and numerical range of the hardware drive parameters, which software reads back and checks; the software checks the input parameters and returned status of function interfaces; and by asserting the working range of each input parameter, an exception is raised through a software trap or a callback function when an input parameter is judged to exceed its threshold, with fault diagnosis and recovery completed in the processor's exception handling flow.
This detection method can judge the working state or communication state of a functional interface. For complex control logic in hardware and the execution of algorithm strategies, however, the volume of process information is large and faults arise and clear quickly, so the rate at which the processor can sample the information does not meet the needs of fault detection. The present invention therefore proposes a transaction-level fault detection method that uses a built-in evaluation system as a functional test mechanism. The evaluation system monitors the information or signal state transitions of the functional logic in real time, "asserts" information values and signal states, and records an error state when it detects abnormal process information or an abnormal signal state. In the multi-data-source acquisition device shown in Fig. 4, multiple buffer queues store the data sources, and a scheduler applies a scheduling strategy to transfer the data out. The scheduling strategy guarantees the capacity to receive data from every channel, that is, no buffer queue may overflow. The evaluation system monitors each buffer queue's full flag signal and its queue write signal, and the assertion holds as long as the two signals are not active at the same time. When the assertion fails, the evaluation system increments an error count and feeds it back to the processor or main control system. When the processor or main control unit detects that the count is non-zero, it raises an exception by software assertion and enters the exception handling flow described above.
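The buffer queue check described above can be illustrated with a minimal Python sketch. It is only a software model of what the text attributes to a hardware evaluation system, and the names `QueueMonitor`, `OverflowFault`, and `check_monitor` are assumptions made for illustration; they do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass
class QueueMonitor:
    """Hypothetical model of the evaluation system watching one buffer queue."""
    error_count: int = 0

    def sample(self, full: bool, write: bool) -> bool:
        # The assertion holds unless a write occurs while the queue is full.
        ok = not (full and write)
        if not ok:
            # Assertion failed: record the error for later feedback.
            self.error_count += 1
        return ok


class OverflowFault(Exception):
    """Raised by software when the monitor reports a non-zero error count."""


def check_monitor(monitor: QueueMonitor) -> None:
    # Software-side assertion: the fed-back error count must be zero.
    if monitor.error_count != 0:
        raise OverflowFault(f"{monitor.error_count} overflow event(s) recorded")
```

In this sketch the monitor samples the two signals at full rate, while the processor only needs to read the accumulated count occasionally, which is the division of labor the paragraph above describes.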
The terms "assertion", "software trap", and "callback function" used in the present invention are well-known names and techniques in the art and are not described in detail here.
Assertion-based fault detection is an active fault detection method implemented for hardware device drivers. The logic it checks is the interface logic in the device that the driver software operates directly; the control flow and control logic that work autonomously inside the device are invisible to the device driver. For this reason the present invention partitions the data flow, by way of the control flow, into multiple data processing stages. These stages synchronize with one another when exchanging information by means such as control state handshakes and communication data checks. The front end of the control logic, that is, the control logic that interacts with the processor, synchronizes with the processor through the processor's wait control signal, interrupt signal, and error indication signal. When a link in the data flow's control chain becomes abnormal, the two related groups of control logic cannot synchronize and both fail, which ultimately breaks synchronization along the whole data flow chain, until the synchronization logic between the front-end control logic and the processor fails. This causes a processor wait timeout exception or raises a device fault interrupt; a processor without a bus wait state interface is notified by an interrupt or an error flag, and fault recovery is carried out by the exception handling of the management software.
As shown in Fig. 1, the information control flow of the spaceborne computer includes functions such as data communication processing, interface access operations, signal processing, and command driving. The whole flow is partitioned by functional type or clock domain into processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic. The function of each part is as follows:
Processor decode response logic: receives the processor's port access operations and maintains and responds to the processor's bus cycles, including the exchange of the processor wait signal (RDY). The wait signal (RDY) is the flag by which a peripheral indicates to the processor that the data on the bus is ready. While the wait signal is inactive, the processor waits for the peripheral to put data on the bus, until the data is ready or the bus cycle times out.
Master device communication control logic: receives operation commands after they have been relayed by the processor function, and starts the external communication bus to complete the communication exchange with the slave device.
Slave device communication control logic: maintains communication with the main processor, and passes the data or commands sent over the bus to the storage control logic of the local module.
Storage control logic: receives the data or commands transmitted over the bus, processes the bus data or decodes the commands, and submits the data or commands to the interface access logic.
Interface access logic: the functional unit that executes processor commands or carries out data communication; it completes the communication exchange with external functional interfaces and executes the commands from the processor.
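The RDY wait behavior described for the processor decode response logic can be modeled in a few lines of Python. This is a hedged sketch, not the patent's implementation: the polling loop stands in for bus wait cycles, and `bus_read`, `make_peripheral`, `BusTimeout`, and the default limit of 8 cycles are all hypothetical.

```python
class BusTimeout(Exception):
    """Raised when RDY stays inactive for the whole bus cycle."""


def bus_read(peripheral, max_wait_cycles: int = 8):
    # Poll the peripheral once per simulated clock cycle until it signals RDY.
    for _ in range(max_wait_cycles):
        ready, data = peripheral()
        if ready:
            return data
    # RDY never became active: enter the bus access exception flow.
    raise BusTimeout("RDY inactive until bus cycle timeout")


def make_peripheral(ready_after, value=0x5A):
    """Hypothetical peripheral that asserts RDY after `ready_after` polls.

    Passing None models a faulty unit that never asserts RDY.
    """
    state = {"polls": 0}

    def poll():
        state["polls"] += 1
        if ready_after is not None and state["polls"] > ready_after:
            return True, value
        return False, None

    return poll
```

A healthy peripheral returns its data within the wait limit, while a unit that never asserts RDY surfaces as a `BusTimeout`, which is how a fault deep in the chain becomes visible to the processor.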
An interface access operation of the spaceborne computer passes through the five functional units above. A state machine as shown in Fig. 2 is established for each functional unit, and its state transition conditions are listed in Table 1. The idle state and the confirm state in Fig. 2 are the two synchronization points in this embodiment. In the idle state, a functional unit checks whether the control flow flag of its upper-level logic has been started, which includes checking data validity, and checks whether the lower-level logic unit is in an operable state; it moves to the working state only when both conditions hold at once. In the confirm state, the functional unit checks whether the start flag of the upper-level functional unit has ended and confirms whether the lower-level flow has entered the working state; it returns to the idle state only when both conditions hold. For the processor response logic, a processor bus access is the flow start flag of the unit, and the unit responds to the processor bus wait flag (RDY) signal when it is in the idle state. When a functional unit is stuck in the idle state because of a logic failure or a data check error, its upper-level functional unit deadlocks in the confirm state; when a functional unit is stuck in the busy state because of a fault, its upper-level logic deadlocks in the idle state. The deadlock propagates upward in this way until the processor response logic itself deadlocks, leaving the processor bus wait flag inactive until the processor bus access times out and the processor enters the bus access exception flow. For a processor that has no bus wait flag, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
No. | Source state | Target state | Transition condition
1 | Idle | Working | The start-flow flag is valid and the lower-level logic is in the idle state.
2 | Working | Confirm | The work flow has ended.
3 | Confirm | Idle | The upper-level flow flag is invalid and the lower-level logic has left the idle state.
Table 1
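The three transitions of Table 1 can be written as a small Python function. This is a sketch under stated assumptions: the names `State` and `next_state` are illustrative, the condition for transition 3 follows the wording of Table 1 (lower-level logic has left the idle state) rather than the stricter "is in the working state" wording used elsewhere in the text, and how the lowest unit obtains a lower-level state is left open.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WORKING = "working"
    CONFIRM = "confirm"


def next_state(state, upper_start_valid, lower_state, work_done):
    """One step of a functional unit's state machine, following Table 1."""
    if state is State.IDLE:
        # Transition 1: start flag valid and lower-level logic idle.
        if upper_start_valid and lower_state is State.IDLE:
            return State.WORKING
    elif state is State.WORKING:
        # Transition 2: the unit's own work flow has finished.
        if work_done:
            return State.CONFIRM
    elif state is State.CONFIRM:
        # Transition 3: start flag withdrawn and lower-level logic has left idle.
        if not upper_start_valid and lower_state is not State.IDLE:
            return State.IDLE
    # No condition met: the unit stays where it is.
    return state
```

The deadlock behavior described in the text falls out of the function: a unit in the confirm state whose lower-level logic is stuck in idle never satisfies transition 3, so it stays in confirm on every step.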
A transaction of the spaceborne computer comprises several of the control flows above, or is completed by repeated interactions over one control flow. To diagnose and detect faults at the transaction level, the present invention uses the processor's software interrupt to perform assertion-based detection. The detection flow is shown in Fig. 3. In the normal flow, the processor controls access to a function port through a sequence of commands. The present invention adds a detection step to the command sequence and checks the hardware state after any operation completes: the hardware device provides operational feedback telemetry for the command, and the management software checks it against a threshold. When a measured parameter is outside the threshold range, the exception flow is started through a callback function or a software trap. The assertion takes one of the following two forms:
Assert(Expression,function); (1)
Assert(Expression,kind); (2)
Both forms of assertion take two parameters: the first is a conditional expression and the second specifies the corresponding exception handling. In form (1) the exception handling parameter is a callback function: when the conditional expression is not satisfied, the management software calls the corresponding exception handler (function). In form (2) the exception handling parameter is a software trap number: when the conditional expression is not satisfied, a software trap with the corresponding vector number is generated. The pseudocode for the two forms of assertion is as follows:
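The pseudocode listing itself does not survive in this text. As a stand-in, a minimal Python sketch of the two forms might look like the following. It is an assumption-laden illustration rather than the patent's listing: a Python exception named `SoftwareTrap` stands in for a processor software trap, and `assert_callback` and `assert_trap` are hypothetical names for forms (1) and (2).

```python
class SoftwareTrap(Exception):
    """Stands in for a processor software trap; carries the vector number."""

    def __init__(self, vector: int):
        super().__init__(f"software trap, vector {vector}")
        self.vector = vector


def assert_callback(expression: bool, function) -> None:
    # Form (1): call the exception handler when the condition is not satisfied.
    if not expression:
        function()


def assert_trap(expression: bool, kind: int) -> None:
    # Form (2): raise a trap with the given vector number when the condition fails.
    if not expression:
        raise SoftwareTrap(kind)
```

A caller would pass the threshold check as the first argument, for example `assert_callback(low <= readback <= high, handler)`, where `readback`, `low`, `high`, and `handler` are placeholders for the telemetry value, its limits, and the recovery routine.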
The assertion detection of form (1) can be used for non-processor faults in the spaceborne computer. Handling a processor exception requires operating the processor's system registers and therefore needs system register access rights, so the assertion detection mechanism of form (2) is used in that case.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the scope of the claims.

Claims (4)

1. A fault diagnosis method for a spaceborne computer, characterized in that software and hardware cooperate to diagnose faults of the spaceborne computer, the method comprising: assertion-based fault detection, and hardware fault event driving;
the assertion-based fault detection comprises:
the hardware system of the spaceborne computer provides an operation interface for the hardware drive parameters, which software reads back and checks; the software checks the input parameters and returned status of function interfaces; the working range of each input parameter is asserted, and when an input parameter is judged to exceed its threshold, an exception is raised through a software trap or a callback function, and fault diagnosis and recovery are completed in the processor's exception handling flow;
the hardware fault event driving comprises:
a synchronous state feedback scheme for the control flow and data flow is used; the processor's current operation is interrupted by one of three triggers (the wait signal of a bus access, an error signal, or an interrupt signal), and the fault is identified and recovered according to the event source and the feedback information;
the synchronous state feedback scheme for the control flow and data flow comprises:
the data flow of the spaceborne computer is partitioned by functional domain or clock domain into a number of functional units, and a state machine is established for each functional unit; the state machine has three states: idle, working, and confirm; a control flow synchronization handshake is performed at each partition point, the handshake comprising confirmation of the state machine states and verification of the data communicated between the two functional units; when a functional unit fails, so that synchronization between two functional units fails or the communication check is incorrect, the control flow synchronization handshake between the two units fails and the data flow can no longer be linked, until the processor bus access times out and the bus access exception flow is entered.
2. The fault diagnosis method for a spaceborne computer according to claim 1, characterized in that the functional units obtained by partitioning comprise: processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
3. The fault diagnosis method for a spaceborne computer according to claim 1, characterized in that the state machine has three states (idle, working, and confirm), and the transition conditions among them comprise: in the idle state, the current logic checks whether the start-flow flag of the upper-level logic is valid and whether the lower-level logic is in the idle state, and if both hold it enters the working state, otherwise it does not change state; in the working state, the current logic checks whether its own work flow has ended, and if so it enters the confirm state, otherwise it does not change state; in the confirm state, the current logic checks whether the start-flow flag of the upper-level logic is invalid and whether the lower-level logic is in the working state, and if both hold it enters the idle state, otherwise it does not change state.
4. The fault diagnosis method for a spaceborne computer according to claim 3, characterized in that, for a processor that has no bus wait flag, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
CN201410301310.7A | 2014-06-27 | 2014-06-27 | Fault diagnosis method for a spaceborne computer | Active | CN104050051B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410301310.7A | 2014-06-27 | 2014-06-27 | Fault diagnosis method for a spaceborne computer

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410301310.7A | 2014-06-27 | 2014-06-27 | Fault diagnosis method for a spaceborne computer

Publications (2)

Publication Number | Publication Date
CN104050051A (en) | 2014-09-17
CN104050051B (en) | 2016-10-26

Family

ID=51502944

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status | Publication
CN201410301310.7A | Fault diagnosis method for a spaceborne computer | 2014-06-27 | 2014-06-27 | Active | CN104050051B (en)

Country Status (1)

Country Link
CN (1) CN104050051B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815114A (en) * 2017-01-12 2017-06-09 西安科技大学 A kind of computer system fault handling method based on software-hardware synergism
CN108733539B (en) * 2018-05-24 2021-08-10 郑州云海信息技术有限公司 Method, device and system for stopping OSD service and readable storage medium
CN110673975B (en) * 2019-08-23 2023-06-02 上海航天控制技术研究所 Secure kernel structure of spaceborne computer software and secure operation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493809A (en) * 2009-03-03 2009-07-29 哈尔滨工业大学 Multi-core onboard spacecraft computer based on FPGA
CN103116535A (en) * 2011-11-17 2013-05-22 上海航天测控通信研究所 Satellite-bone dual-redundant computer mainframe working condition monitoring and fault autonomous switching device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
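Research on Spaceborne High-Speed Data Processing Technology (星载高速数据处理技术研究); Ma Yin (马寅); China Master's Theses Full-text Database, Engineering Science and Technology II; 2012-04-15 (No. 4); Chapters 2-3 *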

Also Published As

Publication number Publication date
CN104050051A (en) 2014-09-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant