CN104050051B - Fault diagnosis method for a spaceborne computer - Google Patents
Fault diagnosis method for a spaceborne computer
- Publication number
- CN104050051B (application CN201410301310.7A)
- Authority
- CN
- China
- Prior art keywords
- fault
- logic
- state
- hardware
- spaceborne computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Hardware Redundancy (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a fault diagnosis method for a spaceborne computer in which software and hardware cooperate to diagnose faults. The method comprises assertion-based fault detection and hardware fault event driving. In assertion-based fault detection, the hardware system of the spaceborne computer provides operation interfaces and numerical ranges for hardware driver parameters, which software reads back and judges; software checks the input parameters and returned status of function interfaces; and the working range of each input parameter is asserted, so that when an input parameter is judged to exceed its threshold, an exception is thrown by means of a trap or a callback function, and fault diagnosis and recovery are completed in the processor's exception handling flow. In hardware fault event driving, a synchronous state feedback scheme for control flow and data flow is used; the processor's current operation is interrupted by one of three triggers, namely the bus-access wait signal, an error signal, or an interrupt signal; and the fault is identified and recovered according to the event source and the feedback information.
Description
Technical field
The present invention relates to the field of fault diagnosis technology, and in particular to a fault diagnosis method for a spaceborne computer.
Background art
In the space environment, adverse factors such as high-energy particle radiation, solar flares, and large temperature swings make the logic resources and storage media of a computer prone to transient and permanent faults of many kinds. A spacecraft must be highly autonomous, must adapt to a complex space environment during critical events, and must operate continuously without interruption. Its onboard electronics, and especially the core control and computing unit, therefore need autonomous fault diagnosis and fault tolerance capability.
Fault diagnosis and fault tolerance techniques mainly work by adding redundancy, using backup, coding, pattern recognition, and similar means to diagnose and recover from equipment faults. Existing onboard electronics are constrained by factors such as scale and component selection, and the methods mainly used at present include autonomous switching among multiple computers, cold and hot redundancy, Hamming coding of storage resources, and two-out-of-three voting. These methods effectively preserve the maximum service capability of a single unit after a fault and provide good real-time error detection and correction for transient faults in storage resources. In modern onboard electronics, however, rising product performance, growing integration and design scale, and the widespread use of large-scale integrated circuits mean that the existing diagnosis and fault tolerance methods can no longer meet the application requirements of onboard electronics, spaceborne computers in particular.
Summary of the invention
To address the above deficiencies of the prior art, the present invention provides a fault diagnosis method for a spaceborne computer. The invention is achieved through the following technical solution:
A fault diagnosis method for a spaceborne computer, in which fault diagnosis of the spaceborne computer is completed by software and hardware working cooperatively, comprising: assertion-based fault detection, and hardware fault event driving.
Assertion-based fault detection comprises:
The hardware system of the spaceborne computer provides operation interfaces and numerical ranges for hardware driver parameters, which software reads back and judges; software checks the input parameters and returned status of function interfaces; the working range of each input parameter is asserted, and when an input parameter is judged to exceed its threshold, an exception is thrown by means of a trap or a callback function, and fault diagnosis and recovery are completed in the exception handling flow of the processor.
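As a minimal sketch of the read-back-and-judge step, the following Python fragment models a hardware operation interface that publishes a numerical range and a software routine that asserts against it. `FakePort`, `HardwareFault`, `set_parameter`, and the `stuck_at` fault model are hypothetical names introduced for illustration only; the patent does not specify an API.

```python
class HardwareFault(Exception):
    """Hypothetical exception raised when an asserted range is violated."""


class FakePort:
    """Hypothetical stand-in for a hardware operation interface."""

    def __init__(self, low, high, stuck_at=None):
        self.low, self.high = low, high  # numerical range published by hardware
        self.stuck_at = stuck_at         # simulated faulty register value
        self._value = low

    def write(self, value):
        # A stuck register ignores the written value.
        self._value = value if self.stuck_at is None else self.stuck_at

    def read_back(self):
        return self._value


def in_range(port, value):
    return port.low <= value <= port.high


def set_parameter(port, value, on_fault):
    """Assert the input, write it, read it back, and judge the result."""
    try:
        if not in_range(port, value):
            raise HardwareFault(f"input {value} outside [{port.low}, {port.high}]")
        port.write(value)
        observed = port.read_back()
        if not in_range(port, observed):
            raise HardwareFault(f"read-back {observed} outside [{port.low}, {port.high}]")
        return observed
    except HardwareFault as fault:
        # Stands in for the exception handling flow (diagnosis and recovery).
        return on_fault(fault)
```

Here the callback passed as `on_fault` plays the role of the exception handling flow; on a real processor the same check could instead raise a trap.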
Hardware fault event driving comprises:
A synchronous state feedback scheme for control flow and data flow is used; the processor's current operation is interrupted by one of three triggers, namely the bus-access wait signal, an error signal, or an interrupt signal; and the fault is identified and recovered according to the event source and the feedback information.
Preferably, using the synchronous state feedback scheme for control flow and data flow comprises: dividing the data flow of the spaceborne computer by functional domain or clock domain to obtain a number of functional units; establishing a state machine for each functional unit, the state machine having three states, namely an idle state, a working state, and a confirm state; and performing a control flow synchronization handshake at each division point, the handshake comprising confirmation of the state machine state and a data communication check between the two functional units. When an error in a functional unit causes synchronization between two functional units to fail or the communication check to be incorrect, the synchronization handshake between the two control flows fails and the data flow cannot be linked, until the processor bus access times out and the bus access exception flow is entered.
Preferably, the functional units obtained by the division comprise: processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
Preferably, the state machine has three states, namely an idle state, a working state, and a confirm state, and the transition conditions between them are as follows. In the idle state, the current logic checks whether the start flag of the upper-level logic is valid and whether the lower-level logic is in the idle state; if both checks succeed it enters the working state, otherwise it does not change state. In the working state, the current logic checks whether its own work flow has finished; if so it enters the confirm state, otherwise it does not change state. In the confirm state, the current logic checks whether the start flag of the upper-level logic is invalid and whether the lower-level logic is in the working state; if both checks succeed it enters the idle state, otherwise it does not change state.
Preferably, for a processor that has no bus wait signal, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
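The idle-state timeout counter can be pictured with the short Python sketch below. It assumes, as an interpretation not spelled out in the text, that the counter advances once per clock while a request is pending and the unit cannot leave the idle state. `IdleWatchdog`, `tick`, and the `notify` callback (standing in for the interrupt or bus error) are hypothetical names.

```python
class IdleWatchdog:
    """Hypothetical timeout counter attached to a unit's idle state."""

    def __init__(self, threshold, notify):
        self.threshold = threshold  # counter limit before the processor is notified
        self.notify = notify        # stands in for the interrupt or bus error
        self.count = 0
        self.fired = False

    def tick(self, pending_request):
        """Call once per clock while the unit sits in the idle state."""
        if not pending_request:
            # Nothing is waiting, so the counter is cleared.
            self.count = 0
            self.fired = False
            return
        self.count += 1
        if self.count > self.threshold and not self.fired:
            self.fired = True       # notify only once per stuck request
            self.notify("idle-timeout")
```

With a threshold of 3, the fourth consecutive tick with a pending request triggers the notification, and clearing the request resets the counter.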
Brief description of the drawings
Fig. 1 is a flow chart of the synchronous state feedback scheme based on control flow according to the present invention;
Fig. 2 is the state machine diagram of the present invention;
Fig. 3 is the algorithm flow chart of assertion-based fault detection according to the present invention;
Fig. 4 is a schematic diagram of fault detection for a multi-data-source acquisition system using the transaction-level assertions of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described and discussed clearly and completely below with reference to the accompanying drawings. The examples described here are evidently only some of the examples of the present invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To aid understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the drawings; the individual embodiments do not limit the embodiments of the present invention.
A fault diagnosis method for a spaceborne computer, in which fault diagnosis of the spaceborne computer is completed by software and hardware working cooperatively, comprising: assertion-based fault detection, and hardware fault event driving.
Assertion-based fault detection means that, when the processor accesses a peripheral, the functional operation is completed in an interactive request-response manner: the management software accesses the hardware function interface through port operations, the hardware system responds to the software operation and feeds back the control state in real time, and the management software checks the hardware feedback state against a threshold. When a value exceeds its threshold, the management software actively throws an exception. For functional interfaces of a simple kind in the spaceborne computer, such as device interfaces and communication logic, the hardware system can provide operation interfaces and numerical ranges for hardware driver parameters, which software reads back and judges; software checks the input parameters and returned status of function interfaces; and the working range of each input parameter is asserted, so that when an input parameter is judged to exceed its threshold, an exception is thrown by means of a trap or a callback function, and fault diagnosis and recovery are completed in the processor's exception handling flow.
The detection method above can judge the working state or communication state of a functional interface. For complex control logic and the execution of algorithm strategies in hardware, however, the volume of process information is large and faults appear and clear within a short time, so the rate at which a processor can sample information cannot meet the needs of fault detection. The present invention therefore proposes a transaction-level fault detection method that uses the functional test approach of a built-in evaluation system. The evaluation system monitors, dynamically and in real time, the process information or signal state transitions of the function logic under test and makes "assertions" about information values and signal states; when abnormal process information or an abnormal signal state is detected, the error state is recorded. In the multi-data-source acquisition device shown in Fig. 4, multiple buffer queues store the data sources, and a scheduler applies a scheduling strategy to unload the data. The scheduling strategy guarantees the capacity to receive data from every channel, that is, no buffer queue may overflow. The evaluation system monitors each buffer queue's full flag and its queue write signal: the "assertion holds" as long as the two signals are not active at the same time. When the "assertion" fails, the evaluation system increments an error count and feeds it back to the processor or master control system. When the processor or master control unit finds that the count is nonzero, it throws an exception in the software "assertion" manner and enters the exception handling flow described above.
The terms "assertion", "trap", and "callback function" used in the present invention are all well-known names and techniques in the art and are not described in detail here.
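The buffer-queue check described for Fig. 4 can be sketched in Python as follows. This is only a software model of what the patent describes as a built-in hardware evaluation system; `QueueMonitor`, `OverflowDetected`, and `check_monitor` are hypothetical names, and one call to `sample` stands for one observation of the full flag and the write signal.

```python
class OverflowDetected(Exception):
    """Hypothetical exception thrown by software when the error count is nonzero."""


class QueueMonitor:
    """Model of the evaluation logic: 'full' and 'write' must never be active together."""

    def __init__(self):
        self.error_count = 0

    def sample(self, full, write):
        """Evaluate the assertion for one observation; count a failure if it breaks."""
        holds = not (full and write)
        if not holds:
            self.error_count += 1  # record the error state for later read-out
        return holds


def check_monitor(monitor):
    """Software side: read the count back and throw if any assertion failed."""
    if monitor.error_count != 0:
        raise OverflowDetected(f"{monitor.error_count} queue overflow event(s)")
```

The split mirrors the text: the monitor runs at signal rate and only counts, while the much slower software check reads the count and enters the exception flow.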
Active, assertion-based fault detection is a detection method implemented for hardware device drivers: the logic it examines is the interface logic that the driver software operates directly, whereas the control flows and control logic that run autonomously inside the device are invisible to the device driver. For this reason, the present invention divides the control flow into multiple data processing stages according to the data flow and the control mode. The resulting stages synchronize with one another when exchanging information, by means such as control-state handshakes and communication data checks. The front end of the control logic, that is, the control logic that interacts with the processor, synchronizes with the processor through the processor's wait control signal, interrupt signal, and error indication signal. When a link in the control chain of the data flow becomes abnormal, the two groups of control logic involved cannot synchronize and both fail; this eventually disrupts synchronization along the entire data flow chain, until the synchronization logic between the front-end control logic and the processor fails, causing a processor wait-timeout exception or raising a device fault interrupt. A processor without a bus wait-state interface is notified by an interrupt or an error flag, and fault recovery is carried out by the exception handling of the management software. As shown in Fig. 1, the information control flow of the spaceborne computer includes functions such as data communication processing, interface access operations, signal processing, and command driving. The whole flow is divided according to functional type or clock domain into processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic. The function of each part is described as follows:
Processor decode response logic: receives port access operations from the processor and maintains and responds to the processor bus cycle, including the exchange of the processor wait signal (RDY). The external wait signal (RDY) indicates to the processor that data on the bus is ready; while the wait signal is invalid, the processor waits for the peripheral to place data on the bus, until the data is ready or the bus cycle times out.
Master device communication control logic: receives operation commands relayed from the processor function and starts the external communication bus to complete the communication exchange with the slave device.
Slave device communication control logic: maintains communication with the main processor and passes the data or commands sent over the bus to the storage control logic of the local module.
Storage control logic: receives data or commands transferred over the bus, processes the bus data or decodes the commands, and submits the data or commands to the interface access logic.
Interface access logic: the functional unit that executes processor commands or performs data communication; it completes the communication exchange with the external function interface and executes or carries out commands from the processor.
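The RDY behaviour described for the processor decode response logic can be illustrated with a small Python sketch: the processor samples RDY once per bus clock and gives up when the bus cycle times out. `wait_for_rdy` and `BusAccessTimeout` are hypothetical names, and the per-clock list of RDY samples is an assumption made for the sake of a self-contained example.

```python
class BusAccessTimeout(Exception):
    """Hypothetical exception standing in for the bus access exception flow."""


def wait_for_rdy(rdy_samples, timeout_cycles):
    """Return the cycle at which RDY became valid, or raise on timeout.

    rdy_samples: iterable of booleans, one RDY observation per bus clock.
    """
    for cycle, rdy in enumerate(rdy_samples):
        if cycle >= timeout_cycles:
            break                  # bus cycle timed out
        if rdy:
            return cycle           # data on the bus is ready
    raise BusAccessTimeout(f"RDY not valid within {timeout_cycles} cycles")
```

A deadlocked chain of functional units shows up here simply as an RDY signal that never becomes valid, which is how the fault reaches the processor.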
The interface access operation of the spaceborne computer passes through the five functional units above. A state machine as shown in Fig. 2 is established for each functional unit, and its state transition conditions are listed in Table 1. The idle state and the confirm state in Fig. 2 are the two synchronization points in this embodiment. In the idle state, the functional unit checks the control flow start flag of the upper-level logic, which includes a judgment of data validity, and checks whether the lower-level logic unit is in an operable state; it moves to the working state only when both conditions are met at once. In the confirm state, the functional unit checks whether the start flag of the upper-level functional unit has ended and confirms that the lower-level flow has entered the working state; it returns to the idle state only when both conditions are met. For the processor response logic, a processor bus access is the flow start flag of the unit, and the unit answers the processor bus wait signal (RDY) when it is in the idle state. When a functional unit is stuck in the idle state because of a logic failure or a data check error, its upper-level functional unit deadlocks in the confirm state; when a functional unit is stuck in the busy state because of a fault, its upper-level logic deadlocks in the idle state. The deadlock propagates in this way until the processor response logic itself deadlocks, leaving the processor bus wait signal invalid until the processor bus access times out and the processor enters the bus access exception flow. For a processor that has no bus wait signal, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
No. | Source state | Target state | Transition condition |
---|---|---|---|
1 | Idle state | Working state | The start flow flag is valid and the lower-level logic is in the idle state. |
2 | Working state | Confirm state | The work flow has finished. |
3 | Confirm state | Idle state | The upper-level flow flag is invalid and the lower-level logic has left the idle state. |
Table 1
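The transitions in Table 1 can be written down as a small Python function. This is an illustrative sketch only: `State`, `next_state`, and the argument names are hypothetical, and row 3 is modelled as "lower-level logic is not idle", following the table's wording (the prose describes the same condition as the lower-level logic having entered the working state).

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WORKING = "working"
    CONFIRM = "confirm"


def next_state(state, start_valid, work_done, lower_state):
    """Return the next state of one functional unit per Table 1.

    start_valid: start flag of the upper-level logic
    work_done:   this unit's own work flow has finished
    lower_state: current state of the lower-level unit
    """
    if state is State.IDLE:
        if start_valid and lower_state is State.IDLE:
            return State.WORKING               # row 1
    elif state is State.WORKING:
        if work_done:
            return State.CONFIRM               # row 2
    elif state is State.CONFIRM:
        if not start_valid and lower_state is not State.IDLE:
            return State.IDLE                  # row 3
    return state                               # otherwise no change
```

Because every transition out of the idle and confirm states depends on the neighbouring units, a unit that stops responding holds its upper-level neighbour in place, which is the deadlock propagation the description relies on.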
A transaction operation of the spaceborne computer comprises several of the control flows above, or is completed through multiple interactions over one control flow. To diagnose and detect faults at the transaction level, the present invention uses processor software interrupts to perform assertion-based detection. The detection flow is shown in Fig. 3. In the normal flow, the processor controls access to a function port through a sequence of commands. The present invention adds a detection step to this command sequence: after any operation completes, the hardware state is judged. The hardware device provides operational feedback telemetry for the command, and the management software checks it against a threshold; when a measured parameter is outside the threshold range, the exception flow is started through a callback function or a trap. Assertions are written in the following two forms:
Assert(Expression,function); (1)
Assert(Expression,kind); (2)
Both forms take two parameters: the first is a conditional expression and the second specifies the corresponding exception handling method. In form (1) the exception handling parameter is a callback function: when the conditional expression is not satisfied, the management software calls the corresponding exception handler (function). In form (2) the exception handling parameter is a trap number: when the conditional expression is not satisfied, a trap with the corresponding vector number is raised.
The pseudocode for the two forms of assertion is as follows:
The assertion detection of form (1) can be used for non-processor faults in the spaceborne computer. Handling a processor exception requires operating on the processor's system registers and therefore needs system register access privileges, so in that case the assertion detection mechanism of form (2) is used.
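The two assertion forms can be illustrated with the following Python sketch. It is not the patent's own pseudocode, which is not reproduced in this text; `assert_callback`, `assert_trap`, `Trap`, and `run` are hypothetical names, and a Python exception dispatched through a table stands in for a processor trap and its vector table.

```python
class Trap(Exception):
    """Hypothetical model of a processor trap carrying a vector number."""

    def __init__(self, number):
        super().__init__(f"trap {number}")
        self.number = number


def assert_callback(expression, function):
    """Form (1): call the exception handler when the condition is not satisfied."""
    if not expression:
        return function()
    return None


def assert_trap(expression, kind):
    """Form (2): raise the trap with vector number `kind` when the condition fails."""
    if not expression:
        raise Trap(kind)


def run(operation, trap_table):
    """Dispatch any trap raised by `operation` to its handler in `trap_table`."""
    try:
        return operation()
    except Trap as trap:
        return trap_table[trap.number]()
```

In this model form (1) handles the fault in the caller's own context, while form (2) transfers control to a handler selected by vector number, which corresponds to the privileged path the text reserves for processor exceptions.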
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.
Claims (4)
1. A fault diagnosis method for a spaceborne computer, characterized in that fault diagnosis of the spaceborne computer is completed by software and hardware working cooperatively, the method comprising: assertion-based fault detection, and hardware fault event driving;
The assertion-based fault detection comprises:
The hardware system of the spaceborne computer provides operation interfaces for hardware driver parameters, which software reads back and judges; software checks the input parameters and returned status of function interfaces; the working range of each input parameter is asserted, and when an input parameter is judged to exceed its threshold, an exception is thrown by means of a trap or a callback function, and fault diagnosis and recovery are completed in the exception handling flow of the processor;
The hardware fault event driving comprises:
A synchronous state feedback scheme for control flow and data flow is used; the processor's current operation is interrupted by one of three triggers, namely the bus-access wait signal, an error signal, or an interrupt signal; and the fault is identified and recovered according to the event source and the feedback information;
Using the synchronous state feedback scheme for control flow and data flow comprises:
The data flow of the spaceborne computer is divided by functional domain or clock domain to obtain a number of functional units; a state machine is established for each functional unit, the state machine having three states, namely an idle state, a working state, and a confirm state; a control flow synchronization handshake is performed at each division point, the handshake comprising confirmation of the state machine state and a data communication check between the two functional units; when an error in a functional unit causes synchronization between two functional units to fail or the communication check to be incorrect, the synchronization handshake between the two control flows fails and the data flow cannot be linked, until the processor bus access times out and the bus access exception flow is entered.
2. The fault diagnosis method for a spaceborne computer according to claim 1, characterized in that the functional units obtained by the division comprise: processor decode response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
3. The fault diagnosis method for a spaceborne computer according to claim 1, characterized in that the state machine has three states, namely an idle state, a working state, and a confirm state, and the transition conditions between the three states comprise: in the idle state, the current logic checks whether the start flag of the upper-level logic is valid and whether the lower-level logic is in the idle state, and if both checks succeed it enters the working state, otherwise it does not change state; in the working state, the current logic checks whether its own work flow has finished, and if so it enters the confirm state, otherwise it does not change state; in the confirm state, the current logic checks whether the start flag of the upper-level logic is invalid and whether the lower-level logic is in the working state, and if both checks succeed it enters the idle state, otherwise it does not change state.
4. The fault diagnosis method for a spaceborne computer according to claim 3, characterized in that, for a processor that has no bus wait signal, a timeout counter is added in the idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410301310.7A CN104050051B (en) | 2014-06-27 | 2014-06-27 | A kind of method for diagnosing faults of spaceborne computer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410301310.7A CN104050051B (en) | 2014-06-27 | 2014-06-27 | A kind of method for diagnosing faults of spaceborne computer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104050051A (en) | 2014-09-17 |
CN104050051B (en) | 2016-10-26 |
Family
ID=51502944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410301310.7A Active CN104050051B (en) | 2014-06-27 | 2014-06-27 | A kind of method for diagnosing faults of spaceborne computer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104050051B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106815114A (en) * | 2017-01-12 | 2017-06-09 | 西安科技大学 | A kind of computer system fault handling method based on software-hardware synergism |
CN108733539B (en) * | 2018-05-24 | 2021-08-10 | 郑州云海信息技术有限公司 | Method, device and system for stopping OSD service and readable storage medium |
CN110673975B (en) * | 2019-08-23 | 2023-06-02 | 上海航天控制技术研究所 | Secure kernel structure of spaceborne computer software and secure operation method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101493809A (en) * | 2009-03-03 | 2009-07-29 | 哈尔滨工业大学 | Multi-core onboard spacecraft computer based on FPGA |
CN103116535A (en) * | 2011-11-17 | 2013-05-22 | 上海航天测控通信研究所 | Satellite-bone dual-redundant computer mainframe working condition monitoring and fault autonomous switching device |
Non-Patent Citations (1)
Title |
---|
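Research on spaceborne high-speed data processing technology (星载高速数据处理技术研究); Ma Yin (马寅); China Master's Theses Full-text Database, Engineering Science and Technology II; 2012-04-15 (No. 4); Chapters 2-3 * |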
Also Published As
Publication number | Publication date |
---|---|
CN104050051A (en) | 2014-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |