CN109597383B - Lightweight small system reliability structural design


Info

Publication number
CN109597383B
Authority
CN
China
Legal status
Active
Application number
CN201811499196.8A
Other languages
Chinese (zh)
Other versions
CN109597383A
Inventor
顾满洲 (Gu Manzhou)
Current Assignee
Guangzhou quankong Technology Co.,Ltd.
Original Assignee
顾满洲 (Gu Manzhou)
Priority date
2018-12-08
Filing date
2018-12-08
Publication date
2021-10-08
2018-12-08 Application filed by 顾满洲 (Gu Manzhou)
2018-12-08 Priority to CN201811499196.8A
2019-04-09 Publication of CN109597383A
Application granted
2021-10-08 Publication of CN109597383B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/4185 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by the network communication
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/31 From computer integrated manufacturing till monitoring
    • G05B2219/31088 Network communication between supervisor and cell, machine group
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a reliability-oriented structural design for lightweight small systems that enables stable operation and fast recovery of an embedded system. Through its system partitioning, task functions and task structure (that is, the coordination among all tasks), the design protects the embedded system over long periods of operation, safeguards it in real time and substantially improves its robustness.

Description

Lightweight small system reliability structural design
Technical Field
The invention belongs to the field of communications and relates in particular to a high-reliability structural design for lightweight small systems.
Background
The reliability of an embedded system is an important metric in industrial control, because a system crash can have serious consequences. The common prior-art remedy is an external hardware watchdog, which can only deal with a crash by resetting the system after it has happened. This approach has several drawbacks. First, the watchdog relies on an external counter: if the system fails to re-initialize (feed) the watchdog before the count expires, the watchdog fires, and this takes roughly 1 to 3 seconds, a delay that is unacceptable in many applications. Second, even when the system does recover, it returns entirely to its initial state, which is equivalent to powering it up again and restarting its work from scratch; this loss of state is not acceptable in many cases. Schemes based on internal and external watchdogs therefore have clear shortcomings in high-reliability applications. The technical problem addressed by the invention is how to keep the system running stably without relying on a reset.
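The watchdog behaviour described above can be illustrated with a minimal, tick-based Python simulation. It is only a sketch: the tick granularity, `timeout_ticks=2` and the names `ExternalWatchdog` and `simulate` are assumptions for illustration, not part of the patent. It shows the two drawbacks named in the text: the watchdog reacts only after a full timeout without feeding, and the reset discards all runtime progress.

```python
# Hypothetical model for illustration only; not taken from the patent.
class ExternalWatchdog:
    """Tick-based model of an external watchdog counter."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def feed(self) -> None:
        # Software re-initializes the counter ("feeds the dog").
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        # Hardware counts down; True means the counter expired and it fires.
        self.remaining -= 1
        return self.remaining <= 0


def simulate(total_ticks: int, hang_tick: int, timeout_ticks: int):
    """Return (progress, resets) for a device that hangs at hang_tick."""
    wd = ExternalWatchdog(timeout_ticks)
    progress, resets, hung = 0, 0, False
    for t in range(total_ticks):
        if wd.tick():
            # A reset is equivalent to a fresh power-up: runtime state is lost.
            progress, hung = 0, False
            wd.feed()
            resets += 1
        if t == hang_tick:
            hung = True          # the software hangs and stops feeding
        if not hung:
            progress += 1        # useful work done this tick
            wd.feed()
    return progress, resets


progress, resets = simulate(total_ticks=8, hang_tick=5, timeout_ticks=2)
```

Here the device has completed 5 units of work when it hangs at tick 5; the watchdog fires two ticks later and the device starts over from zero, ending the run with a progress count of 2.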
In addition, operating systems large and small generally provide tasks and scheduling, and they also exhibit parallelism and concurrency: parallelism means that two or more events occur at the same instant, while concurrency means that two or more events occur within the same time interval. These issues are among the reasons why lightweight small systems fail to run stably and reliably. Human factors, such as poorly compatible code, also frequently prevent small systems from operating normally, causing tasks to hang (appear alive but stop executing). The invention addresses unstable operation and frequent hangs in lightweight small systems through an inter-task notification mechanism and a scheme in which multiple tasks respond to a single task.
Disclosure of Invention
To solve these technical problems, the invention provides a technical scheme that ensures highly reliable and safe operation of an embedded system.
The invention adopts a multitask monitoring system and a multitask stability design method. Under this task creation scheme, all functional tasks are created by the protection task, which coordinates the recording task and the deletion-and-rebuild task; together they form a safe system structure.
After system initialization, the protection task is created first, and it coordinates the recording task and the deletion-and-rebuild task. Once the protection task has started, it creates a plurality of functional tasks. Each functional task then submits its working-parameter form to the recording task through the protection task, and the recording task records the working parameters and system parameters. Each functional task also periodically reports its operating condition to the protection task; if the condition is abnormal, the task recovery flow is executed.
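A minimal sketch of the creation and reporting flow just described, assuming that each working-parameter form is a plain dictionary and that "abnormal" means a functional task has not reported within a fixed number of ticks. `RecordingTask`, `ProtectionTask`, `heartbeat_timeout` and the task names are hypothetical; the patent does not specify data formats or the abnormality criterion.

```python
from dataclasses import dataclass, field


# Hypothetical sketch; names and the heartbeat criterion are assumptions.
@dataclass
class RecordingTask:
    # Latest working-parameter form of each functional task, keyed by task id.
    forms: dict = field(default_factory=dict)

    def record(self, task_id: str, params: dict) -> None:
        self.forms[task_id] = dict(params)   # keep a private copy


@dataclass
class ProtectionTask:
    recorder: RecordingTask
    heartbeat_timeout: int                   # ticks allowed between reports (assumed)
    last_report: dict = field(default_factory=dict)

    def create_functional_task(self, task_id: str, params: dict, now: int) -> None:
        # Functional tasks are created only by the protection task.
        self.recorder.record(task_id, params)
        self.last_report[task_id] = now

    def submit_form(self, task_id: str, params: dict) -> None:
        # Forms reach the recording task only through the protection task.
        self.recorder.record(task_id, params)

    def report(self, task_id: str, now: int) -> None:
        # Periodic status feedback from a functional task.
        self.last_report[task_id] = now

    def abnormal_tasks(self, now: int) -> list:
        # A task that has been silent longer than the timeout is abnormal.
        return sorted(t for t, seen in self.last_report.items()
                      if now - seen > self.heartbeat_timeout)


recorder = RecordingTask()
protection = ProtectionTask(recorder, heartbeat_timeout=3)
protection.create_functional_task("adc", {"rate_hz": 100}, now=0)
protection.create_functional_task("uart", {"baud": 9600}, now=0)
protection.submit_form("adc", {"rate_hz": 200})
protection.report("adc", now=4)          # "uart" never reports again
stale = protection.abnormal_tasks(now=5)
```

In this run `stale` contains only `"uart"`, which would then be handed to the recovery flow.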
The reliable-operation structure of the system comprises a task monitoring system and a task recovery system. The task monitoring system comprises a recording task, a protection task, a deletion-and-rebuild task, a plurality of functional tasks, a communication task, an external hardware watchdog and an emergency communication table. The protection task monitors operation in real time: all working parameters of the functional tasks are passed to the recording task through the protection task, and system monitoring is carried out by the protection task. The task recovery system works as follows: when the system is at risk or behaves abnormally, the protection task is activated, the deletion-and-rebuild task starts working and restores the functional tasks, and all recorded working parameters of those functional tasks are restored at the same time, so that system operation remains consistent. The system further comprises a communication task mechanism, a backup communication mechanism and the emergency communication table: when functional-task communication fails, the backup communication mechanism is activated, and when the backup communication mechanism also fails, communication is carried out through the emergency communication table.
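The three-level communication fallback (functional-task communication, then the backup mechanism, then the emergency communication table) can be sketched as an ordered list of channels tried in turn. The channel functions, `ChannelError`, and the representation of the emergency table as an in-memory list are assumptions made for illustration, not details from the patent.

```python
from typing import Callable, List, Tuple


class ChannelError(Exception):
    """Raised when a communication mechanism cannot deliver a message."""


Channel = Tuple[str, Callable[[bytes], None]]


def send_with_fallback(message: bytes, channels: List[Channel]) -> str:
    """Try each mechanism in order; return the name of the one that worked."""
    for name, send in channels:
        try:
            send(message)
            return name
        except ChannelError:
            continue        # this mechanism is abnormal: fall back to the next
    raise ChannelError("all communication mechanisms failed")


# Hypothetical mechanisms: the first two are failing in this scenario.
emergency_table: List[bytes] = []


def primary(msg: bytes) -> None:
    raise ChannelError("functional-task communication abnormal")


def backup(msg: bytes) -> None:
    raise ChannelError("backup mechanism abnormal")


def emergency(msg: bytes) -> None:
    emergency_table.append(msg)   # assumed: a reserved, OS-independent memory area


used = send_with_fallback(
    b"status", [("primary", primary), ("backup", backup), ("emergency", emergency)]
)
```

Keeping the order in a list makes the escalation explicit and lets a healthy primary channel return immediately without touching the fallbacks.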
Communication comes in two types. The first is task-level communication, which depends on the operating system and also serves to monitor the system. The second is a global communication scheme that is independent of the operating system, so that a system-level fault cannot prevent the functional tasks from being protected; it guarantees communication, supports monitoring, and covers risks outside the system's functional tasks. The global communication scheme takes the form of the emergency communication table. Because it is independent of the system tasks, it is activated only when all protection tasks have failed, and its role is to bring the protection tasks back to normal operation. It is the last line of defense: when both the protection tasks and the global communication scheme have failed, the system is restarted.
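The layered defense described above can be summarized as a small decision function that picks the lowest layer still able to handle a fault. This is an interpretation of the text rather than code from the patent; the `Action` names and the three boolean health inputs are hypothetical.

```python
from enum import Enum


# Hypothetical decision rule; an interpretation of the layered defense.
class Action(Enum):
    NONE = "no action"
    PROTECTION_RECOVERY = "protection task rebuilds the functional task"
    GLOBAL_SCHEME = "emergency table restores the protection task"
    RESTART = "system restart"


def escalate(functional_ok: bool, protection_ok: bool, global_ok: bool) -> Action:
    """Pick the lowest defense layer that can still act."""
    if functional_ok and protection_ok:
        return Action.NONE
    if protection_ok:
        # Task-level (OS-dependent) mechanisms still work.
        return Action.PROTECTION_RECOVERY
    if global_ok:
        # All protection tasks failed; the OS-independent scheme takes over.
        return Action.GLOBAL_SCHEME
    # Both the protection tasks and the global scheme failed.
    return Action.RESTART
```

A restart is reached only when every lower layer has been ruled out, which matches the "final defense line" ordering in the text.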
Technical effects of the invention: it improves the stability of lightweight small systems and solves the problem that conventional devices hang and can be recovered only by an external hardware watchdog reset. The invention can also anticipate and prevent system crashes, improving the robustness of the system.
The invention uses a new software architecture that allows many systems previously unsuitable for high-reliability applications to be used there, broadening their applicability. The technical scheme of the invention can considerably enhance system stability and addresses cases in which an ordinary watchdog reset of the device is not acceptable.
Drawings
FIG. 1 is an overall block diagram of the high-reliability operation design for a lightweight small system according to the present invention.
FIG. 2 is a flow block diagram of the high-reliability operation design for a lightweight small system according to the present invention.
FIG. 3 shows the recovery flow of the lightweight small system of the present invention when a task becomes abnormal.
Detailed Description
The invention is further described below with reference to the drawings and embodiments.
FIG. 1 shows the structure of the high-reliability operation design of the system of the present invention. Functional tasks 1 to N (N is a positive integer greater than 1) are independent of one another. During operation, the system works together with the protection task, which monitors and records the tasks, deletes and rebuilds them, and so on, while data is exchanged through the communication task. If the communication task or a functional task becomes abnormal and the protection task detects that the system is at risk or abnormal, the protection task is activated, the deletion-and-rebuild task starts working and restores the affected functional task or communication task, and all working parameters of the functional tasks held by the recording task are restored, so that system operation remains consistent.
As shown in FIG. 1, the whole system comprises a recording task, a protection task, a deletion-and-rebuild task, functional tasks 1 to N, a communication task, an external hardware watchdog task, and an emergency communication table, which is a communication mechanism independent of the operating system. The emergency communication table is used to check the communication of the communication task and the status of its reply mechanism, and to monitor the communication task. By checking the emergency communication table, the protection task starts task protection whenever a task is abnormal; if the task cannot be recovered by the system tasks, the backup mechanism, the emergency communication table, is used to carry out communication.
An external hardware watchdog is added to the system to safeguard normal operation of the system software. When both the protection task and the global communication scheme have failed, the external hardware watchdog fires and restarts the system.
FIG. 2 shows the software implementation of the high-reliability operation design of the system of the present invention. After power-on, the system starts and configures its key parameters, then creates the protection task. The protection task creates the recording task, the communication task and the deletion-and-rebuild task, and it is also responsible for feeding the external watchdog. Once the communication mechanism is established and system start-up is complete, functional tasks 1 to N can be created. All tasks have an emergency communication function and operate through the protection task, so task protection and task rebuilding can be performed at any time. If any abnormality occurs, a functional task can send a request to the protection task, and the functional task is restored through the deletion-and-rebuild task. The protection task feeds the external hardware watchdog, which in turn monitors the protection task.
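One way to read the FIG. 2 start-up order is as a precondition enforced by the protection task: functional tasks are refused until the recording, communication and deletion-and-rebuild tasks exist. The Python sketch below assumes that reading; the class name `ProtectionTaskBoot` and the task identifiers are hypothetical, and watchdog feeding is omitted.

```python
# Hypothetical sketch of the FIG. 2 creation order; names are illustrative.
class ProtectionTaskBoot:
    """Creation order enforced by the protection task."""

    SUPPORT_TASKS = ("recording", "communication", "delete_rebuild")

    def __init__(self) -> None:
        self.created: list = []

    def create_support(self, name: str) -> None:
        if name not in self.SUPPORT_TASKS:
            raise ValueError(f"unknown support task: {name}")
        self.created.append(name)

    def create_functional(self, index: int) -> None:
        # Functional tasks 1..N may be created only after the supporting
        # tasks (and hence the communication mechanism) exist.
        missing = [s for s in self.SUPPORT_TASKS if s not in self.created]
        if missing:
            raise RuntimeError(f"support tasks not yet created: {missing}")
        self.created.append(f"functional_{index}")


boot = ProtectionTaskBoot()
early_error = None
try:
    boot.create_functional(1)         # too early: rejected
except RuntimeError as exc:
    early_error = str(exc)

for name in ProtectionTaskBoot.SUPPORT_TASKS:
    boot.create_support(name)
for i in range(1, 4):                 # N = 3 in this example
    boot.create_functional(i)
```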
FIG. 3 shows the recovery flow when a task becomes abnormal. After system start-up, the protection task operates normally. When the system becomes abnormal, the protection task starts task protection, and there are two recovery modes. In the first, recovery follows the configured normal steps through the communication task: the data of the abnormal task is recorded and processed, and the task is brought back to normal operation. In the second, recovery goes through emergency communication: important data is copied first, and the abnormal task is then restored through the deletion-and-rebuild task. Once the task is back to normal, the whole system operates normally.
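The two recovery modes of FIG. 3 can be sketched as a single dispatch function. It assumes that the health of the communication task decides which mode applies (the text does not state the selection rule) and that both the recording task and the emergency communication table hold per-task data as dictionaries; the function name `recover` and the action tuples are illustrative only.

```python
# Hypothetical dispatch between the two recovery modes; the selection rule
# (communication task healthy or not) is an assumption.
def recover(task: str, comm_ok: bool, recorded: dict,
            emergency_table: dict, actions: list) -> str:
    if comm_ok:
        # Mode 1: normal steps through the communication task; the abnormal
        # task's data is recorded and the task is brought back into service.
        actions.append(("record", task, dict(recorded[task])))
        actions.append(("resume", task))
        return "normal"
    # Mode 2: emergency communication. Copy the important data first ...
    saved = dict(emergency_table[task])
    # ... then delete and rebuild the abnormal task with that data.
    actions.append(("delete", task))
    actions.append(("rebuild", task, saved))
    return "emergency"


log: list = []
mode = recover("adc", comm_ok=False, recorded={},
               emergency_table={"adc": {"rate_hz": 200}}, actions=log)
```

Copying the data before the delete step matters: once the task is deleted, anything held only in its context would be gone.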
The detailed working process is as follows. After the functional tasks start, each submits its functional-task form, including system working parameters, to the recording task through the protection task. The functional tasks periodically report their working conditions to the protection task, and core functional data is backed up in the emergency communication table. Once the protection task detects that a functional task is abnormal, it submits a request to the deletion-and-rebuild task, and the rebuilding of the functional task is confirmed. After rebuilding, the restored functional task retrieves all of its working parameters from the recording task, so that it continues working from the state it was in before deletion. If the communication task cannot complete its communication work, the backed-up core data can be restored from the emergency communication table, ensuring normal operation of the system.
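The key difference from a watchdog reset is that a rebuilt task resumes from its recorded state. The minimal sketch below assumes a functional task whose entire state is a sample counter and a recording task that stores the latest parameters after every step; `CounterTask`, `Recorder` and `delete_and_rebuild` are hypothetical names.

```python
# Hypothetical sketch: delete-and-rebuild that restores recorded state.
class CounterTask:
    """Functional task whose whole state is a sample counter (assumed)."""

    def __init__(self, params: dict) -> None:
        self.samples = params.get("samples", 0)

    def step(self) -> None:
        self.samples += 1

    def params(self) -> dict:
        return {"samples": self.samples}


class Recorder:
    """Stand-in for the recording task: last known parameters per task."""

    def __init__(self) -> None:
        self.forms: dict = {}

    def record(self, task_id: str, params: dict) -> None:
        self.forms[task_id] = dict(params)

    def fetch(self, task_id: str) -> dict:
        return dict(self.forms.get(task_id, {}))


def delete_and_rebuild(tasks: dict, task_id: str, recorder: Recorder) -> CounterTask:
    del tasks[task_id]                                     # delete the abnormal instance
    tasks[task_id] = CounterTask(recorder.fetch(task_id))  # rebuild with recorded state
    return tasks[task_id]


recorder = Recorder()
tasks = {"sampler": CounterTask({})}
for _ in range(5):
    tasks["sampler"].step()
    recorder.record("sampler", tasks["sampler"].params())  # via the protection task
rebuilt = delete_and_rebuild(tasks, "sampler", recorder)
rebuilt.step()
```

The rebuilt task continues counting from 5 rather than from 0, which is the consistency property the description aims for, in contrast with the state loss of a plain watchdog reset.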
The protection task also manages the communication task: it handles communication before each task is deleted and detects changes in each task's working state. If an abnormality occurs, it can activate the emergency communication table, restore all functional task parameters and re-establish the communication task.
The protection task is itself monitored by the external hardware watchdog. If the watchdog can no longer detect the protection task (that is, the system has hung), the final emergency mechanism is triggered and the system is restarted, ultimately ensuring safe long-term operation.
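The division of labor between the protection task and the external watchdog can be sketched as follows: a hung functional task is rebuilt by the protection task without any restart, while a hung protection task stops feeding the watchdog, which then triggers the restart. The simulation is hypothetical; the tick model, `timeout=3` and the hang ticks are assumptions.

```python
# Hypothetical tick model of the protection task / external watchdog pairing.
def simulate(ticks: int, timeout: int,
             functional_hang_tick: int, protection_hang_tick: int):
    """Return (ticks at which a rebuild happened, ticks at which a restart happened)."""
    remaining = timeout            # external hardware watchdog counter
    rebuilds, restarts = [], []
    protection_alive = True
    for t in range(ticks):
        remaining -= 1             # hardware counts down every tick
        if remaining <= 0:
            restarts.append(t)     # final emergency mechanism: restart
            remaining = timeout
            protection_alive = True    # assume the restart clears the hang
            continue
        if t == protection_hang_tick:
            protection_alive = False
        if not protection_alive:
            continue               # a hung protection task neither rebuilds nor feeds
        if t == functional_hang_tick:
            rebuilds.append(t)     # protection task rebuilds the hung functional task
        remaining = timeout        # protection task feeds the watchdog


    return rebuilds, restarts


rebuilds, restarts = simulate(ticks=10, timeout=3,
                              functional_hang_tick=2, protection_hang_tick=5)
```

With these parameters the functional-task fault at tick 2 is handled by a rebuild, and only the protection-task hang at tick 5 leads to a restart, which happens at tick 7.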
The technical scheme of the invention allows most systems to be used in high-reliability applications, broadening their applicability; it can considerably enhance system stability and addresses cases in which an ordinary watchdog reset of electronic equipment is not acceptable.

Claims (4)

1. A method for reliable operation of a lightweight small system, characterized by comprising a task monitoring system and a task recovery system, wherein the task monitoring system comprises a recording task, a protection task, a deletion-and-rebuild task, a plurality of functional tasks, a communication task, an external hardware watchdog and an emergency communication table; the protection task is responsible for real-time monitoring of operation, and all working parameters of the plurality of functional tasks are passed through the protection task to the recording task for recording; the task recovery system is configured such that, when the system is at risk or abnormal, the protection task is activated, the deletion-and-rebuild task starts working, the functional tasks are restored, and all recorded working parameters of the functional tasks are restored at the same time, thereby ensuring consistency of system operation; and the task creation mode is such that all functional tasks are created by the protection task, which works by coordinating the recording task and the deletion-and-rebuild task.
2. The method for reliable operation of a lightweight small system according to claim 1, further comprising a communication task mechanism, a backup communication mechanism and an emergency communication table, wherein the backup communication mechanism is activated when functional-task communication is abnormal, and the emergency communication table is used for communication when the backup communication mechanism is abnormal.
3. The method for reliable operation of a lightweight small system according to claim 1, wherein the protection process of the functional tasks comprises: first creating a protection task; after the protection task has started, creating a plurality of functional tasks; submitting, through the protection task, a functional-task parameter form to the recording task, which records working parameters and system parameters; periodically feeding back the operating condition to the protection task; and executing the task recovery flow if the operating condition is abnormal.
4. The method for reliable operation of a lightweight small system according to claim 1, wherein the communication is of two types: one is inter-task communication, which depends on the system and has a monitoring effect on the system; the other is a global communication scheme that is independent of the system, so that a system fault cannot prevent protection of the functional tasks, which guarantees communication, enables monitoring, and provides a solution for risks outside the functional tasks of the system; the global communication scheme takes the form of the emergency communication table, is independent of the system tasks, is activated only when all protection tasks have failed, restores the protection tasks to normal operation, and serves as the final line of defense; and the system is restarted when both the protection tasks and the global communication scheme have failed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811499196.8A CN109597383B (en) 2018-12-08 2018-12-08 Lightweight small system reliability structural design

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811499196.8A CN109597383B (en) 2018-12-08 2018-12-08 Lightweight small system reliability structural design

Publications (2)

Publication Number Publication Date
CN109597383A CN109597383A (en) 2019-04-09
CN109597383B true CN109597383B (en) 2021-10-08

Family

ID=65961525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811499196.8A Active CN109597383B (en) 2018-12-08 2018-12-08 Lightweight small system reliability structural design

Country Status (1)

Country Link
CN (1) CN109597383B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113242437A (en) * 2021-04-01 2021-08-10 联通(广东)产业互联网有限公司 RTSP (real time streaming protocol) video plug-in-free playing method, system, device and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1409209A (en) * 2001-09-24 2003-04-09 深圳市中兴通讯股份有限公司上海第二研究所 Realizing method for multiple task real-time operation system
CN1564137A (en) * 2004-04-09 2005-01-12 中兴通讯股份有限公司 Method of parallel regulating multi-task of imbedding system
CN103377078A (en) * 2012-04-11 2013-10-30 广州市地下铁道总公司 Real-time task scheduling method and system for vehicular ATP
CN108647091A (en) * 2018-04-27 2018-10-12 北京空间飞行器总体设计部 A spaceborne computer dynamic reconfiguration method and system based on task self-adaptive allocation
CN108871421A (en) * 2018-04-26 2018-11-23 宁波弘泰水利信息科技有限公司 An automatic hydrological data acquisition and transmission system with real-time multi-task processing

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR100636268B1 (en) * 2004-01-27 2006-10-19 삼성전자주식회사 Apparatus and method for monitoring software module state in systems using embedded multitask Operating System

Also Published As

Publication number Publication date
CN109597383A (en) 2019-04-09

Similar Documents

Publication Publication Date Title
CN102364448B (en) Fault-tolerant method for computer fault management system
Machida et al. Modeling and analysis of software rejuvenation in a server virtualized system
US7117393B2 (en) Failover method in a redundant computer system with storage devices
CN103370694A (en) Restarting data processing systems
WO2008092912A1 (en) System and method of error recovery for backup applications
CN102708028B (en) Trusted redundant fault-tolerant computer system
JP5183542B2 (en) Computer system and setting management method
US10558484B2 (en) Systems and methods for securing virtual machines
CN105550012A (en) Method for custom recovery of malfunctioning virtual machine
CN102455954A (en) Power-failure-preventing upgrading method of Linux system
CN103092724A (en) System self-recovery method for embedded electric power terminal
CN109597383B (en) Lightweight small system reliability structural design
CN111737038A (en) Control method based on small satellite double-machine system cutter
CN101916215A (en) Operation intercept based repentance method of distributed critical task system
CN103297264B (en) Cloud platform failure recovery method and system
CN101145983A (en) A self-diagnosis and self-discovery subsystem and method of network management system
CN100337211C (en) Method for safeguarding the continuous safety operation of computers
CN112650620B (en) Dual-computer cold backup autonomous redundancy method with master-slave relation
CN112631981A (en) Reliable fault-tolerant simulation engine for simulation training
CN105159794A (en) Mirror image implementing system and method
CN106371952A (en) Emergency management system based on physical machine
CN105988885A (en) Compensation rollback-based operation system fault self-recovery method
CN107590647A (en) The servo supervisory systems of ship-handling system
CN104346239A (en) Method and device for recovering anomaly of application program in embedded system
Chen et al. Low overhead incremental checkpointing and rollback recovery scheme on Windows operating system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2021-10-20

Address after: 510663 Room 415, No. 9, Caipin Road, Huangpu District, Guangzhou, Guangdong

Patentee after: Guangzhou quankong Technology Co.,Ltd.

Address before: 510663 F406B, No. 11, Caipin Road, Science City, Guangzhou, Guangdong

Patentee before: Gu Manzhou