CN108664357B - Embedded equipment system repairing method and system based on starting information statistics - Google Patents

Embedded equipment system repairing method and system based on starting information statistics

Info

Publication number
CN108664357B
Authority
CN
China
Prior art keywords
application program
repairing
starting
started
normally
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810457133.XA
Other languages
Chinese (zh)
Other versions
CN108664357A (en)
Inventor
陈玉峰
应站煌
郑晓庆
汪强
方正
王龙洋
张锋
刘博�
李永亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuji Group Co Ltd
XJ Electric Co Ltd
Xuchang XJ Software Technology Co Ltd
Original Assignee
Xuji Group Co Ltd
XJ Electric Co Ltd
Xuchang XJ Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuji Group Co Ltd, XJ Electric Co Ltd, Xuchang XJ Software Technology Co Ltd
Priority to CN201810457133.XA
Publication of CN108664357A
Application granted
Publication of CN108664357B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1438Restarting or rejuvenating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • G06F11/0754Error or fault detection not based on redundancy by exceeding limits
    • G06F11/076Error or fault detection not based on redundancy by exceeding limits by exceeding a count or rate limit, e.g. word- or bit count limit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The invention relates to a method and a system for repairing an embedded device system based on startup information statistics. When the system starts, it calculates the error value between a system start counter and an application start counter. If the error value is less than or equal to a threshold, the operating system runs normally and the system start counter and the application start counter count in turn; if the error value is greater than the threshold, the system enters a repair state, and after repair both counters are reset and the system is restarted. The method covers both the system and the application program and repairs the system according to the relationship between their start counts, thereby achieving reliable system repair. In addition, the threshold gives the repair method a degree of fault tolerance: when the system restarts because of a sudden power failure, a manual restart, or similar factors, no repair is needed, so a threshold is set to prevent the repair condition from being misjudged.

Description

Embedded equipment system repairing method and system based on starting information statistics
Technical Field
The invention relates to a method and a system for repairing an embedded device system based on starting information statistics.
Background
A memory management unit (MMU) allows a single application thread to run in a hardware-protected address space, but many low-end embedded devices do not use an MMU even when the hardware includes one. When all threads of an application share the same memory space, any thread can, intentionally or unintentionally, corrupt the code, data, or stack of another thread. A faulty thread may even corrupt kernel code or internal data structures; for example, a pointer error in one thread can easily crash the entire system or at least cause a system reboot.
Existing methods for repairing an embedded system repair and upgrade the operating system files, either automatically or manually. For example, Chinese patent application publication CN103970564A discloses a method for automatically repairing and upgrading an embedded operating system, in which a repair software package for the embedded operating system is preset on an upgrade server. The method further comprises: judging whether the embedded operating system has failed, downloading the repair software package from the upgrade server, loading the content of the package into the embedded system, and automatically upgrading the embedded operating system once it has been successfully repaired. With this method, the embedded system completes repair and upgrade automatically without manual triggering and without occupying the storage resources of the embedded device.
Chinese patent application publication CN102722394A discloses a method for starting and upgrading an embedded device, in which the embedded device includes a first communication serial port, a second communication serial port, a microprocessor, and a storage module. The storage module stores a bootloader module, an operating system, and the kernel of the operating system, and the bootloader module holds one group of environment variables for normal startup and one group for upgrading. The startup and upgrade method covers the startup requirements under various conditions, so normal startup and use of the embedded device can be ensured. It also includes steps for a forced upgrade at the user side, so that when system files are damaged the user can perform the upgrade and repair without returning the device to the factory or a service outlet, which is convenient for the user.
Both of the above patent documents address startup and upgrade of the embedded system's hardware and of the system files themselves. In many embedded systems, however, the hardware and system files are intact and the fault lies in the application logic: execution of the subsequent logic is interrupted and the device enters a cycle of repeated restarts. Because the startup time is very short (generally on the order of milliseconds), manual intervention is difficult. Repeated restarts also reduce the stability of the device and may even damage it.
At present, operating systems without memory protection are still found even in complex embedded systems with high reliability and safety requirements, and application errors can cause subtle faults that are hard to trace. Prior-art repair methods for embedded operating systems mainly repair and upgrade operating system files after a system fault occurs, and provide no effective intervention for logic problems in the application program.
Disclosure of Invention
The invention aims to provide a method for repairing an embedded device system based on startup information statistics, which addresses the problem that existing repair methods for embedded operating systems mainly repair and upgrade operating system files after a system failure and offer no effective intervention for logic problems in the application program. The invention also provides a system for repairing an embedded device system based on startup information statistics.
In order to achieve the above object, the present invention includes the following technical solutions.
A method for repairing an embedded device system based on starting information statistics comprises the following steps:
(1) when the system is started, calculating the error value of the count value of the system start and the count value of the application program start;
(2) when the error value is smaller than or equal to a set threshold value, the system normally operates; and when the error value is larger than the set threshold value, entering a system repair state, and after repair, resetting the system start count and the application program start count and restarting the system.
When the system starts, the error value between the system start counter and the application start counter is calculated. If the error value is less than or equal to the threshold, the system runs normally and the two counters count in turn; if the error value is greater than the threshold, the system enters a repair state, and after repair both counters are reset and the system is restarted. The repair method takes the application program into account: it covers both the system and the application, and repairs the system according to the relationship between their start counts, so that the system is repaired reliably. For the large number of low-end embedded devices without MMU protection, the method judges at system startup (generally on the order of milliseconds) whether the application program can work normally, based on the relationship between the system and application start counts, and then repairs the system. This avoids repeated restarts caused by application problems, as well as the device instability or damage that such restarts can cause. In addition, the threshold gives the repair method a degree of fault tolerance. For example, when the system restarts because of a sudden power failure, a manual restart, or similar factors, the system start counter counts but the application start counter does not, which produces a difference. The system does not need repair in this case, so a threshold is set to prevent a misjudged repair.
In addition, the method supports normal startup and forced upgrade, and requires no additional hardware in the product.
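The startup check described above can be sketched as a small decision function. This is a minimal illustration only: the names `BootAction` and `decide_boot_action` are hypothetical, and the error value is assumed to be a plain difference, as the embodiment later takes it to be.

```python
from enum import Enum


class BootAction(Enum):
    """Hypothetical outcome of the startup check."""
    RUN_NORMALLY = "run_normally"
    ENTER_REPAIR = "enter_repair"


def decide_boot_action(system_starts: int, app_starts: int, threshold: int) -> BootAction:
    """Compare the two historical start counts as read at system start."""
    # Assumption: the error value is the difference between the two counts.
    error_value = system_starts - app_starts
    if error_value <= threshold:
        return BootAction.RUN_NORMALLY
    return BootAction.ENTER_REPAIR
```

With a threshold of 3, counts of 3 and 0 still allow a normal run, while counts of 4 and 0 lead to the repair state.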
Further, when the system is operating normally, the system start and application start continue to count normally.
Furthermore, the system start counter counts the number of times the operating system has started normally: it is incremented once after each normal system start. The application start counter counts the number of times the application program has run stably: after each normal start, once the application has run for a set time and can work normally, it is incremented once.
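The counting rules for the two counters might look like the following sketch. The class `StartCounters`, its method names, and the `stable_after_s` parameter are assumptions made for illustration; the source only says that the application is counted after running normally for "a set time". A real device would also need to keep both values in storage that survives a restart, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class StartCounters:
    """Hypothetical in-memory model of the two start counters."""
    system_starts: int = 0
    app_starts: int = 0
    app_counted_this_boot: bool = False

    def on_system_started(self) -> None:
        # Counted once after each normal operating system start.
        self.system_starts += 1
        self.app_counted_this_boot = False

    def on_app_uptime(self, uptime_s: float, stable_after_s: float) -> None:
        # Counted once per boot, only after the application has run
        # normally for the set time (stable_after_s is an assumed name).
        if not self.app_counted_this_boot and uptime_s >= stable_after_s:
            self.app_starts += 1
            self.app_counted_this_boot = True

    def reset(self) -> None:
        # Both counters are cleared after a repair.
        self.system_starts = 0
        self.app_starts = 0
        self.app_counted_this_boot = False
```

If the application crashes before `stable_after_s` has elapsed, only `system_starts` has advanced for that boot, which is what makes the difference between the two counts grow.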
Further, the repair is performed by manual intervention. When a problem occurs in a certain step of the business logic of the program, an entrance for manual repair is provided, manual intervention can be performed, and the system can be repaired conveniently.
An embedded device system repair system based on startup information statistics, comprising a repair control module including a memory, a processor, and a computer program stored in the memory and executable in the processor, the processor implementing steps when executing the computer program comprising:
(1) when the system is started, calculating the error value of the count value of the system start and the count value of the application program start;
(2) when the error value is smaller than or equal to a set threshold value, the system normally operates; and when the error value is larger than the set threshold value, entering a system repair state, and after repair, resetting the system start count and the application program start count and restarting the system.
Further, when the system is operating normally, the system start and application start continue to count normally.
Furthermore, the system start counter counts the number of times the operating system has started normally: it is incremented once after each normal system start. The application start counter counts the number of times the application program has run stably: after each normal start, once the application has run for a set time and can work normally, it is incremented once.
Further, the repair is performed by manual intervention.
Drawings
FIG. 1 is a flow chart of a method for embedded device system repair based on startup information statistics;
FIG. 2 is a flow chart of an example method for embedded device system repair.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The embodiment provides a method for repairing an embedded device system based on starting information statistics. Fig. 1 is a flowchart of embedded device system repair based on startup information statistics.
As shown in fig. 1, whether a system repair state needs to be entered for manual intervention is determined, and the specific implementation manner is as follows:
(1) Abnormal operation of the application program can cause the system to restart continuously. Each time the system starts, the error value between the system start count and the application start count is calculated.
Here, the error value is taken to be the difference. For convenience of counting, two counters are provided: a system start counter, which counts system starts, and an application start counter, which counts application starts. At the system startup stage, the values of the two counters are therefore the historical count information of the corresponding counters.
(2) The obtained difference is compared with a set threshold:
When the difference is less than or equal to the set threshold, i.e., the system start count and the application start count do not differ much, the operating system runs normally and the system start counter and the application start counter count in turn, i.e., normal counting continues. The system start counter counts the number of times the operating system has started normally and is incremented once after each normal system start. The application start counter counts the number of times the application program has run stably: after each normal start, once the application has run for a set time and can work normally, it is incremented once.
When the difference is greater than the set threshold, it can be determined that the system is restarting continuously because of an application exception; the system then enters the system repair state and is repaired. After the manual-intervention repair, the system start counter and the application start counter are reset and the system is restarted.
The threshold can be set according to actual conditions, and it gives the repair method a degree of fault tolerance. When the system restarts because of a power failure, a manual restart, or similar factors, the system start counter counts but the application start counter does not, so a difference arises between the two counts. The system does not need repair in this case, and the threshold is set to prevent such a misjudged repair.
Therefore, the system repairing method judges whether the system repairing state needs to be entered for manual intervention or not according to the historical starting counter information of the system in the system starting stage.
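One way to picture the startup-stage routine, including the reset after repair, is the sketch below. The function `boot_stage`, the dictionary keys, and the `manual_repair` callback are hypothetical stand-ins; in the source, the repair itself is performed by manual intervention, which a callback can only approximate.

```python
from typing import Callable, Dict


def boot_stage(counters: Dict[str, int], threshold: int,
               manual_repair: Callable[[], None]) -> str:
    """Run the startup check on historical counts; return 'run' or 'restart'."""
    difference = counters["system"] - counters["app"]
    if difference > threshold:
        # System repair state: wait for the (manual) repair to finish,
        # then reset both counters and request a restart.
        manual_repair()
        counters["system"] = 0
        counters["app"] = 0
        return "restart"
    # Normal path: the operating system start is counted.
    counters["system"] += 1
    return "run"
```

Calling it with `{"system": 4, "app": 0}` and a threshold of 3 invokes the repair callback, clears both counts, and returns `"restart"`; the following call takes the normal path.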
The method is described below using an example in which the initial values of the system start counter and the application start counter are 0, the set threshold is 3, and the application program runs abnormally. Fig. 2 is a flowchart of this example.
S1. When the system starts, calculate the difference: difference = count value of the system start counter - count value of the application start counter.
S2. When the difference is less than or equal to the threshold 3, the operating system starts normally and the system start counter is incremented by 1.
S3. While the application program is starting and running, if an application exception causes a system restart, go to S1.
S4. If the application program starts normally, runs stably, and works normally, the application start counter is incremented by 1.
S5. When the difference is greater than the threshold 3, it is determined that the system is restarting continuously because of an application exception, and the system enters the system repair state. After the application program is repaired by manual intervention, the system start counter and the application start counter are reset, the system is restarted, and the flow returns to S1.
Abnormal operation of the application program causes the system to restart continuously. Once the system has started 4 times, the system start counter has counted 4 and the application start counter has not counted, so the difference is 4, which is greater than the threshold 3. It is then determined that the system is restarting continuously because of abnormal operation of the application program, and manual repair is carried out, which avoids device instability or hardware damage caused by continuous restarts of the operating system. Setting the threshold to 3 also avoids misjudgment due to differences caused by system restarts from sudden power failure, manual restart, and similar factors, which gives the method a degree of fault tolerance.
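Steps S1 to S5 can be traced with a short simulation. This is a sketch under stated assumptions: the function name `first_repair_boot` and the list-of-booleans input are invented for illustration, and each list entry says whether the application would reach stable operation on that boot.

```python
from typing import List, Optional


def first_repair_boot(app_stable_per_boot: List[bool], threshold: int = 3) -> Optional[int]:
    """Return the 1-based boot at which the repair state is entered, or None."""
    system_starts = 0
    app_starts = 0
    for boot_number, app_stable in enumerate(app_stable_per_boot, start=1):
        # S1 / S5: check the historical difference before anything is counted.
        if system_starts - app_starts > threshold:
            return boot_number
        # S2: the operating system start is counted.
        system_starts += 1
        # S4: the application is counted only if it runs stably;
        # otherwise (S3) the device restarts and the loop continues.
        if app_stable:
            app_starts += 1
    return None
```

With an application that never stabilizes, four starts are counted and the difference of 4 is detected at the startup check of the fifth boot, so `first_repair_boot([False] * 5)` returns 5. A single interrupted boot among otherwise healthy ones, such as `[True, False, True, True]`, leaves a difference of 1 and never triggers repair.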
The specific embodiments are given above, but the present invention is not limited to the described embodiments. The basic idea of the present invention lies in the above basic scheme, and it is obvious to those skilled in the art that no creative effort is needed to design various modified models, formulas and parameters according to the teaching of the present invention. Variations, modifications, substitutions and alterations may be made to the embodiments without departing from the principles and spirit of the invention, and still fall within the scope of the invention.
The system repair method can be used as a computer program, stored in a memory in a repair control module in the embedded device system repair system based on the starting information statistics and can run on a processor in the repair control module.

Claims (6)

1. A method for repairing an embedded device system based on starting information statistics is characterized by comprising the following steps:
(1) when the system is started, calculating the error value of the count value of the system start and the count value of the application program start; the system starting counter counts the times of normal starting of the operating system, and the system is started and counted once after the system is normally started each time; the application program starting counter counts the times of stable running of the application program, the application program can normally work after the application program is normally started for a set time every time, and the application program is started and counted once;
(2) when the error value is smaller than or equal to a set threshold value, the system normally operates; when the error value is larger than the set threshold value, entering a system repair state, resetting a system start count and an application program start count after repair, and restarting the system; the set threshold has fault-tolerant capability, and misjudgment caused by a difference value caused by system restart due to sudden power failure or artificial restart factors is avoided.
2. The embedded device system repairing method based on the startup information statistics as recited in claim 1, wherein when the system is operating normally, the system startup and the application startup continue to count normally.
3. The embedded device system repairing method based on startup information statistics as recited in claim 1 or 2, wherein the repairing manner is manual intervention repairing.
4. An embedded device system repair system based on startup information statistics, comprising a repair control module including a memory, a processor, and a computer program stored in the memory and executable in the processor, wherein the processor, when executing the computer program, performs steps comprising:
(1) when the system is started, calculating the error value of the count value of the system start and the count value of the application program start; the system starting counter counts the times of normal starting of the operating system, and the system is started and counted once after the system is normally started each time; the application program starting counter counts the times of stable running of the application program, the application program can normally work after the application program is normally started for a set time every time, and the application program is started and counted once;
(2) when the error value is smaller than or equal to a set threshold value, the system normally operates; when the error value is larger than the set threshold value, entering a system repair state, resetting a system start count and an application program start count after repair, and restarting the system; the set threshold has fault-tolerant capability, and misjudgment caused by a difference value caused by system restart due to sudden power failure or artificial restart factors is avoided.
5. The embedded device system repair system based on startup information statistics of claim 4, wherein when the system is running normally, system startup and application startup continue normal counting.
6. The embedded device system repairing system based on startup information statistics as recited in claim 4 or 5, wherein the repairing manner is manual intervention repairing.
CN201810457133.XA 2018-05-14 2018-05-14 Embedded equipment system repairing method and system based on starting information statistics Active CN108664357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810457133.XA CN108664357B (en) 2018-05-14 2018-05-14 Embedded equipment system repairing method and system based on starting information statistics

Publications (2)

Publication Number Publication Date
CN108664357A CN108664357A (en) 2018-10-16
CN108664357B true CN108664357B (en) 2021-07-13

Family

ID=63779463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810457133.XA Active CN108664357B (en) 2018-05-14 2018-05-14 Embedded equipment system repairing method and system based on starting information statistics

Country Status (1)

Country Link
CN (1) CN108664357B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092724A (en) * 2012-12-19 2013-05-08 宁波三星电气股份有限公司 System self-recovery method for embedded electric power terminal
CN103970564A (en) * 2014-04-23 2014-08-06 京信通信***(中国)有限公司 Automatic repairing and upgrading method of embedded operating system and embedded operating system with automatic repairing and upgrading functions
CN105511976A (en) * 2015-12-01 2016-04-20 长城信息产业股份有限公司 Embedded system application program self-recovery operation method and device
CN107526646A (en) * 2016-06-20 2017-12-29 中兴通讯股份有限公司 Monitoring method, device and watchdog system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100820789B1 (en) * 2001-04-06 2008-04-10 엘지전자 주식회사 System based on real time and its monitoring method

Also Published As

Publication number Publication date
CN108664357A (en) 2018-10-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant