CN108108259A - Kernel fault locating method and device

Info

Publication number: CN108108259A
Application number: CN201810026869.1A
Authority: CN
Inventors: 常现超
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-01-11
Filing date: 2018-01-11
Publication date: 2018-06-01

Abstract

The present invention provides a kernel fault locating method and device. The server's system and hardware are monitored for faults. When the system or the hardware on the server fails, the memory information of the failed system is collected through the BMC and the memory data is analyzed, so that the cause of the fault can be analyzed quickly, the fault located, and a way to resolve it found. The invention helps ensure that services on the server recover quickly, reducing losses.

Description

Kernel fault locating method and device

Technical field

The present invention relates to the technical field of servers, and in particular to a kernel fault locating method and device.

Background

As customers' business demands keep growing, server performance must keep increasing and server hardware configurations keep being upgraded; for example, a server may have more than a thousand CPU cores and more than a terabyte of memory. More hardware also raises the failure rate. Operating systems grow more complex, and as hardware is added the number of drivers grows accordingly, introducing more and more bugs. When a server fails, the cause must be analyzed quickly and a solution found, which requires saving or obtaining the relevant data for analysis. Especially when critical services are deployed on the server, handling the problem quickly reduces the customer's economic loss and ensures rapid service recovery.

In the prior art, the common fault locating method is as follows. The K-UX operating system is installed and run on the server, and under normal conditions it runs on the K-UX kernel. When a serious fault occurs, the K-UX kernel hangs and the crash kernel is started (crash kernel: a small Linux kernel used mainly to save the memory data of the K-UX kernel to disk). The crash kernel saves the memory data used by the K-UX kernel to disk so that the problem can be analyzed and located after the next restart. After the crash kernel has collected the K-UX kernel memory information, the system restarts and enters the BIOS, which performs hardware initialization and other operations; in its final stage the BIOS loads the K-UX kernel to start the system. Once the K-UX system is running, the memory data that the crash kernel saved to disk is analyzed (as shown in Fig. 2). The prior art has the following disadvantages: 1. The user must configure the crash kernel and allocate memory to it, which wastes some memory space. 2. Saving the memory data requires a large amount of disk space. 3. Many users do not configure the crash kernel when installing K-UX, which makes subsequent problem location very difficult.
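The third disadvantage can be illustrated with a small check. The sketch below assumes that K-UX, being Linux-like, reserves crash kernel memory through the `crashkernel=` boot parameter as mainline Linux kdump does; the patent does not state this, and the function name and sample command lines are hypothetical.

```python
from typing import Optional


def crashkernel_reservation(cmdline: str) -> Optional[str]:
    """Return the value of the crashkernel= boot parameter, or None if absent.

    Assumption: the crash kernel is configured the way mainline Linux kdump
    configures it, via a crashkernel= token on the kernel command line.
    """
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


# Illustrative command lines, not taken from the patent.
configured = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro crashkernel=256M"
unconfigured = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro"

print(crashkernel_reservation(configured))
print(crashkernel_reservation(unconfigured))
```

On a running Linux system the command line could be read from `/proc/cmdline`. A result of `None` corresponds to the situation in which no crash kernel is available to save memory after a fault.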

Summary of the invention

In view of the above problems, the present invention proposes a kernel fault locating method and device that collect the memory information of the failed system through the BMC, quickly analyze the cause of the fault, and locate the fault.

The present invention provides the following technical solution:

In one aspect, the present invention provides a kernel fault locating method, including:

Step 101: monitor whether the K-UX kernel and/or the hardware has failed;

Step 102: if the K-UX kernel and/or the hardware has failed, enter the BMC system and obtain the memory information of the failed system;

Step 103: analyze the memory information of the failed system and locate the fault.

After the fault is located, the method further includes resolving the fault and restoring normal operation of the server.

The failed system is the K-UX system or the hardware system.

The K-UX kernel fault includes at least one of a null pointer dereference, an array out-of-bounds access, a soft lockup, and a hard lockup; the hardware fault includes at least one of a disk sector that cannot be read or written and a CPU core that cannot work normally.
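The fault categories listed above can be sketched as an enumeration with a simple text classifier. The patent does not describe how faults are recognized; the sketch assumes that the collected memory contains kernel log text and that K-UX emits messages similar to those of mainline Linux. The signature strings, the `Fault` enum, and `classify` are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    NULL_POINTER = "null pointer dereference"
    ARRAY_OUT_OF_BOUNDS = "array out-of-bounds access"
    SOFT_LOCKUP = "soft lockup"
    HARD_LOCKUP = "hard lockup"
    DISK_SECTOR = "disk sector cannot be read or written"
    CPU_CORE = "CPU core cannot work normally"


# Assumed log signatures, modeled on typical mainline Linux messages.
# Whether K-UX prints the same text is not stated in the patent.
SIGNATURES = [
    ("null pointer dereference", Fault.NULL_POINTER),
    ("array-index-out-of-bounds", Fault.ARRAY_OUT_OF_BOUNDS),
    ("soft lockup", Fault.SOFT_LOCKUP),
    ("hard lockup", Fault.HARD_LOCKUP),
    ("i/o error", Fault.DISK_SECTOR),
    ("[hardware error]", Fault.CPU_CORE),
]


def classify(log_text: str) -> Optional[Fault]:
    """Return the first fault category whose signature appears in the text."""
    lowered = log_text.lower()
    for needle, fault in SIGNATURES:
        if needle in lowered:
            return fault
    return None


print(classify("watchdog: BUG: soft lockup - CPU#3 stuck for 22s!"))
print(classify("BUG: kernel NULL pointer dereference, address: 0000000000000000"))
```

Substring matching is only a first pass; a real analysis would inspect registers, stack traces, and kernel data structures in the memory image.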

In addition, the present invention provides a kernel fault locating device, the device including:

A monitoring module, for monitoring whether the K-UX kernel and/or the hardware has failed;

An acquisition module, for entering the BMC system when the K-UX kernel and/or the hardware has failed and obtaining the memory information of the failed system;

A locating module, for analyzing the memory information of the failed system and locating the fault.

The failed system is the K-UX system or the hardware system.

The present invention provides a kernel fault locating method and device. The server's system and hardware are monitored for faults. When the system or the hardware on the server fails, the memory information of the failed system is collected through the BMC and the memory data is analyzed, so that the cause of the fault can be analyzed quickly, the fault located, and a way to resolve it found. The invention helps ensure that services on the server recover quickly, reducing losses.

Brief description of the drawings

Fig. 1 is a flowchart of the present invention;

Fig. 2 is a flowchart of the prior art.

Detailed description

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments are briefly described below. Evidently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Based on the above, in one aspect, an embodiment of the present invention provides a kernel fault locating method. Fig. 1 is a flowchart of the present invention. The method includes:

Step 101: monitor whether the K-UX kernel and/or the hardware has failed;

K-UX: the Inspur operating system, which is Linux-like. The K-UX operating system is installed on the server and runs normally, and the K-UX kernel and other hardware are monitored for faults;

When the K-UX kernel fails, log in to the BMC system and obtain the K-UX memory information. Here, BMC stands for Baseboard Management Controller; it runs a small operating system independent of the server system, and its role is to facilitate remote management, monitoring, installation, restarting, and similar operations on the server. Serious K-UX kernel faults are faults such as null pointer dereferences, array out-of-bounds accesses, soft lockups, and hard lockups that prevent the K-UX system from continuing to work. Hardware faults are faults that prevent hardware from continuing to be used, such as disk sectors that cannot be read or written or CPU cores that cannot work normally.

The obtained K-UX memory information is analyzed and the cause of the fault is located; the fault is then resolved and normal operation of the server is restored.
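The flow described above (monitor, obtain memory through the BMC, analyze) can be sketched as a small pipeline. The patent does not specify any programming interface for the BMC or for the analysis, so the three callables below, the `Diagnosis` type, and the stand-in functions are hypothetical placeholders rather than a real BMC API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Diagnosis:
    faulted: bool
    cause: Optional[str] = None


def locate_fault(
    is_faulted: Callable[[], bool],
    fetch_memory_via_bmc: Callable[[], bytes],
    analyze: Callable[[bytes], str],
) -> Diagnosis:
    # Step 101: check whether the kernel and/or hardware has failed.
    if not is_faulted():
        return Diagnosis(faulted=False)
    # Step 102: obtain the failed system's memory through the BMC.
    memory = fetch_memory_via_bmc()
    # Step 103: analyze the memory data to locate the fault.
    return Diagnosis(faulted=True, cause=analyze(memory))


# Stand-ins for illustration only; a real system would query the BMC.
def fake_fetch() -> bytes:
    return b"... BUG: soft lockup - CPU#3 stuck for 22s! ..."


def fake_analyze(memory: bytes) -> str:
    return "soft lockup" if b"soft lockup" in memory else "unknown"


result = locate_fault(lambda: True, fake_fetch, fake_analyze)
print(result)
```

Passing the steps in as callables keeps the sketch independent of how the BMC is reached, which the patent leaves open. Because the BMC runs independently of the server system, the memory fetch can proceed even when the K-UX kernel itself is hung.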

The present invention provides a kernel fault locating method. The server's system and hardware are monitored for faults. When the system or the hardware on the server fails, the memory information of the failed system is collected through the BMC and the memory data is analyzed, so that the cause of the fault can be analyzed quickly, the fault located, and a way to resolve it found. The invention helps ensure that services on the server recover quickly, reducing losses.

In another aspect, an embodiment of the present invention provides a kernel fault locating device, the device including:

A monitoring module 201, for monitoring whether the K-UX kernel and/or the hardware has failed;

An acquisition module 202, for entering the BMC system when the K-UX kernel and/or the hardware has failed and obtaining the memory information of the failed system;

A locating module 203, for analyzing the memory information of the failed system and locating the fault.

The present invention provides a kernel fault locating device. The server's system and hardware are monitored for faults. When the system or the hardware on the server fails, the memory information of the failed system is collected through the BMC and the memory data is analyzed, so that the cause of the fault can be analyzed quickly, the fault located, and a way to resolve it found. The invention helps ensure that services on the server recover quickly, reducing losses.

The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A kernel fault locating method, characterized by:

Step 101: monitoring whether the K-UX kernel and/or the hardware has failed;

Step 102: if the K-UX kernel and/or the hardware has failed, entering the BMC device and obtaining the memory information of the failed equipment;

Step 103: analyzing the memory information of the failed equipment and locating the fault.

2. The method according to claim 1, characterized in that: after the fault is located, the method further includes resolving the fault and restoring normal operation of the server.

3. The method according to claim 1, characterized in that: the failed equipment is a K-UX device or a hardware device.

4. The method according to claim 1, characterized in that: the K-UX kernel fault includes at least one of a null pointer dereference, an array out-of-bounds access, a soft lockup, and a hard lockup; the hardware fault includes at least one of a disk sector that cannot be read or written and a CPU core that cannot work normally.

5. A kernel fault locating device, characterized in that the device includes:

A monitoring module, for monitoring whether the K-UX kernel and/or the hardware has failed;

6. The device according to claim 5, characterized in that: after the fault is located, the fault is further resolved and normal operation of the server is restored.

7. The device according to claim 5, characterized in that: the failed equipment is a K-UX device or a hardware device.

8. The device according to claim 5, characterized in that: the K-UX kernel fault includes at least one of a null pointer dereference, an array out-of-bounds access, a soft lockup, and a hard lockup; the hardware fault includes at least one of a disk sector that cannot be read or written and a CPU core that cannot work normally.