CN102929747A - Method for processing crash dumps of a Linux operating system on a Loongson server


Info

Publication number
CN102929747A
Authority
CN
China
Prior art keywords
kernel
linux
collapse
suse
dump
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104370920A
Other languages
Chinese (zh)
Other versions
CN102929747B (en
Inventor
张路波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Standard Software Co Ltd
Original Assignee
China Standard Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Standard Software Co Ltd filed Critical China Standard Software Co Ltd
Priority to CN201210437092.0A priority Critical patent/CN102929747B/en
Publication of CN102929747A publication Critical patent/CN102929747A/en
Application granted granted Critical
Publication of CN102929747B publication Critical patent/CN102929747B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for processing crash dumps of a Linux operating system on a Loongson server. In the method, the kernel boot program of the Loongson server's input/output system passes parameters and starts the Linux system; the crash dump service of the Linux system is started and loads the image file and startup parameters of a capture kernel into a reserved storage space; when the system kernel crashes, the processor executing the system causes the capture kernel to boot, and the memory image of the crashed kernel is stored as a dump file; and the Linux system is restarted to analyze the dump file. When the kernels of the Linux operating system running on the Loongson server are compiled, support is added for the crash dump service and for the MIPS architecture of the Loongson server, so that a Linux system running on the domestic MIPS architecture can perform a kernel crash dump when it crashes, which improves the efficiency and practicality of kernel debugging for developers.

Description

Method for processing crash dumps of a Linux operating system on a Loongson server
Technical Field
The present invention relates to the field of operating system technology, and in particular to a method for processing crash dumps of a Linux operating system on a Loongson server.
Background
As the state places ever greater emphasis on independent and controllable technology, the Loongson processor, a domestic MIPS-architecture processor, is being adopted more and more widely. Improving its software support, and in particular support for running the Linux operating system on it, is therefore increasingly important.
Although the Linux operating system is now supported on this processor, the kernel crash dump function for a Linux system running on it has not yet been implemented. Without this function, when a Linux system running on the MIPS-architecture Loongson processor crashes, kernel developers cannot use tools to analyze the memory contents at the time of the crash in order to locate its cause. Lacking that information, finding and fixing the cause of a system crash is very difficult.
The crash dump service (kdump service for short) is currently supported for Linux systems on architectures such as x86, PowerPC, ARM and Alpha. However, the MIPS architecture differs from these architectures in many respects, so a Linux system on MIPS cannot perform a crash dump when it crashes.
Summary of the Invention
One of the technical problems to be solved by the present invention is to provide a method for processing crash dumps of a Linux operating system on a Loongson server.
To solve the above technical problem, the invention provides a method for processing crash dumps of a Linux operating system on a Loongson server, the method comprising:
when the Loongson server is powered on, the kernel boot program of the input/output system of the Loongson server passes parameters and starts the system kernel of the Linux operating system, wherein the parameters include a parameter used to configure a storage space for the capture kernel of the Linux operating system;
starting the crash dump service of the Linux operating system, the crash dump service loading the image file and startup parameters of the capture kernel into the storage space;
when the system kernel of the Linux operating system crashes, the processor executing the Linux operating system causes the capture kernel to boot, and the memory image of the crashed system kernel is stored as a dump file;
restarting the system kernel of the Linux operating system on the Loongson server and analyzing the dump file to find the cause of the previous crash of the system kernel, wherein
the system kernel of the Linux operating system running on the Loongson server supports the crash dump service, and the capture kernel of the Linux operating system supports the MIPS architecture of the Loongson server.
According to another aspect of the method of the invention, when the system kernel of the Linux operating system is compiled, the CONFIG_KEXEC_CRASH, CONFIG_KEXEC, CONFIG_SYSFS and CONFIG_DEBUG_INFO options are selected in the kernel configuration, so that the system kernel of the Linux operating system supports the crash dump service.
According to another aspect of the method of the invention, when the capture kernel of the Linux operating system is compiled, the CONFIG_NUMA and CONFIG_SMP options are removed from the kernel configuration and the CONFIG_CRASH_DUMP option is added, so that the capture kernel of the Linux operating system supports the MIPS architecture of the Loongson server.
According to another aspect of the method of the invention, the step in which the crash dump service loads the image file and startup parameters of the capture kernel into the storage space further comprises the following steps:
the crash dump service program executes the /etc/init.d/kdump script;
the /etc/init.d/kdump script executes the kexec command of the kexec tool, which loads the image file and startup parameters of the capture kernel into the storage space,
wherein the kexec command builds four kexec segments: the image file segment of the crash kernel; the segment holding the command line passed to the crash kernel; a segment holding the standard kernel's memory information, the vmcoreinfo data and per-processor information; and a backup region segment holding backup data;
wherein the part of the kexec tool that obtains the memory region list of the currently running operating system is modified, and the start addresses of the four kexec segments are modified according to the physical memory layout of the MIPS architecture of the Loongson server.
According to another aspect of the method of the invention, the step in which the processor executing the Linux operating system causes the capture kernel to boot when the system kernel crashes further comprises the following steps:
the Linux operating system enters the crash handler of the system kernel, and the processor stops the other processors on the Loongson server;
the processor prepares the command-line parameters and environment variable parameters for starting the capture kernel;
based on the command-line parameters and the environment variable parameters, the processor jumps to the entry address of the capture kernel to start it.
According to another aspect of the method of the invention, the processor executing the crash handler sends inter-processor interrupts to the other processors on the Loongson server so that they stop operating.
According to another aspect of the method of the invention, in the step in which the processor prepares the command-line parameters and environment variable parameters for starting the capture kernel,
the processor copies the environment variable parameters, together with the command-line parameters passed when the kexec command loaded the image file of the capture kernel, into the memory space reserved in head.s, and then sets the status flag indicating that the capture kernel is ready to be started.
According to another aspect of the method of the invention, the step of storing the memory image of the crashed system kernel as a dump file further comprises the following steps:
after the capture kernel boots, the vmcore_init initialization function is executed;
the vmcore_init initialization function checks whether the elfcorehdr parameter value in the command-line parameters is ELFCORE_ADDR_ERR or ELFCORE_ADDR_MAX;
if the elfcorehdr parameter value is neither ELFCORE_ADDR_ERR nor ELFCORE_ADDR_MAX, the /proc/vmcore file is created, and the memory image of the crashed system kernel is stored in ELF format in the /proc/vmcore file;
the crash dump service program is started and calls the makedumpfile command of the makedumpfile tool to generate the dump file from the /proc/vmcore file that holds the memory image of the crashed system kernel.
According to another aspect of the method of the invention, interface functions for the MIPS architecture of the Loongson server are added to the makedumpfile tool. There are three interfaces: an interface for obtaining the physical base address; an interface for obtaining the address of the first element of the discontiguous memory region list in the crashed kernel and the start address of the discontiguous memory region; and an interface for translating virtual addresses to physical addresses when reading information from the crashed kernel.
According to another aspect of the method of the invention, the dump file is analyzed with the crash tool to find the cause of the previous crash of the system kernel of the Linux operating system,
wherein interface functions for the MIPS architecture of the Loongson server are added to the crash tool. There are three interfaces: an interface for translating virtual addresses of the crashed kernel to physical addresses; an interface for initializing all hardware-related settings of a MIPS-architecture machine; and an interface for obtaining the page directory pointer of a specified process.
Compared with the prior art, one or more embodiments of the present invention can have the following advantages:
When the system kernel and the capture kernel of the Linux operating system running on the Loongson server are recompiled, the method of the invention adds support for the crash dump service and for the MIPS architecture of the Loongson server, so that a Linux operating system running on the domestic MIPS architecture can perform a kernel crash dump when the system crashes. Analyzing the dumped data with tools then lets kernel developers locate the cause of the crash quickly and accurately, which improves the efficiency and practicality of kernel debugging.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings provide a further understanding of the present invention and form part of the description. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of the method for processing crash dumps of a Linux operating system on a Loongson server according to an embodiment of the invention;
Fig. 2(a) and Fig. 2(b) are, respectively, a flowchart of the kernel starting the crash dump service and a flowchart of the crash dump performed when the system crashes, according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the workflow of the kdump service according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the file format of the vmcore memory crash dump file according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the way in which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and put into practice. It should be noted that, provided they do not conflict, the embodiments of the invention and the features of each embodiment can be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.
In addition, the steps shown in the flowcharts of the drawings can be executed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in a different order.
The embodiments take a Linux operating system running on a Loongson server of the domestic MIPS architecture as an example to explain how a kernel crash dump is performed when the Linux operating system crashes.
Fig. 1 is a flowchart of the method for processing crash dumps of a Linux operating system on a Loongson server according to an embodiment of the invention. Fig. 2(a) and Fig. 2(b) are, respectively, a flowchart of the kernel starting the kdump crash dump service and a flowchart of the crash dump performed when the system crashes. Each step of the invention is described in detail below with reference to Fig. 1 and Fig. 2.
Step S110: when the Loongson server is powered on, the kernel boot program of the input/output system of the Loongson server (PMON for short) passes parameters and starts the system kernel of the Linux operating system (the standard kernel, also called the first kernel). The parameters include a parameter used to configure a storage space for the capture kernel of the Linux operating system (the crash kernel for short, which is the second kernel, run when the system crashes).
Specifically, the Loongson server of the domestic MIPS architecture is first powered on, and the PMON kernel boot program passes parameters and starts the first kernel of the Linux system. One of the parameters passed is "crashkernel=XXX@YYY", where "XXX" is the size of the reserved memory and "YYY" is the offset, that is, the start address of the memory region that the system reserves for the second kernel. The purpose of this parameter is to reserve a block of "XXX" bytes of memory, starting at the address given by "YYY", for use by the capture kernel (the crash kernel, also called the second kernel).
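The meaning of the "crashkernel=XXX@YYY" parameter can be illustrated with a short sketch. The code below is only an illustration of the size@offset notation, not the kernel's own parser; the function names parse_mem and parse_crashkernel and the sample command line are hypothetical, and the K/M/G suffix handling is an assumption based on common kernel command-line conventions.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_mem(text):
    """Convert a value such as '128M' or '0x4000000' into a byte count."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMGkmg]?)", text.strip())
    if m is None:
        raise ValueError(f"bad memory value: {text!r}")
    digits = m.group(1)
    base = 16 if digits.lower().startswith("0x") else 10
    return int(digits, base) * _UNITS[m.group(2).upper()]

def parse_crashkernel(cmdline):
    """Return (size, start) in bytes for crashkernel=SIZE@START, or None if absent."""
    prefix = "crashkernel="
    for token in cmdline.split():
        if token.startswith(prefix):
            size, sep, start = token[len(prefix):].partition("@")
            if not sep:
                raise ValueError("this sketch requires an explicit @offset")
            return parse_mem(size), parse_mem(start)
    return None

# Hypothetical command line: reserve 128 MiB starting at physical address 64 MiB.
size, start = parse_crashkernel("console=ttyS0 crashkernel=128M@64M root=/dev/sda1")
print(f"reserved region: {start:#x} .. {start + size:#x}")
```

The returned pair describes the region that the first kernel leaves untouched, so that the image of the second kernel loaded there survives a crash.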
It should be noted that the system kernel of the Linux operating system running on the Loongson server must support the crash dump service.
So that the system kernel running on the Loongson server of the domestic MIPS architecture supports the crash dump service (kdump service for short), options such as CONFIG_KEXEC_CRASH, CONFIG_KEXEC, CONFIG_SYSFS and CONFIG_DEBUG_INFO are selected in the kernel configuration when the system kernel of the Linux operating system is compiled.
In addition, the capture kernel of the Linux operating system in this embodiment must support the MIPS architecture of the Loongson server. The crash kernel is an image in vmlinux format, that is, an uncompressed ELF image file. To support the MIPS architecture when the capture kernel is compiled on the domestic MIPS64 Loongson server platform, kernel options such as CONFIG_NUMA and CONFIG_SMP must be removed and the CONFIG_CRASH_DUMP option must be added.
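The two sets of configuration requirements above can be checked mechanically. The following sketch reads the text of a kernel .config file and reports options that do not match; the option names are taken from the description as written, while the helper names parse_config and check and the sample configuration are hypothetical.

```python
STANDARD_REQUIRED = ("CONFIG_KEXEC", "CONFIG_KEXEC_CRASH", "CONFIG_SYSFS", "CONFIG_DEBUG_INFO")
CAPTURE_REQUIRED = ("CONFIG_CRASH_DUMP",)
CAPTURE_FORBIDDEN = ("CONFIG_NUMA", "CONFIG_SMP")

def parse_config(text):
    """Map CONFIG_* names to their values from the text of a kernel .config file."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"   # explicitly disabled option
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options

def check(options, required=(), forbidden=()):
    """Return a list of problems; an empty list means the configuration matches."""
    problems = [f"{name} must be enabled" for name in required
                if options.get(name, "n") != "y"]
    problems += [f"{name} must be disabled" for name in forbidden
                 if options.get(name, "n") == "y"]
    return problems

# Hypothetical capture-kernel configuration that still has NUMA enabled.
capture = parse_config("CONFIG_CRASH_DUMP=y\n# CONFIG_SMP is not set\nCONFIG_NUMA=y\n")
print(check(capture, CAPTURE_REQUIRED, CAPTURE_FORBIDDEN))
```

Running the same check with STANDARD_REQUIRED against the first kernel's configuration covers the options listed for the system kernel.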
Step S120: the crash dump service of the Linux operating system is started in order to load the image file and startup parameters of the capture kernel into the storage space.
Specifically, the kdump service is started in the Linux operating system running on the Loongson server of the domestic MIPS architecture. Starting this service executes the /etc/init.d/kdump script, which runs the /sbin/kexec command to load the image file of the crash kernel and the startup parameters passed to the crash kernel into the storage space configured in step S110.
More specifically, starting the kdump service executes the /etc/init.d/kdump script, which runs the kexec command of the kexec-tools package. The kexec command builds four kexec segments: (1) the image file segment of the crash kernel; (2) the segment holding the command line passed to the crash kernel; (3) a segment holding the standard kernel's memory information, the vmcoreinfo data and per-processor information; (4) a backup region segment holding backup data.
The kexec command packs the information about these four segments, together with the memory range information of the standard kernel, into a kexec_info structure, and the sys_kexec_load system call copies it from user space to kernel space. Based on the physical start address and size recorded in that structure for each of the four segments (the code and data of the crash kernel image; the crash kernel command line; the segment holding the standard kernel memory information, vmcoreinfo and per-processor information; and the backup region), the kernel copies the contents of each segment from user space to its correct location in physical memory.
It should be noted that, to make kexec support the MIPS architecture, two parts of the kexec tool were modified: the part that obtains the memory region list of the currently running system, and the start addresses of the four segments that the tool generates (the second-kernel segment; the command line segment; the segment made up of the ELF program header information of the first kernel, the vmcoreinfo information and the current state of each processor; and the backup data segment, i.e. the four kexec segments above). The addresses of these four segments are computed from the physical memory layout of the MIPS architecture.
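How start addresses for the four segments might be derived from the reserved region can be sketched as follows. This is a simplified model and not the kexec-tools implementation: the Segment class, the layout function, the 16 KiB page size and all segment sizes are assumptions chosen for illustration, and the actual computation in the patent depends on the MIPS physical memory layout, which is not detailed in the text.

```python
from dataclasses import dataclass

PAGE_SIZE = 16 * 1024  # assumption: 16 KiB pages

@dataclass
class Segment:
    name: str
    size: int       # bytes the segment needs
    start: int = 0  # physical start address, filled in by layout()

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def layout(segments, region_start, region_size):
    """Place segments back to back, page aligned, inside the reserved region."""
    cursor = align_up(region_start, PAGE_SIZE)
    end = region_start + region_size
    for seg in segments:
        if cursor + seg.size > end:
            raise MemoryError(f"{seg.name} does not fit in the reserved region")
        seg.start = cursor
        cursor = align_up(cursor + seg.size, PAGE_SIZE)
    return segments

# Hypothetical sizes; the region matches crashkernel=128M@64M.
segments = [
    Segment("crash kernel image", 20 << 20),
    Segment("command line", 4096),
    Segment("memory info, vmcoreinfo, per-CPU notes", 64 << 10),
    Segment("backup region", 640 << 10),
]
for seg in layout(segments, 64 << 20, 128 << 20):
    print(f"{seg.start:#012x}  {seg.size:>9}  {seg.name}")
```

Each computed (start, size) pair corresponds to the per-segment information that the text says is recorded in the kexec_info structure and used by the kernel when copying segment contents into place.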
It should be noted that kexec, from the kexec-tools package, is a kernel-to-kernel boot loader: from within a running kernel it can start another kernel without going through the BIOS. In essence, kexec preloads the new kernel and keeps it in memory. The memory holding the new kernel image need not be contiguous; kexec keeps track of the pages in which the new kernel is stored. On reboot, kexec copies the new kernel image to the location from which it will run, executes some setup code, and then transfers control to the new kernel.
kexec consists of two parts, a kernel-space part and a user-space part. The kernel-space part implements a new system call, kexec_load(), which helps preload the new kernel. The user-space part, also known as the kexec tool, parses the kernel image, prepares the appropriate parameter segment and setup code segment, and passes this data to the running kernel through the newly implemented system call.
Step S130: when the system kernel of the Linux operating system crashes, the processor executing the Linux operating system causes the capture kernel to boot, and the memory image of the crashed system kernel is stored as a dump file.
Specifically, when the Linux system running on the Loongson server of the domestic MIPS architecture crashes, the system enters the kernel's crash handler. The processor executing the Linux operating system, that is, the processor executing the crash handler of the crash dump service, stops the other processors on the Loongson server. For example, a Loongson 3A server has 8 processors in total, so the 7 other processors must be stopped at this point.
The processor then prepares the command-line parameters and environment variable parameters for starting the crash kernel (the command line may include the elfcorehdr parameter) and jumps to the entry address of the crash kernel to start it. Finally, the other processors are initialized so that they run normally.
In this step, a Linux system running on the Loongson server of the domestic MIPS architecture can crash in any of the following situations:
(1) If a hard lockup occurs and the NMI watchdog is configured, the kernel triggers die_nmi(), which crashes the system.
(2) If die() is called while panic_on_oops is set, or die() is called in interrupt context, the kernel crashes.
(3) While the system is running, the Alt, SysRq and c keys are pressed at the same time, or echo c > /proc/sysrq-trigger is entered at a system terminal.
(4) The BUG(), BUG_ON() or panic() function is called in the kernel.
(5) Kernel-mode code dereferences a null pointer or divides by zero.
Whichever of these causes the crash, the system enters the crash handler. The processor executing the crash handler sends inter-processor interrupts to the other processors. On receiving the interrupt, each of the other processors pushes the values of its registers onto the stack, then spins while repeatedly checking the status flag that indicates the capture kernel is ready to be started (also called the restart-ready flag).
When the kernel crashes, the system as a whole does not stop: each processor keeps running on its clock cycle and interrupts continue to occur frequently, all of which can corrupt the crash scene. If the scene cannot be preserved at the moment of the crash, the memory dump becomes meaningless.
An interrupt is a mechanism by which the processor suspends the program it is executing and switches to handling a special event. When the processor receives an interrupt signal, it pushes the values of its current registers onto the stack and then jumps to the interrupt handler. When the interrupt handler finishes, the saved register values are restored and execution continues. Pushing the register values onto the stack is called protecting the interrupt context. As with interrupt context protection, the processor registers are an important part of reconstructing the crash scene. Therefore, when the system crashes, the value of every processor register must be saved so that the program's execution position and other information can be examined. For an interrupt, the register values are saved on the stack in memory; for a memory dump, they must be written to a file to ensure they are not lost.
In addition, when PMON passes parameters to the standard kernel, the parameter information is placed just past the end address of the standard kernel. Once the standard kernel runs, the memory past its end address is overwritten, so the environment variables and command-line parameters are lost; after a crash, the crash kernel would then fail to boot normally for lack of them. To solve this problem, one page of memory is reserved in head.s, and after the standard kernel starts, it stores the environment variables and command-line parameters in a static array.
The processor executing the crash handler then copies the environment variable parameters from the static array, together with the command-line parameters passed when the kexec command loaded the crash kernel, into the page reserved in head.s, and once this is done it sets the restart-ready status flag.
At this point the other processors jump to the kexec_smp_wait assembly code, while the processor executing the crash handler jumps to the relocate_new_kernel assembly code. The other processors wait in kexec_smp_wait until the processor executing the crash handler has placed the crash kernel at its correct location; then they and that processor all jump to the entry point of the crash kernel.
After a domestic MIPS-architecture Loongson processor is powered on, all processors start running, and the first instruction executed after power-on is at 0xffffffffbfc00000. At that point only processor 0 loads the kernel and jumps to the kernel entry point; the other processors spin, waiting for processor 0 to send an inter-processor interrupt, before completing initialization and starting to execute kernel code. If the processors do not execute kernel code in this order, the kernel does not run properly. Because of this, when all processors jump to the crash kernel entry point after a crash, each first checks whether it is processor 0. If it is not, it spins and waits; if it is, it continues executing, and once its initialization is complete it sends inter-processor interrupts to the other processors so that they also start executing kernel code.
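The ordering constraint described above (processor 0 initializes first, the others spin until signalled) can be modelled with threads. This is only a user-space analogy: a threading.Event stands in for the inter-processor interrupt, and the function names boot_capture_kernel and cpu_entry are hypothetical rather than taken from the kernel.

```python
import threading

def boot_capture_kernel(num_cpus=8):
    """Simulate all CPUs entering the capture kernel; only CPU 0 initializes it."""
    ready = threading.Event()   # stands in for the inter-processor interrupt
    order = []                  # order in which CPUs start running kernel code
    lock = threading.Lock()

    def cpu_entry(cpu_id):
        if cpu_id != 0:
            ready.wait()        # secondary CPUs wait until CPU 0 signals them
        with lock:
            order.append(cpu_id)
        if cpu_id == 0:
            ready.set()         # CPU 0 finished initialization: release the others

    # Start the threads in reverse so CPU 0 arrives at the entry point last.
    threads = [threading.Thread(target=cpu_entry, args=(i,))
               for i in reversed(range(num_cpus))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order

order = boot_capture_kernel()
print("first CPU to run kernel code:", order[0])
```

Even though the secondary threads reach the entry point before CPU 0 in this simulation, none of them proceeds until CPU 0 has recorded itself and released them, which mirrors the check each processor performs at the crash kernel entry point.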
A kernel restart can be in crash mode or normal mode. Crash mode is the case in which the running system crashes because of a system error. Normal mode is the case in which the running system has not encountered any error, and the kexec command is used to load the kernel to be restarted into memory. The syntax for loading a kernel is:
kexec -l <kernel-image> --append="<command-line-option>"
After the kernel has been loaded successfully, the kexec -e command restarts into the newly loaded kernel. This embodiment uses crash mode.
The step of storing the memory image of the crashed system kernel as a dump file is described in detail below.
Specifically, after the crash kernel boots normally, it runs the vmcore_init initialization function. This function checks whether the elfcorehdr parameter value on the command line is ELFCORE_ADDR_ERR or ELFCORE_ADDR_MAX. If it is neither, the function creates the /proc/vmcore file and wraps the running environment and memory space of the first kernel (the standard kernel) in ELF format. The /etc/init.d/kdump script then runs automatically to start the kdump service. The script checks whether the /proc/vmcore file exists; if it does, the script calls the makedumpfile command provided by the makedumpfile tool to generate the dump file, and the system is restarted once the dump file has been generated successfully.
It should be noted that the makedumpfile tool analyzes the running environment and memory space of the standard kernel and stores this information in a file on disk, so that kernel developers can analyze the cause of the crash from the memory information in the file. Interface functions that implement the tool's common interfaces for the MIPS architecture are added to makedumpfile. There are three: obtaining the physical base address (the absolute address in physical memory at which code or data resides; here, the starting physical address at which the crashed kernel is located); obtaining the address of the first element of the discontiguous memory region list in the crashed kernel and the start address of the discontiguous memory region; and translating virtual addresses to physical addresses when reading information from the crashed kernel.
If the elfcorehdr parameter value is ELFCORE_ADDR_ERR or ELFCORE_ADDR_MAX, the vmcore_init function returns immediately without creating the /proc/vmcore file. Because /proc/vmcore does not exist in this case, when the /etc/init.d/kdump script runs after boot to start the kdump service, it does not call the makedumpfile tool and no dump file is generated.
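The decision made by vmcore_init can be summarized in a few lines. The sketch below models only the branch described in the text; the numeric values given to ELFCORE_ADDR_MAX and ELFCORE_ADDR_ERR, the accepted elfcorehdr= syntax and the function names are assumptions made for illustration and should be checked against the kernel sources in use.

```python
ELFCORE_ADDR_MAX = 2**64 - 1   # assumption: sentinel for "parameter not given"
ELFCORE_ADDR_ERR = 2**64 - 2   # assumption: sentinel for "parameter malformed"

def parse_elfcorehdr(cmdline):
    """Return the elfcorehdr address, or one of the two sentinel values."""
    prefix = "elfcorehdr="
    for token in cmdline.split():
        if token.startswith(prefix):
            # Accept both 'ADDR' and 'SIZE@ADDR'; only the address is used here.
            value = token[len(prefix):].rpartition("@")[2]
            try:
                if value.lower().startswith("0x"):
                    return int(value, 16)
                return int(value)
            except ValueError:
                return ELFCORE_ADDR_ERR
    return ELFCORE_ADDR_MAX

def should_create_vmcore(cmdline):
    """Mirror the branch in the text: create /proc/vmcore only for a valid address."""
    return parse_elfcorehdr(cmdline) not in (ELFCORE_ADDR_ERR, ELFCORE_ADDR_MAX)

# Hypothetical command lines for a capture kernel and for an ordinary boot.
print(should_create_vmcore("console=ttyS0 elfcorehdr=0x5404000"))
print(should_create_vmcore("console=ttyS0"))
```

When the function returns false, the kdump script finds no /proc/vmcore file and therefore skips makedumpfile, which matches the behaviour described for an ordinary (non-crash) boot.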
Fig. 3 is a schematic diagram of the workflow of the kdump service according to an embodiment of the invention. As shown in Fig. 3, when the system panics, kdump invokes kexec to boot the crash kernel that was prepared in advance. This boot path resembles a fast-boot mechanism: it does not go through the BIOS and is a warm boot. After the crash kernel boots, the memory image of the previous kernel is available as /proc/vmcore, and the vmcore file can be copied to a local or remote disk with file copy commands such as cp or scp.
Fig. 4 is a schematic diagram of the file format of the vmcore memory crash dump file according to an embodiment of the invention. As shown in Fig. 4, a memory dump file in vmcore format consists of a dump header and a series of data pages containing the system memory, and its layout is very similar to that of ELF.
The format is shown in Fig. 4. The file header has two parts: the generic dump header (Generic Dump Header), which is independent of the system architecture, and the architecture dump header (Architecture Dump Header), which is closely tied to the architecture. This two-part header makes it easy for dumps from different architectures to carry different CPU-specific data structures. The header is followed by a series of records, each made up of a dump page header (Dump Page Header) and dump page data (Dump Page Data). The page header contains the page size, page-related flags (such as whether the page is compressed), the page address, and so on. The page data follows immediately after the page header and may or may not be compressed. The data pages are stored one after another up to the final end marker (PAGE END). This mechanism allows for different compression methods, different kinds of data structures, different page orderings, and so on.
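The sequence of page header and page data records ending in an end marker can be illustrated with a small reader and writer. The field layout, flag values and function names below are hypothetical; the text does not give the actual on-disk layout, and the generic and architecture dump headers are omitted from this sketch.

```python
import io
import struct
import zlib

PAGE_HEADER = struct.Struct("<QII")   # hypothetical fields: address, size, flags
FLAG_COMPRESSED = 0x1                 # hypothetical flag values
FLAG_END = 0x2

def write_pages(out, pages, compress=True):
    """Write (address, data) pairs as header+data records, then an end marker."""
    for address, data in pages:
        payload, flags = data, 0
        if compress:
            packed = zlib.compress(data)
            if len(packed) < len(data):   # keep the raw page if compression does not help
                payload, flags = packed, FLAG_COMPRESSED
        out.write(PAGE_HEADER.pack(address, len(payload), flags))
        out.write(payload)
    out.write(PAGE_HEADER.pack(0, 0, FLAG_END))

def read_pages(src):
    """Yield (address, data) pairs until the end marker is reached."""
    while True:
        address, size, flags = PAGE_HEADER.unpack(src.read(PAGE_HEADER.size))
        if flags & FLAG_END:
            return
        payload = src.read(size)
        if flags & FLAG_COMPRESSED:
            payload = zlib.decompress(payload)
        yield address, payload

# Round trip two 4 KiB pages through an in-memory buffer.
pages = [(0x1000, b"\x00" * 4096), (0x2000, bytes(range(256)) * 16)]
buf = io.BytesIO()
write_pages(buf, pages)
buf.seek(0)
print([(hex(addr), len(data)) for addr, data in read_pages(buf)])
```

Storing the size and a compression flag in every page header is what allows compressed and uncompressed pages to be mixed in one file, as the description of the format notes.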
Step S140: restart the system kernel of the Linux operating system on the Loongson server and analyze the dump file to find the reason why the system kernel of the Linux operating system crashed last time.
After the system has restarted, an analysis tool can be used to analyze the vmcore file just saved and to find the cause of the panic.
Specifically, the finally generated kernel crash dump file vmcore can be analyzed with the crash tool. The crash tool is designed to be independent of any particular kernel version, and it can be upgraded to support new kernel code that affects its functions. Its commands for kernel core analysis include kernel stack traces for all processors, source code disassembly, formatted display of kernel data structures and variables, virtual memory data, linked-list display, and so on. It also includes several commands related to particular kernel subsystems. In addition, crash supports gdb commands through its embedded gdb module.
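A non-interactive analysis session of the kind described above might be prepared as in the following sketch. It only builds the command line and a command script and does not run anything. The helper name `prepare_crash_session`, the file paths and the chosen commands (`bt -a` for stack traces of the active tasks, `log` for the kernel log buffer) are assumptions for illustration; the sketch also assumes a crash build that accepts `-s` (silent) and `-i` (input file).

```python
from pathlib import Path
from typing import List, Sequence

# Commands fed to crash; the selection is an assumption for illustration.
DEFAULT_COMMANDS = ("bt -a", "log", "exit")


def prepare_crash_session(
    vmlinux: str,
    vmcore: str,
    script_path: str,
    commands: Sequence[str] = DEFAULT_COMMANDS,
) -> List[str]:
    """Write the crash commands to `script_path` and return the argv to run."""
    Path(script_path).write_text("\n".join(commands) + "\n")
    # vmlinux must be the debug-info kernel image matching the crashed kernel.
    return ["crash", "-s", "-i", str(script_path), str(vmlinux), str(vmcore)]
```

On a machine where crash is installed, the returned list could be passed to `subprocess.run` and the output redirected to a report file.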
To support the MIPS architecture, the crash tool was modified as follows: implementations of the interface functions specific to the MIPS architecture were added to the crash tool. They comprise three interfaces: an interface that translates virtual addresses of the crashed kernel into physical addresses, an interface that initializes the settings for all necessary hardware of a MIPS architecture machine, and an interface that obtains the page directory entry pointer of a specified process.
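The first of these interfaces, virtual-to-physical translation, can be illustrated for the simplest case. The sketch below is not the code added to the crash tool; it assumes 32-bit MIPS addresses and handles only the unmapped kseg0 and kseg1 segments, where the physical address is the low 29 bits of the virtual address. Mapped addresses require a page table walk, which is omitted, and the function name is hypothetical.

```python
from typing import Optional

KSEG0_BASE = 0x8000_0000    # unmapped, cached kernel segment
KSEG1_BASE = 0xA000_0000    # unmapped, uncached kernel segment
KSEG2_BASE = 0xC000_0000    # mapped kernel segment starts here
SEGMENT_MASK = 0x1FFF_FFFF  # low 29 bits give the physical address


def mips32_unmapped_vtop(vaddr: int) -> Optional[int]:
    """Translate a kseg0/kseg1 virtual address to a physical address.

    Returns None for any other address, which would need a page table walk.
    """
    if KSEG0_BASE <= vaddr < KSEG2_BASE:
        return vaddr & SEGMENT_MASK
    return None
```

For example, `mips32_unmapped_vtop(0x80100000)` and `mips32_unmapped_vtop(0xA0100000)` both yield `0x00100000`, since kseg0 and kseg1 are two windows onto the same physical memory.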
The present invention adds MIPS architecture support to the kexec tool, the makedumpfile tool and the crash tool, and also adds MIPS architecture support to the kdump mechanism of the existing kernel. As a result, a Linux operating system running on a Loongson server of the domestic MIPS architecture can perform a kernel crash dump when the system crashes. Analyzing the dumped crash-site data with the above tools lets kernel developers locate the cause of a system crash quickly and accurately, which improves the efficiency and practicality of kernel debugging.
Those skilled in the art should understand that each module or step of the invention described above can be implemented with a general-purpose computing device. The modules or steps may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although embodiments of the present invention are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be defined by the appended claims.

Claims (10)

1. A method for handling a crash dump of a Linux operating system based on a Loongson server, characterized by comprising:
When the Loongson server is powered on, the kernel boot program of the input/output system of the Loongson server passes parameters and starts the system kernel of the Linux operating system, wherein the parameters include a parameter for reserving storage space for the capture kernel of the Linux operating system;
Starting the crash dump service of the Linux operating system, the crash dump service loading the image file and boot parameters of the capture kernel into the storage space;
When the system kernel of the Linux operating system crashes, the processor executing the Linux operating system controls the capture kernel to start, and the memory image of the crashed system kernel is stored as a dump file;
Restarting the system kernel of the Linux operating system on the Loongson server and analyzing the dump file to find the reason why the system kernel of the Linux operating system crashed last time, wherein
The system kernel of the Linux operating system running on the Loongson server supports the crash dump service, and the capture kernel of the Linux operating system supports the MIPS architecture of the Loongson server.
2. The method according to claim 1, characterized in that,
When the system kernel of the Linux operating system is compiled, the CONFIG_KEXEC_CRASH, CONFIG_KEXEC, CONFIG_SYSFS and CONFIG_DEBUG_INFO options are selected among the kernel options, so that the system kernel of the Linux operating system supports the crash dump service.
3. The method according to claim 2, characterized in that,
When the capture kernel of the Linux operating system is compiled, the CONFIG_NUMA and CONFIG_SMP options are removed from the kernel options and the CONFIG_CRASH_DUMP option is added, so that the capture kernel of the Linux operating system supports the MIPS architecture of the Loongson server.
4. The method according to claim 3, characterized in that the step in which the crash dump service loads the image file and boot parameters of the capture kernel into the storage space further comprises the following steps:
The crash dump service program executes the /etc/init.d/kdump script;
The /etc/init.d/kdump script executes the kexec command of the kexec tool to load the image file and boot parameters of the capture kernel into the storage space,
Wherein the kexec command involves four kexec segments: the image file segment of the crash kernel; the segment holding the command line information passed to the crash kernel; the segment holding the memory information of the standard kernel, the vmcoreinfo file and the per-processor information; and the backup region segment that stores backup data;
Wherein the part of the kexec tool that obtains the memory region list of the currently running operating system is modified, and the start addresses of the four kexec segments are modified according to the physical memory address layout of the MIPS architecture of the Loongson server.
5. The method according to claim 3, characterized in that the step in which, when the system kernel of the Linux operating system crashes, the processor executing the Linux operating system controls the capture kernel to start further comprises the following steps:
The Linux operating system enters the crash handler of the system kernel, and the processor makes the other processors on the Loongson server stop operating;
The processor prepares command line parameters and environment variable parameters for starting the capture kernel;
Based on the command line parameters and the environment variable parameters, the processor jumps to the start address of the capture kernel to start the capture kernel.
6. The method according to claim 5, characterized in that,
The processor executing the crash handler sends inter-processor interrupt signals to the other processors on the Loongson server, so that the other processors on the Loongson server stop operating.
7. The method according to claim 5, characterized in that, in the step in which the processor prepares command line parameters and environment variable parameters for starting the capture kernel,
The processor copies the environment variable parameters, together with the command line parameters passed when the kexec command loaded the image file of the capture kernel, into the memory space reserved in head.s, and then sets the status flag that indicates readiness to start the capture kernel.
8. The method according to claim 7, characterized in that the step of storing the memory image of the crashed system kernel as a dump file further comprises the following steps:
After the capture kernel has started, the vmcore_init initialization function is executed;
The vmcore_init initialization function judges whether the elfcorehdr parameter value in the command line parameters is ELFCORE_ADDR_ERR or ELFCORE_ADDR_MAX,
If the judgment result is that the elfcorehdr parameter value is neither ELFCORE_ADDR_ERR nor ELFCORE_ADDR_MAX, the /proc/vmcore file is created, and the memory image of the crashed system kernel is stored in ELF file format in the /proc/vmcore file;
The crash dump service program is started to call the makedumpfile command of the makedumpfile tool, which generates the dump file from the /proc/vmcore file holding the memory image of the crashed system kernel.
9. The method according to any one of claims 1 to 8, characterized in that,
Interface functions for the MIPS architecture of the Loongson server are added to the makedumpfile tool, comprising three interfaces: an interface that obtains the physical base address; an interface that obtains the address of the first element of the discontiguous memory region linked list in the crashed kernel and the start address of the discontiguous memory regions; and an interface that translates virtual addresses to physical addresses when crashed-kernel information is read.
10. The method according to any one of claims 1 to 8, characterized in that,
The dump file is analyzed with the crash tool to find the reason why the system kernel of the Linux operating system crashed last time,
Wherein interface functions for the MIPS architecture of the Loongson server are added to the crash tool, comprising three interfaces: an interface that translates virtual addresses of the crashed kernel into physical addresses, an interface that initializes the settings for all hardware of a MIPS architecture machine, and an interface that obtains the page directory entry pointer of a specified process.
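The kernel options named in claims 2 and 3 can be checked against a kernel `.config` file with a short script. The sketch below is an illustration only: the option names come from the claims, while the function names and the treatment of `=m` as enabled are assumptions made here.

```python
# Option sets taken from claims 2 and 3.
SYSTEM_KERNEL_REQUIRED = (
    "CONFIG_KEXEC_CRASH",
    "CONFIG_KEXEC",
    "CONFIG_SYSFS",
    "CONFIG_DEBUG_INFO",
)
CAPTURE_KERNEL_REQUIRED = ("CONFIG_CRASH_DUMP",)
CAPTURE_KERNEL_FORBIDDEN = ("CONFIG_NUMA", "CONFIG_SMP")


def enabled_options(config_text):
    """Return the set of options set to y or m in a .config file's text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skips blanks and "# CONFIG_X is not set" lines
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def config_problems(config_text, required=(), forbidden=()):
    """List the options that violate the required/forbidden sets."""
    enabled = enabled_options(config_text)
    problems = [f"{name} is not enabled" for name in required if name not in enabled]
    problems += [f"{name} must be disabled" for name in forbidden if name in enabled]
    return problems
```

An empty list from `config_problems(text, CAPTURE_KERNEL_REQUIRED, CAPTURE_KERNEL_FORBIDDEN)` would indicate that a capture kernel configuration matches the options stated in claim 3.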
CN201210437092.0A 2012-11-05 2012-11-05 Method for treating crash dump of Linux operation system based on loongson server Active CN102929747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210437092.0A CN102929747B (en) 2012-11-05 2012-11-05 Method for treating crash dump of Linux operation system based on loongson server

Publications (2)

Publication Number Publication Date
CN102929747A true CN102929747A (en) 2013-02-13
CN102929747B CN102929747B (en) 2015-07-01

Family

ID=47644553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210437092.0A Active CN102929747B (en) 2012-11-05 2012-11-05 Method for treating crash dump of Linux operation system based on loongson server

Country Status (1)

Country Link
CN (1) CN102929747B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050389A1 (en) * 2003-08-25 2005-03-03 Chaurasia Rajesh Kumar Method of and apparatus for cross-platform core dumping during dynamic binary translation
CN101324850A (en) * 2007-06-12 2008-12-17 中兴通讯股份有限公司 LINUX inner core dynamic loading method
US7818616B2 (en) * 2007-07-25 2010-10-19 Cisco Technology, Inc. Warm reboot enabled kernel dumper
US20110004780A1 (en) * 2009-07-06 2011-01-06 Yutaka Hirata Server system and crash dump collection method
CN101820356A (en) * 2010-02-06 2010-09-01 大连大学 Network fault diagnosis system based on ARM-Linux
CN201774541U (en) * 2010-02-06 2011-03-23 大连大学 Portable network fault diagnostic device
CN102270173A (en) * 2011-07-21 2011-12-07 哈尔滨工业大学 Fault injection tool based on SCSI (small computer system interface) driver layer

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226510B (en) * 2013-04-27 2015-09-30 华为技术有限公司 Resolve the method and apparatus of vmcore file
CN103226510A (en) * 2013-04-27 2013-07-31 华为技术有限公司 Method and device for analyzing vmcore file
CN104750605A (en) * 2013-12-30 2015-07-01 伊姆西公司 Method for including kernel object information in user dump
CN103927240A (en) * 2014-05-06 2014-07-16 成都西加云杉科技有限公司 Information dumping method and device answering to software breakdown
CN106293499B (en) * 2015-06-12 2019-08-27 联想(北京)有限公司 A kind of file acquisition method and baseboard management controller, basic input output system
CN106293499A (en) * 2015-06-12 2017-01-04 联想(北京)有限公司 A kind of file acquisition method and baseboard management controller, basic input output system
CN105160001B (en) * 2015-09-09 2017-03-08 山东省计算中心(国家超级计算济南中心) A kind of linux system physical memory image file analysis method
CN105160001A (en) * 2015-09-09 2015-12-16 山东省计算中心(国家超级计算济南中心) Physical memory mirror image document analysis method of Linux system
CN105242981A (en) * 2015-10-30 2016-01-13 浪潮电子信息产业股份有限公司 Configuration method of Kdump and computer device
WO2017148271A1 (en) * 2016-03-04 2017-09-08 中兴通讯股份有限公司 Linux system reset processing method and device, and computer storage medium
CN105843705A (en) * 2016-03-22 2016-08-10 青岛海信移动通信技术股份有限公司 Mobile communication terminal and memory dumping method thereof
CN106339285A (en) * 2016-08-19 2017-01-18 浪潮电子信息产业股份有限公司 Analysis method for accidental restart of LINUX system
CN108073507A (en) * 2016-11-17 2018-05-25 联芯科技有限公司 A kind of processing method and processing device of Kernel Panic field data
CN106776090A (en) * 2016-11-29 2017-05-31 郑州云海信息技术有限公司 A kind of method for collecting information when RHEL operating systems are without response
CN107357684A (en) * 2017-07-07 2017-11-17 郑州云海信息技术有限公司 A kind of kernel failure method for restarting and device
CN107368384A (en) * 2017-07-21 2017-11-21 郑州云海信息技术有限公司 A kind of Linux server abnormal information dump system and method
CN107506638B (en) * 2017-08-09 2020-10-16 南京大学 Kernel control flow abnormity detection method based on hardware mechanism
CN107506638A (en) * 2017-08-09 2017-12-22 南京大学 A kind of kernel controlling stream method for detecting abnormality based on hardware mechanisms
CN108228260A (en) * 2018-01-02 2018-06-29 联想(北京)有限公司 Kernel switching method and electronic equipment
CN108334462A (en) * 2018-03-05 2018-07-27 山东超越数控电子股份有限公司 A kind of optical channel card implementation method based on milky way kylin operating system
CN108920215A (en) * 2018-07-18 2018-11-30 郑州云海信息技术有限公司 A method of passing through initramfs collection system log
CN109582542A (en) * 2018-12-04 2019-04-05 中国航空工业集团公司西安航空计算技术研究所 A kind of method of core of embedded system dump
CN109582542B (en) * 2018-12-04 2023-02-21 中国航空工业集团公司西安航空计算技术研究所 Method for dumping core of embedded system
CN109597677A (en) * 2018-12-07 2019-04-09 北京百度网讯科技有限公司 Method and apparatus for handling information
CN109597677B (en) * 2018-12-07 2020-05-22 北京百度网讯科技有限公司 Method and apparatus for processing information
US11392461B2 (en) 2018-12-07 2022-07-19 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing information
CN110083477A (en) * 2019-05-06 2019-08-02 深圳市智微智能科技开发有限公司 Guarantee the method for memory mapping when a kind of Kernel Panic
CN110262918A (en) * 2019-06-19 2019-09-20 深圳市网心科技有限公司 Process collapses analysis method and device, distributed apparatus and storage medium
CN110262918B (en) * 2019-06-19 2023-07-18 深圳市网心科技有限公司 Process crash analysis method and device, distributed equipment and storage medium
CN110673974A (en) * 2019-08-20 2020-01-10 中科创达软件股份有限公司 System debugging method and device
CN110647451A (en) * 2019-08-30 2020-01-03 深圳壹账通智能科技有限公司 Application program abnormity analysis method and generation method
CN111124488A (en) * 2019-12-11 2020-05-08 山东超越数控电子股份有限公司 Debian system transplanting method based on Loongson processor
CN113127263A (en) * 2020-01-15 2021-07-16 中移(苏州)软件技术有限公司 Kernel crash recovery method, device, equipment and storage medium
CN111459716A (en) * 2020-03-02 2020-07-28 天津众达智腾科技有限公司 Kernel backup loading mode based on domestic processor
US11556349B2 (en) 2020-03-04 2023-01-17 International Business Machines Corporation Booting a secondary operating system kernel with reclaimed primary kernel memory
CN112650610B (en) * 2020-12-11 2023-01-10 苏州浪潮智能科技有限公司 Linux system crash control method, system and medium
CN112650610A (en) * 2020-12-11 2021-04-13 苏州浪潮智能科技有限公司 Linux system crash control method, system and medium
CN112395137A (en) * 2021-01-21 2021-02-23 北京太一星晨信息技术有限公司 Linux kernel exception processing method, equipment and device
CN113326213A (en) * 2021-05-24 2021-08-31 北京计算机技术及应用研究所 Method for realizing address mapping in driver under Feiteng server platform
CN113326213B (en) * 2021-05-24 2023-07-28 北京计算机技术及应用研究所 Method for realizing address mapping in driver under Feiteng server platform
CN113434150A (en) * 2021-08-30 2021-09-24 麒麟软件有限公司 Linux kernel crash information positioning method
CN113434150B (en) * 2021-08-30 2021-12-17 麒麟软件有限公司 Linux kernel crash information positioning method
CN116775501A (en) * 2023-08-25 2023-09-19 荣耀终端有限公司 Software testing method, server, readable storage medium and chip system
CN116775501B (en) * 2023-08-25 2023-12-12 荣耀终端有限公司 Software testing method, server, readable storage medium and chip system
CN117931608A (en) * 2024-03-14 2024-04-26 麒麟软件有限公司 Method and device for counting file cache occupation in vmcore and storage medium
CN117931608B (en) * 2024-03-14 2024-07-05 麒麟软件有限公司 Method and device for counting file cache occupation in vmcore and storage medium

Also Published As

Publication number Publication date
CN102929747B (en) 2015-07-01

Similar Documents

Publication Publication Date Title
CN102929747A (en) Method for treating crash dump of Linux operation system based on loongson server
US7574627B2 (en) Memory dump method, memory dump program and computer system
US7516361B2 (en) Method for automatic checkpoint of system and application software
US9158628B2 (en) Bios failover update with service processor having direct serial peripheral interface (SPI) access
US20040172578A1 (en) Method and system of operating system recovery
CN104254840A (en) Memory dump and analysis in a computer system
TW201502764A (en) Specialized boot path for speeding up resume from sleep state
US20070288532A1 (en) Method of updating an executable file for a redundant system with old and new files assured
US8489933B2 (en) Data processing device and method for memory dump collection
CN110413432B (en) Information processing method, electronic equipment and storage medium
JP2005301639A (en) Method and program for handling os failure
US9575827B2 (en) Memory management program, memory management method, and memory management device
JP2007133544A (en) Failure information analysis method and its implementation device
CN103019706A (en) Method and device for processing startup item
CN114020340B (en) Server system and data processing method thereof
JP4759941B2 (en) Boot image providing system and method, boot node device, boot server device, and program
US8468388B2 (en) Restoring programs after operating system failure
JP6599725B2 (en) Information processing apparatus, log management method, and computer program
CN113806139A (en) Operating system recovery method, operating system recovery device, storage medium and computer program product
CN116679992A (en) Information processing method and device, electronic equipment and storage medium
CN109634782B (en) Method and device for detecting system robustness, storage medium and terminal
CN107168815B (en) Method for collecting hardware error information
US8667335B2 (en) Information processing apparatus and method for acquiring information for hung-up cause investigation
CN102455919A (en) Automatic optimization setting method for basic input output system(BIOS)
US11334419B1 (en) Information handling system fault analysis with remote remediation file system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant