WO2017124704A1 - Method and device for displaying log content - Google Patents

Method and device for displaying log content

Info

Publication number
WO2017124704A1
Authority
WO
WIPO (PCT)
Prior art keywords
error event
log file
log
error
management platform
Prior art date
Application number
PCT/CN2016/088920
Other languages
English (en)
French (fr)
Inventor
朱芳蕾
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017124704A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Definitions

  • This application relates to, but is not limited to, the field of cloud computing technology.
  • OpenStack cloud computing management platform also known as: OpenStack
  • IaaS Infrastructure as a Service
  • OpenStack relies on collaboration among multiple nodes and multiple components
  • the cause of a failure can usually be located only through the output of a command and the logs. OpenStack logs, however, are recorded separately for each node and each component, and once they reach a certain size they are compressed into gz archives. When an OpenStack operation fails, considerable expertise and solid Linux operating system skills are therefore needed to find the possible causes of the failure in logs scattered across physical nodes.
  • Take the relatively simple task of locating the cause of a virtual machine deployment failure as an example.
  • the usual practice is to use the nova show command to look up, by virtual machine identifier (ID), the name of the compute node where the virtual machine is located, and to use the nova hypervisor-list and nova hypervisor-show commands to find the Internet Protocol (IP) address of that compute node. If the nova show command indicates that the compute node where the virtual machine is located is not empty, the Secure Shell protocol (SSH for short) is used.
  • IP Internet Protocol
  • the present invention provides a method and a device for displaying log content, so as to solve the problem in the related art that finding the cause of an error in the log files of the cloud computing management platform is inefficient.
  • a method for displaying log content including:
  • the log content characterizing the error event in the log file associated with the error event is displayed.
  • the searching for the log file associated with the error event according to the preset rule and the trace path of the log file includes:
  • a log file of components associated with the type of component of the error event, and a log file of components associated with the type of error event are searched.
  • the searching for the log file associated with the error event according to the preset rule and the trace path of the log file includes:
  • a target log file that contains a predetermined error keyword among the specified log files is determined as the log file associated with the error event.
  • before the searching for the log file associated with the error event according to the preset rule and the trace path of the log file, the method further includes:
  • the preset rule includes an association rule between the multiple components and/or an association rule between the multiple error events and the component respectively.
  • the displaying, in the log file associated with the error event, a log content characterizing the error event including:
  • parsing the log file associated with the error event to obtain log content that includes the error cause of the error event and the corresponding solution, and displaying that log content; and/or,
  • the log content characterizing the error event in the log file associated with the error event is marked and displayed.
  • a display device for log content comprising:
  • an obtaining module configured to: obtain state information for identifying the state of each component on the cloud computing management platform, and generate a log file traceback path corresponding to each piece of the state information;
  • a search module configured to: when an error event occurs on the cloud computing management platform, search for the log file associated with the error event according to a preset rule and the trace path of the log file generated by the obtaining module;
  • a display module configured to: display the log content that represents the error event in the log file associated with the error event found by the search module.
  • the searching module includes:
  • the parsing unit is configured to: parse the error event to obtain a type of the component in which the error event occurs, and a type of the error event, wherein the type of the component of the cloud computing management platform comprises one or more of the following: a control node, a computing node, a network, a storage device, and the type of the error event includes one or more of the following: a service failure, a component failure;
  • a first search unit configured to: search for the log file of the component associated with the type of the component in which the error event occurred, as parsed by the parsing unit, and the log file of the component associated with the type of the error event parsed by the parsing unit.
  • the searching module includes:
  • An obtaining unit configured to: obtain a system time of the cloud computing management platform when the error event occurs;
  • a second search unit configured to: search for a specified log file whose timestamp is the same as the system time obtained by the obtaining unit, where the timestamp is the time at which the cloud computing management platform writes the log file;
  • a determining unit configured to: determine, in the specified log file searched by the second searching unit, a target log file containing a predetermined error keyword as a log file associated with the error event.
  • the device further includes:
  • a setting module configured to: before the search module searches for the log file associated with the error event according to the preset rule and the trace path of the log file, set an association rule between a plurality of components on the cloud computing management platform, and/or an association rule between a plurality of error events and the components on the cloud computing management platform; wherein the preset rule includes the association rule between the plurality of components and/or the association rule between the plurality of error events and the components.
  • the display module includes:
  • a first display unit configured to: parse the log file associated with the error event found by the search module to obtain log content that includes the error cause of the error event and the corresponding solution, and display that log content; and/or,
  • a second display unit configured to: mark and display the log content that represents the error event in the log file associated with the error event found by the search module.
  • the method and device for displaying log content provided by the embodiments of the present invention obtain state information for identifying the state of each component on the cloud computing management platform and generate a log file traceback path corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log file associated with the error event is searched for according to the preset rule and the trace path of the log file, and the log content representing the error event in that log file is then displayed.
  • the embodiments of the invention address the low efficiency of finding the cause of an error in the log files of the cloud computing management platform in the related art: by associating error events with log files, they improve the efficiency of finding the cause of an error when one occurs on the cloud computing management platform.
  • FIG. 1 is a flowchart of a method for displaying log content according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another method for displaying log content according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of still another method for displaying log content according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a display device for log content according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another display device for displaying log content according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of still another display device for log content according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of still another display device for log content according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a display device for log content according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an OpenStack multi-node deployment in a log content display apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a module interaction in a display device for log content according to an embodiment of the present invention.
  • FIG. 11 is a flowchart of performing a detection operation by a display device for log content according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of association analysis of a log analysis module in a display device for displaying log content according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for displaying log content according to an embodiment of the present invention. As shown in FIG. 1 , the method for displaying the log content provided in this embodiment includes the following steps, that is, steps 110 to 130:
  • Step 110 Obtain status information for identifying a status of each component on the cloud computing management platform, and generate a traceback path of the log file corresponding to each status information.
  • the cloud computing management platform OpenStack is deployed on one or more nodes. The nodes include control nodes and computing nodes, and a control node can also be placed under High Availability (HA for short) control.
  • control nodes are further divided into active nodes and standby nodes. The nodes are connected through a network, and each type of information of the cloud computing management platform is saved by storage devices; all of the above are components of the cloud computing management platform.
  • the activity of each component can include creation, termination, events associated with other components, and so on. Components such as nodes, networks, storage devices, and disk array connections generate corresponding state information as well as log files, and these log files can be saved locally on the corresponding component or uploaded to an associated device.
  • the log files concerned are those generated or saved by each component on the cloud computing management platform. Because the state information of a component is associated with other components or with the state of the component itself, a traceback path of the log file can be derived from the state information of the component. The path represents the association from the component's state information to other associated components or to other structures within the component.
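  • As an illustration only, the traceback path described for step 110 can be pictured as a walk over component associations. The sketch below is not taken from the patent: the association map, the component names in it, and the function `build_trace_path` are all hypothetical, and a breadth-first walk is just one plausible ordering.

```python
from collections import deque

# Hypothetical association map: component -> components its state depends on.
ASSOCIATIONS = {
    "nova-compute": ["libvirt", "neutron-agent"],
    "libvirt": ["qemu"],
    "neutron-agent": [],
    "qemu": [],
}


def build_trace_path(component, associations):
    """Return components in the order their logs would be traced (breadth-first)."""
    path = []
    seen = {component}
    queue = deque([component])
    while queue:
        current = queue.popleft()
        path.append(current)
        for neighbour in associations.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return path


trace_path = build_trace_path("nova-compute", ASSOCIATIONS)
```

  • Each entry in `trace_path` stands for a component whose log file would be visited in turn; a real system would attach log file locations to each entry.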
  • Step 120 When an error event occurs on the cloud computing management platform, search for a log file associated with the error event according to the preset rule and the trace path of the log file.
  • an error event, such as a component failure or a connection timeout, may occur in a component of the cloud computing management platform.
  • the component may actively report the error event or an alarm message, and the cloud computing management platform may also actively monitor the working state of each component at a certain period.
  • when an error event occurs on the cloud computing management platform, the cause of the error needs to be found and the fault repaired as soon as possible, so as to reduce the problems caused by the fault and improve the user experience.
  • this embodiment searches the log files for the log file associated with the error event, and finds the cause of the error, based on the preset rules in the cloud computing management platform.
  • this greatly reduces the time needed to obtain the cause of the error event compared with manual analysis; in an optional implementation manner of the embodiment of the present invention, the preset rule can include the trace path of the log file.
  • Step 130 Display the log content representing the error event in the log file associated with the error event.
  • when the log content representing the error event in the log file is displayed, some log files or duplicate log content may be left out, for example, and the log content may be analyzed to directly generate the error cause of the error event and the corresponding solution, so that the user can find the cause of the error in the shortest time.
  • the method for displaying log content provided by this embodiment obtains the state information for identifying the state of each component on the cloud computing management platform and generates a traceback path of the log file corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log file associated with the error event is searched for according to a preset rule, and the log content representing the error event in that log file is then displayed.
  • this embodiment addresses the low efficiency in the related art of finding the cause of an error in the log files of the cloud computing management platform, and associates the error event with the log file, thereby improving the efficiency of finding the cause of an error when one occurs on the cloud computing management platform.
  • the search for the log file associated with the error event according to the preset rule and the trace path of the log file can be implemented, for example, in the following two ways: a strong association search mode and a weak association search mode.
  • the two search modes can be applied separately in different scenarios, or applied together to improve the accuracy of the search.
  • Step 120 may include the following steps, namely, step 121 to step 122:
  • Step 121 Parse the error event to obtain the type of the component in which the error event occurs, and the type of the error event, where the type of the component of the cloud computing management platform includes one or more of the following: a control node, a computing node, a network, a storage device,
  • the type of error event includes one or more of the following: service failure, component failure.
  • the error event can be parsed to obtain the type of the error event and the type of the component in which it occurred, that is, the type of the component object. In actual applications, the error object keyword is detected from the error event, and the type of the component object is first determined by strong association.
  • for example, it is determined which component is involved, such as the control node, the computing node, the network, or the storage device. After the component type has been parsed, the tracing can go further, for example by parsing whether the object on the computing node is a virtual machine, an image, or the like, or whether a control node is an active node or a standby node.
  • the type of the error event can also be parsed, that is, whether it is a service failure or a component failure.
  • Step 122 searching for a log file of components associated with the type of component of the error event, and a log file of the component associated with the type of error event.
  • for example, if the error object is a virtual machine, all compute nodes need to be searched. If the error event message relates to the underlying libvirt, the object ID is converted to the libvirt instance ID and the libvirt and qemu logs are queried. If the error event message indicates that a network interface binding failed, the neutron agent status is queried; if that status is incorrect, the corresponding rule may be used to find the possible cause, and the tracing continues step by step until the final possible cause is found.
  • if the error object is a cloud hard disk, it is determined whether the log file of the cloud hard disk records an error returned by the disk array. If it does, the error information is queried according to the error code, the cloud hard disk ID can be converted into the virtual disk ID on the disk array, and the related status is queried on the disk array. If the error object is of another type, the log is traced back according to the corresponding business logic.
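  • The strong association search of steps 121 to 122 can be sketched roughly as a lookup keyed by the parsed component type and error type. This is a minimal sketch under stated assumptions: the `ErrorEvent` class, the two rule tables, and every log path in them are hypothetical placeholders, not OpenStack defaults and not the patent's own data structures.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    component_type: str  # e.g. "compute_node", "network", "storage"
    error_type: str      # "service_failure" or "component_failure"


# Hypothetical rule tables; the paths are placeholders for illustration.
LOGS_BY_COMPONENT_TYPE = {
    "compute_node": ["compute/nova-compute.log", "compute/libvirtd.log"],
    "network": ["network/neutron-agent.log"],
    "storage": ["storage/cinder-volume.log"],
}
LOGS_BY_ERROR_TYPE = {
    "service_failure": ["control/nova-api.log"],
    "component_failure": ["compute/libvirtd.log"],
}


def strong_association_search(event):
    """Collect log files associated with the component type and the error type."""
    candidates = (
        LOGS_BY_COMPONENT_TYPE.get(event.component_type, [])
        + LOGS_BY_ERROR_TYPE.get(event.error_type, [])
    )
    files = []
    for path in candidates:
        if path not in files:  # keep first occurrence, preserve order
            files.append(path)
    return files


associated = strong_association_search(
    ErrorEvent(component_type="compute_node", error_type="component_failure")
)
```

  • A fuller version would continue tracing from each returned file, for example converting an object ID to a libvirt instance ID as described above; that step is omitted here.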
  • Step 120 may include the following steps: Step 123 to Step 125:
  • Step 123 Obtain a system time of the cloud computing management platform when an error event occurs.
  • the system time can be the time of the control node of the cloud computing management platform, or the system time of the component.
  • Step 124 Search for a specified log file with the same timestamp as the system time, where the timestamp is the time when the cloud computing management platform writes the log file.
  • the timestamp may also be the time when the cloud computing management platform creates or generates a log file according to the principle of synchronizing with the occurrence of the error event.
  • Step 125 Determine the target log file that contains a predetermined error keyword among the specified log files as the log file associated with the error event.
  • the predetermined error keyword may be a word such as error or fail in the component log file. Log files are searched and extracted by time and by the predetermined words, and the log content related to the cause of the failure is then found in the extracted log files.
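  • The weak association search of steps 123 to 125 might look roughly like the sketch below. It assumes log files are already loaded as dictionaries with a write timestamp and a list of lines, and it treats "the same time" as falling within a small window around the event time; the window, the record layout, and the sample data are assumptions made for illustration.

```python
from datetime import datetime, timedelta

ERROR_KEYWORDS = ("error", "fail")


def weak_association_search(log_files, event_time, window=timedelta(minutes=1)):
    """Return names of log files written near event_time that contain an error keyword."""
    matches = []
    for log in log_files:
        same_time = abs(log["written_at"] - event_time) <= window
        has_keyword = any(
            keyword in line.lower()
            for line in log["lines"]
            for keyword in ERROR_KEYWORDS
        )
        if same_time and has_keyword:
            matches.append(log["name"])
    return matches


# Sample data, invented for the example.
event_time = datetime(2016, 7, 6, 12, 0, 0)
sample_logs = [
    {"name": "nova-compute.log", "written_at": datetime(2016, 7, 6, 12, 0, 20),
     "lines": ["ERROR No valid host was found"]},
    {"name": "neutron.log", "written_at": datetime(2016, 7, 6, 12, 0, 30),
     "lines": ["port bound"]},
    {"name": "cinder.log", "written_at": datetime(2016, 7, 6, 9, 0, 0),
     "lines": ["attach failed"]},
]
target_logs = weak_association_search(sample_logs, event_time)
```

  • Only the first file matches: the second has no error keyword and the third was written hours before the event.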
  • before searching for the log file associated with the error event according to the preset rule and the trace path of the log file, that is, before step 120, the method may further include: setting an association rule between multiple components on the cloud computing management platform; and/or setting an association rule between multiple error events and the components on the cloud computing management platform. The preset rule in the embodiment of the present invention may include one or more of the following: the association rule between multiple components, and the association rule between multiple error events and components.
  • setting these association rules in advance provides the rules used to search for associated log files. Then, once the type of the error event and the type of the component in which the error event occurred have been parsed, the search can be performed according to the preset rule.
  • the association rules can include:
  • setting the association of each service node in OpenStack to its own node log: when a service is abnormal, the cause may be a failure of the service node itself;
  • setting the association of the physical machine to its network, central processing unit (CPU for short), memory, and so on: an abnormality involving the physical machine itself may be caused by a failure of the network, the CPU, the memory, etc.;
  • setting the association of cloud-disk-related failures to the disk array service port: a failure related to the cloud disk may be related to the state of the disk array service port;
  • setting the association of system errors to the system's residual error resources: residual error resources may reflect an abnormality of the system over a certain period of time and may also be the cause of the failure of the current operation;
  • setting the association of components to configuration items: a component failure may be related to the configuration file, so the corresponding configuration items are checked;
  • setting the association of components to received commands: an error event may be related to a command executed by an OpenStack component, and the OpenStack state is associated with the command that was sent.
  • when searching for the associated log file, the log files can be traced in the order of the associations above.
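  • One way to picture tracing "in the order of association" is an ordered rule list that is checked against detected status. The sketch below is only illustrative: the short rule keys, the hint texts, and the status dictionary format are hypothetical, while the ordering follows the list of associations given above.

```python
# Ordered association rules; keys are hypothetical labels for the rules above.
ASSOCIATION_ORDER = [
    ("service_node", "the service node itself may have failed"),
    ("physical_machine", "network, CPU, or memory of the physical machine may be faulty"),
    ("disk_array_port", "the disk array service port state may be the cause"),
    ("residual_errors", "residual error resources may point to an earlier abnormality"),
    ("configuration", "the relevant configuration items should be checked"),
    ("commands", "a command executed by a component may be involved"),
]


def trace_in_order(status):
    """Return (rule, hint) pairs for abnormal associations, in rule order."""
    return [
        (name, hint)
        for name, hint in ASSOCIATION_ORDER
        if status.get(name) == "abnormal"
    ]


findings = trace_in_order(
    {"service_node": "ok", "configuration": "abnormal", "disk_array_port": "abnormal"}
)
finding_names = [name for name, _ in findings]
```

  • The result lists the disk array port before the configuration check regardless of the order in which the statuses were supplied, because the rule list fixes the tracing order.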
  • preset rules for the parameters used to analyze log files may also be set, for example: the identifier of each component object, the log files that characterize the error event, the maximum number of logs to query, whether to query archived log files, whether to display only the meaningful part of the log, and so on.
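  • The analysis parameters listed above could be grouped into a small configuration object. The following is a sketch with hypothetical field names and defaults; it assumes archived logs can be recognized by a `.gz` suffix, which matches the gz archives mentioned earlier but is otherwise a simplification.

```python
from dataclasses import dataclass


@dataclass
class AnalysisConfig:
    object_keyword: str             # e.g. an object ID, req-id, or error message
    max_logs: int = 100             # maximum number of logs to query (hypothetical default)
    include_archived: bool = False  # whether to query archived (.gz) log files
    meaningful_only: bool = True    # whether to hide fixed-format parts when displaying


def select_log_files(candidates, config):
    """Apply the archive filter and the maximum count to an ordered candidate list."""
    selected = [
        name
        for name in candidates
        if config.include_archived or not name.endswith(".gz")
    ]
    return selected[: config.max_logs]


config = AnalysisConfig(object_keyword="vm-1234", max_logs=2)
files = select_log_files(
    ["nova-compute.log", "nova-compute.log.1.gz", "nova-api.log", "neutron.log"],
    config,
)
```

  • Skipping archives and lowering `max_logs` both shorten the query, which fits the case where the failure has only just occurred.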
  • displaying the log content of the error event in the log file associated with the error event may be implemented in the following two ways:
  • Mode 1: parse the log file associated with the error event to obtain log content that includes the error cause of the error event and the corresponding solution, and display that log content.
  • in this mode the log file is processed before it is displayed. The process may include parsing the log file according to parsing rules, where each parsing rule includes a description of the error cause of an event and its solution.
  • for example, if the log content for a virtual machine creation failure is "No valid host was found" and the error cause derived from the log is that the computing node service is unavailable, the result can be displayed as: "Because the computing node service is unavailable, the virtual machine failed to be created."
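  • Mode 1 can be sketched as a table of parsing rules matched against log lines. The rule format and the function name below are assumptions; the "No valid host was found" pattern and its cause come from the example above, while the solution text is a hypothetical placeholder because the source does not give one.

```python
# Each rule maps a log pattern to an error cause and a suggested solution.
PARSING_RULES = [
    {
        "pattern": "No valid host was found",
        "cause": "the computing node service is unavailable",
        "solution": "check the state of the computing node service",  # hypothetical
    },
]


def explain(log_line):
    """Return (cause, solution) for the first matching rule, or None if no rule matches."""
    for rule in PARSING_RULES:
        if rule["pattern"] in log_line:
            return rule["cause"], rule["solution"]
    return None


explained = explain("ERROR nova.scheduler: No valid host was found")
unexplained = explain("INFO instance spawned successfully")
```

  • When `explain` returns `None`, the display would fall back to showing the log itself, which corresponds to the second display mode.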
  • Mode 2: mark and display the log content indicating the error event in the log file associated with the error event.
  • in this mode the log file can be displayed directly. For example, the similarity between each log and the previous log is determined; if it exceeds a preset threshold, such as 90%, the log is considered a duplicate, the repeated part is displayed only once, and the rest is replaced with "". Logs containing words such as error or fail are displayed in red, indicating that special attention is required; other logs, such as operation steps and status, are displayed in orange; duplicate logs are displayed in yellow, indicating that they can be ignored.
  • fixed-format parts such as the req-id can be truncated and not displayed so that only the meaningful part of the log is shown; alternatively, the complete log can be displayed, that is, duplicate logs and fixed-format parts such as the req-id are displayed as they are.
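  • The marking in Mode 2 can be approximated with a similarity ratio between consecutive log lines. This sketch uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, which the source does not specify, and it assumes that the red marking for error keywords takes precedence over the duplicate marking. It only assigns colors; replacing the repeated part of a duplicate is left out.

```python
from difflib import SequenceMatcher

ERROR_KEYWORDS = ("error", "fail")
DUPLICATE_THRESHOLD = 0.9  # the 90% example threshold from the text


def classify_lines(lines):
    """Assign a display color to each log line: red, yellow, or orange."""
    colors = []
    previous = None
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in ERROR_KEYWORDS):
            color = "red"      # needs special attention
        elif (previous is not None
              and SequenceMatcher(None, previous, line).ratio() > DUPLICATE_THRESHOLD):
            color = "yellow"   # near-duplicate of the previous line, can be ignored
        else:
            color = "orange"   # ordinary operation step or status
        colors.append(color)
        previous = line
    return colors


colors = classify_lines([
    "connect to host A ok",
    "connect to host B ok",
    "ERROR: No valid host was found",
])
```

  • The second line differs from the first by a single character, so its similarity ratio is above the threshold and it is marked as a duplicate.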
  • the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods of the various embodiments of the present invention.
  • a display device for log content is also provided. The device is used to implement the foregoing embodiments and optional implementation manners, and what has already been described is not repeated here.
  • as used below, the term “module” may refer to a combination of software and/or hardware that implements a predetermined function.
  • FIG. 4 is a schematic structural diagram of a display device for log content according to an embodiment of the present invention.
  • the display device of the log content provided by this embodiment includes: an obtaining module 10, a searching module 20, and a display module 30.
  • the obtaining module 10 is configured to: obtain state information for identifying the state of each component on the cloud computing management platform, and generate a log file traceback path corresponding to each piece of state information;
  • the search module 20 is configured to: when an error event occurs in the cloud computing management platform, search for a log file associated with the error event according to the preset rule and the trace path of the log file acquired by the obtaining module 10;
  • the display module 30 is configured to display the log content representing the error event in the log file associated with the error event searched by the search module 20.
  • FIG. 5 is a schematic structural diagram of another display device for log content according to an embodiment of the present invention.
  • the search module 20 in the display device of the log content provided by the embodiment may include a parsing unit 21 and a first search unit 22.
  • the parsing unit 21 is configured to: parse the error event to obtain the type of the component in which the error event occurs, and the type of the error event, wherein the type of the component of the cloud computing management platform includes one or more of the following: a control node, a computing node , network, storage device, the type of error event includes one or more of the following: service failure, component failure;
  • the first search unit 22 is configured to: search for the log file of the component associated with the type of the component of the error event parsed by the parsing unit 21, and the log file of the component associated with the type of the error event parsed by the parsing unit 21.
  • FIG. 6 is a schematic structural diagram of another display device for displaying log content according to an embodiment of the present invention.
  • the search module 20 in the display device of the log content provided by the embodiment may include: an obtaining unit 23, a second searching unit 24, and a determining unit 25, wherein
  • the obtaining unit 23 is configured to: acquire a system time of the cloud computing management platform when an error event occurs;
  • the second search unit 24 is configured to: search for a specified log file whose timestamp is the same as the system time acquired by the obtaining unit 23, where the timestamp is the time at which the cloud computing management platform writes the log file;
  • the determining unit 25 is configured to: determine the target log file that contains a predetermined error keyword, among the specified log files found by the second search unit 24, as the log file associated with the error event.
  • FIG. 7 is a schematic structural diagram of another display device for displaying log content according to an embodiment of the present invention.
  • the display device of the log content provided in this embodiment may further include:
  • the setting module 40 is configured to: before the search module 20 searches for the log file associated with the error event according to the preset rule and the trace path of the log file, set an association rule between multiple components on the cloud computing management platform; and/or set an association rule between multiple error events and the components on the cloud computing management platform; wherein the preset rule includes the association rule between the multiple components and/or the association rule between the multiple error events and the components.
  • the display module 30 in the display device of the log content provided by the embodiment may include: a first display unit 31 and a second display unit 32, wherein
  • the first display unit 31 is configured to: parse the log file associated with the error event found by the search module 20 to obtain log content that includes the error cause of the error event and the corresponding solution, and display that log content;
  • the second display unit 32 is configured to: mark and display the log content indicating the error event in the log file associated with the error event found by the search module 20.
  • the first display unit 31 and the second display unit 32 may be simultaneously disposed in the display module 30, or only one of them may be disposed in the display module 30.
  • FIG. 9 is a schematic diagram of an OpenStack multi-node deployment in a log content display apparatus according to an embodiment of the present invention.
  • OpenStack multi-node deployment can include multiple control nodes. Network, storage and other components are distributed on each control node. Each control node is added to HA control (divided into active node and standby node). There are multiple compute nodes. The analysis entry of the device in this embodiment is the IP address of the node where the keystone is located.
  • FIG. 10 is a schematic diagram of a module interaction in a display device for log content according to an embodiment of the present invention.
  • the device includes a parameter configuration module 51, a state detection module 52, a log analysis module 53, and a log display module 54.
  • the parameter configuration module 51 is a precondition for the operation of all other modules; the output of the state detection module 52 is provided to the log analysis module 53 as a reference factor for further queries; and the output of the log analysis module 53 is provided to the log display module 54 for optimized display output.
  • the functions of the above modules are as follows:
  • the parameter configuration module 51 is configured to: set some parameters of the log analysis, and set the parameters.
  • the content can include the object keyword to be queried, the maximum number of logs to be queried, whether to query the archived log file, whether to display only the meaningful part of the log, etc.
  • the status detecting module 52 is configured to: detect the status of each service, network, magnetic array connection, etc. in the OpenStack, and provide the result to the log analysis module 53 as a reference factor of the log trace path; according to the needs of the log analysis module, The configuration item is checked, and the corresponding OpenStack command is executed.
  • the log analysis module 53 and the built-in business logic association rule are set to: according to the error object keyword, combined with the state detection result, first determine the object type (virtual machine, cloud hard disk, mirror image, or other) by strong association, and convert the ID.
  • the keyword is the internal ID corresponding to each component, and then according to the ID keyword and the same request ID, the hierarchical search query is performed according to the specific business logic; secondly, the weakly associated manner is used to query the component logs in the same time period. Error, fail, etc., extract the part of the log that may be related to the reason for the failure.
  • The log display module 54 has built-in log content parsing rules and is configured as follows. If a parsing rule matches the log content, the parsed error cause and the suggested solution are displayed. Otherwise the result log is displayed directly in optimized form: logs containing the word fail or error are shown in red, other logs such as operation steps and states are shown in orange, and a log that largely repeats the previous one can be compressed and shown in yellow, with the identical part replaced by "…".
  • Optionally, the parameter configuration module 51 may also set the following: the IP of the OpenStack keystone node; the object keyword, for example the ID of the object whose OpenStack operation failed, the req-id of a request, a piece of error message, or a time point; the maximum number of log entries to query (if the failure has just occurred, a smaller number speeds up the query); whether to query archived log files (if the failure has just occurred, archived files can be skipped to speed up the query); and whether to display only the meaningful part of each log, which omits the fixed-format parts of OpenStack logs such as the req-id and replaces repeated parts with an ellipsis so that the result is easier to read.
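The parameters listed above can be grouped into a single configuration object. The following is a minimal sketch only: the class name `QueryConfig`, the field names, and the default values are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class QueryConfig:
    """Hypothetical container for the parameters the configuration module collects."""

    keystone_ip: str                # analysis entry point: IP of the keystone node
    keyword: str                    # object ID, req-id, error text, or time point
    max_entries: int = 1000         # assumed default; lower it for a recent failure
    include_archived: bool = False  # also search compressed gz archives
    meaningful_only: bool = True    # hide fixed-format parts, collapse repeats

    def __post_init__(self) -> None:
        # Reject values that would make the later query meaningless.
        if self.max_entries <= 0:
            raise ValueError("max_entries must be positive")
        if not self.keyword.strip():
            raise ValueError("keyword must not be empty")


# A failure that has just occurred: few entries, no archived files.
recent = QueryConfig(keystone_ip="192.0.2.10", keyword="req-1234", max_entries=200)
```

Keeping validation in `__post_init__` means every later module can rely on the values without re-checking them.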
  • FIG. 11 is a flowchart of performing a detection operation by a display device for log content according to an embodiment of the present invention.
  • As shown in FIG. 11, the detection process may include the following steps, that is, steps 901 to 906:
  • Step 901: obtain the endpoint service IP of each component on OpenStack.
  • Step 902: obtain the IPs of all compute nodes on OpenStack.
  • Step 903: obtain the state of all services on OpenStack.
  • Step 904: obtain the state of the physical machine network, CPU, memory, and so on, on OpenStack.
  • Step 905: obtain the state of the disk array service port on OpenStack.
  • Step 906: check whether the OpenStack system has residual error resources.
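  • The state detection module 52 in this embodiment checks the overall running state of OpenStack. Any abnormality it finds can serve as a reference factor for the next log query path.
  • Optionally, the state detection content may include:
  • obtaining the endpoint service IP of each component, to decide on which node the logs of each component are fetched;
  • obtaining the IPs of all compute nodes, since a failure involving a virtual machine may require querying every compute node;
  • obtaining the node IPs of the two-node HA pair, since a fault related to an active/standby switchover requires querying the logs of both nodes;
  • obtaining the state of all services on OpenStack, since an abnormal service may be one of the causes of the failure;
  • obtaining the state of the physical machine network, CPU, memory, and so on, since an abnormality of the physical machine itself may be one of the causes of the failure;
  • obtaining the state of the disk array service port, since failures related to cloud disks may be related to it;
  • checking whether the OpenStack system has residual error resources, which may reflect abnormal behavior of the system over some period and may also be one of the causes of the current failure;
  • checking whether configuration items are correct: if the log analysis result may be related to a configuration file, the corresponding configuration items are checked;
  • executing OpenStack commands: if the log analysis result may be related to some OpenStack state, the corresponding OpenStack command is executed and its result obtained.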
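One way to picture the state detection step is as a set of independent probes whose abnormal results become hints for the log query path. The sketch below is an assumption-laden illustration: `run_checks`, the probe names, and the stubbed results are hypothetical, and no real OpenStack call is made.

```python
from typing import Callable, Dict, List

# Each probe returns True when the checked item is healthy.
Probe = Callable[[], bool]


def run_checks(probes: Dict[str, Probe]) -> List[str]:
    """Run every probe and return the names of those reporting an abnormal state."""
    abnormal = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a probe that cannot run is treated as abnormal
        if not healthy:
            abnormal.append(name)
    return abnormal


# Stub probes standing in for the real checks (hypothetical results).
probes = {
    "service_state": lambda: True,
    "host_network_cpu_memory": lambda: True,
    "disk_array_service_port": lambda: False,
    "residual_error_resources": lambda: True,
}
hints = run_checks(probes)  # names of abnormal items, passed on to log analysis
```

Injecting the probes as callables keeps the collection logic testable without a running platform.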
  • FIG. 12 is a schematic diagram of association analysis of a log analysis module in a log content display apparatus according to an embodiment of the present invention.
  • As shown in FIG. 12, the log analysis module 53 of this embodiment has built-in business logic association rules. The rules describe relationships between objects and between errors and objects, for example:
  • a virtual machine may be related to libvirt, to the nova-compute service state, to the network, and to storage;
  • a refused connection may be related to the service state and to the maximum number of database connections;
  • a cloud disk may be related to the disk array connection state, to a disk array error code, and to the local device path;
  • a network interface binding failure may be related to the neutron agent state;
  • the neutron agent state may be related to the configuration file and to the number of neutron server instances, and so on.
  • The built-in business logic association rules also include the parsing of some error return codes, such as disk array error codes.
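The association rules above form a small directed graph, and level-by-level tracing amounts to walking it outward from the failed object or error. The following sketch assumes a plain dictionary representation and a breadth-first walk; the patent does not specify a data structure or traversal order, so `RULES` and `trace_order` are illustrative only.

```python
from collections import deque
from typing import Dict, List

# Association rules taken from the examples in the text, as an adjacency list.
RULES: Dict[str, List[str]] = {
    "virtual machine": ["libvirt", "nova-compute service state", "network", "storage"],
    "connection refused": ["service state", "database max connections"],
    "cloud disk": ["disk array connection state", "disk array error code",
                   "local device path"],
    "network interface binding failure": ["neutron agent state"],
    "neutron agent state": ["configuration file", "neutron server instance count"],
}


def trace_order(start: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return the related items in the order a level-by-level trace visits them."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for related in rules.get(node, []):
            if related not in seen:  # guard against cycles in the rule set
                seen.add(related)
                order.append(related)
                queue.append(related)
    return order
```

For example, `trace_order("network interface binding failure", RULES)` first yields the neutron agent state and then the two items that state depends on.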
  • As shown in FIG. 12, the log analysis module 53 of this embodiment has two types of association analysis, which can be used at the same time.
  • The first type is strong association, which associates and extracts logs by keyword and specific business logic. For example, given a virtual machine ID, the corresponding libvirt instance name is looked up, and the libvirt instance name is then used to query the libvirt log.
  • The second type is weak association, which associates logs by time point and general logic. For example, when a virtual machine creation failure occurs, the logs of all components at the same time point that contain words such as error, fail, exception, or traceback are extracted together, because an error in another component may have caused the virtual machine creation to fail.
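Weak association can be sketched as a filter over already-parsed log entries: keep those close in time to the error event that also contain one of the error words. The 60-second window, the tuple layout, and the sample entries below are assumptions for illustration; the text only says "the same time point".

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Entry = Tuple[str, datetime, str]  # (component, timestamp, message)

ERROR_WORDS = ("error", "fail", "exception", "traceback")


def weak_associate(entries: Iterable[Entry], event_time: datetime,
                   window: timedelta = timedelta(seconds=60)) -> List[Entry]:
    """Keep entries near the event time whose message contains an error word."""
    hits = []
    for component, timestamp, message in entries:
        near = abs(timestamp - event_time) <= window
        flagged = any(word in message.lower() for word in ERROR_WORDS)
        if near and flagged:
            hits.append((component, timestamp, message))
    return hits


# Hypothetical sample data; the component names are only examples.
t0 = datetime(2016, 1, 18, 12, 0, 0)
sample = [
    ("nova-compute", t0 + timedelta(seconds=5), "ERROR libvirt connection lost"),
    ("neutron-agent", t0 - timedelta(seconds=30), "port binding FAILED"),
    ("nova-api", t0 + timedelta(seconds=10), "request accepted"),
    ("cinder-volume", t0 + timedelta(hours=2), "error talking to array"),
]
related = weak_associate(sample, t0)
```

The match is case-insensitive because components do not agree on how they capitalize severity words.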
  • When using strong association analysis, the log analysis module 53 of this embodiment first obtains the corresponding request ID from the logs according to the object ID and adds it to the list of query keywords. It then determines the object type and searches level by level according to the built-in business logic association rules. For example, if the failed object is a virtual machine, all compute nodes need to be searched. If the error event points to the underlying libvirt, the object ID is converted into the libvirt instance id, and the libvirt and qemu logs are queried. If the error event is a network interface binding failure, the neutron agent state is queried; if that state is incorrect, the possible causes are looked up according to the corresponding rules, and the trace continues level by level until the final possible cause is found.
  • If the failed object is a cloud disk, the module determines whether the error was returned by the disk array. If so, the error message is looked up by error code and, if necessary, the cloud disk ID can be converted into the virtual disk ID on the disk array so that the related state can be queried on the disk array. Objects of other types are traced in the same way according to the corresponding business logic.
  • Optionally, when no explicit object ID is available, the input to the log analysis module 53 may instead be the req-id of a request, a piece of error message, or a time point. In that case the module mainly fetches the logs with the same req-id and the logs of the same time period. When strong association analysis reaches no clear conclusion, weak association analysis continues and the related logs are output for manual analysis.
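The first stage of strong association, expanding the keyword list with the request IDs that appear alongside the object ID, can be sketched as below. The regular expression for request IDs and the sample log lines are assumptions; real log formats vary, and the later business-logic steps (ID conversion, per-component lookups) are not modeled here.

```python
import re
from typing import Iterable, List, Set

# Assumed shape of a request ID: "req-" followed by hex digits and dashes.
REQ_ID = re.compile(r"req-[0-9a-f-]+")


def expand_keywords(lines: Iterable[str], object_id: str) -> Set[str]:
    """Collect the object ID plus every request ID seen on a line mentioning it."""
    keywords = {object_id}
    for line in lines:
        if object_id in line:
            keywords.update(REQ_ID.findall(line))
    return keywords


def strong_associate(lines: Iterable[str], object_id: str) -> List[str]:
    """Return every line that mentions the object ID or a related request ID."""
    lines = list(lines)
    keywords = expand_keywords(lines, object_id)
    return [line for line in lines if any(key in line for key in keywords)]


# Hypothetical log lines for illustration.
sample_lines = [
    "nova-api [req-aaaa1111] create instance vm-42",
    "nova-scheduler [req-aaaa1111] No valid host was found",
    "neutron [req-bbbb2222] port updated",
]
```

Here the scheduler line never mentions `vm-42`, yet it is still returned because it shares the request ID found on the first line.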
  • Based on the input of the parameter configuration module, the log analysis module 53 of this embodiment can decide how many log entries to fetch; if the failed operation has just occurred, a smaller number can be configured to speed up the query. Likewise, if the input indicates that archived logs need to be checked, the query runs over all compressed, archived gz log files.
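The entry limit and the archived-log switch can be illustrated with a small search routine that reads plain files directly and gz archives through Python's standard `gzip` module. The function names, the substring match, and the decision to detect archives by the `.gz` suffix are assumptions for this sketch.

```python
import gzip
from pathlib import Path
from typing import Iterable, Iterator, List


def read_lines(path: Path) -> Iterator[str]:
    """Yield the lines of a log file, transparently handling gz archives."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def search_logs(paths: Iterable[Path], keyword: str,
                max_entries: int, include_archived: bool) -> List[str]:
    """Return up to max_entries matching lines, skipping archives unless asked."""
    hits: List[str] = []
    for path in paths:
        if path.suffix == ".gz" and not include_archived:
            continue  # a recent failure is unlikely to be in an archive
        for line in read_lines(path):
            if keyword in line:
                hits.append(line)
                if len(hits) >= max_entries:
                    return hits
    return hits
```

Stopping as soon as the limit is reached is what makes a small `max_entries` speed up the query.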
  • Optionally, the log display module 54 has built-in log content parsing rules, which describe the causes of and solutions to some well-defined errors. For example, if the error message of a failed virtual machine creation is "No valid host was found" and log analysis finds that the cause is an unavailable compute node service, the displayed result is: the virtual machine creation failed because the compute node service is unavailable.
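A parsing rule of this kind can be modeled as a lookup from a known error text to a cause and a suggestion. In the sketch below only the error text and the cause come from the example above; the `Diagnosis` type, the rule table layout, and the suggestion wording are placeholders invented for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Diagnosis(NamedTuple):
    cause: str
    suggestion: str


# Rule table: (text to look for, diagnosis). The suggestion is a placeholder.
PARSE_RULES: List[Tuple[str, Diagnosis]] = [
    ("No valid host was found",
     Diagnosis(cause="compute node service is unavailable",
               suggestion="check the compute node services (placeholder advice)")),
]


def diagnose(log_text: str) -> Optional[Diagnosis]:
    """Return the first matching diagnosis, or None when no rule applies."""
    for pattern, diagnosis in PARSE_RULES:
        if pattern in log_text:
            return diagnosis
    return None


result = diagnose("ERROR nova.scheduler: No valid host was found.")
```

Returning `None` lets the caller fall back to displaying the raw result log when no rule applies.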
  • If no well-defined error cause matches, the result log output by the log analysis module is displayed directly in optimized form. If the similarity between a log and the previous log exceeds 90%, it is treated as a duplicate and the repeated part is shown as "…". Logs containing words such as error or fail are shown in red, indicating that they need special attention; other logs such as operation steps and states are shown in orange; duplicate logs are shown in yellow, indicating that they can generally be ignored.
  • The fixed-format part of a log, such as the req-id, can be truncated so that only the meaningful part is displayed. Depending on the input of the parameter configuration module, the complete log can also be displayed, with duplicate logs and fixed-format parts such as the req-id shown unchanged.
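The display rules can be sketched with the standard library's `difflib.SequenceMatcher` as the similarity measure and ANSI escape codes for color. Both choices are assumptions: the patent does not say how similarity is computed or how colors are rendered. As a simplification, a duplicate line is replaced entirely by "..." rather than having only its repeated part elided, and error lines take precedence over duplicates.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

RED, ORANGE, YELLOW, RESET = "\033[31m", "\033[38;5;208m", "\033[33m", "\033[0m"


def classify(lines: Iterable[str], threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Label each line as 'error', 'duplicate', or 'normal'."""
    labeled = []
    previous = None
    for line in lines:
        lowered = line.lower()
        if "error" in lowered or "fail" in lowered:
            label = "error"  # errors are never hidden as duplicates
        elif (previous is not None
              and SequenceMatcher(None, previous, line).ratio() > threshold):
            label = "duplicate"
        else:
            label = "normal"
        labeled.append((label, line))
        previous = line
    return labeled


def render(lines: Iterable[str]) -> str:
    """Color the lines for a terminal: red errors, yellow duplicates, orange rest."""
    colors = {"error": RED, "duplicate": YELLOW, "normal": ORANGE}
    out = []
    for label, line in classify(lines):
        text = "..." if label == "duplicate" else line
        out.append(f"{colors[label]}{text}{RESET}")
    return "\n".join(out)
```

Separating `classify` from `render` keeps the labeling logic independent of the output medium, so the same labels could drive an HTML view.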
  • In practice, the foregoing modules in this embodiment may be implemented by software or by hardware. In the latter case this may be done in the following ways, among others: the modules are all located in the same processor, or the modules are located in multiple processors.
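  • Embodiments of the present invention also provide a storage medium.
  • Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
  • S1: obtain state information identifying the state of each component on the cloud computing management platform, and generate a log file corresponding to each piece of state information;
  • S2: when an error event occurs on the cloud computing management platform, search for the log files associated with the error event according to a preset rule and the trace paths of the log files;
  • S3: display the log content that characterizes the error event in the log files associated with the error event.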
  • Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: obtaining state information identifying the state of each component on the cloud computing management platform, and generating a log file corresponding to the state information.
  • Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: when an error event occurs on the cloud computing management platform, searching for the log files associated with the error event according to the preset rule and the trace paths of the log files.
  • Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: displaying the log content that characterizes the error event in the log files associated with the error event.
  • Optionally, all or part of the steps of the above embodiments may also be implemented using integrated circuits. The steps may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer-readable storage medium.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • The embodiments of the present invention obtain state information identifying the state of each component on the cloud computing management platform and generate a log file trace path corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log files associated with the error event are searched for according to a preset rule and the trace paths of the log files, and the log content characterizing the error event in those log files is then displayed. This addresses the low efficiency, in the related art, of finding the cause of an error in the log files of a cloud computing management platform, and associates error events with log files, thereby improving the efficiency of finding the cause when an error occurs on the platform.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

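A method and apparatus for displaying log content. The method includes: obtaining state information that identifies the state of each component on a cloud computing management platform, and generating a log file trace path corresponding to each piece of state information; when an error event occurs on the cloud computing management platform, searching for the log files associated with the error event according to a preset rule and the trace paths of the log files; and displaying the log content that characterizes the error event in the log files associated with the error event.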

Description

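Method and Apparatus for Displaying Log Content
Technical Field
This application relates to, but is not limited to, the field of cloud computing technology.
Background
The OpenStack cloud computing management platform (also referred to as OpenStack) is an Infrastructure as a Service (IaaS) cloud computing solution. Because it is fully open source and easy to extend, it has attracted growing attention in the industry.
Because OpenStack relies on the cooperation of multiple nodes and multiple components, when an OpenStack operation fails, the cause can usually be located only through command output and logs. OpenStack logs are recorded per node and per component, and once they reach a certain size they are compressed into gz archive files. As a result, when an operation fails, extensive expertise and fluent command of the Linux operating system are needed to find the possible cause in logs that are scattered across physical nodes.
Take the relatively simple case of finding why a virtual machine deployment failed. The usual practice is as follows. Use the nova show command with the virtual machine identification (ID) to view the name of the compute node hosting the virtual machine, and use the nova hypervisor-list and nova hypervisor-show commands to find the Internet Protocol (IP) address of that compute node. If nova show reports a non-empty compute node, log in to that node over Secure Shell (SSH) and search nova-compute.log by virtual machine ID. If nova-compute.log indicates that the problem is related to libvirt, use nova show to find the libvirt instance name of the virtual machine, inspect it with the virsh command, and search libvirt.log. If nova show reports an empty compute node, search nova-scheduler.log on the control node; in a cell environment, the search must be repeated on the apicell (the top-level cell) and on each child cell. If nova show returns nothing, search nova-api.log and nova-conductor.log on the control node.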
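The manual procedure described above is essentially a small decision tree over the output of nova show. The sketch below encodes only that decision; the function name, its parameters, and the "node:file" result format are assumptions, it does not invoke any command, and the cell-environment branch is left out.

```python
from typing import List, Optional


def logs_to_check(show_found: bool, host: Optional[str],
                  libvirt_related: bool = False) -> List[str]:
    """Pick the log files an operator would search, as 'node:file' strings."""
    if not show_found:
        # nova show returned nothing: look at the API and conductor logs.
        return ["controller:nova-api.log", "controller:nova-conductor.log"]
    if not host:
        # The virtual machine was never placed on a compute node.
        return ["controller:nova-scheduler.log"]
    logs = [f"{host}:nova-compute.log"]
    if libvirt_related:
        logs.append(f"{host}:libvirt.log")
    return logs
```

Even this simplest case needs three branches and up to two nodes, which is the manual effort the method aims to remove.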
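The related art has the following shortcomings when searching logs. Which node and which log file to search must be decided from the output of OpenStack commands, which requires considerable expertise, and a result is often reached only after several rounds of OpenStack commands and log searches. Within a log file, common Linux commands such as tail and grep can only perform simple extraction and filtering, so a great deal of judgment by eye is usually needed. In addition, when a failed operation is caused by a lower layer or by another component, the logs do not necessarily reflect this directly, and the possible cause must be found by repeatedly querying and comparing the logs of each component for the same time period. When there are many nodes, or the log files have already been compressed into gz archives, the search workload increases greatly.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The related art has not yet proposed an effective solution to the technical problem that, when an error occurs on a cloud computing management platform, the heavy workload of finding the cause of the error leads to low efficiency.
Provided herein are a method and an apparatus for displaying log content, to solve the problem in the related art that finding the cause of an error in the log files of a cloud computing management platform is inefficient.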
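A method for displaying log content includes:
obtaining state information that identifies the state of each component on a cloud computing management platform, and generating a trace path of the log file corresponding to each piece of the state information;
when an error event occurs on the cloud computing management platform, searching for the log files associated with the error event according to a preset rule and the trace paths of the log files; and
displaying the log content that characterizes the error event in the log files associated with the error event.
Optionally, searching for the log files associated with the error event according to the preset rule and the trace paths of the log files includes:
parsing the error event to obtain the type of the component in which the error event occurred and the type of the error event, where the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault; and
searching the log files of the components associated with the type of the component of the error event, and the log files of the components associated with the type of the error event.
Optionally, searching for the log files associated with the error event according to the preset rule and the trace paths of the log files includes:
obtaining the system time of the cloud computing management platform at which the error event occurred;
searching for designated log files whose timestamp is the same as the system time, where the timestamp is the time at which the cloud computing management platform wrote the log file; and
determining, as the log files associated with the error event, the target log files among the designated log files that contain a predetermined error keyword.
Optionally, before searching for the log files associated with the error event according to the preset rule and the trace paths of the log files, the method further includes:
setting association rules between multiple components on the cloud computing management platform; and/or setting association rules between each of multiple error events on the cloud computing management platform and the components;
where the preset rule includes the association rules between the multiple components and/or the association rules between each of the multiple error events and the components.
Optionally, displaying the log content that characterizes the error event in the log files associated with the error event includes:
parsing the log files associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and displaying that log content; and/or
annotating the log content that characterizes the error event in the log files associated with the error event, and then displaying it.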
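An apparatus for displaying log content includes:
an obtaining module, configured to obtain state information that identifies the state of each component on a cloud computing management platform, and to generate a log file trace path corresponding to each piece of the state information;
a search module, configured to search, when an error event occurs on the cloud computing management platform, for the log files associated with the error event according to a preset rule and the trace paths of the log files obtained by the obtaining module; and
a display module, configured to display the log content that characterizes the error event in the log files found by the search module to be associated with the error event.
Optionally, the search module includes:
a parsing unit, configured to parse the error event to obtain the type of the component in which the error event occurred and the type of the error event, where the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault; and
a first search unit, configured to search the log files of the components associated with the component type obtained by the parsing unit, and the log files of the components associated with the error event type obtained by the parsing unit.
Optionally, the search module includes:
an obtaining unit, configured to obtain the system time of the cloud computing management platform at which the error event occurred;
a second search unit, configured to search for designated log files whose timestamp is the same as the system time obtained by the obtaining unit, where the timestamp is the time at which the cloud computing management platform wrote the log file; and
a determining unit, configured to determine, as the log files associated with the error event, the target log files that contain a predetermined error keyword among the designated log files found by the second search unit.
Optionally, the apparatus further includes:
a setting module, configured to set, before the search module searches for the log files associated with the error event according to the preset rule and the trace paths of the log files, association rules between multiple components on the cloud computing management platform; and/or to set association rules between each of multiple error events on the cloud computing management platform and the components; where the preset rule includes the association rules between the multiple components and/or the association rules between each of the multiple error events and the components.
Optionally, the display module includes:
a first display unit, configured to parse the log files found by the search module to be associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and to display that log content; and/or
a second display unit, configured to annotate the log content that characterizes the error event in the log files found by the search module, and then display it.
The method and apparatus for displaying log content provided in the embodiments of the present invention obtain state information identifying the state of each component on the cloud computing management platform and generate a log file trace path corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log files associated with the error event are searched for according to a preset rule and the trace paths of the log files, and the log content characterizing the error event in those log files is then displayed. The embodiments of the present invention address the low efficiency, in the related art, of finding the cause of an error in the log files of a cloud computing management platform, and associate error events with log files, thereby improving the efficiency of finding the cause when an error occurs on the platform.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.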
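Brief Description of the Drawings
FIG. 1 is a flowchart of a method for displaying log content according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for displaying log content according to an embodiment of the present invention;
FIG. 3 is a flowchart of yet another method for displaying log content according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for displaying log content according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another apparatus for displaying log content according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of yet another apparatus for displaying log content according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a further apparatus for displaying log content according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of still another apparatus for displaying log content according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an OpenStack multi-node deployment for the apparatus for displaying log content according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of module interaction in the apparatus for displaying log content according to an embodiment of the present invention;
FIG. 11 is a flowchart of a detection operation performed by the apparatus for displaying log content according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of association analysis by a log analysis module in the apparatus for displaying log content according to an embodiment of the present invention.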
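Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings. Note that, where no conflict arises, the embodiments herein and the features in the embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Note that the terms "first", "second", and so on in the description, the claims, and the drawings of the embodiments of the present invention are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
This embodiment provides a method for displaying log content. FIG. 1 is a flowchart of a method for displaying log content according to an embodiment of the present invention. As shown in FIG. 1, the method provided in this embodiment includes the following steps, namely steps 110 to 130:
Step 110: obtain state information that identifies the state of each component on the cloud computing management platform, and generate a trace path of the log file corresponding to each piece of state information.
In this embodiment, the cloud computing management platform OpenStack is deployed on one or more nodes, which include control nodes and compute nodes. High Availability (HA) control can also be added to the control nodes, in which case the control nodes are divided into active nodes and standby nodes. The nodes communicate over a network, and each type of information of the cloud computing management platform is saved on storage devices. All of these are components of the cloud computing management platform. The activity of each component, such as creation, termination, and events that relate components to each other (for example the state of components, the network, or the disk array connections of a storage device), produces corresponding state information and generates log files. These log files can be saved locally on the corresponding component or uploaded to an associated device.
The log files generated or already saved by each component on the cloud computing management platform are obtained. Because the state information of a component is associated with the state of other components or of the component itself, the trace path of a log file can also be derived from the component's state information. The trace path represents the association path from the component's state information to the associated other components or to other structures within the component.
Step 120: when an error event occurs on the cloud computing management platform, search for the log files associated with the error event according to a preset rule and the trace paths of the log files.
Optionally, when an error event such as a component fault or a connection timeout occurs in a component of the cloud computing management platform, the component actively reports the error event or alarm information; the cloud computing management platform can also actively monitor the working state of each component at a certain interval. When an error event occurs on the platform, the cause needs to be found and the fault repaired as soon as possible to reduce the problems the fault causes and improve the user experience. This embodiment searches the log files of the cloud computing management platform for the log files associated with the error event according to the preset rule and looks for the cause there, which greatly reduces the time needed compared with obtaining the cause through manual analysis. In an optional implementation of the embodiments of the present invention, the preset rule may include the trace paths of the log files.
Step 130: display the log content that characterizes the error event in the log files associated with the error event.
Optionally, after the log files associated with the error event are found, the log content in them that characterizes the error event is displayed. For example, the entire log file or repeated log content need not be displayed; the log content can also be parsed to generate the cause of the error event and the corresponding solution directly, so that the user can find the cause in the shortest time.
The method for displaying log content provided in this embodiment obtains state information identifying the state of each component on the cloud computing management platform and generates a trace path of the log file corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log files associated with the error event are searched for according to a preset rule, and the log content characterizing the error event in those log files is then displayed. This embodiment addresses the low efficiency, in the related art, of finding the cause of an error in the log files of a cloud computing management platform, and associates error events with log files, thereby improving the efficiency of finding the cause when an error occurs on the platform.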
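Steps 110 to 130 compose into a simple three-stage pipeline. The sketch below only shows that composition; the function name, the callable signatures, and the dictionary used for trace paths are assumptions, and the three stages are supplied by the caller rather than implemented here.

```python
from typing import Callable, Dict, List

TracePaths = Dict[str, List[str]]  # component name -> log files to follow


def display_log_content(collect_paths: Callable[[], TracePaths],
                        search: Callable[[str, TracePaths], List[str]],
                        show: Callable[[List[str]], str],
                        error_event: str) -> str:
    """Chain the three steps: collect trace paths, search, then display."""
    trace_paths = collect_paths()               # step 110
    related = search(error_event, trace_paths)  # step 120
    return show(related)                        # step 130
```

Passing the stages in as callables mirrors the obtaining, search, and display modules of the apparatus, each of which can be replaced independently.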
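Optionally, in the embodiments of the present invention, searching for the log files associated with the error event according to the preset rule and the trace paths of the log files can be implemented, for example, in the following two ways: a strong association search and a weak association search. They can be applied separately to different scenarios, or together to improve search accuracy.
In an optional embodiment of the present invention, FIG. 2 is a flowchart of another method for displaying log content according to an embodiment of the present invention. Building on the embodiment shown in FIG. 1, the implementation of step 120 is described in detail. This embodiment uses the strong association search, in which step 120 may include the following steps, namely steps 121 and 122:
Step 121: parse the error event to obtain the type of the component in which the error event occurred and the type of the error event, where the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault.
Optionally, the error event can be parsed to obtain the type of the error event and the type of the component in which it occurred, that is, the type of the component object. In practice, detection is based on the keyword of the failed object in the error event. The type of the component object is first determined by strong association, for example which kind of component it is (control node, compute node, network, or storage device). Once the component type has been parsed, further tracing is possible, for example determining whether the object on a compute node is a virtual machine or an image, or whether a control node is the active or the standby node. The type of the error event can also be parsed, for example whether it is a service failure or a component fault.
Step 122: search the log files of the components associated with the type of the component of the error event, and the log files of the components associated with the type of the error event.
When searching for associated log files, the log files of the components associated with the component type and the log files of the components associated with the error event type can be searched, level by level. An application example follows:
If the failed object is a virtual machine, all compute nodes need to be searched. For example, if the error event points to the underlying libvirt, the object ID is converted into the libvirt instance id, and the libvirt and qemu logs are queried. As another example, if the error event is a network interface binding failure, the neutron agent state is queried; if that state is incorrect, the possible causes are looked up according to the corresponding rules, and the trace continues level by level until the final possible cause is found. If the failed object is a cloud disk, the cloud disk's log files are checked to determine whether the error was returned by the disk array. If so, the error message is looked up by error code, and the cloud disk ID can be converted into the virtual disk ID on the disk array so that the related state can be queried on the disk array. Objects of other types are traced in the same way according to the corresponding business logic.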
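In an optional embodiment of the present invention, FIG. 3 is a flowchart of yet another method for displaying log content according to an embodiment of the present invention. Building on the embodiment shown in FIG. 1, the implementation of step 120 is described in detail. This embodiment uses the weak association search, in which step 120 may include the following steps, namely steps 123 to 125:
Step 123: obtain the system time of the cloud computing management platform at which the error event occurred.
The system time may be the time of a control node of the cloud computing management platform or the system time of a component.
Step 124: search for designated log files whose timestamp is the same as the system time, where the timestamp is the time at which the cloud computing management platform wrote the log file.
Optionally, on the principle of being synchronous with the occurrence of the error event, the timestamp may also be the time at which the cloud computing management platform created or generated the log file.
Step 125: determine, as the log files associated with the error event, the target log files among the designated log files that contain a predetermined error keyword.
The predetermined error keywords may be words such as error and fail in component log files. Log files are searched for and extracted by time and by the predetermined words, and the log content related to the cause of the failure is then located in the extracted log files.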
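Optionally, in the embodiments of the present invention, before searching for the log files associated with the error event according to the preset rule and the trace paths of the log files, that is, before step 120, the method may further include: setting association rules between multiple components on the cloud computing management platform; and/or setting association rules between each of multiple error events on the platform and the components. The preset rule in the embodiments of the present invention may include one or more of the following: the association rules between multiple components, and the association rules between each of multiple error events and the components.
In this embodiment, the association rules for searching associated log files are set, including the association rules between multiple components and/or the association rules between each of multiple error events and the components. With the association rules set, once the type of the error event and the type of the component in which it occurred have been parsed, the search can proceed according to the preset rule. In practice, the association rules may include:
An association from error events of virtual machine failure to the compute nodes: a failure involving a virtual machine may require querying all compute nodes. An association for error events related to active/standby switchover: such faults require querying the logs of the active and standby nodes separately. An association from each service node in OpenStack to itself: an abnormal service may be caused by a fault of the service node itself. An association from a physical machine to its network, central processing unit (CPU), memory, and so on: an abnormality of the physical machine itself may be a failure of the network, CPU, or memory. An association from cloud-disk-related failures to the disk array service port: such failures may be related to the state of the disk array service port. An association from system errors to residual error resources in the system: residual error resources may reflect abnormal behavior of the system over some period and may also be the cause of the current failure. An association from components to configuration items: a component fault may be related to a configuration file, in which case the corresponding configuration items are checked. An association from components to received commands: an error event may be related to an OpenStack command executed by a component, and the sending of OpenStack commands is related to the OpenStack state. When searching for associated log files, the log files can be traced in the order of the associations above.
Optionally, in the embodiments of the present invention, preset rules can also be set for the parameters used to analyze log files, for example: the identifier of each component object, the log file keywords that characterize error events, the maximum number of log entries to query, whether to query archived log files, and whether to display only the meaningful part of each log.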
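Optionally, in the embodiments of the present invention, displaying the log content that characterizes the error event in the log files associated with the error event can be implemented, for example, in the following two ways:
Way 1: parse the log files associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and display that log content.
After log files have been found through the strong and weak association searches described above, they can be processed before being displayed. The processing may include parsing the log files according to parsing rules, which describe the causes of errors and their solutions. For example, if the log content of a failed virtual machine creation is "No valid host was found" and log analysis finds that the cause is an unavailable compute node service, the displayed result can be: "The virtual machine creation failed because the compute node service is unavailable."
Way 2: annotate the log content that characterizes the error event in the log files associated with the error event, and then display it.
Optionally, if no clearly corresponding cause is parsed, or the preset parsing rules contain no cause or solution for the error event, the log files can be displayed directly in optimized form. For example: the similarity between each log and the previous log is evaluated, and if it exceeds a preset threshold such as 90%, the log is treated as a duplicate, the repeated part is shown only once, and the rest is replaced by "…"; logs containing words such as error or fail are marked in red, indicating that they need special attention, other logs such as operation steps and states are shown in orange, and duplicate logs are shown in yellow, indicating that they can generally be ignored; the fixed-format part of a log, such as the req-id, can be truncated so that only the meaningful part is shown, or the complete log can be shown, with duplicate logs and fixed-format parts such as the req-id displayed unchanged.
In practice, where the scenario allows, Way 1 and Way 2 can also be used together.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferable. On this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the related art, can be embodied as a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.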
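The embodiments of the present invention also provide an apparatus for displaying log content. The apparatus implements the above embodiments and optional implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
FIG. 4 is a schematic structural diagram of an apparatus for displaying log content according to an embodiment of the present invention. The apparatus provided in this embodiment includes an obtaining module 10, a search module 20, and a display module 30.
The obtaining module 10 is configured to obtain state information that identifies the state of each component on the cloud computing management platform, and to generate a log file trace path corresponding to each piece of state information.
The search module 20 is configured to search, when an error event occurs on the cloud computing management platform, for the log files associated with the error event according to a preset rule and the trace paths of the log files obtained by the obtaining module 10.
The display module 30 is configured to display the log content that characterizes the error event in the log files found by the search module 20 to be associated with the error event.
Optionally, FIG. 5 is a schematic structural diagram of another apparatus for displaying log content according to an embodiment of the present invention. Building on the structure of the apparatus shown in FIG. 4, the search module 20 in the apparatus provided in this embodiment may include a parsing unit 21 and a first search unit 22.
The parsing unit 21 is configured to parse the error event to obtain the type of the component in which the error event occurred and the type of the error event, where the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault.
The first search unit 22 is configured to search the log files of the components associated with the component type obtained by the parsing unit 21, and the log files of the components associated with the error event type obtained by the parsing unit 21.
Optionally, FIG. 6 is a schematic structural diagram of yet another apparatus for displaying log content according to an embodiment of the present invention. Building on the structure of the apparatus shown in FIG. 4, the search module 20 in the apparatus provided in this embodiment may include an obtaining unit 23, a second search unit 24, and a determining unit 25.
The obtaining unit 23 is configured to obtain the system time of the cloud computing management platform at which the error event occurred.
The second search unit 24 is configured to search for designated log files whose timestamp is the same as the system time obtained by the obtaining unit 23, where the timestamp is the time at which the cloud computing management platform wrote the log file.
The determining unit 25 is configured to determine, as the log files associated with the error event, the target log files that contain a predetermined error keyword among the designated log files found by the second search unit 24.
Optionally, FIG. 7 is a schematic structural diagram of a further apparatus for displaying log content according to an embodiment of the present invention. Building on the structure of the apparatus shown in FIG. 4, the apparatus provided in this embodiment may further include:
a setting module 40, configured to set, before the search module 20 searches for the log files associated with the error event according to the preset rule and the trace paths of the log files, association rules between multiple components on the cloud computing management platform; and/or to set association rules between each of multiple error events on the cloud computing management platform and the components; where the preset rule includes the association rules between multiple components and/or the association rules between each of multiple error events and the components.
Optionally, FIG. 8 is a schematic structural diagram of still another apparatus for displaying log content according to an embodiment of the present invention. Building on the structure of the apparatus shown in FIG. 4, the display module 30 in the apparatus provided in this embodiment may include a first display unit 31 and a second display unit 32.
The first display unit 31 is configured to parse the log files found by the search module 20 to be associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and to display that log content.
The second display unit 32 is configured to annotate the log content that characterizes the error event in the log files found by the search module 20, and then display it.
In practice, the first display unit 31 and the second display unit 32 can both be provided in the display module 30, or only one of them can be provided.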
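A detailed description follows with reference to the embodiments and optional implementations of the present invention:
FIG. 9 is a schematic diagram of an OpenStack multi-node deployment for the apparatus for displaying log content according to an embodiment of the present invention. As shown in FIG. 9, an OpenStack multi-node deployment can contain multiple control nodes, with components such as network and storage distributed across the control nodes. Each control node is placed under HA control (as an active or standby node), and there are also multiple compute nodes. The analysis entry point of the apparatus of this embodiment is the IP address of the node where keystone runs.
FIG. 10 is a schematic diagram of module interaction in the apparatus for displaying log content according to an embodiment of the present invention. As shown in FIG. 10, the apparatus includes a parameter configuration module 51, a state detection module 52, a log analysis module 53, and a log display module 54. The parameter configuration module 51 is a precondition for the operation of all other modules. The output of the state detection module 52 is provided to the log analysis module 53 as a reference factor for further queries, and the output of the log analysis module 53 is provided to the log display module 54 for optimized display. The functions of the modules are as follows:
The parameter configuration module 51 is configured to set the parameters of the log analysis. These can include the object keyword to query, the maximum number of log entries to query, whether to query archived log files, and whether to display only the meaningful part of each log.
The state detection module 52 is configured to detect the state of each service, the network, the disk array connections, and so on in OpenStack, and to provide the result to the log analysis module 53 as a reference factor for the log trace path. As required by the log analysis module, it also checks the relevant configuration items and executes the corresponding OpenStack commands.
The log analysis module 53 has built-in business logic association rules and is configured as follows. First, by strong association: based on the keyword of the failed object and the state detection results, it determines the object type (virtual machine, cloud disk, image, or other), converts the ID keyword into the internal ID used by each component, and then, using the ID keyword and the matching request ID, performs a level-by-level trace query according to the specific business logic. Second, by weak association: it searches the logs of every component in the same time period for words such as error and fail, and extracts the log portions that may be related to the cause of the failure.
The log display module 54 has built-in log content parsing rules and is configured as follows. If a parsing rule matches the log content, the parsed error cause and the suggested solution are displayed. Otherwise the result log is displayed directly in optimized form: logs containing the word fail or error are shown in red, other logs such as operation steps and states are shown in orange, and a log that largely repeats the previous one can be compressed and shown in yellow, with the identical part replaced by "…".
Optionally, in this embodiment, the parameter configuration module 51 may also set the following: the IP of the OpenStack keystone node; the object keyword, for example the ID of the object whose OpenStack operation failed, the req-id of a request, a piece of error message, or a time point; the maximum number of log entries to query (if the failure has just occurred, a smaller number speeds up the query); whether to query archived log files (if the failure has just occurred, archived files can be skipped to speed up the query); and whether to display only the meaningful part of each log, which omits the fixed-format parts of OpenStack logs such as the req-id and replaces repeated parts with an ellipsis so that the result is easier to read.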
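FIG. 11 is a flowchart of a detection operation performed by the apparatus for displaying log content according to an embodiment of the present invention. As shown in FIG. 11, the detection process may include the following steps, namely steps 901 to 906:
Step 901: obtain the endpoint service IP of each component on OpenStack;
Step 902: obtain the IPs of all compute nodes on OpenStack;
Step 903: obtain the state of all services on OpenStack;
Step 904: obtain the state of the physical machine network, CPU, memory, and so on, on OpenStack;
Step 905: obtain the state of the disk array service port on OpenStack;
Step 906: check whether the OpenStack system has residual error resources.
The state detection module 52 in this embodiment checks the overall running state of OpenStack. Any abnormality it finds can serve as a reference factor for the next log query path. Optionally, the state detection content may include:
obtaining the endpoint service IP of each component, to decide on which node the logs of each component are fetched;
obtaining the IPs of all compute nodes, since a failure involving a virtual machine may require querying every compute node;
obtaining the node IPs of the two-node HA pair, since a fault related to an active/standby switchover requires querying the logs of both nodes;
obtaining the state of all services on OpenStack, since an abnormal service may be one of the causes of the failure;
obtaining the state of the physical machine network, CPU, memory, and so on, since an abnormality of the physical machine itself may be one of the causes of the failure;
obtaining the state of the disk array service port, since failures related to cloud disks may be related to it;
checking whether the OpenStack system has residual error resources, which may reflect abnormal behavior of the system over some period and may also be one of the causes of the current failure;
checking whether configuration items are correct: if the log analysis result may be related to a configuration file, the corresponding configuration items are checked;
executing OpenStack commands: if the log analysis result may be related to some OpenStack state, the corresponding OpenStack command is executed and its result obtained.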
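FIG. 12 is a schematic diagram of association analysis by a log analysis module in the apparatus for displaying log content according to an embodiment of the present invention. As shown in FIG. 12, the log analysis module 53 of this embodiment has built-in business logic association rules. The rules describe relationships between objects and between errors and objects, for example: a virtual machine may be related to libvirt, to the nova-compute service state, to the network, and to storage; a refused connection may be related to the service state and to the maximum number of database connections; a cloud disk may be related to the disk array connection state, to a disk array error code, and to the local device path; a network interface binding failure may be related to the neutron agent state; the neutron agent state may be related to the configuration file and to the number of neutron server instances, and so on. The built-in business logic association rules also include the parsing of some error return codes, such as disk array error codes.
As shown in FIG. 12, the log analysis module 53 of this embodiment has two types of association analysis, which can be used at the same time. The first type is strong association, which associates and extracts logs by keyword and specific business logic. For example, given a virtual machine ID, the corresponding libvirt instance name is looked up, and the libvirt instance name is then used to query the libvirt log. The second type is weak association, which associates logs by time point and general logic. For example, when a virtual machine creation failure occurs, the logs of all components at the same time point that contain words such as error, fail, exception, or traceback are extracted together, because an error in another component may have caused the virtual machine creation to fail.
When using strong association analysis, the log analysis module 53 of this embodiment first obtains the corresponding request ID from the logs according to the object ID and adds it to the list of query keywords. It then determines the object type and searches level by level according to the built-in business logic association rules. For example, if the failed object is a virtual machine, all compute nodes need to be searched. If the error event points to the underlying libvirt, the object ID is converted into the libvirt instance id, and the libvirt and qemu logs are queried. If the error event is a network interface binding failure, the neutron agent state is queried; if that state is incorrect, the possible causes are looked up according to the corresponding rules, and the trace continues level by level until the final possible cause is found. If the failed object is a cloud disk, the module determines whether the error was returned by the disk array. If so, the error message is looked up by error code and, if necessary, the cloud disk ID can be converted into the virtual disk ID on the disk array so that the related state can be queried on the disk array. Objects of other types are traced in the same way according to the corresponding business logic.
Optionally, when no explicit object ID is available, the input to the log analysis module 53 may instead be the req-id of a request, a piece of error message, or a time point. In that case the module mainly fetches the logs with the same req-id and the logs of the same time period. When strong association analysis reaches no clear conclusion, weak association analysis continues and the related logs are output for manual analysis.
Based on the input of the parameter configuration module, the log analysis module 53 of this embodiment can decide how many log entries to fetch; if the failed operation has just occurred, a smaller number can be configured to speed up the query. Likewise, if the input indicates that archived logs need to be checked, the query runs over all compressed, archived gz log files.
Optionally, the log display module 54 has built-in log content parsing rules, which describe the causes of and solutions to some well-defined errors. For example, if the error message of a failed virtual machine creation is "No valid host was found" and log analysis finds that the cause is an unavailable compute node service, the displayed result is: the virtual machine creation failed because the compute node service is unavailable. If no well-defined error cause matches, the result log output by the log analysis module is displayed directly in optimized form. If the similarity between a log and the previous log exceeds 90%, it is treated as a duplicate and the repeated part is shown as "…". Logs containing words such as error or fail are shown in red, indicating that they need special attention; other logs such as operation steps and states are shown in orange; duplicate logs are shown in yellow, indicating that they can generally be ignored. The fixed-format part of a log, such as the req-id, can be truncated so that only the meaningful part is displayed. Depending on the input of the parameter configuration module, the complete log can also be displayed, with duplicate logs and fixed-format parts such as the req-id shown unchanged.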
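In practice, the foregoing modules in this embodiment may be implemented by software or by hardware. In the latter case this may be done in the following ways, among others: the modules are all located in the same processor, or the modules are located in multiple processors.
Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: obtain state information identifying the state of each component on the cloud computing management platform, and generate a log file corresponding to each piece of state information;
S2: when an error event occurs on the cloud computing management platform, search for the log files associated with the error event according to a preset rule and the trace paths of the log files;
S3: display the log content that characterizes the error event in the log files associated with the error event.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: obtaining state information identifying the state of each component on the cloud computing management platform, and generating a log file corresponding to the state information.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: when an error event occurs on the cloud computing management platform, searching for the log files associated with the error event according to the preset rule and the trace paths of the log files.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: displaying the log content that characterizes the error event in the log files associated with the error event.
Optionally, for examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented as a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus, or component); when executed, it includes one of the steps of the method embodiments or a combination thereof.
Optionally, all or part of the steps of the above embodiments may also be implemented using integrated circuits. The steps may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
The devices/function modules/functional units in the above embodiments may be implemented by general-purpose computing devices, which may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Industrial Applicability
The embodiments of the present invention obtain state information identifying the state of each component on the cloud computing management platform and generate a log file trace path corresponding to each piece of state information. When an error event occurs on the cloud computing management platform, the log files associated with the error event are searched for according to a preset rule and the trace paths of the log files, and the log content characterizing the error event in those log files is then displayed. The embodiments of the present invention address the low efficiency, in the related art, of finding the cause of an error in the log files of a cloud computing management platform, and associate error events with log files, thereby improving the efficiency of finding the cause when an error occurs on the platform.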

Claims (10)

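  1. A method for displaying log content, comprising:
    obtaining state information that identifies the state of each component on a cloud computing management platform, and generating a trace path of the log file corresponding to each piece of the state information;
    when an error event occurs on the cloud computing management platform, searching for the log files associated with the error event according to a preset rule and the trace paths of the log files; and
    displaying the log content that characterizes the error event in the log files associated with the error event.
  2. The method according to claim 1, wherein searching for the log files associated with the error event according to the preset rule and the trace paths of the log files comprises:
    parsing the error event to obtain the type of the component in which the error event occurred and the type of the error event, wherein the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault; and
    searching the log files of the components associated with the type of the component of the error event, and the log files of the components associated with the type of the error event.
  3. The method according to claim 1, wherein searching for the log files associated with the error event according to the preset rule and the trace paths of the log files comprises:
    obtaining the system time of the cloud computing management platform at which the error event occurred;
    searching for designated log files whose timestamp is the same as the system time, wherein the timestamp is the time at which the cloud computing management platform wrote the log file; and
    determining, as the log files associated with the error event, the target log files among the designated log files that contain a predetermined error keyword.
  4. The method according to any one of claims 1 to 3, wherein before searching for the log files associated with the error event according to the preset rule and the trace paths of the log files, the method further comprises:
    setting association rules between multiple components on the cloud computing management platform; and/or setting association rules between each of multiple error events on the cloud computing management platform and the components;
    wherein the preset rule includes the association rules between the multiple components and/or the association rules between each of the multiple error events and the components.
  5. The method according to claim 1, wherein displaying the log content that characterizes the error event in the log files associated with the error event comprises:
    parsing the log files associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and displaying that log content; and/or
    annotating the log content that characterizes the error event in the log files associated with the error event, and then displaying it.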
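  6. An apparatus for displaying log content, comprising:
    an obtaining module, configured to obtain state information that identifies the state of each component on a cloud computing management platform, and to generate a log file trace path corresponding to each piece of the state information;
    a search module, configured to search, when an error event occurs on the cloud computing management platform, for the log files associated with the error event according to a preset rule and the trace paths of the log files obtained by the obtaining module; and
    a display module, configured to display the log content that characterizes the error event in the log files found by the search module to be associated with the error event.
  7. The apparatus according to claim 6, wherein the search module comprises:
    a parsing unit, configured to parse the error event to obtain the type of the component in which the error event occurred and the type of the error event, wherein the types of components of the cloud computing management platform include one or more of the following: control node, compute node, network, and storage device, and the types of error event include one or more of the following: service failure and component fault; and
    a first search unit, configured to search the log files of the components associated with the component type obtained by the parsing unit, and the log files of the components associated with the error event type obtained by the parsing unit.
  8. The apparatus according to claim 6, wherein the search module comprises:
    an obtaining unit, configured to obtain the system time of the cloud computing management platform at which the error event occurred;
    a second search unit, configured to search for designated log files whose timestamp is the same as the system time obtained by the obtaining unit, wherein the timestamp is the time at which the cloud computing management platform wrote the log file; and
    a determining unit, configured to determine, as the log files associated with the error event, the target log files that contain a predetermined error keyword among the designated log files found by the second search unit.
  9. The apparatus according to any one of claims 6 to 8, further comprising:
    a setting module, configured to set, before the search module searches for the log files associated with the error event according to the preset rule and the trace paths of the log files, association rules between multiple components on the cloud computing management platform; and/or to set association rules between each of multiple error events on the cloud computing management platform and the components; wherein the preset rule includes the association rules between the multiple components and/or the association rules between each of the multiple error events and the components.
  10. The apparatus according to claim 6, wherein the display module comprises:
    a first display unit, configured to parse the log files found by the search module to be associated with the error event to obtain log content that includes the cause of the error event and the corresponding solution, and to display that log content; and/or
    a second display unit, configured to annotate the log content that characterizes the error event in the log files found by the search module, and then display it.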
PCT/CN2016/088920 2016-01-18 2016-07-06 Method and apparatus for displaying log content WO2017124704A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610032700.8A CN106980627A (zh) 2016-01-18 2016-01-18 Method and apparatus for displaying log content
CN201610032700.8 2016-01-18

Publications (1)

Publication Number Publication Date
WO2017124704A1 true WO2017124704A1 (zh) 2017-07-27

Family

ID=59340344

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088920 WO2017124704A1 (zh) 2016-01-18 2016-07-06 Method and apparatus for displaying log content

Country Status (2)

Country Link
CN (1) CN106980627A (zh)
WO (1) WO2017124704A1 (zh)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
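CN107483238A (zh) * 2017-08-04 2017-12-15 郑州云海信息技术有限公司 Log management method, cluster management node and system
CN107741956B (zh) * 2017-09-18 2020-07-03 杭州安恒信息技术股份有限公司 Log search method based on web container configuration file
CN108429636B (zh) * 2018-02-01 2021-11-23 创新先进技术有限公司 Method and apparatus for locating an abnormal system, and electronic device
CN108599979B (zh) * 2018-03-05 2021-05-28 京信通信***(中国)有限公司 Method and apparatus for converting from non-HA mode to HA mode
CN108989086B (zh) * 2018-06-20 2021-03-30 复旦大学 System for automatic discovery and tracing of illegal Open vSwitch port operations on the OpenStack platform
CN109120450A (zh) * 2018-08-29 2019-01-01 郑州云海信息技术有限公司 Method and apparatus for handling neutron network anomalies in a virtualization management platform
CN109299921A (zh) * 2018-09-30 2019-02-01 深圳市英威腾电动汽车驱动技术有限公司 Technical review data processing method and related apparatus
CN109783330B (zh) * 2018-12-10 2023-04-07 京东科技控股股份有限公司 Log processing method, display method, and related apparatus and system
CN110245035A (zh) * 2019-05-20 2019-09-17 平安普惠企业管理有限公司 Link tracing method and apparatus
CN111008093A (zh) * 2019-12-22 2020-04-14 北京浪潮数据技术有限公司 Fault log query method, apparatus, device and medium
CN111106965B (zh) * 2019-12-25 2023-04-07 浪潮商用机器有限公司 Intelligent log analysis method, tool, device and medium for complex systems
CN113297026B (zh) * 2020-06-28 2022-06-07 阿里巴巴集团控股有限公司 Object detection method and apparatus, electronic device and computer-readable storage medium
CN111930625B (zh) * 2020-08-12 2024-01-30 中国工商银行股份有限公司 Log acquisition method, apparatus and system based on a cloud service platform
CN112631873B (zh) * 2020-12-30 2023-11-21 平安证券股份有限公司 Log monitoring method and apparatus, computer device and storage medium
CN113190401B (zh) * 2021-04-19 2023-08-25 Oppo广东移动通信有限公司 Anomaly monitoring method for quick games, electronic device, mobile terminal and storage medium
CN113535654B (zh) * 2021-06-11 2023-10-31 安徽安恒数智信息技术有限公司 Log processing method and system, electronic apparatus and storage medium
CN113992574B (zh) * 2021-09-30 2023-04-25 济南浪潮数据技术有限公司 Method, system and device for setting the priority of router-bound nodes
CN116755992B (zh) * 2023-08-17 2023-12-01 青岛民航凯亚***集成有限公司 Log analysis method and system based on OpenStack cloud computing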

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
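CN103036745A (zh) * 2012-12-21 2013-04-10 北京邮电大学 Neural-network-based anomaly detection system in cloud computing
CN103227734A (zh) * 2013-04-27 2013-07-31 华南理工大学 Method for detecting anomalies in an OpenStack cloud platform
CN103475535A (zh) * 2013-08-23 2013-12-25 汉柏科技有限公司 Cloud computing server log management system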
WO2015187001A2 (en) * 2014-06-04 2015-12-10 Mimos Berhad System and method for managing resources failure using fast cause and effect analysis in a cloud computing system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143008A (zh) * 2010-01-29 2011-08-03 国际商业机器公司 Method and apparatus for diagnosing fault events in a data center
CN103902604B (zh) * 2012-12-28 2020-11-10 Ge医疗系统环球技术有限公司 Method and apparatus for searching and displaying scattered logs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIU, ZONGDA: "Request Tracing and Anomalies Detecting System in Cloud", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 06, 15 June 2014 (2014-06-15), pages 8 - 10, 13, 13-16 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019067A (zh) * 2017-09-26 2019-07-16 深圳市中兴微电子技术有限公司 Log analysis method and system
CN110019067B (zh) * 2017-09-26 2023-05-30 深圳市中兴微电子技术有限公司 Log analysis method and system
WO2019158972A1 (en) * 2018-02-15 2019-08-22 Pratik Sharma Cloud configuration triggers
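CN109977284A (zh) * 2019-03-18 2019-07-05 深圳市活力天汇科技股份有限公司 Method for diagnosing the cause of air ticket purchase failures
CN111177078A (zh) * 2019-12-18 2020-05-19 广州华多网络科技有限公司 Log processing method and apparatus, and electronic device
CN111045848A (zh) * 2019-12-19 2020-04-21 广州唯品会信息科技有限公司 Log analysis method, terminal device, and computer-readable storage medium
CN111045848B (zh) * 2019-12-19 2024-04-19 广州唯品会信息科技有限公司 Log analysis method, terminal device, and computer-readable storage medium
CN111581265A (zh) * 2020-06-29 2020-08-25 杭州钧钥信息科技有限公司 Accident correlation tracing method based on data mining and visualization
CN111740868A (zh) * 2020-07-07 2020-10-02 腾讯科技(深圳)有限公司 Alarm data processing method and apparatus, and storage medium
CN111740868B (zh) * 2020-07-07 2023-12-15 腾讯科技(深圳)有限公司 Alarm data processing method and apparatus, and storage medium
CN117170984A (zh) * 2023-11-02 2023-12-05 麒麟软件有限公司 Anomaly detection method and system for the standby state of a Linux system
CN117170984B (zh) * 2023-11-02 2024-01-30 麒麟软件有限公司 Anomaly detection method and system for the standby state of a Linux system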

Also Published As

Publication number Publication date
CN106980627A (zh) 2017-07-25

Similar Documents

Publication Publication Date Title
WO2017124704A1 (zh) Method and apparatus for displaying log content
CN110245078B (zh) Software stress testing method and apparatus, storage medium, and server
US20190286510A1 (en) Automatic correlation of dynamic system events within computing devices
US9727407B2 (en) Log analytics for problem diagnosis
CN110928772B (zh) Testing method and apparatus
US9294338B2 (en) Management computer and method for root cause analysis
Lou et al. Mining dependency in distributed systems through unstructured logs analysis
US7747742B2 (en) Online predicate checking for distributed systems
EP3616066B1 (en) Human-readable, language-independent stack trace summary generation
Vallee et al. A framework for proactive fault tolerance
US9122784B2 (en) Isolation of problems in a virtual environment
US20170034001A1 (en) Isolation of problems in a virtual environment
CN111858254B (zh) Data processing method and apparatus, computing device, and medium
US10795793B1 (en) Method and system for simulating system failures using domain-specific language constructs
Jiang et al. Ranking the importance of alerts for problem determination in large computer systems
Stearley et al. Bridging the gaps: Joining information sources with splunk
Chen et al. Using runtime paths for macroanalysis
WO2016095716A1 (zh) Fault information processing method and related apparatus
CN113835918A (zh) Server fault analysis method and apparatus
US10778550B2 (en) Programmatically diagnosing a software defined network
KR102415027B1 (ko) Backup and recovery method for autonomous operation of large-scale cloud data centers
US9354962B1 (en) Memory dump file collection and analysis using analysis server and cloud knowledge base
CN105786865B (zh) Method and apparatus for fault analysis of a retrieval system
US20160371324A1 (en) System for observing and analyzing configurations using dynamic tags and queries
Bhatia et al. Efficient failure diagnosis of OpenStack using Tempest

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 16885965; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 16885965; Country of ref document: EP; Kind code of ref document: A1