WO2020048047A1 - Method, Apparatus, Device and Storage Medium for Early Warning of System Failure - Google Patents


Info

Publication number
WO2020048047A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameters
early warning
parameter
monitored
system failure
Prior art date
Application number
PCT/CN2018/122807
Other languages
English (en)
French (fr)
Inventor
王伟 (Wang Wei)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020048047A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system

Definitions

  • the present application relates to the field of computer communications, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for early warning of system failure.
  • a system failure refers to a state in which the system is unable to perform specified functions, or performance degradation does not meet specified requirements.
  • faults may occur in many application systems, such as data center systems, communication systems, satellite systems, and vehicle-mounted systems. Fault prediction has therefore become an effective way to improve system reliability.
  • the existing fault alarm method can raise an alarm only after a fault has occurred, that is, when a system parameter exceeds a set threshold; a fault prompt message is then given so that operation and maintenance personnel can resolve the fault through manual intervention, for example by migrating or replacing the faulty node. However, once a system parameter has exceeded the set threshold, the system abnormality has already occurred, so the existing failure prediction method cannot provide early warning.
  • the main purpose of this application is to provide a method, apparatus, device, and computer-readable storage medium for early warning of system failure, aiming to solve the technical problems that existing failure prediction technology requires manual verification and cannot warn in advance.
  • the present application provides a method for early warning of a system failure.
  • the method for early warning of a system failure includes the following steps:
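  • obtaining, according to a preset time unit, the current parameters to be monitored of the target system in the current time unit;
  • obtaining standard parameters in a preset monitoring model, and determining, according to the standard parameters, whether an abnormal parameter exists in the current parameters to be monitored;
  • if it is determined that an abnormal parameter exists, generating an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and reporting the early warning message.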
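  • optionally, the step of obtaining standard parameters in a preset monitoring model and determining, according to the standard parameters, whether an abnormal parameter exists in the current parameters to be monitored includes: obtaining, according to the preset monitoring model, the current standard parameters to be monitored of the target system in the current time unit as the standard parameters, and determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  • optionally, the same step instead includes: obtaining, according to the preset monitoring model, other standard parameters to be monitored of the target system in other time units as the standard parameters, and determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.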
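  • optionally, before the step of obtaining, according to a preset time unit, the current parameters to be monitored of the target system in the current time unit, the method further includes: obtaining historical system data of the target system, and counting, according to the preset time unit, the parameters to be monitored for each time unit in the historical system data, where the parameters to be monitored include memory parameters, application parameters, and business parameters; and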
  • the abnormal monitoring parameters and the normal monitoring parameters in the parameters to be monitored are classified and stored, and the preset monitoring model is generated by training according to the abnormal monitoring parameters and the normal monitoring parameters.
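  • optionally, after the step of generating and reporting the early warning message, the method further includes: searching a preset policy library for a corresponding target repair strategy according to the abnormal parameter, performing abnormality processing according to the target repair strategy, obtaining the abnormality processing result, and feeding the result back to a management end.
  • optionally, after the step of generating an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters and reporting the early warning message, the method further includes: obtaining the parameters to be monitored for several time units after the current time unit; determining, according to the standard parameters for those time units in the preset monitoring model, whether they are continuously abnormal; and if so, generating an emergency warning message according to the continuously abnormal parameters and sending the emergency warning message to a management end.
  • optionally, the method further includes: if the parameters to be monitored for the several time units are not continuously abnormal, searching a preset policy library for the corresponding target repair strategy according to the existing abnormal parameters, and performing abnormality processing according to the target repair strategy.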
  • in addition, the present application provides a system failure early warning apparatus, characterized in that the apparatus includes:
  • a parameter acquisition module configured to acquire, according to a preset time unit, the current parameters to be monitored of the target system in the current time unit;
  • a parameter monitoring module configured to obtain standard parameters in a preset monitoring model, and determine, according to the standard parameters, whether there are abnormal parameters in the current parameters to be monitored;
  • an abnormality early warning module configured to, if it is determined that an abnormal parameter exists in the current parameters to be monitored, generate an early warning message according to the abnormal parameter and the standard parameters, and report the early warning message.
  • the present application also provides a system failure early warning device.
  • the system failure early warning device includes a processor, a memory, and a system failure early warning program stored in the memory and executable by the processor; when the program is executed by the processor, the steps of the early warning method for system failure are implemented.
  • the present application also provides a computer-readable storage medium storing a system failure early warning program which, when executed by a processor, implements the steps of the early warning method for system failure.
  • the application provides a method for early warning of a system failure: obtaining, according to a preset time unit, the current parameters to be monitored of a target system in the current time unit; obtaining standard parameters in a preset monitoring model, and determining, according to the standard parameters, whether there is an abnormal parameter in the current parameters to be monitored; and if it is determined that an abnormal parameter exists, generating an early warning message according to the abnormal parameter and the standard parameters, and reporting the early warning message.
  • the present application can set the core indicator data of the system, such as memory parameters, application parameters, and business parameters, as parameters to be monitored, and monitor this core monitoring data in real time according to a preset time unit, so that abnormal core monitoring data can be detected in time. System data that is about to become abnormal can thus be warned about before the abnormality occurs, improving system efficiency.
  • FIG. 1 is a schematic diagram of a hardware structure of a system failure early warning device involved in a solution according to an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for early warning of a system fault in this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of an early warning method for a system fault of the present application;
  • FIG. 4 is a schematic flowchart of a third embodiment of an early warning method for a system fault of the present application;
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of an early warning device for a system failure of the present application.
  • the method for early warning of a system failure is mainly applied to an early warning device for a system failure.
  • the early warning device for the system failure may be a device with display and processing functions such as a PC, a portable computer, or a mobile terminal.
  • FIG. 1 is a schematic diagram of a hardware structure of a system failure early-warning device involved in a solution according to an embodiment of the present application.
  • the early warning device for system failure may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display and an input unit such as a keyboard.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface.
  • the memory 1005 may be a high-speed RAM or a stable (non-volatile) memory such as a disk memory. Optionally, the memory 1005 may be a storage device independent of the processor 1001.
  • the structure shown in FIG. 1 does not constitute a limitation on the system failure early warning device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • the memory 1005 as a computer-readable storage medium in FIG. 1 may include an operating system, a network communication module, and an early warning program for system failure.
  • the network communication module is mainly used to connect to a server and perform data communication with it; the processor 1001 can call the system failure early warning program stored in the memory 1005 and execute the system failure early warning method provided in the embodiments of the present application.
  • the embodiment of the present application provides a method for early warning of system failure.
  • FIG. 2 is a schematic flowchart of a first embodiment of a system fault early warning method according to the present application.
  • the method for early warning of system failure includes the following steps:
  • Step S10 Acquire the current parameters to be monitored of the target system in the current time unit according to the preset time unit;
  • existing fault prediction methods are usually rule-based: system operation information is collected and matched against early warning rules. If a preset rule matches the system operation information, for example when a system parameter is detected to be greater than a certain threshold, the system is deemed to have failed and a fault prompt message is given, so that operation and maintenance personnel can resolve the fault through manual intervention, such as migrating or replacing the faulty node.
  • however, detecting that a system parameter is greater than a certain threshold means that a system failure has already occurred, so the failure cannot be warned about in advance. How to solve the problem of untimely warning in existing prediction technology has therefore become a technical problem that needs to be solved.
  • a system failure early warning method is provided.
  • the core indicator data of the system, such as memory parameters, application parameters, and business parameters, are monitored in real time so that system data about to become abnormal can be warned about in time.
  • specifically, the historical system data of the target system is collected.
  • the historical system data may be historical data of the target system covering one year or longer, and may include memory parameters (such as CPU, memory occupation, and disk space data), application parameters (such as the number of threads, requests, and logs), and business parameters (such as the number of online users, transactions, and uploaded pictures). Each parameter in the historical system data is counted according to the preset time unit.
  • the preset time unit refers to a preset time period as a time unit.
  • the preset time unit may be, for example, 1 minute, 5 minutes, or 10 minutes. Starting from 00:00, 00:00-00:01 is one time period with a 1-minute unit, or 00:00-00:05 is one time period with a 5-minute unit. If the current time is 12:02 and the unit is 5 minutes, the current time unit is 12:00-12:05.
  • each parameter in the historical system data may be stored in a time series database; the parameters are quantified and a two-dimensional table is generated.
  • the horizontal axis of the two-dimensional table lists the system parameters (CPU, memory occupation data, disk space data, number of threads, requests, number of logs, ...), and the vertical axis lists the independent time segments (each time slice being 1 minute or 5 minutes); the table stores the value of each system parameter in each time segment. The status of each value is annotated, and an abnormality flag is added to abnormal values. The abnormal and non-abnormal system parameters are normalized respectively, and the preset monitoring model is then trained with a random forest algorithm on the processed abnormal and non-abnormal parameters, so that the preset monitoring model can automatically identify anomalies in system parameters.
  • the system parameter changes that accompany system abnormalities can also be summarized. For example, when the network starts to delay but is not interrupted, the business volume of the business system decreases, the abnormal log volume increases, and the number of application threads increases. When memory usage is abnormal, CPU occupancy, memory usage, and the disk space ratio increase.
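  • before step S10, the method further includes: obtaining the historical system data of the target system and counting, according to the preset time unit, the parameters to be monitored for each time unit; then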
  • the abnormal monitoring parameters and the normal monitoring parameters in the parameters to be monitored are classified and stored, and the preset monitoring model is generated by training according to the abnormal monitoring parameters and the normal monitoring parameters.
  • each parameter in the historical system data is correspondingly stored in a time series database, and the above parameters are quantized and a two-dimensional table is generated.
  • the horizontal axis of the two-dimensional table is each system parameter (CPU, memory occupation data, disk space data, number of threads, requests, number of logs ...)
  • the vertical axis lists the independent time units (each time slice being 1 minute or 5 minutes)
  • the two-dimensional table stores the system parameter data of each system parameter at each time segment. Annotate the status of each system parameter data. If it is abnormal, you can add an abnormal identification to the abnormal system parameter data.
  • Step S20 Obtain a standard parameter in a preset monitoring model, and determine whether there is an abnormal parameter in the current parameter to be monitored according to the standard parameter;
  • the current parameters to be monitored of the target system in the current time period are obtained.
  • the abnormality monitoring of the system parameters can be performed in both horizontal and vertical directions.
  • in the horizontal direction, the parameters to be monitored for different time segments are compared with one another to determine whether there are abnormal fluctuations.
  • for example, when the network starts to delay without being interrupted, system parameters such as the business volume of the business system (decrease), the abnormal log volume (increase), and the number of application threads (increase) change abnormally.
  • such a situation is difficult to catch with conventional alarms, because the change in each individual parameter may not reach its preset change threshold.
  • each abnormal parameter and the corresponding standard data are pushed to the management terminal for confirmation. In the vertical direction, the current parameters to be monitored for the current time segment are compared with the standard parameters to be monitored for the corresponding time segment.
  • Step S30 If it is determined that the abnormal parameter exists in the current parameters to be monitored, generate an early warning message according to the abnormal parameter and the standard parameter, and report the early warning message.
  • if an abnormality is found in the horizontal comparison, the current parameter to be monitored is determined to be fluctuating abnormally; if an abnormality is found in the vertical comparison, the current parameter to be monitored is determined to deviate from the standard parameter.
  • after step S30, the method further includes:
  • some commonly used exception handling strategies can be associated with exception parameters and stored in a preset strategy library.
  • for example, the abnormal parameters for abnormal memory usage are increased CPU occupancy, increased memory usage, and an increased disk space ratio.
  • after the abnormality processing, the monitoring parameters corresponding to the original abnormal parameters are obtained and fed back to the management end, so that the administrator can determine whether the abnormality has been resolved.
  • This embodiment provides a method for early warning of a system failure: acquiring, according to a preset time unit, the current parameters to be monitored of a target system in the current time unit; acquiring standard parameters in a preset monitoring model, and determining according to them whether there is an abnormal parameter in the current parameters to be monitored; and if so, generating an early warning message according to the abnormal parameter and the standard parameters, and reporting the early warning message.
  • FIG. 3 is a schematic flowchart of a second embodiment of an early warning method for a system fault of the present application.
  • the step S20 includes:
  • Step S21 According to the preset monitoring model, obtain a current standard parameter to be monitored corresponding to the target system in the current time unit as the standard parameter;
  • Step S22 Determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameter.
  • standard parameters for each time unit are preset in the preset monitoring model, and the parameters to be monitored for each time unit are compared vertically, that is, the current monitoring parameters are compared with the corresponding preset standard parameters. If the parameters to be monitored in a time unit differ significantly from the preset standard parameters, an abnormality may have occurred in that time unit, and either corresponding abnormality processing is performed or continuous monitoring of the abnormal parameter starts from that time unit to confirm whether the abnormality persists.
  • alternatively, step S20 includes:
  • Step S23 According to the preset monitoring model, obtain other standard parameters to be monitored of the target system in other time units as the standard parameters;
  • Step S24 Determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  • under normal conditions, the parameters to be monitored for each time unit should fall within the same standard range, so the current parameters to be monitored of the target system are compared with the parameters to be monitored for other time units.
  • the other time units are non-current time units, possibly several of them; that is, the current parameters are compared with those of a preset number of time units before and after. If the parameters to be monitored in a time unit differ significantly from those of the neighboring time units, an abnormality may have occurred in that time unit, and either corresponding abnormality processing is performed or continuous monitoring of the abnormal parameter starts from that time unit to confirm whether the abnormality persists.
  • FIG. 4 is a schematic flowchart of a third embodiment of an early warning method for a system fault of the present application.
  • after step S30, the method further includes:
  • Step S31 Acquire the parameters to be monitored for several time units after the current time unit;
  • Step S32 Determine, according to the standard parameters for those time units in the preset monitoring model, whether the parameters to be monitored for the several time units are continuously abnormal;
  • Step S33 If the parameters to be monitored for the several time units are continuously abnormal, generate an emergency warning message according to the continuously abnormal parameters, and send it to the management end;
  • Step S34 If the parameters to be monitored for the several time units are not continuously abnormal, search a preset policy library for the corresponding target repair strategy according to the existing abnormal parameters, and perform abnormality processing according to the target repair strategy.
  • that is, the parameters to be monitored for a preset number of time units after the current time unit are further monitored.
  • the temporary abnormality problem is an abnormality problem that the system can adjust by itself, or an abnormality problem that can be solved according to a preset policy library.
  • continuous monitoring may be performed on core index data corresponding to the abnormal parameter.
  • the number of subsequent time units to be monitored is preset and may be, for example, one, three, or five. Either the parameters corresponding to the core system indicators are continuously monitored, or only the indicators to be monitored that correspond to the abnormal parameters are.
  • whether the parameters to be monitored for several time units are continuously abnormal is judged by comparing, one-to-one by time unit, the parameters to be monitored with the standard parameters for the same time unit. For example, the parameters to be monitored for the current time unit 12:00-12:05 are compared with the model's standard parameters for 12:00-12:05, the parameters for the next time unit 12:05-12:10 are compared with the standard parameters for 12:05-12:10, and so on. That is, when an indicator to be monitored becomes abnormal in the current time unit, it can be continuously monitored to determine whether the abnormality persists.
  • an embodiment of the present application further provides a system failure early warning device.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of an early warning device for system failure of the present application.
  • the early warning device for system failure includes:
  • a parameter obtaining module 10 configured to obtain a current parameter to be monitored of the target system in a current time unit according to a preset time unit;
  • a parameter monitoring module 20 configured to obtain standard parameters in a preset monitoring model, and determine whether there are abnormal parameters in the current parameters to be monitored according to the standard parameters;
  • the abnormality early warning module 30 is configured to, if it is determined that the abnormal parameter exists in the current parameter to be monitored, generate an early warning message according to the abnormal parameter in the current parameter to be monitored and the standard parameter, and report the early warning message.
  • the early warning device for system failure further includes:
  • a parameter statistics module configured to obtain historical system data of the target system, and to count, according to a preset time unit, the parameters to be monitored for each time unit in the historical system data, where the parameters to be monitored include memory parameters, application parameters, and business parameters;
  • a model building module is configured to classify and store the abnormal monitoring parameters and the normal monitoring parameters in the parameters to be monitored, and train and generate the preset monitoring model according to the abnormal monitoring parameters and the normal monitoring parameters.
  • the early warning device for system failure further includes:
  • An abnormality repairing module configured to find a corresponding target repairing strategy in a preset policy library according to the abnormality parameters, and perform abnormality processing according to the target repairing strategy;
  • a result feedback module is configured to obtain an abnormal processing result and feed the abnormal processing result to a management end.
  • the parameter monitoring module 20 includes:
  • a first parameter obtaining unit configured to obtain, according to the preset monitoring model, the preset standard parameters of the target system in the current time unit as the standard parameters;
  • a first parameter determining unit configured to determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters;
  • a second parameter obtaining unit configured to obtain, according to the preset monitoring model, other standard parameters to be monitored of the target system in other time units as the standard parameters;
  • a second parameter determining unit configured to determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  • the abnormality early warning module 30 further includes:
  • a third parameter obtaining unit configured to obtain, if it is determined that the abnormal parameter exists in the current parameters to be monitored, the parameters to be monitored for several time units after the current time unit;
  • a third parameter determining unit configured to determine, according to the standard parameters for the several time units in the preset monitoring model, whether the parameters to be monitored for those time units are continuously abnormal;
  • an emergency early warning unit configured to, if the parameters to be monitored for the several time units are continuously abnormal, generate an emergency early warning message according to the continuously abnormal parameters, and send it to a management end;
  • an anomaly repairing unit configured to, if the parameters to be monitored for the several time units are not continuously abnormal, search a preset policy library for the corresponding target repair strategy according to the existing abnormal parameters, and perform abnormality processing according to the target repair strategy.
  • each module in the above-mentioned system failure early warning device corresponds to each step in the embodiment of the above-mentioned system failure early warning method, and its functions and implementation processes are not repeated here one by one.
  • an embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium of the present application stores an early warning program for system failure, wherein when the early warning program for system failure is executed by a processor, the steps of the early warning method for system failure are implemented.
  • the methods in the above embodiments can be implemented by means of software plus a necessary universal hardware platform, and of course, also by hardware, but in many cases the former is better.
  • based on such an understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
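The preset time unit described in the first embodiment (for example, 12:02 falls in the 5-minute unit 12:00-12:05) and the per-unit two-dimensional table can be sketched in Python. This is a minimal illustration under assumptions the source does not state: time units have a fixed length and are aligned to 00:00, and the samples falling in one unit are averaged. The helper names `time_unit` and `aggregate_by_unit` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def time_unit(ts: datetime, unit_minutes: int = 5) -> tuple:
    """Return (start, end) of the time unit containing ts; units are aligned to 00:00."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_minutes = (ts - midnight) // timedelta(minutes=1)  # whole minutes since 00:00
    start = midnight + timedelta(minutes=elapsed_minutes // unit_minutes * unit_minutes)
    return start, start + timedelta(minutes=unit_minutes)


def aggregate_by_unit(samples, unit_minutes: int = 5) -> dict:
    """Build the two-dimensional table: time-unit start -> {parameter -> mean value}.

    samples is an iterable of (timestamp, parameter_name, value) tuples.
    Averaging within a unit is an assumption; the source only says parameters are counted.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, name, value in samples:
        start, _ = time_unit(ts, unit_minutes)
        buckets[start][name].append(value)
    return {
        start: {name: mean(values) for name, values in params.items()}
        for start, params in sorted(buckets.items())
    }
```

For example, `time_unit(datetime(2018, 9, 3, 12, 2))` returns the pair 12:00 and 12:05 on that date, matching the 12:02 example. In practice the aggregation would more likely be done by the time series database itself; the sketch only makes the bucketing rule explicit.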

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

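A method, apparatus, device, and computer-readable storage medium for early warning of system failure. The early warning method includes: obtaining, according to a preset time unit, the current parameters to be monitored of a target system in the current time unit (S10); obtaining standard parameters in a preset monitoring model, and determining, according to the standard parameters, whether an abnormal parameter exists in the current parameters to be monitored (S20); and if it is determined that the abnormal parameter exists in the current parameters to be monitored, generating an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and reporting the early warning message (S30). The method can set core indicator data of the system as parameters to be monitored and monitor the core monitoring data in real time according to the preset time unit, so that abnormal core monitoring data can be detected in time. System data that is about to become abnormal can thus be warned about in advance, before the abnormality occurs, improving system efficiency.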

Description

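Method, Apparatus, Device and Storage Medium for Early Warning of System Failure
This application claims priority to Chinese Patent Application No. 201811023379.2, filed with the Chinese Patent Office on September 3, 2018 and entitled "Method, Apparatus, Device and Storage Medium for Early Warning of System Failure", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer communications, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for early warning of system failure.
Background
A system failure is a state in which the system cannot perform its specified functions, or in which its performance has degraded below specified requirements. Failures can occur in many existing application systems, such as data center systems, communication systems, satellite systems, and vehicle-mounted systems. Fault prediction has therefore become an effective way to improve system reliability. Existing fault alarm methods raise an alarm only after a fault has occurred, that is, when a system parameter exceeds a set threshold; a fault prompt is then issued so that operation and maintenance personnel can resolve the fault through manual intervention, for example by migrating or replacing the faulty node. However, once a system parameter has exceeded the set threshold, the system abnormality has already occurred, so existing fault prediction methods cannot provide advance warning.
Therefore, solving the problems that existing prediction technology requires manual verification and cannot warn in advance has become an urgent technical problem.
Summary
The main purpose of the present application is to provide a method, apparatus, device, and computer-readable storage medium for early warning of system failure, aiming to solve the technical problems that existing fault prediction technology requires manual verification and cannot provide advance warning.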
为实现上述目的,本申请提供一种***故障的预警方法,所述***故障的预警方法包括以下步骤:
根据预设的时间单位,获取目标***在当前时间单位的当前待监控参数;
获取预设监控模型中的标准参数,并根据所述标准参数判断所述当前待监控参数是否存在异常参数;
若判定所述当前待监控参数存在所述异常参数,则根据所述当前待监控参数中的异常参数以及所述标准参数生成预警消息,并上报所述预警消息。
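Optionally, the step of obtaining standard parameters from a preset monitoring model and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter includes:
obtaining, according to the preset monitoring model, the current standard parameters to be monitored of the target system corresponding to the current time unit, as the standard parameters; and
determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
Optionally, the step of obtaining standard parameters from a preset monitoring model and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter includes:
obtaining, according to the preset monitoring model, other standard parameters to be monitored of the target system corresponding to other time units, as the standard parameters; and
determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
Optionally, before the step of obtaining, according to a preset time unit, current parameters to be monitored of a target system in the current time unit, the method further includes:
obtaining historical system data of the target system, and counting, according to the preset time unit, the parameters to be monitored in each time unit of the historical system data, where the parameters to be monitored include memory parameters, application parameters, and business parameters; and
classifying and storing the abnormal monitoring parameters and the normal monitoring parameters among the parameters to be monitored, and training the preset monitoring model based on the abnormal monitoring parameters and the normal monitoring parameters.
Optionally, after the step of generating, if it is determined that the current parameters to be monitored contain the abnormal parameter, an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and reporting the early warning message, the method further includes:
searching a preset strategy library for a target repair strategy corresponding to the abnormal parameter, and performing exception handling according to the target repair strategy; and
obtaining the exception handling result, and feeding the exception handling result back to a management terminal.
Optionally, after the step of generating, if it is determined that the current parameters to be monitored contain the abnormal parameter, an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and reporting the early warning message, the method further includes:
obtaining the parameters to be monitored corresponding to several time units after the current time unit;
determining, according to the standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal; and
if the parameters to be monitored corresponding to the several time units are persistently abnormal, generating an urgent early warning message according to the persistently abnormal parameters, and sending the urgent early warning message to the management terminal.
Optionally, after the step of determining, according to the standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal, the method further includes:
if the parameters to be monitored corresponding to the several time units are not persistently abnormal, searching the preset strategy library for a target repair strategy corresponding to the existing abnormal parameter, and performing exception handling according to the target repair strategy.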
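In addition, to achieve the above objective, this application further provides an early warning apparatus for system failure, which includes:
a parameter obtaining module, configured to obtain, according to a preset time unit, current parameters to be monitored of a target system in the current time unit;
a parameter monitoring module, configured to obtain standard parameters from a preset monitoring model and determine, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and
an anomaly warning module, configured to, if it is determined that the current parameters to be monitored contain the abnormal parameter, generate an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and report the early warning message.
In addition, to achieve the above objective, this application further provides an early warning device for system failure. The device includes a processor, a memory, and an early warning program for system failure that is stored in the memory and executable by the processor, where the early warning program, when executed by the processor, implements the steps of the early warning method for system failure described above.
In addition, to achieve the above objective, this application further provides a computer-readable storage medium storing an early warning program for system failure, where the early warning program, when executed by a processor, implements the steps of the early warning method for system failure described above.
This application provides an early warning method for system failure: obtaining, according to a preset time unit, current parameters to be monitored of a target system in the current time unit; obtaining standard parameters from a preset monitoring model, and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and if it is determined that the current parameters to be monitored contain the abnormal parameter, generating an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and reporting the early warning message. In this way, this application can set core indicator data of the system, such as memory parameters, application parameters, and business parameters, as the parameters to be monitored, and monitor the core monitoring data in real time according to the preset time unit, so that abnormal core monitoring data is detected in time and system data that is about to become abnormal is flagged. This achieves early warning before an anomaly occurs and improves system working efficiency.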
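Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hardware structure of an early warning device for system failure involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the early warning method for system failure of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the early warning method for system failure of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the early warning method for system failure of this application;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the early warning apparatus for system failure of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely intended to explain this application, not to limit it.
The early warning method for system failure involved in the embodiments of this application is mainly applied to an early warning device for system failure, which may be a device with display and processing functions, such as a PC, a portable computer, or a mobile terminal.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the early warning device for system failure involved in the solutions of the embodiments of this application. In the embodiments of this application, the early warning device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components; the user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface); the memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory, and may optionally be a storage apparatus independent of the processor 1001.
A person skilled in the art can understand that the hardware structure shown in FIG. 1 does not constitute a limitation on the early warning device for system failure, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
Still referring to FIG. 1, the memory 1005 in FIG. 1, as a computer-readable storage medium, may include an operating system, a network communication module, and an early warning program for system failure.
In FIG. 1, the network communication module is mainly used to connect to a server and exchange data with it, and the processor 1001 may invoke the early warning program for system failure stored in the memory 1005 and execute the early warning method for system failure provided in the embodiments of this application.
An embodiment of this application provides an early warning method for system failure.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the early warning method for system failure of this application.
In this embodiment, the early warning method for system failure includes the following steps: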
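Step S10: obtain, according to a preset time unit, current parameters to be monitored of a target system in the current time unit.
Currently, failure prediction methods are usually rule-based: system operation information is collected and matched against warning rules. If a preset rule matches the system operation information, for example a system parameter is detected to be greater than a certain threshold, the system has already failed, and failure prompt information is given so that operations staff can take corrective measures through manual intervention, such as migrating or replacing the failed node. However, detecting that a system parameter exceeds a threshold means that a system failure has already occurred, so the failure cannot be warned of in time and the warning is ineffective. Therefore, how to solve the problem of untimely warning in existing prediction technology has become a technical problem that urgently needs to be solved.
In this embodiment, to solve the above problem, an early warning method for system failure is provided that monitors the core indicator data of the system, such as memory parameters, application parameters, and business parameters, in real time, so that system data that is about to become abnormal can be warned of in time. Specifically, historical system data of the target system is collected. The historical system data may cover one year or more of the target system's operation and may include: memory parameters, such as CPU, memory usage data, and disk space data; application parameters, such as the number of threads, the number of requests, and the number of log entries; and business parameters, such as the number of online users, the number of transactions, and the number of uploaded images. Each parameter in the historical system data is aggregated by preset time unit. A preset time unit is a preset time period, such as 1 minute, 5 minutes, or 10 minutes. For example, starting from 00:00, 00:00-00:01 is one time period, or 00:00-00:05 is one time period. If the current time is 12:02, the current time unit (with a 5-minute unit) is 12:00-12:05. In a specific embodiment, each parameter in the historical system data may also be stored in a time-series database, and the parameters may be quantified to generate a two-dimensional table, in which the horizontal axis lists the system parameters (CPU, memory usage data, disk space data, number of threads, number of requests, number of log entries, ...), the vertical axis lists the individual time slices (one time slice being 1 minute or 5 minutes), and each cell stores the value of one system parameter in one time slice. The state of each system parameter value is labelled, for example whether it is abnormal; specifically, an abnormal flag may be added to abnormal system parameter data. The abnormal and non-abnormal system parameters are each normalized, and the preset monitoring model is then trained with a random forest algorithm on the processed abnormal and non-abnormal system parameters, so that the preset monitoring model can automatically identify abnormal system parameters. In a specific embodiment, the system parameter changes that accompany system anomalies may be summarized. For example, for a network latency anomaly, that is, when the network starts to experience latency but is not interrupted, the business volume of the business system decreases, the volume of exception logs increases, and the number of application threads increases. For a memory usage anomaly, CPU usage rises, memory usage grows, and the proportion of disk space used increases.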
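The time-unit bucketing described above (5-minute units starting at 00:00, so that 12:02 falls in 12:00-12:05) can be sketched in Python. This is a minimal illustration, not the applicant's implementation: the function names `time_unit_bounds` and `unit_label` are hypothetical, and aligning every unit to midnight of the same day is an assumption drawn from the 00:00 example.

```python
from datetime import datetime, timedelta
from typing import Tuple


def time_unit_bounds(ts: datetime, unit_minutes: int = 5) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of the preset time unit containing ts.

    Units are assumed to be aligned to 00:00 of the same day.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole minutes elapsed since midnight (timedelta // timedelta yields an int).
    minutes = (ts - midnight) // timedelta(minutes=1)
    start = midnight + timedelta(minutes=(minutes // unit_minutes) * unit_minutes)
    return start, start + timedelta(minutes=unit_minutes)


def unit_label(ts: datetime, unit_minutes: int = 5) -> str:
    """Format the time unit containing ts as 'HH:MM-HH:MM'."""
    start, end = time_unit_bounds(ts, unit_minutes)
    return f"{start:%H:%M}-{end:%H:%M}"


print(unit_label(datetime(2018, 9, 3, 12, 2)))  # 12:00-12:05
```

Using the label string as the key for each time unit keeps the historical table, the monitoring model, and live samples addressable in the same way.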
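Further, before step S10, the method further includes:
obtaining historical system data of the target system, and counting, according to the preset time unit, the parameters to be monitored in each time unit of the historical system data, where the parameters to be monitored include memory parameters, application parameters, and business parameters; and
classifying and storing the abnormal monitoring parameters and the normal monitoring parameters among the parameters to be monitored, and training the preset monitoring model based on the abnormal monitoring parameters and the normal monitoring parameters.
In this embodiment, each parameter in the historical system data is stored in a time-series database, and the parameters are quantified to generate a two-dimensional table, in which the horizontal axis lists the system parameters (CPU, memory usage data, disk space data, number of threads, number of requests, number of log entries, ...), the vertical axis lists the individual time units (one time slice being 1 minute or 5 minutes), and each cell stores the value of one system parameter in one time slice. The state of each system parameter value is labelled, for example whether it is abnormal; specifically, an abnormal flag may be added to abnormal system parameter data. The abnormal and non-abnormal system parameters are each normalized, and the preset monitoring model is then trained with a random forest algorithm on the processed abnormal and non-abnormal system parameters, so that the preset monitoring model can automatically identify abnormal system parameters.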
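A minimal standard-library sketch of the data preparation described above: pivoting samples into the two-dimensional table (one row per time unit, one column per system parameter), labelling rows, and normalizing each column. The parameter names, the choice of min-max scaling with bounds shared by abnormal and normal rows, and filling missing cells with 0.0 are all assumptions; the description does not give a normalization formula. The random-forest training itself is not shown; the returned `X` and `y` could, for example, be passed to scikit-learn's `RandomForestClassifier.fit`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Table = Dict[str, Dict[str, float]]  # time-unit label -> {parameter: value}


def build_table(samples: Iterable[Tuple[str, str, float]]) -> Table:
    """Pivot (time_unit, parameter, value) samples into a unit-by-parameter table."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for unit, param, value in samples:
        table[unit][param] = float(value)
    return dict(table)


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def to_training_set(table: Table, abnormal_units: Set[str], params: List[str]):
    """Return (units, X, y): normalized feature rows and 1/0 abnormal labels."""
    units = sorted(table)
    # A missing cell is filled with 0.0 (assumption).
    rows = [[table[u].get(p, 0.0) for p in params] for u in units]
    X = min_max_normalize(rows)
    y = [1 if u in abnormal_units else 0 for u in units]
    return units, X, y
```

Sharing the scaling bounds across both groups keeps abnormal and normal rows on one scale, so a classifier sees comparable feature values.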
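Step S20: obtain standard parameters from a preset monitoring model, and determine, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter.
In this embodiment, the current parameters to be monitored of the target system in the current time period are obtained according to the above preset time unit. The abnormal and non-abnormal data to be monitored associated with the current time period are obtained through the preset monitoring model, and the current parameters to be monitored are compared with the abnormal and the non-abnormal data respectively to determine whether the current parameters are abnormal. In a specific embodiment, system parameters can be monitored for anomalies in two directions, horizontal and vertical. The horizontal comparison compares the parameters to be monitored across time slices to determine whether any system parameter fluctuates abnormally. For example, when the network starts to experience latency but is not interrupted, system parameters such as the business volume of the business system (decreasing), the volume of exception logs (increasing), and the number of application threads (increasing) change abnormally. In the prior art, however, such changes do not reach the preset change threshold and therefore rarely trigger an alarm. In a specific embodiment, if an abnormal change in the system parameters is detected that matches a pre-stored anomaly case, each abnormal parameter and the corresponding standard data are pushed to the management terminal for anomaly confirmation. The vertical comparison compares the current parameters to be monitored in the current time slice with the standard parameters to be monitored for the corresponding time slice.
Step S30: if it is determined that the current parameters to be monitored contain the abnormal parameter, generate an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and report the early warning message.
In this embodiment, if an anomaly is found in the horizontal comparison, it is determined that the current parameters to be monitored fluctuate abnormally; if an anomaly is found in the vertical comparison, it is determined that the current parameters to be monitored deviate from the standard parameters. The abnormal parameters among the current parameters to be monitored, such as the decreased business volume, the increased exception system logs, and the increased number of application threads, can then be extracted; an early warning message is generated according to the corresponding standard parameters (which may be standard parameter ranges) and sent to the management terminal, so that the responsible administrator can confirm the anomaly and handle the impending system failure in advance.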
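A hedged sketch of step S30, assuming each standard parameter is stored as a (low, high) range: parameters outside their range are extracted and packaged, together with those ranges, into a warning message. The message fields, the JSON encoding, and the `send` callable standing in for the channel to the management terminal are hypothetical.

```python
import json
from typing import Callable, Dict, Optional, Tuple

Range = Tuple[float, float]


def find_abnormal(current: Dict[str, float], standard: Dict[str, Range]) -> Dict[str, float]:
    """Parameters whose current value lies outside the [low, high] standard range."""
    return {
        p: v for p, v in current.items()
        if p in standard and not (standard[p][0] <= v <= standard[p][1])
    }


def build_warning(unit: str, current: Dict[str, float],
                  standard: Dict[str, Range]) -> Optional[dict]:
    """Build a warning message, or return None when nothing is abnormal."""
    abnormal = find_abnormal(current, standard)
    if not abnormal:
        return None
    return {
        "level": "warning",
        "time_unit": unit,
        "abnormal": [
            {"param": p, "value": v, "standard_range": list(standard[p])}
            for p, v in sorted(abnormal.items())
        ],
    }


def report(message: dict, send: Callable[[str], None]) -> None:
    """Serialize the message and hand it to a transport callable (hypothetical)."""
    send(json.dumps(message, ensure_ascii=False))


# Usage with the network-latency example: business volume down, error logs up.
example_standard = {"business_volume": (800, 1200), "error_logs": (0, 50), "threads": (50, 150)}
example_current = {"business_volume": 600, "error_logs": 120, "threads": 100}
report(build_warning("12:00-12:05", example_current, example_standard), print)
```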
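Further, after step S30, the method further includes:
searching a preset strategy library for a target repair strategy corresponding to the abnormal parameter, and performing exception handling according to the target repair strategy; and
obtaining the exception handling result, and feeding the exception handling result back to the management terminal.
In this embodiment, some common exception handling strategies can be associated with abnormal parameters and stored in the preset strategy library. For example, for the abnormal parameters of a memory usage anomaly, that is, rising CPU usage, growing memory usage, and an increasing proportion of disk space used, closing programs running in the background to reduce memory usage can be stored as the associated target repair strategy. After exception handling, the monitoring parameters corresponding to the originally abnormal parameters are obtained and fed back to the management terminal, so that the administrator can judge whether the anomaly has been resolved.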
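The strategy-library lookup and feedback loop might look like the sketch below. As an assumption, the library is modelled as a mapping from a set of abnormal parameters (here the memory-usage anomaly signature from the example) to a repair action, and the most specific matching signature wins. `close_background_programs`, `fetch_metrics`, and `notify` are hypothetical placeholders, not real APIs.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Strategy = Callable[[], str]


def close_background_programs() -> str:
    # Placeholder: a real strategy would stop non-essential background processes.
    return "closed background programs"


# Hypothetical strategy library: signature (set of abnormal parameters) -> strategy.
STRATEGY_LIBRARY: Dict[FrozenSet[str], Strategy] = {
    frozenset({"cpu", "mem_used", "disk_used"}): close_background_programs,
}


def find_strategy(abnormal: Iterable[str],
                  library: Dict[FrozenSet[str], Strategy] = STRATEGY_LIBRARY) -> Optional[Strategy]:
    """Pick the most specific strategy whose signature is contained in the abnormal set."""
    observed = set(abnormal)
    matches = [sig for sig in library if sig <= observed]
    return library[max(matches, key=len)] if matches else None


def handle_and_feedback(abnormal: Iterable[str],
                        fetch_metrics: Callable[[list], Dict[str, float]],
                        notify: Callable[[dict], None]) -> bool:
    """Run the matching strategy, re-read the affected metrics, report to the management side."""
    params = sorted(set(abnormal))
    strategy = find_strategy(params)
    if strategy is None:
        notify({"status": "no_strategy", "params": params})
        return False
    action = strategy()
    # Feed back the post-handling values so an administrator can judge whether it worked.
    notify({"status": "handled", "action": action, "post_metrics": fetch_metrics(params)})
    return True
```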
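This embodiment provides an early warning method for system failure: obtaining, according to a preset time unit, current parameters to be monitored of a target system in the current time unit; obtaining standard parameters from a preset monitoring model, and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and if it is determined that the current parameters to be monitored contain the abnormal parameter, generating an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and reporting the early warning message. In this way, this application can set core indicator data of the system, such as memory parameters, application parameters, and business parameters, as the parameters to be monitored, and monitor the core monitoring data in real time according to the preset time unit, so that abnormal core monitoring data is detected in time and system data that is about to become abnormal is flagged. This achieves early warning before an anomaly occurs and improves system working efficiency.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the early warning method for system failure of this application.
Based on the embodiment shown in FIG. 2, in this embodiment, step S20 includes:
Step S21: obtain, according to the preset monitoring model, the current standard parameters to be monitored of the target system corresponding to the current time unit, as the standard parameters;
Step S22: determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
In this embodiment, the preset monitoring model holds preset standard parameters for each time unit, and the parameters to be monitored in each time unit are compared vertically, that is, the current monitoring parameters are compared with the corresponding preset standard parameters. If the parameters to be monitored in a time unit differ significantly from the preset standard parameters, those parameters may be abnormal; corresponding exception handling is then required, or the abnormal parameters are monitored continuously from that time unit onward to further confirm whether the anomaly persists.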
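The vertical comparison can be illustrated with a per-time-unit baseline. As a hedged sketch, and as a swapped-in technique rather than the random-forest model described above, the standard range for each time unit is assumed here to be the historical mean plus or minus k population standard deviations; `learn_standard_ranges`, `vertical_check`, and the default k = 3 are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Range = Tuple[float, float]
Model = Dict[str, Dict[str, Range]]  # time-unit label -> {parameter: (low, high)}


def learn_standard_ranges(history: Iterable[Tuple[str, Dict[str, float]]],
                          k: float = 3.0) -> Model:
    """Derive a per-time-unit standard range as mean +/- k * population std dev."""
    buckets: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for unit, params in history:
        for p, v in params.items():
            buckets[unit][p].append(float(v))
    model: Model = {}
    for unit, per_param in buckets.items():
        model[unit] = {}
        for p, values in per_param.items():
            mu = statistics.mean(values)
            sd = statistics.pstdev(values)
            model[unit][p] = (mu - k * sd, mu + k * sd)
    return model


def vertical_check(unit: str, current: Dict[str, float], model: Model) -> List[str]:
    """Compare the current unit's values with the standard for the same time unit."""
    standard = model[unit]
    return sorted(
        p for p, v in current.items()
        if p in standard and not (standard[p][0] <= v <= standard[p][1])
    )
```

Keying the baseline by time-of-day unit lets a value that is normal at 03:00 but unusual at 12:00 be flagged, which a single global threshold would miss.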
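Further, in this embodiment, step S20 also includes:
Step S23: obtain, according to the preset monitoring model, other standard parameters to be monitored of the target system corresponding to other time units, as the standard parameters;
Step S24: determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
In this embodiment, when the system is running normally, the parameters to be monitored in every time unit should fall within the same standard range. The current parameters to be monitored of the target system are therefore compared horizontally with the parameters to be monitored in other time units. The other time units are time units other than the current one and may be several time units, that is, a preset number of time units before and after the current one. If the parameters to be monitored in a time unit differ significantly from those in the preset number of time units before and after it, those parameters may be abnormal; corresponding exception handling is then required, or the abnormal parameters are monitored continuously from that time unit onward to further confirm whether the anomaly persists.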
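A hedged sketch of the horizontal comparison: each parameter in the current time unit is compared with the median of the same parameter over a preset number of units before and after it, and flagged when the relative deviation exceeds a threshold. The median reference, the 50% threshold, and the function name are assumptions, not taken from the description. In live monitoring the following units do not exist yet, so `after=0` would be the realistic setting there.

```python
import statistics
from typing import Dict, List


def horizontal_check(series: List[Dict[str, float]], index: int,
                     before: int = 2, after: int = 2,
                     max_rel_dev: float = 0.5) -> List[str]:
    """Flag parameters in series[index] that deviate from neighbouring time units.

    series holds one {parameter: value} dict per time unit, in time order.
    """
    neighbours = series[max(0, index - before):index] + series[index + 1:index + 1 + after]
    flagged = []
    for p, v in series[index].items():
        ref_values = [n[p] for n in neighbours if p in n]
        if not ref_values:
            continue  # nothing to compare against
        ref = statistics.median(ref_values)
        # Relative deviation; fall back to an absolute scale when the reference is 0.
        scale = abs(ref) if ref != 0 else 1.0
        if abs(v - ref) / scale > max_rel_dev:
            flagged.append(p)
    return sorted(flagged)
```

The median is used instead of the mean so that one abnormal neighbour does not drag the reference value toward itself.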
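Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the early warning method for system failure of this application.
Based on the embodiment shown in FIG. 2, in this embodiment, after step S30, the method further includes:
Step S31: obtain the parameters to be monitored corresponding to several time units after the current time unit;
Step S32: determine, according to the standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal;
Step S33: if the parameters to be monitored corresponding to the several time units are persistently abnormal, generate an urgent early warning message according to the persistently abnormal parameters, and send the urgent early warning message to the management terminal.
Step S34: if the parameters to be monitored corresponding to the several time units are not persistently abnormal, search the preset strategy library for a target repair strategy corresponding to the existing abnormal parameter, and perform exception handling according to the target repair strategy.
In this embodiment, to avoid overreacting to transient system anomalies, after an abnormal parameter is detected, the parameters to be monitored in a preset number of time units after that time unit can be further monitored. A transient anomaly is one that the system can correct by itself or that can be resolved with the preset strategy library. Specifically, when an abnormal parameter is detected in the current parameters to be monitored, the core indicator data corresponding to the abnormal parameter can be monitored continuously. The parameters to be monitored in the preset number of time units after the current time unit are obtained, where the preset number may be one, three, five, or the like; either the parameters of all core system indicators or only the indicators corresponding to the abnormal parameter may be monitored continuously. To determine whether the parameters in the several time units are persistently abnormal, the standard parameters and the parameters to be monitored are compared one-to-one by matching time unit. For example, the parameters to be monitored in the current time unit 12:00-12:05 are compared with the standard parameters for the 12:00-12:05 time unit in the model, the parameters in the next time unit 12:05-12:10 are compared with the standard parameters for 12:05-12:10 in the model, and so on. That is, when an indicator to be monitored in the current time unit is detected to be abnormal, that indicator can be monitored continuously to determine whether the anomaly persists. Alternatively, the core monitoring data of the whole system, that is, all the parameters to be monitored, is monitored continuously, the standard parameters for the several time units are obtained from the preset monitoring model, and whether abnormal parameters persist is determined according to those standard parameters. If the anomaly persists, an urgent early warning is generated so that the administrator can handle it in time and prevent a system failure. If it does not persist, the current anomaly is transient and the system can correct itself; a target repair strategy can be looked up according to the abnormal parameter so that exception handling is performed automatically.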
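Steps S31 to S34 can be sketched as follows, assuming the model stores a (low, high) standard range per parameter for each time unit and treating an anomaly as persistent only if an originally abnormal parameter stays outside its range in every following unit. That persistence rule, the message fields, and the `alert`/`repair` callables (standing in for the management terminal and the strategy-library handler) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Range = Tuple[float, float]
Model = Dict[str, Dict[str, Range]]  # time-unit label -> {parameter: (low, high)}


def abnormal_params(current: Dict[str, float], standard: Dict[str, Range]) -> Set[str]:
    """Parameters outside their [low, high] standard range."""
    return {p for p, v in current.items()
            if p in standard and not (standard[p][0] <= v <= standard[p][1])}


def follow_up(initial_abnormal: Iterable[str], next_units: List[str],
              observations: Dict[str, Dict[str, float]], model: Model,
              alert: Callable[[dict], None], repair: Callable[[List[str]], None]) -> str:
    """Decide between an urgent alert (persistent anomaly) and automatic repair (transient)."""
    still_abnormal = set(initial_abnormal)
    for unit in next_units:
        # Match time units one-to-one: observed 12:05-12:10 vs standard 12:05-12:10, etc.
        still_abnormal &= abnormal_params(observations[unit], model[unit])
    if next_units and still_abnormal:
        alert({"level": "urgent", "params": sorted(still_abnormal), "units": list(next_units)})
        return "urgent"
    repair(sorted(set(initial_abnormal)))
    return "repair"
```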
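In addition, an embodiment of this application further provides an early warning apparatus for system failure.
Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of the first embodiment of the early warning apparatus for system failure of this application.
In this embodiment, the early warning apparatus for system failure includes:
a parameter obtaining module 10, configured to obtain, according to a preset time unit, current parameters to be monitored of a target system in the current time unit;
a parameter monitoring module 20, configured to obtain standard parameters from a preset monitoring model and determine, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and
an anomaly warning module 30, configured to, if it is determined that the current parameters to be monitored contain the abnormal parameter, generate an early warning message according to the abnormal parameter among the current parameters to be monitored and the standard parameters, and report the early warning message.
Further, the early warning apparatus for system failure further includes:
a parameter statistics module, configured to obtain historical system data of the target system and count, according to the preset time unit, the parameters to be monitored in each time unit of the historical system data, where the parameters to be monitored include memory parameters, application parameters, and business parameters; and
a model building module, configured to classify and store the abnormal monitoring parameters and the normal monitoring parameters among the parameters to be monitored, and train the preset monitoring model based on the abnormal monitoring parameters and the normal monitoring parameters.
Further, the early warning apparatus for system failure further includes:
an anomaly repair module, configured to search a preset strategy library for a target repair strategy corresponding to the abnormal parameter, and perform exception handling according to the target repair strategy; and
a result feedback module, configured to obtain the exception handling result and feed it back to the management terminal.
Further, the parameter monitoring module 20 includes:
a first parameter obtaining unit, configured to obtain, according to the preset monitoring model, the preset standard parameters of the target system corresponding to the current time unit, as the standard parameters;
a first parameter determining unit, configured to determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters;
a second parameter obtaining unit, configured to obtain, according to the preset monitoring model, other standard parameters to be monitored of the target system corresponding to other time units, as the standard parameters; and
a second parameter determining unit, configured to determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
Further, the anomaly warning module 30 further includes:
a third parameter obtaining unit, configured to, if it is determined that the current parameters to be monitored contain the abnormal parameter, obtain the parameters to be monitored corresponding to several time units after the current time unit;
a third parameter determining unit, configured to determine, according to the standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal;
an urgent warning unit, configured to, if the parameters to be monitored corresponding to the several time units are persistently abnormal, generate an urgent early warning message according to the persistently abnormal parameters, and send the urgent early warning message to the management terminal; and
an anomaly repair unit, configured to, if the parameters to be monitored corresponding to the several time units are not persistently abnormal, search the preset strategy library for a target repair strategy corresponding to the existing abnormal parameter, and perform exception handling according to the target repair strategy.
The modules of the above early warning apparatus correspond to the steps of the above embodiments of the early warning method for system failure; their functions and implementation are not described again here.
In addition, an embodiment of this application further provides a computer-readable storage medium.
The computer-readable storage medium of this application stores an early warning program for system failure, where the early warning program, when executed by a processor, implements the steps of the early warning method for system failure described above.
For the method implemented when the early warning program for system failure is executed, refer to the embodiments of the early warning method for system failure of this application; details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element preceded by "includes a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, a person skilled in the art can clearly understand that the methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are merely preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the content of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.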

Claims (20)

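  1. An early warning method for system failure, characterized in that the early warning method for system failure comprises the following steps:
    obtaining, according to a preset time unit, current parameters to be monitored of a target system in a current time unit;
    obtaining standard parameters in a preset monitoring model, and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and
    if it is determined that the current parameters to be monitored contain the abnormal parameter, generating an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and reporting the early warning message.
  2. The early warning method for system failure according to claim 1, characterized in that the step of obtaining standard parameters in a preset monitoring model and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter comprises:
    obtaining, according to the preset monitoring model, current standard parameters to be monitored of the target system corresponding to the current time unit, as the standard parameters; and
    determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  3. The early warning method for system failure according to claim 1, characterized in that the step of obtaining standard parameters in a preset monitoring model and determining, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter comprises:
    obtaining, according to the preset monitoring model, other standard parameters to be monitored of the target system corresponding to other time units, as the standard parameters; and
    determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  4. The early warning method for system failure according to claim 1, characterized in that, before the step of obtaining, according to a preset time unit, current parameters to be monitored of a target system in a current time unit, the method further comprises:
    obtaining historical system data of the target system, and counting, according to the preset time unit, parameters to be monitored in each time unit of the historical system data, wherein the parameters to be monitored comprise memory parameters, application parameters, and business parameters; and
    classifying and storing abnormal monitoring parameters and normal monitoring parameters among the parameters to be monitored, and training the preset monitoring model based on the abnormal monitoring parameters and the normal monitoring parameters.
  5. The early warning method for system failure according to claim 1, characterized in that, after the step of generating, if it is determined that the current parameters to be monitored contain the abnormal parameter, an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and reporting the early warning message, the method further comprises:
    searching a preset strategy library for a target repair strategy corresponding to the abnormal parameter, and performing exception handling according to the target repair strategy; and
    obtaining an exception handling result, and feeding the exception handling result back to a management terminal.
  6. The early warning method for system failure according to claim 1, characterized in that, after the step of generating, if it is determined that the current parameters to be monitored contain the abnormal parameter, an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and reporting the early warning message, the method further comprises:
    obtaining parameters to be monitored corresponding to several time units after the current time unit;
    determining, according to standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal; and
    if the parameters to be monitored corresponding to the several time units are persistently abnormal, generating an urgent early warning message according to the persistently abnormal parameters, and sending the urgent early warning message to a management terminal.
  7. The early warning method for system failure according to claim 6, characterized in that, after the step of determining, according to standard parameters corresponding to the several time units in the preset monitoring model, whether the parameters to be monitored corresponding to the several time units are persistently abnormal, the method further comprises:
    if the parameters to be monitored corresponding to the several time units are not persistently abnormal, searching a preset strategy library for a target repair strategy corresponding to the existing abnormal parameter, and performing exception handling according to the target repair strategy.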
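  8. An early warning apparatus for system failure, characterized in that the early warning apparatus for system failure comprises:
    a parameter obtaining module, configured to obtain, according to a preset time unit, current parameters to be monitored of a target system in a current time unit;
    a parameter monitoring module, configured to obtain standard parameters in a preset monitoring model and determine, according to the standard parameters, whether the current parameters to be monitored contain an abnormal parameter; and
    an anomaly warning module, configured to, if it is determined that the current parameters to be monitored contain the abnormal parameter, generate an early warning message according to the abnormal parameter in the current parameters to be monitored and the standard parameters, and report the early warning message.
  9. The early warning apparatus for system failure according to claim 8, characterized in that the early warning apparatus for system failure comprises:
    obtaining, according to the preset monitoring model, current standard parameters to be monitored of the target system corresponding to the current time unit, as the standard parameters; and
    determining whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  10. The early warning apparatus for system failure according to claim 8, characterized in that the parameter monitoring module comprises:
    a first parameter obtaining unit, configured to obtain, according to the preset monitoring model, other standard parameters to be monitored of the target system corresponding to other time units, as the standard parameters; and
    a first parameter determining unit, configured to determine whether the current parameters to be monitored contain an abnormal parameter that does not match the standard parameters.
  11. The early warning apparatus for system failure according to claim 8, characterized in that the early warning apparatus for system failure comprises:
    a parameter statistics module, configured to obtain historical system data of the target system and count, according to the preset time unit, parameters to be monitored in each time unit of the historical system data, wherein the parameters to be monitored comprise memory parameters, application parameters, and business parameters; and
    a model building module, configured to classify and store abnormal monitoring parameters and normal monitoring parameters among the parameters to be monitored, and train the preset monitoring model based on the abnormal monitoring parameters and the normal monitoring parameters.
  12. The early warning apparatus for system failure according to claim 8, characterized in that the early warning apparatus for system failure comprises:
    an anomaly repair module, configured to search a preset strategy library for a target repair strategy corresponding to the abnormal parameter, and perform exception handling according to the target repair strategy; and
    a result feedback module, configured to obtain an exception handling result and feed the exception handling result back to a management terminal.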
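  13. An early warning device for system failure, characterized in that the early warning device for system failure comprises a processor, a memory, and an early warning program for system failure stored in the memory and executable by the processor, wherein the early warning program for system failure, when executed by the processor, implements the steps of the early warning method for system failure according to claim 1.
  14. An early warning device for system failure, characterized in that the early warning device for system failure comprises a processor, a memory, and an early warning program for system failure stored in the memory and executable by the processor, wherein the early warning program for system failure, when executed by the processor, implements the steps of the early warning method for system failure according to claim 2.
  15. An early warning device for system failure, characterized in that the early warning device for system failure comprises a processor, a memory, and an early warning program for system failure stored in the memory and executable by the processor, wherein the early warning program for system failure, when executed by the processor, implements the steps of the early warning method for system failure according to claim 3.
  16. An early warning device for system failure, characterized in that the early warning device for system failure comprises a processor, a memory, and an early warning program for system failure stored in the memory and executable by the processor, wherein the early warning program for system failure, when executed by the processor, implements the steps of the early warning method for system failure according to claim 4.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an early warning program for system failure, wherein the early warning program for system failure, when executed by a processor, implements the steps of the early warning method for system failure according to claim 1.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an early warning program for system failure, wherein the early warning program for system failure, when executed by a processor, implements the steps of the early warning method for system failure according to claim 2.
  19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an early warning program for system failure, wherein the early warning program for system failure, when executed by a processor, implements the steps of the early warning method for system failure according to claim 3.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an early warning program for system failure, wherein the early warning program for system failure, when executed by a processor, implements the steps of the early warning method for system failure according to claim 4.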
PCT/CN2018/122807 2018-09-03 2018-12-21 Early warning method, apparatus, and device for system failure, and storage medium WO2020048047A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811023379.2A CN109684179B (zh) 2018-09-03 2018-09-03 Early warning method, apparatus, and device for system failure, and storage medium
CN201811023379.2 2018-09-03

Publications (1)

Publication Number Publication Date
WO2020048047A1 true WO2020048047A1 (zh) 2020-03-12

Family

ID=66185650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122807 WO2020048047A1 (zh) 2018-09-03 2018-12-21 Early warning method, apparatus, and device for system failure, and storage medium

Country Status (2)

Country Link
CN (1) CN109684179B (zh)
WO (1) WO2020048047A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104811344A (zh) * 2014-01-23 2015-07-29 阿里巴巴集团控股有限公司 网络动态业务监控方法及装置
CN105262634A (zh) * 2015-09-06 2016-01-20 浪潮集团有限公司 一种监控阈值的生成方法、装置和***
CN105450454A (zh) * 2015-12-03 2016-03-30 广州华多网络科技有限公司 一种服务监控告警方法以及装置

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20080126881A1 (en) * 2006-07-26 2008-05-29 Tilmann Bruckhaus Method and apparatus for using performance parameters to predict a computer system failure
US10635563B2 (en) * 2016-08-04 2020-04-28 Oracle International Corporation Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems
CN107247649A (zh) * 2016-10-12 2017-10-13 北京奇虎科技有限公司 检测***健康状况的方法、装置及网关
CN106953766B (zh) * 2017-03-31 2020-06-26 北京奇艺世纪科技有限公司 一种报警方法及装置

Also Published As

Publication number Publication date
CN109684179B (zh) 2022-05-17
CN109684179A (zh) 2019-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18932688

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18932688

Country of ref document: EP

Kind code of ref document: A1