CN116795631A - Service system monitoring alarm method, device, equipment and medium - Google Patents

Service system monitoring alarm method, device, equipment and medium

Info

Publication number
CN116795631A
Authority
CN
China
Prior art keywords
alarm
index
monitoring
information
monitoring index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310639999.3A
Other languages
Chinese (zh)
Inventor
杨云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202310639999.3A priority Critical patent/CN116795631A/en
Publication of CN116795631A publication Critical patent/CN116795631A/en
Pending legal-status Critical Current

Landscapes

  • Alarm Systems (AREA)

Abstract

The application provides a service system monitoring and alarming method, device, equipment and medium, which can be used in the financial field or other fields. The method comprises the following steps: acquiring a current monitoring index of a service system, wherein the monitoring index comprises at least one of user behavior information, service information or system operation performance; comparing the index value of the monitoring index with an index system table to judge whether the monitoring index is abnormal, wherein a preset normal threshold value of the monitoring index is recorded in the index system table; if yes, acquiring an alarm rule corresponding to the monitoring index, and judging whether the index value of the monitoring index accords with the alarm rule; and if the index value of the monitoring index accords with the alarm rule, outputting alarm information. The application monitors and analyzes the user behavior data, system performance data and service progress data acquired in real time by the service system and raises safety alarms on them, which makes it easier to discover in time the risks faced by each service and the performance of the service system, so that problems can be located quickly and losses can be limited effectively.

Description

Service system monitoring alarm method, device, equipment and medium
Technical Field
The present application relates to the financial field and other fields, and in particular, to a method, apparatus, device and medium for monitoring and alarming in a business system.
Background
With the rapid development of the financial field and internet technology, business systems face great risk control challenges and pressure under conditions of big data, high concurrency and multiple businesses. A large amount of data needs to be monitored and analyzed in real time, and alarms of corresponding levels need to be raised according to the monitoring results.
In the existing service system monitoring and alarming modes, alarming conditions are mainly set according to user behaviors and service development conditions.
However, existing approaches lack fine-grained division of user behavior and service development conditions; for example, behavior can be divided into login, browsing, refreshing, clicking, consumption, reservation and the like, and service development conditions can be divided into service processing rate, service volume and the like. In addition, the existing alarm condition is single, so alarms are raised frequently even when the user behavior or service development condition is abnormal but does not warrant an alarm. As a result, an efficient monitoring and alarming mode cannot be provided.
Disclosure of Invention
The application provides a method, a device, equipment and a medium for monitoring and alarming a service system, which are used for solving the technical problem that an efficient service system monitoring and alarming mode cannot be provided in the prior art.
In a first aspect, the present application provides a service system monitoring and alarming method, including:
acquiring current monitoring indexes of a service system, wherein the monitoring indexes comprise at least one of user behavior information, service information or system operation performance, the user behavior information comprises duration and/or behavior times of various behaviors of a user aiming at the service system, the service information comprises at least one of transaction duration, transaction failure rate or technical failure rate of various services in the service system, and the system operation performance comprises at least one of memory occupancy rate, network flow consumption or CPU energy consumption;
comparing the index value of the monitoring index with an index system table to judge whether the monitoring index is abnormal, wherein a preset normal threshold value of the monitoring index is recorded in the index system table;
if yes, acquiring an alarm rule corresponding to the monitoring index, and judging whether the index value of the monitoring index accords with the alarm rule;
and if the index value of the monitoring index accords with the alarm rule, outputting alarm information.
In one possible implementation manner, before the obtaining the alarm rule corresponding to the monitoring indicator, the method further includes:
acquiring a configuration file of the monitoring index, wherein the configuration file comprises a preset alarm threshold value of the monitoring index;
and generating an alarm rule corresponding to the monitoring index according to the configuration file, and storing the monitoring index and the alarm rule in an associated mode.
In one possible implementation manner, if the index value of the monitoring index meets the alarm rule, outputting alarm information includes:
when the index value of the monitoring index is larger than a preset alarm threshold value in the alarm rule, generating alarm information, wherein the preset alarm threshold value is larger than the preset normal threshold value;
and alarming according to the alarming information.
In a possible implementation manner, when the index value of the monitoring index is greater than a preset alarm threshold value in the alarm rule, generating alarm information includes:
acquiring a difference value between the index value of the monitoring index and the preset alarm threshold value;
acquiring an alarm level table according to the type of the monitoring index;
acquiring alarm information comprising alarm levels according to the difference value and the alarm level table; the alarm levels are higher as the difference value is larger for the same monitoring index, and the alarm levels corresponding to the difference values of different sections are stored in the alarm level table.
In one possible implementation manner, the alarming according to the alarming information includes:
and acquiring an alarm terminal corresponding to the alarm level from an alarm database according to the alarm level, and pushing the alarm information to the alarm terminal, wherein the alarm terminal corresponding to the alarm level is stored in the alarm database.
In one possible implementation, after the alarm information is pushed to the alarm terminal, the alarm information and the corresponding alarm terminal are stored in the alarm database.
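The lookup, push and record steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the in-memory `ALARM_DATABASE`, the terminal names, and the `send` callback are all hypothetical stand-ins, since the application does not specify how the alarm database or the push channel is implemented.

```python
# Hypothetical in-memory stand-in for the alarm database. It stores the
# terminals registered for each alarm level and a history of pushed alarms.
ALARM_DATABASE = {
    "terminals": {
        "severe": ["ops-oncall-phone"],
        "general": ["ops-mailbox"],
    },
    "history": [],
}


def push_alarm(alarm_info, send):
    """Push alarm_info to every terminal registered for its alarm level.

    `send` is a caller-supplied function (terminal, alarm_info) that performs
    the actual delivery; it is an assumption of this sketch.
    """
    terminals = ALARM_DATABASE["terminals"].get(alarm_info["level"], [])
    for terminal in terminals:
        send(terminal, alarm_info)
        # After pushing, store the alarm together with its terminal.
        ALARM_DATABASE["history"].append(
            {"terminal": terminal, "alarm": alarm_info}
        )
    return terminals
```

Keeping the delivery function as a parameter lets the same routing logic serve different channels (SMS, email, in-app push) without changing the lookup.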
In one possible implementation manner, the obtaining the current monitoring index of the service system includes:
collecting a real-time log of a service system;
synchronizing log data in the real-time log to a Kafka queue;
and consuming the log data in the Kafka queue, and storing the consumed log data into a ClickHouse database according to monitoring indexes in a classified manner, wherein independent storage spaces corresponding to each monitoring index are arranged in the ClickHouse database aiming at different monitoring indexes.
In one possible implementation manner, before comparing the index value of the monitoring index with the index system table to determine whether the monitoring index is abnormal, the method further includes:
and when the service system is used for the first time, outputting prompt information, wherein the prompt information is used for prompting the setting of a preset normal threshold value of the monitoring index.
In a second aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the method as described above.
In a third aspect, the present application provides a computer readable storage medium having stored therein computer executable instructions for implementing a method as described above when executed by a processor.
The service system monitoring alarm method, device, equipment and medium provided by the application take user behavior information, service information and system operation performance as monitoring indexes, based on the large amount of behavior information generated daily by users in the service system, the development condition of each service in the service system, and the system operation performance data, and monitor and analyze these indexes. If the index value of a monitoring index is larger than the preset normal threshold value, whether the index value accords with the alarm rule is further judged based on the preset alarm rule, and if so, the corresponding alarm operation is carried out. This realizes timely alarm notification at the user, service or system level and better ensures the safety of the service system.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a mobile phone interface for receiving alarm information by a mobile phone terminal according to an embodiment of the present application;
Fig. 3 is a first flowchart of a method for monitoring and alarming a service system according to an embodiment of the present application;
Fig. 4 is a second flowchart of a method for monitoring and alarming a service system according to an embodiment of the present application;
Fig. 5 is a schematic hardware diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that the service system monitoring and alarming method, device, equipment and medium of the present application can be used in the financial field, and can also be used in any field other than the financial field.
With the rapid development of the financial industry and the continual emergence of new financial services, people pay more and more attention to the data feedback provided by a service system so as to respond reasonably to its real-time data. Under conditions of big data, high concurrency, multiple services and the like, the monitoring and alarming modes of the traditional service system cannot meet the diversified demands of users.
The alarm condition is set according to the user behavior and the service development condition, more valuable data information can be screened out, and the monitoring and alarm efficiency of the traditional service system can be improved.
Although the monitoring and alarming capability of the traditional service system is effectively improved, the system lacks fine granularity dividing processing aiming at user behaviors and service development conditions; secondly, the single alarm condition enables the service system to output frequent alarm prompts, so that the alarm prompts which need to be reacted can not be filtered out in time.
The application provides a service system monitoring alarm method capable of carrying out fine granularity division processing on user behavior information and service information, and simultaneously monitoring the running performance of the system; and secondly, judging the abnormality and the alarm condition of the monitoring index, and giving an alarm when the monitoring index is abnormal and accords with the alarm condition, so that alarm information which needs to be reacted in time can be filtered out conveniently.
A specific application scenario of the present application will be described with reference to fig. 1.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1, the scenario includes a user terminal 101, a server 102, and a service system 103, where the execution body in this embodiment is the server 102.
The user terminal 101 can perform the actions of login, browsing, refreshing, clicking, consumption, reservation and the like on the service system 103, and can also receive the alarm information from the server 102, so that the user can respond in time. For example, a mobile phone is a type of user terminal 101.
Fig. 2 is a schematic diagram of a mobile phone interface of a mobile phone as a user terminal for receiving alarm information according to an embodiment of the present application. As shown in fig. 2, the alarm information sent by the server 102 may include a specific monitoring index, an index value of the monitoring index, a preset normal threshold, a preset alarm threshold, and an alarm level, so that the user terminal 101 can quickly obtain a specific situation of the monitoring index.
The server 102 may acquire a real-time log of the service system 103, or may further acquire a monitoring index in the real-time log, so as to determine whether the index value of the monitoring index is abnormal and accords with the alarm rule, and output alarm information to the user terminal 101 and/or the service system in time when the index value of the monitoring index is abnormal and accords with the alarm rule.
In this embodiment, the service system 103 is configured to provide various services, and when the user terminal 101 operates on the service system 103, the service system 103 generates a real-time log including monitoring indexes.
In summary, the server 102 performs real-time monitoring analysis on the monitoring index by combining a preset index system table and an alarm rule on the basis of performing fine granularity division on the user behavior, the service development condition and the service system performance, and outputs alarm information when the index value of the monitoring index is abnormal and accords with the alarm rule.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 3 is a flowchart of a method for monitoring and alarming a service system according to an embodiment of the present application. As shown in fig. 3, the method includes:
S301, acquiring a current monitoring index of a service system, wherein the monitoring index comprises at least one of user behavior information, service information or system operation performance, the user behavior information comprises duration and/or behavior times of various behaviors of a user aiming at the service system, the service information comprises at least one of transaction duration, transaction failure rate or technical failure rate of various services in the service system, and the system operation performance comprises at least one of memory occupancy rate, network flow consumption or CPU energy consumption.
In the scheme, the data of the user behavior, the service progress and the system performance are used as the monitoring objects of the service system, and the service system is monitored and analyzed in real time, so that the change conditions of the user, various services and the service system in the current state can be obtained in time.
In this embodiment, the specific scope of the user behavior information, the service information and the system operation performance is described by way of example. The user behavior information includes the duration and/or number of times of various behaviors when the user operates on the service system, where the behaviors may include login, browsing, refreshing, clicking, transferring, consuming and reserving. The service information includes the development condition of the various services provided by the service system, which may be further broken down into transaction failure rate, transaction duration and the like. The system operation performance includes memory occupancy rate, network traffic consumption and the like. Other similar index data may of course also be included.
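As a rough illustration of how duration and number-of-times indexes could be derived from raw behavior records, the following sketch aggregates events per behavior type. The event fields `behavior` and `seconds` are assumptions made for this example and are not defined by the application.

```python
from collections import Counter, defaultdict


def summarize_behaviors(events):
    """Return (times per behavior, total duration in seconds per behavior)."""
    counts = Counter()
    durations = defaultdict(float)
    for event in events:
        counts[event["behavior"]] += 1
        durations[event["behavior"]] += event["seconds"]
    return dict(counts), dict(durations)


# Hypothetical behavior events taken from a real-time log.
sample_events = [
    {"behavior": "login", "seconds": 1.5},
    {"behavior": "browse", "seconds": 30.0},
    {"behavior": "login", "seconds": 2.5},
]
behavior_counts, behavior_durations = summarize_behaviors(sample_events)
```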
In the specific implementation process, the collected monitoring indexes can be stored into a ClickHouse database by means of Kafka. By way of example: a real-time log of the service system is collected; log data in the real-time log is synchronized to a Kafka queue; and the log data in the Kafka queue is consumed and stored into the ClickHouse database classified by monitoring index, wherein the ClickHouse database provides an independent storage space for each monitoring index.
ClickHouse is a columnar database management system (DBMS: Database Management System) that was open-sourced in 2016. It is primarily used for online analytical processing (OLAP: Online Analytical Processing) queries and can generate analytical data reports in real time using SQL queries.
ClickHouse is a complete columnar database management system. It allows tables and databases to be created, data to be loaded and queries to be run at runtime without reconfiguring or restarting the server, and it supports linear scaling while being simple, convenient, highly reliable and fault tolerant. Therefore, the collected log data can be stored and queried in real time by adopting the ClickHouse database, so that queries over billions of records return within seconds and the time for data storage and query is saved.
Kafka is an open source stream processing platform. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all action flow data for consumers in a web site. Such actions (e.g., web browsing, searching, and other user actions) are a key factor in many social functions on modern networks. These data are typically addressed by processing logs and log aggregations due to throughput requirements.
Kafka is a distributed message queue. Kafka classifies messages by Topic (which can be understood as a queue) when storing them. The sender of a message is called the Producer, that is, the client that sends messages to a Kafka broker, and the receiver of a message is called the Consumer. A Kafka cluster is made up of a plurality of Kafka instances, and each instance (server) is called a broker. After the log data is collected, it is consumed through the Kafka message queue and then stored in the ClickHouse database. Using the two together makes data collection, consumption and storage in this embodiment flow smoothly, which facilitates the subsequent analysis and judgment of the monitoring indexes in the log data.
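The collect, queue, consume and classify flow can be sketched with standard-library stand-ins. Note the assumptions: `queue.Queue` here only imitates a Kafka queue and a dictionary of lists only imitates the independent per-index storage spaces in ClickHouse; no real Kafka or ClickHouse client API is used, and the JSON log format with a `metric` field is hypothetical.

```python
import json
import queue
from collections import defaultdict


def consume_and_classify(log_queue, storage):
    """Drain the queue and file each record under its monitoring index."""
    consumed = 0
    while True:
        try:
            raw = log_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to consume
        record = json.loads(raw)
        storage[record["metric"]].append(record)
        consumed += 1
    return consumed


# Stand-in for the Kafka queue, filled with hypothetical JSON log lines.
log_queue = queue.Queue()
for line in (
    '{"metric": "user_behavior", "behavior": "login", "seconds": 2.0}',
    '{"metric": "system_performance", "memory_occupancy": 0.71}',
    '{"metric": "user_behavior", "behavior": "click", "seconds": 0.5}',
):
    log_queue.put(line)

# Stand-in for ClickHouse: one independent list per monitoring index.
storage = defaultdict(list)
consumed_count = consume_and_classify(log_queue, storage)
```

In a real deployment the consumer loop would read from a Kafka topic and the append would become an insert into the table reserved for that monitoring index.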
In order to facilitate visual observation of the change condition and the current index value of each monitoring index, real-time data statistics and dashboard display can be carried out by means of the Grafana tool. Grafana is an open source application written in the Go language. It is mainly used for visual display of large-scale index data and is one of the most popular time series data display tools in network architecture and application analysis.
Grafana supports a number of different data sources, each with its own query editor that is customized to the features and functions exposed by that data source. The currently supported data sources include Graphite, Elasticsearch, InfluxDB, Prometheus, CloudWatch, MySQL, OpenTSDB, etc.
S302, comparing an index value of the monitoring index with an index system table, and judging whether the monitoring index is abnormal, wherein a preset normal threshold value of the monitoring index is recorded in the index system table; if yes, acquiring an alarm rule corresponding to the monitoring index, and judging whether the index value of the monitoring index accords with the alarm rule.
In this step, the index value of the monitoring index is compared with the preset normal threshold value; if the index value is outside the preset normal threshold value, the monitoring index is in an abnormal state. To avoid the frequent alarms that would result from alarming every time the monitoring index is abnormal, it is further judged whether the index value of the monitoring index accords with the alarm rule. In other words, the monitoring index is judged twice, so that cases where the index value is abnormal but does not warrant an alarm are effectively filtered out.
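The two-stage judgment can be sketched as below. This is a minimal sketch, not the application's implementation: the metric name, the dictionary-based tables and the normal threshold of 10 seconds are hypothetical, and "abnormal" is taken to mean "greater than the preset normal threshold", which matches how the comparison is summarized elsewhere in this application.

```python
# Hypothetical index system table: preset normal threshold per metric.
INDEX_SYSTEM_TABLE = {"transfer_avg_seconds": 10.0}
# Hypothetical alarm rules: preset alarm threshold per metric (above normal).
ALARM_RULES = {"transfer_avg_seconds": 15.0}


def is_abnormal(metric, value):
    """First judgment: compare the value with the preset normal threshold."""
    return value > INDEX_SYSTEM_TABLE[metric]


def should_alarm(metric, value):
    """Second judgment, run only when the first one finds an abnormality."""
    if not is_abnormal(metric, value):
        return False
    return value > ALARM_RULES[metric]
```

With these example thresholds, a value of 12 is abnormal but raises no alarm, which is exactly the case the second judgment is meant to filter out.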
The index system table stores the preset normal thresholds of the monitoring indexes. In this embodiment the monitoring indexes fall into three categories, and a different index system table can be set for each category. The source of the preset normal thresholds is not limited: prompt information can be output when the service system is used for the first time, prompting the user to set the preset normal thresholds of the monitoring indexes, or the thresholds can come from an inherent configuration of the service system.
For example, when an index system table corresponding to user behavior information, service information or system operation performance is established, basic contents such as the query data, operation method, hierarchy and dependency relation required by each monitoring index are configured. After the placeholders and limiting conditions of an SQL (structured query language) instance are flexibly configured according to the index system table, a calculation instance of the monitoring index is generated, and the index system table corresponding to each monitoring index is obtained. Under this inherent configuration mode, if a monitoring index needs to be adjusted, the operation and maintenance personnel of the service system do not need to restart the service system or write SQL manually. They can directly update the entry of the monitoring index in the index system table, or input the adjustment requirement as text; in the latter case the service system segments and parses the requirement sentence using a machine-learning-based analysis method, matches it against the index system table to interpret it, and generates a new monitoring SQL instance. The new instance takes effect after the operation and maintenance personnel confirm that the generated SQL is correct.
In order to facilitate judging whether the index value of the monitoring index accords with the alarm rule, the alarm rule corresponding to the monitoring index can be generated in advance: a configuration file of the monitoring index is obtained, wherein the configuration file comprises a preset alarm threshold value of the monitoring index; an alarm rule corresponding to the monitoring index is generated according to the configuration file; and the monitoring index and the alarm rule are stored in association. Illustratively, a service triggers a severe level 2 rule when its technical failure rate is higher than 50%, and a severe level 1 rule when it is higher than 70%.
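A possible shape for this rule generation step is sketched below, using the 50% and 70% technical failure rate example. The JSON layout of the configuration file, the field names, and the in-memory `RULE_STORE` are assumptions made for illustration; the application does not prescribe a file format.

```python
import json

# Hypothetical configuration file content for one monitoring index.
CONFIG_TEXT = """
{
  "metric": "technical_failure_rate",
  "rules": [
    {"level": "severe level 2", "alarm_threshold": 0.50},
    {"level": "severe level 1", "alarm_threshold": 0.70}
  ]
}
"""

# Associated storage: monitoring index name -> its alarm rules.
RULE_STORE = {}


def load_rules(config_text):
    """Generate alarm rules from a configuration file and store them."""
    config = json.loads(config_text)
    # Highest threshold first, so the strictest matching rule wins.
    rules = sorted(
        config["rules"], key=lambda r: r["alarm_threshold"], reverse=True
    )
    RULE_STORE[config["metric"]] = rules
    return rules


def match_rule(metric, value):
    """Return the level of the first rule the value exceeds, else None."""
    for rule in RULE_STORE.get(metric, []):
        if value > rule["alarm_threshold"]:
            return rule["level"]
    return None
```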
In the specific implementation process, the rules for monitoring index abnormality can be configured through the Prometheus system platform, and the alarm rules can be configured through the Alertmanager tool in the platform. The Prometheus platform monitors the ClickHouse cluster in real time and raises an alarm when the index value of a monitored index is abnormal and accords with the alarm rule.
Prometheus is an open-source system monitoring and alarming system. In a Kubernetes container management system, Prometheus is commonly used for monitoring. It supports a variety of exporters for collecting data and also supports Pushgateway for reporting data, and its performance is sufficient to support clusters on the scale of tens of thousands of machines. Prometheus collects and stores the monitored metrics as time series data, that is, the metric information is stored together with the timestamp at which it was recorded, along with optional key-value pairs called labels.
The Prometheus ecosystem is composed of a number of components, many of which are optional. A brief description of some of the Prometheus components is provided below:
1) Prometheus Server: for collecting and storing time series data.
2) Client Library: the client library instruments the application code, and when Prometheus scrapes the HTTP endpoint of the instance, the client library sends the current state of all tracked metrics to the Prometheus server.
3) Exporters: Prometheus supports a variety of exporters, through which monitoring data can be collected and then sent to the Prometheus server. All programs that provide monitoring data to the Prometheus server can be referred to as exporters.
4) Alertmanager: after receiving alerts from the Prometheus server, Alertmanager deduplicates and groups them and routes them to the corresponding receiver. Common receiving modes include email, WeChat, DingTalk, Slack, etc.
Specifically, Prometheus works with Alertmanager to implement monitoring alarms for the service system based on PromQL: when the index value of a monitoring index queried through PromQL exceeds the defined preset alarm threshold, Prometheus sends alarm information to Alertmanager, and Alertmanager issues the alarm to the configured alarm terminal.
5) Grafana: the monitoring dashboard, which visualizes the monitoring data. As mentioned above, Grafana supports Prometheus as a data source, so it can be connected to the Prometheus system as a third-party visualization tool; it has been briefly described above and is not described again here.
6) Pushgateway: each target host may report data to Pushgateway, and the Prometheus server then pulls the data from Pushgateway in a unified manner.
S303, if the index value of the monitoring index accords with the alarm rule, outputting alarm information.
In the scheme, once the index value of the monitoring index is determined to be in accordance with the alarm rule on the basis of abnormality, alarm information is immediately output so that relevant personnel can react.
In the embodiment of the application, based on a large amount of behavior data generated by a user in a service system, the development condition of each service in the service system and the system operation performance data, the user behavior information, the service information and the system operation performance are used as monitoring indexes, the monitoring indexes are monitored and analyzed in real time, when the index value of the monitoring indexes is larger than a preset normal threshold value, whether the index value of the monitoring indexes accords with the alarm rule is further judged based on a preset alarm rule, and corresponding alarm operation is carried out when the index value accords with the alarm rule, so that the timely alarm notification of the user, the service or the system level is realized, and the safety of the service system is ensured.
The following describes, with reference to fig. 4 and a specific embodiment, a process of implementing an alarm according to an index value of a monitoring index and an alarm rule in the service system monitoring alarm method of the present application.
Fig. 4 is a second flowchart of a method for monitoring and alarming a service system according to an embodiment of the present application. As shown in fig. 4, the method includes:
S401, when the index value of the monitoring index is larger than a preset alarm threshold value in the alarm rule, acquiring a difference value between the index value of the monitoring index and the preset alarm threshold value, wherein the preset alarm threshold value is larger than a preset normal threshold value.
In the above scheme, if the index value of the monitoring index is greater than the preset alarm threshold value in the alarm rule, it is indicated that the index value of the monitoring index is in an abnormal state and accords with the alarm rule at this time, and at this time, the corresponding alarm level and alarm terminal can be identified according to the difference value of the two.
S402, acquiring an alarm level table according to the type of the monitoring index.
In this step, each type of monitoring index has a corresponding alarm level, and all alarm levels related to the type of monitoring index which is currently abnormal and accords with the alarm rule can be obtained from the alarm level table, so that the corresponding alarm level can be determined according to the range of the difference value.
S403, according to the difference value and the alarm level table, alarm information comprising alarm levels is obtained, wherein for the same monitoring index, the larger the difference value is, the higher the alarm level is, and the alarm levels corresponding to the difference values of different sections are stored in the alarm level table.
In the above scheme, the difference values of different sections correspond to different alarm levels, and according to the difference value between the index value of the monitoring index and the preset alarm threshold value and all alarm levels obtained from the alarm level table, the alarm information comprising the alarm levels can be determined.
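The mapping from difference value to alarm level in S401 to S403 can be sketched as a section lookup. The table contents, the index type name `business_time_consumption` and the function name `alarm_level` are hypothetical; the application only states that difference values of different sections correspond to different alarm levels and that a larger difference gives a higher level.

```python
from bisect import bisect_right

# Hypothetical alarm level table for one index type: ascending lower bounds
# of the difference sections and the level assigned to each section.
LEVEL_TABLES = {
    "business_time_consumption": {
        "bounds": [0.0, 2.0, 5.0, 10.0],
        "levels": ["alert", "minor", "general", "severe"],
    },
}


def alarm_level(index_type, value, alarm_threshold):
    """Map (value - alarm_threshold) to an alarm level, or None if no alarm."""
    difference = value - alarm_threshold
    if difference <= 0:
        return None  # S401 applies only when the value exceeds the threshold
    table = LEVEL_TABLES[index_type]  # S402: table chosen by index type
    # S403: pick the section with the largest lower bound <= difference.
    section = bisect_right(table["bounds"], difference) - 1
    return table["levels"][section]
```

Because the bounds are ascending and the levels are listed from lowest to highest, a larger difference can never yield a lower level for the same monitoring index.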
In this embodiment, the alarm information is described, by way of example, as including an alarm level. The alarm levels may include severe, general, minor and alert levels, and may be further subdivided on this basis. In a specific application, the content of the alarm information can be set according to the actual requirements of different alarm terminals. For example, the name and classification of the monitoring index can be shown at the same time; the specific content of the alarm rule can also be given, so that the alarm terminal can conveniently identify whether the alarm is wrong; and the monitoring index can be given a label, so that the alarm terminal can identify the label content at a glance.
For example, when monitoring the service information, the average time consumption of the transfer service is found to be more than 20s. After the index system table of the service information is queried, the transaction duration of the transfer service is found to exceed the preset normal threshold, which indicates that the monitoring index is abnormal. After the corresponding alarm rule is further queried, it is determined that the transfer service is seriously time-consuming and that the severe level 1 alarm rule is triggered. The output alarm information may include the following contents: monitoring index name: business information; monitoring index classification: business time consumption; monitoring index label: transfer; alarm rule: an average transfer time greater than 15s is a severe level 2 rule, and greater than 20s is a severe level 1 rule.
As another example, for the technical failure rate of a certain service, a rate higher than 50% is a severe level 2 rule, and a rate higher than 70% is a severe level 1 rule.
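The transfer example can be written out as a small sketch that evaluates the tiered rule and assembles the alarm information. The 15s and 20s tiers and the four content fields come from the example in the text; the dictionary keys and the function name `build_alarm_info` are assumptions made for illustration.

```python
# Tiers from the transfer example in the text, ordered strictest first.
TRANSFER_TIERS = [
    (20.0, "severe level 1"),
    (15.0, "severe level 2"),
]


def build_alarm_info(avg_seconds):
    """Return the alarm information for the transfer service, or None."""
    for threshold, level in TRANSFER_TIERS:
        if avg_seconds > threshold:
            return {
                "index_name": "business information",
                "index_classification": "business time consumption",
                "index_label": "transfer",
                "alarm_level": level,
                "alarm_rule": "average transfer time > 15s: severe level 2; "
                              "> 20s: severe level 1",
            }
    return None  # no tier is exceeded, so no alarm information is produced
```

Checking the strictest tier first ensures that an average time above 20s is reported as severe level 1 rather than being caught by the 15s tier.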
S404, acquiring an alarm terminal corresponding to the alarm level from an alarm database according to the alarm level, and pushing the alarm information to the alarm terminal, wherein the alarm terminal corresponding to the alarm level is stored in the alarm database.
In this step, the alarm terminals corresponding to each alarm level are stored in the alarm database, and the alarm terminal corresponding to a given alarm level can be obtained by querying the alarm database. Each alarm rule may be handled differently, and the specific alarm information and alarm rules depend closely on the current scenario. For some alarm levels, the alarm information finally output to the corresponding alarm terminal is only a reminder, and a message prompt sent to the relevant personnel is sufficient. For severe alarm levels, the message must be delivered and acknowledged in time, so a telephone prompt may be required. The alarm terminal may be the terminal of a person responsible for the service system, and the responsible persons may be further divided into superior, direct and general responsible persons.
In order to ensure the integrity of data, after the alarm information is pushed to the alarm terminal, the alarm information and the corresponding alarm terminal are stored in an alarm database.
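Step S404 and the subsequent storage step can be sketched with an in-memory SQLite database standing in for the alarm database. The table names, column names, terminal names and the `send` callback are all hypothetical; the application does not specify the database schema or the push mechanism.

```python
import sqlite3


def open_alarm_db():
    """Create an in-memory stand-in for the alarm database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE terminals (level TEXT, terminal TEXT, channel TEXT)")
    db.execute("CREATE TABLE alarm_log (info TEXT, terminal TEXT)")
    db.executemany(
        "INSERT INTO terminals VALUES (?, ?, ?)",
        [
            ("alert", "direct_owner", "message"),
            ("severe", "direct_owner", "phone"),
            ("severe", "general_owner", "phone"),
        ],
    )
    return db


def push_alarm(db, level, info, send):
    """Look up terminals for the level, push the alarm, then record it."""
    rows = db.execute(
        "SELECT terminal, channel FROM terminals WHERE level = ? ORDER BY terminal",
        (level,),
    ).fetchall()
    for terminal, channel in rows:
        send(terminal, channel, info)  # actual delivery is left to the caller
        # Store the alarm information with its terminal after the push.
        db.execute("INSERT INTO alarm_log VALUES (?, ?)", (info, terminal))
    db.commit()
    return [terminal for terminal, _ in rows]
```

A call such as `push_alarm(open_alarm_db(), "severe", "transfer too slow", print)` would push to both terminals registered for the severe level and leave one log row per terminal.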
In the embodiment of the application, the alarm level corresponding to the monitoring index which is abnormal and accords with the alarm rule is determined by acquiring the difference value between the index value of the monitoring index and the preset alarm threshold value in the alarm rule, and the corresponding alarm terminal is queried from the alarm database according to the alarm level, so that targeted and hierarchical alarm is realized.
In summary, the service system monitoring and alarming method provided by the embodiment of the application monitors, analyzes and raises safety alarms on the user behavior data, service development data and system performance data collected by the service system in real time, so that risks faced by the service system can be found in time. Alarm information comprising an alarm level is generated and sent to the alarm terminal corresponding to that alarm level, so that the relevant users can obtain the alarm information in time, and problems can be located and losses stopped quickly and effectively.
The present application also provides an electronic device including: a processor, a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the business system monitoring and alerting method as described above.
Fig. 5 is a schematic hardware diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device 50 provided in this embodiment includes: a processor 501 and a memory 502. The electronic device 50 further comprises a communication part 503. In the above-described electronic device, the memory 502, the processor 501, and the communication section 503 are electrically connected directly or indirectly to realize transmission or interaction of data. For example, the elements may be electrically coupled to each other via one or more communication buses or signal lines, such as bus 504. The memory 502 stores computer-executable instructions for implementing the foregoing service system monitoring and alerting methods, including at least one software functional module that may be stored in the memory in the form of software or firmware, and the processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502.
The specific implementation process of the processor 501 may refer to the above-mentioned method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In the embodiment shown in fig. 5, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
The memory may comprise high-speed random access memory (Random Access Memory, RAM) and may further comprise non-volatile memory (Non-volatile Memory, NVM), such as at least one disk memory.
The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or to one type of bus.
In an embodiment of the application, a non-transitory computer readable storage medium is also provided, such as the memory 502, comprising instructions executable by the processor 501 of the electronic device 50 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
A non-transitory computer readable storage medium is further provided, wherein instructions in the storage medium, when executed by a processor of a terminal device, cause the terminal device to perform the service system monitoring and alarming method described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. A method for monitoring and alarming a service system, comprising the steps of:
acquiring current monitoring indexes of a service system, wherein the monitoring indexes comprise at least one of user behavior information, service information or system operation performance, the user behavior information comprises duration and/or behavior times of various behaviors of a user aiming at the service system, the service information comprises at least one of transaction duration, transaction failure rate or technology failure rate of various services in the service system, and the system operation performance comprises at least one of memory occupancy rate, network flow consumption or CPU energy consumption;
comparing the index value of the monitoring index with an index system table to judge whether the monitoring index is abnormal, wherein a preset normal threshold value of the monitoring index is recorded in the index system table;
if yes, acquiring an alarm rule corresponding to the monitoring index, and judging whether the index value of the monitoring index accords with the alarm rule;
and if the index value of the monitoring index accords with the alarm rule, outputting alarm information.
2. The method of claim 1, wherein prior to the obtaining the alert rule corresponding to the monitoring indicator, the method further comprises:
acquiring a configuration file of the monitoring index, wherein the configuration file comprises a preset alarm threshold value of the monitoring index;
and generating an alarm rule corresponding to the monitoring index according to the configuration file, and storing the monitoring index and the alarm rule in an associated mode.
3. The method according to claim 2, wherein outputting the alarm information if the index value of the monitor index meets the alarm rule comprises:
when the index value of the monitoring index is larger than a preset alarm threshold value in the alarm rule, generating alarm information, wherein the preset alarm threshold value is larger than the preset normal threshold value;
and alarming according to the alarming information.
4. The method of claim 3, wherein generating the alert information when the index value of the monitor index is greater than a preset alert threshold in the alert rule comprises:
acquiring a difference value between the index value of the monitoring index and the preset alarm threshold value;
acquiring an alarm level table according to the type of the monitoring index;
and acquiring alarm information comprising alarm levels according to the difference value and the alarm level table, wherein the alarm levels are higher as the difference value is larger for the same monitoring index, and the alarm levels corresponding to the difference values of different sections are stored in the alarm level table.
5. The method of claim 4, wherein alerting based on the alert information comprises:
and acquiring an alarm terminal corresponding to the alarm level from an alarm database according to the alarm level, and pushing the alarm information to the alarm terminal, wherein the alarm terminal corresponding to the alarm level is stored in the alarm database.
6. The method of claim 5, wherein the method further comprises:
after the alarm information is pushed to the alarm terminal, the alarm information and the corresponding alarm terminal are stored in the alarm database.
7. The method of claim 1, wherein the obtaining the current monitoring indicator of the service system comprises:
collecting a real-time log of a service system;
synchronizing log data in the real-time log to a Kafka queue;
and consuming the log data in the Kafka queue, and storing the consumed log data into a ClickHouse database according to monitoring indexes in a classified manner, wherein independent storage spaces corresponding to each monitoring index are arranged in the ClickHouse database aiming at different monitoring indexes.
8. The method of claim 1, wherein before comparing the index value of the monitor index with the index system table to determine whether the monitor index is abnormal, the method further comprises:
and when the service system is used for the first time, outputting prompt information, wherein the prompt information is used for prompting the setting of a preset normal threshold value of the monitoring index.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 8.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 8.
CN202310639999.3A 2023-05-31 2023-05-31 Service system monitoring alarm method, device, equipment and medium Pending CN116795631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310639999.3A CN116795631A (en) 2023-05-31 2023-05-31 Service system monitoring alarm method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310639999.3A CN116795631A (en) 2023-05-31 2023-05-31 Service system monitoring alarm method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116795631A true CN116795631A (en) 2023-09-22

Family

ID=88041312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310639999.3A Pending CN116795631A (en) 2023-05-31 2023-05-31 Service system monitoring alarm method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116795631A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573464A (en) * 2023-10-20 2024-02-20 北京城建智控科技股份有限公司 Monitoring method and system

Similar Documents

Publication Publication Date Title
CN110493348B (en) Intelligent monitoring alarm system based on Internet of things
AU2016261088B2 (en) Social media events detection and verification
Liu et al. Reuters tracer: A large scale system of detecting & verifying real-time news events from twitter
CN109257200B (en) Method and device for monitoring big data platform
Guzman et al. On-line relevant anomaly detection in the Twitter stream: an efficient bursty keyword detection model
CN111339175B (en) Data processing method, device, electronic equipment and readable storage medium
JP2022118108A (en) Log auditing method, device, electronic apparatus, medium and computer program
US9922116B2 (en) Managing big data for services
CN110912757B (en) Service monitoring method and server
Klein et al. Detection and extracting of emergency knowledge from twitter streams
CN112328425A (en) Anomaly detection method and system based on machine learning
CN111581056B (en) Software engineering database maintenance and early warning system based on artificial intelligence
CN112306700A (en) Abnormal RPC request diagnosis method and device
CN116795631A (en) Service system monitoring alarm method, device, equipment and medium
CN114461792A (en) Alarm event correlation method, device, electronic equipment, medium and program product
CN111258798A (en) Fault positioning method and device for monitoring data, computer equipment and storage medium
CN115118574B (en) Data processing method, device and storage medium
CN115328733A (en) Alarm method and device applied to business system, electronic equipment and storage medium
CN112182025A (en) Log analysis method, device, equipment and computer readable storage medium
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
CN110677271B (en) Big data alarm method, device, equipment and storage medium based on ELK
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
CN115408236A (en) Log data auditing system, method, equipment and medium
Kuang et al. Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
CN114756301A (en) Log processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination