CN113568822B - Service resource monitoring method, device, computing equipment and storage medium - Google Patents

Info

Publication number
CN113568822B
CN113568822B CN202110887054.4A CN202110887054A CN113568822B CN 113568822 B CN113568822 B CN 113568822B CN 202110887054 A CN202110887054 A CN 202110887054A CN 113568822 B CN113568822 B CN 113568822B
Authority
CN
China
Prior art keywords
monitoring
service resource
monitoring period
period
occupation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110887054.4A
Other languages
Chinese (zh)
Other versions
CN113568822A (en)
Inventor
梁冰
宋成伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Antiy Technology Group Co Ltd
Original Assignee
Antiy Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Antiy Technology Group Co Ltd filed Critical Antiy Technology Group Co Ltd
Priority to CN202110887054.4A priority Critical patent/CN113568822B/en
Publication of CN113568822A publication Critical patent/CN113568822A/en
Application granted granted Critical
Publication of CN113568822B publication Critical patent/CN113568822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3041 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is an input/output interface
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a service resource monitoring method, a device, a computing device and a storage medium, wherein the method comprises the following steps: updating the monitoring threshold by using preset updating logic, wherein the change of the monitoring threshold is positively correlated with the change of the service resource; and, in each monitoring period, monitoring the service resource by using the updated monitoring threshold. This scheme can reduce the probability of false alarms.

Description

Service resource monitoring method, device, computing equipment and storage medium
Technical Field
The embodiments of the invention relate to the technical field of computers, and in particular to a service resource monitoring method, a service resource monitoring device, computing equipment and a storage medium.
Background
With the development of internet technology, the number of e-commerce services keeps increasing. To ensure that a service runs normally, a monitoring service is required to monitor how the service occupies system resources, for example the CPU occupancy and the memory occupancy of the service on the system.
At present, in the existing service resource monitoring mode, a fixed threshold is set for the resource occupation information, and when the current resource occupation information of the service to the system exceeds the threshold, an alarm is given.
Disclosure of Invention
Based on the problem of high probability of false alarm in the process of monitoring service resources in the prior art, the embodiment of the invention provides a service resource monitoring method, a device, a computing device and a storage medium, which can reduce the probability of false alarm.
In a first aspect, an embodiment of the present invention provides a service resource monitoring method, including:
updating the monitoring threshold value by using a preset updating logic; wherein, the change of the monitoring threshold value is positively correlated with the change of the service resource;
and in each monitoring period, monitoring the service resources by using the updated monitoring threshold value.
Preferably, the updating the monitoring threshold value by using preset updating logic includes:
acquiring service resource occupation information of a current monitoring period;
acquiring service resource occupation information of at least one historical monitoring period;
predicting service resource occupation information in a next monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period;
and updating the monitoring threshold according to the predicted service resource occupation information in the next monitoring period.
Preferably, the service resource occupation information includes: at least one of CPU occupancy rate, memory occupancy amount, disk IO read rate and disk IO write rate;
the obtaining the service resource occupation information of the current monitoring period comprises the following steps:
aiming at each target service resource, acquiring field information corresponding to the target service resource in the system file according to a preset acquisition period;
and after the current monitoring period is acquired, calculating service resource occupation information of the target service resource in the current monitoring period by utilizing the field information acquired in each acquisition period.
Preferably, the predicting the service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period includes:
calculating the average growth rate of each monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period;
and calculating service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the average growth rate.
Preferably, the updating the monitoring threshold according to the predicted service resource occupation information in the next monitoring period includes:
and determining a threshold range based on the preset monitoring precision and the predicted service resource occupation information in the next monitoring period, and determining the threshold range as the updated monitoring threshold.
Preferably, before the updating of the monitoring threshold value by using the preset updating logic, the method further includes:
dividing a monitoring period into at least two monitoring intervals based on operation characteristics of the service; the monitoring threshold comprises at least two; and the at least two monitoring thresholds are in one-to-one correspondence with the at least two monitoring intervals.
Preferably, the monitoring the service resource by using the updated monitoring threshold value includes:
acquiring current service resource occupation information;
judging whether the current service resource occupation information is abnormal by using the updated monitoring threshold; and if it is abnormal, raising an alarm.
In a second aspect, an embodiment of the present invention further provides a service resource monitoring device, including:
the threshold updating unit is used for updating the monitoring threshold by using preset updating logic; wherein, the change of the monitoring threshold value is positively correlated with the change of the service resource;
and the service resource monitoring unit is used for monitoring the service resources by using the updated monitoring threshold value in each monitoring period.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program, and the processor implements a method according to any embodiment of the present specification when executing the computer program.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a method according to any of the embodiments of the present specification.
The embodiment of the invention provides a service resource monitoring method, a device, a computing device and a storage medium, wherein the monitoring threshold is continuously updated, so that when the service resource is monitored in each monitoring period, the used monitoring threshold is not completely the same, the change of the monitoring threshold is positively correlated with the change of the service resource, the service resource can be monitored more accurately in the monitoring process, and the error alarm probability is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for monitoring service resources according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for updating a monitoring threshold according to an embodiment of the present invention;
FIG. 3 is a hardware architecture diagram of a computing device according to one embodiment of the present invention;
FIG. 4 is a block diagram of a service resource monitoring device according to an embodiment of the present invention;
FIG. 5 is a block diagram of another service resource monitoring device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention.
As described above, the existing approach sets a fixed threshold for the resource occupation information and raises an alarm when the current resource occupation of the system by the service is found to exceed that threshold. In practice, however, the resources a service needs in order to run change constantly. For example, as the number of users grows and the service types expand, the service resources occupied by an e-commerce service may increase gradually, and monitoring them against a fixed threshold may cause false alarms. To address this problem, and considering that the resources required by the running service keep changing, the service resources can be monitored with a changing monitoring threshold whose change is positively correlated with the change of the service resources, thereby improving monitoring accuracy and reducing the probability of false alarms.
Specific implementations of the above concepts are described below.
Referring to fig. 1, an embodiment of the present invention provides a method for monitoring service resources, including:
Step 100, updating the monitoring threshold by using preset updating logic, wherein the change of the monitoring threshold is positively correlated with the change of the service resource.
Step 102, monitoring the service resource by using the updated monitoring threshold in each monitoring period.
In the embodiment of the invention, the monitoring threshold value is continuously updated, so that the used monitoring threshold value is not completely the same when the service resource is monitored in each monitoring period, and the change of the monitoring threshold value is positively correlated with the change of the service resource, thereby ensuring more accurate monitoring of the service resource in the monitoring process and reducing the probability of false alarm.
The manner in which the individual steps shown in fig. 1 are performed is described below.
First, regarding step 100: updating the monitoring threshold by using preset updating logic, wherein the change of the monitoring threshold is positively correlated with the change of the service resource.
In one embodiment of the present invention, the monitoring threshold may be updated once every monitoring period. The monitoring period may be one day, one week, one month, etc. In this embodiment, one day is taken as an example of a monitoring period.
In one embodiment of the present invention, in order to monitor the service resource, the monitoring threshold may be a single value or a threshold range. When the monitoring threshold is a single value, whether the service resource is normal can be determined by comparing the service resource occupation information with the monitoring threshold. When the monitoring threshold is a threshold range, whether the service resource is normal can be determined by whether the service resource occupation information falls within the threshold range.
Given the operation characteristics of a service, its resource occupation varies within one monitoring period. For example, an e-commerce service alternates between busy and idle states during a day. Thus, in one embodiment of the present invention, the following may be included before step 100: dividing a monitoring period into at least two monitoring intervals based on operation characteristics of the service; the monitoring threshold comprises at least two monitoring thresholds; and the at least two monitoring thresholds are in one-to-one correspondence with the at least two monitoring intervals.
The operation characteristics of the service may include, but are not limited to: user access volume, service pressure.
For example, according to the user access volume, the monitoring period may be divided into three monitoring intervals: 00:00 to 08:00, 08:00 to 12:00, and 12:00 to 24:00. Because service resource occupation differs between monitoring intervals, each monitoring interval corresponds to its own monitoring threshold, and when the service resource is monitored within one monitoring interval of the current monitoring period, the monitoring threshold corresponding to that interval must be used. Refining the monitoring granularity within the monitoring period in this way can further improve monitoring accuracy and reduce the probability of false alarms.
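The one-to-one correspondence between monitoring intervals and monitoring thresholds can be pictured as a small lookup table. The following is only a sketch: the hour boundaries follow the example above, while the threshold values and the names `INTERVAL_THRESHOLDS` and `threshold_for` are hypothetical and chosen for illustration.

```python
# Hypothetical table of (start hour, end hour, monitoring threshold).
# The boundaries follow the three-interval example in the text;
# the threshold values are made up for illustration.
INTERVAL_THRESHOLDS = [
    (0, 8, 20.0),
    (8, 12, 65.0),
    (12, 24, 45.0),
]


def threshold_for(hour, table=INTERVAL_THRESHOLDS):
    """Return the monitoring threshold of the interval containing `hour`."""
    for start, end, threshold in table:
        if start <= hour < end:
            return threshold
    raise ValueError(f"hour {hour} is outside every monitoring interval")
```

A monitor running at 09:30 would call `threshold_for(9)` and compare the current occupation against the threshold of the 08:00 to 12:00 interval.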
In one embodiment of the present invention, referring to FIG. 2, step 100 may be implemented in at least the following way:
Step 200, acquiring service resource occupation information of the current monitoring period.
In one embodiment of the present invention, the service resource occupancy information may include, but is not limited to: at least one of CPU occupancy rate, memory occupancy amount, disk IO read rate and disk IO write rate.
In one embodiment of the present invention, the monitoring period may correspond to two cases:
Case one: the monitoring period is not divided into monitoring intervals.
Case two: the monitoring period is divided into several monitoring intervals.
The following describes the method for acquiring the service resource occupation information for the two cases.
In the first case, the monitoring period is not divided into monitoring intervals, and then the service resource occupation information in the whole monitoring period needs to be acquired.
Step 200 may include: for each target service resource, acquiring the field information corresponding to the target service resource from the system file according to a preset acquisition period; and, after acquisition for the current monitoring period is complete, calculating the service resource occupation information of the target service resource in the current monitoring period by using the field information acquired in each acquisition period.
When the operating system is a Linux system, the service resource occupation can be obtained by reading the corresponding field information from system files in the Linux system. Acquiring the service resource occupation information by reading system files is convenient, and the acquired information is more accurate.
The following describes each target service resource of the four service resources.
Firstly, when the target service resource is the CPU occupancy rate, specifically:
In order to improve the accuracy of the acquired service resource occupation information, an acquisition period may be preset, for example 3 seconds, so that data is acquired once every 3 seconds.
When acquiring service resource occupation information, the process IDs of the service processes corresponding to the service are acquired first; for example, the service is implemented by two service processes whose process IDs are pid1 and pid2. For each service process, the file "/proc/" + pid + "/stat" in the Linux system is read, and the following fields are taken from it: utime (time the process has run in user mode, in jiffies), stime (time the process has run in kernel mode, in jiffies), cutime (accumulated user-mode time of all waited-for child processes of the task, in jiffies) and cstime (accumulated kernel-mode time of all waited-for child processes of the task, in jiffies). The sum of the values of these four fields is taken as the CPU usage of the service process at the current acquisition and denoted $a_1$. At the same time, the user field (CPU time in user mode), the nice field (CPU time in user mode consumed by low-priority programs), the system field (CPU time in kernel mode) and the idle field (CPU idle time) are read from the /proc/stat file, and the sum of the values of these four fields is taken as the total CPU usage of the system and denoted $A_1$. After one acquisition period, the CPU usage of the process and the total CPU usage at the next acquisition are obtained in the same way and denoted $a_2$ and $A_2$. The CPU occupancy rate of the service process over one acquisition period can then be calculated by the following formula (1):
$$P_1 = \frac{a_2 - a_1}{A_2 - A_1} \tag{1}$$
Within the current monitoring period, the CPU occupancy rate of a given service process may be computed n times, and the CPU occupancy rate P of the service process in the current monitoring period can then be calculated by the following formula (2):
$$P = \frac{P_1 + P_2 + P_3 + \cdots + P_n}{n} \tag{2}$$
The sum of the CPU occupancy rates of all service processes corresponding to the service is taken as the CPU occupancy rate of the service in the current monitoring period.
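Formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and the parsers take the file contents as strings (what one would read from /proc/<pid>/stat and /proc/stat) so that the sketch does not depend on a live Linux system.

```python
def process_jiffies(stat_text):
    """Sum utime, stime, cutime and cstime from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before counting fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime..cstime are fields 14..17.
    return sum(int(v) for v in rest[11:15])


def system_jiffies(proc_stat_text):
    """Sum user, nice, system and idle from the first line of /proc/stat."""
    fields = proc_stat_text.splitlines()[0].split()
    return sum(int(v) for v in fields[1:5])


def cpu_occupancy(samples):
    """Average formula (1) over consecutive samples, as in formula (2).

    `samples` is a list of (a_k, A_k) pairs, one pair per acquisition.
    """
    rates = [
        (a2 - a1) / (A2 - A1)
        for (a1, A1), (a2, A2) in zip(samples, samples[1:])
        if A2 > A1  # skip pairs where the system counter did not advance
    ]
    return sum(rates) / len(rates) if rates else 0.0
```

The occupancy of the whole service would then be the sum of `cpu_occupancy(...)` over its service processes, as the text describes.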
Secondly, when the target service resource is the memory occupation amount, specifically:
Similarly, assume that the acquisition period is 3 seconds and that the process IDs of the service processes corresponding to the service are pid1 and pid2. For each service process, the file "/proc/" + pid + "/status" in the Linux system is read, the VmRSS field is taken from it, and its value is taken as the memory occupancy of the service process, denoted $M_1$. Acquiring once every 3 seconds yields n memory occupancy values, and the memory occupancy of the service process in the current monitoring period is calculated by the following formula (3):
$$M = \frac{M_1 + M_2 + M_3 + \cdots + M_n}{n} \tag{3}$$
The sum of the memory occupancy of all service processes corresponding to the service is taken as the memory occupancy of the service in the current monitoring period.
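As a sketch of formula (3), the snippet below extracts VmRSS from the text of a status file and averages the samples of each service process before summing over processes. The function names and the dictionary layout (process ID mapped to its list of samples) are assumptions made for illustration.

```python
def vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      2048 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS field not found")


def service_memory(per_process_samples):
    """Apply formula (3) per process, then sum over the service's processes.

    `per_process_samples` maps a process ID to its list of VmRSS samples.
    """
    return sum(sum(s) / len(s) for s in per_process_samples.values())
```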
Thirdly, when the target service resource is the disk IO read rate or the disk IO write rate, specifically:
Similarly, assume that the acquisition period is 3 seconds and that the process IDs of the service processes corresponding to the service are pid1 and pid2. For each service process, the file "/proc/" + pid + "/io" in the Linux system is read, the read_bytes field and the write_bytes field are taken from it, and their values are taken as the total bytes read and the total bytes written by the service process, denoted $R_1$ and $W_1$ respectively. After an interval of 3 seconds, the total bytes read and the total bytes written at the next acquisition are obtained in the same manner and denoted $R_2$ and $W_2$. The average disk IO read rate $R_{01}$ and the average disk IO write rate $W_{01}$ of the service process over one acquisition period can then be calculated by the following formulas:
$$R_{01} = \frac{R_2 - R_1}{T} \tag{4}$$
$$W_{01} = \frac{W_2 - W_1}{T} \tag{5}$$
where $T$ denotes the acquisition period.
Within the current monitoring period, the average disk IO read rate and the average disk IO write rate of a given service process may be computed n times, and the disk IO read rate R and the disk IO write rate W of the service process in the current monitoring period can be calculated by the following formulas:
$$R = \frac{R_{01} + R_{02} + R_{03} + \cdots + R_{0n}}{n} \tag{6}$$
$$W = \frac{W_{01} + W_{02} + W_{03} + \cdots + W_{0n}}{n} \tag{7}$$
The sum of the disk IO read rates R of all service processes corresponding to the service is taken as the disk IO read rate of the service in the current monitoring period, and the sum of the disk IO write rates W of all service processes corresponding to the service is taken as the disk IO write rate of the service in the current monitoring period.
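Formulas (4) to (7) amount to differencing cumulative byte counters and averaging the resulting rates. A minimal sketch follows, assuming the caller has already read the text of /proc/<pid>/io at each acquisition; the function names are hypothetical.

```python
def io_counters(io_text):
    """Return (read_bytes, write_bytes) from the text of /proc/<pid>/io."""
    values = {}
    for line in io_text.splitlines():
        if ":" not in line:
            continue  # ignore blank or malformed lines
        key, _, value = line.partition(":")
        values[key.strip()] = int(value)
    return values["read_bytes"], values["write_bytes"]


def io_rates(samples, period_s):
    """Formulas (4)-(7): average read and write rates in bytes per second.

    `samples` is a list of (R_k, W_k) pairs taken `period_s` seconds apart.
    """
    pairs = list(zip(samples, samples[1:]))
    if not pairs:
        return 0.0, 0.0
    read = sum((r2 - r1) / period_s for (r1, _), (r2, _) in pairs) / len(pairs)
    write = sum((w2 - w1) / period_s for (_, w1), (_, w2) in pairs) / len(pairs)
    return read, write
```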
The above completes the acquisition of the service resource occupation information of each current monitoring period in the case one.
In the second case, the monitoring period is divided into a plurality of monitoring intervals, and then corresponding service resource occupation information needs to be acquired for each monitoring interval in the current monitoring period.
In one embodiment of the present invention, taking one monitoring interval as an example, such as 08:00 to 12:00, step 200 may include: for each target service resource, acquiring the field information corresponding to the target service resource from the system file according to a preset acquisition period; and, after acquisition for the current monitoring interval of the current monitoring period is complete, calculating the service resource occupation information of the target service resource in the current monitoring interval by using the field information acquired in each acquisition period. Once acquisition for the whole current monitoring period is complete, the service resource occupation information of the target service resource in every monitoring interval is available.
The service resource occupation information of the target service resource in the current monitoring interval is calculated in the same way as the service resource occupation information of the target service resource in the current monitoring period in case one, and is not described again here.
It should be noted that, the collected data and the calculated data of the service resource occupation information of the four service resources may be recorded in the log, so that when the service resource occupation is abnormal, the occupation condition of each service resource can be traced according to the log, so that the operation and maintenance personnel can quickly locate the root cause of the abnormal service resource occupation.
Step 202, acquiring service resource occupation information of at least one historical monitoring period.
In one embodiment of the present invention, the at least one historical monitoring period is separated from the current monitoring period by at least one monitoring period. Taking one day as the monitoring period as an example, the service resource is monitored to obtain the service resource occupation information of each monitoring period, where the monitoring periods may be day 1, day 2, day 3, ..., day n-1 and day n, day n is the current monitoring period, and n is an integer not less than 3. If one historical monitoring period is acquired, it may be day i, where i is not greater than n-2.
In one embodiment of the present invention, the at least one historical monitoring period consists of monitoring periods adjacent to the current monitoring period, and the number of historical monitoring periods acquired equals a set number. If the service resource occupation is steadily increasing or steadily decreasing, the more historical monitoring periods are acquired, the higher the accuracy of the prediction for the next monitoring period. If the service resource occupation alternately increases and decreases, then, to improve prediction accuracy, the acquired historical monitoring periods should be as close as possible to the current monitoring period; with the number of acquired historical monitoring periods equal to the set number, the closer they are to the current monitoring period, the higher the accuracy of the prediction for the next monitoring period. For example, if the set number is 10, the acquired historical monitoring periods may be day n-1, day n-2, ..., day n-10.
Step 204, predicting the service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period.
Specifically, step 204 may predict the service resource occupation information in the next monitoring period in at least the following way:
First, the average growth rate per monitoring period is calculated according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period.
For example, the at least one historical monitoring period includes day 1, day 2, day 3, ..., day n-1, with corresponding service resource occupation information $d_1, d_2, d_3, \ldots, d_{n-1}$, and the service resource occupation information of the current monitoring period is $d_n$. The average growth rate $\delta$ may be calculated using the following formula:
[Formula (8) is not reproduced in this text.]
where $d_j$ is the service resource occupation information of the $j$-th monitoring period, $j$ is a positive integer not greater than $n-2$, and $n$ is an integer greater than or equal to 3.
To further improve the accuracy of the average growth rate, it may also be calculated by the following formula:
[Formula (9) is not reproduced in this text.]
where $i$ is an integer greater than or equal to 1, and $n$ is an integer greater than or equal to 3.
And then, calculating the service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the average growth rate.
In one embodiment of the present invention, the following formula may be used to calculate the service resource occupation information $d_{n+1}$ in the next monitoring period:
$$d_{n+1} = d_n \cdot (1 + \delta) \tag{10}$$
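The prediction step can be sketched as follows. Because the growth-rate formulas themselves do not survive in this text, the definition of the average growth rate used here (the mean of the period-over-period relative changes) is an assumption made purely for illustration and may differ from the patent's formulas; only the last line, $d_{n+1} = d_n \cdot (1 + \delta)$, is taken from formula (10). The function names are hypothetical.

```python
def average_growth_rate(history):
    """Mean period-over-period relative change (an ASSUMED definition).

    `history` is [d_1, ..., d_n] with d_n the current monitoring period.
    """
    changes = [(b - a) / a for a, b in zip(history, history[1:]) if a != 0]
    if not changes:
        raise ValueError("need at least two periods with non-zero occupancy")
    return sum(changes) / len(changes)


def predict_next(history):
    """Formula (10): d_{n+1} = d_n * (1 + delta)."""
    return history[-1] * (1 + average_growth_rate(history))
```

With occupation values that grow by 10% per period, such as 100, 110 and 121, this sketch predicts roughly 133.1 for the next period.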
And step 206, updating the monitoring threshold according to the predicted service resource occupation information in the next monitoring period.
If the monitoring threshold is a single value, the predicted service resource occupation information $d_{n+1}$ in the next monitoring period can be taken as the monitoring threshold for the next monitoring period.
If the monitoring threshold is a threshold range, the threshold range can be determined from a preset monitoring precision and the predicted service resource occupation information. Specifically, step 206 may include: determining a threshold range based on the preset monitoring precision and the predicted service resource occupation information in the next monitoring period, and taking the threshold range as the updated monitoring threshold. For example, if the monitoring precision is 10%, the threshold range may be $[\,d_{n+1} - 10\%\,d_{n+1},\ d_{n+1} + 10\%\,d_{n+1}\,]$.
In this embodiment, by setting the monitoring precision, the predicted service resource occupation information of the next monitoring period is adjusted to obtain a threshold range, so that the service resource is monitored by using the threshold range, and the floating of the service resource occupation information in the actual running process is considered, so that the probability of false alarm in the monitoring process can be further reduced.
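As a minimal sketch of the threshold-range calculation described above, the following Python function expands a predicted value by a relative monitoring precision. The function name `threshold_range` and the input validation are assumptions added for illustration.

```python
from typing import Tuple


def threshold_range(predicted: float, precision: float) -> Tuple[float, float]:
    """Return (lower, upper) = (d - p*d, d + p*d) for prediction d and precision p."""
    if not 0 <= precision < 1:
        raise ValueError("precision is expected as a fraction in [0, 1)")
    return predicted * (1 - precision), predicted * (1 + precision)


if __name__ == "__main__":
    # Hypothetical prediction of 200 units with a 10% monitoring precision.
    lower, upper = threshold_range(200.0, 0.10)
    print(lower, upper)
```

A larger precision value gives a wider range, which tolerates more fluctuation at the cost of detecting smaller deviations later.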
In one embodiment of the present invention, if the monitoring period in step 200 corresponds to the second case, the service resource occupation information in the next monitoring period needs to be predicted separately for each monitoring interval. The updated monitoring thresholds corresponding to the respective monitoring intervals in the next monitoring period are thereby obtained, and the monitoring thresholds corresponding to different monitoring intervals are not all identical.
Then, for step 102, in each monitoring period, the service resource is monitored using the updated monitoring threshold.
For example, when the service resource is monitored on the (n+1)th day, the monitoring thresholds corresponding to the monitoring intervals of that day are: monitoring threshold 1 for the monitoring interval from 0:00 to 8:00, monitoring threshold 2 for the monitoring interval from 8:00 to 12:00, and monitoring threshold 3 for the monitoring interval from 12:00 to 24:00. On the (n+1)th day, the service resource is then monitored using monitoring threshold 1 during the 0:00 to 8:00 interval, monitoring threshold 2 during the 8:00 to 12:00 interval, and monitoring threshold 3 during the 12:00 to 24:00 interval.
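The per-interval lookup in this example can be sketched in Python as follows. The three intervals match the example; the threshold values, the half-open treatment of interval boundaries, and the name `threshold_for_hour` are hypothetical.

```python
from typing import List, Tuple

# (start_hour, end_hour, threshold); each interval is half-open: [start, end).
# The threshold values are placeholders, not values from the source.
INTERVAL_THRESHOLDS: List[Tuple[int, int, float]] = [
    (0, 8, 30.0),    # monitoring threshold 1
    (8, 12, 80.0),   # monitoring threshold 2
    (12, 24, 60.0),  # monitoring threshold 3
]


def threshold_for_hour(hour: int,
                       table: List[Tuple[int, int, float]] = INTERVAL_THRESHOLDS) -> float:
    """Return the monitoring threshold of the interval containing the given hour."""
    for start, end, threshold in table:
        if start <= hour < end:
            return threshold
    raise ValueError(f"hour {hour} is not covered by any monitoring interval")


if __name__ == "__main__":
    for hour in (3, 9, 18):
        print(hour, threshold_for_hour(hour))
```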
In one embodiment of the present invention, step 102 may specifically include: acquiring current service resource occupation information; judging, by using the updated monitoring threshold, whether the current service resource occupation information is abnormal; and if it is abnormal, raising an alarm.
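A minimal sketch of this check is given below, assuming the monitoring threshold is a range and that any value outside the range counts as abnormal; the source does not state how the judgment is made or how the alarm is delivered. The names `is_abnormal` and `monitor_once` and the use of a log warning as the alarm are illustrative assumptions.

```python
import logging
from typing import Tuple

logger = logging.getLogger("service_resource_monitor")  # hypothetical logger name


def is_abnormal(current: float, threshold: Tuple[float, float]) -> bool:
    """Assumed rule: a value outside [lower, upper] is abnormal."""
    lower, upper = threshold
    return not (lower <= current <= upper)


def monitor_once(current: float, threshold: Tuple[float, float]) -> bool:
    """Check one reading and raise an alarm (here: a log warning) if abnormal."""
    if is_abnormal(current, threshold):
        logger.warning("occupation %.2f outside threshold range %s", current, threshold)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(monitor_once(150.0, (180.0, 220.0)))
    print(monitor_once(200.0, (180.0, 220.0)))
```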
As shown in FIG. 3 and FIG. 4, an embodiment of the present invention provides a service resource monitoring device. The device embodiments may be implemented in software, in hardware, or in a combination of hardware and software. In terms of hardware, FIG. 3 is a hardware architecture diagram of a computing device in which the service resource monitoring device provided by an embodiment of the present invention is located; in addition to the processor, memory, network interface, and nonvolatile memory shown in FIG. 3, the computing device may generally include other hardware, such as a forwarding chip responsible for processing packets. Taking a software implementation as an example, as shown in FIG. 4, the device in a logical sense is formed by the CPU of the computing device in which it is located reading a corresponding computer program from the nonvolatile memory into memory and running it. The service resource monitoring device provided in this embodiment includes:
a threshold updating unit 401, configured to update the monitoring threshold by using a preset updating logic; wherein, the change of the monitoring threshold value is positively correlated with the change of the service resource;
and the service resource monitoring unit 402 is configured to monitor the service resource in each monitoring period by using the updated monitoring threshold.
In one embodiment of the present invention, the threshold updating unit 401 is specifically configured to obtain service resource occupation information of a current monitoring period; acquiring service resource occupation information of at least one history monitoring period; predicting service resource occupation information in a next monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period; and updating the monitoring threshold according to the predicted service resource occupation information in the next monitoring period.
In one embodiment of the present invention, the service resource occupancy information includes: at least one of CPU occupancy rate, memory occupancy amount, disk IO read rate and disk IO write rate;
the threshold updating unit 401 is specifically configured to, when acquiring the service resource occupation information of the current monitoring period, collect, for each target service resource, the field information corresponding to the target service resource in the system file according to a preset collection period; and, after collection for the current monitoring period is completed, calculate the service resource occupation information of the target service resource in the current monitoring period by using the field information collected in each collection period.
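The collection step can be sketched for CPU occupancy as follows. The sketch assumes a Linux system whose system file is `/proc/stat`, takes the text of two successive samples as input, and averages the per-collection values over the monitoring period; the source does not specify the file, the fields, or the aggregation, so all of these are assumptions. Only the first eight counters of the aggregate `cpu` line are used in this sketch.

```python
from typing import List, Sequence


def parse_cpu_fields(stat_text: str) -> List[int]:
    """Return the counters of the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_occupancy(prev_text: str, curr_text: str) -> float:
    """CPU occupancy between two samples: 1 - idle_delta / total_delta."""
    prev, curr = parse_cpu_fields(prev_text), parse_cpu_fields(curr_text)
    # user, nice, system, idle, iowait, irq, softirq, steal
    deltas = [c - p for p, c in zip(prev[:8], curr[:8])]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)  # idle + iowait
    return 1.0 - idle / total


def period_occupancy(samples: Sequence[float]) -> float:
    """Assumed aggregation: mean of the values from each collection period."""
    if not samples:
        raise ValueError("no samples collected in this monitoring period")
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical samples; on a real system, read open("/proc/stat").read() twice.
    first = "cpu 100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0"
    second = "cpu 150 0 70 860 60 0 0 0 0 0\ncpu0 150 0 70 860 60 0 0 0 0 0"
    print(cpu_occupancy(first, second))
```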
In one embodiment of the present invention, the threshold updating unit 401 is specifically configured to calculate, when predicting the service resource occupancy information in the next monitoring period according to the service resource occupancy information in the current monitoring period and the service resource occupancy information in the at least one history monitoring period, an average growth rate of each monitoring period according to the service resource occupancy information in the current monitoring period and the service resource occupancy information in the at least one history monitoring period; and calculating service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the average growth rate.
In one embodiment of the present invention, when updating the monitoring threshold according to the predicted service resource occupation information in the next monitoring period, the threshold updating unit 401 is specifically configured to determine a threshold range based on a preset monitoring precision and the predicted service resource occupation information in the next monitoring period, and to determine the threshold range as the updated monitoring threshold.
In one embodiment of the present invention, referring to fig. 5, the service resource monitoring device may further include:
an interval dividing unit 403, configured to divide the monitoring period into at least two monitoring intervals based on the operation characteristics of the service; the monitoring threshold comprises at least two; and the at least two monitoring thresholds are in one-to-one correspondence with the at least two monitoring intervals.
In one embodiment of the present invention, the service resource monitoring unit 402 is specifically configured to acquire current service resource occupation information; judge, by using the updated monitoring threshold, whether the current service resource occupation information is abnormal; and if it is abnormal, raise an alarm.
It will be appreciated that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the service resource monitoring device. In other embodiments of the invention, the service resource monitoring device may include more or fewer components than shown, may combine certain components, may split certain components, or may have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the information exchange and execution processes among the modules of the device are based on the same concept as the method embodiments of the present invention, reference may be made to the description of the method embodiments for details, which are not repeated here.
The embodiment of the invention also provides a computing device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the service resource monitoring method in any embodiment of the invention when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to execute the service resource monitoring method in any embodiment of the invention.
Specifically, a system or apparatus may be provided with a storage medium storing software program code that implements the functions of any of the above embodiments, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.
Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communication network.
Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.
Further, it is understood that the program code read out from the storage medium may be written into a memory provided in an expansion board inserted into a computer or into a memory provided in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or the expansion module may then be caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above embodiments.
The embodiments of the invention have at least the following beneficial effects:
1. In one embodiment of the invention, the monitoring threshold is continuously updated, so that the monitoring thresholds used to monitor the service resource in different monitoring periods are not all the same, and the change of the monitoring threshold is positively correlated with the change of the service resource. This makes the monitoring of the service resource more accurate and reduces the probability of false alarms.
2. In one embodiment of the invention, one monitoring period is divided into a plurality of monitoring intervals by utilizing the operation characteristics, so that different monitoring intervals use different monitoring thresholds to monitor service resources, thereby ensuring that the monitoring granularity is more refined, further improving the monitoring accuracy and reducing the probability of false alarm.
3. In one embodiment of the present invention, when the operating system is a Linux system, the service resource occupation information may be obtained by reading the corresponding field information in a system file of the Linux system. Acquiring the service resource occupation information by reading the system file is convenient, and the acquired information is more accurate.
4. In one embodiment of the invention, in the process of monitoring the service resources, the acquired data and the calculated data aiming at the service resource occupation information can be recorded in the log, so that when the service resource occupation is abnormal, the occupation condition of each service resource can be traced according to the log, and an operation and maintenance personnel can rapidly locate the source of the abnormal service resource occupation.
5. In one embodiment of the invention, the average growth rate of each monitoring period is calculated, and the next monitoring period is predicted by utilizing the average growth rate and the service resource occupation information in the current monitoring period, so that the prediction accuracy is higher.
6. In one embodiment of the invention, a monitoring precision is set and used to adjust the predicted service resource occupation information of the next monitoring period into a threshold range, and the service resource is monitored using the threshold range. Because the fluctuation of the service resource occupation information during actual operation is taken into account, the probability of false alarms during monitoring can be further reduced.
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of additional identical elements in a process, method, article or apparatus that comprises the element.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: various media in which program code may be stored, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for monitoring service resources, comprising:
acquiring service resource occupation information of a current monitoring period; acquiring service resource occupation information of at least one history monitoring period; calculating the average growth rate of each monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period; calculating service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the average growth rate; determining a threshold range based on preset monitoring precision and service resource occupation information in the predicted next monitoring period, and determining the threshold range as an updated monitoring threshold; wherein, the change of the monitoring threshold value is positively correlated with the change of the service resource;
and in each monitoring period, monitoring the service resources by using the updated monitoring threshold value.
2. The method of claim 1, wherein the service resource occupation information comprises: at least one of CPU occupancy rate, memory occupancy amount, disk IO read rate and disk IO write rate;
the obtaining the service resource occupation information of the current monitoring period comprises the following steps:
aiming at each target service resource, acquiring field information corresponding to the target service resource in the system file according to a preset acquisition period;
and after acquisition for the current monitoring period is completed, calculating service resource occupation information of the target service resource in the current monitoring period by utilizing the field information acquired in each acquisition period.
3. The method according to any one of claims 1-2, further comprising, prior to said updating the monitoring threshold with a preset update logic:
dividing a monitoring period into at least two monitoring intervals based on operation characteristics of the service; the monitoring threshold comprises at least two; and the at least two monitoring thresholds are in one-to-one correspondence with the at least two monitoring intervals.
4. The method according to any one of claims 1-2, wherein the monitoring of the service resources using the updated monitoring threshold comprises:
acquiring current service resource occupation information;
judging whether the current service resource occupation information is abnormal or not by utilizing the updated monitoring threshold value; and if it is abnormal, raising an alarm.
5. A service resource monitoring device, comprising:
the threshold updating unit is used for acquiring service resource occupation information of the current monitoring period; acquiring service resource occupation information of at least one history monitoring period; calculating the average growth rate of each monitoring period according to the service resource occupation information of the current monitoring period and the service resource occupation information of the at least one historical monitoring period; calculating service resource occupation information in the next monitoring period according to the service resource occupation information of the current monitoring period and the average growth rate; determining a threshold range based on preset monitoring precision and service resource occupation information in the predicted next monitoring period, and determining the threshold range as an updated monitoring threshold; wherein, the change of the monitoring threshold value is positively correlated with the change of the service resource;
and the service resource monitoring unit is used for monitoring the service resources by using the updated monitoring threshold value in each monitoring period.
6. The apparatus of claim 5, wherein the service resource occupation information comprises: at least one of CPU occupancy rate, memory occupancy amount, disk IO read rate and disk IO write rate;
the threshold updating unit is specifically configured to acquire, for each target service resource, field information corresponding to the target service resource in the system file according to a preset acquisition period when acquiring service resource occupation information of a current monitoring period; and after acquisition for the current monitoring period is completed, calculating service resource occupation information of the target service resource in the current monitoring period by utilizing the field information acquired in each acquisition period.
7. The apparatus according to any one of claims 5-6, further comprising:
the interval dividing unit is used for dividing the monitoring period into at least two monitoring intervals based on the operation characteristics of the service; the monitoring threshold comprises at least two; and the at least two monitoring thresholds are in one-to-one correspondence with the at least two monitoring intervals.
8. The apparatus according to any one of claims 5-6, wherein the service resource monitoring unit is specifically configured to acquire current service resource occupation information; judge whether the current service resource occupation information is abnormal or not by utilizing the updated monitoring threshold value; and if it is abnormal, raise an alarm.
9. A computing device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the method of any of claims 1-4 when the computer program is executed.
10. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-4.
CN202110887054.4A 2021-08-03 2021-08-03 Service resource monitoring method, device, computing equipment and storage medium Active CN113568822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110887054.4A CN113568822B (en) 2021-08-03 2021-08-03 Service resource monitoring method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113568822A CN113568822A (en) 2021-10-29
CN113568822B CN113568822B (en) 2023-09-05

Family

ID=78170098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110887054.4A Active CN113568822B (en) 2021-08-03 2021-08-03 Service resource monitoring method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113568822B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820630A (en) * 2015-05-22 2015-08-05 上海新炬网络信息技术有限公司 System resource monitoring device based on business variable quantity
CN106713029A (en) * 2016-12-20 2017-05-24 ***股份有限公司 Method and apparatus for determining resource monitoring thresholds
US10146612B1 (en) * 2015-06-08 2018-12-04 Sprint Communications Company L.P. Historical disk error monitoring
CN109800131A (en) * 2018-12-18 2019-05-24 平安健康保险股份有限公司 Monitor processing method, device, computer equipment and the storage medium of Linux server
CN110704284A (en) * 2019-09-27 2020-01-17 高新兴科技集团股份有限公司 Alarm processing method and system in video monitoring scene and electronic equipment
CN110971444A (en) * 2019-10-09 2020-04-07 中移(杭州)信息技术有限公司 Alarm management method, device, server and storage medium
CN112346924A (en) * 2020-09-21 2021-02-09 西安交大捷普网络科技有限公司 Server monitoring method and system
CN112699007A (en) * 2021-01-04 2021-04-23 网宿科技股份有限公司 Method, system, network device and storage medium for monitoring machine performance

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9400682B2 (en) * 2012-12-06 2016-07-26 Hewlett Packard Enterprise Development Lp Ranking and scheduling of monitoring tasks
JP6891611B2 (en) * 2017-04-17 2021-06-18 富士通株式会社 Management device, information processing system control method, and management device management program
US10514951B2 (en) * 2017-05-04 2019-12-24 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing a stateless, deterministic scheduler and work discovery system with interruption recovery

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A survey of machine-learning-enabled software self-adaptation; Zhang Mingyue; Jin Zhi; Zhao Haiyan; Luo Yixing; Journal of Software (08); 126-153 *

Also Published As

Publication number Publication date
CN113568822A (en) 2021-10-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant