CN117591382A - Intelligent monitoring method, device, equipment and medium for server faults - Google Patents

Intelligent monitoring method, device, equipment and medium for server faults

Info

Publication number
CN117591382A
Authority
CN
China
Prior art keywords
data
server
task
determining
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410076660.1A
Other languages
Chinese (zh)
Other versions
CN117591382B (en)
Inventor
郭江谱
吴乘先
郑峰
张蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raycom Joint Creation Tianjin Information Technology Co ltd
Original Assignee
Raycom Joint Creation Tianjin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Raycom Joint Creation Tianjin Information Technology Co ltd
Priority to CN202410076660.1A
Publication of CN117591382A
Application granted
Publication of CN117591382B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to an intelligent monitoring method, device, equipment and medium for server faults, applied in the technical field of server monitoring. The method comprises the following steps: acquiring data to be processed and server state data, wherein the data to be processed is data corresponding to a task to be processed by a server; determining a resource occupation value of the data to be processed; determining an adjustment strategy based on the resource occupation value and the server state data; and adjusting the server resources based on the adjustment strategy. The application has the effect of reducing the likelihood of server faults.

Description

Intelligent monitoring method, device, equipment and medium for server faults
Technical Field
The application relates to the technical field of server monitoring, in particular to an intelligent monitoring method, device, equipment and medium for server faults.
Background
With the rapid growth of the internet and cloud computing, servers face a significant challenge in handling large numbers of data requests. As server load increases, overload inevitably causes various faults, such as crashes, stalls and abnormal restarts, and if these faults are not found and resolved in time, huge losses result.
Conventional server monitoring methods usually perform fault monitoring, that is, they monitor whether a server has failed and handle the server once a fault occurs. However, handling the server only after it has failed causes a certain loss regardless of how promptly the fault is handled. A technical scheme is therefore needed that can act on the server before a fault occurs, thereby reducing the possibility of server faults.
Disclosure of Invention
In order to reduce the possibility of server faults, the application provides a method, a device, equipment and a medium for intelligently monitoring server faults.
In a first aspect, the present application provides a method for intelligently monitoring a server fault, which adopts the following technical scheme:
a server fault intelligent monitoring method comprises the following steps:
acquiring data to be processed and server state data, wherein the data to be processed is data corresponding to a task to be processed by a server;
determining a resource occupation value of the data to be processed;
determining an adjustment policy based on the resource occupancy value and the server state data;
and adjusting the server resources based on the adjustment strategy.
By adopting the technical scheme, the data to be processed and the server state data are acquired, the resource occupation value of the data to be processed is predicted, the adjustment strategy of the server resource is determined according to the resource occupation value and the server state data, and the server resource is adjusted according to the adjustment strategy, so that the server resource allocation is more reasonable, the possibility of overload of the server is reduced, and the possibility of failure of the server is reduced.
Optionally, the determining the resource occupation value of the data to be processed includes:
acquiring historical task data, wherein the historical task data comprises historical resource occupation values;
grouping the historical task data based on the historical resource occupation values to obtain at least one historical data combination, wherein the historical task data in each historical data combination have the same historical resource occupation value;
and determining the resource occupation value based on the data to be processed and the historical data combination.
By adopting the technical scheme, the resource occupation value is determined according to the historical resource occupation value in the historical data and the data to be processed, so that the resource occupation value is more scientific and reliable, the adjustment strategy determined according to the resource occupation value is more reliable, the server resource is adjusted according to the adjustment strategy, the server resource allocation is more scientific and reliable, and the possibility of server faults is reduced.
Optionally, the determining an adjustment policy based on the resource occupancy value and the server state data includes:
acquiring environment data, historical environment data and a historical server threshold, wherein the environment data is the environment data of a current server;
fitting the historical environment data and the historical server threshold to obtain an environment threshold fitting curve;
determining a server threshold based on the environmental data and the environmental threshold fitting curve;
an adjustment policy is determined based on the server threshold, the resource occupancy value, and the server state data.
By adopting the technical scheme, the influence of the environmental data on the performance of the server is considered, the environmental threshold fitting curve is obtained according to the historical environmental data and the historical server threshold, and the server threshold under the current environmental data is determined according to the environmental threshold fitting curve, so that the adjustment strategy determined according to the server threshold is more reasonable, and the possibility of server faults is reduced.
Optionally, the server state data includes a server occupancy value, and the determining an adjustment policy based on the server threshold, the resource occupancy value, and the server state data includes:
calculating an idle value based on the server occupancy value and the server threshold;
if the idle value is smaller than the resource occupation value, acquiring current task data, wherein the current task data is the task data being processed by the server;
determining a task level based on the current task data, wherein the task level is the task level of the current task, and the current task is the task corresponding to the current task data;
and determining an adjustment strategy based on the task level and the resource occupation value.
By adopting the technical scheme, when the adjustment strategy is determined, the task grade is determined according to the specific condition of the task, and the adjustment strategy is determined according to the task grade, so that the reliability of the adjustment strategy is improved, and the reliability of the server task processing is improved.
Optionally, the determining the task level based on the current task data includes:
acquiring historical processing data and processing deadlines of a current task;
determining a first processing frequency of the current task based on the historical processing data;
determining a first urgency level for the current task based on the processing deadline;
determining a second resource occupancy value based on the current task data and historical data combination;
the task level is determined based on the first processing frequency, the first urgency level, and the second resource occupancy value.
By adopting the technical scheme, the task level is determined according to the first processing frequency, the first emergency level and the second resource occupation value of the current task, and the situation of multiple aspects of task processing is considered, so that the reliability of the task level is improved, and the reliability of an adjustment strategy is improved.
Optionally, the determining the task level based on the first processing frequency, the first urgency level, and the second resource occupancy value includes:
determining a first score based on the first processing frequency;
determining a second score based on the first urgency level;
determining a third score based on the second resource occupancy value;
calculating a grade score based on the first score, the second score, the third score and preset weights;
the task level is determined based on the grade score.
Optionally, the determining an adjustment policy based on the task level and the resource occupancy value includes:
if no current task has the third task level, determining the processing level of the data to be processed;
an adjustment policy is determined based on the processing level, the resource occupancy value, and the task level.
By adopting the technical scheme, the adjustment strategy is determined together according to the processing grade of the data to be processed, the resource occupation value and the task grade of the current task data, so that the adjustment strategy is more reasonable.
In a second aspect, the present application provides a server fault intelligent monitoring device, which adopts the following technical scheme:
an intelligent monitoring device for server faults, comprising:
the data acquisition module is used for acquiring data to be processed and server state data, wherein the data to be processed is data corresponding to a task to be processed by the server;
the resource determining module is used for determining the resource occupation value of the data to be processed;
a policy determination module configured to determine an adjustment policy based on the resource occupancy value and the server state data;
and the resource adjustment module is used for adjusting the server resource based on the adjustment strategy.
By adopting the technical scheme, the data to be processed and the server state data are acquired, the resource occupation value of the data to be processed is predicted, the adjustment strategy of the server resource is determined according to the resource occupation value and the server state data, and the server resource is adjusted according to the adjustment strategy, so that the server resource allocation is more reasonable, the possibility of overload of the server is reduced, and the possibility of failure of the server is reduced.
In a third aspect, the present application provides an electronic device, which adopts the following technical scheme:
an electronic device comprising a processor coupled with a memory;
the memory stores a computer program that can be loaded by a processor and that performs the server fault intelligent monitoring method according to any one of the first aspects.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
a computer-readable storage medium storing a computer program capable of being loaded by a processor and executing the server fault intelligent monitoring method according to any one of the first aspects.
Drawings
Fig. 1 is a schematic flow chart of a server fault intelligent monitoring method provided in an embodiment of the present application.
Fig. 2 is a block diagram of a server fault intelligent monitoring device according to an embodiment of the present application.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings.
The embodiment of the application provides a server fault intelligent monitoring method, which can be executed by electronic equipment, wherein the electronic equipment can be a server or terminal equipment, the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and a cloud server for providing cloud computing service. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a desktop computer, etc.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In this context, unless otherwise specified, the character "/" generally indicates that the associated objects are in an "or" relationship.
As shown in fig. 1, a method for intelligently monitoring server faults is described as follows (steps S101 to S104):
step S101, obtaining data to be processed and server status data.
The server state data comprises a server occupation value, which describes how much of the server's resources are currently occupied and can be expressed as a percentage.
When a task party (which may be a computer or other equipment) has a task that needs to be processed by a server, the task party communicates with the server. The data to be processed is acquired from the task party, and the current state data of the server, namely the server state data, is acquired from the server.
Step S102, determining the resource occupation value of the data to be processed.
The resource occupation value can be expressed as a percentage.
When a task needs to be processed by the server, the resources occupied by the data to be processed are predicted, so that the server resources can be more reasonably allocated, the possibility of overload caused by insufficient server resources is reduced, and the possibility of faults is reduced.
Specifically, determining the resource occupation value of the data to be processed includes: acquiring historical task data, wherein the historical task data comprises historical resource occupation values; grouping the historical task data based on the historical resource occupation values to obtain at least one historical data combination, wherein the historical task data in each historical data combination have the same historical resource occupation value; a resource occupancy value is determined based on the data to be processed and the historical data combination.
In this embodiment, the task data historically processed by the server, namely the historical task data, is obtained from a database. The historical task data comprises task data and the resource occupation value corresponding to that task data, namely the historical resource occupation value. The historical task data is grouped according to the historical resource occupation values, so that historical task data with the same historical resource occupation value falls into the same historical data combination. A machine learning model is trained on the historical data combinations and their corresponding historical resource occupation values, and the trained model predicts the resource occupation value corresponding to the data to be processed. The machine learning model may be a convolutional neural network model or an MLP-Mixer model, and is not specifically limited herein.
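The grouping and prediction described above can be sketched as follows. This is only an illustration under stated assumptions: the feature tuple (input size, number of subtasks) is hypothetical, and a nearest-centroid lookup stands in for the convolutional neural network or MLP-Mixer model named in the embodiment.

```python
from collections import defaultdict

def group_by_occupancy(history):
    """Group historical task records by their historical resource occupation value."""
    groups = defaultdict(list)
    for features, occupancy in history:
        groups[occupancy].append(features)
    return dict(groups)

def predict_occupancy(groups, features):
    """Return the occupation value of the group whose centroid is nearest.

    Stand-in for the trained model; not the model used in the application.
    """
    best_label, best_dist = None, float("inf")
    for label, members in groups.items():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical history: ((input size in MB, subtask count), occupation value in %)
history = [
    ((100.0, 2.0), 10),
    ((120.0, 2.0), 10),
    ((900.0, 8.0), 40),
    ((1000.0, 8.0), 40),
]
groups = group_by_occupancy(history)
resource_occupancy = predict_occupancy(groups, (950.0, 8.0))
```

Each key of `groups` is one historical data combination; all records under a key share the same historical resource occupation value.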
Step S103, determining an adjustment strategy based on the resource occupation value and the server state data.
A resource adjustment strategy is determined according to the resource occupation value of the data to be processed and the current resource occupation of the server, namely the server occupation value.
Specifically, determining an adjustment policy based on the resource occupancy value and the server state data includes: acquiring environment data, historical environment data and a historical server threshold, wherein the environment data is the environment data of a current server; fitting the historical environment data and the historical server threshold to obtain an environment threshold fitting curve; determining a server threshold based on the environmental data and the environmental threshold fitting curve; an adjustment policy is determined based on the server threshold, the resource occupancy value, and the server state data.
The performance of the server may be affected by the external environment. For example, if the temperature of the server is too high, the load it can bear is reduced, and faults occur easily if the load is not lowered accordingly. Monitoring of the server is therefore not limited to its internal state; the external environment is monitored as well.
In this embodiment, environmental data of the location where the server is situated is obtained from a database or a sensor; the environmental data includes temperature and humidity. Historical environmental data and the threshold of the server under that environment, namely the historical server threshold, are obtained from the database. The historical environmental data includes historical temperature and historical humidity, and the historical server threshold is the maximum resource utilization rate of the server; a server that exceeds the historical server threshold fails. Curve fitting is performed on the historical environmental data and the historical server thresholds to obtain an environment threshold fitting curve. The current threshold of the server, namely the server threshold, is looked up on the environment threshold fitting curve according to the environmental data, and an adjustment strategy is determined according to the server threshold, the resource occupation value of the data to be processed and the server occupation value in the server state data.
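As a rough illustration of the environment threshold fitting curve, the sketch below fits a straight line by ordinary least squares. It is a simplification under stated assumptions: only temperature is used (the embodiment also mentions humidity), the curve is assumed to be linear, and all sample values are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def server_threshold(temperature, slope, intercept):
    """Look up the threshold on the fitted line, clamped to 0..100 percent."""
    value = slope * temperature + intercept
    return max(0.0, min(100.0, value))

# Hypothetical history: temperature in degrees Celsius versus the maximum
# resource utilization (percent) the server sustained at that temperature.
historical_temps = [20.0, 25.0, 30.0, 35.0]
historical_thresholds = [90.0, 85.0, 80.0, 75.0]

slope, intercept = fit_line(historical_temps, historical_thresholds)
threshold_now = server_threshold(28.0, slope, intercept)
```

With these sample points the fitted line loses one percentage point of threshold per degree, so a hotter server is given a lower threshold.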
Specifically, the server state data includes a server occupation value, and determining an adjustment strategy based on the server threshold, the resource occupation value and the server state data includes: calculating an idle value based on the server occupation value and the server threshold; if the idle value is smaller than the resource occupation value, acquiring current task data, wherein the current task data is the task data being processed by the server; determining a task level based on the current task data, wherein the task level is the task level of the current task, and the current task is the task corresponding to the current task data; and determining an adjustment strategy based on the task level and the resource occupation value.
In this embodiment, the server occupation value is subtracted from the server threshold to obtain an idle value, which is the amount of resources currently idle on the server. If the idle value is greater than or equal to the resource occupation value, the server can meet the processing requirement of the data to be processed; the adjustment strategy is then that no adjustment is required, and the data to be processed is distributed directly to the server for processing. If the idle value is smaller than the resource occupation value, the server cannot meet the processing requirement of the data to be processed and an adjustment is required. In that case the current task data, namely the data corresponding to the tasks currently being processed by the server, is obtained from the server, the task level of each task being processed is determined according to the current task data, and an adjustment strategy is determined according to the task level and the resource occupation value of the data to be processed.
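The idle-value check can be written down in a few lines. The function name and the sample percentages below are illustrative assumptions; only the subtraction and the comparison come from the text.

```python
def needs_adjustment(server_threshold, server_occupancy, resource_occupancy):
    """Return (idle value, whether an adjustment is required).

    idle value = server threshold - server occupation value.
    An adjustment is required only when the idle value is smaller than the
    resource occupation value of the data to be processed.
    """
    idle = server_threshold - server_occupancy
    return idle, idle < resource_occupancy

# Threshold 82%, currently 60% occupied, incoming task predicted at 20%.
idle_value, adjust = needs_adjustment(82, 60, 20)
```

When the idle value equals the resource occupation value, the task still fits and no adjustment is made, matching the "greater than or equal to" condition above.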
Further, determining a task level based on the current task data includes: acquiring historical processing data and processing deadlines of a current task; determining a first processing frequency of the current task based on the historical processing data; determining a first urgency level for the current task based on the processing deadline; determining a second resource occupancy value based on the current task data and the historical data combination; a task level is determined based on the first processing frequency, the first urgency level, and the second resource occupancy value.
In this embodiment, the current task is the task corresponding to the current task data. Historical processing data of the current task is obtained from a database; the historical processing data is the processing data of past tasks identical to the current task and includes processing times, each comprising a processing start time and a processing end time. The processing end time minus the processing start time gives the processing duration. The processing deadline of the current task, that is, the time by which processing of the current task must finish, is also obtained from the database, and the remaining time is obtained by subtracting the current time from the processing deadline. If the remaining time is longer than the processing duration, the first urgency level is the second level; if the remaining time is not longer than the processing duration, the first urgency level is the first level.
Statistical analysis is performed on the historical processing data of the current task to calculate the first processing frequency, that is, how often the current task is processed on average, expressed as an average interval. For example, the first processing frequency is 8 days.
The current task data is fed to the trained machine learning model to obtain the second resource occupation value corresponding to the current task data. The current task is scored according to the first processing frequency, the first urgency level and the second resource occupation value, and the task level of the current task is determined according to the score.
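A minimal sketch of the urgency rule follows, assuming timestamps are available as `datetime` values. The function names are hypothetical; the rule itself (compare the remaining time with the historical processing duration) is taken from the paragraph above.

```python
from datetime import datetime, timedelta

def processing_duration(start, end):
    """Processing duration = processing end time - processing start time."""
    return end - start

def first_urgency_level(deadline, now, duration):
    """Return 2 if the remaining time exceeds the duration, otherwise 1.

    Level 1 is the more urgent level: the task can no longer finish with
    time to spare before its deadline.
    """
    remaining = deadline - now
    return 2 if remaining > duration else 1

# Hypothetical values: a past run took two hours, the deadline is in three.
now = datetime(2024, 1, 1, 12, 0)
duration = processing_duration(datetime(2023, 12, 1, 8, 0),
                               datetime(2023, 12, 1, 10, 0))
level = first_urgency_level(now + timedelta(hours=3), now, duration)
```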
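One plausible reading of "how often the current task is processed on average" is the mean gap between consecutive runs. The sketch below computes that interval in days; the function name and the sample dates are assumptions made for illustration.

```python
from datetime import date

def average_interval_days(run_dates):
    """Mean number of days between consecutive historical runs of a task."""
    if len(run_dates) < 2:
        raise ValueError("at least two historical runs are needed")
    ordered = sorted(run_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical history: the same task ran on three dates, eight days apart.
runs = [date(2024, 1, 1), date(2024, 1, 9), date(2024, 1, 17)]
first_processing_frequency = average_interval_days(runs)
```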
Further, determining a task level based on the first processing frequency, the first urgency level and the second resource occupation value includes: determining a first score based on the first processing frequency; determining a second score based on the first urgency level; determining a third score based on the second resource occupation value; calculating a grade score based on the first score, the second score, the third score and the preset weights; and determining the task level based on the grade score.
In this embodiment, if the first processing frequency is greater than 7 days (for example, 8 days), the first score is a first preset score; if the first processing frequency is less than or equal to 7 days, the first score is a second preset score. The first preset score is greater than the second preset score; for example, the first preset score is 80 and the second preset score is 50.
If the first urgency level is the first level, the second score is a third preset score; if the first urgency level is the second level, the second score is a fourth preset score. The third preset score is greater than the fourth preset score, and the third and fourth preset scores are independent of the first and second preset scores.
If the second resource occupation value is greater than 30% (for example, 40%), the third score is a fifth preset score; if the second resource occupation value is less than or equal to 30%, the third score is a sixth preset score. The sixth preset score is greater than the fifth preset score, and the fifth and sixth preset scores are independent of the first to fourth preset scores.
The preset weights comprise a first preset weight value, a second preset weight value and a third preset weight value, wherein the first score corresponds to the first preset weight value, the second score corresponds to the second preset weight value, and the third score corresponds to the third preset weight value. The grade score is calculated as: grade score = first score × first preset weight value + second score × second preset weight value + third score × third preset weight value, wherein the second preset weight value is greater than the first preset weight value, and the first preset weight value is greater than the third preset weight value.
If the grade score is greater than the first grade score, the task grade is a first task grade, if the grade score is less than or equal to the first grade score, and the grade score is greater than the second grade score, the task grade is a second task grade, and if the grade score is less than or equal to the second grade score, the task grade is a third task grade, wherein the first grade score is greater than the second grade score.
The first to sixth preset scores, the first to third preset weight values, the first grade score and the second grade score are all set in advance by staff and are not specifically limited herein.
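The scoring rules above can be combined into one small function. In this sketch only the 7-day and 30% cut-offs and the example scores 80 and 50 come from the text; the remaining preset scores, the weights (chosen so that second > first > third) and the two grade-score boundaries are hypothetical values picked for illustration.

```python
# Preset scores: the first two follow the example in the text,
# the other four are hypothetical.
FIRST, SECOND = 80, 50   # first processing frequency > 7 days / <= 7 days
THIRD, FOURTH = 90, 40   # first urgency level 1 / level 2
FIFTH, SIXTH = 30, 70    # second resource occupation > 30% / <= 30%

# Hypothetical weights in percent, ordered second > first > third.
W1, W2, W3 = 30, 50, 20

# Hypothetical grade-score boundaries (first grade score > second grade score).
FIRST_GRADE_SCORE, SECOND_GRADE_SCORE = 70, 50

def grade_score(interval_days, urgency_level, occupancy_percent):
    """Weighted sum of the three scores."""
    first = FIRST if interval_days > 7 else SECOND
    second = THIRD if urgency_level == 1 else FOURTH
    third = FIFTH if occupancy_percent > 30 else SIXTH
    return (first * W1 + second * W2 + third * W3) / 100

def task_level(score):
    """Map a grade score to task level 1, 2 or 3."""
    if score > FIRST_GRADE_SCORE:
        return 1
    if score > SECOND_GRADE_SCORE:
        return 2
    return 3

score = grade_score(interval_days=8, urgency_level=1, occupancy_percent=40)
level = task_level(score)
```

Keeping the weights as integer percentages avoids floating-point rounding at the grade-score boundaries.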
Specifically, determining an adjustment strategy based on the task level and the resource occupation value includes: if no current task has the third task level, determining the processing level of the data to be processed; and determining an adjustment strategy based on the processing level, the resource occupation value and the task level.
If the task level of a current task is the third task level, the second resource occupation value of that current task is determined. Because the second resource occupation value was already predicted when the task level was determined, it is obtained directly from the database at this point. If the second resource occupation value is greater than or equal to the resource occupation value of the data to be processed, the adjustment strategy is to process the data to be processed first, that is, the server resources of the current task of the third task level are preferentially allocated to the data to be processed.
If a current task of the third task level exists but its second resource occupation value is smaller than the resource occupation value of the data to be processed, or if no current task has the third task level, the processing level of the data to be processed is determined. The processing level is determined in the same way as the task level and is one of a first processing level, a second processing level and a third processing level; the details are not repeated here.
Server resources are allocated in the following order: data to be processed at the first processing level, current task data at the first task level, data to be processed at the second processing level, current task data at the second task level, data to be processed at the third processing level, and current task data at the third task level.
The data to be processed and the current task data are sorted in this order according to their processing levels and task levels, and server resources are allocated in the sorted order for as long as server resources remain. The adjustment policy is to adjust the server resources according to this order.
For example, if the processing level of the data to be processed is the first processing level, the adjustment policy is to process the task to be processed first; that is, server resources are preferentially allocated to the data to be processed. If the resource occupancy value of the task to be processed is smaller than the server threshold, meaning the server still has idle resources, the remaining server resources continue to be allocated in the sorted order.
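By way of a non-limiting illustration, the allocation order above may be sketched as a sort followed by a greedy pass. The names `Item` and `allocate` are hypothetical. Skipping an item that does not fit in the remaining resources and moving on to the next one is an assumption, since the text only states that allocation continues while resources remain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str
    level: int        # 1, 2, or 3: processing level or task level (assumed encoding)
    pending: bool     # True for data to be processed, False for current task data
    occupancy: float  # resource occupancy value of the item


def allocate(items: List[Item], server_threshold: float) -> List[str]:
    """Grant resources in the stated order until the server threshold is used up."""
    # Lower level first; within one level, data to be processed comes
    # before current task data (False sorts before True).
    ordered = sorted(items, key=lambda i: (i.level, not i.pending))
    remaining = server_threshold
    granted = []
    for item in ordered:
        if item.occupancy <= remaining:
            granted.append(item.name)
            remaining -= item.occupancy
    return granted
```

For instance, with a threshold of 10, a first-level pending item needing 4 is served before first-level, second-level, and third-level current tasks needing 3, 2, and 5; the last of these is left out because only 1 unit remains.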
Step S104, adjusting the server resources based on the adjustment strategy.
The allocation of server resources is adjusted according to the adjustment policy.
Fig. 2 is a block diagram of a server fault intelligent monitoring device 200 according to an embodiment of the present application.
As shown in fig. 2, the server fault intelligent monitoring apparatus 200 mainly includes:
the data acquisition module 201 is configured to acquire data to be processed and server state data, where the data to be processed is data corresponding to a task to be processed by the server;
a resource determining module 202, configured to determine a resource occupation value of data to be processed;
a policy determination module 203, configured to determine an adjustment policy based on the resource occupancy value and the server state data;
the resource adjustment module 204 is configured to adjust server resources based on an adjustment policy.
As an optional implementation manner of this embodiment, the resource determining module 202 is further specifically configured to determine a resource occupation value of data to be processed, including: acquiring historical task data, wherein the historical task data comprises historical resource occupation values; grouping the historical task data based on the historical resource occupation values to obtain at least one historical data combination, wherein the historical task data in each historical data combination have the same historical resource occupation value; a resource occupancy value is determined based on the data to be processed and the historical data combination.
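By way of a non-limiting illustration, the grouping step may be sketched as follows. The record layout `(task_type, historical_occupancy)` and the matching rule (look for a group that contains a task of the same type) are assumptions made only for this example; the passage above does not specify how the data to be processed is compared with a historical data combination.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (task_type, historical_occupancy).
Record = Tuple[str, float]


def group_by_occupancy(history: List[Record]) -> Dict[float, List[str]]:
    """Group historical tasks so that each group shares one occupancy value."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for task_type, occupancy in history:
        groups[occupancy].append(task_type)
    return dict(groups)


def estimate_occupancy(task_type: str,
                       groups: Dict[float, List[str]]) -> Optional[float]:
    """Return the occupancy of the first group holding a task of the same type."""
    for occupancy, members in groups.items():
        if task_type in members:
            return occupancy
    return None  # no matching group; this case is not covered by the text
```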
As an optional implementation manner of this embodiment, the policy determining module 203 is further specifically configured to determine an adjustment policy based on the resource occupancy value and the server state data, including: acquiring environment data, historical environment data, and historical server thresholds, wherein the environment data is the environment data of the current server; fitting the historical environment data and the historical server thresholds to obtain an environment threshold fitting curve; determining a server threshold based on the environment data and the environment threshold fitting curve; and determining an adjustment policy based on the server threshold, the resource occupancy value, and the server state data.
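By way of a non-limiting illustration, the fitting step may be sketched with an ordinary least-squares straight line. Using a straight line, and treating the environment data as a single scalar (for example a temperature reading), are assumptions; the application only refers to an environment threshold fitting curve. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("historical environment values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def server_threshold(env_value: float, slope: float, intercept: float) -> float:
    """Read the server threshold off the fitted line for the current environment."""
    return slope * env_value + intercept
```

With historical pairs such as (20, 0.9), (30, 0.8), and (40, 0.7), the fitted line falls by 0.01 per unit of the environment value, so a current reading of 25 gives a threshold of about 0.85.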
As an alternative implementation manner of this embodiment, the server status data includes a server occupancy value, and the policy determining module 203 is further specifically configured to determine an adjustment policy based on the server threshold, the resource occupancy value, and the server status data, including: calculating an idle value based on the server occupancy value and the server threshold; if the idle value is smaller than the resource occupation value, acquiring current task data, wherein the current task data is the task data being processed by the server; determining a task grade based on the current task data, wherein the task grade is the task grade of the current task, and the current task is the task corresponding to the current task data; an adjustment policy is determined based on the task level and the resource occupancy value.
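By way of a non-limiting illustration, the idle-value check may be sketched as follows. Taking the idle value as the server threshold minus the server occupancy value, floored at zero, is an assumption; the passage above only states that the idle value is calculated from these two quantities. The function names are hypothetical.

```python
def idle_value(server_threshold: float, server_occupancy: float) -> float:
    """Assumed formula: headroom left below the threshold, never negative."""
    return max(server_threshold - server_occupancy, 0.0)


def needs_adjustment(server_threshold: float,
                     server_occupancy: float,
                     resource_occupancy: float) -> bool:
    """True when the idle resources cannot hold the data to be processed."""
    return idle_value(server_threshold, server_occupancy) < resource_occupancy
```

Only when `needs_adjustment` is true does the method go on to acquire the current task data and rank the tasks.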
As an optional implementation manner of this embodiment, the policy determining module 203 is further specifically configured to determine a task level based on current task data, including: acquiring historical processing data and processing deadlines of a current task; determining a first processing frequency of the current task based on the history processing data; determining a first urgency level for the current task based on the processing deadline; determining a second resource occupancy value based on the current task data and the historical data combination; a task level is determined based on the first processing frequency, the first urgency level, and the second resource occupancy value.
As an optional implementation manner of this embodiment, the policy determining module 203 is further specifically configured to determine a task level based on the first processing frequency, the first urgency level, and the second resource occupancy value, including: determining a first score based on the first processing frequency; determining a second score based on the first urgency level; determining a third score based on the second resource occupancy value; calculating a grade score based on the first score, the second score, the third score, and the preset weights; and determining the task level based on the grade score.
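By way of a non-limiting illustration, the grade score may be sketched as a weighted sum of the three scores. The weights, the cutoff values, and the convention that a higher grade score maps to the first task level are all hypothetical and are not taken from the application.

```python
from typing import Sequence, Tuple


def grade_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the first, second, and third scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


def task_level(score: float, cutoffs: Tuple[float, float] = (80.0, 50.0)) -> int:
    """Map a grade score to level 1, 2, or 3 using assumed cutoffs."""
    if score >= cutoffs[0]:
        return 1
    if score >= cutoffs[1]:
        return 2
    return 3
```

With scores of 90, 80, and 70 and weights of 0.5, 0.3, and 0.2, the grade score is 83, which these example cutoffs place in the first task level.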
As an optional implementation manner of this embodiment, the policy determining module 203 is further specifically configured to determine an adjustment policy based on the task level and the resource occupancy value, including: if no third task level exists among the task levels, determining the processing level of the data to be processed; and determining an adjustment policy based on the processing level, the resource occupancy value, and the task level.
In one example, a module in any of the above apparatuses may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field programmable gate arrays (FPGAs), or a combination of at least two of these integrated circuit forms.
For another example, when a module in the apparatus is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke a program. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Fig. 3 is a block diagram of an electronic device 300 according to an embodiment of the present application.
As shown in Fig. 3, the electronic device 300 includes a processor 301 and a memory 302, and may further include one or more of an input/output (I/O) interface 303, a communication component 304, and a communication bus 305.
The processor 301 is configured to control the overall operation of the electronic device 300 so as to complete all or part of the steps of the server fault intelligent monitoring method. The memory 302 is used to store various types of data to support operation of the electronic device 300; such data may include, for example, instructions for any application or method operating on the electronic device 300, as well as application-related data. The memory 302 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as one or more of static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The I/O interface 303 provides an interface between the processor 301 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 304 is used for wired or wireless communication between the electronic device 300 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more thereof; accordingly, the communication component 304 may include a Wi-Fi part, a Bluetooth part, and an NFC part.
The electronic device 300 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), digital signal processors (Digital Signal Processor, abbreviated as DSP), digital signal processing devices (Digital Signal Processing Device, abbreviated as DSPD), programmable logic devices (Programmable Logic Device, abbreviated as PLD), field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA), controllers, microcontrollers, microprocessors, or other electronic components for performing the server fault intelligent monitoring method as set forth in the above embodiments.
The communication bus 305 may include a pathway for transferring information between the aforementioned components. The communication bus 305 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 305 may be divided into an address bus, a data bus, a control bus, and the like.
The electronic device 300 may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like, and may also be a server, and the like.
The application also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the server fault intelligent monitoring method.
The computer readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing description covers only the preferred embodiments of the present application and explains the technical principles employed. Persons skilled in the art will appreciate that the scope of the application is not limited to technical solutions formed by the specific combinations of features described above; it also covers other technical solutions formed by any combination of those features or their equivalents without departing from the spirit of the application, for example, solutions in which the above features are replaced with (but not limited to) technical features having similar functions disclosed in this application.

Claims (10)

1. An intelligent monitoring method for server faults, characterized by comprising:
acquiring data to be processed and server state data, wherein the data to be processed is data corresponding to a task to be processed by a server;
determining a resource occupation value of the data to be processed;
determining an adjustment policy based on the resource occupancy value and the server state data;
and adjusting the server resources based on the adjustment strategy.
2. The method of claim 1, wherein the determining the resource occupancy value of the data to be processed comprises:
acquiring historical task data, wherein the historical task data comprises historical resource occupation values;
grouping the historical task data based on the historical resource occupation values to obtain at least one historical data combination, wherein the historical task data in each historical data combination have the same historical resource occupation value;
and determining the resource occupation value based on the data to be processed and the historical data combination.
3. The method of claim 1, wherein the determining an adjustment policy based on the resource occupancy value and the server state data comprises:
acquiring environment data, historical environment data and a historical server threshold, wherein the environment data is the environment data of a current server;
fitting the historical environment data and the historical server threshold to obtain an environment threshold fitting curve;
determining a server threshold based on the environmental data and the environmental threshold fitting curve;
determining an adjustment policy based on the server threshold, the resource occupancy value, and the server state data.
4. The method of claim 3, wherein the server state data comprises a server occupancy value, wherein the determining an adjustment policy based on the server threshold, the resource occupancy value, and the server state data comprises:
calculating an idle value based on the server occupancy value and the server threshold;
if the idle value is smaller than the resource occupation value, acquiring current task data, wherein the current task data is the task data being processed by the server;
determining a task grade based on the current task data, wherein the task grade is the task grade of the current task, and the current task is the task corresponding to the current task data;
and determining an adjustment strategy based on the task level and the resource occupation value.
5. The method of claim 4, wherein said determining a task level based on said current task data comprises:
acquiring historical processing data and processing deadlines of a current task;
determining a first processing frequency of the current task based on the historical processing data;
determining a first urgency level for the current task based on the processing deadline;
determining a second resource occupancy value based on the current task data and historical data combination;
determining the task level based on the first processing frequency, the first urgency level, and the second resource occupancy value.
6. The method of claim 5, wherein the determining the task level based on the first processing frequency, the first urgency level, and the second resource occupancy value comprises:
determining a first score based on the first processing frequency;
determining a second score based on the first urgency level;
determining a third score based on the second resource occupancy value;
calculating a grade score based on the first score, the second score, the third score and a preset weight;
determining the task level based on the grade score.
7. The method of claim 4, wherein the determining an adjustment policy based on the task level and the resource occupancy value comprises:
if no third task level exists among the task levels, determining the processing level of the data to be processed;
determining an adjustment policy based on the processing level, the resource occupancy value, and the task level.
8. An intelligent monitoring device for server faults, characterized by comprising:
the data acquisition module is used for acquiring data to be processed and server state data, wherein the data to be processed is data corresponding to a task to be processed by the server;
the resource determining module is used for determining the resource occupation value of the data to be processed;
a policy determination module configured to determine an adjustment policy based on the resource occupancy value and the server state data;
and the resource adjustment module is used for adjusting the server resource based on the adjustment strategy.
9. An electronic device comprising a processor coupled to a memory;
the processor is configured to execute a computer program stored in the memory to cause the electronic device to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium comprising a computer program or instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN202410076660.1A 2024-01-19 2024-01-19 Intelligent monitoring method, device, equipment and medium for server faults Active CN117591382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410076660.1A CN117591382B (en) 2024-01-19 2024-01-19 Intelligent monitoring method, device, equipment and medium for server faults

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410076660.1A CN117591382B (en) 2024-01-19 2024-01-19 Intelligent monitoring method, device, equipment and medium for server faults

Publications (2)

Publication Number Publication Date
CN117591382A (en) 2024-02-23
CN117591382B (en) 2024-04-30

Family

ID=89922798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410076660.1A Active CN117591382B (en) 2024-01-19 2024-01-19 Intelligent monitoring method, device, equipment and medium for server faults

Country Status (1)

Country Link
CN (1) CN117591382B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030196126A1 (en) * 2002-04-11 2003-10-16 Fung Henry T. System, method, and architecture for dynamic server power management and dynamic workload management for multi-server environment
CN111752706A (en) * 2020-05-29 2020-10-09 北京沃东天骏信息技术有限公司 Resource allocation method, device and storage medium
CN112328399A (en) * 2020-11-17 2021-02-05 中国平安财产保险股份有限公司 Cluster resource scheduling method and device, computer equipment and storage medium
CN114185675A (en) * 2021-12-10 2022-03-15 恒睿(重庆)人工智能技术研究院有限公司 Resource management method, device, electronic equipment and storage medium
CN115461698A (en) * 2020-04-20 2022-12-09 祖达科尔有限公司 System and method for server power management
CN115858161A (en) * 2022-12-08 2023-03-28 北京元年科技股份有限公司 Method, device, equipment and storage medium for intelligently managing resources of data middleboxes
CN117311987A (en) * 2023-11-10 2023-12-29 中国联合网络通信集团有限公司 Method, device and storage medium for adjusting server processor frequency

Also Published As

Publication number Publication date
CN117591382B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN112162865B (en) Scheduling method and device of server and server
US7631034B1 (en) Optimizing node selection when handling client requests for a distributed file system (DFS) based on a dynamically determined performance index
CN107247651B (en) Cloud computing platform monitoring and early warning method and system
CN107911399B (en) Elastic expansion method and system based on load prediction
CN109981702B (en) File storage method and system
CA3128540C (en) Cache system hotspot data access method, apparatus, computer device and storage medium
CN112689007B (en) Resource allocation method, device, computer equipment and storage medium
CN112506619B (en) Job processing method, job processing device, electronic equipment and storage medium
CN115277566B (en) Load balancing method and device for data access, computer equipment and medium
CN108769162A (en) Distributed message equalization processing method, device, electronic equipment, storage medium
CN114490078A (en) Dynamic capacity reduction and expansion method, device and equipment for micro-service
CN113672345A (en) IO prediction-based cloud virtualization engine distributed resource scheduling method
CN115033352A (en) Task scheduling method, device and equipment for multi-core processor and storage medium
CN117591382B (en) Intelligent monitoring method, device, equipment and medium for server faults
CN111159009A (en) Pressure testing method and device for log service system
CN115469980A (en) Product medium download task scheduling method and device and electronic equipment
CN114422530A (en) Flow control method and device, computer equipment and storage medium
CN110266525B (en) CDN server number configuration method, equipment and computer readable storage medium
CN114675845A (en) Information age optimization method and device, computer equipment and storage medium
CN114090256A (en) Application delivery load management method and system based on cloud computing
GB2504812A (en) Load balancing in a SAP (RTM) system for processors allocated to data intervals based on system load
CN117608862B (en) Data distribution control method, device, equipment and medium
CN111988403A (en) Request processing method and system of electronic equipment, storage medium and electronic equipment
CN113055199A (en) Gateway access method and device and gateway equipment
CN117314683B (en) Power operation and maintenance method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant