CN115297124B - System operation and maintenance management method and device and electronic equipment - Google Patents

System operation and maintenance management method and device and electronic equipment

Info

Publication number
CN115297124B
Authority
CN
China
Prior art keywords
edge node
data
edge
preset
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210877828.XA
Other languages
Chinese (zh)
Other versions
CN115297124A (en)
Inventor
吴文峰
林洁琬
黄鹄
毛廷鸿
沈聪
全树强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210877828.XA (patent CN115297124B)
Publication of CN115297124A
Priority to PCT/CN2022/141396 (WO2024021469A1)
Application granted
Publication of CN115297124B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1031 Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system operation and maintenance management method, a device, and an electronic device. The method comprises: adding a proxy server to a system and deploying an edge node state monitoring task based on the newly added proxy server; when an abnormal edge node is detected, determining the mirror image data corresponding to the abnormal edge node from a preset mirror image database and recovering the abnormal edge node based on that mirror image data; and/or, when a service traffic fluctuation trend within a preset time is predicted by an adaptive scheduling system, adjusting the number value of the edge nodes in the system from a first number value to a second number value according to a preset list and controlling the edge nodes of the second number value to process the service traffic. With this method, edge node resources are adaptively scheduled according to the short-term traffic trend, so that resources are used more effectively; high-availability applications, state monitoring, and timed mirror image backup and recovery tasks are configured on the edge cloud, and the center cloud and the edge cloud are operated and maintained cooperatively, which raises the degree of operation and maintenance automation and ensures the continuity and reliability of the service.

Description

System operation and maintenance management method and device and electronic equipment
Technical Field
The present disclosure relates to the field of cloud computing technologies, and in particular, to a method and an apparatus for managing operation and maintenance of a system, and an electronic device.
Background
With the development of cloud computing technology, cloud computing is used more and more widely in daily life. For example, live network video streaming generally needs the support of both a center cloud and an edge cloud. The edge cloud is a new network architecture and open platform. As an extension of the center cloud, it extends part of the center cloud's services and/or capabilities, including storage, computing, networking, big data, and security, to the edge infrastructure. This open platform integrates core networking, computing, storage, and application capabilities at the network edge. The edge cloud changes the working mode of the traditional center cloud and can provide users with more flexible services and faster response.
Specifically, the roles of the center cloud and the edge cloud are different, the hardware platforms to which the center cloud and the edge cloud belong are different, the deployed services are different, and the center cloud and the edge cloud are required to be deployed independently in the actual deployment and upgrading processes of the center cloud and the edge cloud.
At present, because the center cloud and the edge cloud must be operated and maintained independently, edge node resources cannot be adaptively taken under management or released when the service traffic fluctuates widely. As a result, edge node resources cannot be used effectively when traffic is heavy and are not released in time when traffic is light, so they cannot be scheduled adaptively. Moreover, high-availability applications are not integrated: when a node is added or released, some high-availability applications must be connected to the system manually, and the images used for installation and deployment are not effectively managed and maintained. Operation and maintenance are therefore difficult and reliability is insufficient.
Disclosure of Invention
The application provides a system operation and maintenance management method, a device, and electronic equipment, which improve the stability and reliability of the system and realize adaptive scheduling of the edge nodes in the system.
In a first aspect, the present application provides a system operation and maintenance management method, where the method includes:
a proxy server is newly added in a system, and an edge node state monitoring task is deployed based on the newly added proxy server, wherein the edge node state monitoring task is used for monitoring the states of all edge nodes in the system;
when an abnormal edge node is monitored, mirror image data corresponding to the abnormal edge node is determined from a preset mirror image database, and recovery of the abnormal edge node is executed based on the mirror image data, wherein backup mirror image data used for recovery by each edge node in the system are stored in the preset mirror image database; and/or
when the adaptive scheduling system predicts the fluctuation trend of the service traffic within a preset time, the number value of the edge nodes in the system is adjusted from a first number value to a second number value according to a preset list, and the edge nodes of the second number value are controlled to process the service traffic.
In one possible design, a new proxy server is added to the system, including:
deploying a high availability application on the newly added proxy server, and determining a main server and a standby server based on the high availability application;
generating a virtual IP address based on the main server and the standby server, and binding the virtual IP address to the main server;
and when an abnormality of the main server is detected, unbinding the virtual IP address from the main server and binding the virtual IP address to the standby server until the main server is recovered.
In one possible design, when an abnormal edge node is monitored, the method further comprises:
determining a first edge node which does not respond to the heartbeat information, and taking the first edge node as an abnormal edge node, wherein the heartbeat information is used for determining the abnormal edge node; and/or
when the running state of a second edge node is determined to be a waiting state, taking the second edge node as an abnormal edge node in response to the waiting time corresponding to the waiting state exceeding a preset time; and/or
determining a third edge node whose running state is a recovering state, and taking the third edge node as an abnormal edge node.
In one possible design, adjusting the number of edge nodes in the system from a first number to a second number based on the traffic flow includes:
processing the edge node data according to a preset mode to obtain a parameter value corresponding to the edge node data;
acquiring the flow trend of the service flow in preset time based on the parameter value and a preset flow prediction module, wherein the flow trend represents the fluctuation range of the service flow in the preset time;
adjusting the number value of edge nodes in the system from a first number value to a second number value based on the traffic trend.
In one possible design, the processing the edge node data according to a preset mode to obtain a parameter value corresponding to the edge node data includes:
extracting all types of edge data in the edge node data, deleting invalid data in all the edge data based on a first preset method, and generating all groups of denoising data corresponding to the edge node data, wherein the invalid data are repeated data in the edge node data and data with larger deviation from other data in the edge node data;
processing each group of denoising data based on the second preset method to obtain training data corresponding to each group of denoising data;
and inputting the training data of each group into a preset model for training to obtain the parameter values corresponding to the training data of each group.
In one possible design, the adjusting the number value of the edge nodes in the system from the first number value to the second number value according to the preset list, and controlling the edge nodes of the second number value to process the traffic includes:
obtaining a first quantity value of a current edge node and obtaining the total quantity of service traffic;
determining a second quantity value of the edge node corresponding to the service flow from the preset list based on the total quantity;
and adjusting the first quantity value to a second quantity value, and controlling an edge node of the second quantity value to process the service flow.
In a second aspect, the present application provides a system operation and maintenance management device, the device including:
the monitoring module is used for newly adding a proxy server in the system and deploying an edge node state monitoring task based on the newly added proxy server;
the recovery module is used for determining mirror image data corresponding to the abnormal edge node from a preset mirror image database when the abnormal edge node is monitored, and executing recovery of the abnormal edge node based on the mirror image data;
and the adjusting module is used for adjusting the number value of the edge nodes in the system from the first number value to the second number value according to a preset list when the self-adaptive scheduling system predicts the fluctuation trend of the service flow in the preset time, and controlling the edge nodes of the second number value to process the service flow.
In one possible design, the monitoring module is specifically configured to deploy a high availability application on the newly added proxy server, determine a main server and a standby server based on the high availability application, generate a virtual IP address based on the main server and the standby server, bind the virtual IP address to the main server, and, when an abnormality of the main server is detected, unbind the virtual IP address from the main server and bind it to the standby server until the main server recovers.
In one possible design, the recovery module is specifically configured to: determine a first edge node that does not respond to the heartbeat information and take the first edge node as an abnormal edge node; and/or, when the running state of a second edge node is determined to be a waiting state, take the second edge node as an abnormal edge node in response to the waiting time corresponding to the waiting state exceeding a preset time; and/or determine a third edge node whose running state is a recovering state and take the third edge node as an abnormal edge node.
In one possible design, the adjusting module is specifically configured to process the edge node data according to a preset manner, obtain a parameter value corresponding to the edge node data, obtain a traffic trend of the traffic flow in a preset time based on the parameter value and a preset traffic prediction module, and adjust a number value of edge nodes in the system from a first number value to a second number value based on the traffic trend.
In one possible design, the adjusting module is further configured to extract each type of edge data in the edge node data, delete invalid data in each edge data based on a first preset method, generate each set of denoising data corresponding to the edge node data, process each set of denoising data based on the second preset method, obtain training data corresponding to each set of denoising data, and input each set of training data into a preset model for training, so as to obtain parameter values corresponding to each set of training data.
In one possible design, the adjusting module is further configured to obtain a first quantity value of a current edge node, obtain a total amount of traffic, determine a second quantity value of an edge node corresponding to the traffic from the preset list based on the total amount, adjust the first quantity value to the second quantity value, and control the edge node of the second quantity value to process the traffic.
In a third aspect, the present application provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the system operation and maintenance management method when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the system operation and maintenance management method described above.
For the technical effects of each of the first to fourth aspects and of their possible designs, refer to the technical effects described above for the first aspect and its possible designs; they are not repeated here.
Drawings
FIG. 1 is a flowchart of a method for managing system operation and maintenance provided in the present application;
fig. 2 is a schematic structural diagram of a system operation and maintenance management device provided in the present application;
fig. 3 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings. The specific method of operation in the method embodiment may also be applied to the device embodiment or the system embodiment. It should be noted that "a plurality of" is understood as "at least two" in the description of the present application. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. A is connected with B, and can be represented as follows: both cases of direct connection of A and B and connection of A and B through C. In addition, in the description of the present application, the words "first," "second," and the like are used merely for distinguishing between the descriptions and not be construed as indicating or implying a relative importance or order.
At present, in a system comprising a center cloud and an edge cloud, the two differ in function, hardware platform, and deployed services, so they must be operated and maintained independently, and their high availability applications are not integrated. When a node is added or released, some of these applications must be connected to the system manually, and the images used for installation and deployment are not effectively managed and maintained. The system is therefore difficult to operate and maintain and insufficiently reliable. In addition, the system cannot adaptively take edge node resources under management or release them, so edge node resources cannot be scheduled adaptively.
In order to solve the above-described problems, the embodiments of the present application provide a system operation and maintenance management method, which is used to implement adaptive scheduling of edge cloud resources in a system and improve high availability and stability of the system. The method and the device according to the embodiments of the present application are based on the same technical concept, and because the principles by which they solve the problems are similar, the embodiments of the device and the method can be referred to each other, and repeated descriptions are omitted.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, the present application provides a method for managing operation and maintenance of a system, which can improve high availability and stability of the system, and enable edge cloud resources in the system to implement adaptive scheduling, so as to improve utilization rate of the edge cloud resources, and the implementation flow of the method is as follows:
Step S1: a proxy server is newly added in the system, and an edge node state monitoring task is deployed based on the newly added proxy server.
To improve the high availability and stability of the system and to facilitate its operation and maintenance management, two proxy servers and an adaptive scheduling system are newly added to the system comprising the center cloud and the edge cloud. The proxy servers realize operation and maintenance management of the edge cloud by configuring high availability applications. A high availability application can be an edge node state monitoring task, a timed mirror image backup task, a load balancing task, a traffic forwarding task, and the like; the set of high availability applications can be adjusted to the actual operation and maintenance conditions of the system, so they are not described one by one here.
The adaptive scheduling system predicts the upcoming service traffic in the system based on the edge node data sent by the proxy server, thereby realizing adaptive scheduling of the edge node resources. Through the cooperative operation and maintenance of the proxy server, the high availability applications, and the adaptive scheduling system, the center cloud and the edge cloud can be managed in a unified way.
It should be further noted that the newly added proxy server includes at least a main server and a standby server. In this embodiment there is one main server and one standby server; these numbers can be adjusted according to the actual situation and are not described further here.
Specifically, a virtual IP address exists between the main server and the standby server, where the virtual IP address is bound to the main server, and when an abnormality of the main server is detected, in order to not affect the operation and maintenance of the system and ensure that each node in the system works normally, the standby server will replace the main server to work, and the virtual IP address will unbind from the main server and bind to the standby server until the main server recovers.
The newly added proxy server is used for deploying services such as service agents, flow forwarding, load balancing, node state monitoring, multipath and the like in the system, and is provided with an edge node state monitoring task which is used for monitoring the states of all edge nodes in the system.
Based on the above description, when the main server is abnormal, the standby server works instead of the main server, so that the service in the system is not interrupted, and the stability and high availability of the system are ensured.
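The virtual IP failover described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the class names `ProxyServer` and `VirtualIP`, the `reconcile` method, the server names, and the address 192.0.2.10 (a documentation-only address) are all hypothetical, and a real deployment would rely on a high availability application rather than in-process objects.

```python
from dataclasses import dataclass


@dataclass
class ProxyServer:
    name: str
    healthy: bool = True


class VirtualIP:
    """Tracks which proxy server currently holds the virtual IP address."""

    def __init__(self, address: str, main: ProxyServer, standby: ProxyServer):
        self.address = address
        self.main = main
        self.standby = standby
        self.holder = main  # the virtual IP starts out bound to the main server

    def reconcile(self) -> ProxyServer:
        """Rebind the virtual IP according to the main server's health."""
        if self.main.healthy:
            self.holder = self.main      # main server is (again) available
        elif self.standby.healthy:
            self.holder = self.standby   # fail over to the standby server
        # if both are unhealthy, the binding is left unchanged
        return self.holder


main = ProxyServer("proxy-main")
standby = ProxyServer("proxy-standby")
vip = VirtualIP("192.0.2.10", main, standby)

main.healthy = False                      # simulate an abnormal main server
holder_during_outage = vip.reconcile().name
main.healthy = True                       # main server has recovered
holder_after_recovery = vip.reconcile().name
```

In this sketch the standby server holds the address only while the main server is unhealthy, which mirrors the "until the main server recovers" behaviour in the text.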
Step S2: when the abnormal edge node is monitored, mirror image data corresponding to the abnormal edge node is determined from a preset mirror image database, and recovery of the abnormal edge node is executed based on the mirror image data.
The above-described edge node state monitoring task may send heartbeat information to each edge node, and when an edge node in the system is in a normal working state, the edge node may respond to the received heartbeat information, so that the edge node state monitoring task can determine whether the edge node is abnormal based on whether the edge node responds to the heartbeat information. In addition, the edge node state monitoring task can also determine whether the edge node is abnormal based on the operation state of the edge node.
An edge node in the system is treated as abnormal in the following cases:
when a first edge node that does not respond to the heartbeat information is determined, the first edge node is taken as an abnormal edge node; and/or
when a second edge node whose running state is a waiting state is determined, and the waiting time corresponding to that waiting state exceeds the preset time, the second edge node is taken as an abnormal edge node; and/or
when a third edge node whose running state is "recovering" is determined, the third edge node is taken as an abnormal edge node, because an edge node that is still in the recovering state is regarded as one whose recovery has failed.
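The three abnormality conditions listed above can be expressed as a small classification function. The sketch below is illustrative only: the `NodeStatus` fields, the state strings, the node names, and the 60-second value used for the "preset time" are assumptions, not values from the patent.

```python
from dataclasses import dataclass
from typing import Optional

WAIT_TIMEOUT_SECONDS = 60.0  # hypothetical "preset time"


@dataclass
class NodeStatus:
    name: str
    heartbeat_answered: bool
    state: str                  # assumed values: "running", "waiting", "recovering"
    waiting_seconds: float = 0.0


def abnormal_reason(node: NodeStatus) -> Optional[str]:
    """Return why a node counts as abnormal, or None if it looks normal."""
    if not node.heartbeat_answered:
        return "no heartbeat response"
    if node.state == "waiting" and node.waiting_seconds > WAIT_TIMEOUT_SECONDS:
        return "waiting longer than the preset time"
    if node.state == "recovering":
        return "still recovering"
    return None


nodes = [
    NodeStatus("edge-1", True, "running"),
    NodeStatus("edge-2", False, "running"),
    NodeStatus("edge-3", True, "waiting", waiting_seconds=120.0),
    NodeStatus("edge-4", True, "waiting", waiting_seconds=5.0),
    NodeStatus("edge-5", True, "recovering"),
]
abnormal = {n.name: abnormal_reason(n) for n in nodes if abnormal_reason(n)}
```

Here `edge-4` is waiting but still within the preset time, so only `edge-2`, `edge-3`, and `edge-5` are reported as abnormal.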
After the abnormal edge node is determined, the mirror image data corresponding to it must be located in the preset mirror image database, which stores the mirror images of all modules in the system. The preset mirror image database can be a Docker image database, which is not limited here. Once the mirror image data corresponding to the abnormal edge node is determined, the abnormal edge node is recovered from that mirror image data.
In this embodiment, the edge node state monitoring task can be a crontab timed daemon task. The crontab command, common in Unix and Unix-like operating systems, is used to set instructions that are executed periodically, so the edge node state monitoring task can check the edge nodes in the system at a preset period. The running state of an edge node can be obtained through an automated operation and maintenance tool. When the state of an edge node is abnormal, the system shows the state of that edge node as ERROR, the service is quickly switched to a preset server through the multipath service, where the preset server can be a BACKUP server, and a message is sent to the edge node mirror image recovery task.
After receiving the message from the edge node state monitoring task, the edge node mirror image recovery task parses it, finds the mirror image data of the corresponding node in the preset mirror image library, and uses a preset command, which can be a docker command, to carry out the mirror image recovery.
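As a rough sketch of the recovery task described above, the function below looks up a node's backup image and builds the commands a recovery job might run. The text only says that a docker command may be used; the particular subcommands chosen here (`docker pull`, `docker rm -f`, `docker run -d --name`) are existing Docker CLI subcommands picked as an assumption, and `IMAGE_LIBRARY`, the registry host, and the node name are hypothetical. The commands are only constructed, not executed.

```python
from typing import Dict, List

# Hypothetical image library: node name -> backed-up image reference.
IMAGE_LIBRARY: Dict[str, str] = {
    "edge-2": "registry.example.com/edge/edge-2:backup",
}


def recovery_commands(node: str, library: Dict[str, str]) -> List[List[str]]:
    """Build (but do not run) the commands that would restore a node."""
    image = library.get(node)
    if image is None:
        raise KeyError(f"no backup image recorded for {node}")
    return [
        ["docker", "pull", image],                       # fetch the backup image
        ["docker", "rm", "-f", node],                    # remove the failed container
        ["docker", "run", "-d", "--name", node, image],  # start a fresh container
    ]


commands = recovery_commands("edge-2", IMAGE_LIBRARY)
```

Each inner list could be handed to `subprocess.run` by a real recovery job; keeping construction separate from execution makes the lookup step easy to check in isolation.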
To ensure that abnormal edge nodes in the system can be recovered promptly, the timed mirror image backup task backs up the mirror image data of all edge nodes in the system. Specifically, Ansible is used to periodically check the containers of all cluster nodes and to select the images to back up.
Based on the method, when the abnormal edge nodes appear in the system, the mirror image backup data of each edge node is used for quickly recovering the abnormal edge nodes, so that the update of the abnormal edge nodes in the system is realized, the normal state of the edge nodes in the system is ensured, the system deployment is more flexible, and the convenience of the operation and maintenance of the system is improved.
Step S3: when the self-adaptive scheduling system predicts the fluctuation trend of the service flow in the preset time, the number value of the edge nodes in the system is adjusted from the first number value to the second number value according to the preset list, and the edge nodes of the second number value are controlled to process the service flow.
The system operation and maintenance management method in the embodiment of the application further comprises self-adaptive scheduling of the edge node resources, wherein the self-adaptive scheduling of the edge node resources is used for realizing high utilization rate of the edge node resources, and the specific process of the self-adaptive scheduling of the edge node resources is as follows:
When the adaptive scheduling system predicts the traffic fluctuation trend within a preset time, edge node data of the edge nodes is acquired in real time. The edge node data comprises multiple types of edge data, and each type of edge data contains invalid data, namely repeated data and data that deviates significantly from the other data. The invalid data therefore needs to be deleted from each type of edge data.
Specifically, the invalid data in each type of edge data is deleted based on a first preset method: repeated data is removed, and the data is denoised with a k-nearest neighbor method. Processing data with the k-nearest neighbor method is well known to those skilled in the art, so the detailed denoising procedure is not described here.
With the k-nearest neighbor method, any data point in a set of edge data whose distance to its 5 nearest neighbors exceeds a threshold is treated as abnormal data and deleted. This generates the denoising data corresponding to each type of edge data, that is, each group of denoising data corresponding to the edge node data.
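One way to read the k-nearest neighbor rule above is sketched below. It is a minimal illustration under stated assumptions: the text does not say how the distance to the 5 neighbors is aggregated, so the mean distance is used here, and the threshold of 2.0 and the function name `knn_denoise` are hypothetical. The sample values are those of row a in Table 1.

```python
from typing import List


def knn_denoise(values: List[float], k: int = 5, threshold: float = 2.0) -> List[float]:
    """Drop repeated values, then drop values far from their k nearest neighbours."""
    unique = list(dict.fromkeys(values))  # remove repeats, keep first-seen order
    kept = []
    for i, v in enumerate(unique):
        # distances from v to every other remaining value, smallest first
        distances = sorted(abs(v - other) for j, other in enumerate(unique) if j != i)
        nearest = distances[:k]
        # assumption: "distance to the k neighbours" is taken as the mean distance
        if nearest and sum(nearest) / len(nearest) > threshold:
            continue  # treated as abnormal data and deleted
        kept.append(v)
    return kept


row_a = [1.2, 2.3, 2.4, 5.7, 2.3, 1.9, 3.2, 6, 1.2]
denoised_a = knn_denoise(row_a)
```

With these assumed settings the repeated 2.3 and 1.2 are removed first, and 5.7 and 6 are then discarded because they lie far from most of the remaining values.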
After each group of denoising data is obtained, it is normalized so that the parameters of all groups fall into substantially the same numerical interval, which yields the training data.
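The text does not name a specific normalization scheme. As one common choice, min-max scaling maps every group into the interval [0, 1]; the sketch below assumes that scheme, and the function name `min_max_normalize` is hypothetical.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # all values identical: no spread to scale, map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

After this step, groups that originally had different ranges can be compared or fed to the same model on an equal footing.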
After the training data is obtained, it is input into a preset model for training. The training methods in the preset model include at least multi-parameter fitting, correlation coefficients, a linear regression algorithm, and the like, and training yields the parameter values corresponding to each group of training data.
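The text lists linear regression among the training methods without giving details. As a hedged illustration, the sketch below fits an ordinary least-squares line to one group of values indexed by sample position; treating the fitted slope and intercept as that group's "parameter values" is an assumption made for this example, and `fit_line` is a hypothetical name.

```python
from typing import List, Tuple


def fit_line(ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept with x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1.0, 3.0, 5.0, 7.0])
```

A positive slope would indicate rising values across the samples and a negative slope falling values, which is one simple way a fitted parameter could feed a traffic trend estimate.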
After the system detects a series of parameter values corresponding to the service flow, the system can determine the flow trend of the service flow within the preset time based on each parameter value and the preset flow prediction module, wherein the flow trend represents the fluctuation range of the service flow within the preset time.
For example, the edge node data are shown in Table 1:
Edge node data    Edge data
a                 {1.2, 2.3, 2.4, 5.7, 2.3, 1.9, 3.2, 6, 1.2}
b                 {2.1, 1.9, 3.3, 2.2, 2.8, 1.6, 1.2, 3, 3.2}
c                 {1.7, 2.6, 1.4, 3.6, 1.3, 1.7, 1.2, 3, 2.2}
......            ......
TABLE 1
Table 1 records part of the edge data of the edge node data. Only 3 sets of edge data are shown as examples, each containing 9 parameters, and each set is data after de-duplication. The 9 parameters are normalized and used for training; since the normalization method has been described above, it is not repeated here.
After training and normalizing each set of edge data based on each set of edge data in table 1 above, the following table 2 is obtained:
Edge node data    Parameter value
a                 2.6
b                 3.1
c                 1.8
......            ......
TABLE 2
Table 2 shows the parameter values corresponding to the 3 example sets of edge data: 2.6 for edge data a, 3.1 for edge data b, and 1.8 for edge data c. The traffic trend corresponding to the service traffic can be determined based on these parameter values. Other edge data and their parameter values follow the pattern of Table 2 and are not described further here.
After the traffic trend of the traffic flow is determined, the number value of the edge nodes in the system is adjusted from the first number value to the second number value based on the traffic trend.
The specific way of adjusting the number value of the edge nodes is as follows: obtaining a first quantity value of a current edge node, in order to determine traffic trend of service data, obtaining total quantity of service traffic, determining a second quantity value of the edge node corresponding to the total quantity of service traffic from a preset list based on the total quantity, wherein the preset list records a corresponding relation between the total quantity of service traffic and the second quantity value of the edge node, and the preset list is as follows:
total amount of traffic flow Second number of edge nodes
90 10
100 20
110 30
...... ......
TABLE 3
In Table 3, each total amount of service traffic corresponds to a second quantity value of edge nodes. Table 3 is only an example of the correspondence between the total amount of service traffic and the second quantity value; the second quantity values corresponding to other totals follow the example in Table 3 and are not described here.
With Table 3 above, once the first quantity value of the edge nodes in the current system has been determined, the first quantity value is adjusted to the second quantity value based on the traffic trend of the service traffic. The number of edge nodes is thus determined before the service traffic is processed, which enables effective scheduling of edge node resources.
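As an illustration only, the lookup in the preset list can be sketched in Python as follows. The rows are those of Table 3; the function names `second_quantity` and `plan_adjustment`, and the rule that a total falling between two rows is rounded up to the next listed row, are assumptions made for this sketch and are not specified in the present application.

```python
from bisect import bisect_left

# Rows taken from Table 3: (total service traffic, second quantity value of edge nodes).
PRESET_LIST = [(90, 10), (100, 20), (110, 30)]


def second_quantity(total_traffic, preset=PRESET_LIST):
    """Return the second quantity value for a given total amount of service traffic.

    Assumption (not from the source): a total between two rows rounds up to the
    next listed row, and a total above the last row uses the last row.
    """
    totals = [total for total, _ in preset]
    index = min(bisect_left(totals, total_traffic), len(preset) - 1)
    return preset[index][1]


def plan_adjustment(first_quantity, total_traffic):
    """Return (second quantity value, number of nodes to admit (+) or release (-))."""
    target = second_quantity(total_traffic)
    return target, target - first_quantity
```

For example, with a first quantity value of 10 and a total amount of 110, the sketch yields a second quantity value of 30, that is, 20 edge nodes to admit; with a first quantity value of 30 and a total amount of 90, it yields 10, that is, 20 edge nodes to release.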
When the traffic trend is larger, the system can admit new edge nodes, which reduces the pressure on the existing edge nodes; when the traffic trend is smaller, the system can release edge nodes and forward the service traffic of the released edge nodes, so that edge node resources are used effectively.
Based on the above description, when the fluctuation trend of the service traffic within the preset time is predicted based on the adaptive scheduling system, the number of edge nodes can be adjusted according to the load of the edge nodes, which avoids overload when each edge node processes service traffic. The service traffic is processed to obtain the corresponding parameter values, and the traffic trend is obtained from them, so that the number of edge nodes in the system is adjusted based on the traffic trend. This ensures adaptive adjustment of the number of edge nodes and improves the utilization rate of the edge nodes.
Based on the same inventive concept, the embodiment of the present application further provides a system operation and maintenance management device, where the system operation and maintenance management device is configured to implement a function of a system operation and maintenance management method, and referring to fig. 2, the device includes:
the monitoring module 201 is configured to add a proxy server in the system, and deploy an edge node state monitoring task based on the added proxy server;
a recovery module 202, configured to determine mirror image data corresponding to an abnormal edge node from a preset mirror image database when the abnormal edge node is monitored, and perform recovery of the abnormal edge node based on the mirror image data;
and the adjusting module 203 is configured to adjust the number value of the edge nodes in the system from a first number value to a second number value according to a preset list when the traffic fluctuation trend within a preset time is predicted based on the adaptive scheduling system, and control the edge nodes of the second number value to process the traffic.
In one possible design, the monitoring module 201 is specifically configured to deploy a high availability application from the newly added proxy server, determine a primary server and a standby server from the high availability application, generate a virtual IP address based on the primary server and the standby server, bind the virtual IP address to the primary server, and, when an anomaly of the primary server is detected, unbind the virtual IP address from the primary server and bind the virtual IP address to the standby server until the primary server recovers.
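The binding behaviour described above can be sketched, purely as an illustration, by a small Python class that tracks which server currently holds the virtual IP address. The class name `VirtualIPBinding`, the method `on_health_report`, and the server names are hypothetical; binding the virtual IP address back to the primary server as soon as it recovers is an assumption read from "until the primary server recovers". The sketch only models the state and does not configure any network interface.

```python
class VirtualIPBinding:
    """Tracks which server currently holds the virtual IP address (illustrative only)."""

    def __init__(self, vip, primary, standby):
        self.vip = vip
        self.primary = primary
        self.standby = standby
        self.holder = primary  # the virtual IP address starts bound to the primary server

    def on_health_report(self, primary_healthy):
        """Rebind the virtual IP address according to the primary's health.

        Returns the server that holds the virtual IP address after the report.
        """
        if not primary_healthy:
            # Anomaly detected: unbind from the primary, bind to the standby.
            self.holder = self.standby
        else:
            # Assumption: the address returns to the primary once it has recovered.
            self.holder = self.primary
        return self.holder
```

A caller would create one binding per primary/standby pair, for example `VirtualIPBinding("10.0.0.100", "proxy-a", "proxy-b")`, and feed it the result of each health check.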
In one possible design, the recovery module 202 is specifically configured to: determine a first edge node that does not respond to heartbeat information, and take the first edge node as an abnormal edge node; and/or, when the running state of a second edge node is determined to be a waiting state and the waiting time corresponding to the waiting state exceeds a preset time, take the second edge node as an abnormal edge node; and/or determine a third edge node whose running state is a recovery state, and take the third edge node as an abnormal edge node.
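The three conditions above can be expressed, as a minimal sketch, by a single predicate. The `NodeStatus` fields, the state labels "running", "waiting" and "recovering", and the default preset time of 30 seconds are all hypothetical values chosen for illustration; the present application does not specify them.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    heartbeat_ok: bool            # True if the node responded to heartbeat information
    state: str                    # hypothetical labels: "running", "waiting", "recovering"
    waiting_seconds: float = 0.0  # time already spent in the waiting state


def is_abnormal(node, preset_wait_seconds=30.0):
    """Return True if the node matches any of the three abnormality conditions."""
    if not node.heartbeat_ok:
        return True  # first edge node: no response to heartbeat information
    if node.state == "waiting" and node.waiting_seconds > preset_wait_seconds:
        return True  # second edge node: waiting longer than the preset time
    if node.state == "recovering":
        return True  # third edge node: running state is the recovery state
    return False


def find_abnormal(nodes, preset_wait_seconds=30.0):
    """Return the names of all abnormal edge nodes, in input order."""
    return [n.name for n in nodes if is_abnormal(n, preset_wait_seconds)]
```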
In one possible design, the adjusting module 203 is specifically configured to process the edge node data according to a preset manner, obtain a parameter value corresponding to the edge node data, obtain a traffic trend of the traffic flow in a preset time based on the parameter value and a preset traffic prediction module, and adjust a number value of edge nodes in the system from a first number value to a second number value based on the traffic trend.
In one possible design, the adjusting module 203 is further configured to extract each type of edge data in the edge node data, delete invalid data in each type of edge data based on a first preset method to generate each set of denoising data corresponding to the edge node data, process each set of denoising data based on a second preset method to obtain training data corresponding to each set of denoising data, and input each set of training data into a preset model for training, so as to obtain the parameter value corresponding to each set of training data.
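The preprocessing described above might look like the following sketch, in which invalid data are taken to be repeated values and values that deviate strongly from the other values. The concrete rules are assumptions and are not specified in this passage: here a value is dropped when it differs from the median by more than a hypothetical threshold `max_deviation`, and min-max scaling stands in for the second preset method. The preset model itself is not sketched.

```python
from statistics import median


def denoise(values, max_deviation=2.0):
    """Drop repeated values, then values far from the median (assumed rule)."""
    unique = list(dict.fromkeys(values))  # keeps the first occurrence, preserves order
    mid = median(unique)
    return [v for v in unique if abs(v - mid) <= max_deviation]


def min_max_normalize(values):
    """Scale values to [0, 1]; an assumed stand-in for the second preset method."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # all values equal: nothing to scale
    return [(v - lowest) / (highest - lowest) for v in values]


def prepare_training_data(edge_data):
    """Apply both steps to one set of edge data and return the training data."""
    return min_max_normalize(denoise(edge_data))
```

For the input `[1.0, 1.0, 2.0, 3.0, 50.0]`, the repeated 1.0 and the outlier 50.0 are removed, and the remaining values are scaled to `[0.0, 0.5, 1.0]`.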
In one possible design, the adjusting module 203 is further configured to obtain a first number value of a current edge node, obtain a total amount of traffic, determine a second number value of an edge node corresponding to the traffic from the preset list based on the total amount, adjust the first number value to the second number value, and control the edge node of the second number value to process the traffic.
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing system operation and maintenance management device, and referring to fig. 3, the electronic device includes:
at least one processor 301, and a memory 302 connected to the at least one processor 301, in this embodiment of the present application, a specific connection medium between the processor 301 and the memory 302 is not limited, and in fig. 3, the connection between the processor 301 and the memory 302 through the bus 300 is taken as an example. Bus 300 is shown in bold lines in fig. 3, and the manner in which the other components are connected is illustrated schematically and not by way of limitation. The bus 300 may be divided into an address bus, a data bus, a control bus, etc., and is represented by only one thick line in fig. 3 for convenience of illustration, but does not represent only one bus or one type of bus. Alternatively, the processor 301 may be referred to as a controller, and the names are not limited.
In the embodiment of the present application, the memory 302 stores instructions executable by the at least one processor 301, and the at least one processor 301 may perform a system operation and maintenance management method as described above by executing the instructions stored in the memory 302. Processor 301 may implement the functions of the various modules in the apparatus shown in fig. 2.
The processor 301 is the control center of the apparatus. It may connect the various parts of the entire control device using various interfaces and lines, and, by running or executing the instructions stored in the memory 302 and invoking the data stored in the memory 302, perform the various functions of the apparatus and process data, thereby monitoring the apparatus as a whole.
In one possible design, processor 301 may include one or more processing units, and processor 301 may integrate an application processor and a modem processor, where the application processor primarily processes operating systems, user interfaces, application programs, and the like, and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 301. In some embodiments, processor 301 and memory 302 may be implemented on the same chip, and in some embodiments they may be implemented separately on separate chips.
The processor 301 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, which may implement or perform the methods, steps and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a system operation and maintenance management method disclosed in connection with the embodiments of the present application may be directly embodied and executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
The memory 302, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 302 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory 302 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 302 in the present embodiment may also be circuitry or any other device capable of implementing a storage function, for storing program instructions and/or data.
By programming the processor 301, the code corresponding to the system operation and maintenance management method described in the foregoing embodiments may be burned into the chip, so that the chip can execute the steps of the system operation and maintenance management method of the embodiment shown in fig. 1 when running. How to design and program the processor 301 is a technology well known to those skilled in the art and will not be described in detail herein.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium storing computer instructions that, when executed on a computer, cause the computer to perform a system operation and maintenance management method as described above.
In some possible embodiments, aspects of the system operation and maintenance management method provided by the present application may also be implemented in the form of a program product comprising program code which, when the program product is run on a device, causes the control apparatus to carry out the steps of the system operation and maintenance management method according to the various exemplary embodiments of the present application described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (8)

1. A system operation and maintenance management method, comprising:
a proxy server is newly added in a system, and an edge node state monitoring task is deployed based on the newly added proxy server, wherein the edge node state monitoring task is used for monitoring the states of all edge nodes in the system;
when an abnormal edge node is monitored, mirror image data corresponding to the abnormal edge node is determined from a preset mirror image database, and recovery of the abnormal edge node is executed based on the mirror image data, wherein backup mirror image data used for recovery by each edge node in the system are stored in the preset mirror image database;
when predicting a traffic flow fluctuation trend within a preset time based on an adaptive scheduling system, extracting all types of edge data in the edge node data, deleting invalid data in all the edge data based on a first preset method, and generating all groups of denoising data corresponding to the edge node data, wherein the invalid data are repeated data in the edge node data and data with larger deviation from other data in the edge node data;
processing each group of denoising data based on a second preset method to obtain training data corresponding to each group of denoising data;
inputting the training data of each group into a preset model for training to obtain respective corresponding parameter values of the training data of each group;
acquiring the flow trend of the service flow in preset time based on the parameter value and a preset flow prediction module, wherein the flow trend represents the fluctuation range of the service flow in the preset time;
and adjusting the quantity value of the edge nodes in the system from a first quantity value to a second quantity value based on the flow trend and a preset list, and controlling the edge nodes of the second quantity value to process the service flow, wherein the preset list is the corresponding relation between the service flow and the quantity of the edge nodes.
2. The method of claim 1, wherein adding a proxy server in the system comprises:
deploying a high availability application from the newly added proxy server, and determining a main server and a standby server from the high availability application;
generating a virtual IP address based on the main server and the standby server, and binding the virtual IP address with the main server;
and when the abnormality of the main server is detected, unbinding the virtual IP address from the main server, and binding the standby server to the virtual IP address until the main server is recovered.
3. The method of claim 1, wherein when an abnormal edge node is monitored, further comprising:
determining a first edge node which does not respond to heartbeat information, and taking the first edge node as an abnormal edge node, wherein the heartbeat information is used for determining the abnormal edge node; and/or
when the running state of a second edge node is determined to be a waiting state, in response to the waiting time corresponding to the waiting state exceeding the preset time, taking the second edge node as an abnormal edge node; and/or
determining a third edge node whose running state is a recovery state, and taking the third edge node as an abnormal edge node.
4. The method of claim 1, wherein adjusting the number of edge nodes in the system from a first number value to a second number value based on the traffic trend and a preset list and controlling the edge nodes of the second number value to process the traffic comprises:
obtaining a first quantity value of a current edge node and obtaining the total quantity of service traffic;
determining a second quantity value of the edge node corresponding to the service flow from the preset list based on the total quantity;
and adjusting the first quantity value to a second quantity value, and controlling an edge node of the second quantity value to process the service flow.
5. A system operation and maintenance management device, comprising:
the monitoring module is used for newly adding a proxy server in the system and deploying an edge node state monitoring task based on the newly added proxy server;
the recovery module is used for determining mirror image data corresponding to the abnormal edge node from a preset mirror image database when the abnormal edge node is monitored, and executing recovery of the abnormal edge node based on the mirror image data;
the adjusting module is used for, when the traffic flow fluctuation trend within a preset time is predicted based on an adaptive scheduling system, extracting all types of edge data in the edge node data, deleting invalid data in all the edge data based on a first preset method, generating all groups of denoising data corresponding to the edge node data, processing each group of denoising data based on a second preset method to obtain training data corresponding to each group of denoising data, inputting each group of training data into a preset model for training to obtain parameter values corresponding to each group of training data, obtaining the traffic trend of the traffic flow within the preset time based on the parameter values and a preset traffic prediction module, adjusting the number value of edge nodes in the system from a first number value to a second number value based on the traffic trend and a preset list, and controlling the edge nodes of the second number value to process the traffic flow.
6. The apparatus of claim 5, wherein the monitoring module is specifically configured to deploy a high availability application from the newly added proxy server, determine a primary server and a standby server from the high availability application, generate a virtual IP address based on the primary server and the standby server, bind the virtual IP address to the primary server, and, when an anomaly of the primary server is detected, unbind the virtual IP address from the primary server and bind the virtual IP address to the standby server until the primary server is restored.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-4 when executing a computer program stored on said memory.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-4.
CN202210877828.XA 2022-07-25 2022-07-25 System operation and maintenance management method and device and electronic equipment Active CN115297124B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210877828.XA CN115297124B (en) 2022-07-25 2022-07-25 System operation and maintenance management method and device and electronic equipment
PCT/CN2022/141396 WO2024021469A1 (en) 2022-07-25 2022-12-23 System operation and maintenance management method and apparatus, and electronic device

Publications (2)

Publication Number Publication Date
CN115297124A CN115297124A (en) 2022-11-04
CN115297124B CN115297124B (en) 2023-08-04

Family

ID=83824243

Country Status (2)

Country Link
CN (1) CN115297124B (en)
WO (1) WO2024021469A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297124B (en) * 2022-07-25 2023-08-04 天翼云科技有限公司 System operation and maintenance management method and device and electronic equipment
CN116450356B (en) * 2023-04-21 2024-02-02 珠海创投港珠澳大桥珠海口岸运营管理有限公司 Cross-border logistics management method based on cloud management and control

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111355610A (en) * 2020-02-25 2020-06-30 网宿科技股份有限公司 Exception handling method and device based on edge network
CN111756800A (en) * 2020-05-21 2020-10-09 网宿科技股份有限公司 Method and system for processing burst flow
CN112822283A (en) * 2021-01-21 2021-05-18 重庆紫光华山智安科技有限公司 Edge node control method and device, control node and storage medium
CN113315719A (en) * 2020-02-27 2021-08-27 阿里巴巴集团控股有限公司 Traffic scheduling method, device, system and storage medium
CN114499979A (en) * 2021-12-28 2022-05-13 云南电网有限责任公司信息中心 SDN abnormal flow cooperative detection method based on federal learning
CN114679463A (en) * 2022-05-09 2022-06-28 苏州思萃工业互联网技术研究所有限公司 Method and device for realizing PCDN (Primary Contourlet distribution) resource management

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017113273A1 (en) * 2015-12-31 2017-07-06 华为技术有限公司 Software defined data center and scheduling and traffic-monitoring method for service cluster therein
CN110838932A (en) * 2018-08-17 2020-02-25 阿里巴巴集团控股有限公司 Network current limiting method and device and electronic equipment
CN111314149B (en) * 2020-02-26 2023-07-18 赛特斯信息科技股份有限公司 System for realizing unified monitoring operation and maintenance management based on multiple edge cloud platforms
US20220104127A1 (en) * 2020-09-25 2022-03-31 Samsung Electronics Co., Ltd. Method and apparatus for power management in a wireless communication system
US11941879B2 (en) * 2020-10-22 2024-03-26 Mineral Earth Sciences Llc Edge-based processing of agricultural data
CN112511456B (en) * 2020-12-21 2024-03-22 北京百度网讯科技有限公司 Flow control method, apparatus, device, storage medium, and computer program product
US20220116289A1 (en) * 2021-12-22 2022-04-14 Palaniappan Ramanathan Adaptive cloud autoscaling
CN115297124B (en) * 2022-07-25 2023-08-04 天翼云科技有限公司 System operation and maintenance management method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
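Delay analysis based on edge nodes and capacity in cloud computing ***; Liu Song; Li Wenhui; Computer Applications and Software; Vol. 31 (No. 04); full text *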

Also Published As

Publication number Publication date
CN115297124A (en) 2022-11-04
WO2024021469A1 (en) 2024-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant