CN115314363A - Service recovery method, service deployment method, server, and storage medium - Google Patents


Info

Publication number
CN115314363A
Authority
CN
China
Prior art keywords
cluster
service
container service
recovery
edge
Legal status
Granted
Application number
CN202210163053.XA
Other languages
Chinese (zh)
Other versions
CN115314363B (en)
Inventor
颜金姑
Current Assignee
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN202210163053.XA
Publication of CN115314363A
Application granted
Publication of CN115314363B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The embodiments of the invention relate to the technical field of cloud platforms and disclose a service recovery method comprising the following steps: an edge cluster senses the running state of the container services under it; if a container service is in the autonomous state, the edge cluster triggers a rescheduling flow for that container service within the edge cluster; if rescheduling within the edge cluster fails, the edge cluster feeds back to the central cluster the information that the container service is in the autonomous state, so that the central cluster triggers a rescheduling flow for the container service within the central cluster. The embodiments also disclose a service deployment method, a server, and a storage medium. The service recovery method, service deployment method, server, and storage medium can automatically recover edge container services in the autonomous state, thereby reducing the cost of manual recovery, shortening service outages, and improving service stability and reliability.

Description

Service recovery method, service deployment method, server and storage medium
Technical Field
The present invention relates to the technical field of cloud platforms, and in particular, to a service recovery method, a service deployment method, a server, and a storage medium.
Background
In cloud platform services, network jitter or other network anomalies can cause an edge container service to enter an autonomous state, in which the edge container service manages itself based only on its own local state. The central service then has difficulty tracking the running state of the edge service, so the availability of the edge container service cannot be guaranteed. In this situation, operation and maintenance personnel generally have to detect the running state of the edge service and recover it manually.
However, such conventional recovery makes it difficult to restore the edge service to normal operation in time, and therefore struggles to meet users' requirements for service stability and reliability.
Disclosure of Invention
The embodiments of the invention aim to provide a service recovery method, a service deployment method, a server, and a storage medium that automatically recover edge container services in the autonomous state, reducing the cost of manual recovery, shortening service outages, and improving service stability and reliability.
To achieve the above object, an embodiment of the present invention provides a service recovery method, including: an edge cluster senses the running state of the container services under the edge cluster; if the running state of a container service is the autonomous state, the edge cluster triggers a rescheduling flow for the container service within the edge cluster; and if rescheduling within the edge cluster fails, the edge cluster feeds back to the central cluster the information that the container service is in the autonomous state, so that the central cluster triggers a rescheduling flow for the container service within the central cluster.
To achieve the above object, an embodiment of the present invention further provides a service recovery method applied to a central cluster, the method including: after receiving, from an edge cluster, the information that a container service is in the autonomous state, triggering a rescheduling flow for the container service within the central cluster, where the edge cluster is an edge cluster under the jurisdiction of the central cluster.
To achieve the above object, an embodiment of the present invention further provides a service deployment method applied to a central service, the method including: providing a configuration interface with multiple configuration items, namely a basic information configuration item, a recovery policy configuration item, and a rescheduling policy configuration item; receiving the configured basic information of the container service through the basic information configuration item, receiving the container service's recovery policy within the central cluster and its recovery policy within the edge cluster through the recovery policy configuration item, and receiving the container service's rescheduling policy within the central cluster through the rescheduling policy configuration item; generating orchestration information for the container service from its basic information, its recovery policy within the central cluster, its recovery policy within the edge cluster, and its rescheduling policy within the central cluster; and synchronizing the orchestration information to the central cluster so that the central cluster deploys the container service in the edge cluster according to that orchestration information. After the central service has deployed the container service, the edge cluster executes the service recovery method applied to the edge cluster, and the central cluster executes the service recovery method applied to the central cluster.
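As a rough illustration of the deployment step, the sketch below models the three configuration items and merges them into a single arrangement (orchestration) record. It is a minimal sketch under assumptions: the names `RecoveryPolicy`, `ReschedulePolicy`, `BasicInfo`, and `build_orchestration`, the `BasicInfo` fields, and the record layout are all hypothetical and are not defined in the patent. Only the four recovery policies and the three rescheduling strategies are taken from the description.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class RecoveryPolicy(Enum):
    """Recovery policies named in the description (identifiers are hypothetical)."""
    COMMON = "common"   # edge cluster first, then central cluster
    EDGE = "edge"       # recover within the edge cluster only
    CENTER = "center"   # recover through the central cluster
    NONE = "none"       # no automatic recovery


class ReschedulePolicy(Enum):
    """Target-area screening strategies for central-cluster rescheduling."""
    SAME_CITY_SAME_OPERATOR = "same_city_same_operator"
    SAME_PROVINCE_SAME_OPERATOR = "same_province_same_operator"
    SAME_DISTRICT_SAME_OPERATOR = "same_district_same_operator"


@dataclass
class BasicInfo:
    """Basic information item; these fields are assumed, not from the patent."""
    name: str
    image: str
    replicas: int


def build_orchestration(basic: BasicInfo,
                        center_policy: RecoveryPolicy,
                        edge_policy: RecoveryPolicy,
                        reschedule: ReschedulePolicy) -> dict:
    """Merge the configured items into one record that the central service
    would synchronize to the central cluster for deployment."""
    return {
        "basic": asdict(basic),
        "recovery": {"center": center_policy.value, "edge": edge_policy.value},
        "reschedule": reschedule.value,
    }


spec = build_orchestration(
    BasicInfo(name="web", image="nginx:1.25", replicas=3),
    center_policy=RecoveryPolicy.COMMON,
    edge_policy=RecoveryPolicy.COMMON,
    reschedule=ReschedulePolicy.SAME_CITY_SAME_OPERATOR,
)
```

Keeping the edge-side and central-side recovery policies as separate fields mirrors the description, where each cluster consults only its own policy.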
An embodiment of the present invention further provides a server, including at least one processor and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor that enable it to perform the service recovery method described above or the service deployment method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above service recovery method or the above service deployment method.
Compared with the related art, because the edge cluster senses the running state of the container services under it, an edge container service can be recovered promptly when its running state changes or becomes abnormal. If the edge cluster senses that a container service is in the autonomous state, which indicates that the service is running abnormally, it triggers a rescheduling flow within the edge cluster so that the service can resume normal operation there. If rescheduling within the edge cluster fails, the edge cluster cannot recover the service on its own; it therefore feeds back to the central cluster the information that the service is in the autonomous state, and the central cluster triggers a rescheduling flow within the central cluster to recover it, further ensuring that the service is recovered. This process recovers container services in the autonomous state automatically. Compared with manually recovering abnormal container services, it reduces processing cost, shortens service outages, and improves service stability to a certain extent.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
Fig. 1 is a schematic flowchart of a service recovery method according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a service recovery method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a service recovery method applied to a central cluster according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a service deployment method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an application scenario of a service deployment method according to an embodiment of the present invention;
Fig. 6 is another schematic flowchart of a service deployment method according to an embodiment of the present invention;
Fig. 7 is a diagram of a specific example of a service recovery method and a service deployment method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to help the reader better understand the present invention. The claimed technical solution can nevertheless be implemented without these details, and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to a service recovery method. In this embodiment, the edge cluster senses the running state of the container services under it. If a container service is in the autonomous state, the edge cluster triggers a rescheduling flow for it within the edge cluster, and if that rescheduling fails, feeds back to the central cluster the information that the service is in the autonomous state, so that the central cluster triggers a rescheduling flow for the service within the central cluster.
Implementation details of the service recovery method in this embodiment are described below. These details are provided only to aid understanding and are not required to implement the solution. The specific flow is shown in Fig. 1 and may include the following steps:
Step 101: The edge cluster senses the running state of the container services under the edge cluster.
Step 102: If the running state of a container service is the autonomous state, the edge cluster triggers a rescheduling flow for the container service within the edge cluster.
The autonomous state in this step specifically refers to a state in which an edge node fails to communicate in time with the edge cluster and the central cluster. It can be detected by preset detection software; for example, if an edge node returns no response within a preset time, the edge node can be determined to have entered the autonomous state. If the edge cluster senses that a container service is in the autonomous state, the service is running abnormally, and the edge cluster triggers a rescheduling flow for it within the edge cluster. This flow may specifically include: for the abnormal container service, if any of the remaining edge nodes in the edge cluster to which it belongs is idle, the container service is scheduled onto an idle edge node and the rescheduling completes successfully; if none of the remaining edge nodes is idle, rescheduling within the edge cluster has failed.
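The timeout-based detection described above can be sketched as follows. This is an illustration under assumptions: the heartbeat bookkeeping, the `timeout_s` parameter, the function names, and the `"autonomous"`/`"normal"` labels are hypothetical. The patent says only that preset detection software treats a node with no feedback within a preset time as autonomous.

```python
import time
from typing import Dict, Optional, Set


def autonomous_nodes(last_heartbeat: Dict[str, float],
                     timeout_s: float,
                     now: Optional[float] = None) -> Set[str]:
    """Return the edge nodes that have sent no feedback for longer than timeout_s."""
    now = time.monotonic() if now is None else now
    return {node for node, ts in last_heartbeat.items() if now - ts > timeout_s}


def service_states(placement: Dict[str, str], autonomous: Set[str]) -> Dict[str, str]:
    """Label each container service "autonomous" if its node is autonomous."""
    return {svc: "autonomous" if node in autonomous else "normal"
            for svc, node in placement.items()}


silent = autonomous_nodes({"node-a": 100.0, "node-b": 92.0}, timeout_s=5.0, now=100.0)
states = service_states({"web": "node-a", "cache": "node-b"}, silent)
```

Passing `now` explicitly keeps the check deterministic; in a running detector it would default to the monotonic clock.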
In addition, scheduling the container service onto an idle edge node can be preset to choose, by default, the idle edge node with the most idle capacity. In practice, the rescheduling flow within the edge cluster can be performed by a scheduler.
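A minimal sketch of the in-edge rescheduling rule described above: skip the failed node, keep only idle nodes, pick the one with the most idle capacity, and report failure when none is left. Measuring idle capacity as free CPU (`free_cpu`) is an assumption, as are the `EdgeNode` and `pick_edge_node` names; the patent does not say how a node's idleness is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeNode:
    name: str
    idle: bool        # whether the node is currently in the idle state
    free_cpu: float   # assumed measure of how much idle capacity the node has


def pick_edge_node(nodes: List[EdgeNode], failed_node: str) -> Optional[EdgeNode]:
    """Return the idle node with the most free capacity among the remaining
    nodes, or None to signal that rescheduling within the edge cluster failed."""
    candidates = [n for n in nodes if n.name != failed_node and n.idle]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_cpu)


cluster = [EdgeNode("a", True, 2.0), EdgeNode("b", True, 4.0), EdgeNode("c", False, 8.0)]
target = pick_edge_node(cluster, failed_node="a")
```

Node `c` has the most free CPU but is not idle, so it is never a candidate; returning `None` is the signal that step 103 should run.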
Step 103: If rescheduling within the edge cluster fails, the edge cluster feeds back to the central cluster the information that the container service is in the autonomous state, so that the central cluster triggers a rescheduling flow for the container service within the central cluster.
A rescheduling failure within the edge cluster indicates that the edge cluster cannot recover the container service. The edge cluster therefore feeds back to the central cluster the information that the service is in the autonomous state, and the central cluster, on receiving it, triggers a rescheduling flow within the central cluster to recover the service, further ensuring that the service returns to normal operation.
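Steps 101 to 103 can be read as a small edge-side control routine. The sketch below is hypothetical: `edge_recover` is an illustrative name, `reschedule_in_edge` and `report_to_center` stand in for the edge scheduler and the edge-to-center feedback channel (neither is specified in the patent), and the returned strings are illustrative labels only.

```python
from typing import Callable, Optional


def edge_recover(service: str,
                 state: str,
                 reschedule_in_edge: Callable[[str], Optional[str]],
                 report_to_center: Callable[[str], None]) -> str:
    """Edge-cluster handling of one container service (steps 101-103)."""
    # Step 101: the sensed state decides whether recovery is needed at all.
    if state != "autonomous":
        return "healthy"
    # Step 102: try to reschedule within the edge cluster.
    node = reschedule_in_edge(service)
    if node is not None:
        return f"rescheduled:{node}"
    # Step 103: in-edge rescheduling failed, so escalate to the central cluster.
    report_to_center(service)
    return "escalated"


reports: list = []
result = edge_recover("web", "autonomous", lambda svc: None, reports.append)
```

Injecting the scheduler and the feedback channel as callables keeps the decision logic separate from how a real deployment would talk to its scheduler and central cluster.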
In one example, each container service in the edge cluster is preconfigured with a recovery policy within the edge cluster. In this example, if the container service is in the autonomous state, then before the edge cluster triggers the rescheduling flow within the edge cluster, the method may further include: the edge cluster obtains the recovery policy configured for the container service within the edge cluster and identifies it as common recovery. For a container service whose recovery policy within the edge cluster is common recovery, if the edge cluster fails to reschedule it, the central cluster can automatically reschedule it within the central cluster, ensuring that the service recovers to normal operation.
Besides common recovery, the recovery policy within the edge cluster for each container service may be configured as edge cluster recovery, central cluster recovery, or no recovery, so after obtaining the configured policy the edge cluster may also identify one of these other policies. If it identifies edge cluster recovery, the edge cluster triggers a rescheduling flow within the edge cluster and ends the recovery flow if that rescheduling fails. If it identifies central cluster recovery, the edge cluster feeds back to the central cluster the information that the service is in the autonomous state, so that the central cluster triggers a rescheduling flow within the central cluster. If it identifies no recovery, the edge cluster ends the recovery flow. Preconfiguring different recovery policies within the edge cluster lets different container services be recovered flexibly and in a targeted manner, meeting their differing recovery requirements.
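The four edge-side recovery policies can be expressed as one dispatch function. This is an illustrative sketch only: `EdgeRecoveryPolicy`, `edge_dispatch`, and the callback parameters are hypothetical names, and the callbacks stand in for the edge scheduler and the edge-to-center feedback channel.

```python
from enum import Enum
from typing import Callable, Optional


class EdgeRecoveryPolicy(Enum):
    COMMON = "common"   # try in the edge cluster, then escalate
    EDGE = "edge"       # try in the edge cluster only
    CENTER = "center"   # escalate to the central cluster directly
    NONE = "none"       # no recovery


def edge_dispatch(policy: EdgeRecoveryPolicy,
                  service: str,
                  reschedule_in_edge: Callable[[str], Optional[str]],
                  report_to_center: Callable[[str], None]) -> str:
    """Apply the edge cluster's configured recovery policy to an autonomous service."""
    if policy is EdgeRecoveryPolicy.NONE:
        return "ended"
    if policy is EdgeRecoveryPolicy.CENTER:
        report_to_center(service)
        return "escalated"
    # COMMON and EDGE both start with rescheduling inside the edge cluster.
    node = reschedule_in_edge(service)
    if node is not None:
        return f"rescheduled:{node}"
    if policy is EdgeRecoveryPolicy.COMMON:
        report_to_center(service)
        return "escalated"
    return "ended"  # EDGE: the recovery flow ends after an in-edge failure
```

The only difference between common recovery and edge cluster recovery is the final branch, which is why the two share the in-edge attempt.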
In another example, each container service in the edge cluster may also be preconfigured with a recovery policy within the central cluster. In this example, after the edge cluster feeds back to the central cluster the information that the container service is in the autonomous state, and before the central cluster triggers the rescheduling flow within the central cluster, the method may further include: the central cluster receives this information from the edge cluster, obtains the recovery policy configured for the container service within the central cluster, and identifies it as common recovery or central cluster recovery. If the central cluster identifies either of these policies, it automatically reschedules the container service within the central cluster.
Besides common recovery and central cluster recovery, the recovery policy within the central cluster for each container service may also be configured as edge cluster recovery or no recovery. After obtaining the configured policy, the central cluster may therefore also identify it as edge cluster recovery or no recovery, in which case the central cluster ends the recovery flow. Preconfiguring different recovery policies within the central cluster likewise lets different container services be recovered flexibly and in a targeted manner, meeting their differing recovery requirements.
In the above example, the edge cluster may feed back the autonomous-state information to the central cluster specifically by sending both the information that rescheduling of the container service within the edge cluster has failed and the information that the service is in the autonomous state. If the central cluster identifies the service's recovery policy as common recovery, then before triggering the rescheduling flow within the central cluster, the method further includes: the central cluster receives the information that rescheduling within the edge cluster has failed. Sending the rescheduling-failure information along with the autonomous-state information tells the central cluster whether the edge cluster has already attempted rescheduling, which avoids wasting resources on rescheduling a container service that has already been rescheduled. For a container service whose recovery policy is common recovery, receiving this failure information confirms that the service was not successfully rescheduled within the edge cluster, so the central cluster can then trigger the rescheduling flow within the central cluster.
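The central cluster's decision described above (a check of its own recovery policy, plus the rescheduling-failure flag in the case of common recovery) might look like the following. The `AutonomyReport` message, its field names, and `center_should_reschedule` are hypothetical; the patent states only that the failure information and the autonomous-state information are sent together.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryPolicy(Enum):
    COMMON = "common"
    EDGE = "edge"
    CENTER = "center"
    NONE = "none"


@dataclass(frozen=True)
class AutonomyReport:
    """Feedback sent by the edge cluster (hypothetical message format)."""
    service: str
    autonomous: bool = True
    edge_reschedule_failed: bool = False


def center_should_reschedule(report: AutonomyReport, center_policy: RecoveryPolicy) -> bool:
    """Decide whether the central cluster triggers its own rescheduling flow."""
    if not report.autonomous:
        return False
    if center_policy is RecoveryPolicy.CENTER:
        return True
    if center_policy is RecoveryPolicy.COMMON:
        # Only after the edge cluster has confirmed that its own attempt failed.
        return report.edge_reschedule_failed
    return False  # EDGE or NONE: the central cluster ends the recovery flow
```

Requiring the failure flag only for common recovery reflects the description: under central cluster recovery the edge cluster never attempts rescheduling, so there is no failure to confirm.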
The rescheduling flow within the central cluster described above may include: the central cluster obtains the rescheduling policy configured for the container service within the central cluster and executes the rescheduling flow according to that policy. A rescheduling policy here can be understood as a strategy for screening the target scheduling area of a container service. It can be configured in advance when the service is deployed, and different container services can be given different policies according to their characteristics, to meet the subsequent service requirements of services with different characteristics. Examples include scheduling to the same operator in the same city, the same operator in the same province, and the same operator in the same district. Executing the rescheduling flow according to the policy means that the central cluster screens, from the edge clusters under its jurisdiction, the edge clusters that satisfy the edge container service's rescheduling policy, and schedules the container service to one of them.
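The screening step can be sketched as a filter over the edge clusters under the central cluster's jurisdiction. Several assumptions are made here and none of these names come from the patent: each edge cluster carries operator, city, province, and district labels; a candidate must have spare capacity (`has_capacity`); and the original edge cluster is excluded from the candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeCluster:
    name: str
    operator: str
    city: str
    province: str
    district: str
    has_capacity: bool = True


# Which location attribute each rescheduling strategy must match (hypothetical keys).
SCOPE = {
    "same_city_same_operator": "city",
    "same_province_same_operator": "province",
    "same_district_same_operator": "district",
}


def screen_clusters(clusters: List[EdgeCluster],
                    origin: EdgeCluster,
                    policy: str) -> List[EdgeCluster]:
    """Return the edge clusters that satisfy the service's rescheduling policy."""
    attr = SCOPE[policy]
    return [c for c in clusters
            if c.name != origin.name
            and c.has_capacity
            and c.operator == origin.operator
            and getattr(c, attr) == getattr(origin, attr)]
```

Each strategy fixes the operator and widens only the geographic scope, so a district-level policy always admits at least the candidates of the corresponding city-level policy.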
In addition, after the central cluster triggers the rescheduling flow within the central cluster, if the edge cluster senses that the running state of the container service has recovered from the autonomous state to the normal state, the edge cluster can feed back to the central cluster the information that the service has been recovered successfully, and the central cluster then ends the recovery flow. In this way the central cluster learns the current state of the service and avoids wasting resources on further recovery.
Furthermore, the edge cluster may be unable to feed back the state of the container service because of a network failure or a similar problem, which would affect the service's normal operation. To handle this, if within a preset time the central cluster receives from the edge cluster neither the information that the container service is in the autonomous state nor the information that it is in the normal state, the central cluster may trigger a rescheduling flow within the central cluster for the container service under that edge cluster. Receiving no feedback within the preset time indicates that the edge cluster cannot report its state, possibly because of a network failure. In this case, once the preset time has elapsed, the central cluster reschedules the container service to an edge cluster that meets the service requirements, ensuring the overall availability of the edge container service.
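The central cluster's handling of edge feedback described above (ending the flow on successful recovery, rescheduling after a silent preset period) can be sketched as a small per-service tracker. The `CenterWatchdog` class, its method names, the state labels, and the explicit `now` timestamps are hypothetical choices made for testability; the patent does not prescribe a data structure.

```python
from typing import Dict, Tuple


class CenterWatchdog:
    """Tracks the latest edge feedback for each watched container service."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # the "preset time"
        self._last: Dict[str, Tuple[float, str]] = {}

    def watch(self, service: str, now: float) -> None:
        # Start the silence timer; no feedback has been received yet.
        self._last[service] = (now, "unknown")

    def on_feedback(self, service: str, state: str, now: float) -> None:
        # state: "autonomous", "normal" or "recovered" (hypothetical labels).
        self._last[service] = (now, state)

    def decide(self, service: str, now: float) -> str:
        ts, state = self._last[service]
        if state in ("normal", "recovered"):
            return "end"         # service is healthy again: stop recovering
        if state == "autonomous":
            return "reschedule"  # still subject to the configured recovery policy
        if now - ts > self.timeout_s:
            return "reschedule"  # edge cluster silent past the preset time
        return "wait"
```

Only a service with no feedback at all reaches the timeout check; any explicit report from the edge cluster takes precedence over the silence rule.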
It should be noted that after an edge container service in the autonomous state completes rescheduling within the edge cluster and returns to normal operation, the expected number of instances of the container service can also be restored, so that the edge container services deployed by the edge cluster remain available.
For a better understanding of the service recovery method provided in this embodiment, refer to the flow diagram in Fig. 2. In that flow, the edge cluster and the central cluster exchange information such as the state of the container service and jointly carry out service recovery, enabling automatic recovery of container services in the autonomous state.
Compared with the related art, the edge cluster in this embodiment senses the running state of the container services under it, so an edge container service can be recovered promptly when its running state changes or becomes abnormal. If a container service is sensed to be in the autonomous state, which indicates abnormal operation, the edge cluster first triggers a rescheduling flow within the edge cluster so that the service can resume normal operation there. If that rescheduling fails, the edge cluster cannot recover the service itself, so it feeds back the autonomous-state information to the central cluster, which triggers a rescheduling flow within the central cluster to recover the service. The whole process recovers container services in the autonomous state automatically. Compared with manual recovery of abnormal container services, it saves processing cost, shortens service outages, and effectively improves service stability.
Another embodiment of the present invention relates to a service recovery method applied to a central cluster. In this embodiment, after receiving from an edge cluster the information that a container service is in the autonomous state, the central cluster triggers a rescheduling flow for the container service within the central cluster, where the edge cluster is an edge cluster under the jurisdiction of the central cluster.
Implementation details of the service recovery method in this embodiment are described below. These details are provided only to aid understanding and are not required to implement the solution. The specific flow is shown in Fig. 3 and may include the following steps:
Step 201: Receive, from an edge cluster, the information that a container service is in the autonomous state, where the edge cluster is an edge cluster under the jurisdiction of the central cluster.
In this step, on receiving this information, the central cluster learns that a container service within its jurisdiction is running abnormally and can perform the corresponding recovery processing within the central cluster.
Step 202: Trigger a rescheduling flow for the container service within the central cluster.
In this step, having learned that a container service within its jurisdiction is running abnormally, the central cluster triggers a rescheduling flow for the service within the central cluster to recover it.
In one example, each container service in the edge cluster is preconfigured with a recovery policy within the central cluster. In this example, after receiving the autonomous-state information from the edge cluster and before triggering the rescheduling flow within the central cluster, the central cluster may further obtain the recovery policy configured for the container service within the central cluster and identify it as common recovery or central cluster recovery.
Besides common recovery and central cluster recovery, the recovery policy within the central cluster for each container service may be configured as edge cluster recovery or no recovery, so after obtaining the configured policy the central cluster may also identify one of these two policies. If it does, the recovery flow ends. Preconfiguring different recovery policies within the central cluster lets different container services be recovered flexibly and in a targeted manner, meeting their differing recovery requirements.
To prevent the central cluster from wasting its processing resources by repeatedly recovering a container service whose recovery policy is common recovery, if the policy is identified as common recovery, then before triggering the rescheduling flow within the central cluster the central cluster may further receive the information that rescheduling of the container service within the edge cluster has failed. This confirms that the commonly recovered service was not successfully rescheduled within the edge cluster, so the central cluster can then trigger the rescheduling flow within the central cluster.
The process of rescheduling the container service in the central cluster related to the foregoing process may specifically include: and acquiring a rescheduling strategy of the container service configured in the central cluster, and executing a rescheduling process of the container service in the central cluster according to the rescheduling strategy. The rescheduling strategy referred to herein may also be understood as a screening strategy for a target scheduling area of a container service. The rescheduling strategy can be configured for the container service in advance during service deployment, and different rescheduling strategies can be provided for different container services according to the service characteristics of each container service during configuration so as to meet the requirements of container services with different characteristics on subsequent service. The method specifically comprises the rescheduling strategies of scheduling with the same operator in the same city, scheduling with the same operator in the same province, scheduling with the same operator in the same district and the like.
Executing the rescheduling process in the central cluster according to the rescheduling policy means screening, from the edge clusters administered by the central cluster, those that satisfy the rescheduling policy of the edge container service, and scheduling the container service to an edge cluster that satisfies it.
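The screening step can be illustrated with a short filter over candidate edge clusters. The sketch assumes each cluster is described by its operator, city, province, and region, plus a hypothetical spare-capacity score used for ordering; excluding the originating cluster and ranking by capacity are illustrative choices, not details stated in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeCluster:
    name: str
    operator: str       # network operator (carrier) serving the cluster
    city: str
    province: str
    region: str         # larger geographic region
    free_capacity: int  # hypothetical spare-capacity score


# Location attribute that must match, in addition to the operator.
STRATEGY_FIELD = {
    "same_city_same_operator": "city",
    "same_province_same_operator": "province",
    "same_region_same_operator": "region",
}


def select_target_clusters(origin: EdgeCluster,
                           candidates: List[EdgeCluster],
                           strategy: str) -> List[EdgeCluster]:
    """Return candidate edge clusters satisfying the rescheduling strategy,
    most spare capacity first."""
    field_name = STRATEGY_FIELD[strategy]
    matches = [
        c for c in candidates
        if c.name != origin.name  # assumed: skip the cluster being recovered
        and c.operator == origin.operator
        and getattr(c, field_name) == getattr(origin, field_name)
    ]
    return sorted(matches, key=lambda c: c.free_capacity, reverse=True)
```

With a same-province policy, a same-operator cluster in another city of the same province qualifies, while a cluster of a different operator in the same city does not.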
In one example, after triggering the process of rescheduling the container service in the central cluster, the central cluster may further end the recovery process upon receiving information, fed back by the edge cluster, that the container service has been successfully recovered. The edge cluster sends this information after it has reported to the central cluster that the container service is in the autonomous state and then perceives that the running state of the container service has returned from the autonomous state to the normal state. In this way, the central cluster learns promptly that the container service is running normally again and can immediately end the recovery process of the edge container service, avoiding the resource waste of performing further recovery processing on it.
In another example, if within a preset time the central cluster receives from the edge cluster neither information that the container service is in the autonomous state nor information that it is in the normal state, the central cluster triggers the rescheduling process, in the central cluster, of the container service under that edge cluster. Receiving no feedback within the preset time indicates that the edge cluster cannot report state, for example because of a network failure. In this case, after the preset time elapses, the central cluster triggers rescheduling of the container service under the edge cluster and schedules it to an edge cluster that meets the service requirements, ensuring the overall availability of the edge container service.
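The two examples above amount to simple bookkeeping on the central cluster: a success report ends recovery, and silence beyond the preset time triggers rescheduling. A minimal sketch, assuming that each container service is tracked from its first report and that reports carry one of the hypothetical states "autonomous", "normal", or "recovered"; the class and method names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CentralRecoveryTracker:
    """Tracks edge-cluster feedback per container service (hypothetical)."""
    timeout: float  # the "preset time", in seconds
    last_report: Dict[str, float] = field(default_factory=dict)
    in_recovery: Set[str] = field(default_factory=set)

    def on_report(self, service: str, state: str, now: float) -> None:
        """Record feedback: "autonomous", "normal", or "recovered"."""
        self.last_report[service] = now
        if state == "autonomous":
            self.in_recovery.add(service)      # central rescheduling triggered
        elif state == "recovered":
            self.in_recovery.discard(service)  # end the recovery flow

    def overdue(self, now: float) -> List[str]:
        """Services with no feedback within the timeout; the central cluster
        would trigger rescheduling for these."""
        return sorted(s for s, t in self.last_report.items()
                      if now - t > self.timeout)
```

A periodic job on the central cluster could call `overdue(now)` and start rescheduling for each returned service.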
It should be understood that this embodiment is the central-cluster-side embodiment corresponding to the foregoing method embodiment and can be implemented in cooperation with it. The technical details mentioned in the foregoing method embodiments remain valid in this embodiment and, to avoid repetition, are not described again here.
In this embodiment, the central cluster receives the information, fed back by an edge cluster under its jurisdiction, that the container service is in the autonomous state, and thereby learns that a container service in its jurisdiction is running abnormally. The central cluster then triggers the process of rescheduling the container service in the central cluster, that is, the container service is recovered through the central cluster. In this process, the central cluster automatically recovers container services that are in the autonomous state in the edge cluster. Compared with manually recovering container services in an abnormal state, this can effectively reduce processing cost, shorten service outage time, and improve service stability to a certain extent.
Another embodiment of the invention relates to a service deployment method applied to a central service. In this embodiment, the central service provides a configuration interface comprising a plurality of configuration items, namely a basic information configuration item, a recovery policy configuration item, and a rescheduling policy configuration item. The central service receives the configured basic information of the container service through the basic information configuration item, receives the recovery policy of the container service in the central cluster and its recovery policy in the edge cluster through the recovery policy configuration item, and receives the rescheduling policy of the container service in the central cluster through the rescheduling policy configuration item. It then generates the arrangement (orchestration) information of the container service from the basic information, the recovery policy in the central cluster, the recovery policy in the edge cluster, and the rescheduling policy in the central cluster, and synchronizes this arrangement information to the central cluster so that the central cluster can complete the deployment of the container service in the edge cluster accordingly. After the central service has deployed the container service, the edge cluster executes the service recovery method applied to the edge cluster, and the central cluster executes the service recovery method applied to the central cluster.
The implementation details of the service deployment method in this embodiment are described below. They are provided only to aid understanding and are not required for implementing the solution. The process, shown in fig. 4, may include the following steps:
Step 301, providing a configuration interface comprising a plurality of configuration items; the configuration items comprise a basic information configuration item, a recovery policy configuration item, and a rescheduling policy configuration item.
In this step, the central service provides a configuration interface including a plurality of configuration items, so that a user can configure basic information of each container service, a recovery policy in the central cluster, a recovery policy in the edge cluster, and a rescheduling policy in the central cluster through the configuration interface.
Step 302, receiving the configured basic information of the container service through the basic information configuration item, receiving the recovery strategy of the container service in the central cluster and the recovery strategy in the edge cluster through the recovery strategy configuration item, and receiving the rescheduling strategy of the container service in the central cluster through the rescheduling strategy configuration item.
The basic information of the container service in this step may specifically include the container specification, the image, the deployment cluster, and the like. The rescheduling policy may specifically include a default scheduling policy that preferentially selects the cluster with the most redundancy (spare capacity), a same-city same-operator scheduling policy, a same-province same-operator scheduling policy, a same-region same-operator scheduling policy, and the like.
The recovery policy of the container service within the central cluster and its recovery policy within the edge cluster may each be configured as one of common recovery, central cluster recovery, edge cluster recovery, and no recovery.
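The configuration items of step 302 and the four-valued recovery policy can be modeled as a validated record. The field names and policy identifier strings below are hypothetical; the patent names the policies but does not define how they are encoded.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical identifiers for the policies named in the text.
RECOVERY_POLICIES = {"common", "central_cluster", "edge_cluster", "none"}
RESCHEDULING_POLICIES = {
    "default_most_redundant",
    "same_city_same_operator",
    "same_province_same_operator",
    "same_region_same_operator",
}


@dataclass(frozen=True)
class ServiceConfig:
    # basic information configuration item
    container_spec: str
    image: str
    deploy_cluster: str
    # recovery policy configuration item
    central_recovery: str
    edge_recovery: str
    # rescheduling policy configuration item
    rescheduling: str = "default_most_redundant"


def parse_config(form: Dict[str, str]) -> ServiceConfig:
    """Build and validate a ServiceConfig from submitted form fields."""
    cfg = ServiceConfig(**form)  # unknown field names raise TypeError
    for name in ("central_recovery", "edge_recovery"):
        value = getattr(cfg, name)
        if value not in RECOVERY_POLICIES:
            raise ValueError(f"unknown {name}: {value!r}")
    if cfg.rescheduling not in RESCHEDULING_POLICIES:
        raise ValueError(f"unknown rescheduling policy: {cfg.rescheduling!r}")
    return cfg
```

Rejecting unknown values at configuration time keeps the later recovery logic from having to handle undefined policies.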
Step 303, generating the arrangement information of the container service according to the basic information of the container service, the recovery strategy of the container service in the central cluster, the recovery strategy of the container service in the edge cluster and the rescheduling strategy of the container service in the central cluster.
Step 304, synchronizing the arrangement information of the container service to the central cluster so that the central cluster can complete the deployment of the container service in the edge cluster according to the arrangement information of the container service.
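Steps 303 and 304 can be sketched by folding the configuration into a Kubernetes-style Deployment manifest and serializing it for the central cluster. This is an assumption-laden illustration: the patent does not specify the format of the arrangement information, and the annotation prefix `recovery.example.com/`, the function names, and the JSON serialization standing in for synchronization are all hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical annotation prefix; the patent does not say how policies are stored.
PREFIX = "recovery.example.com/"


def build_arrangement_info(name: str, basic: Dict[str, Any],
                           central_recovery: str, edge_recovery: str,
                           rescheduling: str, replicas: int = 1) -> Dict[str, Any]:
    """Step 303 sketch: combine basic information and policies into a manifest."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": name,
            "annotations": {
                PREFIX + "central-recovery-policy": central_recovery,
                PREFIX + "edge-recovery-policy": edge_recovery,
                PREFIX + "rescheduling-policy": rescheduling,
                PREFIX + "deploy-cluster": basic["deploy_cluster"],
            },
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{
                    "name": name,
                    "image": basic["image"],
                    "resources": {"limits": basic["limits"]},
                }]},
            },
        },
    }


def sync_to_central(manifest: Dict[str, Any]) -> str:
    """Step 304 stand-in: serialize the manifest. A real system would submit
    it to the central cluster's API server instead."""
    return json.dumps(manifest, sort_keys=True)
```

Keeping the policies alongside the workload definition means both clusters can read them from the same stored object when recovery is later needed.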
In actual implementation, the service deployment method provided by this embodiment may adopt the following deployment manner; an exemplary scenario is shown in fig. 5. The recovery policies of a container service within the central cluster and within the edge cluster can be provided through the console. In addition to recovery policies, the console can provide basic configuration, deployment, and operation of services. The central cluster manages each edge cluster, and all service management operations are issued through the central cluster. A schematic of the deployment flow is shown in fig. 6.
Communication between the edge cluster and the edge nodes (EdgeNode) uses the KubeEdge architecture: CloudCore is deployed in the edge cluster, and EdgeCore is deployed on each edge node. When a service is added, deleted, or modified through the console, the central service generates the arrangement information and distributes and synchronizes it to the central cluster API Server. On receiving the request, the central cluster API Server stores the arrangement information in the central cluster's etcd; the central cluster Controller creates or deletes Pods according to the expected number of service replicas; and the central cluster Scheduler, after watching the event of an unscheduled service, schedules the service to an edge cluster API Server. The edge cluster API Server then stores the arrangement information in the edge cluster's etcd; the edge cluster Controller creates or deletes Pods according to the expected replicas; the edge cluster Scheduler, after watching the event of an unscheduled service, schedules the service to an edge node and stores the scheduling result in etcd; CloudCore, after watching this event, sends a message to EdgeCore on the edge node; and EdgeCore completes the service-related request after receiving the message. Here, etcd is the primary data store of Kubernetes, which is the de facto standard system for container orchestration.
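The controller and scheduler roles in this two-level flow can be mimicked with a toy in-memory model, in which a dictionary stands in for each cluster's etcd. This sketch does not use the Kubernetes or KubeEdge APIs; the `Cluster` class and its least-loaded placement rule are hypothetical simplifications, and the message delivery from CloudCore to EdgeCore is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cluster:
    """Toy model of one cluster (central or edge)."""
    name: str
    targets: List[str]  # edge clusters (for the central) or edge nodes (for an edge)
    etcd: Dict[str, Optional[str]] = field(default_factory=dict)  # pod -> target
    _seq: int = 0

    def reconcile(self, service: str, desired: int) -> None:
        """Controller: create or delete pods to match the desired replicas."""
        pods = [p for p in self.etcd if p.startswith(service + "-")]
        while len(pods) < desired:
            pod = f"{service}-{self._seq}"
            self._seq += 1
            self.etcd[pod] = None  # stored as unscheduled
            pods.append(pod)
        for pod in pods[desired:]:
            del self.etcd[pod]

    def schedule(self) -> Dict[str, str]:
        """Scheduler: bind each unscheduled pod to the least-loaded target."""
        bound: Dict[str, str] = {}
        unscheduled = [p for p, t in self.etcd.items() if t is None]
        for pod in unscheduled:
            load = {t: list(self.etcd.values()).count(t) for t in self.targets}
            choice = min(self.targets, key=load.__getitem__)
            self.etcd[pod] = choice
            bound[pod] = choice
        return bound
```

Usage: the central cluster places replicas on edge clusters, and each edge cluster then places its share on edge nodes.

```python
central = Cluster("central", ["edge-hz", "edge-nb"])
central.reconcile("capurling", 2)
print(central.schedule())  # {'capurling-0': 'edge-hz', 'capurling-1': 'edge-nb'}
edge = Cluster("edge-hz", ["node-1", "node-2"])
edge.reconcile("capurling", 1)
print(edge.schedule())     # {'capurling-0': 'node-1'}
```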
After the central service deploys the container service according to this deployment method, the edge cluster executes the service recovery method applied to the edge cluster, and the central cluster executes the service recovery method applied to the central cluster, both as described in the foregoing embodiments. The technical details of those methods remain valid in this embodiment and, to avoid repetition, are not described again.
In this embodiment, the central service provides a configuration interface including a basic information configuration item, a recovery policy configuration item, and a rescheduling policy configuration item, so that a user can configure the relevant information for each container service. Through these items the central service receives the basic information of the container service, its recovery policy in the central cluster, its recovery policy in the edge cluster, and its rescheduling policy in the central cluster, so that different policies can be configured according to the characteristics of each edge service and subsequent recovery processing meets its service requirements. The central service generates the arrangement information of the container service from this information and synchronizes it to the central cluster, which completes the deployment of the container service in the edge cluster accordingly. After deployment, when a container service runs abnormally, the central cluster and the edge cluster can apply different automatic recovery processing to it according to its characteristics. This enables more precise recovery for different container services and meets the service requirements of each.
Compared with manual recovery of container services in an abnormal state, the method can effectively reduce processing cost, shorten service outage time, and improve service stability to a certain extent.
To better illustrate the service recovery method and the service deployment method provided by the above embodiments of the present invention, a specific example is described below with reference to fig. 7. Fig. 7 illustrates the recovery of the edge snapshot task executor during autonomy. An edge snapshot can be understood as an edge dial-test (probing) task: a resident edge task executor must be deployed on the edge nodes so that a client's dial-test tasks can be executed quickly at the edge and their results fed back.
First, because task execution must cover a very wide area, an architecture in which the edge devices are brought under management through KubeEdge is required, which introduces the problem of edge autonomy. Second, while the service is autonomous, the edge snapshot service must be restored quickly so that tasks can still be pulled, executed, and fed back. Third, because dial-test tasks have regional requirements, the service must be restored in the same region. The following describes how the recovery policies keep the snapshot task executor on the edge nodes serving normally, thereby ensuring the success rate of snapshot tasks:
1. The basic information, recovery policy, and rescheduling policy of the edge task executor capurlabel service are configured in advance.
2. The client's origin server initiates a snapshot request. After receiving the request, the central service salmon completes authentication and hands the request to the central pontus, which deploys the edge task executor capurling agent.
3. The central pontus polls the regional requirements of the unprocessed tasks in the task pool and determines whether an edge task executor capurling agent is deployed for each required region; if a required region has no deployed edge task executor, it initiates a deployment request to the central cluster.
4. The central cluster completes deployment of the edge task executor, capurling, according to the flow of fig. 3.
5. The edge task executor capurling periodically pulls tasks, executes them, and feeds back the results, ensuring that tasks are executed and reported.
6. When the edge task executor enters the autonomous state, the edge cluster and the central cluster reschedule it and redeploy it to other clusters according to the flow of fig. 5. During rescheduling, the edge task executor is preferentially placed on edge nodes in the corresponding region, so that it can continue to pull, execute, and feed back tasks, ensuring the task execution success rate.
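Step 3 is essentially a set difference between the regions required by pending tasks and the regions that already have an executor. A minimal sketch, assuming each task is a dictionary with a hypothetical `region` key; the function name is illustrative and not part of the described system.

```python
from typing import Dict, Iterable, List, Set


def regions_needing_executor(pending_tasks: Iterable[Dict[str, str]],
                             deployed_regions: Set[str]) -> List[str]:
    """Return regions required by unprocessed tasks that have no deployed
    edge task executor yet, in first-seen order and without duplicates."""
    needed: List[str] = []
    seen: Set[str] = set()
    for task in pending_tasks:
        region = task["region"]
        if region in deployed_regions or region in seen:
            continue
        seen.add(region)
        needed.append(region)
    return needed
```

On each polling cycle, the central pontus would then send one deployment request per returned region to the central cluster.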
The division of the steps of the above methods is only for clarity of description. In implementation, steps may be combined into one step or split into multiple steps; as long as they contain the same logical relationship, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes, without changing their core design also falls within the protection scope of this patent.
Another embodiment of the invention relates to a server, as shown in FIG. 8, comprising at least one processor 401 and a memory 402 communicatively coupled to the at least one processor 401. The memory 402 stores instructions executable by the at least one processor 401, and the instructions, when executed by the at least one processor 401, enable the at least one processor 401 to execute the above service recovery method.
The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges linking together the various circuits of the one or more processors 401 and the memory 402. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 401.
The processor 401 is responsible for managing the bus and for general processing, and may provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 402 may be used to store data used by the processor 401 when performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above method embodiments.
Those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments are provided to enable a person skilled in the art to implement and use the present invention. A person skilled in the art may make various modifications or changes to the above embodiments without departing from the inventive idea of the present invention; therefore, the scope of protection of the present invention is not limited by the above embodiments, but should be accorded the widest scope consistent with the inventive features set forth in the claims.

Claims (14)

1. A method for service recovery, the method comprising:
the edge cluster senses the running state of the container service under the edge cluster;
if the running state of the container service is an autonomous state, triggering a rescheduling process of the container service in the edge cluster by the edge cluster;
after the rescheduling in the edge cluster fails, the edge cluster feeds back the information that the container service is in the autonomous state to a central cluster, so that the central cluster triggers a rescheduling process of the container service in the central cluster.
2. The service recovery method according to claim 1, wherein, if the running state of the container service is the autonomous state, before the edge cluster triggers the flow of rescheduling the container service in the edge cluster, the method further comprises:
the edge cluster acquires a recovery strategy of the container service configured in the edge cluster, and identifies that the recovery strategy of the container service is common recovery.
3. The service recovery method according to claim 2, further comprising, after the edge cluster obtains the recovery policy of the container service configured in the edge cluster:
if the edge cluster identifies that the recovery strategy of the container service is recovery in the edge cluster, triggering a flow of rescheduling the container service in the edge cluster by the edge cluster, and ending the recovery flow after the rescheduling in the edge cluster fails;
if the edge cluster identifies that the recovery strategy of the container service is central cluster recovery, the edge cluster feeds back information that the container service is in the autonomous state to the central cluster, so that the central cluster triggers a rescheduling process of the container service in the central cluster;
and if the edge cluster identifies that the recovery strategy of the container service is no recovery, the edge cluster ends the recovery process.
4. The service recovery method according to claim 3, wherein after the edge cluster feeds back the information that the container service is in the autonomous state to the central cluster, and before the central cluster triggers the flow of rescheduling the container service in the central cluster, the method further comprises:
the central cluster receives the information that the container service fed back by the edge cluster is in an autonomous state;
the central cluster acquires a recovery strategy of the container service configured in the central cluster, and identifies that the recovery strategy of the container service is common recovery or central cluster recovery.
5. The service recovery method according to claim 4, wherein the edge cluster feeds back information that the container service is in an autonomous state to the central cluster, specifically:
the edge cluster sends the information that the rescheduling of the container service in the edge cluster fails and the information that the container service is in an autonomous state to the central cluster;
if the central cluster identifies that the recovery policy of the container service is common recovery, before triggering the flow of rescheduling the container service in the central cluster, the method further comprises:
the central cluster receives the information that rescheduling of the container service in the edge cluster has failed.
6. The service recovery method according to claim 4, further comprising, after the central cluster acquires the recovery policy of the container service configured in the central cluster:
if the central cluster identifies that the recovery strategy of the container service is edge cluster recovery or no recovery, the central cluster ends the recovery process.
7. The service recovery method according to any one of claims 1 to 6, wherein the process of rescheduling the container service in the central cluster comprises:
the central cluster acquires a rescheduling strategy of the container service configured in the central cluster, and executes the rescheduling process of the container service in the central cluster according to the rescheduling strategy.
8. The service recovery method according to any one of claims 1 to 6, wherein the method further comprises:
if, within a preset time, the central cluster receives neither the information fed back by the edge cluster that the container service is in the autonomous state nor information that the container service is in the normal state, the central cluster triggers a rescheduling process, in the central cluster, of the container service under the edge cluster.
9. The service recovery method according to any one of claims 1 to 6, further comprising, after the central cluster triggers the flow of rescheduling the container service within the central cluster:
if the edge cluster senses that the running state of the container service has recovered from the autonomous state to the normal state, the edge cluster feeds back information that the container service has been successfully recovered to the central cluster;
the central cluster ends the recovery process.
10. A service recovery method applied to a central cluster, the method comprising:
after receiving the information that the container service fed back by the edge cluster is in an autonomous state, triggering a rescheduling process of the container service in the central cluster;
wherein the edge cluster is an edge cluster in the jurisdiction of the central cluster.
11. A service deployment method applied to a central service, the method comprising:
providing a configuration interface comprising a plurality of configuration items; the configuration items comprise basic information configuration items, recovery strategy configuration items and rescheduling strategy configuration items;
receiving basic information of the configured container service through the basic information configuration item, receiving a recovery strategy of the container service in a central cluster and a recovery strategy in an edge cluster through the recovery strategy configuration item, and receiving a rescheduling strategy of the container service in the central cluster through the rescheduling strategy configuration item;
generating the arrangement information of the container service according to the basic information of the container service, the recovery strategy of the container service in a central cluster, the recovery strategy of the container service in an edge cluster and the rescheduling strategy of the container service in the central cluster;
synchronizing the arrangement information of the container service to the central cluster so that the central cluster completes the deployment of the container service in the edge cluster according to the arrangement information of the container service;
wherein, after the central service deploys the container service, the edge cluster is configured to perform the service recovery method according to any one of claims 1 to 9, and the central cluster is configured to perform the service recovery method according to any one of claims 1 to 10.
12. The service deployment method of claim 11,
the recovery policy of the container service in the central cluster and the recovery policy of the container service in the edge cluster are each configured as one of common recovery, central cluster recovery, edge cluster recovery, and no recovery.
13. A server, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the service recovery method according to any one of claims 1 to 10, or the service deployment method according to claim 11 or 12.
14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the service recovery method according to any one of claims 1 to 10, or the service deployment method according to claim 11 or 12.
CN202210163053.XA 2022-02-22 2022-02-22 Service recovery method, service deployment method, server and storage medium Active CN115314363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210163053.XA CN115314363B (en) 2022-02-22 2022-02-22 Service recovery method, service deployment method, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210163053.XA CN115314363B (en) 2022-02-22 2022-02-22 Service recovery method, service deployment method, server and storage medium

Publications (2)

Publication Number Publication Date
CN115314363A CN115314363A (en) 2022-11-08
CN115314363B CN115314363B (en) 2024-04-12

Family

ID=83855647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210163053.XA Active CN115314363B (en) 2022-02-22 2022-02-22 Service recovery method, service deployment method, server and storage medium

Country Status (1)

Country Link
CN (1) CN115314363B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491343A (en) * 2017-09-08 2017-12-19 中国电子科技集团公司第二十八研究所 A kind of across cluster resource scheduling system based on cloud computing
CN107943555A (en) * 2017-10-17 2018-04-20 华南理工大学 Big data storage and processing platform and processing method under a kind of cloud computing environment
WO2020023100A1 (en) * 2018-07-23 2020-01-30 Pure Storage, Inc. Non-disruptive conversion of a clustered service from single-chassis to multi-chassis
CN111262906A (en) * 2020-01-08 2020-06-09 中山大学 Method for unloading mobile user terminal task under distributed edge computing service system
CN112448858A (en) * 2021-02-01 2021-03-05 腾讯科技(深圳)有限公司 Network communication control method and device, electronic equipment and readable storage medium
CN113301102A (en) * 2021-02-03 2021-08-24 阿里巴巴集团控股有限公司 Resource scheduling method, device, edge cloud network, program product and storage medium
CN113296903A (en) * 2021-02-01 2021-08-24 阿里巴巴集团控股有限公司 Edge cloud system, edge control method, control node and storage medium
US20210392042A1 (en) * 2020-04-02 2021-12-16 Vmware, Inc. Dynamic configuration of a cluster network in a virtualized computing system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
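Le Guangxue; Dai Yasheng; Yang Xiaohui; ***; You Zhenxu; Zhu Youkang: "Modeling of trusted collaborative service strategies for edge computing", Journal of Computer Research and Development, no. 05, 15 May 2020 (2020-05-15) *
Chu Yiqun: "A grid task scheduling framework based on prediction and incentive mechanisms", Computer Applications and Software, no. 10, 15 October 2008 (2008-10-15) *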

Also Published As

Publication number Publication date
CN115314363B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN111290834B (en) Method, device and equipment for realizing high service availability based on cloud management platform
US6859889B2 (en) Backup system and method for distributed systems
EP1153346B1 (en) Server system and method for operating the same
EP3253028B1 (en) Method for managing instance node and management device
CA1169155A (en) Computer or processor control systems
CN1312922A (en) Fault tolerant computer system
CN105468450A (en) Task scheduling method and system
JPH10240470A (en) Information processor, network print system, its controlling method and storing medium storing program
CN112910937B (en) Object scheduling method and device in container cluster, server and container cluster
US20090013209A1 (en) Apparatus for connection management and the method therefor
CN108268305A (en) For the system and method for virtual machine scalable appearance automatically
WO2022242148A1 (en) Ota differential upgrade method and system for master-slave architecture
CN107544867B (en) Method, device and system for recovering intelligent network service
CN113391902B (en) Task scheduling method and device and storage medium
CN115314363B (en) Service recovery method, service deployment method, server and storage medium
CN110569115B (en) Multi-point deployment process management method and process competing method
CN111010313A (en) Batch processing state monitoring method, server and storage medium
CN112231601B (en) Link management method, device, equipment and computer storage medium
CN109634787B (en) Distributed file system monitor switching method, device, equipment and storage medium
CN113225576B (en) Service migration system and method based on live broadcast platform edge computing scene
US9652342B2 (en) Redundancy processing method and system, and information processing apparatus thereof
CN113645099B (en) High availability monitoring method, device, equipment and storage medium
CN112214323B (en) Resource recovery method and device and computer readable storage medium
CN115001956B (en) Method, device, equipment and storage medium for running server cluster
US20230367632A1 (en) Job management system and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant