CN115904621A - Super-fusion system host maintenance method and device - Google Patents

Info

Publication number: CN115904621A
Authority: CN (China)
Prior art keywords: maintenance; host; virtual machine; target host; component
Legal status: Granted
Application number: CN202211439186.1A
Other languages: Chinese (zh)
Other versions: CN115904621B (en)
Inventors: 周依然, 徐文豪, 张凯, 王弘毅
Current assignee: Beijing Zhiling Haina Technology Co ltd
Original assignee: SmartX Inc
Application filed by SmartX Inc
Publication of CN115904621A
Application granted
Publication of CN115904621B
Current legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a host maintenance method for a hyper-converged (super-fusion) system, comprising the following steps: pre-checking, for a target host, check items covering the cluster operation and maintenance component, the compute component and the storage component; when the pre-check passes and an enter-maintenance instruction is received from the user, performing the pre-check again; after the second pre-check passes, setting the target host to a non-schedulable mode and to a storage maintenance mode, and, when preset conditions are met, migrating the virtual machines off the target host so that the host can be maintained; and after host maintenance is finished and the exit-maintenance check passes, migrating the target host's original virtual machines back to it. The method and device reduce operational interaction, reduce the amount of data recovery generated while the target host is under maintenance, and migrate the virtual machines on the target host automatically.

Description

Super-fusion system host maintenance method and device
Technical Field
The invention belongs to the technical field of network bandwidth management, and in particular relates to a host maintenance method, device, equipment and storage medium for a hyper-converged (super-fusion) system.
Background
A hyper-converged infrastructure is a unified, software-defined system: a technical architecture that integrates compute, network and storage resources as infrastructure, which can be selected, combined and defined according to the requirements of a specific service system, so that a data center can be built and a service system deployed conveniently and quickly. In a hyper-converged architecture every node is simultaneously a compute node, a network node and a storage node. In planned maintenance scenarios, such as firmware upgrades, kernel upgrades or hardware replacement, the host must be powered off and taken offline. Because the system is hyper-converged, taking a node offline reduces both the cluster's compute resources and its storage resources. On the compute side, the user must ensure that the cluster's remaining compute resources can keep running the virtual machines currently on the target node; otherwise the continuity of the user's services is affected. On the storage side, the user must ensure that the cluster's remaining storage resources are sufficient to recover the data held on the target node; otherwise some data in the cluster will have fewer replicas than expected, which affects system stability.
In the existing approach, the user must perform the following steps to maintain a host:
1. use the console to determine whether the target host meets the requirements for going offline, and check the running state of key components;
2. live-migrate the running virtual machines on the target host to other hosts in the cluster so that user services are not affected;
3. take the target host offline and perform the planned maintenance, such as firmware upgrades or replacement of faulty hardware;
4. after maintenance, migrate the target host's original virtual machines back so that compute resources stay evenly distributed across the nodes of the cluster;
5. check the running state of the host to confirm that it has returned to a healthy state.
This approach has the following disadvantages:
1. many checks are needed to ensure that taking the target node offline will not affect the cluster;
2. the running state of each virtual machine must be recorded and the virtual machines migrated manually so that compute resources are balanced again after maintenance;
3. a large amount of data recovery occurs during maintenance, and it takes a long time for the data to return to the expected number of replicas.
Disclosure of Invention
To solve the above problems, an object of the present invention is to provide a host maintenance method and device for a hyper-converged system that reduce operational interaction, reduce the amount of data recovery generated while the target host is under maintenance, and automatically migrate the virtual machines on the target host.
To achieve this object, the technical solution of the invention is as follows. A hyper-converged system host maintenance method comprises: pre-checking, for a target host, check items covering the cluster operation and maintenance component, the compute component and the storage component; when the pre-check passes and an enter-maintenance instruction is received from the user, performing the pre-check again; after the second pre-check passes, setting the target host to a non-schedulable mode and to a storage maintenance mode, and, when preset conditions are met, migrating the virtual machines off the target host so that the host can be maintained; and after host maintenance is finished and the exit-maintenance check passes, migrating the target host's original virtual machines back to it.
With this device, the user can perform host state checks, virtual machine migration, host state record updates and similar operations directly from the console; the preparation required for offline host maintenance is completed automatically, and user interaction is reduced. Introducing the storage maintenance mode limits the amount of data recovery caused by offline host maintenance in a hyper-converged environment and shortens the time that maintenance takes. Marking the target host as non-schedulable while preparing to enter maintenance mode prevents edge cases during the entry process, such as a virtual machine being created on or migrated to the host while it is being emptied.
In an embodiment of the present invention, pre-checking the check items covering the cluster operation and maintenance component, the compute component and the storage component for the target host further comprises: querying database records to check whether any host in the hyper-converged cluster is in the "ready to enter maintenance", "maintenance" or "ready to exit maintenance" state, where the task center designates a universally unique identifier (UUID) so that only one pre-check task can run at a time and only one host in the cluster can be in maintenance mode; performing a health check of the cluster and the target host covering at least the compute component, the storage component and the operation and maintenance component, where, when the platform is Elf/SMTZBS, the check items further include virtual machine checks; and, once the health check passes, waiting for the user's enter-maintenance instruction.
In an embodiment of the present invention, migrating the virtual machines on the target host further comprises: performing a pre-scheduling check for virtual machine migration that covers the running state of each virtual machine, whether it has a pass-through device, and whether its state changes after migration; live-migrating running virtual machines; under a preset condition, shutting down a running virtual machine and cold-migrating it; and cold-migrating virtual machines that are shut down.
In an embodiment of the present invention, migrating the virtual machines on the target host further comprises: in the storage maintenance mode, when the node hosting the target host goes offline, the cluster detects this but does not automatically trigger data recovery, which reduces the amount of data recovery generated during the subsequent maintenance period.
In an embodiment of the present invention, migrating the virtual machines on the target host further comprises: migrating the target host's virtual machines one at a time; if the migration of the current virtual machine fails, the whole entry process fails and the remaining virtual machines are not migrated.
In an embodiment of the present invention, after host maintenance is completed the method further comprises: starting the target host, whereupon the services on it start automatically; after the target host has finished starting, receiving the exit-maintenance check instruction issued by the user and checking the host's operation and maintenance component, compute component and storage component; and, when these components meet the preset conditions, taking the node hosting the target host out of storage maintenance mode and setting the host to the schedulable state.
In an embodiment of the present invention, migrating the target host's original virtual machines back to the target host further comprises: performing a pre-scheduling check for the migration back that covers the running state of each virtual machine, whether it has a pass-through device, and whether its state has changed since it was migrated; live-migrating running virtual machines; and cold-migrating virtual machines that are shut down.
Based on the same concept, the invention also provides a hyper-converged system host maintenance device, comprising: a pre-check module, configured to pre-check, for the target host, check items covering the cluster operation and maintenance component, the compute component and the storage component; a wait-and-execute module, configured to perform the pre-check again when the pre-check has passed and an enter-maintenance instruction is received from the user; an execution module, configured to, after the second pre-check passes, set the target host to non-schedulable and to storage maintenance mode, and, when preset conditions are met, migrate the virtual machines off the target host so that the host can be maintained; and a rebuild module, configured to migrate the target host's original virtual machines back to it after host maintenance is finished and the exit-maintenance check passes.
Based on the same concept, the present invention also provides a computer device comprising: a memory for storing a processing program; and a processor that executes the processing program to carry out the hyper-converged system host maintenance method.
Based on the same concept, the invention also provides a readable storage medium storing a processing program which, when executed by a processor, implements the hyper-converged system host maintenance method.
Compared with the prior art, the technical solution has the following advantages:
1. Introducing the maintenance mode reduces the user's mental burden when maintaining the target host: it reduces operational interaction, reduces the amount of data recovery generated during maintenance of the target host, and migrates the virtual machines on the target host automatically.
2. The "entering maintenance" and "maintenance" states are added to each host's state in the cluster, which ensures that the cluster schedules compute resources correctly while a host is entering maintenance mode.
3. In a hyper-converged system, when a node goes offline the cluster detects this and automatically triggers data recovery; when taking the host offline is a planned operation, avoiding this recovery greatly shortens maintenance. Support for the storage maintenance mode prevents cold data on a host in that mode from generating data recovery: if the system finds that one replica of a data block is on a node in storage maintenance mode, it only marks that replica as pending recovery and does not actually issue a recovery command, which reduces the amount of data recovery.
Drawings
The following detailed description of embodiments of the invention is provided in conjunction with the appended drawings, in which:
FIG. 1 is a host state machine diagram of the hyper-converged system host maintenance method according to the present invention;
FIG. 2 is a schematic diagram of the pre-check process of the hyper-converged system host maintenance method according to the present invention;
FIG. 3 is a schematic diagram of preparing to enter maintenance in the hyper-converged system host maintenance method according to the present invention;
FIG. 4 is a schematic diagram of preparing to exit maintenance in the hyper-converged system host maintenance method according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific embodiments. Its advantages and features will become clearer from the following description and the claims. Note that the drawings are highly simplified and not to scale; they serve only to describe the embodiments of the invention conveniently and clearly.
Note that all directional terms (such as up, down, left, right, front, back and so on) in the embodiments of the present invention are used only to explain the relative positions and movements of components in a particular orientation (as shown in the drawings); if that orientation changes, the directional terms change accordingly.
Example one
To relieve the user's mental burden in maintaining the target host, a maintenance mode is introduced that reduces operational interaction, reduces the amount of data recovery generated during maintenance of the target host, and automatically migrates the virtual machines on the target host. The "maintenance mode" and "entering maintenance mode" states are added to each host's state in the cluster, which ensures that the cluster schedules compute resources correctly while a host is entering maintenance mode. FIG. 1 shows the host state machine of the hyper-converged system host maintenance method.
The host maintenance process is divided into the following stages: pre-check, preparing to enter maintenance, preparing to exit maintenance (exit check), and exiting maintenance.
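These stages can be read as a small host state machine (FIG. 1). The following is a minimal Python sketch under stated assumptions: the state strings mirror the states quoted in this description, while the enum, the transition table and the rollback edges (a failed entry returns to "normal", a failed exit returns to "maintenance") are hypothetical, since the text only says the host always ends in an expected state.

```python
from enum import Enum


class HostState(Enum):
    """Host states named after the maintenance stages; identifiers are hypothetical."""
    NORMAL = "normal"
    READY_TO_ENTER = "ready to enter maintenance"
    MAINTENANCE = "maintenance"
    READY_TO_EXIT = "ready to exit maintenance"


# Assumed transition table; the rollback edges are our assumption, not the source's.
TRANSITIONS = {
    HostState.NORMAL: {HostState.READY_TO_ENTER},
    HostState.READY_TO_ENTER: {HostState.MAINTENANCE, HostState.NORMAL},
    HostState.MAINTENANCE: {HostState.READY_TO_EXIT},
    HostState.READY_TO_EXIT: {HostState.NORMAL, HostState.MAINTENANCE},
}


def can_transition(current: HostState, target: HostState) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in TRANSITIONS[current]


def transition(current: HostState, target: HostState) -> HostState:
    """Return the new state, or raise ValueError for an illegal move."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

For example, `transition(HostState.NORMAL, HostState.READY_TO_ENTER)` succeeds, while jumping straight from `NORMAL` to `MAINTENANCE` raises `ValueError`, because entry must pass through the "ready to enter maintenance" stage where the second pre-check runs.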
Referring to FIG. 2, the pre-check determines whether the target host currently meets the conditions for entering maintenance mode. It checks the state of the cluster operation and maintenance component, the compute component and the storage component. All check items run independently and concurrently, without knowledge of one another, and are divided into mandatory items and secondary items. If a mandatory item fails, the user cannot put the target host into maintenance mode; if a secondary item fails, the user is shown the reason and may still choose to put the target host into maintenance mode.
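A minimal sketch of this concurrent pre-check, assuming each check item can be modelled as a callable that returns pass or fail. `CheckItem`, `precheck` and the example item names are hypothetical; the described system runs each item as a Job Task in its task center rather than in a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckItem:
    name: str
    mandatory: bool            # True: blocks entry on failure; False: secondary, warn only
    run: Callable[[], bool]    # returns True when the check passes


@dataclass
class PrecheckResult:
    can_enter: bool
    failed_mandatory: List[str]
    warnings: List[str]        # failed secondary items, shown to the user


def _safe_run(item: CheckItem) -> bool:
    """Run one check in isolation; an exception counts as a failed check."""
    try:
        return bool(item.run())
    except Exception:
        return False


def precheck(items: List[CheckItem]) -> PrecheckResult:
    # Every item runs concurrently and independently; none sees another's result.
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(_safe_run, items))
    failed = [item for item, ok in zip(items, outcomes) if not ok]
    failed_mandatory = [i.name for i in failed if i.mandatory]
    warnings = [i.name for i in failed if not i.mandatory]
    return PrecheckResult(not failed_mandatory, failed_mandatory, warnings)
```

With a passing mandatory "zookeeper quorum" item and a failing secondary "capacity" item, `precheck` reports `can_enter=True` and lists "capacity" as a warning, matching the rule that only mandatory failures block entry.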
Referring to FIG. 3, after the user-initiated pre-check passes, the user may choose to put the target host into maintenance mode. During entry, the following actions are performed automatically:
1. set the host state to "ready to enter maintenance";
2. perform a second pre-check, because the user may not start the entry immediately after the pre-check finishes, in which case the cluster state may no longer match the pre-check result and later actions could fail;
3. set the host to non-schedulable, so that the user cannot create or start virtual machines on the target host during entry;
4. migrate the target host's virtual machines to other nodes in the cluster;
5. set the target node to storage maintenance mode, to reduce the amount of data recovery generated during the subsequent maintenance period;
6. set the host state to "maintenance".
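The six entry actions can be sketched as one ordered function. This is an illustration, not the device's code: the `Host` fields, the callback parameters and the rollback performed on failure are assumptions, and steps 2 and 4 are passed in as callables so the ordering is visible without a real cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    name: str
    state: str = "normal"
    schedulable: bool = True
    storage_maintenance: bool = False
    vms: List[str] = field(default_factory=list)


def enter_maintenance(host: Host,
                      recheck: Callable[[Host], bool],
                      migrate_vms: Callable[[Host], bool]) -> bool:
    """Run the six entry steps in order; return True if the host reaches "maintenance"."""
    host.state = "ready to enter maintenance"      # step 1
    if not recheck(host):                          # step 2: second pre-check
        host.state = "normal"                      # assumed rollback target
        return False
    host.schedulable = False                       # step 3: nothing new lands here
    if not migrate_vms(host):                      # step 4: move VMs to other nodes
        host.schedulable = True                    # assumed rollback
        host.state = "normal"
        return False
    host.storage_maintenance = True                # step 5: suppress data recovery
    host.state = "maintenance"                     # step 6
    return True
```

Setting the host non-schedulable (step 3) before migrating (step 4) is what closes the race mentioned above: no new virtual machine can be placed on the host while the existing ones are being moved off.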
After these actions are complete, the user can see in the console that the target host is in the "maintenance mode" state and can then carry out the offline maintenance.
Referring to FIG. 4, after offline maintenance is complete the user starts the target host, and the services on it start automatically. Once the host is up, the user can initiate the exit-maintenance check from the console; if the check passes, the user can take the target host out of maintenance. The complete process is as follows:
1. initiate the ready-to-exit maintenance check, which checks the running state of the target host, and after it passes, wait for the user to initiate the exit request;
2. take the target node out of storage maintenance mode;
3. set the target host to the schedulable state, so that the subsequent automatic virtual machine migration can be scheduled onto the target host;
4. migrate the target host's original virtual machines back to the target node;
5. exit maintenance.
Example two
This embodiment provides an implementation of the pre-check in the hyper-converged system host maintenance method and device.
In this embodiment, all pre-check items are independent and decoupled, and none needs to know whether it is being invoked by the maintenance mode device. Each check item is implemented as an independent Job Task function. When the pre-check is triggered, the task center Leader generates the set of pre-check Job Tasks from information about the current platform and the target node, and submits and schedules them together. After all check items have run, their results are returned together. The current task center does not support saving the results of individual Task executions; it is planned to extend the task center to support this and to save the pre-check results in the corresponding Job Task information.
While entering maintenance mode, the target host undergoes a second check. Because each check in the pre-check records its result in the database when it finishes, reusing it as a single Task would require compatibility changes to the individual check implementations and would introduce unnecessary conditions; the second check during entry into maintenance mode therefore uniformly uses API calls instead.
Specifically, checking whether any host in the cluster is in a maintenance-related state:
query the database records to check whether any host in the cluster is in a maintenance-related state. All maintenance mode tasks are submitted to and executed by the task center, which assigns a unique UUID so that only one pre-check task can run at a time and only one host in the cluster can be in maintenance mode.
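One way to realize these two guards is sketched below with Python's built-in `sqlite3` as a stand-in for the cluster database. The schema, the state strings and the reading that the "designated" UUID is a fixed primary key (so a second concurrent pre-check collides with the first) are assumptions made only to render the mechanism concrete.

```python
import sqlite3
import uuid

MAINTENANCE_STATES = ("ready to enter maintenance", "maintenance", "ready to exit maintenance")

# Hypothetical: one fixed, designated ID for the pre-check task, so a second
# concurrent submission collides with the first on the primary key.
PRECHECK_TASK_ID = str(uuid.uuid5(uuid.NAMESPACE_DNS, "maintenance-precheck.example"))


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the cluster database (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (name TEXT PRIMARY KEY, state TEXT NOT NULL)")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY)")
    return conn


def host_in_maintenance_state(conn: sqlite3.Connection) -> bool:
    """Query the database records for any host in a maintenance-related state."""
    marks = ",".join("?" * len(MAINTENANCE_STATES))
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM hosts WHERE state IN ({marks})", MAINTENANCE_STATES
    ).fetchone()
    return count > 0


def try_start_precheck(conn: sqlite3.Connection) -> bool:
    if host_in_maintenance_state(conn):
        return False                      # only one host may be in maintenance mode
    try:
        with conn:                        # commits on success, rolls back on error
            conn.execute("INSERT INTO tasks (id) VALUES (?)", (PRECHECK_TASK_ID,))
    except sqlite3.IntegrityError:
        return False                      # a pre-check task is already running
    return True


def finish_precheck(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("DELETE FROM tasks WHERE id = ?", (PRECHECK_TASK_ID,))
```

Relying on a database uniqueness constraint, rather than a check-then-insert in application code, means two simultaneous submissions cannot both succeed.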
Checking the health of the cluster and the target host:
the criterion for whether a host may enter maintenance mode is whether taking it offline for maintenance after it enters maintenance mode would affect user services (compute and storage). The check items are classified on this basis into compute component checks, storage component checks and operation and maintenance component checks. When the platform is Elf/SMTZBS, the check items include virtual machine checks; when the platform is VMware, the virtual component checks are removed:
1. Operation and maintenance component checks:
if it exists in the cluster, the check fails.
2. Storage component checks:
whether single-replica data exists on the target node; if so, the check fails (ZBS-meta _ text can currently only return all replicas, so the results must be filtered locally; ZBS needs to support fetching a specified chunk's text by chunk id);
zookeeper: in a 3-node cluster, the check fails if any node other than the target node is abnormal; in a 5-node cluster, the check fails if more than one node other than the target node is abnormal;
zbs-meta: at least one live meta node must remain in the cluster besides the target node; otherwise the check fails;
whether any node in the cluster is already in storage maintenance mode; if so, the check fails;
whether data recovery is in progress in the cluster; if so, storage maintenance mode cannot be entered;
capacity check: whether the cluster's remaining capacity, excluding the target node, is greater than the capacity used on the target node; this is displayed for information only and does not decide whether maintenance mode can be entered;
3. Compute component checks:
mongo: in a 3-node cluster, the check fails if any node other than the target node is abnormal; in a 5-node cluster, the check fails if more than one node other than the target node is abnormal;
whether the target node's storage network is reachable; if not, the check fails;
job-center-worker: whether the service is running on the target node; if not, the check fails;
libvirtd: whether the service is running on the target node; if not, the check fails;
whether the virtual machines on the target host can be live-migrated;
whether the virtual machines on the target host can be cold-migrated.
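The zookeeper and mongo rules above are consistent with majority-quorum arithmetic. The sketch assumes both services run as majority-quorum ensembles that include the target node; under that assumption one formula, floor((n - 1) / 2) - 1 tolerated abnormal peers, reproduces the stated 3-node and 5-node limits. The function names and the generalization to other ensemble sizes are ours; `capacity_sufficient` mirrors the informational capacity check.

```python
def max_abnormal_others(members: int) -> int:
    """How many members other than the target may be abnormal before entry is refused.

    A majority quorum of n members survives floor((n - 1) / 2) failures; taking the
    target member offline for maintenance uses up one of them.
    """
    return (members - 1) // 2 - 1


def quorum_check(members: int, abnormal_others: int) -> bool:
    """Pass/fail rule applied to zookeeper and mongo in the lists above."""
    return abnormal_others <= max_abnormal_others(members)


def capacity_sufficient(cluster_free_excluding_target: int, target_used: int) -> bool:
    """Informational only: can the rest of the cluster absorb the target node's data?"""
    return cluster_free_excluding_target > target_used
```

For a 3-node ensemble the helper allows 0 abnormal peers and for a 5-node ensemble 1, which is exactly the rule stated for both services.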
In a hyper-converged system, from the storage perspective every node is a storage node, and the storage system uses a replica mechanism to keep data available. When a node goes offline, the cluster detects this and automatically triggers data recovery; when taking the host offline is a planned operation, avoiding this recovery greatly shortens maintenance. Support for the storage maintenance mode prevents cold data on a host in that mode from generating data recovery: if the system finds that one replica of a data block is on a node in storage maintenance mode, it only marks that replica as pending recovery and does not actually issue a recovery command, which reduces the amount of data recovery.
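The "mark, but do not recover" rule can be sketched as follows. `Chunk`, `handle_offline_node` and the in-memory data model are hypothetical; the sketch only shows the decision made for each replica that lives on a node which has gone offline.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Chunk:
    chunk_id: int
    replicas: List[str]                                   # nodes holding a copy
    pending_recovery: Set[str] = field(default_factory=set)


def handle_offline_node(chunks: List[Chunk], offline_node: str,
                        maintenance_nodes: Set[str]) -> List[int]:
    """Return the IDs of chunks whose recovery is actually triggered."""
    triggered: List[int] = []
    for chunk in chunks:
        if offline_node not in chunk.replicas:
            continue                                      # chunk unaffected
        if offline_node in maintenance_nodes:
            chunk.pending_recovery.add(offline_node)      # mark only; no recovery command
        else:
            triggered.append(chunk.chunk_id)              # unplanned outage: recover
    return triggered
```

When the offline node is in storage maintenance mode, no chunk IDs are returned and each affected replica is only recorded in `pending_recovery`; the same outage on a node not in maintenance mode triggers recovery for every affected chunk.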
From the compute perspective, a user's service virtual machines may run on any host in the cluster, so before a host is maintained its virtual machines must first be migrated away to keep user services unaffected. When the user puts a host into maintenance mode, the host is first set to non-schedulable at the compute layer, which ensures that no unexpected virtual machine is created on or migrated to the target host during entry. Before the virtual machines are migrated, a pre-scheduling check for the migration is performed: if it passes, the migration proceeds; if it fails, the virtual machines cannot be migrated and the host cannot enter maintenance mode. To keep the impact of migration as small as possible while the target host enters maintenance mode, virtual machines are migrated serially and independently, one at a time: the next virtual machine is migrated only after the current one has been migrated successfully, and if the current migration fails, the whole entry process fails and the remaining virtual machines are not migrated.
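A minimal sketch of the serial, fail-fast migration policy, assuming the pre-scheduling check and the per-VM migration are available as callables. Running the pre-scheduling check for every virtual machine before any of them moves is our reading of the text; the function and parameter names are illustrative only.

```python
from typing import Callable, Iterable, List, Tuple


def migrate_serially(vms: Iterable[str],
                     prescheduling_check: Callable[[str], bool],
                     migrate_one: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Return (success, VMs migrated so far)."""
    vm_list = list(vms)
    # Pre-scheduling check first: if any VM cannot be scheduled, nothing is moved.
    if not all(prescheduling_check(vm) for vm in vm_list):
        return False, []
    migrated: List[str] = []
    for vm in vm_list:                  # strictly one VM at a time
        if not migrate_one(vm):
            return False, migrated      # entry fails; later VMs are left untouched
        migrated.append(vm)
    return True, migrated
```

Returning the list of virtual machines already moved lets the caller record which ones were migrated automatically, which is the information needed later to migrate them back.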
Entering maintenance mode is executed as a single asynchronous task with no subtasks, and the complete control flow is driven by the device's own logic, so the device does not depend on external components. In the task implementation every step of the entry process is idempotent, so the task can be safely re-entered after an abnormal interruption. Whether the asynchronous task succeeds or fails, the target host's state is always set to the expected state at the end, which keeps the state consistent.
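The idempotency and final-state guarantees can be illustrated with a small task runner. This is a sketch under assumptions: the host is a plain dict, the two steps are hypothetical stand-ins for real entry steps, and `run_task` always writes either a success or a failure state at the end so the host is never left undefined.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], None]


def set_unschedulable(host: Dict) -> None:
    host["schedulable"] = False                 # assigning a flag is naturally idempotent


def enable_storage_maintenance(host: Dict) -> None:
    if host["storage_maintenance"]:
        return                                  # already done by an earlier attempt: skip
    host["storage_maintenance"] = True
    host["storage_calls"] += 1                  # counts real (non-skipped) executions


def run_task(host: Dict, steps: List[Step], success_state: str, failure_state: str) -> bool:
    """Run idempotent steps in order; always leave the host in a defined final state."""
    try:
        for step in steps:
            step(host)
    except Exception:
        host["state"] = failure_state
        return False
    host["state"] = success_state
    return True


def new_host() -> Dict:
    return {"state": "ready to enter maintenance", "schedulable": True,
            "storage_maintenance": False, "storage_calls": 0}


ENTRY_STEPS: List[Step] = [set_unschedulable, enable_storage_maintenance]
```

Running `run_task(host, ENTRY_STEPS, "maintenance", "normal")` twice leaves the host exactly as after the first run, which is what allows a task interrupted midway to be restarted from the beginning.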
When maintenance of the target host is finished and it is back online, it must be taken out of maintenance. Before the actual exit, a ready-to-exit maintenance check verifies whether certain key services on the target host meet the exit conditions; if they do not, exiting maintenance mode is prohibited, and if they do, the exit proceeds. During the exit, the storage component takes the target node out of storage maintenance mode so that it is online again, and the compute component marks the target host as schedulable and migrates back to it the virtual machines that were automatically migrated away while it was entering maintenance mode, which keeps cluster compute resources balanced. As with entering maintenance mode, the exit is executed as an asynchronous task with no subtasks, and whether the exit succeeds or fails, the target host's state is always set to the expected state at the end, which keeps the state consistent.
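A sketch of the exit sequence, assuming the list of virtual machines that were automatically migrated away during entry was recorded and is passed in. The field names, the callbacks, and the choice to complete the exit even if some virtual machine fails to return are assumptions, not details from the text.

```python
from typing import Callable, Dict, List, Tuple


def exit_maintenance(host: Dict,
                     exit_check: Callable[[Dict], bool],
                     migrated_out: List[str],
                     migrate_back: Callable[[str, str], bool]) -> Tuple[bool, List[str]]:
    """Return (exited, VMs returned to the host)."""
    if not exit_check(host):
        return False, []                       # exit prohibited: host stays in maintenance
    host["state"] = "ready to exit maintenance"
    host["storage_maintenance"] = False        # storage: node back online
    host["schedulable"] = True                 # compute: host may receive VMs again
    # Only VMs that were moved away automatically during entry are brought back.
    returned = [vm for vm in migrated_out if migrate_back(vm, host["name"])]
    # Assumption: a VM that fails to return stays where it is; the exit still completes.
    host["state"] = "normal"
    return True, returned
```

The host is made schedulable before any migration back is attempted, because otherwise the scheduler would refuse to place the returning virtual machines on it.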
Example three
This embodiment provides a specific implementation of virtual machine migration in the hyper-converged system host maintenance method and device.
Entering and exiting maintenance mode involves migrating virtual machines out and back. The state and configuration of a virtual machine determine whether it can be migrated out and back successfully. The factors that can prevent a virtual machine from being migrated are:
1. whether the virtual machine is running;
2. whether the virtual machine has a pass-through device;
3. whether the virtual machine's state has changed since it was migrated.
The combinations of the above factors are divided into the following scenarios:
First scenario: [table image BDA0003947835240000101]
Second scenario: [table images BDA0003947835240000102, BDA0003947835240000111]
Third scenario: [table image BDA0003947835240000112]
Fourth scenario: [table images BDA0003947835240000113, BDA0003947835240000121]
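Because the scenario tables survive only as images, the exact outcome for each combination cannot be reproduced here. The sketch below is a hedged reconstruction of the outbound decision using only the rules stated in the embodiments (live migration for running virtual machines, shutdown plus cold migration under a preset condition, cold migration for stopped ones). Treating a pass-through device as the factor that rules out live migration is our assumption, and the third factor (state change after migration) is not modelled.

```python
from enum import Enum


class Plan(Enum):
    LIVE = "live migration"
    SHUTDOWN_THEN_COLD = "shut down, then cold migration"
    COLD = "cold migration"
    BLOCKED = "cannot migrate"


def choose_plan(running: bool, has_passthrough: bool, shutdown_allowed: bool) -> Plan:
    """Pick an outbound migration plan for one VM (hypothetical decision rule)."""
    if not running:
        return Plan.COLD                     # stopped VMs are cold-migrated
    if not has_passthrough:
        return Plan.LIVE                     # running VMs are live-migrated
    # Assumption: a pass-through device ties the VM to this host's hardware, so live
    # migration is ruled out; shutdown + cold migration needs the preset condition.
    return Plan.SHUTDOWN_THEN_COLD if shutdown_allowed else Plan.BLOCKED
```

A `BLOCKED` result corresponds to a failed pre-scheduling check, which in turn prevents the host from entering maintenance mode.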
Preferably, because entering maintenance mode involves many steps and interactions between components, the whole process may take minutes when the target host carries many compute resources, so the user must be able to cancel the entry. Before the enter/exit maintenance asynchronous task starts, the task records whether each step supports cancellation and whether the task has been marked as cancelled. Before each step runs, the task checks this flag bit, and an API is provided for updating it. If the flag is True, the remaining steps are not executed and the whole task is marked as cancelled; if the flag is False, the user has not requested cancellation, and execution continues with the next step.
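The cancellation mechanism can be sketched as a task that checks a flag before each cancellable step. `CancellableTask` is hypothetical: `request_cancel` stands in for the API that updates the flag, a `threading.Event` is used so the flag can be set safely from another thread, and letting a non-cancellable step run even after cancellation was requested is our assumption.

```python
import threading
from typing import Callable, List, Tuple

# (step name, whether the step may be cancelled, step body)
StepSpec = Tuple[str, bool, Callable[[], None]]


class CancellableTask:
    def __init__(self, steps: List[StepSpec]):
        self.steps = steps
        self._cancel_flag = threading.Event()
        self.status = "pending"
        self.completed: List[str] = []

    def request_cancel(self) -> None:
        """Stand-in for the API that sets the task's cancel flag to True."""
        self._cancel_flag.set()

    def run(self) -> str:
        self.status = "running"
        for name, cancellable, body in self.steps:
            # Check the flag before every step; a non-cancellable step still runs (assumption).
            if cancellable and self._cancel_flag.is_set():
                self.status = "cancelled"   # skip this and all later steps
                return self.status
            body()
            self.completed.append(name)
        self.status = "succeeded"
        return self.status
```

Because the flag is only consulted between steps, a step that is already running (for example, one virtual machine's live migration) finishes before the cancellation takes effect.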
Example four
Based on the same concept, the invention also provides a hyper-converged system host maintenance device, comprising: a pre-check module, configured to pre-check, for the target host, check items covering the cluster operation and maintenance component, the compute component and the storage component; a wait-and-execute module, configured to perform the pre-check again when the pre-check has passed and an enter-maintenance instruction is received from the user; an execution module, configured to, after the second pre-check passes, set the target host to non-schedulable and to storage maintenance mode, and, when preset conditions are met, migrate the virtual machines off the target host so that the host can be maintained; and a rebuild module, configured to migrate the target host's original virtual machines back to it after host maintenance is finished and the exit-maintenance check passes.
Example five
Based on the same concept, the present invention further provides a computer device. Computer devices may differ considerably in configuration or performance; such a device may include one or more processors (CPUs), memory, and one or more storage media (e.g., one or more mass storage devices) storing applications or data. The memory and storage media may provide transient or persistent storage. The program stored on a storage medium may include one or more modules (not shown), each of which may include a sequence of instruction operations on the computer device. Further, the processor may be configured to communicate with the storage medium and execute, on the computer device, the series of instruction operations stored in the storage medium.
The computer device may also include one or more power supplies, one or more wired or wireless network interfaces, one or more input/output interfaces, and/or one or more operating systems, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
Those skilled in the art will appreciate that the computer device structure described in this embodiment does not limit the computer device, which may include more or fewer components than those shown, combine certain components, or arrange components differently.
When executed by the processor, the computer-readable instructions cause the processor to perform the steps of the embodiments described above.
In an embodiment, a readable storage medium storing computer-readable instructions is provided; when executed by one or more processors, the instructions cause the one or more processors to perform the hyper-converged system host maintenance method described above. The specific steps are not repeated here.
Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium, including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A hyper-converged system host maintenance method, characterized by comprising the following steps:
performing, for a target host, a pre-check of check items covering the cluster operation and maintenance (O&M) component, the computing component, and the storage component;
when the pre-check passes and an enter-maintenance instruction is received from the user, performing the pre-check again;
after the repeated pre-check passes, setting the target host to an unschedulable state, setting the target host to storage maintenance mode, and, when a preset condition is met, migrating the virtual machines off the target host so that the host can be maintained;
and after host maintenance is complete and the exit-maintenance-mode check passes, migrating the virtual machines originally on the target host back to the target host.
2. The hyper-converged system host maintenance method of claim 1, wherein pre-checking, for the target host, the check items covering the cluster operation and maintenance component, the computing component, and the storage component further comprises:
querying database records to check whether any host in the hyper-converged cluster is in the entering-maintenance, in-maintenance, or exiting-maintenance state, wherein the task center assigns a universally unique identifier (UUID) so that only one pre-check task can run at a time and only one host in the cluster can be in maintenance mode at a time;
performing health checks on the cluster and the target host covering at least the computing component, the storage component, and the operation and maintenance component, wherein, when the platform is Elf/SMTZBS, the check items further include virtual machine checks;
and waiting to receive the user's enter-maintenance instruction once the health checks pass.
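The mutual-exclusion part of claim 2 (one pre-check at a time, one host in a maintenance state at a time) can be sketched as follows. `TaskCenter`, the state strings, the check-item keys, and the `include_vm_check` switch (standing in for the Elf/SMTZBS case) are hypothetical; a real task center would persist the UUID and host states in its database rather than in memory.

```python
import uuid
from typing import Dict, Optional

# Hypothetical names for the three maintenance-related host states.
MAINTENANCE_STATES = {"entering_maintenance", "in_maintenance", "exiting_maintenance"}


class TaskCenter:
    """Hypothetical stand-in for the task center and its database records."""

    def __init__(self, host_states: Dict[str, str]):
        self.host_states = host_states              # host name -> recorded state
        self.active_precheck: Optional[str] = None  # UUID of the running pre-check

    def begin_precheck(self) -> str:
        # Only one pre-check task may run at a time.
        if self.active_precheck is not None:
            raise RuntimeError("another pre-check task is running")
        # Only one host may be entering, in, or exiting maintenance mode.
        busy = [h for h, s in self.host_states.items() if s in MAINTENANCE_STATES]
        if busy:
            raise RuntimeError(f"host {busy[0]} is already in a maintenance state")
        self.active_precheck = str(uuid.uuid4())
        return self.active_precheck

    def finish_precheck(self, task_id: str, results: Dict[str, bool],
                        include_vm_check: bool = False) -> bool:
        if task_id != self.active_precheck:
            raise ValueError("unknown pre-check task")
        self.active_precheck = None  # release the slot for the next pre-check
        required = {"compute", "storage", "om"}
        if include_vm_check:  # e.g. the Elf/SMTZBS platform case
            required.add("vm")
        # Every required item must be present and healthy.
        return all(results.get(item, False) for item in required)
```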
3. The hyper-converged system host maintenance method of claim 1, wherein migrating the virtual machines on the target host further comprises:
performing a pre-scheduling check for virtual machine migration, covering: the running state of each virtual machine, whether it has passthrough devices attached, and whether its state would change after migration;
performing live migration on running virtual machines, or, under a preset condition, shutting down a running virtual machine and cold-migrating it;
and performing cold migration on virtual machines that are shut down.
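The choice between live and cold migration can be sketched as a small decision function. The claim does not spell out the "preset condition"; this sketch assumes that a running VM with passthrough devices cannot be live-migrated and is shut down and cold-migrated only if a hypothetical `allow_shutdown` opt-in is set. The third check item (whether the VM's state would change) is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    LIVE = "live migration"
    SHUTDOWN_THEN_COLD = "shut down, then cold migration"
    COLD = "cold migration"
    BLOCKED = "cannot migrate"


@dataclass
class VM:
    name: str
    running: bool
    has_passthrough: bool = False
    allow_shutdown: bool = False  # hypothetical stand-in for the "preset condition"


def plan_migration(vm: VM) -> Plan:
    """Pre-scheduling decision for one VM (assumed rules, see lead-in)."""
    if not vm.running:
        return Plan.COLD  # a shut-down VM is always cold-migrated
    if not vm.has_passthrough:
        return Plan.LIVE  # a running VM without passthrough devices moves live
    # Assumption: passthrough devices pin a running VM to its host.
    return Plan.SHUTDOWN_THEN_COLD if vm.allow_shutdown else Plan.BLOCKED
```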
4. The hyper-converged system host maintenance method of claim 3, wherein migrating the virtual machines on the target host further comprises:
in storage maintenance mode, when the node where the target host resides goes offline, the cluster detects this but does not automatically trigger data recovery, which reduces the amount of data recovery generated while the user performs maintenance.
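The effect of storage maintenance mode on the cluster's reaction to an offline node can be sketched as a handler that skips recovery for nodes in that mode. `on_node_offline`, the set of maintenance nodes, and the recovery queue are hypothetical names used only for illustration.

```python
from typing import List, Set


def on_node_offline(node: str, storage_maintenance: Set[str],
                    recovery_queue: List[str]) -> bool:
    """Hypothetical handler called when the cluster detects an offline node.

    Returns True if automatic data recovery was scheduled for the node.
    """
    if node in storage_maintenance:
        # Planned outage: skip automatic recovery so the maintenance window
        # does not generate recovery traffic.
        return False
    # Unplanned outage: queue the node's data for recovery.
    recovery_queue.append(node)
    return True
```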
5. The hyper-converged system host maintenance method of claim 4, wherein migrating the virtual machines on the target host further comprises:
while migrating the virtual machines of the target host, migrating only one virtual machine at a time, wherein if migration of the current virtual machine fails, the entire enter-maintenance process fails and the remaining virtual machines are not migrated.
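The one-at-a-time, fail-fast rule can be sketched as a simple loop. The `migrate` callback is a hypothetical placeholder for whatever performs a single live or cold migration and reports success.

```python
from typing import Callable, List, Tuple


def migrate_sequentially(vms: List[str],
                         migrate: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Migrate VMs one at a time, stopping at the first failure.

    Returns (succeeded, migrated_vm_names).
    """
    migrated: List[str] = []
    for vm in vms:
        if not migrate(vm):
            # One failure fails the whole enter-maintenance process;
            # the remaining VMs are not attempted.
            return False, migrated
        migrated.append(vm)
    return True, migrated
```

For example, with a fake `migrate` that fails on `vm2`, the call returns `(False, ['vm1'])` and `vm3` is never attempted.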
6. The hyper-converged system host maintenance method of claim 1, further comprising, after host maintenance is complete:
starting the target host, whereupon the services on the target host start automatically;
after the target host has started, receiving an exit-maintenance-mode check instruction issued by the user, and checking the operation and maintenance component, the computing component, and the storage component of the host;
and when the operation and maintenance, computing, and storage components meet preset conditions, taking the node where the target host resides out of storage maintenance mode and setting the host to a schedulable state.
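The exit step of claim 6 can be sketched as a guarded state change: the host leaves storage maintenance mode and becomes schedulable only if every required check passed. `HostState`, `exit_maintenance`, and the check-item keys are hypothetical, and the "preset conditions" are reduced to a pass/fail flag per component.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical names for the three check items.
REQUIRED_CHECKS = ("om", "compute", "storage")


@dataclass
class HostState:
    storage_maintenance: bool = True  # node still in storage maintenance mode
    schedulable: bool = False         # no VMs may be placed on the host yet


def exit_maintenance(host: HostState, checks: Dict[str, bool]) -> bool:
    """Leave maintenance mode only if every required check passed."""
    # A missing result is treated as a failed check.
    if not all(checks.get(item, False) for item in REQUIRED_CHECKS):
        return False  # host stays in maintenance mode and unschedulable
    host.storage_maintenance = False  # storage resumes normal recovery behavior
    host.schedulable = True           # scheduler may place VMs here again
    return True
```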
7. The hyper-converged system host maintenance method of claim 1, wherein migrating the virtual machines originally on the target host back to the target host further comprises:
performing a pre-scheduling check for migrating the virtual machines back, covering: the running state of each virtual machine, whether it has passthrough devices attached, and whether its state would change after migration;
performing live migration on running virtual machines;
and performing cold migration on virtual machines that are shut down.
8. A hyper-converged system host maintenance device, characterized by comprising:
a pre-check module, configured to pre-check, for a target host, check items covering the cluster operation and maintenance component, the computing component, and the storage component;
a wait-and-execute module, configured to repeat the pre-check when the pre-check passes and an enter-maintenance instruction is received from the user;
an execution module, configured to, after the repeated pre-check passes, set the target host to unschedulable, set it to storage maintenance mode, and, when a preset condition is met, migrate the virtual machines on the target host so that the host can be maintained;
and a rebuild module, configured to migrate the virtual machines originally on the target host back to the target host after host maintenance is complete and the exit-maintenance-mode check passes.
9. A computer device, comprising:
a memory for storing a processing program;
a processor which, when executing the processing program, implements the hyper-converged system host maintenance method according to any one of claims 1 to 7.
10. A readable storage medium, having a processing program stored thereon, the processing program, when executed by a processor, implementing the hyper-converged system host maintenance method according to any one of claims 1 to 7.
CN202211439186.1A 2022-10-12 2022-11-17 Method and device for maintaining host of super fusion system Active CN115904621B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022112451482 2022-10-12
CN202211245148 2022-10-12

Publications (2)

Publication Number Publication Date
CN115904621A true CN115904621A (en) 2023-04-04
CN115904621B CN115904621B (en) 2023-09-19

Family

ID=86483544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439186.1A Active CN115904621B (en) 2022-10-12 2022-11-17 Method and device for maintaining host of super fusion system

Country Status (1)

Country Link
CN (1) CN115904621B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399201A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 A kind of method, apparatus and cloud management platform of openstack calculate node host maintenance
CN111176790A (en) * 2019-12-30 2020-05-19 北京浪潮数据技术有限公司 Active maintenance method and device of cloud platform physical host and readable storage medium
CN111669284A (en) * 2020-04-28 2020-09-15 长沙证通云计算有限公司 OpenStack automatic deployment method, electronic device, storage medium and system
US11157263B1 (en) * 2020-06-15 2021-10-26 Dell Products L.P. Pipeline rolling update
US20220027247A1 (en) * 2020-07-21 2022-01-27 Hewlett Packard Enterprise Development Lp Maintenance operations based on analysis of collected data

Also Published As

Publication number Publication date
CN115904621B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
US9229707B2 (en) Zero downtime mechanism for software upgrade of a distributed computer system
US5805790A (en) Fault recovery method and apparatus
US7536582B1 (en) Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US7774785B2 (en) Cluster code management
US8046520B2 (en) Compound computer system and method for sharing PCI devices thereof
US20070206611A1 (en) Effective high availability cluster management and effective state propagation for failure recovery in high availability clusters
CN111277460A (en) ZooKeeper containerization control method and device, storage medium and electronic equipment
JPH05181823A (en) Method and apparatus for controlling block in block partitioning type process environment
CN112199240B (en) Method for switching nodes during node failure and related equipment
US20210406127A1 (en) Method to orchestrate a container-based application on a terminal device
CN111857951A (en) Containerized deployment platform and deployment method
US6502176B1 (en) Computer system and methods for loading and modifying a control program without stopping the computer system using reserve areas
CN115964176B (en) Cloud computing cluster scheduling method, electronic equipment and storage medium
WO2023125482A1 (en) Cluster management method and device, and computing system
CN115904621A (en) Super-fusion system host maintenance method and device
US11579780B1 (en) Volume remote copy based on application priority
CN114816662A (en) Container arrangement method and system applied to Kubernetes
CN115277398A (en) Cluster network configuration method and device
CA2345200A1 (en) Cross-mvs-system serialized device control
CN109634721B (en) Method and related device for starting communication between virtual machine and host
CN113342499A (en) Distributed task calling method, device, equipment, storage medium and program product
CN113342511A (en) Distributed task management system and method
US20230393882A1 (en) Management of virtual machine shutdowns in a computing environment based on resource locks
CN115686802B (en) Cloud computing cluster scheduling system
CN108196990B (en) Self-checking method and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100098

Patentee after: Beijing Zhiling Haina Technology Co.,Ltd.

Country or region after: China

Address before: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100098

Patentee before: Beijing zhilinghaina Technology Co.,Ltd.

Country or region before: China