CN110445662A

CN110445662A - Method and device for adaptively switching an OpenStack control node to a compute node

Info

Publication number: CN110445662A
Application number: CN201910810282.4A
Authority: CN
Inventors: 刘梦可; 刘超
Original assignee: Shanghai Instrument Electric (group) Co Ltd Central Research Institute
Current assignee: Shanghai Instrument Electric (group) Co Ltd Central Research Institute
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2019-11-12
Anticipated expiration: 2039-08-29
Also published as: CN110445662B

Abstract

The present invention relates to a method and device for adaptively switching an OpenStack control node to a compute node. The OpenStack deployment includes several control node groups and a compute node group. The method comprises: S1: dividing the control node groups into a switchable control node group and a non-switchable control node group, and electing a to-be-switched control node from the switchable control node group by an election algorithm; S2: running monitoring on a periodic timer; if a node failure or an excessive total load is found in the compute node group, triggering an adaptive upgrade process, otherwise ending the process. In the adaptive upgrade process, the to-be-switched control node is switched to a compute node by an automated management tool combined with container technology, and joins the compute node group of step S2. Compared with the prior art, the present invention offers advantages such as high efficiency.

Description

Method and device for adaptively switching an OpenStack control node to a compute node

Technical field

The present invention relates to the technical field of OpenStack cloud platforms, and in particular to a method and device for adaptively switching an OpenStack control node to a compute node.

Background

OpenStack is an open-source cloud computing management platform. It manages large pools of distributed compute, storage, and network resources and provides a unified management plane. OpenStack is at once a community, a project, and a piece of open-source software; it supports almost every type of cloud environment and provides a solution or toolset for deploying clouds. Its purpose is to help organizations run clouds that deliver virtual compute or storage services, and to provide scalable, flexible cloud computing for public and private clouds. After many years of production use, OpenStack has become highly mature.

In small and medium-scale cloud platforms, the usual deployment architecture is a model with multiple control nodes and multiple compute nodes. Control nodes can double as network nodes, and distributed storage services can be deployed on control nodes, on compute nodes, or on other independent nodes. As the servers in a cloud platform age, their failure rate rises, and server failures are a frequent emergency in real production environments. In particular, when a compute node that also hosts storage services fails, the virtual machines on that node cannot provide service for a short time. In addition, an excessive load on the compute node group degrades virtual machine performance and, in turn, user experience.

To improve the operational continuity of a virtualized cluster, so that the business of a failed virtual machine can be restored promptly and service interruption kept to a minimum, virtualized clusters usually support high-availability (HA) technology. This technology periodically monitors the running state of virtual machines and restores failed virtual machines promptly, guaranteeing continuity of operation. In a cluster using HA technology, every server is configured with an agent that continuously checks the state of the other servers by periodically sending them heartbeat signals. If a server fails to respond to three consecutive heartbeat signals, the agent considers that server failed and reports the failure, and then restarts all virtual machines of the failed server on other servers in the cluster to restore their business and guarantee continuity. Alternatively, a new server is added or the failed server is replaced, and the compute services and external network agent services of the cloud platform are deployed directly into the operating system of the new server from RPM packages. However, because racking and unracking servers is cumbersome and the deployment is complex, this approach is slow to take effect and slow to recover, and it struggles to meet the timeliness requirements of production environments.

The prior art has also proposed solutions to the above problems. Chinese patent CN108089911A proposes a control method and device for compute nodes in an OpenStack environment. In that method, the control node periodically monitors status data of the compute nodes and determines whether each compute node is available; if a node is unavailable, the virtual machines running on it are evacuated to compute nodes that the monitored status data shows to be available. By determining availability and promptly evacuating virtual machines from unavailable compute nodes, the method can shorten RTO and RPO as far as possible, restore virtual machine business in the shortest time, and maintain high availability. However, the increased load on the compute nodes that receive the virtual machines slows the services running on them, and evacuating virtual machines to the other available compute nodes adds to their burden. The method therefore cannot effectively solve the problem of excessive compute node load, which affects the service quality of tenant virtual machines.

Summary of the invention

An object of the present invention is to overcome the above drawbacks of the prior art by providing a method and device for adaptively switching an OpenStack control node to a compute node.

The purpose of the present invention can be achieved through the following technical solutions:

A method for adaptively switching an OpenStack control node to a compute node, wherein the OpenStack deployment has a topology composed of a control node group and a compute node group, the method comprising:

S1: dividing the control node group into a switchable control node group and a non-switchable control node group, and electing a to-be-switched control node from the switchable control node group by an election algorithm;

S2: running monitoring on a periodic timer; if a node failure or an excessive total load is found in the compute node group, triggering the adaptive upgrade process, otherwise ending the process;

wherein the adaptive upgrade process is specifically: switching the to-be-switched control node to a compute node by an automated management tool combined with container technology, and adding it to the compute node group of step S2.

Further, the control node group provides highly available cloud platform management and control services. As a supplement to the control node group, the switchable control node group can provide management services and virtual network services, improving the management performance of the cloud platform and increasing the service bandwidth of the centralized virtual network. Because every service of the control node group as a whole is highly available, switching a single control node does not interrupt the services of the cloud platform.

Further, the compute nodes provide services such as compute resource virtualization, compute resource management, and the virtual layer-2 network agent, and supply virtual machines with compute resources such as CPU and memory.

Further, the OpenStack cloud platform is deployed in containers: every service is packaged into a corresponding Docker image and started by starting a container. This avoids dependency conflicts between services, makes upgrading and rolling back each service straightforward, and effectively addresses the difficulty of deploying and upgrading the cloud platform;

The container image of each service is stored in a local private Docker registry. Exploiting the layered nature of container images, all custom images of the cloud platform are built in layers. There are four layers, from top to bottom: the operating system base image, the cloud platform base image, the base image of each functional module, and the image of each service within a module. Layering avoids installing dependency packages repeatedly, reduces the total storage size of the images, and improves deployment efficiency.

Further, the control node group is divided by means of custom labels, and the non-switchable control node group numbers at least 3, which guarantees the minimum count required for high availability of the cloud platform.
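The label-based grouping can be sketched as follows. This is only an illustration under stated assumptions: the label values `switchable` and `fixed`, the node names, and the reading of "at least 3" as a count of nodes in the non-switchable group are all hypothetical and are not specified in the text.

```python
from typing import Dict, List, Tuple

# Assumption: "at least 3" is read as the minimum number of nodes that must
# stay in the non-switchable group to keep the control plane highly available.
MIN_FIXED_CONTROLLERS = 3


def split_control_nodes(labels: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split control nodes into (switchable, non_switchable) by custom label.

    `labels` maps a node name to its label; the label values "switchable"
    and "fixed" are hypothetical names chosen for this sketch.
    """
    switchable = sorted(n for n, lab in labels.items() if lab == "switchable")
    fixed = sorted(n for n, lab in labels.items() if lab != "switchable")
    if len(fixed) < MIN_FIXED_CONTROLLERS:
        raise ValueError(
            f"need at least {MIN_FIXED_CONTROLLERS} non-switchable control "
            f"nodes, found {len(fixed)}"
        )
    return switchable, fixed
```

Raising an error when fewer than three nodes remain in the non-switchable group keeps a mislabelled deployment from silently losing its high-availability margin.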

Further, when a control node in the switchable control node group is deployed, the container images required by a compute node are pre-installed on it, and when cloud platform services are upgraded these images are updated in step. The compute services of the cloud platform can then be started directly from these images, which avoids the performance degradation caused by transferring large files over the network;

Further, the election algorithm selects the node with the smallest reference index value. The reference index is the CPU load, the network traffic, or a Cost value. The Cost value is obtained by weighted summation, calculated as follows:

Cost = Σ_{i=1}^{N} W_i·X_i

where W_i is a weight, X_i is any combination of the input parameters CPU usage, memory usage, and network traffic, and N is the number of input parameters.
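A minimal sketch of the election step, assuming the Cost value is the weighted sum of the listed input parameters. The `NodeMetrics` type, the weight values, and the normalisation of each metric to the range 0..1 are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeMetrics:
    """Hypothetical per-node inputs, each assumed to be normalised to 0..1."""
    name: str
    cpu: float
    memory: float
    network: float


def cost(node: NodeMetrics, weights: Dict[str, float]) -> float:
    """Weighted sum of the selected metrics: the sum of W_i * X_i."""
    return sum(w * getattr(node, metric) for metric, w in weights.items())


def elect(nodes: List[NodeMetrics], weights: Dict[str, float]) -> NodeMetrics:
    """Return the switchable control node with the smallest Cost value."""
    if not nodes:
        raise ValueError("switchable control node group is empty")
    return min(nodes, key=lambda n: cost(n, weights))


if __name__ == "__main__":
    candidates = [
        NodeMetrics("ctl-4", cpu=0.6, memory=0.5, network=0.2),
        NodeMetrics("ctl-5", cpu=0.2, memory=0.3, network=0.1),
    ]
    # Illustrative weights; the text does not prescribe specific values.
    print(elect(candidates, {"cpu": 0.5, "memory": 0.3, "network": 0.2}).name)
```

Passing a weight dictionary with a single key, such as `{"cpu": 1.0}`, reduces the election to the plain CPU-load or network-traffic index that the text also allows.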

Further, quickly switching the to-be-switched control node to a compute node by an automated management tool combined with container technology is specifically:

All containers on the to-be-switched control node are cleaned up with an automated deployment tool, such as Ansible, while the operating system layer and the Docker service layer, which are identical to those of a compute node, are retained; the services of the compute node being switched in are then started quickly. These services include nova-libvirtd, nova-compute, and neutron-openvswitch-agent.
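The ordering of the switch can be illustrated with a small dry-run planner. It does not call Ansible or Docker; the action names `remove_container` and `start_service`, and the idea of checking for pre-pulled images by service name, are assumptions made for this sketch.

```python
from typing import List, Set, Tuple

# Compute services named in the text.
COMPUTE_SERVICES = ("nova-libvirtd", "nova-compute", "neutron-openvswitch-agent")


def build_switch_plan(
    running_containers: List[str], local_images: Set[str]
) -> List[Tuple[str, str]]:
    """Return the ordered actions that turn a control node into a compute node.

    The plan removes every existing container first and only then starts the
    compute services; the operating system and Docker layers are untouched.
    """
    missing = [s for s in COMPUTE_SERVICES if s not in local_images]
    if missing:
        # Images are expected to be pre-installed, so a missing one is an error.
        raise RuntimeError(f"images not pre-pulled: {missing}")
    plan = [("remove_container", name) for name in running_containers]
    plan += [("start_service", service) for service in COMPUTE_SERVICES]
    return plan
```

An automation tool would then execute each action in order; keeping the plan as plain data makes it easy to review or log before anything is changed on the node.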

Further, a node failure in the compute node group is determined as follows:

The monitoring system sends heartbeat packets to every compute node in the compute node group; if any compute node fails to receive the heartbeat packet, a node failure has occurred in the compute node group.
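The heartbeat check can be sketched as below. The `probe` callable stands in for whatever transport the monitoring system uses to deliver a heartbeat packet; it is a hypothetical hook, and treating an `OSError` as a failed delivery is an assumption of this sketch.

```python
from typing import Callable, Iterable, List


def failed_nodes(nodes: Iterable[str], probe: Callable[[str], bool]) -> List[str]:
    """Return the compute nodes that did not accept a heartbeat packet.

    `probe(node)` is assumed to return True when the heartbeat was delivered.
    """
    failed = []
    for node in nodes:
        try:
            alive = probe(node)
        except OSError:
            # A transport error is treated the same as a missed heartbeat.
            alive = False
        if not alive:
            failed.append(node)
    return failed


def group_has_failure(nodes: Iterable[str], probe: Callable[[str], bool]) -> bool:
    """The group is considered faulty as soon as one node misses a heartbeat."""
    return bool(failed_nodes(nodes, probe))
```

Returning the list of failed nodes, rather than only a flag, lets the caller report which node triggered the switch.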

Further, the methods for determining that the compute node group load is excessive include total load calculation and total load prediction.

Further, the total load calculation method is specifically:

The monitoring agent on each compute node in the compute node group collects the current load of that node, including CPU, memory, and network traffic. When the total load of the compute node group exceeds a preset threshold, the total load of the compute node group is excessive;
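The threshold check can be sketched as follows. The text does not say how the per-node loads are combined into a total, so this sketch assumes each metric is a utilisation in 0..1, takes the group mean per metric, and flags the group when any metric exceeds its threshold; these choices are illustrative only.

```python
from typing import Dict, List

Load = Dict[str, float]  # e.g. {"cpu": 0.7, "memory": 0.5, "network": 0.2}


def group_load(samples: List[Load]) -> Load:
    """Mean utilisation per metric across all compute nodes (assumed aggregate)."""
    if not samples:
        raise ValueError("no load samples collected")
    metrics = samples[0].keys()
    return {m: sum(s[m] for s in samples) / len(samples) for m in metrics}


def is_overloaded(samples: List[Load], thresholds: Load) -> bool:
    """True when any aggregated metric exceeds its preset threshold."""
    total = group_load(samples)
    return any(total[m] > limit for m, limit in thresholds.items())
```

A deployment that prefers a single combined figure could instead weight the three metrics into one number before comparing it with one threshold.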

Further, the total load prediction method is specifically:

Based on the historical monitoring data of the compute nodes, prediction is performed by a multiple-input single-output neural network linear regression model:

Z=WX+B

where Z is the predicted compute node load; X={x₁,x₂,…,x_N} is the input sample, which includes time, number of virtual machines, and number of tenants; W={w₁,w₂,…,w_N} is the weight matrix; and B={b₁} is the bias matrix. The mean squared error is used as the cost function, and W and B are computed by forward propagation and backpropagation. The total load of the compute node group is obtained from the predicted load values Z; if the total load exceeds the preset threshold, the total load of the compute node group is excessive.
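A minimal sketch of fitting Z=WX+B, assuming a single linear neuron trained by batch gradient descent on the mean squared error. The learning rate, epoch count, and the requirement that input features be scaled to a similar range are assumptions of this sketch; the text does not specify the training procedure beyond forward and backward passes.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Forward pass: Z = W.X + B for one input sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train(
    samples: List[Sequence[float]],
    targets: List[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit W and B by batch gradient descent on the mean squared error."""
    n_features = len(samples[0])
    m = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, z in zip(samples, targets):
            err = predict(w, b, x) - z          # forward pass and error
            for i, xi in enumerate(x):          # backward pass: d(MSE)/dw_i
                grad_w[i] += 2.0 * err * xi / m
            grad_b += 2.0 * err / m             # backward pass: d(MSE)/db
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

In use, each historical record would supply the features (for example scaled time, virtual machine count, and tenant count) and the observed load as the target; the predicted loads are then combined into a group total and compared with the threshold.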

A device for adaptively switching an OpenStack control node to a compute node comprises a memory and a processor; the memory stores a computer program, and the processor invokes the computer program to execute the steps of the method described above.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The present invention periodically monitors the state of the compute node group, including whether a node failure has occurred and whether the total load is excessive, and automatically triggers the process of switching a control node to a compute node. This achieves self-healing or capacity expansion of the compute node group and maintains the high availability of the cloud platform. The total load of the compute node group is obtained either by real-time calculation or by prediction; prediction is based on historical data and uses the neural network linear regression model, so that a node can be switched before the total load of the compute node group reaches the set threshold, avoiding any impact on the cloud platform;

(2) The present invention deploys every service of the cloud platform in containers, so that during node switching the environment can be cleaned up quickly simply by removing the original container services on the control node;

(3) The present invention stores the services of the cloud platform in a private Docker registry as layered images, the layers being, from top to bottom, the operating system base image, the cloud platform base image, the base image of each functional module, and the image of each service within a module, which reduces the total storage size of the images. The to-be-switched control node only needs to have downloaded all compute node service images from the private Docker registry in advance, without starting the corresponding containers; after its environment is cleaned up, the corresponding container services can be started quickly, giving high deployment efficiency;

(4) The present invention divides the control node group into a switchable control node group and a non-switchable control node group. The non-switchable control node group guarantees the high availability of the cloud platform, so that switching an individual node does not affect the continuity of cloud platform services.

Brief description of the drawings

Fig. 1 is a flow chart of adaptive node switching;

Fig. 2 is a framework diagram of adaptive node switching;

Fig. 3 is a flow chart of node switching in embodiment one;

Fig. 4 is a Docker container deployment diagram for the three types of nodes.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiments.

Embodiment one

A method for adaptively switching an OpenStack control node to a compute node, where the OpenStack deployment has a topology composed of a control node group and a compute node group. As shown in Fig. 1, the method comprises:

The election algorithm selects the node with the smallest reference index value. The reference index is the CPU load, the network traffic, or a Cost value obtained by weighted summation, calculated as follows:

Cost = Σ_{i=1}^{N} W_i·X_i

Quickly switching the to-be-switched control node to a compute node by an automated management tool combined with container technology is specifically:

A node failure in the compute node group is determined as follows:

The monitoring system sends heartbeat packets to every compute node in the compute node group; if any compute node fails to receive the heartbeat packet, a node failure has occurred in the compute node group.

An excessive load on the compute node group is determined as follows:

Specifically, in this embodiment step S2 is as shown in Fig. 3 and comprises:

101) every five minutes, a timer triggers the monitoring system to collect load information from each compute node of the cloud platform and to send heartbeat packets to the compute nodes;

102) determine whether a node failure or an excessive total load has occurred in the compute node group; if so, execute step 103), otherwise the process ends;

103) if the monitoring system is configured in silent mode, execute step 104) directly; otherwise notify the administrator by email or SMS, and execute step 104) if the administrator agrees, otherwise end the process;

104) elect a to-be-switched control node from the switchable control node group by the election algorithm;

105) automatically clean up the containers on the to-be-switched control node with Ansible, retaining the operating system layer;

106) automatically start the compute-node-related container services on the to-be-switched control node with Ansible, switching it to a compute node, add the compute node to the compute node group, and end the process.
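Steps 102) to 106) can be sketched as one monitoring cycle. Every callable passed in is a hypothetical hook standing in for the monitoring system, the administrator notification, the election algorithm, and the Ansible-driven switch; the five-minute timer of step 101) is assumed to invoke the function from outside.

```python
from typing import Callable, Optional

PERIOD_SECONDS = 300  # step 101): the timer fires every five minutes


def run_cycle(
    has_failure: Callable[[], bool],
    is_overloaded: Callable[[], bool],
    silent_mode: bool,
    ask_admin: Callable[[], bool],
    elect: Callable[[], str],
    switch_to_compute: Callable[[str], None],
) -> Optional[str]:
    """Run one monitoring cycle and return the switched node, if any."""
    # Step 102): stop unless the compute node group is faulty or overloaded.
    if not (has_failure() or is_overloaded()):
        return None
    # Step 103): in silent mode proceed directly, otherwise ask the administrator.
    if not silent_mode and not ask_admin():
        return None
    # Step 104): elect the to-be-switched control node.
    node = elect()
    # Steps 105) and 106): clean up containers, start compute services,
    # and add the node to the compute node group.
    switch_to_compute(node)
    return node
```

A scheduler would call `run_cycle` once per `PERIOD_SECONDS`; injecting the hooks keeps the control flow testable without a live cloud platform.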

Embodiment two

In this embodiment the total load of the compute node group is obtained by a prediction algorithm; everything else is the same as in embodiment one. The prediction algorithm is specifically:

Z=WX+B

where Z is the predicted compute node load; X={x₁,x₂,…,x_N} is the input sample, which includes time, number of virtual machines, and number of tenants; W={w₁,w₂,…,w_N} is the weight matrix; and B={b₁} is the bias matrix. The mean squared error is used as the cost function, and W and B are computed by forward propagation and backpropagation. The total load of the compute node group is obtained from the predicted load values Z of the compute nodes in the group; if the total load exceeds the preset threshold, the total load of the compute node group is excessive.

Embodiment three

A device for adaptively switching an OpenStack control node to a compute node, corresponding to embodiment one, where the OpenStack deployment includes a control node group and a compute node group. The device comprises:

a fault monitoring module, configured to determine whether a node failure has occurred in the compute node group by sending heartbeat packets to every compute node in the group;

a load detection module, configured to collect load information from every compute node in the compute node group, calculate the total load of the group from the collected load information or predict it from historical load information, and determine from the set load threshold whether the group is overloaded;

a node processing module, configured to divide the control node group into a switchable control node group and a non-switchable control node group, elect a to-be-switched control node from the switchable control node group by the election algorithm, switch the to-be-switched control node to a compute node by an automated management tool combined with container technology, and add that compute node to the compute node group in which the node failure or excessive total load occurred;

a timing trigger module, configured to set the monitoring period and to trigger the fault monitoring module and the load detection module periodically according to that period.

As a peripheral device of the cloud platform, the device of this embodiment monitors the load information and fault state of the compute nodes, the load information including CPU, memory, and key network traffic, and also manages the node switching process.

The cloud platform architecture uses an M+N node topology, a model with M control nodes and N compute nodes, where control nodes can double as network nodes. The M control nodes are divided into several control node groups, which are in turn divided by custom labels into a switchable control node group and a non-switchable control node group. The non-switchable control node group numbers at least 3, which guarantees the minimum count required for high availability of the cloud platform.

The control node group provides a highly available cloud platform management and control service: it provides the application programming interface (API) services of the compute module, the cloud disk management module, and the image management module, together with internal working components such as the controller and scheduler components. These services are stateless, and load balancing and high availability are achieved with haproxy+keepalived. It also provides the shared database and message queue services, both of which are stateful: the database service forms a multi-master high-availability cluster with MySQL Galera, and the RabbitMQ cluster makes the message queue highly available through mirrored mode;

Through the L3 Agent, the control node group can also provide highly available cloud platform management and control services including the tenant network gateway, external network access, floating IP, and virtual firewall. As a supplement to the control node group, the switchable control node group can provide management services and virtual network services, improving the management performance of the cloud platform and increasing the service bandwidth of the centralized virtual network; active/standby high availability of virtual routers is achieved with keepalived.

The compute node group provides services such as compute resource virtualization, compute resource management, and the virtual layer-2 network agent, and supplies virtual machines with compute resources such as CPU and memory.

The OpenStack cloud platform is deployed in containers: every service is packaged into a corresponding Docker image and started by starting a container, and all nodes keep the operating system version and Docker service version consistent. This avoids dependency conflicts between services, makes upgrading and rolling back each service straightforward, and effectively addresses the difficulty of deploying and upgrading the cloud platform.

As shown in Fig. 2, the container image of each service is stored in a local private Docker registry. Exploiting the layered nature of container images, all custom images of the cloud platform are built in layers. There are four layers, from top to bottom: the operating system base image, the cloud platform base image, the base image of each functional module, and the image of each service within a module. Layering avoids installing dependency packages repeatedly, reduces the total storage size of the images, and improves deployment efficiency.

All services of the cloud platform are started as Docker containers, and all nodes keep the operating system version and Docker service version consistent, which ensures that node switching is smooth and stable.

As shown in Fig. 4, the Docker containers of a control node include the API services and internal components of each cloud platform module. The shared database, message queue, and load balancing services all implement high-availability schemes. Because every service of the control node group as a whole is highly available, the continuity of cloud platform services during node switching is maintained, and switching a single control node does not interrupt the services of the cloud platform.

The Docker containers of a compute node include the nova-compute compute service, the neutron-openvswitch-agent virtual layer-2 network service, and so on. Nodes in the switchable control node group download all compute node service images in advance without starting the corresponding containers, so the corresponding container services can be started quickly when such a node is switched to a compute node.

When a control node in the switchable control node group is deployed, the container images required by a compute node are pre-installed on it, and when cloud platform services are upgraded these images are updated in step. The compute services of the cloud platform can then be started directly from these images, which avoids the performance degradation caused by transferring large files over the network.

Embodiments one, two, and three trigger the process of switching a control node to a compute node either from the current state, namely a node failure or excessive load in the compute node group, or from a prediction made on historical data by the multiple-input single-output neural network linear regression model. Because platform services are deployed in containers, switching a node only requires deleting the old container services and then starting the services quickly from the pre-downloaded Docker images. This fast process guarantees the timeliness and high availability of cloud platform switching.

The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that those skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of the present invention shall fall within the protection scope determined by the claims.

Claims

1. A method for adaptively switching an OpenStack control node to a compute node, the OpenStack deployment including several control node groups and a compute node group, characterized in that the method comprises:

S1: dividing the several control node groups into a switchable control node group and a non-switchable control node group, and electing a to-be-switched control node from the switchable control node group by an election algorithm;

S2: running monitoring on a periodic timer; if a node failure or an excessive total load is found in the compute node group, triggering an adaptive upgrade process, otherwise ending the process;

wherein the adaptive upgrade process is specifically: switching the to-be-switched control node to a compute node by an automated management tool combined with container technology, and adding it to the compute node group of step S2.

2. The method for adaptively switching an OpenStack control node to a compute node according to claim 1, characterized in that the control node groups are divided by means of custom labels, and the non-switchable control node group numbers at least 3.

3. The method for adaptively switching an OpenStack control node to a compute node according to claim 1, characterized in that the election algorithm selects the node with the smallest reference index value, the reference index being the CPU load, the network traffic, or a Cost value;

wherein the Cost value is obtained by weighted summation, calculated as follows:

Cost = Σ_{i=1}^{N} W_i·X_i

where W_i is a weight, X_i is one or more of the input parameters CPU usage, memory usage, and network traffic, and N is the number of input parameters.

4. The method for adaptively switching an OpenStack control node to a compute node according to claim 1, wherein quickly switching the to-be-switched control node to a compute node by an automated management tool combined with container technology is specifically:

cleaning up all containers on the to-be-switched control node with an automated deployment tool, retaining the operating system layer and the Docker service layer that are identical to those of a compute node, and quickly starting each service of the compute node being switched in.

5. The method for adaptively switching an OpenStack control node to a compute node according to claim 1, characterized in that a node failure in the compute node group is determined as follows:

6. The method for adaptively switching an OpenStack control node to a compute node according to claim 1, characterized in that the methods for determining that the compute node group load is excessive include total load calculation and total load prediction.

7. The method for adaptively switching an OpenStack control node to a compute node according to claim 6, characterized in that the total load calculation method for the compute node group is specifically:

the monitoring agent on each compute node in the compute node group collects the current load of that node, the load including CPU, memory, and network traffic; when the total load of the compute node group exceeds a preset threshold, the total load of the compute node group is excessive.

8. The method for adaptively switching an OpenStack control node to a compute node according to claim 6, characterized in that the total load prediction method for the compute node group is specifically: based on the historical monitoring data of the compute nodes, prediction is performed by a multiple-input single-output neural network linear regression model;

wherein the neural network linear regression model is:

Z=WX+B

where Z is the predicted compute node load; X={x₁,x₂,…,x_N} is the input sample, which includes time, number of virtual machines, and number of tenants; W={w₁,w₂,…,w_N} is the weight matrix; and B={b₁} is the bias matrix; the total load of the compute node group is obtained from the predicted load values Z of the compute nodes in the group, and if the total load exceeds the preset threshold, the total load of the compute node group is excessive.

9. A device for adaptively switching an OpenStack control node to a compute node, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor invokes the computer program to execute the steps of the method according to any one of claims 1-8.