CN110647379B - Method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on an OpenStack cloud - Google Patents


Info

Publication number
CN110647379B
Authority
CN
China
Prior art keywords
cluster
deployment
node
value
utilization rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810682329.9A
Other languages
Chinese (zh)
Other versions
CN110647379A (en)
Inventor
吕智慧
吴杰
强浩
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201810682329.9A priority Critical patent/CN110647379B/en
Publication of CN110647379A publication Critical patent/CN110647379A/en
Application granted granted Critical
Publication of CN110647379B publication Critical patent/CN110647379B/en


Classifications

    (Common CPC path for the G06F entries: G PHYSICS → G06 COMPUTING; CALCULATING OR COUNTING → G06F ELECTRIC DIGITAL DATA PROCESSING → G06F9/00 Arrangements for program control → G06F9/06 … using stored programs → G06F9/44 Arrangements for executing specific programs → G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation)
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45583 Memory management, e.g. access or allocation
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS → Y02 TECHNOLOGIES FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE → Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN ICT)

Abstract

The application belongs to the technical field of cloud computing, and specifically relates to a method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on an OpenStack cloud. In the Hadoop cluster deployment stage, the method combines an automatic cluster scaling strategy based on resource utilization with a node replacement mechanism based on task success rate. The application enables OpenStack to provide better support for Hadoop clusters: the cluster scale can be adjusted to the service load of different time periods while the service processing speed is maintained.

Description

Method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on an OpenStack cloud
Technical Field
The application belongs to the technical field of cloud computing, and relates to a method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on an OpenStack cloud.
Background
The prior art discloses that Sahara can be integrated with third-party management tools (e.g., Apache Ambari and the Cloudera management console) via a plugin mechanism. The core part of Sahara is responsible for interaction with the user, and provisions OpenStack resources (such as virtual machines, servers and security groups) through the Heat component; the Plugin is responsible for installing and configuring the Hadoop cluster in the pre-allocated virtual machines, and may additionally be a tool for managing and monitoring the cluster deployment. Sahara provides a unified mechanism for a Plugin to work in the pre-assigned virtual machines: on the one hand, the Plugin must inherit the sahara.plugins.provisioning.ProvisioningPluginBase class and implement all necessary methods/interfaces; on the other hand, the virtual machine object provided by Sahara has a remote attribute that can be used to interact with the virtual machine, which is operated by remotely invoking commands through instance.remote().
Based on the current state of the art, the inventors of the present application propose a method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on the OpenStack cloud, which supplements and optimizes the automatic scaling mechanism of Hadoop cluster deployment, adjusting the cluster scale to delete redundant nodes, replace problem nodes and deploy new nodes.
Disclosure of Invention
The application aims, based on the current state of the art, to provide a method for automatic elastic scaling deployment and Plugin deployment of Hadoop clusters based on the OpenStack cloud, which supplements and optimizes the automatic scaling mechanism of Hadoop cluster deployment, adjusting the cluster scale to delete redundant nodes, replace problem nodes and deploy new nodes.
The purpose of the application is achieved by the following technical scheme:
the application provides a Hadoop cluster automatic telescopic deployment method based on an OpenStack cloud, which comprises the steps of integrating a Sahara module in the OpenStack cloud with a third party management tool through a Plugin mechanism, referencing requirements, distributing a proper amount of virtual machines for a required Hadoop cluster by combining the automatic telescopic deployment method, and installing and configuring the Hadoop cluster in a pre-distributed virtual machine.
Specifically, the method for automatic elastic scaling deployment of Hadoop clusters based on the OpenStack cloud completes the automatic scaling deployment of Hadoop clusters in a cloud environment according to predictions and real-time conditions; the method specifically comprises the following:
(1) Automatic scaling strategy based on utilization
The application introduces ê_C, ê_R, ê_D to represent the user's expected values for the utilization of the cluster CPU, memory (RAM) and hard disk (Disk) respectively; in the actual situation the three utilizations are l_C, l_R, l_D. Because users attach different importance to different resources, λ_C, λ_R, λ_D are introduced as the respective weights of the three terms, so that the following is obtained:

η_i = |l_i − ê_i|, i ∈ {C, R, D} (Definition 1)

φ = λ_C·η_C + λ_R·η_R + λ_D·η_D (Definition 2)

Definition 1 represents the difference between the actual and expected utilization under each index, and specifically reflects the deviation of each index; Definition 2 represents the composite value of the differences between the actual utilization of the cluster CPU, memory RAM and hard disk and their expected values. Assuming the weights sum to 1, this value also ranges within [0, 1): the closer it is to 0, the closer the cluster utilization is to the user's expectation;
based on the above, the application provides an automatic expansion strategy based on the utilization rate;
(2) Rapid automatic scaling deployment strategy based on task success rate
The application introduces into this strategy a variable s, which lies in [0,1] and represents the proportion of tasks on a single node that run successfully. In the most ideal case, s = 1 for every node, indicating that all tasks execute successfully and their results are output smoothly. Because occasional task failures on a node are unavoidable, a node cannot be replaced the moment an error occurs; that would be unreasonable and would increase system overhead. Therefore a threshold value ŝ is set for s: if s approaches 0, the task success rate of that node is too small, and continuing to use the node would reduce the running efficiency of the cluster, so the node replacement strategy is started;
based on the above, the application provides an automatic telescopic quick deployment strategy based on task success rate.
In the application, the automatic elastic scaling deployment of Hadoop clusters based on the OpenStack cloud comprises the following two processes:
1. Automatic scaling deployment mechanism
(1) Automatic scaling strategy based on utilization
The final goal of the automatic scaling strategy algorithm based on cloud-platform resource utilization is: by combining the cloud platform's resources with the requirements of the current application scenario, to utilize the computing resources of the cloud platform more reasonably and to optimize the automatic scaling mechanism of Hadoop cluster deployment, so that the running efficiency of the cluster reaches a better result;
the use conditions of three indexes of a CPU, a memory and a hard disk play an important role in the implementation process of an automatic expansion function based on the utilization rate, and the application mainly reflects the utilization rate from the three indexes;
the application is introduced intoRespectively representing expected values of users for three utilization rates of cluster CPU, memory RAM and hard Disk, wherein in actual conditions, the three utilization rates are respectively l C 、l R 、l D Lambda is introduced in the different resources due to the different degree of importance of the user C 、λ R 、λ D The three terms are respectively used as weights of the three terms, so that the following data are obtained,
φ=λ C ·η CR ·η RD ·η D (definition 2)
Definition 1 represents the actual practice under each indexThe difference between the actual utilization rate and the expected utilization rate is specifically reflected by the index, and the platform adjusts the cluster resource allocation condition according to the difference; in the actual case of a device, in which the device,the calculation result of eta is a value interval, and the minimum value in the eta value interval range contained in the [0, 1) interval is taken as the degree of coincidence between the calculation result and the user expectation; definition 2 represents the comprehensive value of the difference between the actual utilization rate of the cluster CPU, the memory RAM and the hard Disk and the expected value, the range of the value is also within [0,1 ], and the closer to 0, the closer to the expected value, the more the utilization rate of the cluster accords with the expected value;
in the algorithm, firstly, expected values of three utilization rates of a user on a cluster CPU, a memory RAM and a hard Disk are obtained, the expected values are compared with actual utilization rates of the three utilization rates in a platform, if the actual utilization rate is smaller than the minimum value of the corresponding expected value, datanode and Namenode services are closed, a virtual machine is closed, and if the actual utilization rate is larger than the minimum value of the corresponding expected value and smaller than the maximum value of the corresponding expected value, the virtual machine is started, a Hadoop cluster is deployed, and the virtual machine is started;
(2) Rapid automatic scaling deployment strategy based on task success rate
In the design of Hadoop, any node may encounter various problems that cause its allocated tasks to fail. The application adopts the approach of replacing nodes with a high failure rate, so that failures caused by physical factors of the node itself can be avoided;
meanwhile, before the node which provides the computing service for the node is replaced to formally service the cluster, point-to-point data block copying is needed to be carried out, so that the data can be correctly stored;
to implement this strategy, the presentIn the application, a variable is firstly introduced Is between [0,1 ]]Is added in the percentage of the total weight of the product,
representing the proportion of tasks on a single node that can run successfully; in the most ideal case, each nodeIndicating that all tasks can be successfully executed, the result can be smoothly output, and because the failure of executing the task of the node is unavoidable, the node cannot be replaced immediately as soon as an error occurs, which is not only unreasonable, but also leads to increase of system overhead, and
in the application, toHas a predicted value->If the value approaches 0, the success rate of the task representing the node is too small, and the continued use of the node can reduce the running efficiency of the cluster, so that the node replacement strategy is started;
the process steps of the algorithm of the present application are described as follows:
step 1: select an appropriate value ŝ as the lowest criterion for measuring task success rate
step 2: calculate the task success rate s of a node over a certain time period
step 3: compare s with ŝ; if s > ŝ, return to step 2 for the next node; if s < ŝ, proceed to the next step
step 4: apply for a new node
step 5: deploy the Hadoop application on the newly applied node and copy over the data from the original node
step 6: start the services on the new node and suspend the services of the original node
step 7: terminate the original node and return to step 2 for the next node
Through such node replacement, the task execution of the whole cluster is finally optimized, and in the replacement process failures caused by physical aspects of the virtual machines can be avoided to the greatest extent.
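The seven steps above can be sketched as a small loop. The node objects, the source of the task statistics and the replacement calls below are hypothetical stand-ins for the Sahara/OpenStack operations the text describes; only the control flow follows the steps.

```python
# Sketch of the task-success-rate replacement loop (steps 1-7 above).

S_MIN = 0.6  # step 1: chosen threshold s_hat, the lowest acceptable success rate

def success_rate(stats):
    """Step 2: fraction of tasks on one node that completed successfully."""
    succeeded, failed = stats
    total = succeeded + failed
    return succeeded / total if total else 1.0

def replace_low_success_nodes(cluster):
    """Steps 3-7: replace every node whose success rate falls below S_MIN.

    `cluster` maps node name -> (succeeded, failed) task counts; returns the
    list of replaced node names in order."""
    replaced = []
    for node, stats in cluster.items():
        if success_rate(stats) >= S_MIN:
            continue                      # step 3: healthy, check the next node
        new_node = "new-" + node          # step 4: apply for a new node
        # step 5: deploy Hadoop on new_node and copy data from the old node
        # step 6: start services on new_node, suspend the old node
        # step 7: terminate the old node, move on to the next one
        replaced.append(node)
    return replaced
```

With counts of (9 succeeded, 1 failed) a node stays; with (2, 8) its rate 0.2 falls below the 0.6 threshold and it is scheduled for replacement.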
2. Cluster automatic deployment Plugin implementation
(1) Cluster Plugin implementation interface
The cluster plugin exists as an independent plugin, in the form of a separate directory under the sahara/sahara/plugins directory; the main directory structure is shown in FIG. 1;
wherein:
1. As shown in FIG. 2, the v2_7_1 directory contains content specific to version 2.7.1 of the plugin, hadoop2 contains the general content, and the outermost versionfactory.py identifies the directory names by regular-expression matching in order to obtain the corresponding version numbers;
2. plugin.py is the core, responsible for implementing all necessary interfaces; the interfaces that specifically need to be implemented are shown in Table 1:
3. As shown in FIG. 3, the functional implementation is mainly divided into two parts: configuration and startup of the cluster. config_helper.py is the core module of the configuration, in which the paths of the relevant configuration files are set and the environment variables are configured;
4. As shown in FIG. 5, versionhandler.py specifically completes the configuration and startup of the cluster according to the current plugin version;
5. run_scripts/starting_scripts.py specifically implements the startup of the cluster: using the instance.remote() method, it connects over ssh to the booted virtual machines and executes the corresponding Linux commands, thereby specifically controlling the startup of processes, nodes, the cluster and so on.
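Structurally, plugin.py amounts to a subclass of Sahara's provisioning base class. The sketch below is illustrative only: ProvisioningPluginBase is redefined locally as a stand-in (the real class lives in sahara.plugins.provisioning and is only importable inside a Sahara deployment), and the method names and return values are assumptions modeled on the pattern the text describes, not an exact copy of Sahara's interface table.

```python
# Structural sketch of the plugin class that plugin.py must provide.

class ProvisioningPluginBase:          # local stand-in for the Sahara base class
    def get_versions(self): raise NotImplementedError
    def configure_cluster(self, cluster): raise NotImplementedError
    def start_cluster(self, cluster): raise NotImplementedError

class HadoopClusterPlugin(ProvisioningPluginBase):
    """Skeleton of the cluster plugin living under sahara/plugins/<name>/."""

    def get_versions(self):
        # versionfactory.py would discover these from the version directories
        return ["2.7.1"]

    def configure_cluster(self, cluster):
        # config_helper.py: write the configuration files, set env variables
        return "configured %s" % cluster

    def start_cluster(self, cluster):
        # run_scripts/starting_scripts.py: ssh into each VM, start services
        return "started %s" % cluster
```

Sahara instantiates the plugin and drives these hooks in order: version discovery, cluster configuration, then cluster startup.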
(2) Implementation of the cluster image packaging tool
1. OpenStack virtual machine image
In the embodiment of the application, taking the CentOS operating system as an example, the process and principle of building an OpenStack virtual machine image are briefly introduced;
2. Cluster image
In the application, the cluster image is built using Diskimage-builder.
The application carried out Hadoop cluster automatic scaling deployment experiments. One group of deployments was selected as representative: multiple tests were performed, the index data before and after optimization were compared on that basis, and the deployment service was analyzed. Table 2 shows the cluster configurations.
Table 2 Cluster configuration

Cluster     vCPU    RAM     Disk    Nodes
Cluster 1   4 core  10 GB   5 GB    8
Cluster 2   2 core  100 GB  100 GB  16
Cluster 3   1 core  5 GB    80 GB   8
Cluster 4   1 core  5 GB    80 GB   16
Cluster 5   1 core  5 GB    80 GB   24
Cluster 6   1 core  5 GB    80 GB   48
The test results show that the speed of deployment after optimization is considerably improved compared with deployment before optimization. When the number of cluster nodes to be deployed is small, the optimization effect is not obvious; but as the number of cluster nodes increases, the deployment time before optimization rises markedly, whereas with the optimized deployment service the deployment time grows much more gently with cluster scale. The deployment times of the 6 clusters are close to each other and remain stable within the range of 10 to 20 minutes. These results show that the cluster deployment time after optimization is clearly improved and more stable. Moreover, even for small-scale cluster deployments the optimized deployment service still shows its effect, and the success rate is obviously improved, demonstrating that the deployment-service optimization is successful. The Hadoop cluster automatic scaling deployment strategy provided by the application can therefore optimize the automatic deployment of Hadoop clusters, making the deployment service more stable and efficient.
Drawings
FIG. 1 shows the directory structure of the cluster plugin, which exists as an independent plugin in the form of a separate directory under the sahara/sahara/plugins directory.
FIG. 2 shows that the v2_7_1 directory contains content specific to version 2.7.1 of the plugin, that hadoop2 contains the general content, and that the outermost versionfactory.py identifies the directory names by regular-expression matching in order to obtain the corresponding version numbers.
FIG. 3 shows that the functional implementation is mainly divided into two parts, configuration and startup of the cluster; config_helper.py is the core module of the configuration, in which the paths of the relevant configuration files are set and the environment variables are configured, and config_helper.py is also responsible for specifically generating the corresponding configuration files, according to the user's configuration, for the cluster to be started.
FIG. 4 intercepts a small portion of the Spark environment-variable configuration of FIG. 3 to show the work done by config_helper.py.
FIG. 5 shows that versionhandler.py specifically completes the configuration and startup of the cluster according to the current plugin version; the overall flow of cluster startup is written in the start_cluster method.
FIG. 6 shows that a virtual machine image in qcow2 format is created, 10G in size.
FIG. 7 is a schematic diagram of a cluster.
FIG. 8 shows a web page of the cluster creation service, on which the relevant requirements for cluster creation are submitted, including the selection of the Hadoop version, the configuration of the nodes as listed in Table 2, the selection of the image, and so on; after the relevant specification is completed, the cluster enters the deployment phase.
FIG. 9 shows that the speed of deployment after optimization is significantly improved compared with before optimization; the deployment time of the optimized cluster deployment is significantly reduced and the required time is more stable.
FIG. 10 shows that, compared with before optimization, the optimized deployment service still shows its effect even in small-scale cluster deployments, and the improvement in success rate is obvious, demonstrating the success of the deployment-service optimization of the application.
Detailed Description
The technical scheme of the application is specifically described below with reference to the accompanying drawings and examples.
The application aims to provide a method for automatic elastic scaling deployment of Hadoop clusters based on an OpenStack cloud. As shown in FIG. 1, the method is based on the Sahara module in the OpenStack cloud, integrates with third-party management tools through the Plugin mechanism and, with reference to the stated requirements and in combination with the automatic scaling deployment method, allocates an appropriate number of virtual machines to the required Hadoop cluster and installs and configures the Hadoop cluster in the pre-allocated virtual machines.
In the application, the automatic elastic scaling deployment of Hadoop clusters based on the OpenStack cloud comprises the following two processes:
1. Automatic scaling deployment mechanism
(1) Automatic scaling strategy based on utilization
The final goal of the automatic scaling strategy algorithm based on cloud-platform resource utilization is: by combining the cloud platform's resources with the requirements of the current application scenario, to utilize the computing resources of the cloud platform more reasonably and to optimize the automatic scaling mechanism of Hadoop cluster deployment, so that the running efficiency of the cluster reaches a better result.
When allocating the resources occupied by virtual machines, attention is usually paid to three aspects: CPU, memory and hard disk; the usage of these three indices is therefore necessarily important in implementing the utilization-based automatic scaling function. In the present embodiment, utilization is mainly measured by these three indices.
The application introduces ê_C, ê_R, ê_D to represent the user's expected values for the utilization of the cluster CPU, memory (RAM) and hard disk (Disk) respectively; in the actual situation the three utilizations are l_C, l_R, l_D. Because users attach different importance to different resources, λ_C, λ_R, λ_D are introduced as the respective weights of the three terms. The following is thus obtained:

η_i = |l_i − ê_i|, i ∈ {C, R, D} (Definition 1)

φ = λ_C·η_C + λ_R·η_R + λ_D·η_D (Definition 2)

Definition 1 represents the difference between the actual and expected utilization under each index; this index specifically reflects the deviation, and the platform adjusts the cluster resource allocation accordingly. In practice, the expected value supplied by the user is an interval rather than a point, so the calculated η is also a value interval; the minimum value of the η interval that falls within [0, 1) is taken as the degree to which the cluster matches the user's expectation. Definition 2 represents the composite value of the differences between the actual utilization of the cluster CPU, memory RAM and hard disk and their expected values; this value also ranges within [0, 1), and the closer it is to 0, the closer the cluster utilization is to expectation;
the algorithm firstly obtains expected values of three utilization rates of a user on a cluster CPU, a memory RAM and a hard Disk, compares the expected values with actual utilization rates of the three utilization rates in a platform, and closes the Datanode and Namenoode services and closes the virtual machine if the actual utilization rate is smaller than the minimum value of the corresponding expected values; if the actual utilization value is larger than the minimum value of the corresponding expected value and smaller than the maximum value of the corresponding expected value, starting the virtual machine, deploying the Hadoop cluster, and starting.
(2) Rapid automatic scaling deployment strategy based on task success rate
In the design of Hadoop, any node may encounter various problems that cause its allocated tasks to fail, and failed tasks are rerun. If the probability of node failure rises, then although Hadoop has its own criteria for allocating tasks, the running time of the whole cluster increases; the higher a node's task failure rate, the larger the increase. Even if such a node is continuously performing task computation with reasonable utilization of each resource, it actually affects the running of the whole job to a certain extent. The causes of task failure are many, and in order to affect the other nodes of the whole cluster as little as possible, the application adopts the approach of replacing nodes with a high failure rate, so that failures caused by physical factors of the node itself can be avoided;
meanwhile, as the principle that the economic benefit of mobile computing is higher than that of mobile data, hadoop can distribute computing tasks to nodes with data blocks required by the computing as much as possible, before the nodes finish storing the required data in a safe mode, if the data are deleted directly, the clusters can be caused to enter the safe mode again and need to wait for the whole cluster, so before the nodes providing computing services for the nodes are formally served for the cluster, the nodes need to copy the data blocks point to point, the data can be ensured to be correctly stored, the state of the whole cluster can be better adjusted, the weight increment to loads of other nodes caused by failure of single nodes is avoided, and even the condition of scale effect occurs;
to implement the above strategy, the present application first introduces a variable Is between [0,1 ]]Represents the percentage of
The proportion of tasks on a single node that can run successfully; in the most ideal case, each nodeIndicating that all tasks can be successfully executed, the result can be smoothly output, and the node cannot be replaced immediately as soon as an error occurs due to unavoidable failure of executing the task of the node, which is unreasonable and causes increase of system overhead; in the application, the pair
Has a predicted valueIf the value approaches 0, the task success rate of the node is too small, and the continued use of the node reduces the running efficiency of the cluster, so that the application enables a node replacement strategy;
the process steps of the algorithm are described as follows:
step 1: select an appropriate value ŝ as the lowest criterion for measuring task success rate
step 2: calculate the task success rate s of a node over a certain time period
step 3: compare s with ŝ; if s > ŝ, return to step 2 for the next node; if s < ŝ, proceed to the next step
step 4: apply for a new node
step 5: deploy the Hadoop application on the newly applied node and copy over the data from the original node
step 6: start the services on the new node and suspend the services of the original node
step 7: terminate the original node and return to step 2 for the next node
Through such node replacement, the task execution of the whole cluster is finally optimized, and in the replacement process failures caused by physical aspects of the virtual machines can be avoided to the greatest extent.
2. Cluster automatic deployment Plugin implementation
(3) Cluster Plugin implementation interface
The cluster plugin exists as a separate directory under the sahara/sahara/plugins directory; its main directory structure is shown in figure 1,
wherein:
as shown in fig. 2, the v2_7_1 directory contains content specific to version 2.7.1 of the plugin, hadoop2 contains the common content, and the outermost versionfactory.py identifies the directory name by regular-expression matching so as to obtain the corresponding version number.
plugin.py is the core, responsible for implementing all the necessary interfaces; the interfaces that must be implemented are shown in table 1:
TABLE 1
FIG. 3 shows that the functional implementation is mainly divided into two parts, configuring and starting the cluster; config_helper.py is the core module for configuration, in which the paths of the relevant configuration files are set and the environment variables are configured; in addition, config_helper.py is responsible for generating the corresponding configuration files, according to the user's configuration, for the cluster about to be started; fig. 4 shows an excerpt in which it configures a few Spark environment variables;
as shown in fig. 5, versionhandler.py completes the configuration and startup of the cluster according to the current plugin version, and the overall flow of starting the cluster is written in the start_cluster method;
and run_scripts/starting_scripts.py implements the actual startup of the cluster: the instance.remote() method is used to connect over ssh to the virtual machines that have been started and to execute the corresponding Linux commands, thereby controlling the startup of processes, nodes, the cluster, and so on.
(4) Implementation of cluster mirror image packaging manufacturing tool
OpenStack virtual machine mirroring:
taking the CentOS operating system as an example, the process and principle of making the OpenStack virtual machine image in this embodiment are as follows:
1) Download a CentOS installation ISO image;
2) Perform the installation with the virt-manager tool or the virt-install command; FIG. 6 is an example of installation using the command line;
FIG. 6 shows that a virtual machine image in qcow2 format is created with a size of 10 GB; if virt-manager is used, the installation proceeds step by step through graphical prompts; during installation, some additional configuration is needed, such as bringing up the Ethernet interface, setting the host name, specifying the installation source and storage device, partitioning the disk, and setting the root password;
3) After step 2), log in to the newly installed virtual machine as the root user and perform the relevant configuration, such as installing the ACPI service, installing the cloud-init package, configuring partition-resize support, disabling the zeroconf route, and configuring console log output;
4) After the configuration is completed, closing the virtual machine;
5) Clearing MAC address information;
6) Compressing the mirror image.
After the above steps are finished, an ordinary OpenStack virtual machine image has been made and can be uploaded to the OpenStack platform for use.
Cluster mirroring:
clusters of different types and versions require images of the corresponding type and version as support; when making a cluster image, in addition to the basic OpenStack image-making steps, all relevant software packages (such as Hadoop and Spark) must be downloaded, installed, and configured in the image; in this application, Diskimage-builder is used to make the cluster images.
The application performs a Hadoop cluster automatic telescopic deployment experiment
A group of deployments is selected as representative of the experiment; multiple tests are performed, the index data before and after optimization are compared on this basis, and the deployment service is analyzed.
Table 2 Cluster configuration
Cluster    vCPU    RAM     Disk    Node number
Cluster 1  4 core  10 GB   5 GB    8
Cluster 2  2 core  100 GB  100 GB  16
Cluster 3  1 core  5 GB    80 GB   8
Cluster 4  1 core  5 GB    80 GB   16
Cluster 5  1 core  5 GB    80 GB   24
Cluster 6  1 core  5 GB    80 GB   48
in the experiment, six different Hadoop clusters are deployed; six clusters of different scales and configurations are studied, and there are 6 physical computing nodes in total; because OpenStack can control the placement of virtual machines to a certain extent according to resource usage, the virtual machines can be distributed uniformly across the nodes; to reduce the uncertainty introduced by other factors, in this embodiment cluster deployment is performed directly on 6 identical machines, and table 2 shows the specific node resource configuration of the clusters;
FIG. 8 shows the web page of the cluster creation service, on which the requirements for cluster creation are submitted, including the Hadoop version, the node configuration shown in table 2, the image selection, and so on; after the relevant values are assigned, the cluster enters the deployment stage; in this experiment, the experimental results before and after optimization are compared with respect to deployment speed and success rate;
in the two sets of comparison experiments, the six Hadoop clusters are each deployed several times; abnormal or failed runs are excluded, and the average value is taken as the deployment time of the cluster; the results are shown in fig. 9: deployment after optimization is considerably faster than before optimization; when the number of cluster nodes to be deployed is small, the optimization effect is not obvious, but as the number of nodes increases, the deployment time before optimization grows markedly, whereas with the optimized deployment service the deployment time grows only mildly with cluster scale; the deployment times of the 6 clusters are close to one another and remain stable within a range of 10 to 20 minutes, which shows that the cluster deployment time after optimization is clearly improved and more stable; as shown in fig. 10, the optimized deployment service shows its optimizing effect even for smaller-scale cluster deployments compared with the service before optimization; the results show that as the cluster scale increases, the success rate of cluster deployment decreases to a certain extent because of various uncertainties: before optimization the decrease is obvious and the stability of the deployment service is poor; after optimization the success rate also decreases, but the decrease is smaller, tends to level off, and remains at a very high level; the experimental results prove that the effect after optimization is clearly better in terms of success rate, and that the deployment-service optimization is successful.
The Hadoop cluster automatic telescopic deployment strategy can optimize the automatic deployment of the Hadoop cluster, so that the deployment service is more stable and efficient.

Claims (1)

1. A method for carrying out Hadoop cluster automatic telescopic deployment based on the OpenStack cloud, characterized by integrating the Sahara module in the OpenStack cloud with a third-party management tool through a Plugin mechanism; according to the set conditions, namely the utilization rate requirement and the task success rate requirement, combined with the automatic telescopic deployment method, an appropriate number of virtual machines are allocated to the required Hadoop cluster, and the Hadoop cluster is installed and configured in the pre-allocated virtual machines;
the method completes automatic telescopic deployment of the Hadoop cluster in the cloud environment according to the prediction and real-time conditions, and comprises the following steps:
(1) Automatic telescoping strategy based on utilization rate
Denote by e_C, e_R, e_D the user's expectations for the three utilization rates of the cluster CPU, memory RAM, and hard Disk respectively, and by u_C, u_R, u_D the corresponding actual utilization values; according to the different importance the user attaches to different resources, λ_C, λ_R, λ_D are introduced as the respective weights of the three items; the following are thus obtained:

η_X = |u_X − e_X|, X ∈ {C, R, D}   (definition 1)

φ = λ_C·η_C + λ_R·η_R + λ_D·η_D   (definition 2)

wherein definition 1 represents the difference between the actual utilization rate and the expected utilization rate under each index, reflecting the gap for that index specifically; definition 2 represents the combined value of the differences between the actual utilization of the cluster CPU, memory RAM, and hard Disk and the expected values, the value lying within [0,1]; the closer it is to 0, the better the utilization of the cluster matches the user's expectation;
(2) Automatic telescopic quick deployment strategy based on task success rate
A variable S lying in [0,1] is introduced into the policy, representing the proportion of tasks on a single node that run successfully; S = 1 on every node indicates that all tasks can be executed successfully and the results output smoothly;
since task execution failures on a node are unavoidable, a predicted value S_min is set for S; if S approaches 0, the task success rate of the node is too small and continued use of the node reduces the cluster operation efficiency, and the node replacement strategy is enabled.
CN201810682329.9A 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud Active CN110647379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810682329.9A CN110647379B (en) 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud


Publications (2)

Publication Number Publication Date
CN110647379A CN110647379A (en) 2020-01-03
CN110647379B true CN110647379B (en) 2023-10-17

Family

ID=68988861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810682329.9A Active CN110647379B (en) 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud

Country Status (1)

Country Link
CN (1) CN110647379B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104320460A (en) * 2014-10-24 2015-01-28 西安未来国际信息股份有限公司 Big data processing method
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack
CN106982137A (en) * 2017-03-08 2017-07-25 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging methods based on kylin cloud computing platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105900518B (en) * 2013-08-27 2019-08-20 华为技术有限公司 System and method for mobile network feature virtualization


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Elastic provisioning of virtual Hadoop clusters in OpenStack-based Clouds; Antonio Corradi et al.; IEEE ICC 2015 - Workshop on Cloud Computing Systems, Networks, and Applications (CCSNA); 2015-12-31; pp. 1914-1920 *
Research on Hadoop Resource Scheduling Strategies Based on an IaaS Cloud Platform; Wang Bingxu; China Masters' Theses Full-text Database, Information Science and Technology; 2016-07-15 (No. 07); pp. 21-32, 40-44 *
Design and Implementation of Virtual Clusters Based on a Cloud Platform; Zhang Xinchao; China Masters' Theses Full-text Database, Information Science and Technology; 2016-02-15 (No. 02); pp. 20-22 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant