CN110955522B - Resource management method and system for coordinating performance isolation and data recovery optimization

Publication number
CN110955522B
Authority
CN
China
Prior art keywords
tenant
request
priority
data recovery
tenants
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911100053.XA
Other languages
Chinese (zh)
Other versions
CN110955522A (en)
Inventor
王芳
冯丹
刘家豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201911100053.XA
Publication of CN110955522A
Application granted
Publication of CN110955522B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority

Abstract

The invention discloses a resource management method and system for coordinating performance isolation and data recovery optimization, belonging to the field of cloud storage. The method comprises the following steps: at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation satisfy only the minimum performance requirements of the tenants, lowering the priority of tenant IO requests while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node; and, at a storage node of the cloud storage system, receiving the various requests and scheduling the different types of requests in proportion to their priorities, so that storage resources are allocated to the different request types in proportion to priority. The invention can shorten the data recovery time while guaranteeing the performance requirements of tenants.

Description

Resource management method and system for coordinating performance isolation and data recovery optimization
Technical Field
The invention belongs to the field of cloud storage, and particularly relates to a resource management method and system for coordinating performance isolation and data recovery optimization.
Background
Cloud storage systems, such as Ceph, *** file system, Azure Storage and Amazon block storage, often run the workloads of multiple tenants simultaneously in order to reduce costs and simplify management. Specifically, the cloud storage system creates a large number of virtual block devices (for example, Ceph creates RBDs and Amazon block storage creates EBS volumes) and then allocates these virtual block devices to different tenants, thereby providing storage services to them. In a cloud storage system, different tenants use different virtual block devices, but the underlying storage resources are shared, so resource competition and performance interference exist among the tenants. To guarantee the performance requirements of tenants, an effective means of performance isolation is needed. In addition, storage resources in a cloud storage system are often over-provisioned: the system must be able to meet the peak load of each tenant, but a tenant's workload is at its peak for only a short time, so idle resources exist in the cloud storage system most of the time. To improve resource utilization, existing performance isolation methods often allocate the idle resources to tenants that have a minimum performance requirement.
When providing storage services to tenants, a cloud storage system inevitably experiences node failures. A failure may result from human error, a software bug, a hardware fault, and so on. To guarantee the reliability of tenant data, a cloud storage system usually stores data using a multi-replica mechanism or an erasure code mechanism. When a node fails, the cloud storage system can automatically recover the lost data. However, during data recovery, tenant requests and data recovery requests compete for resources, which poses further challenges for storage management.
In existing storage management methods, the priority assigned to tenant IO requests is much higher than the priority assigned to data recovery requests, so when the two compete for resources, tenant IO requests are processed first and data recovery takes a long time. On the one hand, an excessively long recovery time may cause other data copies, or even all data copies, to be lost during recovery, leaving the data completely unrecoverable; on the other hand, because the system is in a degraded state during data recovery, tenant requests may be blocked, so an excessively long data recovery process can seriously degrade tenant performance. In general, existing storage management methods cannot achieve both performance isolation and data recovery optimization.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a resource management method and system for coordinating performance isolation and data recovery optimization, with the aim of shortening the data recovery time while guaranteeing the performance requirements of tenants.
To achieve the above object, according to a first aspect of the present invention, there is provided a resource management method for coordinating performance isolation and data recovery optimization, comprising:
at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation satisfy only the minimum performance requirements of the tenants, lowering the priority of tenant IO requests while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node;
at a storage node of the cloud storage system, receiving the various requests and scheduling the different types of requests in proportion to their priorities, so that storage resources are allocated to the different request types in proportion to priority;
wherein the request types comprise tenant IO requests and data recovery requests.
When a data recovery request occurs, the storage resources allocated to the tenants satisfy only their minimum performance requirements, and the priority of tenant IO requests is lowered while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it increases the share of storage resources allocated to data recovery requests when requests are scheduled in proportion to priority. As a result, more storage resources are allocated to data recovery requests while the performance requirements of tenants are guaranteed, the data recovery time is shortened, and data recovery is optimized.
Further, when a data recovery request occurs, making the resource allocation satisfy only the minimum performance requirements of the tenants comprises:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, setting the token generation rate of the token bucket of the virtual block device to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, setting the token generation rate of the token bucket of the virtual block device to $T_2$.
Further, when a data recovery request occurs, lowering the priority of tenant IO requests while ensuring that the storage resources actually allocated to the tenants are no less than the storage resources the tenants require comprises:
(S1) initializing the lowest priority minW of tenant IO requests to 1, and initializing the highest priority maxW of tenant IO requests to the current priority of tenant IO requests;
(S2) setting the priority of tenant IO requests to (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenants cannot meet their performance requirements, going to step (S4); if the storage resources actually allocated to the tenants can meet their performance requirements and spare storage resources remain in the cloud storage system, going to step (S5); if the storage resources actually allocated to the tenants can meet their performance requirements and no spare storage resources remain in the cloud storage system, going to step (S6);
(S4) raising the lowest priority minW within the range (minW, maxW), and then returning to step (S2);
(S5) lowering the highest priority maxW within the range (minW, maxW), and then returning to step (S2);
(S6) ending the priority adjustment of tenant IO requests.
By adjusting the priority of tenant IO requests with this window-based adjustment method, the invention achieves dynamic resource allocation and optimizes the data recovery process as far as possible without causing SLO (Service Level Objective) violations.
Further, in step (S4), the lowest priority minW is raised within the range (minW, maxW) specifically by updating the lowest priority minW to (minW + maxW)/2;
in step (S5), the highest priority maxW is lowered within the range (minW, maxW) specifically by updating the highest priority maxW to (minW + maxW)/2.
Because the adjustment window of the tenant IO request priority is halved at each step, the priority adjustment reaches a stable state more quickly.
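Steps (S1) to (S6), with the halving updates of (S4) and (S5), amount to a bisection over the priority range. The following is a minimal Python sketch under stated assumptions: priorities are integers, the midpoint uses floor division, and `probe` is a hypothetical callback that applies a priority and reports which of the three cases of step (S3) holds. The case labels, the `toy_probe` thresholds and the fallback when the window collapses are illustrative choices, not part of the patent text.

```python
from typing import Callable

# Hypothetical labels for the three outcomes of step (S3); the patent
# describes the cases but does not name them.
VIOLATED, SURPLUS, MET = "violated", "surplus", "met"


def adjust_priority(current_priority: int,
                    probe: Callable[[int], str]) -> int:
    """Window-based adjustment of the tenant IO priority, steps (S1)-(S6).

    probe(p) applies priority p and reports VIOLATED, SURPLUS or MET.
    Integer priorities and floor division are assumptions of this sketch.
    """
    min_w, max_w = 1, current_priority            # (S1)
    while max_w - min_w > 1:
        priority = (min_w + max_w) // 2           # (S2)
        state = probe(priority)                   # (S3)
        if state == VIOLATED:
            min_w = priority                      # (S4): raise minW to the midpoint
        elif state == SURPLUS:
            max_w = priority                      # (S5): lower maxW to the midpoint
        else:
            return priority                       # (S6): requirement just met
    # Window collapsed without a MET verdict: fall back to maxW, which is
    # either the starting priority or a value that probed as SURPLUS.
    return max_w


def toy_probe(priority: int) -> str:
    """Made-up system response: below 8 violates, 8-9 just meets, above 9 is surplus."""
    if priority < 8:
        return VIOLATED
    return MET if priority <= 9 else SURPLUS
```

With `toy_probe` and a starting priority of 63, the sketch probes 32, 16 and 8 and returns 8 after three steps, which shows why halving the window converges quickly.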
Further, judging whether the storage resources actually allocated to the tenants can meet their performance requirements comprises:
calculating the SLO compliance rate CR of the tenants in the current cloud storage system according to $CR = (TP_A - TP_N)/TP_N$;
if $CR < 0$, judging that the storage resources actually allocated to the tenants cannot meet their performance requirements; if $CR > Th$, judging that the storage resources actually allocated to the tenants can meet their performance requirements and that spare storage resources remain in the cloud storage system; if $0 \le CR \le Th$, judging that the storage resources actually allocated to the tenants can meet their performance requirements and that no spare storage resources remain in the cloud storage system;
wherein $TP_A$ represents the sum of the storage resources actually allocated to the tenants, $TP_N$ represents the minimum sum of storage resources required by the tenants, $Th$ represents a preset threshold, and $Th > 0$.
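The three-way judgment above can be written as a small classifier. This Python sketch is illustrative only: the function names and the string labels are hypothetical, and the threshold is passed in by the caller rather than fixed.

```python
def slo_compliance_rate(tp_actual: float, tp_needed: float) -> float:
    """CR = (TP_A - TP_N) / TP_N, with TP_N assumed to be positive."""
    if tp_needed <= 0:
        raise ValueError("TP_N must be positive")
    return (tp_actual - tp_needed) / tp_needed


def classify(cr: float, th: float) -> str:
    """Map a compliance rate to one of the three cases in the text."""
    if th <= 0:
        raise ValueError("Th must be greater than 0")
    if cr < 0:
        return "violated"   # allocated resources do not meet the requirement
    if cr > th:
        return "surplus"    # requirement met and spare resources remain
    return "met"            # 0 <= CR <= Th: requirement met, nothing to spare
```

The returned label decides which branch the priority adjustment takes next: raise the lower bound, lower the upper bound, or stop.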
Further, when no data recovery request occurs, making the resource allocation maximize the utilization of system resources comprises:
creating a token bucket for the virtual block device of each tenant, and obtaining the currently available storage resources of the cloud storage system;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, setting the token generation rate of the token bucket of the virtual block device to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, setting the token generation rate of the token bucket of the virtual block device to $T_2$;
after storage resources have been allocated to all tenants, if spare storage resources remain in the cloud storage system, distributing the remaining storage resources proportionally among the tenants that require a minimum throughput guarantee, so that the token generation rates of the token buckets of the corresponding virtual block devices are increased in the same proportion;
wherein the proportion in which the remaining storage resources are distributed is the ratio of the corresponding minimum throughput rates.
When no data recovery request occurs, the invention completes the allocation of storage resources in two rounds: in the first round, the minimum performance requirements of all tenants are guaranteed; in the second round, the remaining storage resources are distributed proportionally to the tenants with a minimum throughput guarantee. The invention can therefore improve the quality of service and raise tenant performance as far as possible when data recovery optimization is not needed.
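The two-round allocation can be sketched in Python as follows. The `Tenant` record, the `allocate` function and the `recovering` flag are hypothetical names introduced for illustration; rates are taken to be throughputs in MB/s, and the returned values stand for the token generation rates of each tenant's token bucket.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tenant:
    name: str
    rate: float    # T_1 (fixed guarantee) or T_2 (minimum guarantee)
    fixed: bool    # True: fixed throughput; False: minimum throughput


def allocate(tenants: List[Tenant], available: float,
             recovering: bool = False) -> Dict[str, float]:
    """Return the token generation rate for each tenant's token bucket."""
    # Round 1: every tenant receives exactly its guaranteed rate.
    rates = {t.name: float(t.rate) for t in tenants}
    if recovering:
        # During data recovery only the minimum requirements are satisfied.
        return rates
    # Round 2: share the spare capacity among minimum-guarantee tenants,
    # in the ratio of their minimum throughput rates.
    spare = available - sum(rates.values())
    floor_tenants = [t for t in tenants if not t.fixed]
    total_floor = sum(t.rate for t in floor_tenants)
    if spare > 0 and total_floor > 0:
        for t in floor_tenants:
            rates[t.name] += spare * t.rate / total_floor
    return rates
```

For two minimum-guarantee tenants at 10 MB/s and 20 MB/s with 60 MB/s available, the spare 30 MB/s is split 1:2, giving rates of 20 MB/s and 40 MB/s; with `recovering=True` the rates stay at 10 MB/s and 20 MB/s and the spare bandwidth is left for recovery traffic.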
Further, sending the tenant IO request to the storage node at the client comprises:
when a tenant IO request is sent, consuming tokens from the token bucket of the corresponding virtual block device, the number of tokens consumed being equal to the size of the request;
if the number of tokens in the token bucket is insufficient to serve the tenant IO request, putting the process that issued the tenant IO request to sleep until enough tokens have been generated in the token bucket.
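The token consumption rule can be illustrated with a small blocking token bucket in Python. This is a sketch under assumptions the patent does not state: the bucket has a finite `capacity`, starts empty, and refills lazily from elapsed time; the `clock` and `sleep` parameters are hypothetical hooks that make the class testable without real waiting.

```python
import time


class TokenBucket:
    """Blocking token bucket; one token corresponds to one unit of request size."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens generated per second (T_1 or T_2)
        self.capacity = capacity    # assumed upper bound on stored tokens
        self.tokens = 0.0           # assumed to start empty
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add the tokens generated since the last refill, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self._last) * self.rate)
        self._last = now

    def consume(self, size):
        """Take `size` tokens, sleeping until enough have been generated."""
        if size > self.capacity:
            raise ValueError("request larger than bucket capacity")
        self._refill()
        while self.tokens < size:
            # Sleep for the time needed to generate the missing tokens.
            self._sleep((size - self.tokens) / self.rate)
            self._refill()
        self.tokens -= size
```

A client would call `consume(request_size)` immediately before sending each tenant IO request, so the sending process is the one that sleeps when the bucket runs dry.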
Further, receiving the various requests at a storage node of the cloud storage system and scheduling the different types of requests in proportion to their priorities comprises:
constructing, at the storage node, a request queue for each type of request, the priority of each request queue being the same as that of the requests it holds;
dispatching requests from the different queues in proportion to their priorities.
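One simple way to dispatch from per-type queues in proportion to priority is weighted round robin, sketched below in Python. This is a substituted illustrative technique, not necessarily the scheduler the patent or any particular storage system uses; the class and method names are hypothetical, and priorities are assumed to be positive integers.

```python
from collections import deque


class ProportionalScheduler:
    """Weighted round robin over one queue per request type."""

    def __init__(self):
        self.queues = {}    # request type -> [priority, deque of requests]

    def add_queue(self, kind, priority):
        self.queues[kind] = [priority, deque()]

    def set_priority(self, kind, priority):
        # Called when the client-side adjustment changes a queue's priority.
        self.queues[kind][0] = priority

    def submit(self, kind, request):
        self.queues[kind][1].append(request)

    def next_round(self):
        """Dispatch one round: up to `priority` requests from each queue."""
        batch = []
        for kind, (priority, queue) in self.queues.items():
            for _ in range(priority):
                if not queue:
                    break
                batch.append((kind, queue.popleft()))
        return batch
```

With both queues backlogged and priorities 8 (tenant IO) and 3 (data recovery), each round dispatches 8 tenant IO requests and 3 recovery requests.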
According to a second aspect of the present invention, there is provided a cloud storage system comprising a client and a storage node, wherein the client comprises a monitoring module, a resource allocation module and a priority adjustment module, and the storage node comprises a request scheduling module;
the monitoring module is used for monitoring the usage of storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources to each tenant according to the tenant's performance requirements, making the resource allocation satisfy only the minimum performance requirements of the tenants when a data recovery request occurs, and making the resource allocation maximize the utilization of system resources when no data recovery request occurs;
the priority adjustment module is used for lowering the priority of tenant IO requests, when a data recovery request occurs, while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements;
the resource allocation module is also used for sending the tenant IO requests to the storage node;
the request scheduling module is used for receiving the various requests and scheduling the different types of requests in proportion to their priorities, so that storage resources are allocated to the different request types in proportion to priority;
wherein the request types comprise tenant IO requests and data recovery requests.
According to a third aspect of the invention, there is provided a system comprising a computer readable storage medium and a processor, the computer readable storage medium being used for storing an executable program;
the processor is used for reading the executable program stored in the computer readable storage medium and executing the resource management method for coordinating performance isolation and data recovery optimization provided by the first aspect of the invention.
In general, the above technical solution conceived by the present invention achieves the following beneficial effects:
(1) In the resource management method and system for coordinating performance isolation and data recovery optimization provided by the invention, when a data recovery request occurs, the storage resources allocated to the tenants satisfy only their minimum performance requirements, and the priority of tenant IO requests is lowered while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it increases the share of storage resources allocated to data recovery requests when requests are scheduled in proportion to priority. As a result, more storage resources are allocated to data recovery requests while the performance requirements of tenants are guaranteed, the data recovery time is shortened, and data recovery is optimized.
(2) In the resource management method and system provided by the invention, the priority of tenant IO requests is adjusted by a window-based adjustment method, which achieves dynamic resource allocation and optimizes the data recovery process as far as possible without violating the Service Level Objective (SLO).
(3) In the preferred scheme of the resource management method and system provided by the invention, the adjustment window of the tenant IO request priority is halved at each step, so that the priority adjustment reaches a stable state more quickly.
(4) In the resource management method and system provided by the invention, when no data recovery request occurs, the allocation of storage resources is completed in two rounds: in the first round, the minimum performance requirements of all tenants are guaranteed; in the second round, the remaining storage resources are distributed proportionally to the tenants with a minimum throughput guarantee. The invention can therefore improve the quality of service and raise tenant performance as far as possible when data recovery optimization is not needed.
Drawings
Fig. 1 is a schematic diagram of the resource management method for coordinating performance isolation and data recovery optimization, and of the cloud storage system, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In order to shorten the data recovery time while guaranteeing the performance requirements of tenants, the resource management method for coordinating performance isolation and data recovery optimization provided by the invention, as shown in Fig. 1, comprises the following steps:
at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation satisfy only the minimum performance requirements of the tenants, lowering the priority of tenant IO requests while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node, in which case the priority of the tenant IO requests is the default priority assigned by the cloud storage system; the performance requirement of a tenant is stored in the metadata of the virtual block device used by the tenant, and may indicate either that the tenant has a fixed throughput requirement or that the tenant has a minimum throughput requirement;
at a storage node of the cloud storage system, receiving the various requests and scheduling the different types of requests in proportion to their priorities, so that storage resources are allocated to the different request types in proportion to priority;
wherein the request types comprise tenant IO requests and data recovery requests.
In a traditional resource management method, the priorities of tenant IO requests and data recovery requests are usually fixed, and the priority assigned to tenant IO requests is much higher than that of data recovery requests. For example, in a Ceph system, the priority of tenant IO requests is 63 and the priority of data recovery requests is 3. Consequently, when data recovery requests and tenant IO requests compete for resources, storage resources are preferentially allocated to tenant IO requests, and data recovery takes a long time. In the resource management method for coordinating performance isolation and data recovery optimization provided by the invention, when a data recovery request occurs, the storage resources allocated to the tenants satisfy only their minimum performance requirements, and the priority of tenant IO requests is lowered while ensuring that the storage resources actually allocated to the tenants still meet their performance requirements. For example, in one application example of the method, the priority of data recovery requests remains 3 while the priority of tenant IO requests is finally adjusted to 8. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it increases the share of storage resources allocated to data recovery requests when requests are scheduled in proportion to priority. Thus more storage resources are allocated to data recovery requests while the performance requirements of tenants are guaranteed, which shortens the data recovery time and optimizes data recovery.
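As a rough illustration of what these priority values imply, assume (this is an assumption of the sketch, not a statement from the patent) that only the tenant IO queue and the data recovery queue are backlogged and that the storage node dispatches strictly in proportion to priority. With tenant IO priority $w_t$ and data recovery priority $w_r$, the share of dispatches going to data recovery is $w_r/(w_t + w_r)$:

$$\frac{3}{63 + 3} = \frac{1}{22} \approx 4.5\% \qquad\text{versus}\qquad \frac{3}{8 + 3} = \frac{3}{11} \approx 27.3\%$$

Under that assumption, lowering the tenant IO priority from 63 to 8 raises the data recovery share sixfold.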
In an optional embodiment of the resource management method for coordinating performance isolation and data recovery optimization, when a data recovery request occurs, making the resource allocation satisfy only the minimum performance requirements of the tenants comprises:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, setting the token generation rate of the token bucket of the virtual block device to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, setting the token generation rate of the token bucket of the virtual block device to $T_2$; for example, if the performance requirement of tenant 1 indicates a minimum throughput guarantee of not less than 10 MB/s, the performance requirement of tenant 2 indicates a minimum throughput guarantee of not less than 20 MB/s, and the currently available storage resource is 60 MB/s, then the storage resource allocation ends once tenant 1 and tenant 2 have been allocated storage bandwidths of 10 MB/s and 20 MB/s respectively, after which the system still has 30 MB/s of idle bandwidth;
when a data recovery request occurs, the priority of tenant IO requests is lowered, while ensuring that the storage resources actually allocated to the tenants are no less than the storage resources the tenants require, by a window-based adjustment method that specifically comprises the following steps:
(S1) initializing the lowest priority minW of tenant IO requests to 1, and initializing the highest priority maxW of tenant IO requests to the current priority of tenant IO requests;
(S2) setting the priority of tenant IO requests to (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenants cannot meet their performance requirements, going to step (S4); if the storage resources actually allocated to the tenants can meet their performance requirements and spare storage resources remain in the cloud storage system, going to step (S5); if the storage resources actually allocated to the tenants can meet their performance requirements and no spare storage resources remain in the cloud storage system, going to step (S6);
(S4) raising the lowest priority minW within the range (minW, maxW), and then returning to step (S2);
preferably, the lowest priority minW is raised within the range (minW, maxW) by updating the lowest priority minW to (minW + maxW)/2;
(S5) lowering the highest priority maxW within the range (minW, maxW), and then returning to step (S2);
preferably, the highest priority maxW is lowered within the range (minW, maxW) by updating the highest priority maxW to (minW + maxW)/2;
(S6) ending the priority adjustment of tenant IO requests;
this window-based adjustment ensures that, after adjustment, the performance requirements of the tenants are just met and no spare storage resources remain in the system, so that the priority of tenant IO requests is lowered as far as possible while the tenants' performance requirements are guaranteed, and the share of storage resources allocated to data recovery requests during scheduling is therefore as large as possible; that is, by adjusting the priority of tenant IO requests with the window-based adjustment method, dynamic resource allocation is achieved and the data recovery process is optimized as far as possible without causing SLO (Service Level Objective) violations; in a preferred embodiment, halving the adjustment window of the tenant IO request priority at each step allows the priority adjustment to reach a stable state more quickly;
in the window-based adjustment method, the method for judging whether the storage resources actually allocated to the tenant can meet the performance requirement of the tenant comprises the following steps:
according to CR = (TP_A - TP_N)/TP_N, calculating the current SLO compliance rate CR of the cloud storage system; wherein TP_A represents the sum of the storage resources actually allocated to the tenants, and TP_N represents the minimum sum of storage resources required by the tenants;
if CR < 0, judging that the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants; if CR > Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that residual storage resources still exist in the cloud storage system; if 0 ≤ CR ≤ Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that no residual storage resources exist in the cloud storage system;
wherein Th represents a preset threshold, and Th > 0; the specific value of the threshold Th can be determined according to the actual tenant performance requirements and the system fault condition, so that the data recovery time is shortened as much as possible while the risk of SLO (Service Level Objective) violation is kept low; in the present embodiment, the threshold is set to Th = 0.25.
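By way of example only, the judgment based on the SLO compliance rate can be written as the following short Python sketch. The function names and the returned labels are assumptions made for illustration; the default threshold 0.25 is the value given in the present embodiment, and the boundary values CR = 0 and CR = Th are treated here as "just met".

```python
def slo_compliance_rate(tp_a: float, tp_n: float) -> float:
    """CR = (TP_A - TP_N) / TP_N, with TP_N the minimum required sum."""
    if tp_n <= 0:
        raise ValueError("TP_N must be positive")
    return (tp_a - tp_n) / tp_n


def classify(cr: float, th: float = 0.25) -> str:
    """Map a compliance rate to one of the three cases of the judgment."""
    if th <= 0:
        raise ValueError("Th must be greater than 0")
    if cr < 0:
        return "unmet"      # allocated resources below the requirement
    if cr > th:
        return "surplus"    # requirement met, residual resources remain
    return "just_met"       # requirement met, no residual resources
```

For instance, with TP_N = 40 MB/s and TP_A = 45 MB/s, CR = 0.125, which lies between 0 and Th = 0.25, so the allocation is judged to just meet the tenant requirements.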
In an optional embodiment, the resource management method for coordinating performance isolation and data recovery optimization enables resource allocation to achieve maximum utilization of system resources when no data recovery request occurs, and includes:
a token bucket is created for the virtual block device of each tenant, and currently available storage resources of the cloud storage system are obtained;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size T_1, the rate at which the token bucket of the virtual block device generates tokens is set to T_1; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than T_2, the rate at which the token bucket of the virtual block device generates tokens is set to T_2;
after storage resources have been allocated to all tenants, if residual storage resources still exist in the cloud storage system, proportionally allocating the residual storage resources among the tenants requiring a minimum throughput guarantee, so that the rates at which the token buckets of the corresponding virtual block devices generate tokens are increased in the same proportion;
wherein the proportion in which the residual storage resources are allocated is the ratio of the corresponding minimum throughput rates; for example, if the performance requirement of tenant 1 indicates a minimum throughput guarantee of not less than 10MB/s, the performance requirement of tenant 2 indicates a minimum throughput guarantee of not less than 20MB/s, and the currently available storage resource is 60MB/s, then after tenant 1 and tenant 2 have been allocated storage bandwidths of 10MB/s and 20MB/s respectively, the system still has an idle bandwidth of 30MB/s; the idle bandwidth is allocated to tenant 1 and tenant 2 in the ratio 10MB/s : 20MB/s = 1 : 2, that is, 10MB/s and 20MB/s respectively, so that after allocation the storage bandwidths of tenant 1 and tenant 2 are 20MB/s and 40MB/s respectively;
according to the resource management method for coordination performance isolation and data recovery optimization, when no data recovery request occurs, the allocation of storage resources is completed in two rounds: in the first round of allocation, the lowest performance requirement of all tenants is guaranteed; in the second round of allocation, the residual storage resources are allocated in proportion to the tenants requiring a minimum throughput guarantee; therefore, when no data recovery optimization is needed, the quality of service for tenants is improved as much as possible and the resource utilization is maximized.
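The two-round allocation can be illustrated by the following minimal Python sketch, which reproduces the 10MB/s and 20MB/s example above. The `Tenant` record, its field names and the function `allocate` are hypothetical names chosen for this sketch; the behaviour when the available resources are smaller than the sum of the guarantees is not specified in the description, and the sketch simply leaves the rates at the guaranteed values in that case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tenant:
    name: str
    rate: float   # T_1 (fixed) or T_2 (minimum) throughput, in MB/s
    fixed: bool   # True: fixed guarantee, False: minimum guarantee


def allocate(tenants: List[Tenant], available: float) -> Dict[str, float]:
    """Return the token generation rate of each tenant's token bucket."""
    # Round 1: every tenant receives exactly its guaranteed rate.
    rates = {t.name: t.rate for t in tenants}
    remaining = available - sum(rates.values())
    # Round 2: the residue is shared among minimum-guarantee tenants
    # in the ratio of their minimum throughput rates.
    floor_total = sum(t.rate for t in tenants if not t.fixed)
    if remaining > 0 and floor_total > 0:
        for t in tenants:
            if not t.fixed:
                rates[t.name] += remaining * t.rate / floor_total
    return rates


example = allocate(
    [Tenant("tenant1", 10, fixed=False), Tenant("tenant2", 20, fixed=False)],
    available=60,
)
print(example)  # {'tenant1': 20.0, 'tenant2': 40.0}
```

Tenants with a fixed throughput guarantee keep their rate T_1 unchanged in the second round, so only the minimum-guarantee tenants benefit from idle bandwidth.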
In an optional embodiment, sending the tenant IO request to a storage node at the client comprises:
when a tenant IO request is sent, consuming tokens in a token bucket of corresponding virtual block equipment, wherein the number of the consumed tokens is equal to the size of the request;
and if the number of tokens in the token bucket is not enough to serve the tenant IO request, enabling the process initiating the tenant IO request to be dormant until enough tokens are generated in the token bucket.
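The client-side behaviour described above can be sketched, under stated assumptions, as the following single-threaded Python token bucket. The class name, the `capacity` limit on accumulated tokens and the injectable `clock` and `sleep` callables are assumptions made for this sketch and do not appear in the description; a real client would additionally need locking if several processes share one virtual block device.

```python
import time


class TokenBucket:
    """Token bucket of one virtual block device (illustrative only)."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)          # tokens generated per second
        self.capacity = float(capacity)  # assumed cap on stored tokens
        self.tokens = 0.0
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add the tokens generated since the last refill, up to capacity.
        now = self._clock()
        gained = (now - self._last) * self.rate
        self.tokens = min(self.capacity, self.tokens + gained)
        self._last = now

    def consume(self, size):
        """Consume `size` tokens, sleeping until enough are available."""
        if size > self.capacity:
            raise ValueError("request larger than bucket capacity")
        self._refill()
        while self.tokens < size:
            # Sleep for the time needed to generate the missing tokens.
            self._sleep((size - self.tokens) / self.rate)
            self._refill()
        self.tokens -= size
```

A caller would invoke `consume(request_size)` immediately before sending each tenant IO request, so that the sending process sleeps whenever the tenant exceeds its allocated rate.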
In an optional embodiment, receiving the various types of requests at the storage node end of the cloud storage system and scheduling the different types of requests according to the priority ratio comprises:
constructing a request queue for each type of request at the storage node end, wherein the priority of the request queue is consistent with that of the requests in the request queue;
scheduling requests from the different queues according to the priority ratio; the specific request scheduling mechanism depends on the specific cloud storage system; for example, in a Ceph system, request scheduling is implemented by creating a token bucket for each request queue, and scheduling according to the priority ratio of the queues accordingly means that the ratio of the token generation rates of the token buckets of different queues is consistent with the ratio of the priorities of the request queues; the request scheduling mechanisms of other cloud storage systems are not listed one by one here; it should be understood that, after requests are scheduled from the different queues according to the priority ratio, the proportion of storage resources allocated to each request queue is consistent with the priority ratio of the request queues, that is, storage resources are allocated to the different types of requests according to the priority ratio.
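To illustrate scheduling according to the priority ratio, the following Python sketch dispatches requests from per-type queues with a smooth weighted round-robin. This technique is substituted here for the per-queue token buckets mentioned for Ceph and is not the Ceph implementation; the function name `dispatch`, the queue names and the assumption that all requests have equal size are likewise introduced only for this example.

```python
from collections import deque
from typing import Deque, Dict, List


def dispatch(queues: Dict[str, Deque[str]],
             priorities: Dict[str, float],
             count: int) -> List[str]:
    """Serve up to `count` requests in proportion to queue priorities."""
    current = {name: 0.0 for name in queues}
    served: List[str] = []
    while len(served) < count:
        # Only queues that still hold requests take part in this step.
        active = [name for name in queues if queues[name]]
        if not active:
            break
        total = sum(priorities[name] for name in active)
        for name in active:
            current[name] += priorities[name]
        # Serve the queue with the largest accumulated weight.
        chosen = max(active, key=lambda name: current[name])
        current[chosen] -= total
        served.append(queues[chosen].popleft())
    return served


queues = {
    "tenant_io": deque(["t"] * 100),
    "recovery": deque(["r"] * 100),
}
order = dispatch(queues, {"tenant_io": 1, "recovery": 3}, count=8)
```

With priorities 1 and 3, the eight dispatched requests contain two tenant IO requests and six data recovery requests, so lowering the tenant IO priority directly enlarges the share of resources given to data recovery.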
Corresponding to the resource management method for coordination performance isolation and data recovery optimization, the present invention further provides a cloud storage system, as shown in fig. 1, including a client and a storage node, where the client includes: the system comprises a monitoring module, a resource allocation module and a priority regulation module; the storage node comprises a request scheduling module;
the monitoring module is used for monitoring the use condition of storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources for each tenant according to the performance requirements of the tenant, enabling the resource allocation to only meet the lowest performance requirements of the tenant when a data recovery request occurs, and enabling the resource allocation to realize the maximum utilization of system resources when the data recovery request does not occur;
the priority adjusting module is used for reducing the priority of the tenant IO request under the condition that the storage resources which are actually allocated to the tenant meet the performance requirement of the tenant when the data recovery request occurs;
the resource allocation module is also used for sending the tenant IO request to the storage node;
the request scheduling module is used for receiving various requests and scheduling different types of requests according to the priority proportion so as to distribute storage resources to the different types of requests according to the priority proportion;
the types of the requests comprise tenant IO requests and data recovery requests;
in the embodiment of the present invention, the detailed implementation of each module may refer to the description in the above method embodiment, and will not be repeated here.
The invention also provides a system comprising a computer-readable storage medium and a processor, the computer-readable storage medium for storing an executable program;
the processor is used for reading an executable program stored in the computer readable storage medium and executing the resource management method for coordinating performance isolation and data recovery optimization.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for resource management to coordinate performance isolation and data recovery optimization, comprising:
at a client of the cloud storage system, distributing storage resources for each tenant according to tenant performance requirements, simultaneously monitoring whether a data recovery request occurs, if so, enabling the resource distribution to only meet the lowest performance requirements of the tenant, reducing the priority of a tenant IO request under the condition that the storage resources which are distributed to the tenant at the time are guaranteed to meet the tenant performance requirements, and then sending the tenant IO request to a storage node; if not, enabling resource allocation to realize the maximum utilization of system resources, and then directly sending the tenant IO request to a storage node;
receiving various requests at a storage node end of the cloud storage system, and scheduling different types of requests according to the priority proportion so as to allocate storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request; when a data recovery request occurs, the resource allocation only meets the minimum performance requirement of a tenant, and the method comprises the following steps:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size T_1, setting the rate at which the token bucket of the virtual block device generates tokens to T_1; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than T_2, setting the rate at which the token bucket of the virtual block device generates tokens to T_2;
When a data recovery request occurs, the priority of the tenant IO request is reduced under the condition that the storage resource which is actually allocated to the tenant is ensured to be larger than the storage resource required by the tenant, and the method comprises the following steps:
(S1) initializing the lowest priority minW of the tenant IO request to be 1, and initializing the highest priority maxW of the tenant IO request to be the current priority of the tenant IO request;
(S2) adjusting the priority of the tenant IO request to be (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants, turning to the step (S4); if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and residual storage resources still exist in the cloud storage system, turning to the step (S5); if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and no residual storage resources exist in the cloud storage system, turning to the step (S6);
(S4) after increasing the minimum priority minW in the range of (minW, maxW), turning to the step (S2);
(S5) reducing the highest priority maxW within the range of (minW, maxW), and then, turning to the step (S2);
and (S6) finishing the priority regulation of the tenant IO request.
2. The method for resource management with coordination performance isolation and data recovery optimization according to claim 1, wherein in the step (S4), the lowest priority minW is increased within the range (minW, maxW) by updating the lowest priority minW to (minW + maxW)/2;
in the step (S5), the highest priority maxW is lowered within the range (minW, maxW) by updating the highest priority maxW to (minW + maxW)/2.
3. The method for resource management with coordination performance isolation and data recovery optimization according to claim 1, wherein the method for determining whether the storage resources actually allocated to the tenant can meet the performance requirement of the tenant comprises:
according to CR = (TP_A - TP_N)/TP_N, calculating the current SLO compliance rate CR of the tenants in the cloud storage system;
if CR < 0, judging that the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants; if CR > Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that residual storage resources still exist in the cloud storage system; if 0 ≤ CR ≤ Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that no residual storage resources exist in the cloud storage system;
wherein TP_A represents the sum of the storage resources actually allocated to the tenants, TP_N represents the minimum sum of storage resources required by the tenants, and Th represents a preset threshold with Th > 0.
4. The method for resource management with coordination performance isolation and data recovery optimization according to claim 1, wherein, when no data recovery request occurs, enabling the resource allocation to realize the maximum utilization of system resources comprises:
creating a token bucket for the virtual block device of each tenant, and obtaining currently available storage resources of the cloud storage system;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size T_1, setting the rate at which the token bucket of the virtual block device generates tokens to T_1; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than T_2, setting the rate at which the token bucket of the virtual block device generates tokens to T_2;
after storage resources have been allocated to all tenants, if residual storage resources still exist in the cloud storage system, proportionally allocating the residual storage resources among the tenants requiring a minimum throughput guarantee, so that the rates at which the token buckets of the corresponding virtual block devices generate tokens are increased in the same proportion;
wherein, the distribution proportion for distributing the residual storage resources is the ratio of the corresponding minimum throughput rates.
5. A method of resource management to coordinate performance isolation and data recovery optimization according to claim 1 or 2, wherein at the client, tenant IO requests are sent to storage nodes, the method comprising:
when a tenant IO request is sent, tokens in a token bucket of corresponding virtual block equipment are consumed, and the number of the consumed tokens is equal to the size of the request;
and if the number of tokens in the token bucket is not enough to serve the tenant IO request, enabling the process initiating the tenant IO request to be dormant until enough tokens are generated in the token bucket.
6. The method for resource management of coordination performance isolation and data recovery optimization according to claim 1 or 2, wherein, at a storage node end of the cloud storage system, various types of requests are received, and different types of requests are scheduled according to a priority ratio, and the method comprises:
constructing a request queue for each type of request at the storage node, wherein the priority of the request queue is consistent with that of the requests in the request queue;
and carrying out request scheduling from different queues according to the priority proportion.
7. A cloud storage system comprises a client and a storage node, wherein the client comprises: the system comprises a monitoring module, a resource allocation module and a priority regulation module; the storage node comprises a request scheduling module;
the monitoring module is used for monitoring the use condition of storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources to each tenant according to the performance requirements of the tenant, enabling the resource allocation to only meet the lowest performance requirements of the tenant when a data recovery request occurs, and enabling the resource allocation to realize the maximum utilization of system resources when the data recovery request does not occur;
the priority adjusting module is used for reducing the priority of the tenant IO request under the condition that the storage resources which are actually allocated to the tenant meet the performance requirement of the tenant when the data recovery request occurs; the resource allocation module is further configured to send a tenant IO request to the storage node;
the request scheduling module is used for receiving various requests and scheduling different types of requests according to the priority proportion so as to distribute storage resources to the different types of requests according to the priority proportion;
the types of the requests comprise tenant IO requests and data recovery requests; the resource allocation module, when a data recovery request occurs, enabling the resource allocation to only meet the lowest performance requirement of a tenant comprises:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size T_1, setting the rate at which the token bucket of the virtual block device generates tokens to T_1; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than T_2, setting the rate at which the token bucket of the virtual block device generates tokens to T_2;
The priority adjusting module reduces the priority of the tenant IO request under the condition that the storage resource which is actually allocated to the tenant is larger than the storage resource required by the tenant when the data recovery request occurs, and comprises the following steps:
(S1) initializing the lowest priority minW of the tenant IO request to be 1, and initializing the highest priority maxW of the tenant IO request to be the current priority of the tenant IO request;
(S2) adjusting the priority of the tenant IO request to be (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants, turning to the step (S4); if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and residual storage resources still exist in the cloud storage system, turning to step (S5); if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and no residual storage resources exist in the cloud storage system, turning to the step (S6);
(S4) after increasing the minimum priority level minW in the range of (minW, maxW), switching to a step (S2);
(S5) after the highest priority maxW is reduced in the range of (minW, maxW), the step (S2) is carried out;
and (S6) finishing the priority regulation of the tenant IO request.
8. A resource management system that coordinates performance isolation and data recovery optimization, comprising a computer-readable storage medium and a processor, wherein the computer-readable storage medium is configured to store an executable program;
the processor is configured to read an executable program stored in the computer-readable storage medium and execute the resource management method for coordination performance isolation and data recovery optimization of any one of claims 1-6.
CN201911100053.XA 2019-11-12 2019-11-12 Resource management method and system for coordination performance isolation and data recovery optimization Active CN110955522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911100053.XA CN110955522B (en) 2019-11-12 2019-11-12 Resource management method and system for coordination performance isolation and data recovery optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911100053.XA CN110955522B (en) 2019-11-12 2019-11-12 Resource management method and system for coordination performance isolation and data recovery optimization

Publications (2)

Publication Number Publication Date
CN110955522A CN110955522A (en) 2020-04-03
CN110955522B true CN110955522B (en) 2022-10-14

Family

ID=69977228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911100053.XA Active CN110955522B (en) 2019-11-12 2019-11-12 Resource management method and system for coordination performance isolation and data recovery optimization

Country Status (1)

Country Link
CN (1) CN110955522B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687798A (en) * 2021-10-26 2021-11-23 苏州浪潮智能科技有限公司 Method, device and equipment for controlling data reconstruction and readable medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7437727B2 (en) * 2002-03-21 2008-10-14 Network Appliance, Inc. Method and apparatus for runtime resource deadlock avoidance in a raid system
US20140250440A1 (en) * 2013-03-01 2014-09-04 Adaptive Computing Enterprises, Inc. System and method for managing storage input/output for a compute environment
CN103136056B (en) * 2013-03-04 2017-10-27 浪潮电子信息产业股份有限公司 A kind of cloud computing platform dispatching method
US9575804B2 (en) * 2015-03-27 2017-02-21 Commvault Systems, Inc. Job management and resource allocation
US10129101B2 (en) * 2015-04-30 2018-11-13 Futurewei Technologies, Inc. Application driven and adaptive unified resource management for data centers with Multi-Resource Schedulable Unit (MRSU)
US10831545B2 (en) * 2016-08-29 2020-11-10 Vmware, Inc. Efficient queueing and scheduling of backups in a multi-tenant cloud computing environment
CN106484536B (en) * 2016-09-30 2020-04-03 杭州朗和科技有限公司 IO scheduling method, device and equipment
CN107249035B (en) * 2017-06-28 2020-05-26 重庆大学 Shared repeated data storage and reading method with dynamically variable levels
CN108337109B (en) * 2017-12-28 2021-12-17 中兴通讯股份有限公司 Resource allocation method and device and resource allocation system
CN109992418B (en) * 2019-03-25 2023-01-06 华南理工大学 SLA-aware resource priority scheduling method and system for multi-tenant big data platform

Also Published As

Publication number Publication date
CN110955522A (en) 2020-04-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant