CN109960610B - Data backup method based on policy splitting - Google Patents


Info

Publication number
CN109960610B
CN109960610B CN201910147338.2A CN201910147338A CN109960610B CN 109960610 B CN109960610 B CN 109960610B CN 201910147338 A CN201910147338 A CN 201910147338A CN 109960610 B CN109960610 B CN 109960610B
Authority
CN
China
Prior art keywords
data source
backup
data
policy
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910147338.2A
Other languages
Chinese (zh)
Other versions
CN109960610A (en)
Inventor
程华平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eisoo Information Technology Co Ltd
Priority to CN201910147338.2A
Publication of CN109960610A
Application granted
Publication of CN109960610B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data backup method based on policy splitting, which comprises a data source grouping process and a data source scheduling process, so that backup resources are evenly distributed across different computing nodes or storage nodes and the data is split among them. Compared with the prior art, the method improves the backup performance of the virtualization platform while reducing the impact on the platform caused by backup resource usage being excessively concentrated.

Description

Data backup method based on policy splitting
Technical Field
The invention relates to techniques for accelerating virtualization backup, and in particular to a data backup method based on policy splitting.
Background
The virtualization platform is mainly responsible for virtualizing hardware resources and for the centralized management of virtual resources, business resources and user resources. It uses technologies such as virtual computing, virtual storage and virtual networking to virtualize computing, storage and network resources.
When the resources of a virtualization platform are backed up, the data to be backed up comes from different computing nodes and storage nodes. When the backup task runs in a multithreaded or multiprocess mode, the computing node or storage node for each backup is selected at random, so backup tasks can become excessively concentrated on a single computing node or storage node. The network IO, disk IO and CPU usage of that node then rise until they hit a bottleneck, which in turn degrades the backup performance of the whole task.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a data backup method based on policy splitting.
The aim of the invention can be achieved by the following technical solution:
A data backup method based on policy splitting comprises a data source grouping process and a data source scheduling process, so that backup resources are evenly distributed across different computing nodes or storage nodes, the data is split among them, and backup performance is maximized.
Preferably, the data source grouping process specifically includes:
step 101), calling the API of the virtualization platform to obtain, in turn, the attributes of each data source to be backed up;
step 102), selecting the corresponding policy mode according to the configuration attribute: a host policy or a storage policy;
step 103), if the host policy is selected: classifying the data sources according to the computing node on which each data source is located;
step 104), if the storage policy is selected: classifying the data sources according to the storage node on which each data source is located.
Preferably, the attributes of the backup data source in step 101) include the storage location and the computing node location.
Preferably, the classifying in step 103) is specifically: data sources with the same computing node location are assigned to the same data source container, and the containers for the different computing nodes together form a data source group.
Preferably, the classifying in step 104) is specifically: data sources with the same storage location are assigned to the same data source container, and the containers for the different storage nodes together form a data source group.
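The grouping process in steps 101) to 104) can be illustrated with a minimal C++ sketch. The type names `DataSource`, `Policy` and `DataSourceGroup` and the function `classify` are hypothetical and chosen only for illustration; the patent does not define them, and the attributes are assumed to have been fetched from the platform API already.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record of the attributes obtained in step 101).
struct DataSource {
    std::string name;          // e.g. a virtual machine identifier
    std::string compute_node;  // computing node location
    std::string storage_node;  // storage location
};

// Step 102): the two policy modes named in the text.
enum class Policy { Host, Storage };

// A data source group: one container (a vector) per node, keyed by node name.
using DataSourceGroup = std::map<std::string, std::vector<DataSource>>;

// Steps 103)/104): data sources that share a node land in the same container.
DataSourceGroup classify(const std::vector<DataSource>& sources, Policy policy) {
    DataSourceGroup group;
    for (const DataSource& ds : sources) {
        const std::string& key =
            (policy == Policy::Host) ? ds.compute_node : ds.storage_node;
        group[key].push_back(ds);
    }
    return group;
}
```

With the host policy, two virtual machines on the same computing node end up in one container; switching to the storage policy regroups the same data sources by storage location without changing anything else.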
Preferably, the data source scheduling process specifically includes:
step 201), each data source container has an attribute, the backup number BN, which records how many data sources from that container are currently being backed up by subtasks;
step 202), the scheduler consists of N subtasks, and each subtask handles the backup of one data source at a time;
step 203), a subtask Tn of the scheduler requests a data source from the data source group Gn;
step 204), the data source group Gn finds the data source container Cn with the smallest BN value, takes a data source dn out of Cn, and adds 1 to the BN value of Cn;
step 205), the data source group returns the selected data source dn to the subtask Tn;
step 206), after the subtask Tn finishes backing up the data source dn, it notifies the data source group that the backup of dn is complete;
step 207), the data source group looks up the data source container Cn to which dn belongs and subtracts 1 from the BN value of Cn;
step 208), after the subtask Tn completes a backup, it repeats steps 203) to 207) until no data source is available, at which point the subtask exits;
step 209), after all subtasks have exited, the scheduler has completed the backup task for the data source group.
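The selection rule in steps 203) to 207) can be sketched in single-threaded form as follows. This is a minimal illustration that assumes C++17; the names `Container`, `Lease`, `apply` and `release` are hypothetical. Skipping containers that have no pending data sources and breaking ties by the lowest index are assumptions, since the text does not say how either case is handled.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: pending data sources plus the BN counter of step 201).
struct Container {
    std::string node;                 // node shared by all data sources in here
    std::deque<std::string> pending;  // data sources not yet handed out
    int bn = 0;                       // data sources currently being backed up
};

// What a subtask receives: the data source dn and the index of its container Cn.
struct Lease {
    std::string source;
    std::size_t container;
};

class DataSourceGroup {
public:
    explicit DataSourceGroup(std::vector<Container> containers)
        : containers_(std::move(containers)) {}

    // Steps 204)-205): choose the non-empty container with the smallest BN
    // (assumption: empty containers are skipped, ties go to the lowest index).
    std::optional<Lease> apply() {
        std::optional<std::size_t> best;
        for (std::size_t i = 0; i < containers_.size(); ++i) {
            if (containers_[i].pending.empty()) continue;
            if (!best || containers_[i].bn < containers_[*best].bn) best = i;
        }
        if (!best) return std::nullopt;  // step 208): no data source available
        Container& c = containers_[*best];
        Lease lease{c.pending.front(), *best};
        c.pending.pop_front();
        ++c.bn;  // step 204): BN plus 1
        return lease;
    }

    // Steps 206)-207): the backup of dn finished, so BN of its container minus 1.
    void release(const Lease& lease) { --containers_[lease.container].bn; }

    int bn(std::size_t i) const { return containers_[i].bn; }

private:
    std::vector<Container> containers_;
};
```

Because a busy container has a higher BN than an idle one, consecutive requests are steered toward nodes that are doing less backup work at that moment, which is the balancing effect the method aims for.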
Compared with the prior art, the policy splitting method is suitable for backing up ESXi, FusionCompute and Inspur (Langchao) virtualization platforms. It improves the backup performance of the virtualization platform by splitting the data, while reducing the impact on the platform caused by backup resource usage being excessively concentrated.
Drawings
FIG. 1 is a schematic diagram of a data source grouping scheme;
FIG. 2 is a schematic diagram of a data source scheduling scheme.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments that those skilled in the art can obtain from these embodiments without inventive effort fall within the scope of protection of the present invention.
The data backup method based on policy splitting comprises a data source grouping process and a data source scheduling process. It effectively solves the problem of backup resources being excessively concentrated on a single node: backup resources are evenly distributed across different computing nodes or storage nodes, the data is split among them, and backup performance is maximized.
As shown in FIG. 1, the data source grouping process specifically includes:
step 101), calling the API of the virtualization platform to obtain, in turn, the attributes of each data source to be backed up;
step 102), selecting the corresponding policy mode according to the configuration attribute: a host policy or a storage policy;
step 103), if the host policy is selected: classifying the data sources according to the computing node on which each data source is located;
step 104), if the storage policy is selected: classifying the data sources according to the storage node on which each data source is located.
The attributes of the backup data source in step 101) include the storage location and the computing node location. The classifying in step 103) is specifically: data sources with the same computing node location are assigned to the same data source container, and the containers for the different computing nodes together form a data source group. The classifying in step 104) is specifically: data sources with the same storage location are assigned to the same data source container, and the containers for the different storage nodes together form a data source group.
As shown in FIG. 2, the data source scheduling process specifically includes:
step 201), each data source container has an attribute, the backup number BN, which records how many data sources from that container are currently being backed up by subtasks;
step 202), the scheduler consists of N subtasks, and each subtask handles the backup of one data source at a time;
step 203), a subtask Tn of the scheduler requests a data source from the data source group Gn;
step 204), the data source group Gn finds the data source container Cn with the smallest BN value, takes a data source dn out of Cn, and adds 1 to the BN value of Cn;
step 205), the data source group returns the selected data source dn to the subtask Tn;
step 206), after the subtask Tn finishes backing up the data source dn, it notifies the data source group that the backup of dn is complete;
step 207), the data source group looks up the data source container Cn to which dn belongs and subtracts 1 from the BN value of Cn;
step 208), after the subtask Tn completes a backup, it repeats steps 203) to 207) until no data source is available, at which point the subtask exits;
step 209), after all subtasks have exited, the scheduler has completed the backup task for the data source group.
1. The invention is implemented in C++, so a C++ runtime library must be installed on the backup node that runs it. A Windows environment requires the VC++ runtime libraries; a Linux environment requires a compatible glibc version.
2. The C++ program is divided into several modules: a data source module, a policy module, a data source group module and a scheduler module.
3. The data source module implements the following function: GetDataSouceInfo, which obtains the attributes of a data source.
4. The policy module implements the following functions: CreateSttategy, which creates a host policy or a storage policy; and ClassifyDataSouce, which uses the policy to classify data sources according to their attributes.
5. The data source group module implements the following functions: ApplyDataSouce, which requests a data source from the data source group and adds 1 to the BN value of the data source container associated with that data source; and FreeDataSouce, which releases a data source after it has been backed up and subtracts 1 from the BN value of the associated data source container.
6. The scheduler module implements the following functions: CreateTasks, which creates the specified number of subtasks; and TaskRun, which runs the backup flow of a subtask.
7. Given a set of data sources { vm1, vm2, …, vmn }, GetDataSouceInfo is used to obtain the data source attributes { attr1, attr2, …, attrn }.
8. CreateSttategy is called to create a policy mode strategy_x according to the configuration information.
9. The ClassifyDataSouce interface of strategy_x is called to classify { vm1, vm2, …, vmn } according to { attr1, attr2, …, attrn }, generating a data source group DG1 that consists of a set of data source containers { dc1, dc2, …, dcn } of different classes.
10. The scheduler module generates the corresponding number of subtasks { Task1, Task2, …, TaskN } based on the configuration information.
11. The scheduler module controls the backup flow of each subtask through TaskRun.
12. Each subtask obtains a data source from the data source group DG1 through the ApplyDataSouce interface.
13. After backing up a data source, each subtask calls FreeDataSouce to release its hold on that data source.
14. When TaskRun has finished for all subtasks, the backup task is complete.
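One way the policy module described in the list above could be organized is as a small strategy hierarchy. The sketch below is illustrative only: the patent names the functions but gives no signatures, so the class layout, the `DataSourceInfo` fields and the configuration strings "host" and "storage" are assumptions. The factory is spelled `CreateStrategy` here, standing in for the creation function named in item 8.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical attribute record for one data source.
struct DataSourceInfo {
    std::string name;
    std::string host;     // computing node location
    std::string storage;  // storage location
};

// Data source group: container key (node name) -> names of its data sources.
using DataSourceGroup = std::map<std::string, std::vector<std::string>>;

class Strategy {
public:
    virtual ~Strategy() = default;

    // Each concrete policy decides which attribute selects the container.
    virtual const std::string& KeyOf(const DataSourceInfo& info) const = 0;

    // Classification is shared; only the key differs between policies.
    DataSourceGroup ClassifyDataSouce(const std::vector<DataSourceInfo>& infos) const {
        DataSourceGroup group;
        for (const DataSourceInfo& info : infos) group[KeyOf(info)].push_back(info.name);
        return group;
    }
};

class HostStrategy : public Strategy {
public:
    const std::string& KeyOf(const DataSourceInfo& info) const override { return info.host; }
};

class StorageStrategy : public Strategy {
public:
    const std::string& KeyOf(const DataSourceInfo& info) const override { return info.storage; }
};

// Hypothetical factory: maps a configuration value to a policy object.
std::unique_ptr<Strategy> CreateStrategy(const std::string& mode) {
    if (mode == "host") return std::make_unique<HostStrategy>();
    if (mode == "storage") return std::make_unique<StorageStrategy>();
    throw std::invalid_argument("unknown policy mode: " + mode);
}
```

Keeping the classification loop in the base class means a further policy would only need to supply a new key function; the scheduler and data source group modules would not need to change.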
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the scope of protection of the invention is defined by the scope of protection of the claims.

Claims (4)

1. A data backup method based on policy splitting, characterized by comprising a data source grouping process and a data source scheduling process, so that backup resources are evenly distributed across different computing nodes or storage nodes, the data is split among them, and backup performance is maximized;
the data source grouping process specifically comprises the following steps:
step 101), calling the API of the virtualization platform to obtain, in turn, the attributes of each data source to be backed up;
step 102), selecting the corresponding policy mode according to the configuration attribute: a host policy or a storage policy;
step 103), if the host policy is selected: classifying the data sources according to the computing node on which each data source is located;
step 104), if the storage policy is selected: classifying the data sources according to the storage node on which each data source is located;
the data source scheduling process specifically comprises the following steps:
step 201), each data source container has an attribute, the backup number BN, which records how many data sources from that container are currently being backed up by subtasks;
step 202), the scheduler consists of N subtasks, and each subtask handles the backup of one data source at a time;
step 203), a subtask Tn of the scheduler requests a data source from the data source group Gn;
step 204), the data source group Gn finds the data source container Cn with the smallest BN value, takes a data source dn out of Cn, and adds 1 to the BN value of Cn;
step 205), the data source group returns the selected data source dn to the subtask Tn;
step 206), after the subtask Tn finishes backing up the data source dn, it notifies the data source group that the backup of dn is complete;
step 207), the data source group looks up the data source container Cn to which dn belongs and subtracts 1 from the BN value of Cn;
step 208), after the subtask Tn completes a backup, it repeats steps 203) to 207) until no data source is available, at which point the subtask exits;
step 209), after all subtasks have exited, the scheduler has completed the backup task for the data source group.
2. The data backup method based on policy splitting according to claim 1, wherein the attributes of the backup data source in step 101) include the storage location and the computing node location.
3. The data backup method based on policy splitting according to claim 1, wherein the classifying in step 103) is specifically: data sources with the same computing node location are assigned to the same data source container, and the containers for the different computing nodes together form a data source group.
4. The data backup method based on policy splitting according to claim 1, wherein the classifying in step 104) is specifically: data sources with the same storage location are assigned to the same data source container, and the containers for the different storage nodes together form a data source group.
CN201910147338.2A 2019-02-27 2019-02-27 Data backup method based on policy splitting Active CN109960610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910147338.2A CN109960610B (en) 2019-02-27 2019-02-27 Data backup method based on policy splitting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910147338.2A CN109960610B (en) 2019-02-27 2019-02-27 Data backup method based on policy splitting

Publications (2)

Publication Number Publication Date
CN109960610A CN109960610A (en) 2019-07-02
CN109960610B CN109960610B (en) 2023-06-06

Family

ID=67023902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910147338.2A Active CN109960610B (en) 2019-02-27 2019-02-27 Data backup method based on policy splitting

Country Status (1)

Country Link
CN (1) CN109960610B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702734A (en) * 2009-10-27 2010-05-05 北京算通数字技术研究中心有限公司 Dispatching method and device of streaming media data and nodal equipment
CN102014282A (en) * 2010-10-25 2011-04-13 深圳市融创天下科技发展有限公司 Distributed video transcoding scheduling method and system
CN102622273A (en) * 2012-02-23 2012-08-01 中国人民解放军国防科学技术大学 Self-learning load prediction based cluster on-demand starting method
US8392572B2 (en) * 2010-02-12 2013-03-05 Elitegroup Computer Systems Co., Ltd. Method for scheduling cloud-computing resource and system applying the same
CN103139243A (en) * 2011-11-24 2013-06-05 明博教育科技有限公司 File synchronization method based on star distributed system
CN105426252A (en) * 2015-12-17 2016-03-23 浪潮(北京)电子信息产业有限公司 Thread distribution method and system of distributed type file system

Also Published As

Publication number Publication date
CN109960610A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
US10929244B2 (en) Optimized backup of clusters with multiple proxy servers
US8739171B2 (en) High-throughput-computing in a hybrid computing environment
US8914805B2 (en) Rescheduling workload in a hybrid computing environment
US9513962B2 (en) Migrating a running, preempted workload in a grid computing system
Wang et al. Workflow as a service in the cloud: architecture and scheduling algorithms
US8850434B1 (en) System and method of constraining auto live migration of virtual machines using group tags
WO2016078008A1 (en) Method and apparatus for scheduling data flow task
US20120222042A1 (en) Management of heterogeneous workloads
US20150143380A1 (en) Scheduling workloads and making provision decisions of computer resources in a computing environment
Liu et al. Scheduling parallel jobs with tentative runs and consolidation in the cloud
US20220253341A1 (en) Memory-aware placement for virtual gpu enabled systems
EP4052125A1 (en) Mitigating slow instances in large-scale streaming pipelines
Groesbrink et al. Architecture for adaptive resource assignment to virtualized mixed-criticality real-time systems
Ali et al. Cluster-based multicore real-time mixed-criticality scheduling
Liu et al. Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud.
Aggarwal et al. Survey on scheduling algorithms for multiple workflows in cloud computing environment
Gouasmi et al. Cost-efficient distributed MapReduce job scheduling across cloud federation
CN109960610B (en) Data backup method based on policy splitting
Komarasamy et al. Adaptive deadline based dependent job scheduling algorithm in cloud computing
Zhou et al. Performance analysis of scheduling algorithms for dynamic workflow applications
Deng et al. Vmerger: Server consolidation in virtualized environment
Zhang et al. Cost-efficient and latency-aware workflow scheduling policy for container-based systems
Chen et al. Speculative slot reservation: Enforcing service isolation for dependent data-parallel computations
Rodrigo Álvarez et al. A2l2: An application aware flexible hpc scheduling model for low-latency allocation
Upadhye et al. Cloud resource allocation as non-preemptive approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant