WO2018177107A1 - Data migration method, migration server and storage medium - Google Patents

Data migration method, migration server and storage medium

Info

Publication number
WO2018177107A1
Authority
WO
WIPO (PCT)
Prior art keywords
relationship
data
task
migration
relationship chain
Prior art date
Application number
PCT/CN2018/078398
Other languages
English (en)
French (fr)
Inventor
刘军
方锦亮
赵重庆
温伟飞
李良必
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018177107A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The embodiments of the present invention relate to the field of network technologies, and in particular, to a data migration method, a migration server, and a storage medium.
  • PB Petabyte
  • IDC Internet Data Center
  • When a service cluster processes a service, it creates a computing task for the service, allocates computing resources to that task, and runs the task to process the service data. Because services are usually interrelated, the related art migrates the data of a service cluster as a whole, so that migrating the data of one service does not affect the services related to it: all computing tasks are stopped (that is, all services stop being provided), all service data is migrated to the new service cluster, the computing tasks are reconfigured and allocated computing resources in the new service cluster, and the reconfigured computing tasks are then started. Only then are all services provided again and the data migration complete.
  • Embodiments of the present invention provide a data migration method, a migration server and a storage medium, which can solve the above problem of the related art.
  • The technical solution is as follows:
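  • A data migration method, comprising: acquiring a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the association relationships between computing tasks and service data in the original service cluster, and each relationship chain indicates a set of computing tasks and service data having an association relationship; and sequentially migrating the service data and computing tasks indicated by the plurality of relationship chains to a target service cluster in units of relationship chains,
  • where, during the migration, the computing tasks indicated by relationship chains that have not yet been migrated run normally.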
  • A data migration apparatus, comprising:
  • a first acquiring unit, configured to acquire a plurality of relationship chains according to a computing task log of the original service cluster, where the computing task log records the association relationships between computing tasks and service data in the original service cluster, and each relationship chain indicates a set of computing tasks and service data having an association relationship; and
  • a migration unit, configured to sequentially migrate the service data and computing tasks indicated by the plurality of relationship chains to the target service cluster in units of relationship chains,
  • where, during the migration, the computing tasks indicated by relationship chains that have not yet been migrated run normally.
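  • A migration server, comprising a processor and a memory, where the memory stores at least one instruction that is loaded and executed by the processor to: acquire a plurality of relationship chains according to a computing task log of the original service cluster; and sequentially migrate the service data and computing tasks indicated by the plurality of relationship chains to the target service cluster in units of relationship chains,
  • where, during the migration, the computing tasks indicated by relationship chains that have not yet been migrated run normally.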
  • A computer-readable storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the migration server in the data migration method described above.
  • Service data and computing tasks that have an association relationship are represented by a relationship chain. When data is migrated in units of relationship chains, the relationship chain being migrated therefore has no impact on the other relationship chains, and the computing tasks indicated by relationship chains that have not been migrated can still run normally, so the services indicated by those relationship chains remain available.
  • FIG. 1A is a schematic diagram of an implementation scenario provided by an embodiment of the present invention.
  • FIG. 1B is a structural diagram of a migration server according to an embodiment of the present invention.
  • FIG. 2A is a flowchart of a data migration method according to an embodiment of the present invention.
  • FIG. 2B is a schematic diagram of a relationship chain according to an embodiment of the present invention.
  • FIG. 2C is a schematic diagram of a relationship chain splitting according to an embodiment of the present invention.
  • FIG. 2D is a schematic diagram of a relationship chain splitting according to an embodiment of the present invention.
  • FIG. 2E is a schematic diagram of a relationship chain splitting according to an embodiment of the present invention.
  • FIG. 2F is a schematic diagram of accessing key service data by a split relationship chain according to an embodiment of the present invention.
  • FIG. 2G is a schematic diagram of a process related to a double write table mechanism according to an embodiment of the present invention.
  • FIG. 2H is a schematic diagram of a migration state in a relationship chain migration process according to an embodiment of the present invention.
  • FIG. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
  • FIG. 1A is a schematic diagram of an implementation scenario of data migration according to an embodiment of the present invention.
  • The implementation scenario includes an original service cluster, a target service cluster, and a migration server.
  • The original service cluster is the service cluster whose service data needs to be migrated.
  • The target service cluster is the service cluster to which the service data is migrated.
  • A service cluster may include multiple storage clusters and multiple computing clusters. The storage clusters store service data; the computing clusters run computing tasks and store data related to them, such as the size and location of each computing task's computing resources.
  • The storage clusters and computing clusters may be deployed on different servers or on the same server, which is not limited in this embodiment.
  • When the service cluster performs service processing, it creates a computing task for the service, allocates computing resources to the task, and runs the task to carry out one or more service processing procedures, for example, reading some service data from the service cluster, processing it, and writing the resulting output service data back to the service cluster.
  • A computing task runs periodically, with a period of several hours, days, weeks, or months; for example, a computing task with a period of one hour runs once every hour.
  • Different computing tasks may have the same or different running periods, and the type of a computing task is related to the processing speed of its service data, which is not limited in this embodiment.
  • The service cluster also maintains a data path mapping table, which records the correspondence between service data identifiers and the storage paths of the service data.
  • A computing task determines the storage path of the service data it reads or writes through the data path mapping table, and then reads or writes the service data at that storage path.
  • The service data read by one computing task may be written by other computing tasks, and the service data written by one computing task may be read by other computing tasks, so computing tasks and service data have input/output relationships with one another.
  • The migration server migrates the data of the service cluster and manages the data migration process.
  • The migration server may be deployed in the original service cluster, in the target service cluster, or on a separate server outside both clusters that can communicate with both.
  • The migration server migrates data from the original service cluster to the target service cluster; the migrated data includes both the service data and the computing tasks in the original service cluster.
  • The migration server may include multiple modules, each playing a different role in the data migration process.
  • FIG. 1B is an architecture diagram of a migration server that includes multiple functional modules, whose functions are described below:
  • The analysis module performs the process of acquiring multiple relationship chains from the computing task log, described in steps 201 to 203 below; the splitting module performs the process of splitting relationship chains, described in step 204 below; and another module performs the consistency check between the migration subtasks and the relationship chain in step 206 below.
  • The migration module performs the service data migration and computing task migration of steps 205 to 208 below. After the service data indicated by a relationship chain has been migrated, the migration module switches the storage paths in the data path mapping table; this switching process corresponds to step 207.
  • Migrating a computing task means switching the configuration information of the computing task, which can be obtained from the configuration library; this process corresponds to step 208. If the migrated relationship chain is a split relationship chain, the key service data also needs to be synchronized; this corresponds to step a of step 206.
  • Data path mapping table synchronization means adding the target storage path of service data migrated to the target service cluster to the data path mapping table.
  • The migration server front end can be used to manage the relationship chain migration process, for example, by displaying information about each relationship chain, the connections between the nodes in the chain, and the migration state of the chain during migration.
  • The configuration library stores configuration information about the computing resources of each computing task, such as the size and location of the computing resources.
  • The configuration library may also store the original storage path of service data in the original service cluster and its target storage path in the target service cluster.
  • The task relationship chain library stores the relationship chains generated by the analysis module.
  • The migration task library stores information about migration subtasks, such as the subtask number, the service data it covers, the original and target storage paths of that service data, and its data volume.
  • This embodiment applies to service data migration scenarios.
  • For example, the owner of the original service cluster opens the service data to a new owner, so the service data must be migrated to the new owner's target service cluster; or the original service cluster can no longer meet service needs at its current size, so a target service cluster must be deployed in a larger equipment room and the service data migrated to it.
  • For example, the whole original service cluster may be placed in equipment room IDC1, and the whole target service cluster in equipment room IDC2.
  • IDC1 and IDC2 are in different geographic locations.
  • The target service cluster is larger than the original service cluster,
  • so IDC2 can accommodate more servers than IDC1.
  • The original service cluster or the target service cluster may also be spread across different IDC equipment rooms, which is not limited in this embodiment.
  • FIG. 2A is a flowchart of a data migration method according to an embodiment of the present invention.
  • The method provided by the embodiment of the present invention includes the following steps:
  • A computing task log is generated while the computing tasks run.
  • The computing task log records the association relationships between computing tasks and service data in the original service cluster.
  • The computing task log includes the task identifier of each computing task, the storage path of service data, the input/output relationship between the computing task and the service data, and other information. The input/output relationship indicates the input service data and output service data of the computing task; the other information may include the service to which the task belongs, such as a service identifier, and the user to which the service belongs.
  • The storage path of service data identifies the service data: the same storage path indicates the same service data, and a computing task accesses service data through its storage path.
  • The migration server can obtain the computing task log from the original service cluster and extract multiple input/output records from it.
  • Each input/output record indicates the task identifier of a computing task, the storage path of service data, and the input/output relationship between the computing task and the service data. Table 1 shows an input/output record table.
  • The migration server can analyze the input/output records extracted from the computing task log to obtain multiple relationship chains indicating the association relationships between computing tasks and service data. The process of acquiring the multiple relationship chains is detailed in steps 202 to 204 below.
  • The migration server adds the same relationship chain identifier to input/output records that have an association relationship, and different relationship chain identifiers to records that do not. The process may be as follows:
  • Each input/output record is traversed. For the first input/output record currently being traversed, if the records already traversed include a second input/output record associated with the first, the first record is given the same relationship chain identifier as the second; otherwise, the first record is given a relationship chain identifier different from those of all records already traversed.
  • The first input/output record is associated with the second if the computing task indicated by the first has an input/output relationship with the service data indicated by the second, or the service data indicated by the first has an input/output relationship with the computing task indicated by the second.
  • For example, each input/output record in Table 1 is traversed.
  • Relationship chain identifier 1001 is added to the first input/output record. Suppose the record currently traversed is the second record, "task 2, IN, storage path 1": task 2 has an input relationship with "storage path 1" in the first record, which has already been traversed, so the traversed records include a record associated with the current one.
  • The second input/output record is therefore given relationship chain identifier 1001, the same as the first record.
  • When the record currently traversed is the last record in Table 1, "task 5, IN, storage path 5", there is no input/output relationship between the computing task "task 5" and the service data indicated by any record already traversed, nor between "storage path 5" and the computing task indicated by any record already traversed. The traversed records therefore contain no associated second record,
  • so the last record is given a relationship chain identifier different from those of the records already traversed.
  • That identifier may be, for example, 1002. After relationship chain identifiers have been added to all input/output records, the relationship list shown in Table 2 is obtained.
  • The migration server may abstract the input/output records having the same relationship chain identifier into a first relationship chain, which includes task nodes indicating computing tasks and data nodes indicating service data.
  • A task node includes the task identifier of a computing task in the first relationship chain.
  • A data node includes the storage path of the service data.
  • FIG. 2B shows the first relationship chain generated from the input/output records with relationship chain identifier 1001, indicating the corresponding service data and computing tasks.
  • The connection from task node 1 to data node 1 indicates that computing task 1 writes service data to storage path 1; that is, the service data is output data of computing task 1.
  • The connection from data node 1 to task node 2 indicates that computing task 2 reads service data from storage path 1; that is, the service data is input data of computing task 2.
  • The service data and computing tasks indicated by any first relationship chain have no association relationship with the computing tasks or service data indicated by any other first relationship chain. The service data and computing tasks in the original service cluster can therefore be migrated in units of first relationship chains, and migrating the service data and computing tasks indicated by one relationship chain does not affect the normal running of the computing tasks indicated by the other first relationship chains.
  • The duration of data migration is constrained by the amount of data migrated and by the network bandwidth, which is usually limited.
  • To further shorten the migration process,
  • a first relationship chain with a large data volume may be further split in this embodiment. For details, see step 204.
  • The first relationship chains include second relationship chains, that is, first relationship chains whose service data volume exceeds a first threshold.
  • The first threshold may be set by the migration server according to the preset migration time of a relationship chain and the network bandwidth. For example, if the network bandwidth is 2 GB/s (gigabytes per second) and the preset migration time is 2 minutes (120 s), the first threshold is at most 2 GB/s × 120 s = 240 GB. The first threshold may also be set below 240 GB to leave headroom in case the network environment is unstable.
  • The preset migration time may be preset by the migration server or set according to the user's service requirements, which is not limited in this embodiment.
  • The migration server may obtain the data volume of the service data indicated by a first relationship chain from the storage paths indicated by its data nodes. If that data volume exceeds the first threshold, the first relationship chain is determined to be a second relationship chain that needs to be split. The splitting process may include the following steps 204a to 204c:
  • Step 204a: Acquire the weights of the multiple data nodes in the second relationship chain.
  • The weight of each data node indicates its degree of association in the second relationship chain: the higher the weight, the higher the degree of association.
  • The weights may be obtained as follows: for each data node, the product of the number of task nodes associated with the data node and the data volume of the service data it indicates is taken as its weight.
  • For example, if the task nodes associated with data node 1 are task nodes 1 to 4 (4 task nodes) and the service data indicated by data node 1 is 100 GB, the weight of data node 1 is 4 × 100 = 400.
  • Splitting a relationship chain divides a chain with a large data volume into chains with smaller data volumes. For any data node in the chain, the more task nodes are associated with it, the more relationship chains can be split off based on it, which keeps the data volumes of the resulting chains balanced and prevents any one of them from being too large. The weight of a data node therefore takes into account both the number of task nodes associated with it and the data volume of the service data it indicates.
  • Step 204b: Obtain a key data node from the multiple data nodes according to their order of weight from high to low and their positions on the second relationship chain, where the key data node is the first data node in that order that can split the second relationship chain into at least two third relationship chains.
  • The migration server analyzes the data nodes in order of weight from high to low. For each data node, it pre-splits the second relationship chain based on that data node and determines how many third relationship chains the split would produce. If fewer than 2, the pre-split is performed on the next data node in the weight order; if at least 2, the data node is determined to be the key data node, and the second relationship chain is split based on it. Once the key data node has been found, the migration server no longer pre-splits on the data nodes after it in that order.
  • The number of third relationship chains produced by pre-splitting the second relationship chain on a data node may be determined by disconnecting the data node from the task nodes associated with it and then determining the connectivity between the remaining nodes of the second relationship chain.
  • Connectivity may be determined by traversal: for example, one node other than the data node is chosen arbitrarily as the starting point; if every other node can be reached from it, the nodes other than the data node are connected; otherwise, they are not.
  • The pre-split described above does not actually split the second relationship chain; the migration server only assumes that the second relationship chain is split into multiple third relationship chains based on the corresponding data node in order to analyze them.
  • Step 204c: Based on the key data node, split the multiple task nodes associated with the key data node in the second relationship chain into the multiple third relationship chains.
  • The migration server may split the second relationship chain into multiple third relationship chains based on the key data node in any of the following three cases:
  • In the first case, for each task node associated with the key data node, the task node, the key data node, and the nodes associated with the task node are determined as one third relationship chain.
  • The key data node is thus included in every third relationship chain. Still assuming that the key data node in the relationship chain shown in FIG. 2B is data node 1, FIG. 2C is a schematic diagram of the multiple third relationship chains obtained in this case by splitting that relationship chain based on data node 1.
  • In the second case, the key data node is determined as a third relationship chain on its own, and each task node associated with the key data node, together with the nodes (other than the key data node) associated with that task node, forms a third relationship chain.
  • That is, the key data node is first split off from the second relationship chain as a separate third relationship chain.
  • Then, for each task node, a traversal is started from that task node, and all nodes reachable from it are determined to be the nodes associated with the task node.
  • Still assuming that the key data node in the relationship chain shown in FIG. 2B is data node 1, FIG. 2D is a schematic diagram of the multiple third relationship chains obtained in this case by splitting that relationship chain based on data node 1.
  • It should be noted that FIG. 2D is only an example and does not represent an actual splitting result:
  • in practice, a third relationship chain other than the one containing the key data node should include multiple nodes rather than a single task node.
  • In the third case, the key data node, the at least one task node directly associated with the key data node, and the nodes associated with that at least one task node are split off as one third relationship chain, and the task nodes and data nodes outside this third relationship chain are split into at least one further third relationship chain.
  • A task node directly associated with the key data node is a task node that is a child node or a parent node of the key data node.
  • The key data node is thus placed in the same third relationship chain as at least one task node directly associated with it.
  • The task nodes and data nodes outside that third relationship chain are split into at least one third relationship chain in the same way as, in the second case, each task node and its associated nodes other than the key data node form a third relationship chain; this is not described again here.
  • Still assuming that the key data node in the relationship chain shown in FIG. 2B is data node 1, FIG. 2E is a schematic diagram of the third relationship chains obtained in this case by splitting that relationship chain based on data node 1.
  • When a third relationship chain is migrated, if the key data node is detected in that chain, the target storage path of the key data node is written into the data path mapping table.
  • In the second and third cases, the key service data can be copied to the target service cluster after the split.
  • The migration server adds different relationship chain identifiers to the multiple third relationship chains.
  • Once the second relationship chain has been formally split into multiple third relationship chains, a split identifier may be added to each third relationship chain to distinguish the chains obtained by splitting from the first relationship chains that were not split.
  • The split identifier can be embedded in the relationship chain identifier, for example, as its first two digits.
  • For example, the relationship chain identifier may have the format xx_yyyy, where xx is the split identifier (for example, 00 for an unsplit first relationship chain and 01 for a third relationship chain obtained by splitting) and yyyy is the number of the relationship chain.
  • While data is being migrated in units of relationship chains, the computing tasks keep running and generate new service data. Because network bandwidth is limited, if a relationship chain is too large, new service data may be generated faster than service data can be migrated, and the relationship chain would never finish migrating. Splitting a large relationship chain into smaller ones therefore ensures that the service data indicated by each relationship chain can be migrated while the computing tasks keep running normally.
  • Steps 203 and 204 generate multiple relationship chains according to the association relationships between computing tasks and service data indicated by the input/output records with the same relationship chain identifier; each relationship chain includes task nodes indicating computing tasks and data nodes indicating service data.
  • Steps 202 to 204 above acquire multiple relationship chains according to the computing task log of the original service cluster.
  • Each relationship chain indicates a set of computing tasks and service data that have an association relationship.
  • The migration server can sequentially migrate the service data and computing tasks indicated by the multiple relationship chains to the target service cluster in units of relationship chains.
  • Sequential migration means that data migration may be performed on only one relationship chain at a time, or on several relationship chains in parallel.
  • During migration, the computing tasks indicated by relationship chains that have not yet been migrated run normally.
  • The migration of one relationship chain includes the following steps 205 to 208.
  • multiple migration subtasks may be generated from the multiple pieces of business data indicated by the relationship chain, as follows:
  • For each piece of business data indicated by the relationship chain, determine whether its data volume is less than a second threshold. If it is, generate one migration subtask for that business data. If it is not, divide the business data into multiple pieces of sub-business data according to the second threshold, and generate one migration subtask for each piece of sub-business data.
  • The data volume of each piece of sub-business data is less than the second threshold.
  • the second threshold may be preset, or it may be set and changed by the migration server; this embodiment does not limit this.
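The threshold-based subtask generation above is shown in the minimal sketch below. It assumes that a piece of business data can be divided into contiguous byte ranges. The patent does not say how sub-business data is cut, so byte ranges are an assumption. `MigrationSubtask` and `make_subtasks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MigrationSubtask:
    path: str      # storage path of the business data
    offset: int    # first byte covered by this subtask (assumed byte-range split)
    length: int    # number of bytes covered


def make_subtasks(path: str, size: int, threshold: int) -> List[MigrationSubtask]:
    """Split one piece of business data into subtasks smaller than `threshold`."""
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    if size < threshold:
        return [MigrationSubtask(path, 0, size)]
    # Each chunk holds at most threshold - 1 bytes, so every piece of
    # sub-business data is strictly smaller than the second threshold.
    chunk = threshold - 1
    return [
        MigrationSubtask(path, start, min(chunk, size - start))
        for start in range(0, size, chunk)
    ]
```

With a threshold of 4, for example, 10 bytes of data become four subtasks of 3, 3, 3 and 1 bytes. Because the subtasks are independent, they could then run sequentially or in parallel.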
  • the service cluster records the storage time of each piece of business data, so the migration server can determine when the business data was generated from that recorded storage time.
  • the migration server may add configuration information to each migration subtask. The configuration information may include the original storage path and the target storage path of the corresponding business data.
  • the business data referred to in this embodiment is the business data stored under one storage path.
  • accordingly, the migration subtask corresponding to a piece of business data migrates the business data stored under one storage path.
  • the service data indicated by the relationship chain is migrated to the target service cluster according to multiple migration subtasks.
  • the migration server can migrate the business data to the target service cluster according to the original storage path and the target storage path.
  • the multiple migration subtasks corresponding to one relationship chain may be executed sequentially or in parallel; this embodiment does not limit this.
  • splitting the migration into subtasks reduces the granularity of data migration and allows the multiple migration subtasks to run in parallel. This improves the migration efficiency of the business data indicated by the relationship chain.
  • during migration, the computing tasks indicated by the relationship chain may keep running, so the business data stored under a storage path in the relationship chain may be updated.
  • For the business data stored under one storage path, the data stored before the relationship chain was generated is called historical business data. The data updated after the relationship chain was generated is called new business data.
  • the business data under a storage path can be migrated in order of generation time, from earliest to latest; that is, historical business data is migrated first. If business data were migrated and the user then changed it, it would have to be transmitted again. Migrating historical data first avoids this retransmission, which would otherwise reduce migration efficiency.
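Oldest-first ordering needs only the recorded storage time. The sketch below assumes that each subtask carries its configuration information (original and target storage paths) plus a storage timestamp in epoch seconds. `PathSubtask` is a hypothetical structure, not one defined in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathSubtask:
    """Hypothetical migration subtask carrying the configuration information."""
    original_path: str   # storage path in the original service cluster
    target_path: str     # storage path in the target service cluster
    stored_at: float     # storage time recorded by the service cluster (epoch seconds)


def migration_order(subtasks: List[PathSubtask]) -> List[PathSubtask]:
    """Historical business data (earliest storage time) first, newest last."""
    # sorted() is stable, so subtasks stored at the same time keep their order.
    return sorted(subtasks, key=lambda t: t.stored_at)
```

Data that was written recently is the most likely to change again. Putting it last reduces how often already-copied data has to be sent a second time.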
  • this embodiment further provides a data verification mechanism for migration subtasks. The verification works as follows. For each migration subtask, once all of its business data has been migrated to the target service cluster, a consistency check compares that business data in the target service cluster with the same data in the original service cluster. If the check succeeds, the business data of the subtask is considered successfully migrated. If the check fails, the migration of that business data is considered to have failed, and the migration subtask is re-executed. The configuration information of each migration subtask may also include the data volume of the corresponding business data.
  • when the migration server detects that the volume of business data migrated to the target service cluster has reached the data volume indicated by the migration subtask, it determines that all business data of that subtask has been migrated to the target service cluster.
  • the consistency check on the business data of a migration subtask covers three things: the data volume, the number of files the business data contains, and the data content.
  • the migration server may perform the consistency check on the business data of the migration subtask using a preset algorithm, which can be configured in advance.
  • for example, the preset algorithm may be a CRC (Cyclic Redundancy Check code) verification algorithm.
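The three-part consistency check above can be illustrated with Python's standard `zlib.crc32` as the CRC. This is a minimal sketch that assumes both copies of a subtask's business data can be read as local directory trees. A real cluster would read the data through its own storage client. The function names are hypothetical.

```python
import os
import zlib
from typing import Tuple


def summarize(root: str) -> Tuple[int, int, int]:
    """Return (total bytes, file count, combined CRC-32) for files under root."""
    total, count, crc = 0, 0, 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            crc = zlib.crc32(rel.encode("utf-8"), crc)   # file names count too
            with open(full, "rb") as fh:
                for block in iter(lambda: fh.read(1 << 20), b""):
                    crc = zlib.crc32(block, crc)         # running CRC over content
                    total += len(block)
            count += 1
    return total, count, crc


def consistent(original_root: str, target_root: str) -> bool:
    """Data volume, file count and CRC must all match."""
    return summarize(original_root) == summarize(target_root)
```

If `consistent` returns False, the corresponding migration subtask would be re-executed. Because the check is per subtask, only that subtask's data needs to be copied again.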
  • this embodiment does not limit when a migration subtask is re-executed. It may be re-executed immediately after the consistency check of its business data fails, or after a preset time period following that failure. Alternatively, the failed migration subtasks may be re-executed at some other time.
  • this implements fine-grained verification of the business data. When an error occurs while migrating business data, the data can be re-migrated at the granularity of a migration subtask. In the prior art, by contrast, all business data must be re-migrated whenever a migration error occurs. The subtask approach therefore reduces the cost of data errors during migration and improves the efficiency of data migration.
  • the related computing tasks are not stopped for the whole time a relationship chain is being migrated. Instead, once the business data has been migrated to a certain extent, the computing tasks are migrated during a period in which they are stopped. This minimizes the time during which the computing tasks are not running. While the business data indicated by the relationship chain is being migrated, the following steps 206a to 206d may also be performed.
  • Step 206a: While the business data indicated by the relationship chain is being migrated, obtain the migration progress of that business data.
  • the migration server may compute the migration progress from the total data volume of the business data indicated by the relationship chain and the data volume already migrated.
  • the migration progress can be expressed as the ratio of the migrated data volume to the total data volume. For example, if the business data indicated by the relationship chain totals 100 GB and 60 GB has been migrated, the migration progress of that business data is 60%.
  • Step 206b: When the migration progress of the business data exceeds a preset progress, determine, for each computing task indicated by the relationship chain, whether the task is stopped. If it is stopped, perform step 206c; if it is running, perform step 206d.
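The progress ratio in the 100 GB example reduces to a single division. The sketch below assumes that a chain with no business data counts as fully migrated, which is a choice the patent does not state.

```python
GB = 1024 ** 3


def migration_progress(migrated_bytes: int, total_bytes: int) -> float:
    """Fraction of a relationship chain's business data already migrated."""
    if total_bytes <= 0:
        return 1.0          # assumption: nothing to migrate counts as complete
    return min(migrated_bytes / total_bytes, 1.0)


print(f"{migration_progress(60 * GB, 100 * GB):.0%}")   # 60%
```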
  • the preset progress may be preset or modified by the migration server.
  • the migration server may also adjust the preset progress dynamically according to network bandwidth. For example, when the migration server detects that network bandwidth has dropped, it may raise the preset progress appropriately. The remaining data then takes less time to migrate, which keeps the computing tasks stopped for as short a time as possible.
  • Step 206c: If the computing task is stopped, keep it stopped until the relationship chain completes migration.
  • Step 206d: If the computing task is running, wait for it to stop, and then keep it stopped until the relationship chain completes migration.
  • keeping the computing tasks stopped in steps 206c and 206d may be referred to as freezing the computing tasks.
  • before the computing tasks are frozen, the migration server may show a freeze notification to the enterprise user. The tasks are frozen only after the enterprise user confirms.
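Steps 206b to 206d amount to one polling round that the migration server repeats. The minimal sketch below assumes three hypothetical task states. It reads "exceeds" as strictly greater than the preset progress, and it assumes the enterprise user has already confirmed the freeze.

```python
import enum
from typing import Dict


class TaskState(enum.Enum):
    RUNNING = "running"
    STOPPED = "stopped"     # between runs, may still be started again
    FROZEN = "frozen"       # kept stopped until the chain finishes migrating


def freeze_step(progress: float, preset: float, tasks: Dict[str, TaskState]) -> bool:
    """One polling round of steps 206b to 206d.

    Returns True once every computing task of the relationship chain is frozen.
    """
    if progress <= preset:
        return False                        # keep migrating; tasks keep running
    for name, state in list(tasks.items()):
        if state is TaskState.STOPPED:
            tasks[name] = TaskState.FROZEN  # step 206c: keep it stopped
        # step 206d: a RUNNING task is left alone until its current run ends
    return all(s is TaskState.FROZEN for s in tasks.values())
```

Calling `freeze_step` repeatedly freezes each task at its next natural stop. Running tasks are never interrupted.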
  • the migration server may also check the consistency of the business data indicated by a relationship chain as a whole, taking the relationship chain as the unit. The process may be as follows:
  • the business data indicated by the relationship chain in the target service cluster is compared with the same data in the original service cluster. If the check succeeds, subsequent steps 207 and 208 are performed. If it fails, the check results identify which business data of the relationship chain failed to migrate, and that data is re-migrated.
  • the consistency check may be performed per migration subtask or per storage path in the relationship chain.
  • the business data of a migration subtask or storage path that fails the check is the business data that failed to migrate.
  • the migration server can re-migrate that business data using the original migration subtask or a newly created one.
  • the migration itself is the same as the subtask-based data migration process described above and is not repeated here.
  • the migration server can tell from the relationship chain identifier whether the relationship chain being migrated is an unsplit first relationship chain or a third relationship chain obtained by splitting.
  • alternatively, the migration server may add a specified identifier to the key data node in each third relationship chain. This identifier shows whether the relationship chain being migrated contains a key data node, and hence whether it is a third relationship chain.
  • while data is migrated third relationship chain by third relationship chain, the key business data indicated by the key data node shared by the multiple third relationship chains must be kept synchronized.
  • to do so, a double-write table mechanism is used. The data path mapping table stores two storage paths for the key business data: the target storage path in the target service cluster and the original storage path in the original service cluster. The process may be as follows. Obtain the target storage path of the key business data in the target service cluster, where the key business data is the business data indicated by the key data node. Then add the target storage path to the data path mapping table, keeping the original storage path of the key business data in the original service cluster.
  • the target storage path may be added to the data path mapping table after the relationship chain is split, or before the multiple third relationship chains are migrated; this embodiment does not limit this.
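The double-write table keeps both paths for key business data instead of replacing one with the other. The sketch below uses a hypothetical in-memory table; a real deployment would keep the table in its own metadata store.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PathEntry:
    original: Optional[str] = None   # path in the original service cluster
    target: Optional[str] = None     # path in the target service cluster


@dataclass
class DataPathMappingTable:
    """Hypothetical in-memory stand-in for the data path mapping table."""
    entries: Dict[str, PathEntry] = field(default_factory=dict)

    def register_original(self, data_id: str, original_path: str) -> None:
        self.entries[data_id] = PathEntry(original=original_path)

    def add_target(self, data_id: str, target_path: str) -> None:
        """Double-write table: record the target path, keep the original path."""
        self.entries[data_id].target = target_path

    def is_double_written(self, data_id: str) -> bool:
        entry = self.entries.get(data_id)
        return entry is not None and None not in (entry.original, entry.target)
```

Keeping both entries lets migrated and unmigrated third relationship chains each find the key business data in their own cluster.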
  • the migration of a third relationship chain includes the following steps a to c:
  • Step a: Synchronize the key business data between the target service cluster and the original service cluster according to the target storage path and the original storage path.
  • whenever the migration server detects an update to the key business data in either cluster, it synchronizes the key business data between the two clusters using the target storage path and the original storage path.
  • Step b: If all business data and computing tasks indicated by the third relationship chain have been migrated to the target service cluster, its computing tasks access the key business data through the target storage path recorded in the data path mapping table.
  • Step c: If not all business data and computing tasks indicated by the third relationship chain have been migrated to the target service cluster, its computing tasks access the key business data through the original storage path recorded in the data path mapping table.
  • the storage path of the key business data in the relevant service cluster can be looked up in the data path mapping table by the identifier of the service cluster holding the data of the third relationship chain. Suppose the data indicated by the third relationship chain is still in the original service cluster, that is, it has not yet been migrated to the target service cluster. When a computing task of that chain runs, it gets the original storage path of the key business data from the data path mapping table and accesses the key business data through that path.
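Steps a to c can be modeled with two dictionaries standing in for the service clusters. The sketch below is written under that assumption: each cluster is modeled as a mapping from storage path to bytes, and the mapping table maps key data to an (original path, target path) pair. All names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins: each service cluster is a dict from storage path to
# bytes, and the mapping table maps key data to (original_path, target_path).
Clusters = Dict[str, Dict[str, bytes]]
MappingTable = Dict[str, Tuple[str, str]]


def sync_key_data(clusters: Clusters, table: MappingTable, data_id: str, updated_in: str) -> None:
    """Step a: after an update in one cluster, copy the key data to the other."""
    original_path, target_path = table[data_id]
    paths = {"original": original_path, "target": target_path}
    other = "target" if updated_in == "original" else "original"
    clusters[other][paths[other]] = clusters[updated_in][paths[updated_in]]


def resolve_key_path(table: MappingTable, data_id: str, chain_migrated: bool) -> Tuple[str, str]:
    """Steps b and c: pick (cluster, path) for a task of a third relationship chain."""
    original_path, target_path = table[data_id]
    if chain_migrated:
        return "target", target_path       # step b: chain already in the target cluster
    return "original", original_path      # step c: chain still in the original cluster


def read_key_data(clusters: Clusters, table: MappingTable, data_id: str, chain_migrated: bool) -> bytes:
    cluster, path = resolve_key_path(table, data_id, chain_migrated)
    return clusters[cluster][path]
```

In the FIG. 2F scenario, the task of task node 1 would read with `chain_migrated=True`, and the tasks of task nodes 2 to 4 with `chain_migrated=False`.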
  • FIG. 2F is a schematic diagram of accessing key business data while migrating the third relationship chains obtained by splitting the relationship chain shown in FIG. 2B. Data node 1 corresponds to the key business data.
  • the third relationship chain containing task node 1 has been migrated to the target service cluster. The third relationship chains containing task nodes 2 to 4 have not yet been migrated.
  • the computing task of task node 1 therefore accesses the key business data through its target storage path. The computing tasks of task nodes 2 to 4 access it through its original storage path.
  • the double-write table mechanism involves the following processes (1) to (4):
  • This process corresponds to the process of obtaining key data nodes in the second relationship chain.
  • the storage path of the key service data in the corresponding service cluster is obtained from the data path mapping table.
  • the computing tasks indicated by the third relationship chain can then access the key business data in the target service cluster. In other words, the third relationship chain no longer depends on the original storage path of the key business data.
  • migrating the data indicated by a relationship chain also includes migrating the source data of the service.
  • the source data includes data entered by users at user terminals.
  • this data is generated by the user terminals in real time and has not yet been synchronized to the original service cluster.
  • it is typically consumed by computing tasks.
  • the source data may be obtained from a real-time data processing server through a specified interface. It is then migrated to the target service cluster together with the business data indicated by the relationship chain, so that the computing tasks keep running normally.
  • the migration server may record the target storage path of each piece of business data indicated by the relationship chain. Once all of that business data has been migrated to the target service cluster, the migration server may replace the original storage path of each piece of business data with its target storage path in the data path mapping table.
  • once the migration server determines that the business data of every third relationship chain related to the key business data has been migrated to the target service cluster, it deletes the original storage path of the key business data from the data path mapping table. The target storage path is kept.
  • to migrate the computing tasks indicated by the relationship chain to the target service cluster, the migration server may obtain the first and second computing resource information of each computing task. It then replaces the task's first computing resource information with its second computing resource information.
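The final clean-up of the data path mapping table can be sketched in two parts. Ordinary business data switches to its target path once its chain is migrated. Key business data keeps both paths until every related third relationship chain has been migrated. The table layout and function names below are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical layout: data id -> {"original": path, "target": path}.
Table = Dict[str, Dict[str, str]]


def switch_to_target(table: Table, data_ids: List[str]) -> None:
    """Once a chain's business data is migrated, keep only its target paths."""
    for data_id in data_ids:
        table[data_id] = {"target": table[data_id]["target"]}


def release_key_data(table: Table, key_id: str,
                     related_chains: Set[str], migrated_chains: Set[str]) -> bool:
    """Drop the key data's original path once every related third chain is migrated."""
    if not related_chains <= migrated_chains:
        return False                       # some related chain still needs the original
    table[key_id].pop("original", None)
    return True
```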
  • the first computing resource information is computing resource information configured for the computing task in the original service cluster
  • the second computing resource information is computing resource information configured for the computing task in the target service cluster.
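Swapping the first computing resource information for the second is a per-task substitution. The sketch assumes the resource information is a small immutable record. The queue, CPU and memory fields are illustrative assumptions; the patent does not list what the resource information contains.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class ResourceInfo:
    cluster: str      # which service cluster the resources belong to
    queue: str        # hypothetical resource queue name
    cpu_cores: int
    memory_gb: int


@dataclass(frozen=True)
class ComputeTask:
    name: str
    resources: ResourceInfo   # starts as the first computing resource information


def migrate_tasks(tasks: List[ComputeTask],
                  second_info: Dict[str, ResourceInfo]) -> List[ComputeTask]:
    """Replace each task's first resource info with its second (target-cluster) info."""
    return [replace(task, resources=second_info[task.name]) for task in tasks]
```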
  • the migration server then starts all computing tasks indicated by the relationship chain, which completes the migration of that relationship chain.
  • this embodiment may also support incremental migration of data, which works at the following two levels:
  • The first level: migrating data that is added during the migration process.
  • after acquiring the multiple relationship chains from the computing task log, the migration server may record the timestamp of the latest input and output record in the log.
  • using that timestamp, the migration server can fetch from the computing task log of the original service cluster the new input and output records generated after it.
  • the migration server can then update the relationship chains that have not yet been migrated with these new records. For each new input and output record, it first looks for a fourth relationship chain associated with the record among the relationship chains not yet migrated. If one exists, the migration server updates it according to the record. If none exists, it generates a new relationship chain from the association relationships between this record and the other new input and output records.
  • a new relationship chain is generated in the same way as the original relationship chains, so the process is not repeated here.
  • a fourth relationship chain is associated with a new input and output record when either of two conditions holds. The business data of the fourth relationship chain may be associated with the computing task of the record. Alternatively, the computing task of the fourth relationship chain may be associated with the business data of the record.
  • the migration server may update the relationship chains not yet migrated while a relationship chain is being migrated, or after a relationship chain has finished migrating.
  • this embodiment does not limit this.
  • the migration server can fetch new input and output records from the computing task log periodically, and so update the relationship chains not yet migrated on a regular schedule.
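The first incremental level can be sketched as filtering the log by the recorded timestamp and folding the new records into the pending chains. The sketch models each relationship chain as a set of node ids and each log record as a (timestamp, task, data) triple. It also assumes that a record touching several pending chains merges them into one; the patent does not say what happens in that case.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]                 # (computing task node, business data node)
TimedRecord = Tuple[float, str, str]     # (log timestamp, task node, data node)


def records_after(log: Iterable[TimedRecord], last_seen: float) -> List[Record]:
    """New input/output records written after the recorded timestamp."""
    return [(task, data) for ts, task, data in log if ts > last_seen]


def update_chains(pending: List[Set[str]], new_records: Iterable[Record]) -> List[Set[str]]:
    """Fold new records into the relationship chains that are not yet migrated."""
    chains = [set(chain) for chain in pending]
    for task, data in new_records:
        # Chains sharing a node with this record (the "fourth relationship chain").
        hits = [chain for chain in chains if task in chain or data in chain]
        merged = {task, data}
        for chain in hits:
            merged |= chain
        # Assumption: a record touching several chains merges them into one.
        chains = [chain for chain in chains if all(chain is not h for h in hits)]
        # With no hits, this starts a new chain that later new records can join.
        chains.append(merged)
    return chains
```

Node ids are assumed to be prefixed (for example `task:` and `data:`) so that task and data names cannot collide.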
  • new computing tasks created in the original service cluster may be associated with business data of relationship chains that have already been migrated. After that data has moved to the target service cluster, the new computing tasks must read and write it there. Because the target cluster is not in the same IDC room as the original service cluster, these reads and writes consume a large amount of network bandwidth. At the first level, therefore, the migration server promptly updates the relationship chains not yet migrated from the computing task log, so that they fully reflect the computing tasks and business data of the original service cluster.
  • the migration server can also monitor the network bandwidth used by each computing task in the original service cluster. When a computing task uses more than a preset bandwidth, the migration server migrates the relationship chain containing that task to the target service cluster first.
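Bandwidth-based prioritization can be expressed as a stable sort. The sketch assumes that per-task bandwidth figures (in Mbit/s) are already available from monitoring. It also assumes that a chain is "hot" if any of its computing tasks exceeds the preset bandwidth. The function name is hypothetical.

```python
from typing import Dict, List


def prioritize(chains: Dict[str, List[str]], bandwidth_mbps: Dict[str, float],
               preset_mbps: float) -> List[str]:
    """Order chain ids so chains holding a bandwidth-heavy task migrate first."""
    def is_hot(chain_id: str) -> bool:
        return any(bandwidth_mbps.get(t, 0.0) > preset_mbps for t in chains[chain_id])
    # False sorts before True, and sorted() is stable, so chains keep their
    # original order within the "hot" and "not hot" groups.
    return sorted(chains, key=lambda cid: not is_hot(cid))
```

For example, with a 100 Mbit/s preset, chains containing a 200 or 300 Mbit/s task move ahead of a chain whose only task uses 10 Mbit/s.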
  • migration can also be resumed from a breakpoint, using the data migration state recorded at the time of the interruption.
  • breakpoint resumption may work as follows. During the migration of any relationship chain, if a migration interruption operation for that chain is detected, the migration server records the migration subtasks that have not finished and stops migrating the chain. Later, when a continue-migration operation for the chain is detected, the server migrates the remaining business data and computing tasks of the chain to the target service cluster according to those unfinished migration subtasks.
  • the migration server may number the migration subtasks and execute them in numerical order. It can also record the state of each migration subtask: not migrated, migrating, or migrated.
  • when the migration server detects a migration interruption operation for a relationship chain, it can record the numbers of the migration subtasks that have not finished.
  • when a continue-migration operation for that relationship chain is detected, only the unfinished migration subtasks are executed. This moves the business data and computing tasks that were not yet migrated at the time of interruption to the target service cluster.
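Breakpoint resumption with numbered subtasks is sketched below. The sketch assumes an interruption takes effect between subtasks, so the subtask already in progress finishes first. `ChainMigration` and its methods are hypothetical names.

```python
import enum
from typing import Callable, Dict, List


class SubtaskStatus(enum.Enum):
    PENDING = "not migrated"
    MIGRATING = "migrating"
    DONE = "migrated"


class ChainMigration:
    """Numbered subtasks run in order; an interruption leaves a resumable record."""

    def __init__(self, subtask_numbers: List[int]) -> None:
        # Insertion order is ascending, so iteration follows the numbering.
        self.status: Dict[int, SubtaskStatus] = {
            n: SubtaskStatus.PENDING for n in sorted(subtask_numbers)
        }
        self.interrupted = False

    def unfinished(self) -> List[int]:
        return [n for n, s in self.status.items() if s is not SubtaskStatus.DONE]

    def interrupt(self) -> None:
        """Request a stop; takes effect before the next subtask starts."""
        self.interrupted = True

    def run(self, migrate: Callable[[int], None]) -> None:
        """Start or continue: execute only the unfinished subtasks, in number order."""
        self.interrupted = False
        for n in self.unfinished():
            if self.interrupted:
                break
            self.status[n] = SubtaskStatus.MIGRATING
            migrate(n)
            self.status[n] = SubtaskStatus.DONE
```

Calling `run` a second time after an interruption skips the subtasks already marked as migrated. This matches the continue-migration behavior described above.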
  • the migration server may also control the migration process through distinct migration states managed by a state machine. This prevents the migration state from being lost and ensures that the migration of a relationship chain can be interrupted at any point and then resumed.
  • FIG. 2H is a schematic diagram of the migration states a relationship chain passes through during migration. Each state is described below, taking the migration of one relationship chain as an example:
  • Start migration: start migrating the data indicated by the relationship chain.
  • Next, once the migration of the relationship chain has started, the chain enters a state in which data is migrated. In this state, the migration server obtains the source data from the real-time data processing server through the specified interface.
  • after the migration server determines that the data path mapping table contains both the original storage path and the target storage path of the key business data of the third relationship chain, the chain enters the state of waiting for user confirmation.
  • Freeze computing tasks: in this state, the migration server performs steps 206b and 206c above. Once all computing tasks are stopped, the chain enters the next state.
  • Business data consistency check: this state is entered after the business data indicated by the relationship chain has been completely migrated to the target service cluster.
  • Business data storage path switching: this state is entered after the consistency check of the relationship chain succeeds. In it, the storage paths of the business data are switched.
  • Computing task migration: this state is entered after the business data storage paths have been completely switched from the original service cluster to the target service cluster. In it, the computing tasks are migrated.
  • Unfreeze computing tasks: after the computing tasks have been migrated, all of them are started. When all computing tasks are running normally, the migration-complete state is entered. This completes the migration of the business data and computing tasks indicated by the relationship chain.
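The FIG. 2H states can be driven by a small persisted state machine. The sketch assumes the states form a straight line in the order listed above, and that each chain's state is saved to its own file after every transition. The patent does not name the data-migration state, so `MIGRATING_DATA` is an assumed name, as are the class and file layout.

```python
import enum
from pathlib import Path


class MigrationState(enum.Enum):
    START = "start migration"
    MIGRATING_DATA = "migrating data, including source data"   # assumed name
    AWAIT_CONFIRM = "waiting for user confirmation"
    FREEZE = "freeze computing tasks"
    CONSISTENCY_CHECK = "business data consistency check"
    PATH_SWITCH = "business data storage path switching"
    TASK_MIGRATION = "computing task migration"
    THAW = "unfreeze computing tasks"
    DONE = "migration complete"


_ORDER = list(MigrationState)   # definition order = linear order of the states


class ChainStateMachine:
    """Persists the current state so an interrupted chain resumes where it stopped."""

    def __init__(self, chain_id: str, store: Path) -> None:
        self.chain_id = chain_id
        self.store = store
        self.state = self._load()

    def _load(self) -> MigrationState:
        if self.store.exists():
            return MigrationState[self.store.read_text().strip()]
        return MigrationState.START

    def advance(self) -> MigrationState:
        """Move to the next state and persist it before returning."""
        if self.state is MigrationState.DONE:
            raise RuntimeError(f"{self.chain_id} is already migrated")
        self.state = _ORDER[_ORDER.index(self.state) + 1]
        self.store.write_text(self.state.name)
        return self.state
```

Because each transition is written out before work in the new state begins, a restarted migration server can reload the chain and continue from the saved state.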
  • the migration server can provide a management interface in its foreground. A terminal with management rights can access the foreground of the migration server to display this interface, and managers can follow the migration process by viewing it.
  • for example, any terminal can log in to the migration server with the administrator's username and password to obtain management rights. It can then use those rights to access the migration server foreground.
  • alternatively, a terminal connected to the target service cluster can obtain management rights and use them to access the migration server foreground.
  • the management interface may show the connections between the nodes in the relationship chain, the migration state of each node, and the running state of the computing tasks. It may also show the migration start time, the expected migration end time, and the migration state and migration progress of the relationship chain, among other details.
  • the management interface can also include one or more management options for managing the migration process.
  • for example, the management interface may include a stop-migration option and a continue-migration option. When a manager triggers the stop-migration option, the migration server receives a stop-migration instruction and suspends the current migration. When a manager later triggers the continue-migration option, the migration server receives a continue-migration instruction and resumes the migration.
  • the method provided in this embodiment builds relationship chains from the computing task log of the original service cluster. Each chain represents a set of associated business data and computing tasks, and data is migrated with the relationship chain as the unit.
  • migrating one relationship chain therefore does not affect the other relationship chains. The computing tasks of relationship chains not yet migrated keep running normally, so the services those chains represent are not disrupted.
  • moreover, computing tasks and business data are both migrated as nodes of the relationship chain, so computing tasks are not affected by where their business data is located.
  • in addition, a large relationship chain can be split into small relationship chains.
  • each small relationship chain can access the key business data flexibly, whether that data is in the original service cluster or the target service cluster. This decouples related services.
  • as a result, complex services can be migrated gradually through multiple small relationship chains.
  • furthermore, the business data indicated by a relationship chain is migrated first.
  • the computing tasks are then migrated in the gap during which they are stopped.
  • this greatly reduces the impact of data migration on normal use of the services. Once the business data has reached the preset migration progress, the remaining data can usually be migrated in a short time, which can be shorter than the running cycle of the computing tasks. Data migration therefore does not affect normal use of the services at all, and users do not notice it.
  • splitting the migration into migration subtasks also reduces the granularity of data migration and lets the subtasks run in parallel, which improves the migration efficiency of the business data indicated by the relationship chain.
  • when a migration error occurs, only the affected migration subtask needs to be re-migrated, rather than the business data of the entire service cluster. This reduces the cost of data errors during migration and improves the efficiency of data migration.
  • the business data and computing tasks in the original service cluster are moved to the target service cluster gradually, one relationship chain at a time, by breaking the whole migration into parts.
  • as the business data and computing tasks are migrated out of the original service cluster,
  • servers in the original service cluster can be decommissioned and relocated to the target IDC room. Server resources can therefore be reused, which reduces the cost of data migration.
  • FIG. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
  • the apparatus includes a first acquisition unit 301 and a migration unit 302.
  • the first obtaining unit 301 is connected to the migration unit 302. It is configured to acquire a plurality of relationship chains from the computing task log of the original service cluster, where the computing task log records the relationships between computing tasks and business data in the original service cluster.
  • each relationship chain indicates a set of associated computing tasks and business data. The migration unit 302 is configured to migrate the business data and computing tasks indicated by the plurality of relationship chains to the target service cluster sequentially, using the relationship chain as the unit. While any relationship chain is being migrated, the computing tasks of the relationship chains not yet migrated run normally.
  • the first obtaining unit 301 is configured to read the multiple input and output records in the computing task log. It gives associated input and output records the same relationship chain identifier, and unassociated records different relationship chain identifiers.
  • it then generates multiple relationship chains from the association relationships between the computing tasks and business data of records that share a relationship chain identifier. Each relationship chain includes task nodes indicating computing tasks, data nodes indicating business data, and the association relationships between the task nodes and the data nodes.
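Giving associated records the same identifier is a connected-components problem, and a union-find structure solves it. The minimal sketch below assumes that each input and output record has been reduced to one (computing task, business data) pair. It also assumes that unsplit chains are numbered `00_0001`, `00_0002`, and so on in order of first appearance. Neither assumption comes from the patent, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]   # (computing task id, business data id) from one log entry


def assign_chain_ids(records: List[Record]) -> Dict[Record, str]:
    """Give associated records the same chain id and unrelated records different ids."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for task, data in records:
        union("task:" + task, "data:" + data)   # a record links its task and data

    roots: Dict[str, str] = {}
    ids: Dict[Record, str] = {}
    for rec in records:
        root = find("task:" + rec[0])
        if root not in roots:
            roots[root] = f"00_{len(roots) + 1:04d}"   # unsplit first relationship chain
        ids[rec] = roots[root]
    return ids


def build_chains(ids: Dict[Record, str]) -> Dict[str, List[Record]]:
    """Group records by chain id; each record is one task-data edge of the chain."""
    chains: Dict[str, List[Record]] = defaultdict(list)
    for rec, chain_id in ids.items():
        chains[chain_id].append(rec)
    return dict(chains)
```

For example, `report` shares `raw` with `etl`. The records of both tasks therefore fall into one chain, while an unrelated `backup` task gets its own chain.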
  • the first obtaining unit 301 includes:
  • Generating a subunit configured to generate a plurality of first relationship chains according to an association relationship between the computing task and the service data indicated by the input and output records identified by the same relationship chain;
  • splitting unit is configured to split the second relationship chain into a plurality of third relationship chains if the plurality of first relationship chains include the second relationship chain, where the second relationship chain is data of the indicated service data The first relationship chain that exceeds the first threshold.
  • the splitting subunit is configured to obtain the weights of the data nodes in the second relationship chain, where the weight of each data node indicates its degree of association within the second relationship chain: the higher the weight, the higher the degree of association;
  • to obtain a key data node from the data nodes according to descending weight order and the positions of the data nodes on the second relationship chain, the key data node being the first data node in that order that can split the second relationship chain into at least two third relationship chains;
  • and, based on the key data node, to split the task nodes in the second relationship chain that are associated with the key data node into a plurality of third relationship chains.
  • the splitting subunit is configured to:
  • for each task node directly associated with the key data node, determine the key data node, the task node, and the nodes that remain associated with the task node once the key data node is disconnected from it as a third relationship chain; or,
  • determine the key data node as one third relationship chain and, for each task node directly associated with the key data node, determine the nodes, other than the key data node, that are associated with that task node as a third relationship chain; or,
  • split the key data node, at least one task node directly associated with it, and the nodes associated with that at least one task node into one third relationship chain, and split the remaining task nodes and data nodes into at least one further third relationship chain.
  • the splitting subunit is configured to determine, for each of the data nodes, the product of the number of task nodes associated with the data node and the data volume of the business data indicated by the data node as the weight of that data node.
  • the device further includes:
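  • a second obtaining unit, configured to obtain the target storage path of the key business data in the target service cluster, the key business data being the business data indicated by the key data node;
  • an adding unit, configured to add the target storage path to the data path mapping table while retaining the original storage path of the key business data in the original service cluster.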
  • the migration unit 302 is configured to:
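  • while migrating the plurality of third relationship chains, synchronize the key business data between the target service cluster and the original service cluster according to the target storage path and the original storage path;
  • for any third relationship chain, if the business data and computing tasks it indicates have all been migrated to the target service cluster, access the key business data through the target storage path recorded in the data path mapping table when running the computing tasks indicated by that chain; otherwise, access it through the original storage path recorded in the table.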
  • the migration unit 302 includes:
  • a generating subunit, configured to generate, for each of the plurality of relationship chains, a plurality of migration subtasks according to the business data indicated by the chain, each migration subtask indicating the original storage path and target storage path of the corresponding business data;
  • a first migration subunit, configured to migrate the business data indicated by the relationship chain to the target service cluster according to the migration subtasks;
  • a second migration subunit, configured to migrate the computing tasks indicated by the relationship chain to the target service cluster;
  • wherein the computing tasks indicated by the relationship chain are in a stopped state while they are being migrated.
  • the first migration subunit is configured to:
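  • for each piece of business data indicated by the relationship chain, generate one migration subtask if its data volume is less than a second threshold; otherwise, divide it according to the second threshold, in order of data generation time, into multiple pieces of sub-business data and generate one migration subtask for each;
  • the data volume of each piece of sub-business data is less than the second threshold.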
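  • the first migration subunit is further configured to obtain the migration progress of the business data indicated by the relationship chain during migration and, once the progress exceeds a preset progress, keep each computing task indicated by the chain stopped until the chain finishes migrating, first waiting for any running task to stop.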
  • the device further includes:
  • a first checking unit, configured to, for each of the migration subtasks, after the business data corresponding to the subtask has been fully migrated to the target service cluster, perform a consistency check on that business data in the target service cluster and the original service cluster; if the check succeeds, the business data of the subtask is determined to have been migrated successfully; if it fails, the migration is determined to have failed and the subtask is re-executed.
  • the device further includes:
  • a second checking unit, configured to perform a consistency check on the business data indicated by the relationship chain in the target service cluster and the original service cluster; if the check succeeds, the computing tasks indicated by the relationship chain are migrated to the target service cluster; if it fails, the business data that failed to migrate is determined from the check result and migrated again.
  • the second migration subunit is configured to obtain first computing resource information and second computing resource information of the computing task, where the first is the computing resource information configured for the task in the original service cluster and the second is the computing resource information configured for the task in the target service cluster, and to replace the first computing resource information of the task with the second.
  • the device further includes:
  • a switching unit, configured to switch, in the data path mapping table, the original storage path of the business data in the original service cluster to its target storage path in the target service cluster.
  • the migration unit 302 is further configured to: while migrating any relationship chain, when an operation interrupting its migration is detected, record the migration subtasks that have not completed and stop migrating the chain; and when an operation to continue its migration is detected, migrate the business data and computing tasks indicated by the chain to the target service cluster according to the recorded unfinished subtasks.
  • the device further includes:
  • the relationship chain updating unit is configured to obtain an updated computing task log and update the relationship chains that have not yet been migrated according to it.
  • the device provided in this embodiment represents associated business data and computing tasks as one relationship chain based on the computing task log of the original service cluster, so that when data is migrated one relationship chain at a time, the chain being migrated does not affect other chains.
  • the computing tasks indicated by the relationship chains not yet migrated can still run normally, so the services indicated by those chains remain available.
  • FIG. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
  • device 400 can be provided as a server.
  • apparatus 400 includes a processing component 422 that further includes one or more processors, and memory resources represented by memory 432 for storing instructions executable by processing component 422, such as an application.
  • An application stored in memory 432 may include one or more modules each corresponding to a set of instructions.
  • the processing component 422 is configured to execute instructions to perform the method performed by the migration server in the data migration method embodiment described above.
  • Device 400 may also include a power supply component 426 configured to perform power management of device 400, a wired or wireless network interface 450 configured to connect device 400 to the network, and an input/output (I/O) interface 458.
  • Device 400 may operate based on an operating system stored in the memory 432, for example, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
  • the data migration apparatus can be used to perform the operations performed by the migration server in the above embodiment.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, such as a memory including instructions executable by a processor to perform the data migration method of the above embodiments.
  • the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores at least one instruction loaded by a processor and executed to implement an operation performed by the migration server in the method of the foregoing embodiment.
  • the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

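A data migration method, a migration server, and a storage medium, belonging to the field of network technology. The method includes: obtaining a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data; migrating the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence, one relationship chain at a time; and, while migrating based on any one relationship chain, normally running the computing tasks indicated by the relationship chains that have not been migrated. Because associated business data and computing tasks are represented by one relationship chain, the computing tasks indicated by relationship chains that have not been migrated can still run normally during data migration, so the services indicated by those chains remain available.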

Description

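Data Migration Method, Migration Server, and Storage Medium
This application claims priority to Chinese Patent Application No. 201710197702.7, entitled "Data Migration Method and Apparatus", filed with the China National Intellectual Property Administration on March 29, 2017, which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of network technologies, and in particular, to a data migration method, a migration server, and a storage medium.
Background
With the development of network technology, the volume of business data for various services keeps growing rapidly and can reach the PB (petabyte) level or beyond, bringing the Internet and information industries into the big-data era. In this era, service clusters composed of large numbers of servers are typically used for business data storage, business processing, and business management. In practice, a service cluster is usually deployed in a single IDC (Internet Data Center) facility. However, as business data grows, the service cluster keeps expanding, while an IDC facility has limited capacity and may be unable to house all of the cluster's servers, which caps the size of the cluster. To meet the growth in business data, the data in the service cluster can then be migrated to a new, larger service cluster.
In the prior art, when a service cluster processes a service, it creates a corresponding computing task for the service, allocates computing resources to that task, and runs the task to process the business data. Because services are usually interrelated, migrating one service's business data could affect the related services, so the data of the service cluster is usually migrated as a whole. In a whole-cluster migration, all computing tasks must first be stopped (that is, service to every business is stopped); all business data is then migrated to the new service cluster; the computing tasks are reconfigured in the new cluster and allocated computing resources; and finally the reconfigured tasks are started, restoring service to all businesses and completing the migration.
In the course of implementing the embodiments of the present invention, the inventors found that the related art has at least the following problem:
Because the volume of business data in a service cluster is enormous, migration usually takes days, months, or even longer; stopping service to all businesses for that long makes all of them unavailable.
Summary
Embodiments of the present invention provide a data migration method, a migration server, and a storage medium that can solve the problem in the related art. The technical solutions are as follows: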
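In one aspect, a data migration method is provided, the method including:
obtaining a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data;
migrating the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence, one relationship chain at a time;
where, while migration is performed based on any one relationship chain, the computing tasks indicated by the relationship chains that have not been migrated run normally.
In another aspect, a data migration apparatus is provided, the apparatus including:
a first obtaining unit, configured to obtain a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data;
a migration unit, configured to migrate the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence, one relationship chain at a time;
where, while migration is performed based on any one relationship chain, the computing tasks indicated by the relationship chains that have not been migrated run normally.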
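In yet another aspect, a migration server is provided, including a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the following operations:
obtaining a plurality of relationship chains according to a computing task log of an original service cluster, where the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data;
migrating the business data and computing tasks indicated by the plurality of relationship chains to a target service cluster in sequence, one relationship chain at a time;
where, while migration is performed based on any one relationship chain, the computing tasks indicated by the relationship chains that have not been migrated run normally.
In yet another aspect, a computer-readable storage medium is provided, storing at least one instruction that is loaded and executed by a processor to implement the operations performed in the method executed by the migration server.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
Based on the computing task log of the original service cluster, associated business data and computing tasks are represented as one relationship chain. When data is migrated one relationship chain at a time, the chain being migrated does not affect other chains, and the computing tasks indicated by chains not yet migrated still run normally, so the services indicated by those chains remain available.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.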
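FIG. 1A is a schematic diagram of an implementation scenario according to an embodiment of the present invention;
FIG. 1B is an architecture diagram of a migration server according to an embodiment of the present invention;
FIG. 2A is a flowchart of a data migration method according to an embodiment of the present invention;
FIG. 2B is a schematic diagram of a relationship chain according to an embodiment of the present invention;
FIG. 2C is a schematic diagram of relationship chain splitting according to an embodiment of the present invention;
FIG. 2D is a schematic diagram of relationship chain splitting according to an embodiment of the present invention;
FIG. 2E is a schematic diagram of relationship chain splitting according to an embodiment of the present invention;
FIG. 2F is a schematic diagram of relationship chains obtained by splitting accessing key business data according to an embodiment of the present invention;
FIG. 2G is a schematic diagram of the flow involved in a dual-write table mechanism according to an embodiment of the present invention;
FIG. 2H is a schematic diagram of migration states during relationship chain migration according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following describes the implementations of the present invention in further detail with reference to the accompanying drawings.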
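FIG. 1A is a schematic diagram of an implementation scenario of data migration according to an embodiment of the present invention. Referring to FIG. 1A, the scenario includes an original service cluster, a target service cluster, and a migration server.
The original service cluster is the cluster whose business data needs to be migrated, and the target service cluster is the cluster to which the business data is migrated. A service cluster may include multiple storage clusters and multiple computing clusters: storage clusters store business data, and computing clusters run computing tasks and store task-related data, such as the size and location of a task's computing resources. Storage clusters and computing clusters may be deployed on different servers or on the same servers; this embodiment imposes no limitation.
It should be noted that when a service cluster processes a service, it creates a computing task for the service, allocates computing resources to the task, and runs the task to perform one or more business processing steps, for example, reading some business data from the service cluster, processing it, and writing the resulting business data back to the service cluster. Computing tasks run periodically, with a period of hours, days, weeks, or months; for example, a task with a 1-hour period runs once every hour. Different computing tasks may have the same or different periods, depending on the task type and the processing speed of the business data; this embodiment imposes no limitation.
In addition, the service cluster maintains a data path mapping table that records the correspondence between business data identifiers and the storage paths of the business data. A computing task uses this table to determine the storage path of the business data it reads or writes, and then reads or writes the data at that path. Business data read by one computing task may have been written by another, and business data written by one task may be read by others, so computing tasks and business data have input/output relationships with one another.
The migration server migrates the service cluster's data and manages the migration process. It may be deployed in the original service cluster, in the target service cluster, or on another server outside both that can communicate with them. In this embodiment, the migration server needs to migrate data from the original service cluster to the target service cluster; the migrated data includes both the business data and the computing tasks in the original service cluster.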
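Specifically, the migration server may include multiple modules, each playing a different role in data migration. FIG. 1B is an architecture diagram of the migration server, which includes multiple functional modules, introduced below:
The analysis module performs the process, described in steps 201 to 203 below, of obtaining multiple relationship chains from the computing task log; the splitting module performs the relationship chain splitting described in step 204; and the checking module performs the consistency checks on migration subtasks and relationship chains described in step 206.
The migration module performs the business data migration and computing task migration described in steps 205 to 208. After the business data indicated by a relationship chain has been migrated, the migration module switches the storage paths in the data path mapping table, which corresponds to step 207. Migrating a computing task means switching the task's configuration information, which can be obtained from the configuration library; this corresponds to step 208. If the relationship chain being migrated was obtained by splitting, the key business data must be synchronized, which corresponds to step a under step 206.
Data path mapping table synchronization refers to adding the target storage path of the business data migrated to the target service cluster to the path mapping table.
The migration server front end can be used to manage the migration of relationship chains, for example by displaying information about a chain: the connections between its nodes, the migration state the chain is in, its node information, and the running status of its computing tasks. A chain's node information includes the storage paths indicated by all its data nodes and the computing task identifiers indicated by its task nodes; data nodes and task nodes are explained in step 203. Through this front end, users can start or pause the migration of a relationship chain.
The configuration library stores configuration information about the computing resources of computing tasks, such as resource size and location; it may also store the original storage path of business data in the original service cluster and its target storage path in the target service cluster. The task relationship chain library stores the relationship chains generated by the analysis module. The migration task library stores information about migration subtasks, such as the subtask number, the indicated business data, the data's original and target storage paths, and its data volume.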
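This embodiment applies to business data migration scenarios. For example, the owner of the original service cluster opens the business data to a new owner, so the data must be migrated to the new owner's target service cluster; or the original service cluster can no longer meet business needs but cannot be expanded because of the floor space of its facility, so a target service cluster is deployed in a larger facility and the business data is migrated there.
In one implementation, the original service cluster is housed entirely in facility IDC1 and the target service cluster entirely in facility IDC2, at a different geographic location. The target service cluster is larger than the original service cluster, and accordingly IDC2 can accommodate more servers than IDC1. Of course, in another implementation, the original or target service cluster may be spread across different IDC facilities; this embodiment imposes no limitation.
FIG. 2A is a flowchart of a data migration method according to an embodiment of the present invention. Referring to FIG. 2A, the method includes the following steps:
201. Obtain the computing task log of the original service cluster.
Computing tasks in the original service cluster generate a computing task log while running; the log records the associations between computing tasks and business data in the original service cluster. For example, the log includes the task identifier of each computing task, the storage paths of business data, the input/output relationships between computing tasks and business data, and other information. The input/output relationship indicates a task's input business data and output business data; the other information may include the service to which the task belongs, such as the service identifier and the user to whom the service belongs. A storage path can be used to indicate business data: the same storage path indicates the same business data, and a computing task accesses business data through its storage path.
The migration server can obtain the computing task log from the original service cluster and extract multiple input/output records from it. Each input/output record indicates the task identifier of a computing task, the storage path of business data, and the input/output relationship between the task and the data. Table 1 shows an input/output record table.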
Table 1 (image not reproduced; each input/output record gives a task identifier, an input/output direction such as IN, and a storage path)
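In this embodiment, the migration server analyzes the input/output records extracted from the computing task log to obtain multiple relationship chains indicating the associations between computing tasks and business data; this process is detailed in steps 202 to 204 below.
202. According to the multiple input/output records recorded in the computing task log, add the same relationship chain identifier to associated input/output records and different relationship chain identifiers to unassociated input/output records.
In this embodiment, the migration server may assign these identifiers as follows: traverse the input/output records one by one. For the currently traversed first input/output record, if the records already traversed include a second input/output record associated with it, add to the first record the same relationship chain identifier as the second record; otherwise, add to the first record a relationship chain identifier different from those of the records already traversed.
A first and a second input/output record are associated if the computing task indicated by the first record has an input/output relationship with the business data indicated by the second record, or the business data indicated by the first record has an input/output relationship with the computing task indicated by the second record.
For example, take Table 1 and traverse its records. When the first record is traversed, it is given relationship chain identifier 1001. Suppose the record currently traversed is the second one, "Task 2, IN, Storage path 1". Because task 2 has an input relationship with "Storage path 1" in the already traversed first record, the traversed records include one associated with the current record, so the second record is given the same identifier as the first, 1001. Suppose the current record is the last one in Table 1, "Task 5, IN, Storage path 5". "Task 5" has no input/output relationship with the business data indicated by any traversed record, and "Storage path 5" has none with the computing tasks indicated by any traversed record, so the traversed records do not include a second input/output record. The last record is therefore given an identifier different from those of the traversed records, such as 1002. After identifiers have been added to all records, the relationship chain table shown in Table 2 is obtained.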
Table 2 (image not reproduced; the records of Table 1 with relationship chain identifiers such as 1001 and 1002 added)
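The identifier assignment in step 202 effectively groups records into connected components of a graph whose nodes are task identifiers and storage paths. The Python sketch below uses a union-find (disjoint-set) structure, a swapped-in technique rather than the traversal described above: a single pass that copies one existing identifier can leave two groups with different identifiers even when a later record links them, whereas union-find merges them. The `(task, direction, path)` record shape and the function names are illustrative assumptions; `build_chains` turns each group into the directed task-to-data (OUT) and data-to-task (IN) edges used in step 203.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_chain_ids(records, first_id=1001):
    """records: list of (task_id, direction, storage_path) tuples.
    Returns one relationship chain identifier per record, in input order."""
    uf = UnionFind()
    for task, _direction, path in records:
        # Each record links a task node to a data node; records that share
        # a task or a storage path therefore land in the same component.
        uf.union(("task", task), ("data", path))
    ids, chain_ids = {}, []
    for task, _direction, _path in records:
        root = uf.find(("task", task))
        if root not in ids:
            ids[root] = first_id + len(ids)  # number chains in first-seen order
        chain_ids.append(ids[root])
    return chain_ids


def build_chains(records, chain_ids):
    """Group directed edges by chain: OUT is task -> data, IN is data -> task."""
    chains = defaultdict(list)
    for (task, direction, path), cid in zip(records, chain_ids):
        t, d = ("task", task), ("data", path)
        chains[cid].append((t, d) if direction == "OUT" else (d, t))
    return dict(chains)
```

For example, `assign_chain_ids([("task1", "OUT", "path1"), ("task2", "IN", "path1"), ("task5", "IN", "path5")])` returns `[1001, 1001, 1002]`, matching the identifiers in the worked example.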
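203. Generate multiple first relationship chains according to the associations between the computing tasks and business data indicated by input/output records with the same relationship chain identifier.
In this embodiment, the migration server can abstract the input/output records that share a relationship chain identifier into one first relationship chain. A first relationship chain includes task nodes indicating computing tasks, data nodes indicating business data, and the input/output relationships between task nodes and data nodes. In a first relationship chain, a task node contains the task identifier of a computing task, and a data node contains the storage path of business data.
Taking Table 2 as an example, the first relationship chain generated from the input/output records with identifier 1001 is shown in FIG. 2B, which illustrates the associations between the business data and computing tasks indicated by those records. This first relationship chain includes task nodes 1 to 4 corresponding to tasks 1 to 4, data nodes 1 to 4 corresponding to storage paths 1 to 4, and the input/output relationships between the task nodes and data nodes. The edge from task node 1 to data node 1 indicates that computing task 1 writes business data to storage path 1, that is, the data is output data of computing task 1. The edge from data node 1 to task node 2 indicates that computing task 2 reads business data from storage path 1, that is, the data is input data of computing task 2.
In this embodiment, the business data or computing tasks indicated by any first relationship chain have no association with the computing tasks or business data indicated by other first relationship chains. Therefore, the business data and computing tasks in the original service cluster can be migrated one first relationship chain at a time, and migrating the business data and computing tasks of one chain does not affect the normal running of the computing tasks indicated by other first relationship chains.
Migration time is constrained by both the volume of data migrated and the network bandwidth, and bandwidth is usually limited. To ensure that the data indicated by a relationship chain can be migrated in a short time, further reducing the impact of migration on normal business use, this embodiment can further split first relationship chains with large data volumes, as detailed in step 204.
204. If the multiple first relationship chains include a second relationship chain, split the second relationship chain into multiple third relationship chains, where a second relationship chain is a first relationship chain whose indicated business data exceeds a first threshold in data volume.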
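The first threshold may be set by the migration server according to a preset migration time per relationship chain and the network bandwidth. For example, if the bandwidth is 2 GB/s (gigabytes per second) and the preset migration time is 2 minutes (120 s), the first threshold is at most 2 GB/s × 120 s = 240 GB; it may of course be set lower than 240 GB to allow for bandwidth fluctuations caused by an unstable network. The preset migration time may be set in advance by the migration server or according to the user's business needs; this embodiment imposes no limitation.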
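The first threshold is bounded by the network bandwidth $B$ multiplied by the preset migration time $t_{\text{preset}}$. With the example figures of 2 GB/s and 2 minutes:

```latex
T_1 \le B \cdot t_{\text{preset}} = 2\,\text{GB/s} \times 120\,\text{s} = 240\,\text{GB}
```

Any value below this bound leaves headroom for bandwidth fluctuations.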
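For each of the multiple first relationship chains, the migration server can obtain the data volume of the business data indicated by the chain from the storage paths indicated by its data nodes. If that volume exceeds the first threshold, the chain is determined to be a second relationship chain that needs to be split. Splitting may include the following steps 204a to 204c:
Step 204a. Obtain the weights of the multiple data nodes in the second relationship chain.
The weight of each data node indicates its degree of association within the second relationship chain: the higher the weight, the more strongly the data node is associated.
The weights may be obtained as follows: for each data node, the product of the number of task nodes associated with the data node and the data volume of the business data indicated by the data node is taken as the data node's weight.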
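Taking the first relationship chain in FIG. 2B as an example, the task nodes associated with data node 1 are task nodes 1 to 4, so their number is 4. If the business data indicated by data node 1 is 100 GB, the weight of data node 1 is 4 × 100 = 400.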
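The weight rule of step 204a can be sketched as follows. This is a minimal illustration under stated assumptions: associations are given as hypothetical `(task_id, data_id)` pairs with direction ignored (tasks that read and tasks that write a data node both count), and data volumes are expressed in GB.

```python
def node_weights(edges, data_sizes_gb):
    """edges: (task_id, data_id) associations from one chain (direction ignored).
    data_sizes_gb: {data_id: data volume in GB}.
    Returns {data_id: number of associated task nodes * data volume}."""
    tasks_per_node = {data_id: set() for data_id in data_sizes_gb}
    for task, data_id in edges:
        # A set, so a task linked to the node twice (IN and OUT) counts once.
        tasks_per_node.setdefault(data_id, set()).add(task)
    return {
        data_id: len(tasks_per_node[data_id]) * size
        for data_id, size in data_sizes_gb.items()
    }
```

With data node `d1` associated with four task nodes and holding 100 GB, `node_weights` gives it a weight of 4 × 100 = 400, as in the example above.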
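Note that the purpose of splitting is to turn a chain with a large data volume into chains with smaller volumes. For any data node in a chain, the more task nodes are associated with it, the more chains can be obtained by splitting on it, so the data volumes of the resulting chains are more balanced and no single chain ends up too large. The weight of a data node therefore takes two factors into account: the number of task nodes associated with it and the data volume of the business data it indicates.
Step 204b. Obtain a key data node from the multiple data nodes according to descending weight order and the positions of the data nodes on the second relationship chain, where the key data node is the first data node in that order that can split the second relationship chain into at least two third relationship chains.
In this embodiment, to improve the efficiency and success rate of splitting, the migration server analyzes the data nodes in descending weight order. For each data node, it pre-splits the second relationship chain on that node and determines how many third relationship chains would result. If fewer than 2, it pre-splits on the next data node in weight order; if at least 2, it determines that data node as the key data node and splits the second relationship chain on it. Once a key data node has been found, the migration server does not pre-split on the data nodes that follow it in the order.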
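When pre-splitting the second relationship chain on a data node, the number of third relationship chains can be determined as follows: cut the associations between the data node and its associated task nodes, then determine the connectivity of the remaining nodes of the second chain (the task nodes and data nodes other than this data node). If the remaining nodes are connected, the number of third chains is 1 (that is, less than 2); otherwise, it is at least 2.
Connectivity of the remaining nodes can be determined by traversal: starting from any one of them, if every remaining node can be reached, the remaining nodes are connected; otherwise, they are not.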
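Step 204b and the pre-split check can be combined into one routine: visit data nodes in descending weight order and, for each, count the connected components left after removing it, using a breadth-first traversal as the traversal described above. In graph terms, the key data node is the first data node in weight order that is a cut vertex (articulation point) of the chain. The sketch treats the chain as undirected and, as an assumption, does not model the position-based tie-breaking; ties keep the order of the `weights` dictionary. Function names are illustrative.

```python
from collections import defaultdict, deque


def count_components(adj, removed):
    """Count connected components of the undirected graph `adj`
    after deleting node `removed`, using breadth-first traversal."""
    seen, components = {removed}, 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return components


def find_key_data_node(edges, weights):
    """edges: (task_id, data_id) pairs; weights: {data_id: weight}.
    Returns the first data node, in descending weight order, whose removal
    leaves at least two parts, or None if no such node exists."""
    adj = defaultdict(set)
    for task, data in edges:
        adj[task].add(data)
        adj[data].add(task)
    # sorted() is stable even with reverse=True, so equal weights keep dict order.
    for node in sorted(weights, key=weights.get, reverse=True):
        if count_components(adj, node) >= 2:  # the "pre-split" test
            return node
    return None
```

In the test chain, the heaviest node `d2` hangs off a single task, so removing it leaves one component; `d1` splits the chain into three parts and is returned.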
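Note that pre-splitting does not actually split the second relationship chain; it is an analysis in which the migration server estimates how many third relationship chains the second chain would yield if it were split on the corresponding data node.
Step 204c. Based on the key data node, split the multiple task nodes in the second relationship chain that are associated with the key data node into multiple third relationship chains.
In this embodiment, the migration server can split the second relationship chain into third relationship chains based on the key data node in any of the following three ways:
Case 1: For each task node directly associated with the key data node, the key data node, that task node, and the nodes that remain associated with the task node after the key data node is disconnected from it are determined as one third relationship chain.
In this case, every third relationship chain includes the key data node. Still assuming that data node 1 is the key data node of the chain in FIG. 2B, FIG. 2C shows the third relationship chains obtained by splitting that chain on data node 1 in this case.
Case 2: The key data node is determined as one third relationship chain; and for each task node associated with the key data node, the nodes other than the key data node that are associated with that task node form one third relationship chain.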
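In this case, the key data node forms a third relationship chain on its own. For example, the key data node is first split off from the second relationship chain as one third relationship chain. Then, among the remaining nodes, for each task node associated with the key data node, a traversal starting from that task node is performed, and all nodes reached are determined as the nodes associated with that task node. Assuming data node 1 is the key data node of the chain in FIG. 2B, FIG. 2D shows the third relationship chains obtained by splitting that chain on data node 1 in this case. Note that FIG. 2D is only an example and does not represent an actual split result; in an actual split, each third relationship chain other than the one holding the key data node would include multiple nodes, not just one task node.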
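Case 3: The key data node, at least one task node directly associated with it, and the nodes associated with that at least one task node are split into one third relationship chain, and the task nodes and data nodes outside that third relationship chain are split into at least one further third relationship chain.
A task node directly associated with the key data node is a task node that is a child or parent of the key data node. In this case, the key data node is placed in one third relationship chain together with at least one directly associated task node. Splitting the remaining task nodes and data nodes into at least one third relationship chain works the same way as forming a third relationship chain, in Case 2, from the nodes other than the key data node that are associated with a task node, and is not repeated here. For example, still assuming data node 1 is the key data node of the chain in FIG. 2B, FIG. 2E shows the third relationship chains obtained by splitting that chain on data node 1 in this case.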
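Cases 1 and 2 can both be sketched by computing the connected components that remain once the key data node is removed. As an assumption, several task nodes directly linked to the key node may fall in the same component, so the sketch emits one third chain per component rather than one per task node, which avoids duplicate chains; Case 3, which groups only some directly associated task nodes with the key node, is not modeled. Function names are illustrative.

```python
from collections import defaultdict


def components_without(adj, key):
    """Connected components (as sets) of the undirected graph `adj`
    once node `key` is removed, found by depth-first traversal."""
    seen, parts = {key}, []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            part.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        parts.append(part)
    return parts


def split_chain(edges, key, case=1):
    """Split a chain given as (task_id, data_id) pairs on the key data node."""
    adj = defaultdict(set)
    for task, data in edges:
        adj[task].add(data)
        adj[data].add(task)
    parts = components_without(adj, key)
    if case == 1:
        # Case 1: every third chain keeps the shared key data node.
        return [part | {key} for part in parts]
    if case == 2:
        # Case 2: the key data node forms a third chain on its own.
        return [{key}] + parts
    raise ValueError("only cases 1 and 2 are modeled")
```

Case 1 keeps the key data node in every resulting chain, which is why the dual-write handling of the key business data is needed later; Case 2 isolates it so that it can be copied ahead of the other chains.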
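In Case 1, when a third relationship chain is being migrated and is found to include the key data node, the key data node's target storage path can be written to the data path mapping table. In Cases 2 and 3, the key business data can be copied to the target service cluster first, right after splitting.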
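Note that when splitting the second relationship chain into third relationship chains, the migration server assigns different relationship chain identifiers to them. All three cases above formally split the second relationship chain into multiple third relationship chains; to distinguish the third relationship chains obtained by splitting from first relationship chains that were not split, a split identifier can be added to each third relationship chain and embedded in the relationship chain identifier, for example as its first two digits. The identifier format may be xx_yyyy, where xx is the split identifier (for example, 00 for an unsplit first relationship chain and 01 for a third relationship chain obtained by splitting) and yyyy is the chain number.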
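A minimal sketch of the `xx_yyyy` identifier format. The helper names are hypothetical, and only the two split-identifier values given above (00 and 01) are modeled.

```python
def make_chain_id(number, split=False):
    """Format a relationship chain identifier as xx_yyyy, where xx is 00 for an
    unsplit first chain and 01 for a third chain produced by splitting."""
    if not 0 <= number <= 9999:
        raise ValueError("chain number must fit in four digits")
    flag = 1 if split else 0
    return f"{flag:02d}_{number:04d}"


def is_split_chain(chain_id):
    """True if the identifier marks a third chain obtained by splitting."""
    prefix, _, number = chain_id.partition("_")
    if len(prefix) != 2 or len(number) != 4 or not (prefix + number).isdigit():
        raise ValueError(f"malformed chain identifier: {chain_id!r}")
    return prefix == "01"
```

For example, `make_chain_id(7, split=True)` yields `"01_0007"`, and `is_split_chain` on that value returns `True`.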
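In this embodiment, computing tasks can keep running while data is migrated chain by chain, and running tasks produce new business data. Because of limited bandwidth, if a relationship chain is too large, new business data may be produced faster than it can be migrated, and the chain would then never finish migrating. Splitting large chains into small ones therefore makes it possible to migrate the business data indicated by the chains while computing tasks keep running normally.
Steps 203 and 204 above constitute the process of generating multiple relationship chains according to the associations between the computing tasks and business data indicated by input/output records with the same relationship chain identifier; each relationship chain includes task nodes indicating computing tasks, data nodes indicating business data, and the associations between the task nodes and data nodes.
Steps 202 to 204 above constitute the step of obtaining multiple relationship chains according to the computing task log of the original service cluster, where each relationship chain indicates a group of associated computing tasks and business data.
In this embodiment, the migration server can migrate the business data and computing tasks indicated by the multiple relationship chains to the target service cluster in sequence, one relationship chain at a time. Migrating in sequence may mean migrating only one relationship chain at a time or migrating several relationship chains in parallel. While migration is performed based on any one relationship chain, the computing tasks indicated by the chains not yet migrated run normally. Migrating one relationship chain includes steps 205 to 208 below.
205. For each of the multiple relationship chains, generate multiple migration subtasks according to the multiple pieces of business data indicated by the chain.
In this embodiment, when migrating a relationship chain, multiple migration subtasks can be generated from the business data the chain indicates, as follows. For each piece of business data indicated by the chain: determine whether its data volume is less than a second threshold; if so, generate one migration subtask for it; if not, divide it, according to the second threshold and in order of data generation time, into multiple pieces of sub-business data, and generate one migration subtask for each. The data volume of each piece of sub-business data is less than the second threshold. The second threshold may be preset or changed by the migration server; this embodiment imposes no limitation. When business data is stored in a service cluster, the cluster records its storage time, from which the migration server can determine when the data was generated. The migration server can add configuration information to each migration subtask, including the original storage path and target storage path of the corresponding business data.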
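Note that in this embodiment, one piece of business data means the business data stored under one storage path; when its data volume is less than the second threshold, the migration subtask generated for it migrates the business data stored under that one storage path.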
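The subtask generation of step 205 could look like the sketch below. It assumes, as an illustration, that the business data under one storage path is a set of files with generation timestamps and sizes, and that sub-business data is formed by grouping whole files in time order; a single file at or above the threshold would need byte-range splitting, which the sketch rejects rather than models. `MigrationSubtask` and `make_subtasks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MigrationSubtask:
    src_path: str   # original storage path
    dst_path: str   # target storage path
    files: list     # (timestamp, name, size) tuples covered by this subtask
    size: int       # total data volume, used later to detect completion


def make_subtasks(src_path, dst_path, files, threshold):
    """files: (timestamp, name, size) for the data under one storage path.
    Returns subtasks in generation-time order, each smaller than `threshold`."""
    ordered = sorted(files)  # oldest first
    total = sum(size for _, _, size in ordered)
    if total < threshold:
        return [MigrationSubtask(src_path, dst_path, ordered, total)]
    subtasks, batch, batch_size = [], [], 0
    for ts, name, size in ordered:
        if size >= threshold:
            raise ValueError(f"{name} alone reaches the threshold; byte-range splitting not modeled")
        if batch and batch_size + size >= threshold:
            # Adding this file would reach the threshold: close the current batch.
            subtasks.append(MigrationSubtask(src_path, dst_path, batch, batch_size))
            batch, batch_size = [], 0
        batch.append((ts, name, size))
        batch_size += size
    if batch:
        subtasks.append(MigrationSubtask(src_path, dst_path, batch, batch_size))
    return subtasks
```

With a threshold of 100 and files of sizes 60, 30 and 40 (in time order), the sketch produces two subtasks of 90 and 40.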
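206. Migrate the business data indicated by the relationship chain to the target service cluster according to the multiple migration subtasks.
The migration server can migrate the business data to the target service cluster according to its original and target storage paths. The subtasks of one relationship chain may be executed sequentially or in parallel; this embodiment imposes no limitation.
Migrating the business data indicated by a relationship chain through separate migration subtasks reduces the granularity of data migration, and because the subtasks can run in parallel, the business data of the chain is migrated more efficiently.
Note that while the business data indicated by a relationship chain is being migrated, the computing tasks indicated by that chain can keep running, so the business data stored under the chain's storage paths may be updated. For the business data under a storage path, this embodiment calls the data stored before the relationship chain was generated historical business data, and the data updated after the chain was generated new business data. Users are less likely to modify historical business data than new business data, so when executing each migration subtask, the business data under a storage path can be migrated in order of generation time, oldest first; that is, historical business data is migrated first. This avoids retransmitting data that users change during migration, which would lower migration efficiency.
In addition, this embodiment provides a data verification mechanism for migration subtasks: for each migration subtask, after all the business data corresponding to it has been migrated to the target service cluster, a consistency check is performed on that business data in the target service cluster and the original service cluster. If the check succeeds, the business data of the subtask is determined to have been migrated successfully; if it fails, the migration is determined to have failed and the subtask is re-executed. Note that the configuration information of each migration subtask may also include the data volume of the corresponding business data; while executing the subtask, when the migration server detects that the volume of business data migrated to the target service cluster has reached the volume indicated by the subtask, it determines that all the business data of the subtask has been migrated.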
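The consistency check on the business data of a migration subtask covers the data volume, the number of files the data contains, and the data content. The migration server can perform the check with a preset algorithm, for example a CRC (Cyclic Redundancy Check) algorithm. When the business data has the same data volume, number of files, and content in the original and target service clusters, the consistency check on it succeeds.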
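The three checks (data volume, file count, content) can be sketched with the CRC-32 function in Python's standard `zlib` module. Modeling each side as a mapping from file name to bytes is an assumption for illustration, and the function names are hypothetical. CRC-32 detects accidental corruption but is not a cryptographic hash, so a deployment that must guard against deliberate tampering would need a stronger digest.

```python
import zlib


def fingerprint(files):
    """files: {relative_name: bytes} for one subtask's business data.
    Returns (total bytes, file count, CRC-32 over names and contents)."""
    crc, total = 0, 0
    for name in sorted(files):  # fixed order so both clusters agree
        data = files[name]
        crc = zlib.crc32(name.encode("utf-8"), crc)  # running CRC over the name
        crc = zlib.crc32(data, crc)                  # then over the content
        total += len(data)
    return total, len(files), crc


def consistent(src_files, dst_files):
    """True only if data volume, file count and content checksum all match."""
    return fingerprint(src_files) == fingerprint(dst_files)
```

Folding file names into the checksum means a file stored under the wrong name also fails the check.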
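A failed migration subtask may be re-executed immediately after the consistency check fails, after a preset period following the failure, or after the relationship chain's other migration subtasks have completed; this embodiment imposes no limitation.
Checking data per migration subtask enables fine-grained verification of business data: when migration goes wrong, data can be re-migrated at subtask granularity. Compared with the prior art, where all business data had to be re-migrated after an error, this lowers the cost of data errors during migration and improves migration efficiency.
To minimize the impact of data migration on normal business use, this embodiment does not stop the related computing tasks for the whole time a relationship chain is being migrated. Instead, once the business data has been migrated to a certain progress, the computing tasks are migrated during a period in which they are not running, which minimizes the time the tasks are stopped. While migrating the business data indicated by the relationship chain, steps 206a to 206d below can also be performed.
Step 206a. While migrating the business data indicated by the relationship chain, obtain the migration progress of that business data.
The migration server can obtain the migration progress from the total data volume of the business data indicated by the chain and the volume already migrated. The progress can be expressed as the ratio of the migrated volume to the total volume; for example, if the chain's business data totals 100 GB and 60 GB has been migrated, the migration progress is 60%.
Step 206b. When the migration progress of the business data exceeds a preset progress, determine for each computing task indicated by the relationship chain whether it is stopped: if it is stopped, perform step 206c; if it is running, perform step 206d.
The preset progress may be preset or modified by the migration server, and may also be adjusted dynamically by the migration server according to network bandwidth; for example, when the migration server detects that the bandwidth has dropped, it can raise the preset progress appropriately to minimize the time spent migrating the computing tasks.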
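Step 206c. If the computing task is stopped, keep it stopped until the relationship chain finishes migrating.
Step 206d. If the computing task is running, wait for it to stop, then keep it stopped until the relationship chain finishes migrating.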
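Steps 206a to 206d reduce to a progress ratio and a per-task decision. The sketch below, with hypothetical names, returns the action for each task rather than blocking, so the actual waiting in step 206d is left to the caller.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


def migration_progress(migrated_bytes, total_bytes):
    """Fraction of the chain's business data already migrated (0.0 to 1.0)."""
    return 1.0 if total_bytes == 0 else migrated_bytes / total_bytes


def freeze_actions(progress, preset, task_states):
    """Once progress exceeds the preset, return an action per task:
    'hold' keeps an already-stopped task stopped (step 206c);
    'wait-then-hold' waits for a running task to stop first (step 206d).
    Returns {} while progress has not yet exceeded the preset."""
    if progress <= preset:
        return {}
    return {
        task: "hold" if state is TaskState.STOPPED else "wait-then-hold"
        for task, state in task_states.items()
    }
```

With 60 GB of 100 GB migrated, `migration_progress` returns 0.6, the 60% of the example above.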
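Note that keeping a computing task stopped in steps 206c and 206d can be called freezing the computing task. To avoid affecting enterprise users' business by freezing tasks, the migration server can show the enterprise user a message about freezing the computing tasks beforehand, and freeze them only after the user confirms.
During migration, computing tasks are still running, so the business data of a migration subtask may still change after it has been migrated, for example by being modified or deleted. Therefore, to ensure the integrity of the business data, after the business data indicated by a relationship chain has been migrated, the migration server can also check its consistency per relationship chain, as follows: perform a consistency check on the business data indicated by the relationship chain in the target service cluster and the original service cluster; if the check succeeds, perform subsequent steps 207 and 208; if it fails, determine, from the check result, which business data indicated by the chain failed to migrate, and migrate it again. The chain-level check may be performed subtask by subtask or storage path by storage path; for a subtask or storage path whose check fails, the corresponding business data is determined to have failed to migrate. The migration server can reuse the original migration subtask or create a new one to re-migrate that data; the process is the same as migrating data according to migration subtasks described above and is not repeated here.
Note that steps 205 and 206 above use one relationship chain as an example to describe migrating the business data it indicates. When migrating data chain by chain, the migration server can use the relationship chain identifier to tell whether the chain being migrated is an unsplit first relationship chain or a third relationship chain obtained by splitting. Alternatively, for the case in step 204c in which every third relationship chain includes the key data node, the migration server can add a designated mark to the key data node in each third relationship chain and use that mark to recognize whether the chain being migrated includes a key data node, and hence whether it is a third relationship chain.
Note that the third relationship chains obtained by splitting still share the business data indicated by the key data node. To keep this shared key business data synchronized while data is migrated according to the third relationship chains, this embodiment uses a dual-write table mechanism: the data path mapping table stores two storage paths for the key business data, its target storage path in the target service cluster and its original storage path in the original service cluster. The process may be: obtain the target storage path of the key business data in the target service cluster, the key business data being the business data indicated by the key data node; then add the target storage path to the data path mapping table while retaining the original storage path of the key business data in the original service cluster. The target storage path may be added to the data path mapping table after the chain is split or before the third relationship chains are migrated; this embodiment imposes no limitation.
When data is migrated chain by chain and the chain being migrated is a third relationship chain obtained by splitting, migrating it under the dual-write table mechanism further includes the following steps a to c:
Step a. Synchronize the key business data between the target service cluster and the original service cluster according to the target storage path and the original storage path.
When the migration server detects that the key business data in the original or target service cluster has been updated, it synchronizes the key business data between the two clusters according to the target and original storage paths.
Step b. If the business data and computing tasks indicated by the third relationship chain have all been migrated to the target service cluster, access the key business data through the target storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain.
Step c. If the business data and computing tasks indicated by the third relationship chain have not all been migrated to the target service cluster, access the key business data through the original storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain.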
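In this embodiment, the storage path of the key business data in the corresponding service cluster can be obtained from the data path mapping table according to the identifier of the service cluster where the data indicated by the third relationship chain resides. For example, if the data indicated by the third relationship chain is in the original service cluster, that is, it has not yet been successfully migrated to the target service cluster, then when running the computing tasks indicated by the chain, the original storage path of the key business data is obtained from the table and used to access the data. If the data indicated by the chain is in the target service cluster, that is, it has been successfully migrated, the target storage path is obtained from the table and used to access the data. FIG. 2F illustrates access to the key business data while the third relationship chains obtained by splitting the chain in FIG. 2B are being migrated: data node 1 corresponds to the key business data; the third relationship chain containing task node 1 has been migrated to the target service cluster, while those containing task nodes 2 to 4 have not. The computing task indicated by task node 1 accesses the key business data through its target storage path, and those indicated by task nodes 2 to 4 access it through its original storage path.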
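The dual-write table and the routing of steps b and c can be sketched as a small in-memory class. It is an illustration only: the class, its methods, and the example paths are hypothetical, and a real data path mapping table would live in the service cluster's metadata store. `drop_original` corresponds to the later removal of the key data's original path once every related third chain has moved (step 207).

```python
class DataPathMappingTable:
    """Hypothetical in-memory model of the data path mapping table.
    Each data identifier maps to {'original': path, 'target': path}."""

    def __init__(self):
        self._paths = {}

    def set_original(self, data_id, path):
        self._paths.setdefault(data_id, {})["original"] = path

    def add_target(self, data_id, path):
        # Dual-write: add the target path but keep the original path.
        self._paths.setdefault(data_id, {})["target"] = path

    def resolve(self, data_id, chain_fully_migrated):
        """Steps b and c: a chain already in the target cluster uses the
        target path; a chain still in the original cluster uses the original."""
        entry = self._paths[data_id]
        if "target" in entry and (chain_fully_migrated or "original" not in entry):
            return entry["target"]
        return entry["original"]

    def drop_original(self, data_id):
        # After all related third chains have moved, keep only the target path.
        del self._paths[data_id]["original"]
```

Usage mirroring FIG. 2F: after `add_target("key", "/idc2/key")`, the chain of task node 1 (migrated) resolves to `/idc2/key`, while the chains of task nodes 2 to 4 (not migrated) still resolve to the original path.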
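Combining the migration of split third relationship chains under the dual-write table mechanism described above, the flow involved in the mechanism is as follows. Referring to FIG. 2G, it includes processes (1) to (4) below:
(1) Obtain the key data node.
This corresponds to obtaining the key data node in the second relationship chain.
(2) Synchronize the key business data.
The key business data is synchronized between the original service cluster and the target service cluster, based on its original and target storage paths stored in the data path mapping table.
(3) Intelligently route the storage path of the key business data.
According to the service cluster where the third relationship chain resides, the storage path of the key business data in that cluster is obtained from the data path mapping table. This corresponds to steps b and c above.
(4) Gradually remove the third relationship chains' dependence on the original storage path of the key business data.
Once the business data indicated by a third relationship chain has been migrated to the target service cluster, the computing tasks indicated by the chain can access the key business data in the target service cluster; that is, the chain's dependence on the original storage path of the key business data is removed.
In this embodiment, migrating the data indicated by a relationship chain also includes migrating the business's source data, which comprises data users enter on user terminals and data generated in real time on user terminals that has not yet been synchronized to the original service cluster. In practice, source data is generally used by computing tasks. Specifically, for a relationship chain, the source data can be obtained from a real-time data processing server through a designated interface and migrated to the target service cluster together with the business data indicated by the chain, so that the normal running of computing tasks is not affected.
207. In the data path mapping table, switch the original storage paths of the business data indicated by the relationship chain in the original service cluster to their target storage paths in the target service cluster.
In this embodiment, while migrating a relationship chain, the migration server can record the target storage path of each piece of business data the chain indicates. When all of that business data has been migrated to the target service cluster, the migration server replaces, for each piece, its original storage path with its target storage path in the data path mapping table.
Note that if the business data is key business data, the migration server deletes its original storage path from the data path mapping table, keeping its target storage path, once it determines that the business data of all third relationship chains related to that key business data has been migrated to the target service cluster.
208. Migrate the computing tasks indicated by the relationship chain to the target service cluster.
In this embodiment, the computing tasks indicated by the relationship chain can be migrated to the target service cluster as follows: obtain first computing resource information and second computing resource information of a computing task, and replace the task's first computing resource information with the second. The first computing resource information is the computing resource information configured for the task in the original service cluster, and the second is the computing resource information configured for the task in the target service cluster.
Note that after migrating the computing tasks indicated by the relationship chain to the target service cluster, the migration server starts all the computing tasks indicated by the chain, completing the migration of the chain.
In addition, this embodiment can perform incremental migration during data migration, at the following two levels:
Level 1: Migrate data added during migration.
While data is migrated chain by chain, many computing tasks in the original service cluster are still running, so after the relationship chains are generated the original service cluster produces large amounts of new business data, or new computing tasks are added to it. All such additions are reflected as new input/output records in the computing task log. After obtaining the relationship chains from the computing task log, the migration server can record the timestamp of the most recently generated input/output record in the log. Using this recorded timestamp, it can obtain from the original service cluster's computing task log the new input/output records generated after that timestamp.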
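The migration server can update the relationship chains not yet migrated according to the new input/output records, as follows: for any new input/output record, if the chains not yet migrated include a fourth relationship chain associated with the record, update the fourth relationship chain according to the record; otherwise, generate a new relationship chain according to the associations between this record and other new records, in the same way as the relationship chains were generated above, which is not repeated here. A fourth relationship chain associated with the new record is one whose indicated business data is associated with the computing task indicated by the new record, or whose indicated computing tasks are associated with the business data indicated by the new record.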
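The first-level incremental update can be sketched as filtering the log by the recorded timestamp tag and folding each new record into the pending chains. Two assumptions go beyond the text: chains are represented only by their node sets, and a new record that touches several pending chains merges them into one. Names are hypothetical.

```python
def new_records_since(log, last_ts):
    """log: (timestamp, task_id, direction, storage_path) entries.
    Returns the entries produced after the recorded timestamp tag."""
    return [rec for rec in log if rec[0] > last_ts]


def update_pending_chains(pending, new_records, next_id):
    """pending: {chain_id: set of node keys} for chains not yet migrated,
    where node keys are ('task', id) or ('data', path).
    Each new record joins the pending chains it touches (merging them if it
    touches several); otherwise it starts a new chain. Returns the next free id."""
    for _ts, task, _direction, path in new_records:
        nodes = {("task", task), ("data", path)}
        hits = [cid for cid, members in pending.items() if members & nodes]
        if not hits:
            pending[next_id] = nodes  # a new chain, possibly joined by later records
            next_id += 1
            continue
        keep = hits[0]
        pending[keep] |= nodes
        for cid in hits[1:]:
            pending[keep] |= pending.pop(cid)
    return next_id
```

Because new records are processed in order, a later record that touches a newly created chain joins it, which covers the case of new records associated only with each other.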
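Note that updating the not-yet-migrated chains according to new input/output records may be done while a chain is being migrated or after a chain has finished migrating; this embodiment imposes no limitation. The migration server can obtain new input/output records from the computing task log periodically, so as to update the not-yet-migrated chains periodically.
In this embodiment, while the data indicated by a relationship chain is being migrated, new computing tasks created in the original service cluster may be associated with the business data indicated by that chain. Then, once the chain's data has been migrated to the target service cluster, those new tasks would have to read and write the associated business data in the target cluster; because the target and original service clusters are in different IDC facilities, such reads and writes would consume considerable network bandwidth. At this first level, therefore, the migration server promptly updates the not-yet-migrated chains according to the computing task log, so that the chains reflect the latest business data and computing tasks in the original service cluster as fully as possible. This avoids, as far as possible, computing tasks in the original service cluster reading and writing business data in the target service cluster, improving business processing efficiency and network resource utilization. To further avoid such access, the migration server can also monitor the network bandwidth used by all computing tasks in the original service cluster and migrate first the relationship chains containing tasks whose bandwidth use exceeds a preset bandwidth.
Level 2: When migration is interrupted, resume from the breakpoint based on the data migration state at the time of interruption.
Resumption can work as follows: while migrating any relationship chain, when an operation interrupting its migration is detected, record the migration subtasks that have not completed and stop migrating the chain; when an operation to continue its migration is detected, migrate the business data and computing tasks indicated by the chain to the target service cluster according to the unfinished migration subtasks.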
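Note that the migration of a relationship chain may be interrupted by unexpected events, such as a network failure or higher-priority business data that must be migrated immediately. When migrating a chain according to multiple migration subtasks, the migration server can number the subtasks and migrate them in order of their numbers. It can record the state of each subtask, such as not migrated, migrating, or migrated. When it detects an interruption of a chain's migration, it records the numbers of the subtasks that have not completed. When it detects an operation to continue the chain's migration, it executes only those unfinished subtasks, migrating to the target service cluster the business data and computing tasks that had not been migrated when the chain was interrupted.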
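A sketch of the numbered-subtask bookkeeping that makes resumption possible. `ChainMigration` and `run_subtask` are hypothetical; the interruption is modeled as a flag checked between subtasks, so a subtask already in progress is allowed to finish. A production system would persist the states so they survive a restart, which this in-memory sketch does not do.

```python
from enum import Enum


class SubtaskState(Enum):
    PENDING = "not migrated"
    RUNNING = "migrating"
    DONE = "migrated"


class ChainMigration:
    """Hypothetical tracker for one relationship chain's numbered subtasks."""

    def __init__(self, subtask_ids, run_subtask):
        # Subtasks are executed in number order.
        self.states = {sid: SubtaskState.PENDING for sid in sorted(subtask_ids)}
        self.run_subtask = run_subtask  # callable(sid) that performs the copy
        self.interrupted = False
        self.unfinished = []

    def interrupt(self):
        self.interrupted = True

    def run(self):
        """Run unfinished subtasks in number order; stop at an interruption
        and record the numbers of the subtasks that did not complete."""
        self.interrupted = False
        for sid, state in self.states.items():
            if state is SubtaskState.DONE:
                continue  # finished before the breakpoint; never re-run
            if self.interrupted:
                break
            self.states[sid] = SubtaskState.RUNNING
            self.run_subtask(sid)
            self.states[sid] = SubtaskState.DONE
        self.unfinished = [s for s, st in self.states.items() if st is not SubtaskState.DONE]
        return self.unfinished
```

Calling `run()` again after an interruption executes only the subtasks recorded in `unfinished`.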
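In this embodiment, while migrating data chain by chain, the migration server can also use distinct migration states to control the migration. Managing the migration with a state machine prevents migration state from being lost and ensures that a chain's migration can be interrupted at any point and later resumed from the breakpoint. FIG. 2H shows the migration states involved in migrating a relationship chain. The states are introduced below, taking the migration of one relationship chain as an example:
Start migration: begin migrating the data indicated by the relationship chain.
Obtain source data: this state is entered once the chain's migration has started; in it, the migration server obtains the source data from the real-time data processing server through the designated interface.
Await user confirmation: when the business data migration progress reaches the preset progress and before the computing tasks are frozen, a screen asking the user to confirm freezing the computing tasks is shown; after the user confirms, the chain moves to the Freeze computing tasks state. Note that if the chain being migrated is a third relationship chain obtained by splitting, the migration server enters this state only after determining that both the original and target storage paths of the key business data indicated by the chain are contained in the data path mapping table.
Freeze computing tasks: in this state, the migration server performs steps 206b and 206c above; when all computing tasks are stopped, it moves to the next state.
Await business data consistency: the chain stays in this state from the time the computing tasks are frozen until the business data indicated by the chain has been completely migrated to the target service cluster.
Business data consistency check: this state is entered after all the business data indicated by the chain has been migrated to the target service cluster.
Business data storage path switch: this state is entered after the chain's consistency check succeeds; the storage paths of the business data are switched in it.
Computing task migration: this state is entered after all business data storage paths have been switched from the original service cluster to the target service cluster; the computing tasks are migrated in it.
Unfreeze computing tasks: after the computing tasks have been migrated, all of them are started; when all are running normally, the chain enters the Migration complete state, completing the migration of the business data and computing tasks indicated by the chain.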
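The states above form an almost linear state machine. The sketch below encodes them as a Python `Enum` in the order listed; the transition from a failed consistency check back to waiting for consistent data is an assumption based on the re-migration described for the chain-level check, and persisting the current state (which is what prevents state loss) is not shown. Names are illustrative.

```python
from enum import Enum, auto


class ChainState(Enum):
    START = auto()
    FETCH_SOURCE_DATA = auto()
    AWAIT_CONFIRMATION = auto()
    FREEZE_TASKS = auto()
    AWAIT_DATA_CONSISTENT = auto()
    CONSISTENCY_CHECK = auto()
    SWITCH_PATHS = auto()
    MIGRATE_TASKS = auto()
    UNFREEZE_TASKS = auto()
    DONE = auto()


# Iterating an Enum yields members in definition order.
_ORDER = list(ChainState)


def next_state(state, check_passed=True):
    """Advance one relationship chain to its next migration state."""
    if state is ChainState.DONE:
        raise ValueError("migration already complete")
    if state is ChainState.CONSISTENCY_CHECK and not check_passed:
        # Assumed: re-migrate failed data, then check again.
        return ChainState.AWAIT_DATA_CONSISTENT
    return _ORDER[_ORDER.index(state) + 1]
```

Storing the current `ChainState` per chain after every transition is what would let an interrupted migration resume from exactly where it stopped.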
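Note that, to make the migration easier to manage, the migration server can provide a management interface in its front end. A terminal with management permission can access the front end to display this interface, through which administrators can view all kinds of information about the migration.
For example, any terminal can log in to the migration server with an administrator's username and password to obtain management permission and then access the migration server front end. Alternatively, a terminal deployed in the facility of the target service cluster and connected to that cluster can obtain management permission and access the front end.
For example, the management interface may show the connections between the nodes in a relationship chain, the migration state of each node, the running status of computing tasks, the migration start time, the estimated migration end time, the chain's migration state, and its migration progress. The interface may also include one or more management options for controlling the migration, for example stop-migration and continue-migration options: when an administrator triggers the stop option, the migration server receives a stop instruction and pauses the current migration until the administrator later triggers the continue option, whereupon the migration server receives a continue instruction and resumes the migration.
In the method provided by this embodiment, associated business data and computing tasks are represented as one relationship chain based on the computing task log of the original service cluster, so that when data is migrated one chain at a time, the chain being migrated does not affect other chains, and the computing tasks indicated by chains not yet migrated still run normally, so the services indicated by those chains remain available. In addition, because both computing tasks and business data are migrated as nodes of a relationship chain, computing tasks are not affected by the geographic location of the business data.
In addition, a key data node is obtained for each large relationship chain, and the key business data it indicates is made accessible in both the original and target service clusters. After a large chain is split into several small chains, each small chain can flexibly access the key business data whether it belongs to the original or the target service cluster. This decouples interrelated services and allows complex services to be migrated step by step through multiple small chains.
In addition, when migrating the data indicated by a relationship chain, the business data is migrated first, and once its migration progress reaches the preset progress, the computing tasks can be migrated during a gap in which they are not running, greatly reducing the impact of migration on normal business use. Moreover, once the business data reaches that progress, the remaining data can usually be migrated very quickly, in less time than a computing task's run period, so the migration does not affect normal business use at all and is imperceptible to users.
In addition, migrating the business data indicated by a relationship chain through separate migration subtasks reduces the granularity of migration, and because the subtasks can run in parallel, the chain's business data is migrated more efficiently. Moreover, when a migration error occurs, only the affected migration subtask needs to be re-migrated rather than the business data of the whole service cluster, lowering the cost of data errors during migration and improving migration efficiency.
In addition, by breaking the whole into parts, the business data and computing tasks in the original service cluster are migrated gradually to the target service cluster, one relationship chain at a time. As migration proceeds, the business data and computing tasks in the original service cluster keep shrinking, so servers freed in the original cluster can be dismantled and moved to the target IDC facility, allowing server equipment to be reused and lowering the cost of data migration.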
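FIG. 3 is a block diagram of a data migration apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes a first obtaining unit 301 and a migration unit 302.
The first obtaining unit 301 is connected to the migration unit 302 and is configured to obtain a plurality of relationship chains according to the computing task log of the original service cluster, where the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of associated computing tasks and business data. The migration unit 302 is configured to migrate the business data and computing tasks indicated by the plurality of relationship chains to the target service cluster in sequence, one relationship chain at a time, where, while migration is performed based on any one relationship chain, the computing tasks indicated by the relationship chains not yet migrated run normally.
In a possible implementation, the first obtaining unit 301 is configured to add, according to the multiple input/output records recorded in the computing task log, the same relationship chain identifier to associated input/output records and different relationship chain identifiers to unassociated ones; and to generate multiple relationship chains according to the associations between the computing tasks and business data indicated by input/output records with the same identifier, each relationship chain including task nodes indicating computing tasks, data nodes indicating business data, and the associations between task nodes and data nodes.
In a possible implementation, the first obtaining unit 301 includes:
a generating subunit, configured to generate multiple first relationship chains according to the associations between the computing tasks and business data indicated by input/output records with the same relationship chain identifier;
a splitting subunit, configured to split a second relationship chain into multiple third relationship chains if the multiple first relationship chains include a second relationship chain, the second relationship chain being a first relationship chain whose indicated business data exceeds a first threshold in data volume.
In a possible implementation, the splitting subunit is configured to: obtain the weights of multiple data nodes in the second relationship chain, the weight of each data node indicating its degree of association within the second relationship chain, where a higher weight means a higher degree of association; obtain a key data node from the multiple data nodes according to descending weight order and the positions of the data nodes on the second relationship chain, the key data node being the first data node in that order that can split the second relationship chain into at least two third relationship chains; and, based on the key data node, split the multiple task nodes in the second relationship chain associated with the key data node into multiple third relationship chains.
In a possible implementation, the splitting subunit is configured to: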
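for each task node directly associated with the key data node, determine the key data node, that task node, and the nodes associated with the task node when the key data node is disconnected from it as a third relationship chain; or,
determine the key data node as one third relationship chain and, for each task node directly associated with the key data node, determine the nodes, other than the key data node, that are associated with that task node as a third relationship chain; or,
split the key data node, at least one task node directly associated with it, and the nodes associated with that at least one task node into one third relationship chain, and split the task nodes and data nodes outside that third relationship chain into at least one third relationship chain.
In a possible implementation, the splitting subunit is configured to determine, for each of the multiple data nodes, the product of the number of task nodes associated with the data node and the data volume of the business data indicated by the data node as the weight of that data node.
In a possible implementation, the apparatus further includes:
a second obtaining unit, configured to obtain the target storage path of key business data in the target service cluster, the key business data being the business data indicated by the key data node;
an adding unit, configured to add the target storage path to the data path mapping table while retaining the original storage path of the key business data in the original service cluster.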
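In a possible implementation, the migration unit 302 is configured to:
while migrating the multiple third relationship chains, synchronize the key business data between the target service cluster and the original service cluster according to the target storage path and the original storage path;
for any one of the multiple third relationship chains, perform the following:
if the business data and computing tasks indicated by the third relationship chain have all been migrated to the target service cluster, access the key business data through the target storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain;
if the business data and computing tasks indicated by the third relationship chain have not all been migrated to the target service cluster, access the key business data through the original storage path recorded in the data path mapping table when running the computing tasks indicated by the third relationship chain.
In a possible implementation, the migration unit 302 includes:
a generating subunit, configured to generate, for each of the multiple relationship chains, multiple migration subtasks according to the multiple pieces of business data indicated by the chain, each migration subtask indicating the original storage path and target storage path of the corresponding business data;
a first migration subunit, configured to migrate the business data indicated by the relationship chain to the target service cluster according to the multiple migration subtasks;
a second migration subunit, configured to migrate the computing tasks indicated by the relationship chain to the target service cluster;
where the computing tasks indicated by the relationship chain are in a stopped state while they are being migrated.
In a possible implementation, the first migration subunit is configured to:
perform the following for each of the multiple pieces of business data indicated by the relationship chain:
determine whether the data volume of the business data is less than a second threshold;
if the data volume is less than the second threshold, generate one migration subtask for the business data;
if the data volume is not less than the second threshold, divide the business data, according to the second threshold and in order of data generation time, into multiple pieces of sub-business data, and generate one migration subtask for each piece, the data volume of each piece of sub-business data being less than the second threshold.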
在一种可能的实现方式中,该第一迁移子单元还用于:
在迁移该关系链所指示的业务数据的过程中,获取该关系链所指示的业务数据的迁移进度;
当该业务数据的迁移进度超过预设进度时,对于该关系链所指示的每个计算任务,执行以下过程:
判断该计算任务是否处于停止运行状态;
如果该计算任务处于停止运行状态,则在该关系链完成迁移之前维持该计算任务的停止运行状态;
如果该计算任务处于运行状态,则等待该计算任务停止运行后、该关系链完成迁移之前维持该计算任务的停止运行状态。
在一种可能的实现方式中,该装置还包括:
第一校验单元,用于对于该多个迁移子任务中的每个迁移子任务,在该迁移子任务对应的业务数据全部迁移到该目标服务集群之后,对该目标服务集群和该原服务集群中与该迁移子任务对应的业务数据进行一致性校验;如果一致性校验成功,则确定该迁移子任务对应的业务数据迁移成功;如果一致性校验失败,则确定该迁移子任务对应的业务数据迁移失败,重新执行该迁移子任务。
在一种可能的实现方式中,该装置还包括:
第二校验单元,用于对该目标服务集群和该原服务集群中该关系链所指示的业务数据进行一致性校验;如果一致性校验成功,则执行将该关系链所指示的计算任务迁移到该目标服务集群的步骤;如果一致性校验失败,则根据一致性校验结果,确定迁移失败的业务数据,对该迁移失败的业务数据重新进行迁移。
在一种可能的实现方式中,该第二迁移子单元用于获取该计算任务的第一计算资源信息和第二计算资源信息,该第一计算资源信息为在该原服务集群中为该计算任务配置的计算资源信息,该第二计算资源信息为在该目标服务集群中为该计算任务配置的计算资源信息;将该计算任务的第一计算资源信息替换为该第二计算资源信息。
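In one possible implementation, the apparatus further includes:
a switching unit, configured to switch, in the data path mapping table, the original storage path of the business data in the original service cluster to the target storage path in the target service cluster.
In one possible implementation, the migration unit 302 is further configured to: while migrating based on any one relationship chain, when a migration interruption operation on the relationship chain is detected, record the migration subtasks that have not completed and stop the migration of the relationship chain; and when a resume-migration operation on the relationship chain is detected, migrate the business data and computing tasks indicated by the relationship chain to the target service cluster according to the recorded uncompleted migration subtasks.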
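Interruption and resumption can be sketched by treating the queue of pending migration subtasks as the record of unfinished work. `ChainMigration` and its `run` callback are hypothetical names; a production version would persist the pending list rather than keep it in memory, and the computing tasks would be migrated after `resume` returns `True`.

```python
from collections import deque
from typing import Callable, Iterable, Tuple

Subtask = Tuple[str, str]  # (original storage path, target storage path)


class ChainMigration:
    """Hypothetical interruptible migration of one relationship chain's data."""

    def __init__(self, subtasks: Iterable[Subtask],
                 run: Callable[[Subtask], None]) -> None:
        self.pending = deque(subtasks)  # record of migration subtasks not yet done
        self.run = run
        self.interrupted = False

    def interrupt(self) -> None:
        # Takes effect before the next subtask starts.
        self.interrupted = True

    def resume(self) -> bool:
        """Run the recorded pending subtasks; True once the chain's data is migrated."""
        self.interrupted = False
        while self.pending:
            if self.interrupted:
                return False        # stop; `pending` keeps the unfinished work
            self.run(self.pending[0])
            self.pending.popleft()  # drop a subtask only after it completes
        return True
```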
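In one possible implementation, the apparatus further includes:
a relationship chain updating unit, configured to obtain an updated computing task log and, according to the updated computing task log, update the relationship chains that have not been migrated.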
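One way to derive relationship chains from input/output records, and to refresh them from an updated log, is a union-find over task and data nodes in which the root of each set serves as the relationship chain identifier. This is a sketch under assumptions: the log record format is hypothetical, and updating is done by rebuilding the chains from the whole updated log and keeping those that contain no migrated node.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical log record: (task name, input data names, output data names)
Record = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def build_chains(records: Iterable[Record]) -> List[Set[str]]:
    """Group associated task and data nodes; each root acts as a chain identifier."""
    uf = UnionFind()
    for task, inputs, outputs in records:
        t = "task:" + task
        uf.find(t)  # register tasks that touch no data
        for d in (*inputs, *outputs):
            uf.union(t, "data:" + d)  # a task is associated with all data it reads or writes
    chains: Dict[str, Set[str]] = {}
    for node in list(uf.parent):
        chains.setdefault(uf.find(node), set()).add(node)
    return list(chains.values())


def pending_chains(records: Iterable[Record], migrated: Set[str]) -> List[Set[str]]:
    """Rebuild chains from the updated log and keep those with no migrated node.

    Chains that the new log links to already-migrated nodes are simply skipped
    here; how to handle them is left open in this sketch.
    """
    return [c for c in build_chains(records) if not (c & migrated)]
```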
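With the apparatus provided in this embodiment, business data and computing tasks that are associated with each other are represented by one relationship chain derived from the computing task log of the original service cluster. When data is then migrated one relationship chain at a time, the chain being migrated does not affect the other relationship chains: the computing tasks indicated by the relationship chains that have not been migrated continue to run normally, so the services they support remain available during the migration.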
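The chain-by-chain scheme can be sketched as a scheduler gate. This assumes a hypothetical `ChainScheduler` that knows which computing tasks belong to which chain and, for simplicity, holds back the whole chain for the duration of its migration (coarser than the preset-progress freeze described earlier). Tasks of every other chain may keep running, whether they are still in the original cluster or already in the target cluster.

```python
from typing import Callable, Dict, Optional, Set


class ChainScheduler:
    """Hypothetical scheduler gate for chain-by-chain migration."""

    def __init__(self, chains: Dict[str, Set[str]]) -> None:
        self.chains = chains  # relationship chain id -> names of its computing tasks
        self.migrating: Optional[str] = None
        self.location = {t: "original" for ts in chains.values() for t in ts}

    def may_run(self, task: str) -> bool:
        # Only tasks of the chain currently being migrated are held back.
        return self.migrating is None or task not in self.chains[self.migrating]

    def migrate_all(self, migrate_chain: Callable[[str], None]) -> None:
        for chain_id, tasks in self.chains.items():  # one chain at a time, in order
            self.migrating = chain_id
            try:
                migrate_chain(chain_id)  # moves this chain's business data and tasks
                for t in tasks:
                    self.location[t] = "target"
            finally:
                self.migrating = None
```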
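It should be noted that when the data migration apparatus provided in the above embodiment migrates data, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the migration server may be divided into different functional modules to implement all or some of the functions described above. In addition, the data migration apparatus provided in the above embodiment and the data migration method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
FIG. 4 is a block diagram of a data migration apparatus according to an embodiment of the present invention. For example, the apparatus 400 may be provided as a server. Referring to FIG. 4, the apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources represented by a memory 432 for storing instructions executable by the processing component 422, such as application programs. The application programs stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 422 is configured to execute the instructions to perform the method performed by the migration server in the above data migration method embodiments.
The apparatus 400 may further include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The data migration apparatus can be used to perform the operations performed by the migration server in the above embodiments.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, for example, a memory including instructions, where the instructions can be executed by a processor to complete the data migration method in the above embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
An embodiment of the present invention further provides a computer-readable storage medium storing at least one instruction, where the instruction is loaded and executed by a processor to implement the operations performed by the migration server in the methods of the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely optional embodiments of the present invention and are not intended to limit the embodiments of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.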

Claims (21)

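  1. A data migration method, applied to a migration server, the method comprising:
    obtaining multiple relationship chains according to a computing task log of an original service cluster, wherein the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of computing tasks and business data that are associated with each other;
    migrating, in units of relationship chains, the business data and computing tasks indicated by the multiple relationship chains to a target service cluster in sequence;
    wherein, while migration is performed based on any one relationship chain, the computing tasks indicated by those of the multiple relationship chains that have not been migrated run normally.
  2. The method according to claim 1, wherein obtaining the multiple relationship chains according to the computing task log of the original service cluster comprises:
    adding, according to multiple input/output records recorded in the computing task log, the same relationship chain identifier to input/output records that are associated with each other, and different relationship chain identifiers to input/output records that are not associated with each other;
    generating multiple relationship chains according to the associations between the computing tasks and the business data indicated by the input/output records having the same relationship chain identifier, each relationship chain comprising task nodes indicating computing tasks, data nodes indicating business data, and the associations between the task nodes and the data nodes.
  3. The method according to claim 2, wherein generating the multiple relationship chains according to the associations between the computing tasks and the business data indicated by the input/output records having the same relationship chain identifier comprises:
    generating multiple first relationship chains according to the associations between the computing tasks and the business data indicated by the input/output records having the same relationship chain identifier;
    if the multiple first relationship chains include a second relationship chain, splitting the second relationship chain into multiple third relationship chains, the second relationship chain being a first relationship chain whose indicated business data has a data volume exceeding a first threshold.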
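  4. The method according to claim 3, wherein splitting the second relationship chain into multiple third relationship chains comprises:
    obtaining weights of multiple data nodes in the second relationship chain, the weight of each data node indicating the degree of association of the data node in the second relationship chain, a higher weight indicating a higher degree of association;
    obtaining a key data node from the multiple data nodes according to the descending order of the weights and the positions of the multiple data nodes on the second relationship chain, the key data node being the first data node in that order that can split the second relationship chain into at least two third relationship chains;
    splitting, based on the key data node, multiple task nodes associated with the key data node in the second relationship chain into multiple third relationship chains.
  5. The method according to claim 4, wherein splitting, based on the key data node, the multiple task nodes associated with the key data node in the second relationship chain into multiple third relationship chains comprises:
    for each of multiple task nodes directly associated with the key data node, determining, as a third relationship chain, the key data node, the task node, and the nodes that are associated with the task node when the connection between the key data node and the task node is broken; or,
    determining the key data node as one third relationship chain and, for each of the multiple task nodes directly associated with the key data node, determining the nodes, other than the key data node, that are associated with the task node as a third relationship chain; or,
    splitting the key data node, at least one task node directly associated with the key data node, and the nodes associated with the at least one task node into one third relationship chain, and splitting the task nodes and data nodes other than those in the already-split third relationship chain into at least one third relationship chain.
  6. The method according to claim 4, wherein after splitting the second relationship chain into multiple third relationship chains, the method further comprises:
    obtaining a target storage path of key business data in the target service cluster, the key business data being the business data indicated by the key data node;
    adding the target storage path to a data path mapping table while retaining the original storage path of the key business data in the original service cluster.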
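  7. The method according to claim 1, wherein migrating, in units of relationship chains, the business data and computing tasks indicated by the multiple relationship chains to the target service cluster in sequence comprises:
    for each of the multiple relationship chains, generating multiple migration subtasks according to multiple pieces of business data indicated by the relationship chain, each migration subtask indicating an original storage path and a target storage path of the corresponding business data;
    migrating the business data indicated by the relationship chain to the target service cluster according to the multiple migration subtasks;
    migrating the computing tasks indicated by the relationship chain to the target service cluster;
    wherein, when the computing tasks indicated by the relationship chain are migrated, the computing tasks indicated by the relationship chain are in a stopped state.
  8. The method according to claim 7, wherein migrating the business data indicated by the relationship chain to the target service cluster comprises:
    obtaining, during migration of the business data indicated by the relationship chain, a migration progress of the business data indicated by the relationship chain;
    when the migration progress of the business data exceeds a preset progress, performing the following process for each computing task indicated by the relationship chain:
    determining whether the computing task is in a stopped state;
    if the computing task is in a stopped state, maintaining the stopped state of the computing task until migration of the relationship chain is completed;
    if the computing task is in a running state, waiting for the computing task to stop running and then maintaining the stopped state of the computing task until migration of the relationship chain is completed.
  9. The method according to claim 7, wherein migrating the computing tasks indicated by the relationship chain to the target service cluster comprises:
    obtaining first computing resource information and second computing resource information of the computing task, the first computing resource information being computing resource information configured for the computing task in the original service cluster, and the second computing resource information being computing resource information configured for the computing task in the target service cluster;
    replacing the first computing resource information of the computing task with the second computing resource information.
  10. The method according to claim 7, wherein after migrating the business data indicated by the relationship chain to the target service cluster, the method further comprises:
    switching, in a data path mapping table, the original storage path of the business data in the original service cluster to the target storage path in the target service cluster.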
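  11. A migration server, comprising a processor and a memory, wherein the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the following operations:
    obtaining multiple relationship chains according to a computing task log of an original service cluster, wherein the computing task log records the associations between computing tasks and business data in the original service cluster, and each relationship chain indicates a group of computing tasks and business data that are associated with each other;
    migrating, in units of relationship chains, the business data and computing tasks indicated by the multiple relationship chains to a target service cluster in sequence;
    wherein, while migration is performed based on any one relationship chain, the computing tasks indicated by those of the multiple relationship chains that have not been migrated run normally.
  12. The migration server according to claim 11, wherein the instruction is loaded and executed by the processor to implement the following operations:
    adding, according to multiple input/output records recorded in the computing task log, the same relationship chain identifier to input/output records that are associated with each other, and different relationship chain identifiers to input/output records that are not associated with each other;
    generating multiple relationship chains according to the associations between the computing tasks and the business data indicated by the input/output records having the same relationship chain identifier, each relationship chain comprising task nodes indicating computing tasks, data nodes indicating business data, and the associations between the task nodes and the data nodes.
  13. The migration server according to claim 12, wherein the instruction is loaded and executed by the processor to implement the following operations:
    generating multiple first relationship chains according to the associations between the computing tasks and the business data indicated by the input/output records having the same relationship chain identifier;
    if the multiple first relationship chains include a second relationship chain, splitting the second relationship chain into multiple third relationship chains, the second relationship chain being a first relationship chain whose indicated business data has a data volume exceeding a first threshold.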
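  14. The migration server according to claim 13, wherein the instruction is loaded and executed by the processor to implement the following operations:
    obtaining weights of multiple data nodes in the second relationship chain, the weight of each data node indicating the degree of association of the data node in the second relationship chain, a higher weight indicating a higher degree of association;
    obtaining a key data node from the multiple data nodes according to the descending order of the weights and the positions of the multiple data nodes on the second relationship chain, the key data node being the first data node in that order that can split the second relationship chain into at least two third relationship chains;
    splitting, based on the key data node, multiple task nodes associated with the key data node in the second relationship chain into multiple third relationship chains.
  15. The migration server according to claim 14, wherein the instruction is loaded and executed by the processor to implement the following operations:
    for each of multiple task nodes directly associated with the key data node, determining, as a third relationship chain, the key data node, the task node, and the nodes that are associated with the task node when the connection between the key data node and the task node is broken; or,
    determining the key data node as one third relationship chain and, for each of the multiple task nodes directly associated with the key data node, determining the nodes, other than the key data node, that are associated with the task node as a third relationship chain; or,
    splitting the key data node, at least one task node directly associated with the key data node, and the nodes associated with the at least one task node into one third relationship chain, and splitting the task nodes and data nodes other than those in the already-split third relationship chain into at least one third relationship chain.
  16. The migration server according to claim 14, wherein the instruction is loaded and executed by the processor to implement the following operations:
    obtaining a target storage path of key business data in the target service cluster, the key business data being the business data indicated by the key data node;
    adding the target storage path to a data path mapping table while retaining the original storage path of the key business data in the original service cluster.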
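  17. The migration server according to claim 11, wherein the instruction is loaded and executed by the processor to implement the following operations:
    for each of the multiple relationship chains, generating multiple migration subtasks according to multiple pieces of business data indicated by the relationship chain, each migration subtask indicating an original storage path and a target storage path of the corresponding business data;
    migrating the business data indicated by the relationship chain to the target service cluster according to the multiple migration subtasks;
    migrating the computing tasks indicated by the relationship chain to the target service cluster;
    wherein, when the computing tasks indicated by the relationship chain are migrated, the computing tasks indicated by the relationship chain are in a stopped state.
  18. The migration server according to claim 11, wherein the instruction is loaded and executed by the processor to implement the following operations:
    obtaining, during migration of the business data indicated by the relationship chain, a migration progress of the business data indicated by the relationship chain;
    when the migration progress of the business data exceeds a preset progress, performing the following process for each computing task indicated by the relationship chain:
    determining whether the computing task is in a stopped state;
    if the computing task is in a stopped state, maintaining the stopped state of the computing task until migration of the relationship chain is completed;
    if the computing task is in a running state, waiting for the computing task to stop running and then maintaining the stopped state of the computing task until migration of the relationship chain is completed.
  19. The migration server according to claim 17, wherein the instruction is loaded and executed by the processor to implement the following operations:
    obtaining first computing resource information and second computing resource information of the computing task, the first computing resource information being computing resource information configured for the computing task in the original service cluster, and the second computing resource information being computing resource information configured for the computing task in the target service cluster;
    replacing the first computing resource information of the computing task with the second computing resource information.
  20. The migration server according to claim 17, wherein the instruction is loaded and executed by the processor to implement the following operations:
    switching, in a data path mapping table, the original storage path of the business data in the original service cluster to the target storage path in the target service cluster.
  21. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the operations performed in the method according to any one of claims 1 to 10.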

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710197702.7A CN108664496B (zh) 2017-03-29 2017-03-29 Data migration method and apparatus
CN201710197702.7 2017-03-29

Publications (1)

Publication Number Publication Date
WO2018177107A1 true WO2018177107A1 (zh) 2018-10-04

Family

ID=63674187

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078398 WO2018177107A1 (zh) 2017-03-29 2018-03-08 Data migration method, migration server, and storage medium

Country Status (2)

Country Link
CN (1) CN108664496B (zh)
WO (1) WO2018177107A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
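CN110490322A (zh) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 Method and apparatus for splitting operation nodes, electronic device, and storage medium
CN110503199A (zh) * 2019-08-14 2019-11-26 北京中科寒武纪科技有限公司 Method and apparatus for splitting operation nodes, electronic device, and storage medium
CN110597609A (zh) * 2019-09-17 2019-12-20 深圳市及响科技有限公司 Cluster migration and automatic recovery method and system
CN113438267B (zh) * 2020-03-23 2023-02-28 华为技术有限公司 Method and device for analyzing stream data
CN111459411B (zh) * 2020-03-30 2023-07-21 北京奇艺世纪科技有限公司 Data migration method, apparatus, device, and storage medium
CN116049096B (zh) * 2022-05-05 2024-04-16 荣耀终端有限公司 Data migration method, electronic device, and storage medium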

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
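CN102855299A (zh) * 2012-08-16 2013-01-02 上海引跑信息科技有限公司 Method for iterative migration of a distributed database without interrupting service
CN102982085A (zh) * 2012-10-31 2013-03-20 北京奇虎科技有限公司 Data migration system and method
CN103324466A (zh) * 2013-05-24 2013-09-25 浪潮电子信息产业股份有限公司 Parallel processing method for data-dependency-serialized IO
CN103970879A (zh) * 2014-05-16 2014-08-06 中国人民解放军国防科学技术大学 Method and system for adjusting storage locations of data blocks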

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6047323A (en) * 1995-10-19 2000-04-04 Hewlett-Packard Company Creation and migration of distributed streams in clusters of networked computers
WO2004046957A2 (en) * 2002-11-15 2004-06-03 Creo Inc. Methods and systems for sharing data
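JP4856932B2 (ja) * 2005-11-18 2012-01-18 株式会社日立製作所 Storage system and data movement method
CN102999537B (zh) * 2011-09-19 2017-01-18 阿里巴巴集团控股有限公司 Data migration system and method
CN103164261B (zh) * 2011-12-15 2016-04-27 中国移动通信集团公司 Multi-center data task processing method, apparatus, and system
CN103647849B (zh) * 2013-12-24 2017-02-08 华为技术有限公司 Service migration method and apparatus, and disaster recovery system
CN104935618B (zh) * 2014-03-19 2018-01-19 福建福昕软件开发股份有限公司 Cluster deployment method
CN103955491B (zh) * 2014-04-15 2017-04-19 南威软件股份有限公司 Method for scheduled incremental data synchronization
CN104184813B (zh) * 2014-08-20 2018-03-09 杭州华为数字技术有限公司 Virtual machine load balancing method, related device, and cluster system
CN105404474A (zh) * 2015-12-07 2016-03-16 上海爱数信息技术股份有限公司 Data migration method for a heterogeneous distributed storage system
CN106055670A (zh) * 2016-06-06 2016-10-26 中国工商银行股份有限公司 Inter-system data migration method and apparatus
CN106202212A (zh) * 2016-06-28 2016-12-07 微梦创科网络科技(中国)有限公司 Method and system for data splitting based on a data server cluster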

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
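CN110399356A (zh) * 2019-06-14 2019-11-01 阿里巴巴集团控股有限公司 Online data migration method and apparatus, computing device, and storage medium
WO2021098268A1 (zh) * 2019-11-22 2021-05-27 浪潮电子信息产业股份有限公司 mon service migration method, apparatus, device, and readable storage medium
CN113051245A (zh) * 2019-12-26 2021-06-29 云丁网络技术(北京)有限公司 Method, apparatus, and system for migrating data
CN111258985A (zh) * 2020-01-17 2020-06-09 中国工商银行股份有限公司 Data cluster migration method and apparatus
CN111274230A (zh) * 2020-03-26 2020-06-12 北京奇艺世纪科技有限公司 Data migration management method, apparatus, device, and storage medium
CN111274230B (zh) * 2020-03-26 2024-03-08 北京奇艺世纪科技有限公司 Data migration management method, apparatus, device, and storage medium
CN111708755A (zh) * 2020-05-20 2020-09-25 北京奇艺世纪科技有限公司 Data migration method, apparatus, system, electronic device, and readable storage medium
CN111708763A (zh) * 2020-06-18 2020-09-25 北京金山云网络技术有限公司 Data migration method and apparatus for a sharded cluster, and sharded cluster system
CN111708763B (zh) * 2020-06-18 2023-12-01 北京金山云网络技术有限公司 Data migration method and apparatus for a sharded cluster, and sharded cluster system
CN114024956A (zh) * 2020-07-17 2022-02-08 北京达佳互联信息技术有限公司 Data migration method and apparatus, server, and storage medium
CN114024956B (zh) * 2020-07-17 2024-03-12 北京达佳互联信息技术有限公司 Data migration method and apparatus, server, and storage medium
CN112506606A (zh) * 2020-11-23 2021-03-16 北京达佳互联信息技术有限公司 Method, apparatus, device, and medium for migrating containers in a cluster
CN112653539B (zh) * 2020-12-29 2023-06-20 杭州趣链科技有限公司 Storage method, apparatus, and device for data to be stored
CN112653539A (zh) * 2020-12-29 2021-04-13 杭州趣链科技有限公司 Storage method, apparatus, and device for data to be stored
CN113535087A (zh) * 2021-07-13 2021-10-22 咪咕互动娱乐有限公司 Data processing method during data migration, server, and storage system
CN113535087B (zh) * 2021-07-13 2023-10-17 咪咕互动娱乐有限公司 Data processing method during data migration, server, and storage system
CN114785796A (zh) * 2022-04-22 2022-07-22 中国农业银行股份有限公司 Data balancing method and apparatus
CN116954870A (zh) * 2023-09-19 2023-10-27 苏州元脑智能科技有限公司 Migration method and recovery method for cross-cluster applications, apparatus, and cluster system
CN116954870B (zh) * 2023-09-19 2024-02-02 苏州元脑智能科技有限公司 Migration method and recovery method for cross-cluster applications, apparatus, and cluster system
CN117742604A (zh) * 2023-12-20 2024-03-22 北京火山引擎科技有限公司 Data storage control method and device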

Also Published As

Publication number Publication date
CN108664496B (zh) 2022-03-25
CN108664496A (zh) 2018-10-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18777264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18777264

Country of ref document: EP

Kind code of ref document: A1