CN114442907A - Data migration method and device, server and network system

Info

Publication number
CN114442907A
Authority
CN
China
Prior art keywords
data
migration
queue
data queue
consumer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011219098.1A
Other languages
Chinese (zh)
Inventor
魏巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202011219098.1A
Publication of CN114442907A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/065 Replication mechanisms
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Abstract

Embodiments of the application provide a data migration method and device, a server, and a network system. The data migration method comprises: in response to a migration instruction for migrating data of an original data queue in a first server, predicting the total number of times that a consumer will read first data of the original data queue, wherein the first data is the data already written to the original data queue when the migration instruction is received; when the total number of times is less than or equal to a set number of times, selecting a first migration strategy to determine a migration start position, wherein the migration start position of the first migration strategy is located downstream of the earliest data position of the original data queue; and copying the data in the original data queue to a second server in a set order to form a migration data queue, wherein the set order runs from the migration start position to the position of the data most recently written to the original data queue by the producer during the migration. Because the migration strategy is selected according to the consumption behavior of the consumer, the migration speed is improved and disk I/O resource consumption is reduced.

Description

Data migration method and device, server and network system
Technical Field
The present application relates to the field of data storage technologies, and in particular, to a data migration method and apparatus, a server, and a network system.
Background
A data channel product is mainly used by an upstream system to publish messages and by a downstream system to subscribe to and consume those messages. Messages are stored in the data channel as a first-in first-out queue, and downstream tasks likewise read and process the data sequentially.
Currently, most message channel products achieve high data reliability through data redundancy, that is, service data is stored as multiple replicas in a cluster. When some nodes fail or nodes are added to expand capacity, data replicas must be migrated to preserve the reliability of the original data. In the prior art, replica migration involves a large data volume and a slow migration speed, and consumes a large amount of resources such as disk I/O and network bandwidth, which inevitably affects normal services.
Disclosure of Invention
Embodiments of the application provide a data migration method and device, a server, and a network system. On the premise that upstream and downstream data is transmitted effectively and no data is lost, different migration strategies can be selected according to the consumption behavior of the consumer, so that replica migration is completed faster, the I/O resource consumption of the disk storing the data is reduced, and the influence of replica migration on normal services is reduced.
In a first aspect, an embodiment of the present application provides a data migration method, where the data migration method includes: predicting the total times of a consumer reading first data of an original data queue in response to a migration instruction for migrating the data of the original data queue in a first server, wherein the first data is data written in the original data queue when the migration instruction is received; when the total times is less than or equal to the set times, selecting a first migration strategy to determine a migration starting position, wherein the migration starting position of the first migration strategy is located downstream of the earliest data position of the original data queue; and copying the data in the original data queue to a second server according to a set sequence to form a migration data queue, wherein the set sequence is the sequence from the migration starting position of the selected migration strategy to the position of the data which is newly written into the original data queue by the producer in the migration process.
In the above solution, when a migration instruction for migrating data of an original data queue in a first server is received, the total number of times that a consumer will again read the first data of the original data queue after the migration instruction is received may be predicted. If the total number of times is less than or equal to the set number of times, that is, the consumer rarely reads the first data of the original data queue, a first migration policy is selected to determine a migration start position, and the data in the original data queue is copied to a second server to form a migration data queue, in order from the migration start position of the selected migration policy to the position of the data most recently written to the original data queue by the producer during the migration. Because the migration start position of the first migration policy is located downstream of the earliest data position of the original data queue, less of the first data is copied. In addition, data is first stored in a page cache and is written from the page cache to the disk only after a certain time, so the data that is not copied is generally data residing on the disk. As a result, the proportion of data read from the page cache increases, the probability of real disk reads decreases, the I/O resource consumption of the disk storing the data is reduced, and the influence of replica migration on normal services is reduced. Moreover, since not all of the data in the original data queue needs to be migrated, the replica migration can be completed faster.
In a possible implementation manner, the selecting a first migration policy to determine a migration start position specifically includes: acquiring the read position of the earliest active consumer of the original data queue and the latest data position at the current moment, wherein the current moment is the moment when the total number of times is determined to be less than or equal to the set number of times, and the latest data position is the position of the data most recently written to the original data queue by the producer at the current moment; and determining the start migration position of the first migration policy to be the latest data position when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance.
That is, in this implementation, when the distance between the read position of the earliest active consumer and the latest data position of the original data queue, at the current moment when the total number of times is determined to be less than or equal to the set number of times, is less than or equal to the set distance, the earliest active consumer is a real-time consumer of the original data queue. Because the earliest active consumer is the one farthest from the latest data position, if it is a real-time consumer then all other active consumers are real-time consumers as well. The start migration position of the first migration policy can therefore be set to the latest data position of the original data queue, and the first data between the earliest data position and the latest data position of the original data queue need not be copied, which speeds up the replica migration. Data is then read mainly from the page cache, so the probability of real disk reads decreases and the I/O resource consumption of the disk storing the data is reduced; the disk I/O resources can be used mainly for normal services, and the influence of replica migration on normal services is reduced.
In a possible implementation manner, when the start migration position is set to the latest data position, the data migration method further includes: acquiring the read position of the earliest active consumer of the original data queue during the data migration; and, when the migration data queue has been copied up to the data most recently written to the original data queue by the producer, completing the data migration upon determining that the read position of the earliest active consumer of the original data queue has entered the range of second data of the original data queue, wherein the second data is the data already copied to the migration data queue.
That is to say, in this implementation, because the start migration position of the first migration policy is the latest data position of the original data queue, all active consumers of the original data queue are real-time consumers. Once the migration data queue has been copied up to the data most recently written to the original data queue by the producer, it suffices that the read position of the earliest active consumer has entered the range of the second data, that is, the part of the original data queue already copied to the migration data queue. This ensures that the read positions of all active consumers of the original data queue lie within the range of the second data, and the data migration can be completed.
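As a non-limiting illustration, the completion condition described above may be sketched as follows. The sketch assumes that positions in a data queue are monotonically increasing integer offsets; the function name and parameter names are hypothetical and do not appear in the application.

```python
def migration_complete_from_latest(copied_start, copied_end,
                                   producer_latest, earliest_active_read_pos):
    """Completion check when copying started at the latest data position.

    copied_start, copied_end: inclusive offsets of the data already copied
        to the migration data queue (the "second data"); assumed integers.
    producer_latest: offset of the data most recently written by the producer.
    earliest_active_read_pos: read offset of the earliest active consumer.
    """
    # The migration data queue must have caught up with the producer.
    caught_up = copied_end >= producer_latest
    # The slowest active consumer must already be reading copied data.
    consumer_in_range = earliest_active_read_pos >= copied_start
    return caught_up and consumer_in_range
```

Checking only the earliest active consumer is sufficient in this sketch because every other active consumer has a read offset at least as large.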
In a possible implementation manner, the data migration method further includes: when the distance between the reading position of the earliest active consumer and the latest data position is larger than a set distance, determining that the starting migration position of the first migration strategy is the reading position of the earliest active consumer.
That is, in this implementation, when the distance between the read position of the earliest active consumer and the latest data position of the original data queue is greater than the set distance, at least the earliest active consumer among all active consumers is a non-real-time consumer, and the earliest active consumer will dwell in the first data of the original data queue for a relatively long time. In this case, the start migration position of the first migration policy may be set to the read position of the earliest active consumer, so that the earliest active consumer can work normally throughout the data migration. The first data between the earliest data position of the original data queue and the read position of the earliest active consumer need not be copied, which speeds up the replica migration, reduces the probability of real disk reads to a certain extent, reduces the I/O resource consumption of the disk storing the data, and reduces the influence of replica migration on normal services.
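By way of example only, the two branches of the first migration policy may be sketched as follows, assuming that positions are integer offsets and that the distance is the difference between two offsets. The function and parameter names are hypothetical.

```python
def first_policy_start(earliest_active_read_pos, latest_pos, set_distance):
    """Return the start migration position under the first migration policy.

    All arguments are assumed to be integer offsets in the original data
    queue; set_distance is the configured threshold ("set distance").
    """
    lag = latest_pos - earliest_active_read_pos
    if lag <= set_distance:
        # Every active consumer is a real-time consumer: start at the tail.
        return latest_pos
    # At least the earliest active consumer lags: start where it is reading.
    return earliest_active_read_pos
```

For instance, with a set distance of 10, a consumer reading offset 95 of a queue whose latest offset is 100 yields a start position of 100, whereas a consumer reading offset 40 yields a start position of 40.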
In one possible implementation, when the start migration position is set to the read position of the earliest active consumer, the data migration method further includes: determining that the migration data queue has been copied up to the data most recently written to the original data queue by the producer, and completing the data migration.
That is, in this implementation, because the start migration position of the first migration policy is set to the read position of the earliest active consumer, once the migration data queue has been copied up to the data most recently written to the original data queue by the producer during the migration, all active consumers are guaranteed to lie within the range of the copied second data of the original data queue, and the data migration is completed.
In a possible implementation manner, the obtaining a read position of the earliest active consumer of the original data queue specifically includes: acquiring the read position of each active consumer in every data queue that it reads, wherein the data queues include the original data queue; converting the mapping structure from each active consumer to its read positions in the data queues that it reads into a mapping structure from each data queue to the read positions of all active consumers in that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and comparing the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
That is, in this implementation, the active consumers of the original data queue and their read positions cannot be obtained directly, but the data queues read by each active consumer and the read position in each of those data queues can be obtained. Therefore, the read position of each active consumer in every data queue that it reads is acquired first. The mapping structure (correspondence) from each active consumer to its read positions in the data queues that it reads is then converted into a mapping structure (correspondence) from each data queue to the read positions of all active consumers in that data queue, which yields the active consumers of the original data queue and their read positions. Comparing the read positions of all active consumers in the original data queue then gives the read position of the earliest active consumer of the original data queue.
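The conversion of the mapping structure described above may, purely as an illustrative sketch, be expressed with nested dictionaries. The sketch assumes that read positions are integer offsets and that a smaller offset is farther from the latest data position; all names are hypothetical.

```python
from collections import defaultdict


def invert_read_positions(by_consumer):
    """Convert {consumer: {queue: position}} into {queue: {consumer: position}}."""
    by_queue = defaultdict(dict)
    for consumer, queues in by_consumer.items():
        for queue, position in queues.items():
            by_queue[queue][consumer] = position
    return dict(by_queue)


def earliest_active_read_position(by_consumer, queue):
    """Smallest read offset among the active consumers of `queue`, or None."""
    positions = invert_read_positions(by_consumer).get(queue, {})
    if not positions:
        return None  # no active consumer reads this queue
    return min(positions.values())
```

Here `by_consumer` is assumed to contain active consumers only, so the minimum over one data queue is the read position of its earliest active consumer.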
In one possible implementation manner, before the obtaining the read position of each active consumer in all data queues read, the obtaining the read position of the oldest active consumer in the original data queue further includes: acquiring a read position of all consumers in the read data queue, wherein all consumers comprise inactive consumers and active consumers; querying the read states of all the consumers to classify each of the all the consumers as the active consumer or the inactive consumer, wherein the read state of the active consumer is working, and the read state of the inactive consumer is suspended; removing the inactive consumer and information of the read position of the inactive consumer in the read data queue.
That is, in this implementation, it is inconvenient to directly acquire the read positions of the active consumers in the data queues that they read, so the read positions of all consumers in the data queues that they read may be acquired first, where all consumers include inactive consumers and active consumers. The read states of all consumers are then queried to classify each consumer as an active consumer, whose read state is working, or an inactive consumer, whose read state is suspended. Finally, the inactive consumers and the information on their read positions in the data queues that they read are removed, which leaves the read position of each active consumer in every data queue that it reads.
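As a minimal sketch of the filtering step, assuming that read states are reported as the strings "working" and "suspended" (the actual encoding is not specified in the application) and using hypothetical names:

```python
def keep_active(read_positions, read_states):
    """Drop inactive consumers and their read positions.

    read_positions: {consumer: {queue: position}} for all consumers.
    read_states: {consumer: "working" | "suspended"} (assumed encoding).
    """
    return {
        consumer: queues
        for consumer, queues in read_positions.items()
        if read_states.get(consumer) == "working"
    }
```

A consumer whose read state is unknown is treated as inactive in this sketch, which is a design choice of the example rather than a requirement of the method.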
In one possible implementation manner, after the data migration is completed, the data migration method further includes: upon determining that a consumer is reading first data of the original data queue that has not been migrated, deleting the original data queue according to a set policy; or, upon determining that no consumer is reading first data of the original data queue that has not been migrated, deleting the original data queue.
That is, in this implementation, when the first migration policy is selected to determine the migration start position, the migration start position is located downstream of the earliest data position of the original data queue, so the data between the earliest data position and the migration start position is not copied to the migration data queue. After the data migration is completed, it is therefore necessary to determine whether a consumer is reading first data of the original data queue that has not been migrated. If not, the original data queue may be deleted. If so, the original data queue may be deleted according to a set policy; for example, the set policy may be to delete the original data queue after the consumer has finished reading the first data that has not been migrated, so that normal work is not affected.
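The deletion decision may be illustrated as follows. The sketch assumes integer offsets and assumes that a consumer is reading un-migrated first data exactly when its read offset in the original data queue is smaller than the migration start position; this criterion and all names are assumptions made for illustration.

```python
def deletion_action(start_pos, read_positions):
    """Decide how to delete the original data queue after migration.

    start_pos: migration start position (integer offset, assumed).
    read_positions: {consumer: read offset in the original data queue}.
    Returns ("delete", []) or ("defer", [consumers still on un-migrated data]).
    """
    pending = [c for c, pos in read_positions.items() if pos < start_pos]
    if pending:
        # Example set policy: wait until these consumers finish, then delete.
        return ("defer", pending)
    return ("delete", [])
```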
In one possible implementation manner, the data migration method includes: and when the total times is greater than the set times, selecting a second migration strategy to determine a migration starting position, wherein the migration starting position of the second migration strategy is the earliest data position of the original data queue.
That is to say, in this implementation, consider the case where the total number of times that the consumer reads the first data of the original data queue is greater than the set number of times. If the first migration policy were selected, the original data queue could still be deleted according to the set policy after the data migration is completed, so that the consumer could read the un-migrated first data of the original data queue again; however, switching the consumer from the migration data queue back to the original data queue is relatively costly. When the consumer reads the first data of the original data queue many times, the second migration policy may therefore be selected to reduce this cost: migration starts from the earliest data position of the original data queue, which ensures that the consumer can complete all of its read services in the migration data queue.
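The selection between the two migration policies may be sketched as below. How the total number of reads is predicted is not shown here, so the prediction is simply passed in as an argument; the names are hypothetical.

```python
def choose_policy(predicted_reads, set_times, earliest_pos, first_policy_start_pos):
    """Select a migration policy and its start position.

    predicted_reads: predicted total number of reads of the first data
        (produced by some predictor outside this sketch).
    set_times: the configured threshold ("set number of times").
    earliest_pos: earliest data position of the original data queue.
    first_policy_start_pos: start position the first policy would use,
        located downstream of earliest_pos.
    """
    if predicted_reads <= set_times:
        return ("first", first_policy_start_pos)
    # Many re-reads expected: copy everything to avoid switching queues later.
    return ("second", earliest_pos)
```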
In a possible implementation manner, the data migration method further includes: and determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
That is, in this implementation, when the second migration policy is selected to start migration from the earliest data position of the original data queue, and when the migration data queue is copied to the data that is newly written into the original data queue by the producer, it is ensured that the migration data queue copies all the data in the original data queue, and then data migration is completed.
In a possible implementation manner, the data migration method further includes: and deleting the original data queue after the data migration is completed.
That is, in this implementation, because the start migration position of the second migration policy is the earliest data position of the original data queue, once the migration data queue has been copied up to the data most recently written to the original data queue by the producer and the data migration is completed, the migration data queue contains all the data of the original data queue. The original data queue may therefore be deleted immediately after the data migration is completed, so as to release the memory.
In a second aspect, an embodiment of the present application provides a data migration method, where the data migration method includes: responding to a migration instruction for migrating data of an original data queue in a first server, and acquiring a reading position and a latest data position of an earliest active consumer of the original data queue at the current time, wherein the current time is the time when the migration instruction is received, and the latest data position is a position where a producer writes data of the original data queue at the current time; when the distance between the reading position of the earliest active consumer and the latest data position is smaller than or equal to a set distance, copying the data in the original data queue to a second server according to the sequence from the latest data position to the position of the data which is written into the original data queue by the producer in the migration process to form a migration data queue; and/or when the distance between the reading position of the earliest active consumer and the latest data position is larger than a set distance, copying the data in the original data queue to a second server in the sequence from the reading position of the earliest active consumer to the position of the data which is written into the original data queue by the producer in the migration process to form a migration data queue.
In this scheme, when the distance between the read position of the earliest active consumer and the latest data position of the original data queue, at the current moment when the migration instruction is received, is less than or equal to the set distance, all active consumers of the original data queue are real-time consumers, so data migration starts from the latest data position of the original data queue. When that distance is greater than the set distance, at least the earliest active consumer of the original data queue is not a real-time consumer, so data migration starts from the read position of the earliest active consumer of the original data queue. In either case, the data in the original data queue in the first server is copied to the second server to form a migration data queue, in order from the migration start position to the position of the data most recently written to the original data queue by the producer during the migration. Data written to the original data queue by the producer is held in the page cache for a certain time before being stored from the page cache to the disk. Compared with a scheme that migrates data from the earliest data position, the data migration method of the embodiments of the application therefore copies data mainly from the page cache, which reduces the probability of copying first data from the disk, that is, the probability of real disk reads, and reduces the influence of replica migration on the normal services of the disk. Moreover, since not all of the data in the original data queue needs to be migrated, the replica migration can be completed faster.
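The copying order described above may be illustrated by a small single-process simulation. The sketch models the original data queue as a Python list that a producer may extend between batches; the batch size, the callback, and all names are assumptions for illustration only and do not reflect any particular message channel product.

```python
def copy_queue(original, start_pos, on_batch=None, batch_size=2):
    """Copy original[start_pos:] into a new list, following the producer.

    original: list standing in for the original data queue; it may grow
        while copying is in progress.
    on_batch: optional callback invoked after each batch, used here to
        simulate the producer writing new data during the migration.
    """
    migrated = []
    pos = start_pos
    # Keep copying until the position catches up with the newest data.
    while pos < len(original):
        end = min(pos + batch_size, len(original))
        migrated.extend(original[pos:end])
        pos = end
        if on_batch is not None:
            on_batch(original)
    return migrated
```

Data located before `start_pos` is never touched, which corresponds to the first data that is skipped instead of being read back from the disk.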
In one possible implementation, when migrating data from the latest data position, the data migration method further includes: acquiring the read position of the earliest active consumer of the original data queue during the data migration; and, when the migration data queue has been copied up to the data most recently written to the original data queue by the producer, completing the data migration upon determining that the read position of the earliest active consumer of the original data queue has entered the range of second data of the original data queue, wherein the second data is the data already copied to the migration data queue.
That is, in this implementation, since the migration start position is the latest data position of the original data queue, all consumers in the original data queue are real-time consumers, and when the migration data queue is copied to the data written into the original data queue by the producer, as long as the read position of the earliest active consumer farthest from the latest data position in the migration process enters the range of the migration data queue, it can be ensured that the read positions of all active consumers in the original data queue all enter the range of the migration data queue, and data migration can be completed.
In one possible implementation, when migrating data from the read location of the earliest active consumer, the data migration method further comprises: and determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
That is, in this implementation, since the start migration position is set as the read position of the oldest active consumer, when the migration data queue is copied to the data newly written into the original data queue by the producer in the data migration process, it can be ensured that all active consumers are located within the range of the migration data queue, thereby completing the data migration.
In a possible implementation manner, the obtaining a read position of the earliest active consumer of the original data queue specifically includes: acquiring the read position of each active consumer in every data queue that it reads, wherein the data queues include the original data queue; converting the mapping structure from each active consumer to its read positions in the data queues that it reads into a mapping structure from each data queue to the read positions of all active consumers in that data queue, thereby obtaining the read positions of all active consumers of the original data queue; and comparing the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
That is to say, in this implementation manner, the active consumers of the original data queue and their read positions cannot be obtained directly, but the data queues read by each active consumer and the read position in each of those data queues can be obtained. The mapping structure (correspondence) from each active consumer to its read positions in the data queues that it reads may therefore be converted into a mapping structure (correspondence) from each data queue to the read positions of all active consumers in that data queue, which yields the active consumers of the original data queue and their read positions. Comparing the read positions of all active consumers in the original data queue then gives the read position of the earliest active consumer of the original data queue.
In one possible implementation manner, before acquiring the read position of each active consumer in every data queue that it reads, obtaining the read position of the earliest active consumer in the original data queue further includes: acquiring the read positions of all consumers in the data queues they read, where all consumers include inactive consumers and active consumers; querying the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, where the read status of an active consumer is working and the read status of an inactive consumer is suspended; and removing the inactive consumers together with the information on their read positions in the data queues they read.
That is, in this implementation, the read positions of the active consumers in the data queues they read are not convenient to acquire directly, so the read positions of all consumers, inactive and active alike, in the data queues they read may be acquired first. The read status of all consumers is then queried to classify them into active consumers, whose read status is working, and inactive consumers, whose read status is suspended. Finally, the inactive consumers and the information on their read positions in the data queues they read are removed, which leaves the read position of each active consumer in every data queue that it reads.
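As a minimal sketch of the filtering described above, and assuming that the read status of each consumer is available as the string "working" or "suspended", the inactive consumers can be removed as follows. The function name keep_active_consumers and the sample data are hypothetical.

```python
def keep_active_consumers(all_positions, read_status):
    """Drop consumers whose read status is not "working".

    all_positions: consumer -> {queue: read position} for all consumers.
    read_status:   consumer -> "working" or "suspended" (assumed encoding).
    A consumer with no known status is treated as inactive in this sketch.
    """
    return {
        consumer: dict(queues)
        for consumer, queues in all_positions.items()
        if read_status.get(consumer) == "working"
    }


# Hypothetical data: c2 is suspended and is removed with its read positions.
all_positions = {"c1": {"q0": 40}, "c2": {"q0": 3}, "c3": {"q1": 12}}
read_status = {"c1": "working", "c2": "suspended", "c3": "working"}
active = keep_active_consumers(all_positions, read_status)
```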
In one possible implementation manner, after the data migration is completed, the data migration method further includes: when it is determined that a consumer still needs to read first data of the original data queue that has not been migrated, deleting the original data queue according to a set policy, where the first data is the data that had been written into the original data queue when the migration instruction was received; or, when it is determined that no consumer needs to read the first data of the original data queue that has not been migrated, deleting the original data queue.
That is to say, in this implementation, after the data migration is completed, it is necessary to determine whether any consumer will read again the first data of the original data queue that has not been migrated. If not, the original data queue may be deleted immediately. If so, the original data queue may be deleted according to a set policy, for example after the consumer has finished re-reading the unmigrated first data, so that normal operation is not affected.
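The deletion decision above can be sketched as follows. This is only one possible set policy, namely the example given above of deleting after the re-reads have finished; the function name and the two counters are hypothetical.

```python
def can_delete_original_queue(rereads_expected, rereads_done):
    """Return True when the original data queue may be deleted.

    rereads_expected: number of re-reads of unmigrated first data still
                      required by consumers (hypothetical counter).
    rereads_done:     number of those re-reads already finished.
    """
    if rereads_expected == 0:
        # No consumer re-reads unmigrated first data: delete immediately.
        return True
    # Assumed set policy: wait until every expected re-read has finished.
    return rereads_done >= rereads_expected
```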
In a third aspect, an embodiment of the present application provides a data migration apparatus, where the data migration apparatus includes: the prediction unit is used for responding to a migration instruction for migrating data of an original data queue in a first server and predicting the total times of reading the first data of the original data queue by the consumer, wherein the first data is the data written in the original data queue when the migration instruction is received; a selecting unit, configured to select a first migration policy to determine a migration start position when the total number of times is less than or equal to a set number of times, where the migration start position of the first migration policy is located downstream of an earliest data position of the original data queue; and the migration unit is used for copying the data in the original data queue to a second server according to a set sequence to form a migration data queue, wherein the set sequence is the sequence from the migration starting position of the selected migration strategy to the position of the data which is newly written into the original data queue by the producer in the migration process.
In one possible implementation, the selecting unit includes: an obtaining module, configured to obtain the read position of the earliest active consumer of the original data queue and the latest data position at a current time, where the current time is the time when the total number of times is determined to be less than or equal to the set number of times, and the latest data position is the position at which the producer writes data into the original data queue at the current time; and a determining module, configured to determine that the start migration position of the first migration policy is the latest data position when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance.
In a possible implementation manner, when the start migration position is set to the latest data position, the obtaining module is further configured to obtain the read position of the earliest active consumer of the original data queue during the data migration process; and the data migration apparatus further includes: a first determining unit, configured to complete the data migration when the migration data queue has been copied up to the data most recently written into the original data queue by the producer and it is determined that the read position of the earliest active consumer of the original data queue has entered the range of second data of the original data queue during the data migration process, where the second data is the data that has been copied to the migration data queue.
In a possible implementation manner, the determining module is further configured to determine, when a distance between the read position of the earliest active consumer and the latest data position is greater than a set distance, that the start migration position of the first migration policy is the read position of the earliest active consumer.
In one possible implementation, when the start migration position is the read position of the earliest active consumer, the data migration apparatus further includes: a second determining unit, configured to determine that the migration data queue has been copied up to the data most recently written into the original data queue by the producer, and to complete the data migration.
In one possible implementation manner, the obtaining module includes: a first obtaining submodule, configured to obtain a read position of each active consumer in all read data queues, where the all data queues include the original data queue; a dump sub-module, configured to dump the mapping structure of each active consumer and the read position of each active consumer in all read data queues into a mapping structure of each data queue and the read position of all active consumers in each data queue, so as to obtain the read positions of all active consumers of the original data queue; and the comparison and acquisition submodule is used for comparing the reading positions of all active consumers in the original data queue so as to acquire the reading position of the earliest active consumer in the original data queue.
In a possible implementation manner, the obtaining module further includes: a second obtaining submodule, configured to obtain read positions of all consumers in the read data queue, where all consumers include inactive consumers and active consumers; a query submodule, configured to query the read status of all the consumers to classify each of the consumers as the active consumer or the inactive consumer, where the read status of the active consumer is in operation, and the read status of the inactive consumer is suspended; and the removing submodule is used for removing the information of the inactive consumers and the reading positions of the inactive consumers in the read data queue.
In one possible implementation manner, after the data migration is completed, the data migration apparatus further includes: a first deleting unit, configured to delete the original data queue according to a set policy when it is determined that a consumer still needs to read the first data of the original data queue that has not been migrated; or to delete the original data queue when it is determined that no consumer needs to read the first data of the original data queue that has not been migrated.
In a possible implementation manner, the selecting unit is further configured to select a second migration policy to determine a migration start position when the total number of times is greater than a set number of times, where the migration start position of the second migration policy is an earliest data position of the original data queue.
In one possible implementation manner, the data migration apparatus further includes: and the third determining unit is used for determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
In one possible implementation manner, the data migration apparatus further includes: and the second deleting unit is used for deleting the original data queue after the data migration is finished.
In a fourth aspect, an embodiment of the present application provides a data migration apparatus, where the data migration apparatus includes: an obtaining module, configured to respond to a migration instruction for migrating data of an original data queue in a first server, and obtain a read position and a latest data position of an earliest active consumer in the original data queue at a current time, where the current time is a time when the migration instruction is received, and the latest data position is a position where a producer writes data in the original data queue at the current time; a migration module, configured to copy, when a distance between a read position of the oldest active consumer and the latest data position is less than or equal to a set distance, data in the original data queue to a second server in an order from the latest data position to a position of data that is written into the original data queue by a producer during a migration process, so as to form a migration data queue; and/or when the distance between the reading position of the earliest active consumer and the latest data position is larger than a set distance, copying the data in the original data queue to a second server to form a migration data queue according to the sequence from the reading position of the earliest active consumer to the position of the data which is newly written into the original data queue by a producer in the migration process.
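The choice of start position made by the migration module can be sketched as follows, assuming that positions are integer offsets and that the distance is the difference between the latest data position and the read position of the earliest active consumer. The function name choose_start_position is hypothetical.

```python
def choose_start_position(earliest_active_read, latest_data, set_distance):
    """Pick the position from which copying to the second server starts.

    Assumption: positions are integer offsets and
    distance = latest_data - earliest_active_read.
    """
    distance = latest_data - earliest_active_read
    if distance <= set_distance:
        # All active consumers follow the producer closely:
        # start from the latest data position.
        return latest_data
    # Otherwise start from the read position of the earliest active consumer.
    return earliest_active_read
```

For example, with a set distance of 50, an earliest active consumer at offset 990 and a latest data position of 1000 lead to copying from offset 1000, whereas an earliest active consumer at offset 200 leads to copying from offset 200.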
In a possible implementation manner, when data is migrated from the latest data position, the obtaining module is further configured to obtain a read position of an earliest active consumer of the original data queue during data migration; the data migration apparatus further includes: the first determining module is configured to determine, when the migration data queue is copied to data that is newly written into the original data queue by a producer, that a read position of an earliest active consumer of the original data queue enters a second data range of the original data queue in a data migration process, and complete data migration, where the second data is data that has been copied by the migration data queue.
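The completion condition for migration that starts from the latest data position can be sketched as follows, assuming integer offsets. The function and parameter names are hypothetical; the second data is modeled as the offsets from the start position up to the last copied offset.

```python
def migration_complete(copied_up_to, latest_written, earliest_active_read, start_position):
    """Check both completion conditions when copying started at `start_position`.

    copied_up_to:         last offset already copied to the migration data queue.
    latest_written:       last offset written by the producer to the original queue.
    earliest_active_read: read position of the earliest active consumer.
    """
    # Condition 1: the copy has caught up with the producer.
    caught_up = copied_up_to >= latest_written
    # Condition 2: the earliest active consumer has entered the copied range,
    # so every active consumer can continue on the migration data queue.
    readers_in_copied_range = earliest_active_read >= start_position
    return caught_up and readers_in_copied_range
```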
In one possible implementation, when migrating data from the read location of the earliest active consumer, the data migration apparatus further includes: and the second determining module is used for determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
In one possible implementation manner, the obtaining module includes: a first obtaining submodule, configured to obtain a read position of each active consumer in all read data queues, where the all data queues include the original data queue; a dump sub-module, configured to dump the mapping structure of each active consumer and the read position of each active consumer in all read data queues into a mapping structure of each data queue and the read position of all active consumers in each data queue, so as to obtain the read positions of all active consumers of the original data queue; and the comparison and acquisition submodule is used for comparing the reading positions of all active consumers in the original data queue so as to acquire the reading position of the earliest active consumer in the original data queue.
In a possible implementation manner, the obtaining module further includes: a second obtaining submodule, configured to obtain read positions of all consumers in the read data queue, where all consumers include inactive consumers and active consumers; a query submodule, configured to query the read status of all the consumers to classify each of the consumers as the active consumer or the inactive consumer, where the read status of the active consumer is in operation, and the read status of the inactive consumer is suspended; and the removal submodule is used for removing the information of the inactive consumers and the reading positions of the inactive consumers in the read data queue.
In one possible implementation manner, after the data migration is completed, the data migration apparatus further includes: a deleting module, configured to determine that the consumer has a situation of reading first data of the original data queue that is not migrated, and delete the original data queue according to a set policy, where the first data is data written in the original data queue when the migration instruction is received; or, in case it is determined that the consumer does not read the first data of the original data queue that has not been migrated, deleting the original data queue.
In a fifth aspect, an embodiment of the present application provides a server, including: a transceiver for receiving and transmitting data; a memory storing a computer program; a processor, configured to execute the computer program stored in the memory, so as to enable the server to implement the data migration method, where the server is the first server or the second server.
In a sixth aspect, an embodiment of the present application provides a network system, including a first server and a second server, where the first server or the second server can execute the data migration method described above.
In a seventh aspect, an embodiment of the present application provides a computer storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the data migration method described above.
In an eighth aspect, embodiments of the present application provide a computer program product containing instructions that, when executed on a computer, cause the computer to perform the above-mentioned data migration method.
According to the solution of the embodiments of the present application, on the premise that upstream and downstream data is transmitted effectively and no data is lost, different migration policies can be selected according to the consumption behavior of the consumers. Specifically, when the total number of times that the consumers read the first data of the original data queue is small, that is, less than or equal to the set number of times, the first migration policy may be selected to determine the start migration position, where the first data is the data that had been written into the original data queue when the migration instruction was received, and the start migration position of the first migration policy is located downstream of the earliest data position of the original data queue, so that less of the first data is copied. Because data is first stored in the page cache and is stored to the disk only after a certain time, the data that is not copied is generally data located on the disk. The proportion of data read from the page cache therefore increases and the probability of real disk reads decreases, which reduces the IO resource consumption of the disk storing the data and the impact of copy migration on normal services. In addition, because not all data in the original data queue needs to be copied, the copy migration completes faster.
Drawings
FIG. 1 is an architecture diagram of a Kafka cluster;
FIG. 2 is a schematic diagram of a scenario in which the Kafka cluster shown in FIG. 1 performs data migration;
FIG. 3 is a process diagram of one data migration method suitable for use with the Kafka cluster shown in FIG. 1;
FIG. 4 is a graph of the change in disk utilization when migrating data according to the data migration method shown in FIG. 3;
FIG. 5 is a flowchart of a data migration method provided in an embodiment of the present application;
FIG. 6 is a detailed flowchart of step 531 in FIG. 5;
FIG. 7 is a process diagram of a first approach to a first migration policy;
FIG. 8 is a process diagram of a second approach to the first migration policy;
FIG. 9 is a graph comparing disk utilization with the optimized data migration method according to the embodiment of the present application and the original data migration method;
FIG. 10 is a flow chart of another data migration method provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a data migration apparatus according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the structure of the selection unit in FIG. 11;
FIG. 13 is a schematic diagram of the acquisition module of FIG. 12;
FIG. 14 is a schematic structural diagram of another data migration apparatus provided in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a network system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, the words "exemplary," "for example," or "for instance" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "for example," or "for instance" is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary," "for example," or "for instance" is intended to present relevant concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, B exists alone, or A and B exist at the same time. In addition, the term "plurality" means two or more unless otherwise specified. For example, a plurality of systems refers to two or more systems, and a plurality of screen terminals refers to two or more screen terminals.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Data channel products are mainly used by upstream systems to publish messages and by downstream systems to subscribe to and consume them. Messages are stored in the data channel as a first-in first-out queue, and downstream tasks likewise acquire and process the data in order. In architecture design, a message queue generally serves for decoupling, peak clipping, and asynchronous processing: a producer writes messages into the queue, and a consumer reads messages from the queue to perform business logic. Kafka is a high-throughput distributed publish-subscribe messaging system offering high performance, persistence, multi-copy backup, and scalability, and it can handle all the action stream data of consumers on a website. Such actions (web browsing, searching, and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is typically handled through log processing and log aggregation.
FIG. 1 is an architecture diagram of a Kafka cluster. As shown in FIG. 1, the Kafka cluster includes one or more servers, which are called brokers. Each message published to the Kafka cluster has a topic, and the messages/data under the same topic are of the same type. Each topic contains one or more partitions; a partition is a physical concept, and the messages within each partition must be kept in order. A producer is a client that writes messages/data into a specified topic on a broker, and a consumer is a client that reads messages/data of a specified topic from a broker for business processing. Horizontal scaling can be achieved by increasing the number of partitions. Each time a message/data item is newly written, Kafka appends it to the corresponding file, and copy data is kept to ensure high reliability of the original data.
In FIG. 1, topic 0 has two partitions, partition 0 and partition 1, and each partition has three copies. A solid arrow points to the copy into which the producer writes messages/data; this is the leader copy, which provides the data production and consumption services. A dashed arrow points to a follower copy, which copies the messages/data from the leader copy and may be elected as the new leader copy after the leader copy fails. A two-dot chain line arrow points to the copy read by the consumer. When brokers are scaled out or scaled in, data migration is required in order to balance the copies across the brokers as evenly as possible; this is the main scenario to which the solution of the embodiments of the present application applies.
FIG. 2 is a schematic diagram of a scenario in which the Kafka cluster shown in FIG. 1 performs data migration, specifically during capacity expansion. As shown in FIG. 2, during capacity expansion, broker 3 may be added, and the data of "topic 0 partition 0" in broker 0 and of "topic 0 partition 1" in broker 1 may be copied to broker 3, as shown by the single-dot chain line arrows in FIG. 2. During capacity reduction, assuming that the Kafka cluster includes brokers 1 to 5 and each broker holds 4 partitions, the four partitions in broker 5 may be migrated to brokers 1 to 4, respectively.
FIG. 3 is a process diagram of a data migration method applicable to the Kafka cluster shown in FIG. 1. As shown in FIG. 3, in this data migration method, copying starts from the earliest data position of the original data queue. During the data migration process, the original data queue continues to provide service: consumer 1 and consumer 2 are reading data, and the producer is still writing new data into the original data queue. The data represented by the black rectangles in the original data queue in the lower diagram ("migration completed") of FIG. 3 is the data newly written by the producer after the data copying started. After a long period of data copying, the data migration is complete once all the data in the original data queue has been copied into the new data queue. The migration data queue then begins to provide service, newly written data is written into the migration data queue, and the data in the original data queue is deleted.
FIG. 4 is a graph of the change in disk utilization when migrating data according to the data migration method shown in FIG. 3. As shown in fig. 3 and 4, the data migration method has the following disadvantages:
1. Before the data migration is finished, a large amount of data (such as the data represented by the black rectangles in the original data queue in the lower diagram "migration completed" of FIG. 3) continues to be written into the original data queue. If the migration speed is not much greater than the data writing speed, the data migration process takes a long time.
2. During data copying, reading and writing a large amount of historical data occupies a large share of the IO resources of the disk storing the data, which affects normal services. In addition, Linux caches recently read and written data in the page cache, and copying early historical data rarely hits the page cache, so real disk reads occur.
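The first disadvantage can be made concrete with a rough estimate. Assuming, purely for illustration, a constant copy rate and a constant write rate, the copy only gains on the producer at the difference between the two rates, so the time to catch up is the initial backlog divided by that difference. The function name and the numbers below are hypothetical.

```python
def estimated_catch_up_seconds(backlog_bytes, copy_rate, write_rate):
    """Estimate the time for the copy to catch up with the producer.

    Assumption: both rates are constant and in bytes per second.
    If the copy is not faster than the producer, it never catches up.
    """
    if copy_rate <= write_rate:
        return float("inf")
    return backlog_bytes / (copy_rate - write_rate)


# Hypothetical numbers: a 1000-byte backlog, copying at 110 B/s while the
# producer writes at 100 B/s, takes 100 seconds to catch up.
slow = estimated_catch_up_seconds(1000, 110, 100)
# Starting with a smaller backlog shortens the migration proportionally.
fast = estimated_catch_up_seconds(100, 110, 100)
```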
The main function of a message channel product is to let upstream services publish data and downstream services subscribe to it, so upstream and downstream data must be transmitted effectively, without data loss, during the data migration process. The embodiments of the present application provide a data migration method, a data migration apparatus, a server, and a network system, which can be applied to message channel products such as Kafka and also to other similar storage systems such as the Hadoop Distributed File System (HDFS). On the premise that upstream and downstream data is transmitted effectively and no data is lost, a migration policy can be selected according to the consumption behavior of the consumers, which helps improve data migration efficiency and reduce the IO resource consumption of the disk storing the data, thereby reducing the impact of copy migration on normal services.
Fig. 5 is a flowchart of a data migration method according to an embodiment of the present application. The data migration method is used for migrating data in a data queue, wherein a producer is used for writing data into the data queue in sequence, and at least one consumer is used for reading the data from the data queue in sequence. As shown in fig. 5, the data migration method includes:
Step 51, in response to a migration instruction for migrating data of the original data queue in the first server, predict the total number of times that consumers read first data of the original data queue, where the first data is the data that had been written into the original data queue when the migration instruction was received, such as the data represented by the white rectangles in FIG. 3. It should be noted that the "total number of times" here is the number of times that consumers read the first data again after the migration instruction is received, and it may be predicted based on the historical consumption records of the consumers before the migration instruction is received, the business types to which the consumers belong, and the like. Specifically, the historical consumption records may include the frequency with which each consumer re-consumes data and the trend of that frequency. In addition, if the business type of a consumer allows data to be missed, that is, the consumer may be allowed not to read the first data even when it needs to re-read it, the number of reads of that consumer may be counted as 0. Conversely, if the business type of a consumer does not allow data to be missed, every time that the consumer needs to read unmigrated first data is counted. Moreover, the total number of times may be the sum of the numbers of times that different consumers read the first data of the original data queue, and a consumer that reads the first data multiple times is counted multiple times; if only one consumer reads the original data queue, the total number of times is the number of times that this consumer reads the first data of the original data queue.
In step 52, it is determined whether the total number of times is less than or equal to the set number of times.
The set number of times can be set according to the work requirement, and can be 3 times or 5 times, for example.
Step 53, if the determination result is yes, select a first migration policy to determine a migration start position, where the migration start position of the first migration policy is located downstream of the earliest data position of the original data queue.
With continued reference to FIG. 3, the "earliest data position of the original data queue" is the position of the earliest message (that is, the earliest data) in the original data queue; in FIG. 3, it is the leftmost data position of the original data queue. "Downstream" refers to a position behind the earliest message. One example is the "latest data position" described below, that is, the position at which the producer writes data into the original data queue; in FIG. 3, it is the rightmost data position of the original data queue. Another example is the "earliest active consumer position", that is, the position of the foremost active consumer in the original data queue, which is the active consumer farthest from the latest data position; in FIG. 3, it is the position of consumer 1.
Step 54, copy the data in the original data queue to the second server according to a set sequence to form a migration data queue, where the set sequence runs from the migration start position of the selected migration policy to the position of the data most recently written into the original data queue by the producer during the migration process.
In the above solution, when a migration instruction for migrating the original data queue in the first server is received, the total number of times that consumers will read the first data of the original data queue again after the migration instruction is received may be predicted. If the total number of times is less than or equal to the set number of times, that is, consumers will read the first data of the original data queue only a few times, the first migration policy may be selected to determine the migration start position. In addition, data written into the original data queue by the producer is first stored in the page cache, and after a period of time the data in the page cache is stored to the disk. When the data queue is migrated, the page cache is checked first for the data to be migrated. If the data is in the page cache, it is migrated from the page cache and no real disk read occurs; if it is not, it is migrated from the disk and a real disk read occurs. When the migration start position is located downstream of the earliest data position of the original data queue, less of the first data is migrated, so a larger proportion of the data is migrated from the page cache and the probability of real disk reads falls. This reduces the IO resource consumption of the disk storing the data, so that the IO resources of the disk can be used mainly for normal services and the impact of copy migration on normal services is reduced.
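The prediction and the comparison with the set number of times can be sketched as follows. The embodiments do not prescribe a prediction model, so the per-consumer estimate expected_rereads is a hypothetical input (for example, derived from historical consumption records), and the class and function names are likewise hypothetical. A total greater than the set number of times selects the second migration policy, which starts from the earliest data position.

```python
from dataclasses import dataclass


@dataclass
class ConsumerProfile:
    name: str
    expected_rereads: int     # hypothetical estimate of re-reads of first data
    allows_missed_data: bool  # business type tolerates missing data


def predict_total_rereads(profiles):
    """Sum the expected re-reads; consumers that tolerate missed data count as 0."""
    return sum(0 if p.allows_missed_data else p.expected_rereads for p in profiles)


def select_policy(total_rereads, set_number):
    """Select the first policy when the total is at most the set number of times."""
    return "first" if total_rereads <= set_number else "second"


# Hypothetical consumers: b tolerates missed data, so its re-reads are ignored.
profiles = [
    ConsumerProfile("a", 2, False),
    ConsumerProfile("b", 5, True),
    ConsumerProfile("c", 1, False),
]
total = predict_total_rereads(profiles)
policy = select_policy(total, set_number=3)
```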
Specifically, the first migration policy may include two schemes, where a migration start position of the first scheme is a latest data position of the original data queue; the starting migration position of the second scheme is the read position of the oldest active consumer in the raw data queue. In addition, the second migration policy to be described below includes a scheme, i.e., a third scheme, in which the start migration bit is set to the earliest data position of the original data queue. Wherein the migration start positions of the first scheme and the second scheme are located downstream of the earliest data position of the original data queue (the migration start position of the third scheme).
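To compare the three schemes, the following sketch maps each scheme to its start migration position and computes how much already-written data must be copied at the start of migration. Positions are assumed to be integer offsets, and the enum and function names are hypothetical.

```python
from enum import Enum


class Scheme(Enum):
    LATEST_DATA = 1               # first scheme (first migration policy)
    EARLIEST_ACTIVE_CONSUMER = 2  # second scheme (first migration policy)
    EARLIEST_DATA = 3             # third scheme (second migration policy)


def start_position(scheme, earliest_data, earliest_active_read, latest_data):
    """Return the start migration position for the given scheme."""
    if scheme is Scheme.LATEST_DATA:
        return latest_data
    if scheme is Scheme.EARLIEST_ACTIVE_CONSUMER:
        return earliest_active_read
    return earliest_data


def initial_copy_backlog(scheme, earliest_data, earliest_active_read, latest_data):
    """Amount of already-written data to copy when migration starts."""
    start = start_position(scheme, earliest_data, earliest_active_read, latest_data)
    return latest_data - start


# Hypothetical queue: data at offsets 0..1000, earliest active consumer at 900.
backlogs = {s: initial_copy_backlog(s, 0, 900, 1000) for s in Scheme}
```

In this hypothetical queue the first scheme copies no already-written data, the second scheme copies 100 offsets, and the third scheme copies all 1000 offsets.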
First, the first scheme of the data migration method according to the embodiment of the present application is described in detail below. In a mainstream business scenario, the position of the consumer's consumption task (i.e., the read position) is typically close to the position of the latest message/data written by the producer into the original data queue. In this case, as shown in fig. 5, selecting the first migration policy to determine the migration start position in step 53 may specifically include:
Step 531, acquiring the read position of the earliest active consumer of the original data queue and the latest data position at the current time, wherein the current time is the time when the total number of times is determined to be less than or equal to the set number of times, and the latest data position is the position of the data most recently written into the original data queue by the producer at the current time;
Step 532, comparing whether the distance between the read position of the earliest active consumer and the latest data position is less than or equal to the set distance;
Step 533, if the comparison result is yes, determining that the migration start position of the first migration policy is the latest data position, that is, the first scheme of the data migration method in the embodiment of the present application. That is, when the distance between the read position of the earliest active consumer in the original data queue and the latest data position at the current time is less than or equal to the set distance, the earliest active consumer is a real-time consumer that closely follows the latest data. Since the earliest active consumer is the consumer farthest from the latest data position, when the earliest active consumer is a real-time consumer, all other active consumers are also real-time consumers, and data migration can be started from the latest data position of the original data queue.
Then, step 54 may be executed to copy the data in the original data queue to the second server in the set order to form a migration data queue.
Because the migration starting position is the latest data position, all or most of the migrated data is migrated from the page cache at this time, the situation of real disk reading can be avoided or greatly reduced, and the io resource consumption of the disk storing the data is reduced, i.e., the io resource of the disk can be mainly used for normal services, so that the influence of copy migration on the normal services is reduced, and the migration data amount is small, so that the speed of completing the copy migration can be increased.
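The copying in step 54 can be pictured with a small simulation. This is only a sketch under stated assumptions: the original data queue is modeled as a Python list whose index is the offset, the producer appends at most one record per copy round, and the names `migrate`, `writes_during_migration`, and `batch_size` are hypothetical.

```python
def migrate(original, start_offset, writes_during_migration, batch_size=2):
    """Copy original[start_offset:] in offset order while the producer keeps appending.

    `writes_during_migration` yields the records the producer appends,
    one per copy round. Returns (migrated_records, base_offset).
    """
    migrated = []
    pending = iter(writes_during_migration)
    next_offset = start_offset
    while True:
        # The producer may write one more record during this round.
        record = next(pending, None)
        if record is not None:
            original.append(record)
        # Copy the next batch, preserving the write order.
        batch = original[next_offset:next_offset + batch_size]
        migrated.extend(batch)
        next_offset += len(batch)
        # Done once the copy has caught up and no new record arrived this round.
        if next_offset == len(original) and record is None:
            return migrated, start_offset


queue = list("abcdef")
copied, base = migrate(queue, 4, ["g", "h"])
print(copied, base)
```

With a start offset of 4, only the records from offset 4 onward (including those written during the migration) are copied; the records at offsets 0 to 3 play the role of the first data that is not migrated.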
Fig. 6 is a detailed flowchart of step 531 in fig. 5. As shown in fig. 6, the obtaining of the read position of the earliest active consumer of the original data queue in step 531 may specifically include:
Step 5314, obtaining the read position of each active consumer in all the data queues it reads, wherein all the data queues include the original data queue.
Step 5315, converting the mapping structure between each active consumer and its read positions in all the data queues it reads into a mapping structure between each data queue and the read positions of all the active consumers in that data queue, so as to obtain the read positions of all the active consumers of the original data queue.
Step 5316, compare the read positions of all active consumers in the raw data queue to obtain the read position of the oldest active consumer of the raw data queue.
Kafka only has an application program interface (api) for querying the read status of a specified consumer group in all subject data queues, that is, only the data queues read by each consumer in the consumer group can be obtained (as shown in FIG. 1). The scheme of the embodiment of the present application needs to know the read status of all downstream active consumers in each subject data queue, in order to determine from which offset (start position) to migrate and when the migration work is complete. To solve this problem, the embodiment of the present application adopts the following scheme:
First, information on the subject data queues read by all active consumers is obtained, including the data queues and the read positions. This forms a mapping (map) structure in which the key is an active consumer and the value is the subject data queues it reads together with its read positions in them, as shown in table 1 below. Then, this information is converted into a mapping structure in which the key is a subject data queue and the value is all the active consumers of that queue and their read positions, as shown in table 2 below, so that the api gains a service for viewing the read positions of all active consumers in a subject data queue. The api excludes the consumption of downstream inactive consumers. This is because many consumption activities are only temporary and exit after completion, so they need to be eliminated to ensure that the read positions of the remaining (active) consumers are always close to the new data.
Key | Value
Active consumer 1 | Data queue 1 position a, data queue 3 position b
Active consumer 2 | Data queue 1 position c, data queue 2 position d
Active consumer 3 | Data queue 2 position e, data queue 3 position f
TABLE 1
Key | Value
Data queue 1 | Active consumer 1 position a, active consumer 2 position c
Data queue 2 | Active consumer 2 position d, active consumer 3 position e
Data queue 3 | Active consumer 1 position b, active consumer 3 position f
TABLE 2
That is to say, in this implementation, all active consumers existing in the original data queue and the reading positions of these active consumers cannot be directly obtained, but the data queue read by each active consumer and the reading position in the data queue may be obtained, so that the mapping structure/correspondence between the reading positions of all active consumers in all data queues read by each active consumer and each active consumer may be transferred to the mapping structure/correspondence between the reading positions of all active consumers in each data queue and each data queue, that is, the active consumer existing in each data queue and the reading position thereof are obtained, and then the reading positions of all active consumers in the original data queue are compared, so that the reading position of the oldest active consumer in the original data queue may be obtained.
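The conversion from the structure of table 1 to the structure of table 2, followed by the comparison of step 5316, may be sketched as follows. The sketch assumes that read positions are integer offsets and that the smallest offset belongs to the earliest active consumer; the helper names and the numeric offsets standing in for positions a to f are made up for illustration.

```python
from collections import defaultdict


def invert_positions(by_consumer):
    """Turn {consumer: {queue: offset}} into {queue: {consumer: offset}}."""
    by_queue = defaultdict(dict)
    for consumer, positions in by_consumer.items():
        for queue, offset in positions.items():
            by_queue[queue][consumer] = offset
    return dict(by_queue)


def oldest_active_reader(by_queue, queue):
    """Return (consumer, offset) with the smallest read offset in `queue`."""
    readers = by_queue[queue]
    consumer = min(readers, key=readers.get)
    return consumer, readers[consumer]


# Table 1, with the symbolic positions a..f replaced by made-up offsets.
table1 = {
    "consumer1": {"queue1": 120, "queue3": 40},
    "consumer2": {"queue1": 95, "queue2": 300},
    "consumer3": {"queue2": 310, "queue3": 55},
}
table2 = invert_positions(table1)
print(table2["queue1"])
print(oldest_active_reader(table2, "queue1"))
```

Keeping the inverted map keyed by data queue means the earliest active consumer of any queue can be found with a single lookup and one `min` over that queue's readers.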
With continued reference to FIG. 6, when obtaining the read position of the oldest active consumer of the raw data queue, the following steps may also be performed before performing step 5314:
Step 5311, obtaining the read positions of all consumers in the data queues they read, wherein all consumers include inactive consumers and active consumers.
Step 5312, query the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, wherein the read status of the active consumer is working and the read status of the inactive consumer is suspended.
Step 5313, removing the information of the inactive consumers and the read positions of the inactive consumers in the read data queue.
That is, in this implementation, it is inconvenient to directly acquire the read positions of all active consumers in all the data queues they read, so the read positions of all consumers, including inactive consumers and active consumers, in the data queues they read may be acquired first. Then, the read status of all consumers is queried to classify each consumer as either an active consumer, whose read status is working, or an inactive consumer, whose read status is suspended. Finally, the information of the inactive consumers and their read positions in the data queues they read is removed, so that the read position of each active consumer in all the data queues it reads is obtained.
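Steps 5311 to 5313 amount to a filter over the consumer map. A minimal sketch follows, assuming the read status is available as a plain string per consumer; the status values `"working"` and `"suspended"` mirror the wording of step 5312, and the function name `active_positions` is hypothetical.

```python
def active_positions(all_positions, read_status):
    """Keep only the consumers whose read status is 'working'.

    all_positions: {consumer: {queue: offset}} for every consumer (step 5311).
    read_status:   {consumer: 'working' or 'suspended'} (step 5312).
    """
    return {
        consumer: positions
        for consumer, positions in all_positions.items()
        if read_status.get(consumer) == "working"  # step 5313 drops the rest
    }


positions = {"c1": {"q1": 10}, "c2": {"q1": 3}, "c3": {"q2": 7}}
status = {"c1": "working", "c2": "suspended", "c3": "working"}
print(active_positions(positions, status))
```

A consumer with no recorded status is treated as inactive here, which is a design choice of the sketch rather than something stated in this application.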
With continued reference to fig. 5, after determining that the starting migration position of the first migration policy is the latest data position of the original data queue in step 533 and migrating the original data queue according to the set sequence in step 54, the data migration method further includes:
Step 55, acquiring the read position of the earliest active consumer of the original data queue during the data migration process;
Step 56, judging whether the migration data queue has copied the data most recently written into the original data queue by the producer;
Step 57, if the judgment result is yes, judging whether the read position of the earliest active consumer of the original data queue during the data migration process has entered the range of second data in the original data queue, wherein the second data is the data already copied into the migration data queue;
Step 58, if the judgment result is yes, completing the data migration. After the data migration is completed, new data is no longer written into the original data queue. When the migration data queue serves as a leader copy, the producer starts to write data into the migration data queue; when the migration data queue serves as a follower copy, the migration data queue continues to copy the data in the new leader copy.
Since the migration start position is determined to be the latest data position of the original data queue, all active consumers of the original data queue are real-time consumers that move along with the latest data. Therefore, after the migration data queue has copied the data most recently written into the original data queue by the producer, it is only necessary to confirm that the read position of the earliest active consumer, which is farthest from the latest data position, is within the range of the migration data queue; this ensures that the read positions of all active consumers of the original data queue have entered the range of the migration data queue, and the data migration is completed.
In addition, it should be noted that the "earliest active consumer" at the time of determining whether the migration is completed at step 57 may be the same as or different from the "earliest active consumer" mentioned at the time of determining the migration start position of the first migration policy at step 532.
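The completion test of steps 55 to 58 can be written as a small predicate. The sketch below assumes integer offsets, with the migration data queue holding the half-open range from `start_offset` to `copied_up_to` and `latest_offset` being the offset the producer will write next; all names are hypothetical, and treating an empty set of active consumers as "nothing to wait for" is an assumption of the sketch.

```python
def first_scheme_complete(start_offset, copied_up_to, latest_offset,
                          active_read_offsets):
    """True when migration under the first scheme may be declared complete."""
    # Step 56: has the copy caught up with the producer's newest data?
    if copied_up_to < latest_offset:
        return False
    # Step 55: read position of the earliest (slowest) active consumer,
    # re-evaluated during migration. Assumption: with no active consumer
    # there is nothing left to wait for.
    oldest = min(active_read_offsets, default=start_offset)
    # Step 57: that consumer must already read inside the copied range.
    return oldest >= start_offset


print(first_scheme_complete(1000, 1040, 1040, [1005, 1030]))
print(first_scheme_complete(1000, 1040, 1040, [990, 1030]))
print(first_scheme_complete(1000, 1020, 1040, [1005, 1030]))
```

Because the minimum is recomputed on every call, the consumer that turns out to be the earliest during the completion check may differ from the one used when the start position was chosen, as noted above.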
FIG. 7 is a process diagram of the first scheme of the first migration policy. In the main usage scenario of a data channel, the consumption progress of a downstream consumption task generally follows the latest data to ensure that the acquired data is real-time, and the large amount of first data in the original copy, which has already been processed by downstream tasks, is not consumed again from the beginning unless there is a special requirement. Therefore, in the first scheme, the migration start position is the latest data position of the original data queue, i.e., the position where the producer is currently writing data into the original data queue, as shown in the upper diagram "pre-migration state" of fig. 7. When data is migrated, the first data before the latest data position (i.e., the data represented by the white rectangles in FIG. 7) is not copied, and only the data newly written into the original data queue (i.e., the data represented by the black rectangles in the original data queue in FIG. 7) is copied, which ensures that the migration copy can immediately follow the latest position of the original copy. In this case, copy migration is complete when the migration data queue has copied the latest data of the original data queue and the read positions of all downstream active consumers have entered the range of the migration data queue. As shown in the lower diagram "migration completion" of fig. 7, the migration data queue has copied the latest data of the original data queue, and the read positions of consumer 1 and consumer 2 have entered the range of the copied second data of the original data queue, which ensures that the migration copy can serve effectively after the original data queue (original copy) is taken out of service.
The migration process of the first scheme of the first migration policy is described above with reference to fig. 5 to 7, and the migration process of the second scheme of the first migration policy is described below. With continued reference to FIG. 5, at this point, the data migration method includes:
Step 534, if the comparison result of step 532 is no, determining that the migration start position of the first migration policy is the read position of the earliest active consumer in the original data queue, that is, the second scheme of the data migration method according to the embodiment of the present application.
That is, after the migration instruction is received and the first migration policy is selected, if the distance between the read position of the earliest active consumer of the original data queue and the latest data position is greater than the set distance, at least the earliest active consumer among all the active consumers is a non-real-time consumer of the original data queue, which stays at the first data of the original data queue for a longer time. If data were migrated according to the first scheme, the earliest active consumer might not enter the range of the migration data queue for a long time. Therefore, the migration start position of the first migration policy can be determined to be the read position of the earliest active consumer, which ensures that the earliest active consumer can work normally throughout the data migration process. Moreover, the first data between the earliest data position of the original data queue and the read position of the earliest active consumer does not need to be copied, which increases the speed of completing the copy migration and reduces the probability of a real disk read, thereby reducing the io resource consumption of the disk and the influence of copy migration on normal services.
Then, step 54 may be executed to copy the data in the original data queue to the second server in the set order to form a migration data queue. And, when it is determined that the start migration position of the first migration policy is the read position of the oldest active consumer, the data migration method may further include:
Step 55', judging whether the migration data queue has copied the data most recently written into the original data queue by the producer;
Step 56', if the judgment result is yes, completing the data migration.
That is, for some consumption tasks of a non-real-time data channel, the consumption progress may stall for a long time. If the first scheme were adopted, the consumer might not enter the range of the copied second data of the original data queue for a long time. Therefore, the second scheme, in which the migration start position is set to the read position of the earliest active consumer, is provided, so that once the migration data queue has copied the data most recently written into the original data queue by the producer during the data migration process, all active consumers are guaranteed to be within the range of the migration data queue, and the data migration can be completed.
Fig. 8 is a process diagram of a second approach to the first migration policy. In the second scenario, the start migration location is the read location of the oldest active consumer in the raw data queue, such as the read location of consumer 1 shown in the upper diagram "state before migration" of FIG. 8. The indication of the completion of the copy migration is that the migration copy follows the latest written data in the original copy, and as shown in the lower diagram "migration complete" of fig. 8, the migration data queue has copied the latest data at the rightmost side of the original data queue. As can be seen from fig. 8, the data actually migrated in the scheme only includes data that is not read by the earliest active consumer, such as consumer 1, in the original data (including data that is newly written during migration, that is, data represented by black rectangles in the original data queue), and compared with the existing migration scheme, on the premise of ensuring normal reading by consumer 1 and consumer 2, the amount of data to be copied can be greatly reduced.
FIG. 9 is a graph comparing the disk utilization of the optimized data migration method according to the embodiment of the present application with that of the original data migration method. As shown in fig. 9, in the first and second schemes optimized in the embodiment of the present application, only a small portion of the data is actually migrated. Because the data copied from the original copy has all been written recently, it has a high probability of still being in the page cache of the operating system. The page cache is a transparent cache for pages originating from an auxiliary storage device (such as a hard disk drive or a solid state drive), so reading the original copy generally does not trigger a real disk read operation, which not only improves read performance but also reduces the influence of the io of the disk storing the data on normal services. With the original scheme, however, migration starts from the earliest data position, and the early first data being copied rarely hits the page cache, so real disk reads occur and the disk's handling of normal services is affected.
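The effect of the start position on real disk reads can be illustrated with a deliberately simplified model. The model assumes that the page cache holds exactly the newest `cached_records` records of the queue, which a real operating system does not guarantee; the function name and the numbers used are made up for illustration and are not measurements from fig. 9.

```python
def read_split(start_offset, latest_offset, cached_records):
    """Split a copy of [start_offset, latest_offset) into
    (records_from_page_cache, records_from_disk).

    Simplifying assumption: the page cache holds exactly the newest
    `cached_records` records.
    """
    total = latest_offset - start_offset
    cache_floor = max(latest_offset - cached_records, 0)
    from_disk = max(cache_floor - start_offset, 0)
    return total - from_disk, from_disk


latest, cached = 10_000, 500
print("third scheme :", read_split(0, latest, cached))
print("second scheme:", read_split(9_000, latest, cached))
print("near-latest  :", read_split(9_800, latest, cached))
```

Under this model, moving the start position downstream shrinks only the disk portion of the copy, which is the behaviour the comparison above attributes to the first and second schemes.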
In addition, in the first and second schemes, after the data migration is completed, the migration copy can take over the work of the original copy, and the time for deleting the original data queue can be selected according to whether the un-migrated first data in the original data queue is still read and according to the disk load of the original node. Specifically, with reference to fig. 5, after the data migration is completed, the data migration method further includes:
Step 59, determining whether the un-migrated first data of the original data queue is still read.
Step 510, if the judgment result is yes, deleting the original data queue according to a set strategy. That is, the original data queue no longer provides a data writing service, and its data is gradually reduced according to the set strategy until it is completely deleted. Services that need to read the first data can therefore still do so.
Step 511, if the judgment result is no, deleting the original data queue. This frees up space immediately and reduces the cluster load.
That is, since the migration start position of the first migration policy is located downstream of the earliest data position of the original data queue, when the first migration policy is selected to determine the migration start position, data between the earliest data position in the original data queue and the migration start position is not copied into the migration data queue, and it is necessary to determine whether the first data that is not migrated in the original data queue is read by the consumer after the data migration is completed; if there is a situation that the first data of the original data queue that is not migrated is read after the migration is completed, that is, the number of times of reading the first data of the original data queue that is not migrated is not 0 but is less than or equal to a set number of times, for example, 1 or 2 times, the original data queue may be deleted according to a set policy, for example, the original data queue may be deleted after the consumer reads the first data that is not migrated, so as to ensure that normal work is not affected.
Two schemes of the first migration policy of the data migration method according to the embodiment of the present application are described above with reference to fig. 5 to 9, and a scheme of the second migration policy of the data migration method according to the embodiment of the present application is described below. Specifically, with continued reference to fig. 5, the data migration method further includes:
Step 53', if the determination result in step 52 is negative, that is, the total number of times that the consumer reads the first data of the original data queue is greater than the set number of times, selecting a second migration policy to determine a migration start position, where the migration start position of the second migration policy is the earliest data position of the original data queue, that is, the third scheme of the data migration method according to the embodiment of the present application.
That is, after the first scheme and the second scheme of the first migration policy are selected to complete data migration, the original data queue may be deleted according to the set policy to ensure that the consumer can read the first data that is not migrated in the original data queue, but the consumption of switching the consumer from the migrated data queue in the second server to the original data queue in the first server is large, so when the consumer reads the first data of the original data queue many times, in order to reduce the consumption, the second migration policy may be selected to start migration from the earliest data position of the original data queue to ensure that the read service of the consumer can be performed normally. At this time, the data migration method may further include:
Step 55", judging whether the migration data queue has copied the data most recently written into the original data queue by the producer;
Step 56", if the judgment result is yes, completing the data migration.
That is, when the second migration policy is selected to start migration from the earliest data position of the original data queue, and the migration data queue copies data newly written into the original data queue by the producer, it can be guaranteed that the migration data queue copies all data of the original data queue, and data migration is completed.
Step 57", deleting the original data queue after the data migration is completed.
Because the migration start position of the second migration policy is the earliest data position of the original data queue, once the migration data queue has copied the data most recently written into the original data queue by the producer and the data migration is completed, the migration data queue contains all the data of the original data queue. The original data queue can therefore be deleted immediately after the data migration is completed, so as to release the storage space.
According to the data migration method provided by the embodiment of the application, on the premise that the upstream and downstream data can be effectively transmitted and the problem of data loss is avoided, different migration schemes can be selected according to different consumption conditions of a consumer, so that the speed of completing copy migration is improved, the io resource consumption of a disk for storing data is reduced, and the influence of copy migration on normal services is further reduced.
In summary, the technical solution claimed in the embodiment of the present application is introduced by taking the most widely used message channel product kafka as an example, and the key technical points are as follows:
1. Determining, according to how the consumer reads the data in the original data queue, the position in the original leader copy from which migration starts, that is, selecting the migration start position. Specifically, the migration start position may be one of the latest data position of the original data queue (first scheme), the read position of the earliest active consumer in the original data queue (second scheme), and the earliest data position of the original data queue (third scheme).
2. The read position of the earliest active consumer in the original data queue is obtained, see steps 5314-5316 or steps 5311-5316.
3. Criteria for completion of data migration
For the first scheme, firstly, it needs to determine whether the migration data queue is copied to the data that is newly written into the original data queue (i.e. the leader copy) by the producer, and when the determination result is yes, then, it needs to invoke the application program interface api described above to confirm that the read position of the earliest active consumer in the migration process enters the range of the second data of the original data queue that has been copied by the migration data queue, i.e. data migration is completed.
For the second scheme and the third scheme, the data migration is completed when the migration data queue is copied to the data which is newly written into the original data queue by the producer in the migration process.
4. Whether to delete the original data queue
For the two schemes of the first migration policy, it is determined whether the un-migrated first data in the original data queue is still read. If so, the original data queue is deleted according to a set strategy, that is, after the migration is finished the original data queue continues to use the deletion strategy of Kafka, such as keeping only the data of the most recent fixed period of time or data files up to a fixed size; if not, the original data queue may be deleted immediately after the data migration is completed.
For the third scheme, since the migration data queue has copied all the data of the original data queue, the original data queue can be immediately deleted after the data migration is completed, so as to release the disk space as soon as possible.
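The deletion decision in key point 4 can be summarized in a short sketch. The function names and return labels are hypothetical; `apply_time_retention` only imitates the idea of "keeping the data of the most recent fixed period of time" on an in-memory list and is not Kafka's actual retention mechanism.

```python
def cleanup_action(scheme, unmigrated_data_is_read):
    """Decide how to dispose of the original data queue after migration."""
    if scheme == 3 or not unmigrated_data_is_read:
        # Everything needed is in the migration data queue: free space now.
        return "delete_now"
    # Schemes 1 and 2 with readers of the un-migrated first data:
    # let the set (retention-style) strategy shrink the queue gradually.
    return "delete_by_policy"


def apply_time_retention(records, now, retention_seconds):
    """Keep only (timestamp, payload) records newer than the retention cutoff."""
    cutoff = now - retention_seconds
    return [record for record in records if record[0] >= cutoff]


print(cleanup_action(2, True))
print(apply_time_retention([(10, "a"), (50, "b"), (90, "c")],
                           now=100, retention_seconds=60))
```

Since the original data queue no longer receives writes after migration, repeatedly applying such a retention rule eventually empties it, which matches the gradual deletion described for the first migration policy.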
Fig. 10 is a flowchart of another data migration method provided in an embodiment of the present application. The data migration method is used for migrating data in a data queue, wherein a producer is used for writing data into the data queue in sequence, and at least one consumer is used for reading data from the data queue in sequence, as shown in fig. 10, the data migration method includes:
Step 1001, in response to a migration instruction for migrating data of an original data queue in a first server, acquiring the read position of the earliest active consumer of the original data queue and the latest data position at the current time, wherein the current time is the time when the migration instruction is received, and the latest data position is the position where the producer writes data into the original data queue at the current time;
Step 1002, comparing whether the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance;
Step 1003, if the comparison result is yes, copying the data in the original data queue to a second server, in order from the latest data position to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue, that is, the migration start position is the latest data position of the original data queue; and/or
Step 1004, if the comparison result is no, copying the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue, that is, the migration start position is the read position of the earliest active consumer of the original data queue.
In the above scheme, at the current time when the migration instruction is received, if the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, the active consumers are real-time consumers closely following the latest data, and data migration can be started from the latest data position of the original data queue. If the distance between the read position of the earliest active consumer in the original data queue and the latest data position is greater than the set distance, the earliest active consumer stays at the first data of the original data queue for a longer time and is a non-real-time consumer of the original data queue, and data can be migrated starting from the read position of the earliest active consumer in the original data queue. In either case, the data of the original data queue in the first server is copied to the second server, in order from the migration start position to the position of the data most recently written into the original data queue by the producer during the migration process, to form the migration data queue. Because data written into the original data queue by the producer is first stored in the page cache and only stored from the page cache onto the disk after a certain time, the data migration method of the embodiment of the application, compared with a scheme that migrates data from the earliest data position, increases the proportion of data copied from the page cache and reduces the amount of first data copied from the disk. This lowers the probability of a real disk read and the influence of copy migration on normal services, and since not all the data in the original data queue needs to be migrated, the speed of completing the copy migration can be increased.
In addition, when migrating data from the latest data position of the original data queue, the data migration method further includes:
Step 1005, acquiring the read position of the earliest active consumer of the original data queue during the data migration process;
Step 1006, judging whether the migration data queue has copied the data most recently written into the original data queue by the producer;
Step 1007, if the judgment result is yes, judging whether the read position of the earliest active consumer of the original data queue during the data migration process has entered the range of second data of the original data queue, wherein the second data is the data already copied into the migration data queue;
Step 1008, if the judgment result is yes, completing the data migration.
Since the migration start position is the latest data position of the original data queue, all active consumers in the original data queue are real-time consumers, when the migration data queue is copied to the data written into the original data queue by the producer in the migration process, as long as the read position of the earliest active consumer farthest from the latest data position in the migration process enters the range of the second data copied by the migration data queue in the original data queue, the read positions of all active consumers in the original data queue can be ensured to enter the range of the migration data queue, and the data migration can be completed, and the specific migration process can be seen in fig. 7.
When migrating data starting at a read position of an oldest active consumer in the raw data queue, the data migration method further comprises:
Step 1005', judging whether the migration data queue has copied the data most recently written into the original data queue by the producer;
Step 1006', if the judgment result is yes, completing the data migration.
Since the migration start position is the read position of the oldest active consumer, when the migration data queue is copied to the data newly written into the original data queue by the producer in the data migration process, all the active consumers are necessarily located within the range of the migration data queue, and the data migration can be completed, and the specific migration process can be referred to in fig. 8.
It should be noted that the process of obtaining the read position of the earliest active consumer in the original data queue in step 1001 and step 1005 can be performed according to steps 5314-5316 or steps 5311-5316 described above. In addition, as shown in fig. 10, after the data migration is completed at step 1008 or step 1006', the data migration method further includes:
Step 1009: determining whether any consumer reads first data of the original data queue that has not been migrated, where the first data is the data already written in the original data queue when the migration instruction is received.
Step 1010: if the judgment result is yes, deleting the original data queue according to a set policy.
Step 1011: if the judgment result is no, deleting the original data queue immediately.
That is, after the data migration is completed, it is necessary to determine whether the first data that is not migrated in the original data queue is read by the consumer, and if the first data that is not migrated in the original data queue is not read, the original data queue can be immediately deleted; if there is a read situation, the original data queue may be deleted according to the original set policy, for example, the original data queue may be deleted after the consumer reads the first data that has not been migrated, so as to ensure that normal operation is not affected.
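The decision in steps 1009 to 1011 reduces to a single branch. The sketch below is illustrative only; the `Deletion` enum and the `choose_deletion` function are hypothetical names, and how the system detects that un-migrated first data is still being read is left outside the sketch.

```python
from enum import Enum


class Deletion(Enum):
    BY_SET_POLICY = "delete according to the set policy"
    IMMEDIATE = "delete immediately"


def choose_deletion(unmigrated_first_data_is_read: bool) -> Deletion:
    """Pick how to remove the original data queue after migration completes."""
    if unmigrated_first_data_is_read:
        # Some consumer still reads first data that was not copied:
        # keep the original queue until the set policy removes it.
        return Deletion.BY_SET_POLICY
    # Nobody needs the un-migrated data: free the disk space right away.
    return Deletion.IMMEDIATE
```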
According to the data migration method, when replica data is copied, only a limited amount of data needs to be copied according to the user's decision, which improves migration efficiency while still meeting data reliability requirements and reduces the impact of migration. In addition, after the data migration is completed, two schemes for deleting the original data are provided: 1. deleting the original data queue according to the original set policy, which allows the service to play back the data; 2. deleting the original data queue immediately, which frees up disk capacity.
Fig. 11 is a schematic structural diagram of a data migration apparatus according to an embodiment of the present application. The data migration apparatus 1100 is used for migrating data in a data queue, wherein a producer is used for writing data into the data queue in sequence, and at least one consumer is used for reading data from the data queue in sequence. As shown in fig. 11, the data migration apparatus 1100 includes a prediction unit 1101, a selection unit 1102, and a migration unit 1103. The prediction unit 1101 is configured to predict, in response to a migration instruction for migrating data in an original data queue in a first server, a total number of times that a consumer reads first data in the original data queue, where the first data is data written in the original data queue when the migration instruction is received. The selecting unit 1102 is configured to select a first migration policy to determine a migration start position when the total number of times is less than or equal to a set number of times, where the migration start position of the first migration policy is located downstream of an earliest data position of the original data queue. The migration unit 1103 is configured to copy the data in the original data queue to the second server according to a set order to form a migration data queue, where the set order is an order from a migration start position of the selected migration policy to a position of data that is newly written into the original data queue by the producer during the migration process.
Fig. 12 is a schematic diagram of the structure of the selection unit in fig. 11. As shown in fig. 12, the selection unit 1102 includes an obtaining module 21 and a determining module 22. The obtaining module 21 is configured to obtain the read position of the earliest active consumer of the original data queue and the latest data position at a current time, where the "current time" is the time when the total number of times is determined to be less than the set number of times, and the latest data position is the position of the data written into the original data queue by the producer at the current time. The determining module 22 is configured to determine the start migration position of the first migration policy to be the latest data position of the original data queue when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance.
When the determining module 22 determines that the start migration position of the first migration policy is the latest data position of the original data queue, the obtaining module 21 is further configured to obtain the read position of the earliest active consumer of the original data queue during data migration. As shown in fig. 11, the data migration apparatus 1100 further includes a first determining unit 1104, configured to, when the migration data queue has been copied up to the data most recently written into the original data queue by the producer, determine that the read position of the earliest active consumer of the original data queue has entered the range of second data of the original data queue during data migration, and complete the data migration, where the second data is the data that has already been copied to the migration data queue.
Further, the determining module 22 may be further configured to determine that the starting migration position of the first migration policy is the read position of the earliest active consumer when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance. When the determining module 22 determines that the starting migration position of the first migration policy is the read position of the earliest active consumer, as shown in fig. 11, the data migration apparatus 1100 may further include a second determining unit 1105 configured to determine that the migrated data queue is copied to the data that is newly written into the original data queue by the producer, and complete the data migration.
Fig. 13 is a schematic structural diagram of the obtaining module in fig. 12. As shown in fig. 13, the obtaining module 21 may include a first obtaining submodule 211, a dump submodule 212, and a comparison obtaining submodule 213. The first obtaining submodule 211 is configured to obtain the read position of each active consumer in every data queue that the consumer reads, where these data queues include the original data queue. The dump submodule 212 is configured to convert the mapping structure from each active consumer to its read positions in all the data queues it reads into a mapping structure from each data queue to the read positions of all active consumers in that data queue, so as to obtain the read positions of all active consumers of the original data queue. The comparison obtaining submodule 213 is configured to compare the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
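The dump and comparison steps amount to inverting a nested mapping and taking a minimum. The following sketch assumes read positions are integer offsets held in plain dictionaries; `invert_positions` and `earliest_active_read` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def invert_positions(by_consumer):
    """Turn {consumer: {queue: offset}} into {queue: {consumer: offset}}."""
    by_queue = defaultdict(dict)
    for consumer, queues in by_consumer.items():
        for queue, offset in queues.items():
            by_queue[queue][consumer] = offset
    return dict(by_queue)


def earliest_active_read(by_consumer, queue):
    """Smallest read offset among the active consumers of `queue`, or None."""
    positions = invert_positions(by_consumer).get(queue, {})
    return min(positions.values()) if positions else None
```

For instance, with `{"c1": {"q1": 40, "q2": 7}, "c2": {"q1": 25}}` the inverted structure groups both consumers under `"q1"`, and the earliest read position for `"q1"` is the smaller of the two offsets.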
Further, the obtaining module 21 may further include a second obtaining sub-module 214, a querying sub-module 215, and a removing sub-module 216, where the second obtaining sub-module 214 is configured to obtain reading positions of all consumers in the read data queue, where all consumers include inactive consumers and active consumers. The query submodule 215 is used for querying the read status of all consumers to classify each consumer as an active consumer or an inactive consumer, wherein the read status of the active consumer is working and the read status of the inactive consumer is suspended. The removal submodule 216 is configured to remove the information of the inactive consumers and the read positions of the inactive consumers in the read data queue, so as to obtain the read position of the oldest active consumer in the original data queue.
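The query and removal steps can be illustrated with a simple filter. This sketch assumes the read status is available as the strings "working" and "suspended", mirroring the wording above; the function name `keep_active` and the dictionary layout are hypothetical.

```python
def keep_active(positions, read_status):
    """Drop consumers whose read status is not 'working'.

    positions   : {consumer: {queue: offset}} for all consumers
    read_status : {consumer: "working" | "suspended"}
    """
    return {
        consumer: queues
        for consumer, queues in positions.items()
        if read_status.get(consumer) == "working"
    }
```

Consumers with no recorded status are treated as inactive here, which is a design choice of the sketch rather than something the text specifies.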
With continued reference to fig. 11, when the selection unit 1102 selects the first migration policy to determine the migration start position, the data migration apparatus 1100 further includes a first deletion unit 1106, configured to, after the data migration is completed, determine that a consumer reads first data of the original data queue that has not been migrated, and delete the original data queue according to the set policy; or determine that no consumer reads first data of the original data queue that has not been migrated, and delete the original data queue.
In addition, the selecting unit 1102 is further configured to select a second migration policy to determine a migration start position when the total number of times is greater than the set number of times, where the migration start position of the second migration policy is the earliest data position of the original data queue. When the selection unit 1102 selects the second migration policy to determine the migration start position, as shown in fig. 11, the data migration apparatus 1100 may further include a third determination unit 1107 configured to determine that the migration data queue is copied to the data that is newly written in the original data queue by the producer, and complete the data migration. Further, the data migration apparatus 1100 may further include a second deleting unit 1108 configured to delete the original data queue after the data migration is completed.
According to the data migration device in the embodiment of the application, a migration scheme can be selected for different consumption situations of the consumer, on the premise that upstream and downstream data are transmitted effectively and no data is lost. Specifically, when the total number of times that the consumer reads the first data of the original data queue is small, that is, less than or equal to the set number of times, the first migration policy can be selected to determine the migration start position, which is located downstream of the earliest data position of the original data queue. When the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to the set distance, the start migration position of the first migration policy is determined to be the latest data position of the original data queue (the first scheme). When that distance is greater than the set distance, the start migration position of the first migration policy is determined to be the read position of the earliest active consumer in the original data queue (the second scheme). In the first and second schemes, not all data of the original data queue needs to be copied; that is, the amount of first data copied is reduced, so that the copy migration completes faster, the I/O resource consumption of the disk storing the data is reduced, and the influence of the copy migration on normal services is reduced.
In addition, after the data migration succeeds, two ways of eliminating the original data are provided: deleting the original data queue according to a set policy, and deleting it immediately. Deleting the original data queue according to the set policy allows the data to be played back, that is, a consumer can read the first data again; deleting the original data immediately releases disk capacity as soon as possible. The service can therefore choose between data playability and disk capacity. When the total number of times that the consumer reads the first data of the original data queue after the data migration is completed is large, that is, greater than the set number of times, the second migration policy can be selected to determine the migration start position, which is the earliest data position of the original data queue (the third scheme).
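The choice among the three schemes can be summarized in a short sketch. It assumes positions are integer offsets and that "distance" is the difference between the latest data position and the earliest active consumer's read position; `Plan`, `choose_plan`, and all parameter names are hypothetical.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    scheme: int        # 1, 2, or 3, matching the schemes described above
    start_offset: int  # migration start position in the original queue


def choose_plan(predicted_reads, set_times, earliest_data,
                earliest_active_read, latest_data, set_distance):
    """Select the migration start position from the predicted read count and distance."""
    if predicted_reads > set_times:
        # Second migration policy: copy everything from the earliest data.
        return Plan(3, earliest_data)
    if latest_data - earliest_active_read <= set_distance:
        # First policy, consumers are near real time: start at the latest data.
        return Plan(1, latest_data)
    # First policy, a consumer lags behind: start at its read position.
    return Plan(2, earliest_active_read)
```

Calling `choose_plan(1, 2, 0, 50, 100, 20)` selects the second scheme with a start offset of 50, since the consumer lags the latest data by more than the set distance.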
Fig. 14 is a schematic structural diagram of another data migration apparatus according to an embodiment of the present application. The data migration apparatus is used for migrating data in a data queue, wherein a producer is used for writing data into the data queue in sequence, and at least one consumer is used for reading data from the data queue in sequence. As shown in fig. 14, the data migration apparatus 1400 includes an obtaining module 1401 and a migration module 1402. The obtaining module 1401 is configured to, in response to a migration instruction for migrating an original data queue in a first server, obtain the read position of the earliest active consumer in the original data queue and the latest data position at a current time, where the "current time" is the time when the migration instruction is received, and the latest data position is the position of the data written into the original data queue by the producer at the current time. The migration module 1402 is configured to, when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is less than or equal to a set distance, copy the data in the original data queue to a second server in order from the latest data position to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue, that is, the migration start position is the latest data position; and/or, when the distance between the read position of the earliest active consumer in the original data queue and the latest data position is greater than the set distance, copy the data in the original data queue to the second server in order from the read position of the earliest active consumer to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue, that is, the migration start position is the read position of the earliest active consumer at the current time. The specific structure of the obtaining module 1401 may be the same as that of the obtaining module 21 mentioned above.
When migrating data from the latest data position of the original data queue, the obtaining module 1401 is further configured to obtain the read position of the earliest active consumer of the original data queue during data migration. The data migration apparatus 1400 further includes a first determining module 1403, configured to, when the migration data queue has been copied up to the data most recently written into the original data queue by the producer, determine that the read position of the earliest active consumer of the original data queue has entered the range of second data of the original data queue during data migration, and complete the data migration, where the second data is the data that has already been copied to the migration data queue.
When migrating data from the read position of the earliest active consumer in the original data queue, the data migration apparatus 1400 further includes a second determining module 1404, configured to determine that the migration data queue has been copied up to the data most recently written into the original data queue by the producer, and complete the data migration.
Further, the data migration apparatus 1400 may further include a deleting module 1405, configured to, after the data migration is completed, determine that a consumer reads first data of the original data queue that has not been migrated, and delete the original data queue according to a set policy, where the first data is the data already written in the original data queue when the migration instruction is received; or determine that no consumer reads first data of the original data queue that has not been migrated, and delete the original data queue immediately.
According to the scheme of the embodiment of the application, the migration start position and the migration end condition are determined based on the downstream data reading state, which ensures that no data consumed by downstream tasks is lost and allows the data copying work of the message channel to be completed quickly. For example, for a typical 200G copy, the original migration scheme takes 2-3 hours, whereas with the scheme of the embodiment of the application the migration can be completed within ten minutes, which greatly improves migration efficiency. Meanwhile, a large number of disk reads and writes are avoided during data copying, which reduces the impact on normal services. In addition, two schemes for deleting the original copy are provided according to business requirements: deleting the original copy according to a set policy allows the data to be replayed and consumed, and deleting the original copy immediately optimizes space usage.
In addition, since information related to producers, consumers, data queues, and the like is shared among a plurality of servers in a system such as a Kafka cluster, the above-described data migration method may be performed by a first server and also by a second server. Fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 15, the server 1500 includes a transceiver 1501, a memory 1502, and a processor 1503. The transceiver 1501 is used for receiving and transmitting data. The memory 1502 stores computer programs. The processor 1503 is configured to execute the computer program stored in the memory 1502 to enable the server 1500 to implement the data migration method according to the embodiment of the present application. The server is the first server or the second server mentioned in the data migration method.
Fig. 16 is a schematic structural diagram of a network system according to an embodiment of the present application. As shown in fig. 16, the network system includes at least two servers, i.e., server a and server B, the original data queue is located in server a, and the migrated data queue is located in server B. The server a or the server B can execute the data migration method according to the embodiment of the present application.
It will be appreciated that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in a processor. The processor may be a CPU, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, such as a discrete gate, a transistor logic device, or a discrete hardware component.
It is understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor, but may be any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in Random Access Memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application.

Claims (22)

1. A data migration method, characterized in that the data migration method comprises:
predicting the total times of a consumer reading first data of an original data queue in response to a migration instruction for migrating the data of the original data queue in a first server, wherein the first data is data written in the original data queue when the migration instruction is received;
when the total times is less than or equal to the set times, selecting a first migration strategy to determine a migration starting position, wherein the migration starting position of the first migration strategy is located downstream of the earliest data position of the original data queue;
and copying the data in the original data queue to a second server according to a set sequence to form a migration data queue, wherein the set sequence is the sequence from the migration starting position of the selected migration strategy to the position of the data which is newly written into the original data queue by the producer in the migration process.
2. The data migration method according to claim 1, wherein the selecting the first migration policy to determine the migration start position specifically includes:
acquiring a reading position and a latest data position of an earliest active consumer of the original data queue at the current moment, wherein the current moment is the moment when the total times are determined to be less than the set times, and the latest data position is a position of data written into the original data queue by the producer at the current moment;
determining a starting migration position of the first migration policy to be the latest data position when a distance between a read position of the oldest active consumer and the latest data position is less than or equal to a set distance.
3. The data migration method according to claim 2, wherein when the start migration position is set to the latest data position, the data migration method further comprises:
acquiring the read position of the earliest active consumer of the original data queue in the data migration process;
when the migration data queue is copied to the data which is written into the original data queue by the producer latest, determining that the reading position of the earliest active consumer of the original data queue enters the range of second data of the original data queue in the data migration process, and completing data migration, wherein the second data is the data copied by the migration data queue.
4. The data migration method according to claim 2, further comprising:
when the distance between the reading position of the earliest active consumer and the latest data position is larger than a set distance, determining that the starting migration position of the first migration strategy is the reading position of the earliest active consumer.
5. The data migration method of claim 4, wherein when the start migration location is a read location of the earliest active consumer, the data migration method further comprises:
and determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
6. The data migration method according to any one of claims 2 to 5, wherein the obtaining of the read position of the oldest active consumer of the primary data queue specifically includes:
acquiring a reading position of each active consumer in all data queues for reading, wherein all the data queues comprise the original data queue;
dumping the mapping structure from each active consumer to the read positions of that active consumer in all the data queues it reads into a mapping structure from each data queue to the read positions of all active consumers in that data queue, thereby obtaining the read positions of all the active consumers of the original data queue;
comparing the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
7. The data migration method of claim 6, wherein the obtaining of the read position of the earliest active consumer of the original data queue further comprises, before the acquiring of the read position of each active consumer in all data queues for reading:
obtaining read positions of all consumers in the read data queue, wherein all consumers comprise inactive consumers and active consumers;
querying the read status of all consumers to classify each of the all consumers as the active consumer or the inactive consumer, wherein the read status of the active consumer is working and the read status of the inactive consumer is suspended;
removing the inactive consumer and information of the read position of the inactive consumer in the read data queue.
8. The data migration method according to any one of claims 1 to 5, wherein after completion of data migration, the data migration method further comprises:
determining that the consumer reads the first data of the original data queue which is not migrated, and deleting the original data queue according to a set strategy; or
determining that the consumer does not read the first data of the original data queue which is not migrated, and deleting the original data queue.
9. The data migration method according to claim 1, wherein the data migration method comprises:
and when the total times is greater than the set times, selecting a second migration strategy to determine a migration starting position, wherein the migration starting position of the second migration strategy is the earliest data position of the original data queue.
10. The data migration method according to claim 9, further comprising:
and determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
11. The data migration method according to claim 9 or 10, wherein the data migration method further comprises:
and deleting the original data queue after the data migration is completed.
12. A data migration method, characterized in that the data migration method comprises:
responding to a migration instruction for migrating data of an original data queue in a first server, and acquiring a reading position and a latest data position of an earliest active consumer of the original data queue at the current time, wherein the current time is the time when the migration instruction is received, and the latest data position is a position where a producer writes data of the original data queue at the current time;
when the distance between the reading position of the earliest active consumer and the latest data position is smaller than or equal to a set distance, copying the data in the original data queue to a second server according to the sequence from the latest data position to the position of the data which is newly written into the original data queue by the producer in the migration process to form a migration data queue; and/or
when the distance between the reading position of the earliest active consumer and the latest data position is larger than a set distance, copying the data in the original data queue to a second server according to the sequence from the reading position of the earliest active consumer to the position of the data which is written into the original data queue by the producer in the migration process to form a migration data queue.
13. The data migration method according to claim 12, wherein when migrating data from the latest data location, the data migration method comprises:
acquiring the read position of the earliest active consumer of the original data queue in the data migration process;
and when the migration data queue is copied to the data which is written into the original data queue by the producer latest, determining that the read position of the earliest active consumer of the original data queue enters a second data range of the original data queue in the data migration process, and completing data migration, wherein the second data is the data copied by the migration data queue.
14. The data migration method of claim 12, wherein when migrating data starting from the read location of the earliest active consumer, the data migration method further comprises:
and determining that the data which is newly written into the original data queue by the producer is copied to the migration data queue, and finishing data migration.
15. The data migration method according to any one of claims 12 to 14, wherein the obtaining a read position of an earliest active consumer in the primary data queue specifically includes:
acquiring a reading position of each active consumer in all data queues for reading, wherein all the data queues comprise the original data queue;
dumping the mapping structure from each active consumer to the read positions of that active consumer in all the data queues it reads into a mapping structure from each data queue to the read positions of all active consumers in that data queue, thereby obtaining the read positions of all the active consumers of the original data queue;
comparing the read positions of all active consumers in the original data queue to obtain the read position of the earliest active consumer of the original data queue.
16. The data migration method according to claim 15, wherein the obtaining of the read position of the earliest active consumer of the original data queue further comprises, before the acquiring of the read position of each active consumer in all data queues for reading:
acquiring a read position of all consumers in the read data queue, wherein all consumers comprise inactive consumers and active consumers;
querying the read status of all consumers to classify each of the all consumers as the active consumer or the inactive consumer, wherein the read status of the active consumer is working and the read status of the inactive consumer is suspended;
removing the inactive consumer and information of the read position of the inactive consumer in the read data queue.
17. The data migration method according to any one of claims 12 to 14, wherein after completion of data migration, the data migration method further comprises:
determining that the consumer reads first data which is not migrated in the original data queue, and deleting the original data queue according to a set strategy, wherein the first data is data written in the original data queue when the migration instruction is received; or
determining that the consumer does not read the first data of the original data queue that has not been migrated, and deleting the original data queue.
18. A data migration apparatus, characterized in that the data migration apparatus comprises:
a prediction unit, configured to respond to a migration instruction for migrating data of an original data queue in a first server and to predict the total number of times that consumers read first data of the original data queue, wherein the first data is the data already written in the original data queue when the migration instruction is received;
a selecting unit, configured to select a first migration policy to determine a migration start position when the total number of times is less than or equal to a set number of times, where the migration start position of the first migration policy is located downstream of an earliest data position of the original data queue;
a migration unit, configured to copy the data in the original data queue to a second server in a set order to form a migration data queue, wherein the set order runs from the migration start position of the selected migration policy to the position of the data most recently written into the original data queue by the producer during the migration process.
19. A data migration apparatus, characterized in that the data migration apparatus comprises:
an obtaining module, configured to respond to a migration instruction for migrating data of an original data queue in a first server and to obtain, at a current time, the read position of the earliest active consumer in the original data queue and a latest data position, wherein the current time is the time when the migration instruction is received, and the latest data position is the position at which a producer writes data in the original data queue at the current time;
a migration module, configured to: when the distance between the read position of the earliest active consumer and the latest data position is less than or equal to a set distance, copy the data in the original data queue to a second server, in order from the latest data position to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue; and/or, when the distance between the read position of the earliest active consumer and the latest data position is greater than the set distance, copy the data in the original data queue to the second server, in order from the read position of the earliest active consumer to the position of the data most recently written into the original data queue by the producer during the migration process, to form a migration data queue.
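The start-position decision and ordered copy recited in claim 19 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, positions are assumed to be zero-based list indices, the distance is assumed to be the difference between the two indices, and the copy is assumed to include the item at the start position. The original data queue is modeled as a plain Python list standing in for the queue on the first server.

```python
def choose_migration_start(earliest_read, latest, set_distance):
    """Pick the migration start position from the two cases in the claim.

    If the earliest active consumer is within `set_distance` of the latest
    data position, start at the latest data position; otherwise start at the
    earliest active consumer's read position.
    """
    if latest - earliest_read <= set_distance:
        return latest
    return earliest_read


def copy_from(original, start):
    """Yield items from `start` onward, in order.

    The length is re-checked on every step, so items that a producer appends
    while the copy is in progress are also yielded.
    """
    index = start
    while index < len(original):
        yield original[index]
        index += 1


# Hypothetical example: the producer appends one item during the migration.
original_queue = ["m0", "m1", "m2"]
start = choose_migration_start(earliest_read=1, latest=2, set_distance=0)
migration = copy_from(original_queue, start)
first_copied = next(migration)
original_queue.append("m3")  # written by the producer mid-migration
migrated_queue = [first_copied] + list(migration)
```

Here the distance (1) exceeds the set distance (0), so the copy starts at the earliest active consumer's read position and still picks up the item written during the migration.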
20. A server, comprising:
a transceiver for receiving and transmitting data;
a memory storing a computer program;
a processor for executing the computer program stored in the memory to cause the server to implement the data migration method of any one of claims 1-17, wherein the server is the first server or the second server.
21. A network system comprising a first server and a second server, wherein the first server or the second server is capable of performing the data migration method of any one of claims 1-17.
22. A computer storage medium storing a computer program that, when executed by a processor, carries out the data migration method according to any one of claims 1 to 17.
CN202011219098.1A 2020-11-04 2020-11-04 Data migration method and device, server and network system Pending CN114442907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011219098.1A CN114442907A (en) 2020-11-04 2020-11-04 Data migration method and device, server and network system

Publications (1)

Publication Number Publication Date
CN114442907A (en) 2022-05-06

Family

ID=81361060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011219098.1A Pending CN114442907A (en) 2020-11-04 2020-11-04 Data migration method and device, server and network system

Country Status (1)

Country Link
CN (1) CN114442907A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060059228A1 (en) * 2004-08-12 2006-03-16 Oracle International Corporation Capturing and re-creating the state of a queue when migrating a session
CN103227747A (en) * 2012-03-14 2013-07-31 微软公司 High density hosting for messaging service
CN103425529A (en) * 2012-05-17 2013-12-04 国际商业机器公司 System and method for migrating virtual machines between networked computing environments based on resource utilization
CN109144972A (en) * 2017-06-26 2019-01-04 华为技术有限公司 A kind of method and back end of Data Migration
CN109271098A (en) * 2018-07-18 2019-01-25 成都华为技术有限公司 A kind of data migration method and device
CN109842636A (en) * 2017-11-24 2019-06-04 阿里巴巴集团控股有限公司 Cloud service moving method, device and electronic equipment

Similar Documents

Publication Publication Date Title
US10599637B2 (en) Granular buffering of metadata changes for journaling file systems
US9069484B2 (en) Buffer pool extension for database server
US8307170B2 (en) Information processing method and system
US20090307329A1 (en) Adaptive file placement in a distributed file system
CN111309732B (en) Data processing method, device, medium and computing equipment
US9792231B1 (en) Computer system for managing I/O metric information by identifying one or more outliers and comparing set of aggregated I/O metrics
EP3252609A1 (en) Cache data determination method and device
CN110555001B (en) Data processing method, device, terminal and medium
US8751446B2 (en) Transference control method, transference control apparatus and recording medium of transference control program
CN110688382A (en) Data storage query method and device, computer equipment and storage medium
CN113377868A (en) Offline storage system based on distributed KV database
CN113031864B (en) Data processing method and device, electronic equipment and storage medium
CN110413689B (en) Multi-node data synchronization method and device for memory database
CN112711564B (en) Merging processing method and related equipment
CN113672169A (en) Data reading and writing method of stream processing system and stream processing system
US20210263668A1 (en) Information processing device and computer-readable recording medium recording storage control program
CN111459402B (en) Magnetic disk controllable buffer writing method, controller, hybrid IO scheduling method and scheduler
JP4189342B2 (en) Storage apparatus, storage controller, and write-back cache control method
CN114442907A (en) Data migration method and device, server and network system
CN113835613B (en) File reading method and device, electronic equipment and storage medium
CN111352590B (en) File storage method and device
JP7073737B2 (en) Communication log recording device, communication log recording method, and communication log recording program
CN117290075B (en) Process migration method, system, device, communication equipment and storage medium
WO2021063242A1 (en) Metadata transmission method of storage system, and storage system
CN117370227A (en) Memory page determining method and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination