CN108932104B - Data processing method and device and processing server - Google Patents

Publication number
CN108932104B
Authority
CN
China
Prior art keywords
data
fragment
data fragment
target
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710378791.5A
Other languages
Chinese (zh)
Other versions
CN108932104A (en
Inventor
徐晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710378791.5A priority Critical patent/CN108932104B/en
Publication of CN108932104A publication Critical patent/CN108932104A/en
Application granted granted Critical
Publication of CN108932104B publication Critical patent/CN108932104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data processing method, a data processing device and a data processing server, wherein the method comprises the following steps: detecting the data volume of each data fragment set; determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers; determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set; and moving the target data fragment from the first data fragment set to the second data fragment set. The embodiment of the invention can improve the availability of the storage server in the distributed storage system and provides possibility for reducing the maintenance burden of the distributed storage system.

Description

Data processing method and device and processing server
Technical Field
The invention relates to the technical field of data storage, in particular to a data processing method, a data processing device and a data processing server.
Background
In Internet scenarios involving massive data and high concurrency, a distributed storage system has clear advantages in reliability, availability, and access efficiency over storing data centrally on a single storage server. A distributed storage system is a system architecture in which data is stored across a plurality of storage servers, so that the storage servers share the storage load.
When a distributed storage system stores data, a sharding technique is generally used: the data is sharded (that is, subdivided) into a plurality of data fragments, and the data fragments are stored in a scattered manner. For example, if 10000 pieces of data need to be stored in the distributed storage system, the 10000 pieces of data may be divided so that every 100 pieces of data form one data fragment, yielding 100 data fragments; the 100 data fragments are then distributed across a plurality of storage servers, so that the distributed storage system stores the 10000 pieces of data.
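As a minimal sketch of the example above, the following Python snippet splits 10000 records into fragments of 100 records each and spreads the fragments over several storage servers. The function names, the round-robin placement, and the server count of 4 are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, List, Sequence


def make_fragments(records: Sequence[int], fragment_size: int) -> List[List[int]]:
    """Split records into consecutive fragments of at most fragment_size items."""
    return [list(records[i:i + fragment_size])
            for i in range(0, len(records), fragment_size)]


def scatter(fragments: List[List[int]], server_count: int) -> Dict[int, List[int]]:
    """Assign fragment indices to servers round-robin (an assumed placement policy)."""
    placement: Dict[int, List[int]] = {s: [] for s in range(server_count)}
    for idx in range(len(fragments)):
        placement[idx % server_count].append(idx)
    return placement


# 10000 records, 100 per fragment, spread over 4 hypothetical servers.
fragments = make_fragments(range(10000), 100)
placement = scatter(fragments, server_count=4)
```

Round-robin placement starts out balanced by fragment count, but fragments can differ in size and grow at different rates, which is the imbalance the rest of the description addresses.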
However, the inventor of the present invention has found the following problem when data storage is implemented with the sharding technique in a distributed storage system: when a large number of data fragments are written to one storage server, that storage server becomes full and no further data fragments can be written to it. The storage server then cannot continue to be used, which reduces its availability and increases the maintenance burden of the distributed storage system. How to improve the availability of the storage servers in a distributed storage system, and thereby make it possible to reduce the maintenance burden of the system, has therefore become a problem for those skilled in the art to consider.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method, an apparatus, and a processing server, so as to improve the availability of a storage server in a distributed storage system, and provide a possibility for reducing the maintenance burden of the distributed storage system.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a method of data processing, comprising:
detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different;
determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set;
moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out.
An embodiment of the present invention further provides a data processing apparatus, including:
the data volume detection module is used for detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different;
the fragment set determining module is used for determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
the target fragment determining module is used for determining a target data fragment to be moved in a first data fragment set, and the data volume of the target data fragment is smaller than that of the first data fragment set;
the data moving module is used for moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out.
An embodiment of the present invention further provides a processing server, including: at least one memory and at least one processor;
the memory stores programs, and the processor calls the programs stored in the memory;
the program is for:
detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different;
determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set;
moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out.
Based on the above technical solution, in the data processing method provided in the embodiments of the present invention, data fragments may be stored in the form of data fragment sets, where one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different. By detecting the data volume of each data fragment set, a first data fragment set and a second data fragment set can be determined from the data fragment sets, where the data volume of the first data fragment set is greater than that of the second data fragment set and the two sets correspond to different storage servers. A target data fragment to be moved is then determined in the first data fragment set and moved from the first data fragment set to the second data fragment set, so that part of the data fragments of a data fragment set with a large data volume are moved to a data fragment set with a small data volume on a different storage server; after the target data fragment is moved in, the data volume of the second data fragment set is not greater than the data volume of the first data fragment set before the target data fragment is moved out. This improves the storage balance probability of data fragments among different storage servers in the distributed storage system, reduces the likelihood that a storage server becomes full of data fragments, improves the availability of the storage servers in the distributed storage system, and makes it possible to reduce the maintenance burden of the distributed storage system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a block diagram of a distributed storage system;
FIG. 2 is another block diagram of a distributed storage system;
FIG. 3 is a block diagram of a hardware configuration of a processing server;
FIG. 4 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a relationship between a data shard collection and a storage server;
FIG. 6 is a schematic diagram of another relationship between a data shard collection and a storage server;
FIG. 7 is a flowchart of a method for migrating a target data fragment to a second set of data fragments;
FIG. 8 is another flow chart of a data processing method according to an embodiment of the present invention;
FIG. 9 is a further flowchart of a data processing method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of selecting a second data fragment set;
FIG. 11 is another schematic diagram of selecting a second data fragment set;
FIG. 12 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating an exemplary application provided by an embodiment of the present invention;
FIG. 14 is a flowchart of a method for accessing data in a distributed storage system according to an embodiment of the present invention;
FIG. 15 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 16 is another block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 17 is a block diagram of another structure of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
The inventor of the present invention has found that one reason a storage server of a distributed storage system becomes full of data fragments is that data fragments are stored unevenly in the system: the data volumes stored by different storage servers are not balanced, so that some storage servers reach their capacity limit while others store comparatively little. The inventor therefore considers improving the storage balance probability of the data fragments in the distributed storage system, which reduces the likelihood that a storage server becomes full, improves the availability of the storage servers, and makes it possible to reduce the maintenance burden of the distributed storage system.
Based on this, the scheme provided by the embodiment of the present invention is described below from the angle of how to improve the storage balance probability of the data fragments in the distributed storage system. The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The data processing method provided by the embodiment of the invention can improve the storage balance probability of the data fragments in the distributed storage system, and can be applied to a processing server;
optionally, the processing server may be a server that controls the storage servers in the distributed storage system; in the optional structure of the distributed storage system shown in FIG. 1, the processing server is the master control node of the distributed storage system and may support a cluster of a plurality of storage servers;
optionally, the processing server may also be a primary storage server among the plurality of storage servers of the distributed storage system; in the alternative structure of the distributed storage system shown in FIG. 2, the processing server is the primary storage server and may interact with the other storage servers in the distributed storage system;
optionally, the processing server may also be a service device specially provided in the distributed storage system to implement the data processing method provided in the embodiment of the present invention, and the processing server may interact with the storage servers in the distributed storage system in an agreed manner;
obviously, the form of the processing server for implementing the data processing method provided by the embodiment of the present invention is optional, and the specific form of the processing server may be set according to actual needs, and the processing server is generally set as a part of the distributed storage system and can interact with the storage server in the distributed storage system;
further, the processing server may also have proxy capability for data access (e.g., data read and write in a distributed storage system).
The processing server may load a corresponding program to implement the data processing method provided by the embodiment of the present invention, where the program may be stored in a memory of the processing server and invoked by a processor of the processing server to implement the data processing method. FIG. 3 shows an alternative hardware structure of the processing server; referring to FIG. 3, the processing server may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the present invention, the number of each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4 is at least one, and the processor 1, the communication interface 2, and the memory 3 communicate with one another through the communication bus 4; the communication connections among the processor 1, the communication interface 2, the memory 3 and the communication bus 4 shown in FIG. 3 are merely optional;
optionally, the communication interface 2 may be an interface of a communication module, such as an interface of a GSM module;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The memory 3 may comprise a high-speed RAM memory and may also comprise a non-volatile memory, such as at least one disk memory.
The memory 3 stores a program, and the processor 1 calls the program stored in the memory 3 to implement the data processing method provided by the embodiment of the invention.
From the perspective of the processing server, FIG. 4 shows a flowchart of a data processing method provided by an embodiment of the present invention; the method is applicable to the processing server and is implemented by the processing server calling a corresponding program. Referring to FIG. 4, the data processing method may include:
Step S100, detecting the data volume of each data fragment set; one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different.
Optionally, the data fragment set is a set of multiple data fragments defined in the embodiment of the present invention, and one data fragment set corresponds to multiple data fragments; a storage server in the distributed storage system corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different; in the embodiment of the present invention, a data fragment set may also be referred to simply as a set;
optionally, one way in which storage servers correspond to data fragment sets is that one storage server in the distributed storage system corresponds to one data fragment set (for example, the storage space of one storage server, such as its disks, serves as the storage space of one data fragment set), and the data fragment sets corresponding to different storage servers are different; as shown in FIG. 5, each storage server corresponds to one data fragment set, and each data fragment set stores its corresponding data fragments;
optionally, another way is to divide the storage space of a storage server, such as its hard disks, into a plurality of storage areas, where one storage area of a storage server corresponds to one data fragment set, and the data fragment sets corresponding to different storage servers are different; as shown in FIG. 6, the storage space of each storage server may be divided into a plurality of storage areas, each corresponding to one data fragment set, so that one storage server may correspond to a plurality of data fragment sets, and each data fragment set stores its corresponding data fragments;
optionally, the embodiment of the present invention does not exclude combining the two ways; for example, in one distributed storage system, some storage servers may each correspond to one data fragment set while other storage servers each correspond to a plurality of data fragment sets;
correspondingly, when data is stored in the distributed storage system through the sharding technique, a data fragment, once allocated to a storage server, is stored in a data fragment set corresponding to that storage server; optionally, when the storage server corresponds to a plurality of data fragment sets, the data fragment may be stored in a randomly selected one of those data fragment sets, or in the data fragment set with the smallest data volume on that storage server;
optionally, the number of data fragments corresponding to a data fragment set is variable, for example, as data storage increases, the number of data fragments stored by a storage server also increases, and correspondingly, the number of data fragments corresponding to a data fragment set corresponding to the storage server also increases as data fragment storage increases.
After the data fragments are stored in the form of data fragment sets, the embodiment of the invention can detect the data volume of each data fragment set periodically or in real time; optionally, one way is to periodically query the disk usage of the storage server corresponding to each data fragment set, and to take the disk usage corresponding to a data fragment set as the data volume of that data fragment set;
for example, under the condition that one storage server corresponds to one data fragment set, the embodiment of the invention can query the disk usage amount of each storage server at regular time to determine the data amount of each data fragment set; for example, in the case that one storage server corresponds to a plurality of data shard sets, the embodiment of the present invention may periodically query the disk usage amount of the storage area of the storage server corresponding to each data shard set, so as to implement the detection of the data amount of each data shard set.
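The detection in step S100 can be sketched as follows, under the assumption that each data fragment set is backed by its own storage path and that usage can be probed per path. The names `used_bytes` and `detect_volumes`, the set identifiers, and the paths are hypothetical; in a real deployment the query would go to the remote storage servers rather than to the local file system.

```python
import shutil
from typing import Callable, Dict


def used_bytes(path: str) -> int:
    """Return the used bytes of the file system that contains path."""
    return shutil.disk_usage(path).used


def detect_volumes(set_paths: Dict[str, str],
                   probe: Callable[[str], int] = used_bytes) -> Dict[str, int]:
    """Map each data fragment set ID to the usage reported for its storage area."""
    return {set_id: probe(path) for set_id, path in set_paths.items()}


# Usage with a stand-in probe so the example does not depend on real disks.
fake_usage = {"/data/a": 800, "/data/b": 200}
volumes = detect_volumes({"set-1": "/data/a", "set-2": "/data/b"},
                         probe=fake_usage.__getitem__)
```

Note that `shutil.disk_usage` reports usage for the whole file system containing the path, so this sketch only matches the one-area-per-set case if each storage area is a separate mount; otherwise a per-directory size calculation would be needed.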
Step S110, determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers.
In the data processing method provided by the embodiment of the invention, after the data fragments are stored in the form of data fragment sets, the storage balance probability of the data fragments in the distributed storage system is improved mainly by moving part of the data fragments in a data fragment set with a large data volume to a data fragment set with a small data volume; the storage server corresponding to the data fragment set that the data fragments are moved out of is different from the storage server corresponding to the data fragment set that the data fragments are moved into, so that the data volumes of the data fragment sets of the storage servers tend toward balance;
following this idea, the first data fragment set referred to in the embodiment of the present invention is a data fragment set from which data fragments need to be moved out, and there may be at least one first data fragment set; the second data fragment set is a data fragment set into which data fragments are moved, and there may be at least one second data fragment set; the storage servers corresponding to the first data fragment set and the second data fragment set are different, which avoids moving data fragments between data fragment sets on the same storage server, that is, data fragments are moved only across storage servers.
Optionally, a data volume threshold may be set in the embodiment of the present invention; the first data fragment set is determined from the data fragment sets whose data volume is greater than the data volume threshold, and the second data fragment set is determined from the data fragment sets whose data volume is less than the data volume threshold; optionally, the data volume threshold may be a preset value and may be adjusted according to the actual situation;
optionally, a simpler way is to determine, among the data fragment sets whose data volume is greater than the data volume threshold, the data fragment set with the largest data volume as the first data fragment set, and to determine, among the data fragment sets whose data volume is less than the data volume threshold, the data fragment set with the smallest data volume as the second data fragment set; part of the data fragments of the data fragment set with the largest data volume are thus moved to the data fragment set with the smallest data volume, which improves the storage balance probability of the data fragments in the distributed storage system;
obviously, a data volume threshold is not strictly required: as long as the data volume of the first data fragment set is greater than that of the second data fragment set, moving part of the data fragments of the first data fragment set to the second data fragment set can also improve the storage balance probability of the data fragments in the distributed storage system.
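The threshold-based selection in step S110 can be sketched as follows. This is one possible reading, assuming the "simpler way" (largest set above the threshold, smallest set below it) combined with the requirement that the two sets belong to different storage servers; the function name `pick_sets` and the sample identifiers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def pick_sets(volumes: Dict[str, int], server_of: Dict[str, str],
              threshold: int) -> Optional[Tuple[str, str]]:
    """Return (first_set, second_set), or None when no valid pair exists."""
    over = [s for s, v in volumes.items() if v > threshold]
    under = [s for s, v in volumes.items() if v < threshold]
    if not over or not under:
        return None
    first = max(over, key=volumes.__getitem__)
    # The second set must live on a different storage server than the first.
    candidates = [s for s in under if server_of[s] != server_of[first]]
    if not candidates:
        return None
    second = min(candidates, key=volumes.__getitem__)
    return first, second


volumes = {"A1": 900, "A2": 100, "B1": 300, "C1": 600}
server_of = {"A1": "A", "A2": "A", "B1": "B", "C1": "C"}
pair = pick_sets(volumes, server_of, threshold=500)  # ("A1", "B1")
```

In the sample data, set A2 has the smallest volume overall but is skipped because it sits on the same storage server as A1, so B1 is chosen as the second set.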
Step S120, determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set.
Optionally, after determining the first data fragment set and the second data fragment set, the embodiment of the present invention may determine a part of data fragments to be migrated in the first data fragment set (referred to as target data fragments in the embodiment of the present invention); it is to be noted that, in the embodiment of the present invention, not all the data fragments in the first data fragment set are migrated, and the data volume of the migrated target data fragment is smaller than the data volume of the first data fragment set, that is, only a part of the data fragments in the first data fragment set are migrated.
Optionally, when the first data fragment set and the second data fragment set are determined by means of the data volume threshold, the target data fragment may be determined according to the difference between the data volume of the first data fragment set and the data volume threshold: a target data fragment whose data volume corresponds to that difference is selected from the first data fragment set, so that after the target data fragment is moved out of the first data fragment set, the data volume of the first data fragment set is smaller than the data volume threshold;
optionally, so that the data volume of the second data fragment set does not exceed the data volume threshold after the target data fragment is moved in, the second data fragment set may be chosen, from the data fragment sets whose data volume is less than the data volume threshold, as a data fragment set for which the difference between the data volume threshold and its data volume is not less than the data volume of the target data fragment;
optionally, determining the target data fragment according to the difference between the data volume of the first data fragment set and the data volume threshold is only one option; if the first data fragment set and the second data fragment set are determined merely on the principle that the data volume of the first data fragment set is greater than that of the second data fragment set, the target data fragment may instead be determined from the first data fragment set according to the difference between the data volumes of the first data fragment set and the second data fragment set, so that after the target data fragment is moved in, the data volume of the second data fragment set is not greater than the data volume of the first data fragment set before the target data fragment is moved out; for example, the data volume of the target data fragment is not greater than half of the difference between the data volumes of the first data fragment set and the second data fragment set.
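The half-difference rule for step S120 can be illustrated with a short sketch. The greedy, largest-first choice of fragments is an assumption made here for illustration (the description does not prescribe how individual fragments are picked), and `pick_targets` and the fragment identifiers are hypothetical names.

```python
from typing import Dict, List


def pick_targets(fragment_sizes: Dict[str, int], first_volume: int,
                 second_volume: int) -> List[str]:
    """Greedily choose fragments whose total size is at most half the volume gap."""
    budget = (first_volume - second_volume) // 2
    chosen: List[str] = []
    for frag_id, size in sorted(fragment_sizes.items(),
                                key=lambda kv: kv[1], reverse=True):
        if size <= budget:
            chosen.append(frag_id)
            budget -= size
    return chosen


# Fragments of the first set (total 900); the second set currently holds 300.
sizes = {"f1": 400, "f2": 300, "f3": 150, "f4": 50}
targets = pick_targets(sizes, first_volume=900, second_volume=300)  # ["f2"]
moved = sum(sizes[f] for f in targets)
```

With a moved volume t of at most (V1 - V2) / 2, the second set ends at V2 + t, which is at most V1 - t and therefore at most V1, so the constraint on the second data fragment set holds. In the sample data both sets end at 600.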
Step S130, the target data fragment is moved from the first data fragment set to the second data fragment set.
Optionally, after the target data fragment is moved in, the data volume of the second data fragment set should not be greater than the data volume of the first data fragment set before the target data fragment was moved out (for example, the data volume of the second data fragment set after the move is not greater than the data volume threshold), so that the second data fragment set does not become too large after receiving the target data fragment. The data volume of the target data fragment to be moved can be determined based on the difference between the data volume of the first data fragment set and the data volume threshold and the difference between the data volume of the second data fragment set and the data volume threshold; or the data volume of the target data fragment may be not larger than half of the difference between the data volumes of the first data fragment set and the second data fragment set, and so on.
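The half-difference rule can be illustrated with a minimal sketch. The function name `pick_target_by_half_gap`, the dictionary layout, and the choice of the largest eligible fragment are assumptions made for illustration only; the embodiment only requires that the data volume of the target data fragment be not more than half of the difference.

```python
def pick_target_by_half_gap(fragment_sizes, second_set_volume):
    """Pick a fragment whose size is at most half the gap between the two sets.

    fragment_sizes: {fragment id: data volume} for the first data fragment set.
    Returns the id of the largest eligible fragment, or None if none qualifies.
    (Choosing the largest eligible fragment is an illustrative assumption.)
    """
    first_set_volume = sum(fragment_sizes.values())
    half_gap = (first_set_volume - second_set_volume) / 2
    eligible = {fid: size for fid, size in fragment_sizes.items()
                if 0 < size <= half_gap}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical volumes: the first set holds 100 units, the second set holds 30.
first = {"shard1": 40, "shard2": 25, "shard3": 35}
second_volume = 30
target = pick_target_by_half_gap(first, second_volume)

# After the move, the second set cannot exceed what the first set held before.
volume_after_move = second_volume + first[target]
```

Because the moved volume t satisfies t <= (a - b) / 2, the receiving set ends at b + t <= a - t, which is never more than the original volume a of the first set.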
Optionally, the data shard may be moved by copying the data shard from the old data shard set to the new data shard set, then updating the corresponding relationship between the data shard and the data shard set, and deleting the data shard in the old data shard set; according to the embodiment of the invention, the target data fragment can be copied from the first data fragment set to the second data fragment set, the data fragment set corresponding to the target data fragment is updated to the second data fragment set, and the target data fragment in the first data fragment set is deleted;
it should be noted that the data fragment set corresponding to the target data fragment is updated to the second data fragment set so that, after the target data fragment has been moved to the second data fragment set, access to the target data fragment is addressed to the second data fragment set, thereby implementing access to the target data fragment in the second data fragment set;
optionally, for the data fragments in each data fragment set, the embodiment of the present invention may record the correspondence between each data fragment and its data fragment set; for example, a data fragment number is set for each data fragment and a data fragment set number is set for each data fragment set, so that the correspondence between data fragments and data fragment sets is recorded as the correspondence between data fragment numbers and data fragment set numbers; optionally, the correspondence between data fragments and data fragment sets may be maintained in a routing table; correspondingly, since data fragment sets correspond to storage servers, the storage server corresponding to a data fragment set can be represented by the data fragment set number;
therefore, when updating the data fragment set corresponding to the target data fragment to the second data fragment set, the embodiment of the invention can update, in the routing table, the data fragment set number corresponding to the data fragment number of the target data fragment to the data fragment set number of the second data fragment set.
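The copy, routing update, and delete sequence described above can be sketched as follows. This is a minimal in-memory illustration: the names `move_fragment`, `sets`, and `routing_table`, and the use of plain dictionaries, are assumptions for the example and not the storage format of the embodiment.

```python
def move_fragment(sets, routing_table, fragment_no, dst_set_no):
    """Move one data fragment to another data fragment set.

    sets:          {set number: {fragment number: fragment data}}
    routing_table: {fragment number: set number}
    """
    src_set_no = routing_table[fragment_no]
    # 1. Copy the fragment from the old set to the new set.
    sets[dst_set_no][fragment_no] = dict(sets[src_set_no][fragment_no])
    # 2. Update the routing table so that access is addressed to the new set.
    routing_table[fragment_no] = dst_set_no
    # 3. Delete the fragment from the old set.
    del sets[src_set_no][fragment_no]


# Hypothetical layout: fragments 1 to 3 in set 1, fragment 4 in set 2.
sets = {1: {1: {"k1": "v1"}, 2: {"k2": "v2"}, 3: {}}, 2: {4: {}}}
routing_table = {1: 1, 2: 1, 3: 1, 4: 2}

move_fragment(sets, routing_table, fragment_no=2, dst_set_no=2)
```

Updating the routing table before deleting the old copy means that a lookup always resolves to a set that actually holds the fragment.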
Optionally, fig. 4 describes the processing for one determined first data fragment set; if multiple first data fragment sets are determined, a second data fragment set is determined for each first data fragment set, and the processing of steps S120 to S130 moves the target data fragment to be moved in each first data fragment set.
The data processing method provided by the embodiment of the invention stores data fragments in the form of data fragment sets: one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different. Therefore, by detecting the data volume of each data fragment set, a first data fragment set and a second data fragment set can be determined from the data fragment sets, where the data volume of the first data fragment set is larger than that of the second data fragment set and the two sets correspond to different storage servers. A target data fragment to be moved is then determined from the first data fragment set and moved from the first data fragment set to the second data fragment set, so that some data fragments of a data fragment set with a large data volume can be moved to a data fragment set with a small data volume on a different storage server, and the data volume of the second data fragment set after the target data fragment is moved in is not more than that of the first data fragment set before the target data fragment was moved out. This improves the probability that data fragments are stored in a balanced manner among the storage servers of the distributed storage system, reduces the possibility that a storage server becomes full of data fragments, improves the availability of the storage servers in the distributed storage system, and makes it possible to reduce the maintenance burden of the distributed storage system.
Optionally, when the target data fragment is moved to the second data fragment set by data copying, the embodiment of the present invention may import a snapshot of the target data fragment into the second data fragment set and play back a running log on the imported target data fragment;
optionally, fig. 7 shows a flowchart of a method for migrating a target data fragment to a second data fragment set, and referring to fig. 7, the method may include:
Step S200, a snapshot of the target data fragment in the first data fragment set is obtained.
Optionally, in the embodiment of the present invention, the snapshot may be regarded as a read-only static view of the database at the snapshot obtaining time; the embodiment of the invention can obtain the snapshot of the target data fragment in the first data fragment set through the database snapshot technology.
Step S210, determining the running log generated after the timestamp at which the snapshot was obtained.
Step S220, importing the snapshot into the second data fragment set, so that the second data fragment set has the target data fragment, and playing back the running log on the target data fragment of the second data fragment set.
After the snapshot of the target data fragment is imported into the second data fragment set, a new fragment space belonging to the target data fragment is generated in the second data fragment set, so that the second data fragment set has the target data fragment, the running log is played back in the new fragment space belonging to the target data fragment in the second data fragment set, and the target data fragment can be copied from the first data fragment set to the second data fragment set.
Optionally, further, in the method shown in fig. 7, after the target data shard is moved from the first data shard set to the second data shard set, in the correspondence between the data shard and the data shard set, the data shard set corresponding to the target data shard is updated from the first data shard set to the second data shard set, so as to adjust the data shard set to which the target data shard belongs, and enable access to the target data shard to be addressed to the second data shard set; after the data fragment set corresponding to the target data fragment is updated to the second data fragment set, the embodiment of the present invention may delete the target data fragment in the first data fragment set, and complete the moving of the target data fragment from the first data fragment set to the second data fragment set.
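The snapshot-and-replay copy of steps S200 to S220 can be sketched in memory as follows. The log entry format `(timestamp, operation, key, value)` and the helper names are assumptions made for illustration; a real system would rely on the snapshot and log facilities of its database.

```python
import copy


def take_snapshot(fragment, timestamp):
    """S200: read-only static view of the fragment at the snapshot time."""
    return {"timestamp": timestamp, "data": copy.deepcopy(fragment)}


def replay(fragment, log_entries):
    """Apply running-log entries of the form (timestamp, op, key, value)."""
    for _, op, key, value in log_entries:
        if op == "put":
            fragment[key] = value
        elif op == "delete":
            fragment.pop(key, None)


def copy_fragment(snapshot, running_log):
    """S210 and S220: import the snapshot, then replay the later log entries."""
    imported = copy.deepcopy(snapshot["data"])
    later = [entry for entry in running_log if entry[0] > snapshot["timestamp"]]
    replay(imported, later)
    return imported


# Hypothetical target fragment and its running log up to timestamp 2.
source = {"a": 1, "b": 2}
running_log = [(1, "put", "a", 1), (2, "put", "b", 2)]
snapshot = take_snapshot(source, timestamp=2)

# Writes that reach the first set while the copy is in progress.
for entry in [(3, "put", "c", 3), (4, "delete", "a", None)]:
    running_log.append(entry)
    replay(source, [entry])

copied = copy_fragment(snapshot, running_log)
```

Replaying only the entries later than the snapshot timestamp brings the imported copy up to date without reading the source fragment a second time.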
Optionally, in the embodiment of the present invention, the first data fragment set and the second data fragment set may be determined from the data fragment sets by using a preset data volume threshold, so as to determine the first data fragment set from which data fragments are to be moved out and the second data fragment set into which data fragments are to be moved;
optionally, fig. 8 shows another flowchart of a data processing method provided in an embodiment of the present invention, where the method is applicable to a processing server, and referring to fig. 8, the method may include:
Step S300, detecting the data volume of each data fragment set; each data fragment set comprises a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different.
Step S310, a first data fragment set is determined from the data fragment sets with the data volume larger than the data volume threshold, and a second data fragment set is determined from the data fragment sets with the data volume smaller than the data volume threshold, wherein the first data fragment set and the second data fragment set correspond to different storage servers.
Optionally, in the embodiment of the present invention, after the data volume of each data shard set is determined, the data shard set of which the data volume is greater than the data volume threshold is analyzed, and a first data shard set of the data shards to be moved is determined from the data shard set; optionally, in the embodiment of the present invention, a data fragmentation set with the largest data size in data fragmentation sets with a data size larger than a data size threshold may be used as the first data fragmentation set;
or, the data slice sets with the data volume greater than the data volume threshold may be respectively used as the first data slice sets, and correspondingly, the number of the first data slice sets may be at least one;
it is also possible that, among the data fragment sets whose data volume is greater than the data volume threshold, the several data fragment sets with the largest data volumes are each used as a first data fragment set; for example, the data fragment sets whose data volume is greater than the data volume threshold may be sorted by data volume, and the set number of data fragment sets ranked highest by data volume may be used as the first data fragment sets.
When determining the second data slice set, the embodiment of the present invention may determine from the data slice set whose data volume is smaller than the data volume threshold; that is, whether the data volume of the data shard set is greater than the data volume threshold value or not may be considered as a boundary between the first data shard set and the second data shard set;
optionally, in the embodiment of the present invention, a data fragmentation set with the smallest data volume in the data fragmentation sets with the data volume smaller than the data volume threshold may be used as the second data fragmentation set;
optionally, to make data relocation more reasonable, after the first data fragment set is determined, for each first data fragment set (there may be at least one), the embodiment of the present invention may determine the data volume of the data fragments to be moved out of the first data fragment set according to the data volume difference between the data volume of the first data fragment set and the data volume threshold, where the data volume of the data fragments to be moved is not less than the data volume difference; then, from the data fragment sets whose data volume is smaller than the data volume threshold, a data fragment set whose data volume would still be smaller than the data volume threshold after being increased by the data volume of the data fragments to be moved is used as the second data fragment set; that is, after part of the data fragments of the first data fragment set are moved into the second data fragment set, the data volume of the second data fragment set is still smaller than the data volume threshold.
Step S320, determining a target data fragment to be moved in a first data fragment set, where a data volume of the target data fragment is smaller than a data volume of the first data fragment set.
Step S330, the target data fragment is moved from the first data fragment set to the second data fragment set.
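The selection options of step S310 can be sketched as follows. The function names, the `top_n` parameter, and the example volumes are assumptions made for illustration; only the threshold comparison and the different-server requirement come from the description.

```python
def select_first_sets(volumes, threshold, top_n=None):
    """Sets above the threshold, largest first.

    top_n=None returns every set above the threshold; top_n=1 returns only
    the largest; any other value returns that many of the largest sets.
    """
    over = sorted((s for s, v in volumes.items() if v > threshold),
                  key=volumes.get, reverse=True)
    return over if top_n is None else over[:top_n]


def select_second_set(volumes, servers, threshold, first_set):
    """Smallest set below the threshold that lives on a different server."""
    under = [s for s, v in volumes.items()
             if v < threshold and servers[s] != servers[first_set]]
    return min(under, key=volumes.get) if under else None


# Hypothetical data volumes and set-to-server mapping.
volumes = {"set1": 120, "set2": 60, "set3": 90, "set4": 110}
servers = {"set1": "server1", "set2": "server2",
           "set3": "server3", "set4": "server4"}

largest_only = select_first_sets(volumes, threshold=100, top_n=1)
all_over = select_first_sets(volumes, threshold=100)
second = select_second_set(volumes, servers, threshold=100, first_set="set1")
```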
Preferably, according to the data volume difference between the data volume of the first data slice set and the data volume threshold, the embodiment of the present invention may determine the second data slice set from the data slice sets whose data volume is smaller than the data volume threshold;
fig. 9 shows a further flowchart of a data processing method provided by an embodiment of the present invention, where the method is applicable to a processing server, and referring to fig. 9, the method may include:
Step S400, detecting the data volume of each data fragment set; each data fragment set comprises a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different.
Step S410, a first data fragment set is determined from the data fragment sets with the data volume larger than the data volume threshold value.
Optionally, the determining manner of the first data slice set may be that, of the data slice sets whose data volumes are greater than the data volume threshold, the data slice set with the largest data volume is used as the first data slice set; or the data fragment sets with the data volume larger than the data volume threshold value may be respectively used as the first data fragment sets; it is also possible that a plurality of data slice sets with the maximum data size among the data slice sets with the data size larger than the data size threshold are respectively used as the first data slice set.
Step S420, determining a data volume difference value between the data volume of the first data fragment set and a data volume threshold, and determining a target data volume according to the data volume difference value; wherein the target data amount is not less than the data amount difference.
Optionally, the number of the determined first data fragmentation sets may be at least one, and for each first data fragmentation set, the manners of selecting the second data fragmentation set and performing data relocation may be similar and may be referred to each other;
after the first data fragment set is determined, for each first data fragment set, according to a data volume difference value between a data volume of the first data fragment set and a data volume threshold, a target data volume of a target data fragment to be migrated in the first data fragment set (the data volume of the target data fragment to be migrated may be referred to as a target data volume) is determined, and the target data volume of the target data fragment in the first data fragment set is not less than the data volume difference value; according to the embodiment of the present invention, at least one data slice in the first data slice set, in which the data amount is not less than the data amount difference and the data amount is closest to the data amount difference, may be determined, and the data amount of the at least one data slice is taken as the target data amount, and accordingly, the at least one data slice may be used as the target data slice.
Step S430, determining, from the data fragment sets whose data volume is less than the data volume threshold, the data fragment sets for which the difference between the data volume threshold and the data volume is not less than the target data volume.
Step S440, selecting a second data fragment set from the determined data fragment sets.
Optionally, in the embodiment of the present invention, a difference between the data volume of each data shard set and the data volume threshold may be determined from the data shard sets whose data volumes are smaller than the data volume threshold, so as to select the data shard set whose difference is not smaller than the target data volume; optionally, there may be a plurality of data shard sets correspondingly determined by a first data shard set (for example, for a first data shard set, in a data shard set whose data volume is smaller than a data volume threshold, a difference between the data volume of the plurality of data shard sets and the data volume threshold is not smaller than the data volume of a target data shard to be moved by the first data shard set), in an embodiment of the present invention, a data shard set with the smallest data volume may be selected as the second data shard set, or a data shard set may be randomly selected from the data shard sets as the second data shard set.
For example, as shown in fig. 10, after a first data fragment set A whose data volume is greater than the data volume threshold is determined, if the difference between the data volume of A and the data volume threshold is X, a target data fragment whose target data volume is not less than X needs to be determined from A; in the embodiment of the present invention, the data fragment sets for which the difference between the data volume threshold and the data volume is greater than or equal to the target data volume may be determined from the data fragment sets whose data volume is less than the data volume threshold, so that B and C are determined as candidates for the second data fragment set; the second data fragment set is then selected from B and C (for example, at random, or by choosing the one with the smallest data volume).
Step S450, a target data fragment to be moved in a first data fragment set is determined, and the data volume of the target data fragment is smaller than that of the first data fragment set.
Optionally, the data volume of the target data fragment may be the target data volume, and in the embodiment of the present invention a data fragment corresponding to the target data volume may be determined from the first data fragment set and used as the target data fragment; for example, at least one data fragment in the first data fragment set whose data volume is not less than the data volume difference and is closest to the data volume difference may be used as the target data fragment;
through such processing, the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume threshold (this is one optional form of the requirement that the data volume of the second data fragment set after the move is not greater than the data volume of the first data fragment set before the move).
Optionally, in another implementation, in the embodiment of the present invention, the data size of the target data fragment may also be not greater than half of the difference between the data sizes of the first data fragment set and the second data fragment set, so that the data size of the second data fragment set moved into the target data fragment is not greater than the data size of the first data fragment set when the target data fragment is not moved; when this means is used, the embodiment of the present invention is not limited to determining the second data slice set in the manner shown in fig. 9, but may adopt means such as a data slice set with the smallest data amount as the second data slice set.
Step S460, the target data fragment is moved from the first data fragment set to the second data fragment set.
Optionally, fig. 9 describes a processing situation for a determined first data fragment set, and if the number of the determined first data fragment sets is multiple, the processing from step S420 to step S460 is performed for each first data fragment set, so that the target data fragment to be moved in each first data fragment set can be moved.
According to the data processing method provided by the embodiment of the invention, the target data volume of the target data fragment can be determined according to the data volume difference between the data volume of the first data fragment set and the data volume threshold; the second data fragment set is then determined, from the data fragment sets whose data volume is smaller than the data volume threshold, among those for which the difference between the data volume threshold and the data volume is not smaller than the target data volume. Because the second data fragment set is determined according to the data volume of the target data fragment to be moved out of the first data fragment set, the determined second data fragment set accurately meets the moving requirement of the target data fragment, data fragment storage in the distributed storage system tends more accurately toward balance, and the probability of balanced storage of the data fragments in the distributed storage system is greatly improved.
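Steps S420 to S450 can be sketched as follows. For simplicity the sketch picks a single data fragment as the target, whereas the description allows "at least one data fragment"; the function name `plan_move`, the smallest-volume tie-break for the second set, and the example volumes are assumptions for illustration.

```python
def plan_move(first_fragments, other_set_volumes, threshold):
    """Return (target fragment, second set), or None if no single set fits.

    first_fragments:   {fragment id: data volume} of the first set
    other_set_volumes: {set id: data volume} of the other sets
    """
    # S420: data volume difference between the first set and the threshold.
    difference = sum(first_fragments.values()) - threshold
    # Target: fragment not smaller than the difference and closest to it.
    eligible = {f: v for f, v in first_fragments.items() if v >= difference}
    if not eligible:
        return None
    target = min(eligible, key=eligible.get)
    target_volume = eligible[target]
    # S430: sets below the threshold whose headroom covers the target volume.
    candidates = [s for s, v in other_set_volumes.items()
                  if v < threshold and threshold - v >= target_volume]
    if not candidates:
        return None
    # S440: here the candidate with the smallest data volume is chosen.
    second = min(candidates, key=other_set_volumes.get)
    return target, second


# Hypothetical volumes: set A holds 120 units against a threshold of 100.
set_a = {"f1": 50, "f2": 30, "f3": 40}
others = {"B": 60, "C": 70, "D": 85}
plan = plan_move(set_a, others, threshold=100)
```

In this example the difference X is 20, the closest fragment not smaller than X is `f2` with 30 units, and only B and C have at least 30 units of headroom, which mirrors the situation described for fig. 10.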
Optionally, in the embodiment of the present invention it may also happen that, among the data fragment sets whose data volume is smaller than the data volume threshold, no single data fragment set can meet the moving requirement of the target data fragment of the first data fragment set; that is, among those data fragment sets there is no single set for which the difference between the data volume threshold and the data volume is not smaller than the target data volume of the target data fragment. In this case, the embodiment of the present invention may set a plurality of second data fragment sets for the first data fragment set, divide the target data fragment into a plurality of sub-target data fragments, and move the target data fragment out of the first data fragment set by moving the sub-target data fragments into different second data fragment sets;
for example, as shown in fig. 11, after a first data fragment set A whose data volume is greater than the data volume threshold is determined, if the difference between the data volume of A and the data volume threshold is X, a target data fragment whose target data volume is not less than X may be determined from A; among the data fragment sets whose data volume is smaller than the data volume threshold, there is no data fragment set for which the difference between the data volume threshold and the data volume is greater than or equal to the target data volume; at this time, the embodiment of the present invention may divide the target data fragment into a plurality of sub-target data fragments a1, a2 …, and determine, from the data fragment sets whose data volume is smaller than the data volume threshold, a plurality of data fragment sets for which the difference between the data volume threshold and the data volume is greater than or equal to the data volume of a sub-target data fragment, to be used as the second data fragment sets;
optionally, fig. 12 shows another flowchart of a data processing method provided in an embodiment of the present invention, where the method is applicable to a processing server, and referring to fig. 12, the method may include:
Step S500, detecting the data volume of each data fragment set; each data fragment set comprises a plurality of data fragments, one storage server corresponds to at least one data fragment set, and the data fragment sets corresponding to different storage servers are different.
Step S510, determine a first data slice set from the data slice sets with the data volume greater than the data volume threshold.
Step S520, determining a data volume difference value between the data volume of the first data fragment set and a data volume threshold, and determining a target data volume of the target data fragment according to the data volume difference value; wherein the target data amount is not less than the data amount difference.
Step S530, if, among the data fragment sets whose data volume is smaller than the data volume threshold, there is no single data fragment set for which the difference between the data volume threshold and the data volume is not smaller than the target data volume, dividing the target data fragment into a plurality of sub-target data fragments.
Step S540, determining, from the data fragment sets whose data volume is less than the data volume threshold, a plurality of second data fragment sets for which the difference between the data volume threshold and the data volume is greater than or equal to the data volume of a sub-target data fragment.
Step S550, moving the plurality of sub-target data fragments from the first data fragment set to the plurality of second data fragment sets; at least one sub-target data fragment is moved into each second data fragment set, and the data volume of a second data fragment set after sub-target data fragments are moved in is not greater than the data volume threshold.
Optionally, in the embodiment of the present invention, the target data fragment may be divided into a set number of sub-target data fragments; then, from the data fragment sets whose data volume is smaller than the data volume threshold, a plurality of second data fragment sets are determined for which the difference between the data volume threshold and the data volume is not smaller than the data volume of a sub-target data fragment, at least one sub-target data fragment is moved into each second data fragment set, and the data volume of each second data fragment set after sub-target data fragments are moved in is controlled to be not greater than the data volume threshold, so that the target data fragment in the first data fragment set is moved to the plurality of second data fragment sets;
for example, after determining the first data slice set a with the data volume greater than the data volume threshold, if the difference between the data volume of the first data slice set a and the data volume threshold is X, a target data slice may be determined from the first data slice set, where the target data volume of the target data slice is not less than X (the target data slice may be at least one data slice in the first data slice set, whose data volume is not less than X and is closest to X);
if, among the data fragment sets whose data volume is smaller than the data volume threshold, there is no single data fragment set for which the difference between the data volume threshold and the data volume is not smaller than the target data volume, the embodiment of the invention can divide the target data fragment in A into 3 sub-target data fragments (3 is only an optional value of the set number, and the specific value can be adjusted according to the actual situation; it is worth noting that if, after the latest division into sub-target data fragments, the data fragment sets whose data volume is smaller than the data volume threshold still cannot meet the moving requirement of the target data fragment, for example because the data volume of a sub-target data fragment is larger than the maximum difference between the data volume threshold and the data volume of any such data fragment set, the set number used to divide the target data fragment can be increased until the data fragment sets whose data volume is smaller than the data volume threshold meet the moving requirement of the target data fragment);
then, from the data fragment sets whose data volume is smaller than the data volume threshold, a plurality of second data fragment sets are determined for which the difference between the data volume threshold and the data volume is not smaller than the data volume of a sub-target data fragment (the number of these second data fragment sets may be greater than or equal to the set number; of course, if some data fragment sets below the data volume threshold have a larger difference from the threshold, the number of second data fragment sets may also be smaller than the set number, as determined by the actual situation).
Optionally, the manner in which a sub-target data fragment is moved into a second data fragment set is similar to that described with reference to fig. 7.
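The division into sub-target data fragments, with the set number increased until every piece fits, can be sketched as follows. Integer data volumes, near-equal pieces, the greedy "most headroom first" placement, and the `max_parts` bound are all assumptions made for illustration; the description does not prescribe how the pieces are sized or assigned.

```python
def split_and_place(target_volume, under_volumes, threshold,
                    parts=3, max_parts=64):
    """Return [(piece volume, second set), ...] or None if placement fails.

    under_volumes: {set id: data volume} of the sets below the threshold.
    """
    headroom = {s: threshold - v for s, v in under_volumes.items()
                if v < threshold}
    if sum(headroom.values()) < target_volume:
        return None  # the below-threshold sets cannot absorb the target at all
    for count in range(parts, max_parts + 1):
        base, extra = divmod(target_volume, count)
        pieces = [base + 1] * extra + [base] * (count - extra)
        free = dict(headroom)
        placement = []
        for piece in pieces:
            dst = max(free, key=free.get)  # set with the most room left
            if free[dst] < piece:
                break  # this split is too coarse; try more pieces
            free[dst] -= piece
            placement.append((piece, dst))
        else:
            return placement
    return None


# Hypothetical case: 30 units to move, but no single set has 30 units free.
placement = split_and_place(30, {"B": 85, "C": 88, "D": 90}, threshold=100)

# A case where 3 pieces do not fit and the set number is raised to 4.
finer = split_and_place(27, {"B": 80, "C": 92}, threshold=100)
```

A second data fragment set may receive more than one piece, which matches the statement that at least one sub-target data fragment is moved into each second data fragment set.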
An application example of the data processing method according to the embodiment of the present invention is shown in fig. 13: shards 1 to 3 belong to data shard set (set) 1, corresponding to storage server 1; shards 4 to 6 belong to set2, corresponding to storage server 2; and so on;
after the data volume of each set is detected, the data volume of set1 is found to be larger than the data volume threshold, set1 can be used as a first data fragment set, and according to the data volume difference between the data volume of set1 and the data volume threshold, a target data fragment shard2 with the data volume not smaller than the data volume difference can be selected from the set1, namely, after the shard2 is moved from the set1, the data volume of the set1 can be smaller than the data volume threshold, and the data volume of the set1 is reduced;
according to the data volume of shard2, a seti is selected that has enough space to receive shard2 and whose data volume is still smaller than the data volume threshold after shard2 is moved in (seti may be the set with the smallest data volume among the sets whose data volume is smaller than the data volume threshold, or may be randomly selected from the sets whose data volume is smaller than the data volume threshold and which can receive shard2);
therefore, shard2 is moved from set1 to seti, the set corresponding to shard2 is updated to seti, and shard2 is thereafter stored in seti; the data volume of set1 is reduced, the amount of data fragments stored in each set of each storage server in the distributed storage system tends toward balance, and the probability of balanced storage of the data fragments in the distributed storage system is improved.
Optionally, after the balanced storage of the data fragments in the distributed storage system is realized, the embodiment of the present invention can realize data access (e.g., data read-write) in the distributed storage system; FIG. 14 is a flow chart of a method of implementing data access for a distributed storage system, the method being applicable to a processing server, which may optionally have access proxy functionality; of course, the processing server may not have the access proxy function, but implement the data access processing by interacting with the access proxy server;
referring to fig. 14, the method may include:
Step S600, an access request is obtained, wherein the access request carries a keyword of the data to be accessed.
Optionally, when accessing the service data, the user may send an access request, and indicate a key (keyword) of the service data to be accessed in the access request.
Step S610, performing hash processing on the keyword, and rounding the result to obtain a target integer.
Optionally, commonly used string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and the like; the embodiment of the present invention can apply a string hash function to the keyword and round the result down or up; rounding down can be regarded as taking the integer digits of the result; rounding up can be regarded as rounding the non-integer digits of the result and then combining them with the original integer digits, or as directly taking the last digit of the integer digits, and so on.
Step S620, taking the target integer modulo the set number of fragments to obtain a target modulus value; the target modulus value is the target data fragment number of the data fragment corresponding to the data to be accessed.
The set number of fragments can be a fixed number of fragments preset in the distributed storage system; the embodiment of the present invention can take the target integer, obtained by hashing and rounding the keyword, modulo the set number of fragments to obtain the target modulus value; the target modulus value can be regarded as the target data fragment number of the data fragment corresponding to the data to be accessed.
It should be noted that, because the set number of fragments is fixed, for example 10000, the target modulus value always falls within the value range defined by the set number of fragments, for example a range of 10000 values; that is, after the keyword of a given piece of data is hashed, rounded, and taken modulo, it always maps to the same fixed number, namely the target data fragment number of the data fragment corresponding to the data to be accessed.
Optionally, steps S610 to S620 may be regarded as an optional manner of determining the data fragment number of the data fragment corresponding to the data to be accessed according to the keyword, and embodiments of the present invention do not exclude other manners of obtaining the data fragment number of the data fragment corresponding to the data to be accessed based on the keyword of the data to be accessed.
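Steps S610 to S620 can be sketched as below, using BKDRHash as one of the string hash functions named above. This is only an illustration under stated assumptions: the seed 131, the 31-bit mask, the UTF-8 encoding of the keyword, and the function names are choices made for the sketch, and because this hash already yields an integer, no separate rounding step is needed here.

```python
SHARD_COUNT = 10000  # the fixed, preset number of fragments used in the example

def bkdr_hash(key: str, seed: int = 131) -> int:
    """BKDRHash over the UTF-8 bytes of the key, kept to 31 bits."""
    value = 0
    for byte in key.encode("utf-8"):
        value = (value * seed + byte) & 0x7FFFFFFF
    return value

def shard_number(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a keyword to its data fragment number (the target modulus value)."""
    target_integer = bkdr_hash(key)       # already an integer in this sketch
    return target_integer % shard_count   # always in range(shard_count)

example_shard = shard_number("ab")  # 97 * 131 + 98 = 12805, and 12805 % 10000 = 2805
```

Because the number of fragments is fixed, the same keyword always maps to the same fragment number, no matter which data fragment set currently holds that fragment.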
Step S630, determining the data fragment set number corresponding to the target data fragment number according to the corresponding relation between the data fragment number and the data fragment set number.
The correspondence between data fragment numbers and data fragment set numbers records the data fragment set to which each data fragment belongs, and can be updated when a data fragment is moved from an old data fragment set to a new data fragment set.
Step S640, obtaining the data fragment corresponding to the data to be accessed from the data fragment set corresponding to the data fragment set number.
After the data fragment set number corresponding to the target modulus value is determined, the data fragment set with that number can be addressed, and the data fragment whose number equals the target modulus value is obtained from that set; this is the data fragment corresponding to the data to be accessed, and data access in the distributed storage system is thereby realized.
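Steps S630 to S640, together with the update of the correspondence when a fragment is moved, might look like the following sketch. The class name `ShardRouter`, the in-memory dictionaries, and the sample fragment and set numbers are hypothetical; a real system would address storage servers rather than nested dictionaries.

```python
class ShardRouter:
    """Keeps the fragment-number -> set-number correspondence and resolves lookups."""

    def __init__(self, shard_to_set, sets):
        self.shard_to_set = shard_to_set  # fragment number -> set number
        self.sets = sets                  # set number -> {fragment number: fragment data}

    def locate(self, shard_no):
        """Step S630 then S640: find the set number, then fetch the fragment from it."""
        set_no = self.shard_to_set[shard_no]
        return set_no, self.sets[set_no][shard_no]

    def move(self, shard_no, new_set_no):
        """Move a fragment and update the correspondence so lookups keep working."""
        old_set_no = self.shard_to_set[shard_no]
        self.sets[new_set_no][shard_no] = self.sets[old_set_no].pop(shard_no)
        self.shard_to_set[shard_no] = new_set_no

# Hypothetical layout: fragment 2805 lives in set 1, fragment 17 in set 2.
router = ShardRouter(
    shard_to_set={2805: 1, 17: 2},
    sets={1: {2805: {"ab": "value"}}, 2: {17: {}}},
)
before = router.locate(2805)
router.move(2805, 2)
after = router.locate(2805)
```

The fragment number of a keyword never changes; only the set number stored in the correspondence does, which is why access still resolves correctly after a move.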
The data processing method provided by the embodiment of the present invention can move some data fragments from a data fragment set with a large data volume to a data fragment set with a small data volume on a different storage server, which improves the probability of balanced storage of data fragments among the storage servers in the distributed storage system, reduces the likelihood that a storage server is written full, improves the availability of the storage servers in the distributed storage system, and makes it possible to reduce the maintenance burden of the distributed storage system; meanwhile, when a data fragment is moved, the data fragment set to which it belongs is updated accordingly, so that data access in the distributed storage system can accurately find the data fragment set in which the data to be accessed is located, and normal data access in the distributed storage system is guaranteed.
The data processing apparatus provided by the embodiment of the present invention is introduced below; the data processing apparatus described below may be regarded as the program modules that a processing server needs to be provided with in order to implement the data processing method provided by the embodiment of the present invention; the description of the data processing apparatus below and the description of the data processing method above may be referred to in correspondence with each other.
Fig. 15 is a block diagram of a data processing apparatus according to an embodiment of the present invention, where the apparatus is applicable to a processing server, and referring to fig. 15, the data processing apparatus may include:
a data amount detection module 100, configured to detect a data amount of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and different storage servers correspond to different data fragment sets;
a shard set determining module 200, configured to determine, according to a data amount of each data shard set, a first data shard set and a second data shard set from each data shard set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
a target fragment determining module 300, configured to determine a target data fragment to be moved in a first data fragment set, where a data volume of the target data fragment is smaller than a data volume of the first data fragment set;
a data relocation module 400, configured to relocate the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out.
Optionally, the fragmentation set determining module 200 is configured to determine, according to the data amount of each data fragmentation set, a first data fragmentation set and a second data fragmentation set from each data fragmentation set, and specifically includes:
and determining a first data fragmentation set from the data fragmentation sets with the data volume larger than the data volume threshold value, and determining a second data fragmentation set from the data fragmentation sets with the data volume smaller than the data volume threshold value.
Optionally, the fragmentation set determining module 200 is configured to determine the first data fragmentation set from the data fragmentation set whose data volume is greater than the data volume threshold, and specifically includes:
taking the data fragmentation set with the largest data volume in the data fragmentation sets with the data volume larger than the data volume threshold value as a first data fragmentation set;
or, the data fragment sets with the data volume larger than the data volume threshold are respectively used as first data fragment sets;
or, taking several data fragment sets with the largest data volumes among the data fragment sets whose data volume is greater than the data volume threshold as first data fragment sets, respectively.
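The three alternatives for choosing first data fragment sets can be compared in a short sketch. The function names, the parameter `k`, and the sample volumes are assumptions introduced only for illustration.

```python
def over_threshold(volumes, threshold):
    """Sets whose data volume is greater than the threshold."""
    return {set_id: v for set_id, v in volumes.items() if v > threshold}

def largest_only(volumes, threshold):
    """Alternative 1: only the single largest over-threshold set."""
    over = over_threshold(volumes, threshold)
    return [max(over, key=over.get)] if over else []

def all_over(volumes, threshold):
    """Alternative 2: every over-threshold set is treated as a first set."""
    return sorted(over_threshold(volumes, threshold))

def top_k(volumes, threshold, k):
    """Alternative 3: the k over-threshold sets with the largest volumes."""
    over = over_threshold(volumes, threshold)
    return sorted(over, key=over.get, reverse=True)[:k]

# Hypothetical volumes in arbitrary units, with a threshold of 100.
volumes = {"set1": 130, "set2": 110, "set3": 40, "set4": 120}
first_single = largest_only(volumes, 100)
first_all = all_over(volumes, 100)
first_top2 = top_k(volumes, 100, 2)
```

Handling one set per pass keeps each rebalancing step small, while handling all or several over-threshold sets reduces the number of passes; the text leaves the choice open.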
Optionally, the fragmentation set determining module 200 is configured to determine the second data fragmentation set from the data fragmentation set whose data volume is smaller than the data volume threshold, and specifically includes:
determining a data volume difference value between the data volume of the first data fragment set and a data volume threshold;
determining a target data volume according to the data volume difference; wherein the target data amount is not less than the data amount difference;
determining, from the data fragment sets whose data volume is smaller than the data volume threshold, the data fragment sets for which the difference between the data volume threshold and the data volume is not smaller than the target data volume;
a second set of data shards is selected from the determined sets of data shards.
Optionally, the fragmentation set determining module 200 is configured to determine the target data volume according to the data volume difference, and specifically includes:
and determining at least one data fragment of which the data volume is not less than the data volume difference value and the data volume is closest to the data volume difference value in the first data fragment set, and taking the data volume of the at least one data fragment as a target data volume.
Optionally, the target fragment determining module 300 is configured to determine a target data fragment to be moved in the first data fragment set, and specifically includes:
and determining the data fragment corresponding to the target data volume from the first data fragment set as the target data fragment.
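The determination of the data volume difference, the target data volume, the target data fragment, and the candidate second data fragment sets can be sketched together. The function `plan_move` and all sample values are hypothetical; the sketch considers only a single target fragment, although the text also allows at least one fragment to be chosen.

```python
def plan_move(first_set_shards, other_volumes, threshold):
    """Pick the target fragment and the sets with enough headroom to receive it."""
    first_volume = sum(first_set_shards.values())
    diff = first_volume - threshold  # data volume difference
    # Fragments whose volume is not smaller than the difference.
    big_enough = {s: v for s, v in first_set_shards.items() if v >= diff}
    if not big_enough:
        return None
    # Among those, the smallest volume is the one closest to the difference.
    target_shard = min(big_enough, key=big_enough.get)
    target_volume = big_enough[target_shard]
    # Sets below the threshold whose headroom is not smaller than the target volume.
    candidates = [
        set_id
        for set_id, volume in other_volumes.items()
        if volume < threshold and threshold - volume >= target_volume
    ]
    return target_shard, target_volume, candidates

# Hypothetical data: the first set holds 130 units against a threshold of 100.
plan = plan_move(
    first_set_shards={"shard1": 50, "shard2": 35, "shard3": 45},
    other_volumes={"set2": 95, "set3": 40, "set4": 65},
    threshold=100,
)
```

Choosing the fragment closest to the difference moves as little data as possible while still bringing the first set down to the threshold.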
Optionally, the fragmentation set determining module 200 is configured to determine the second data fragmentation set from the data fragmentation set whose data volume is smaller than the data volume threshold, and may further include:
if, among the data fragment sets whose data volume is smaller than the data volume threshold, there is no single data fragment set for which the difference between the data volume threshold and the data volume is not smaller than the target data volume, dividing the target data fragment into a plurality of sub-target data fragments;
determining, from the data fragment sets whose data volume is smaller than the data volume threshold, a plurality of second data fragment sets for which the difference between the data volume threshold and the data volume is greater than or equal to the data volume of a sub-target data fragment;
correspondingly, the data relocation module 400 is configured to relocate the target data fragment from the first data fragment set to the second data fragment set, and specifically includes:
moving at least one sub-target data fragment into a second data fragment set, and ensuring that the data volume of the second data fragment set after the sub-target data fragment is moved in is not greater than the data volume threshold.
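When no single set has enough headroom, the division into sub-target data fragments could be planned as in the sketch below. The text does not say how a fragment is divided, so the greedy allocation by headroom, the function name `split_across_sets`, and the sample volumes are all assumptions.

```python
def split_across_sets(target_volume, other_volumes, threshold):
    """Return {set_id: sub_fragment_volume}, or None if total headroom is too small."""
    headroom = {
        set_id: threshold - volume
        for set_id, volume in other_volumes.items()
        if volume < threshold
    }
    plan = {}
    remaining = target_volume
    # Fill the sets with the most headroom first; no set goes above the threshold.
    for set_id in sorted(headroom, key=headroom.get, reverse=True):
        if remaining == 0:
            break
        piece = min(headroom[set_id], remaining)
        plan[set_id] = piece
        remaining -= piece
    return plan if remaining == 0 else None

# Hypothetical case: a 50-unit target fragment, but no set has 50 units of headroom.
others = {"set2": 80, "set3": 70, "set4": 90}
split_plan = split_across_sets(50, others, threshold=100)
```

Each receiving set ends at or below the threshold, which matches the requirement that its data volume after the move is not greater than the data volume threshold.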
Optionally, the data relocation module may implement relocation of the target data fragment in a snapshot form; optionally, the data relocation module 400 is configured to relocate the target data fragment from the first data fragment set to the second data fragment set, and specifically includes:
acquiring a snapshot of a target data fragment in a first data fragment set;
determining the journal log entries recorded after the timestamp at which the snapshot was taken;
and importing the snapshot into the second data fragment set so that the second data fragment set has the target data fragment, and replaying those journal log entries on the target data fragment in the second data fragment set.
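The snapshot-based relocation can be illustrated with a small in-memory model. The `Shard` class, the integer timestamps, and the tuple layout of journal entries are hypothetical; the sketch only shows why replaying the entries recorded after the snapshot timestamp brings the imported copy up to date.

```python
from dataclasses import dataclass, field

@dataclass
class Shard:
    data: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)  # entries: (timestamp, key, value)

    def write(self, timestamp, key, value):
        """Apply a write and record it in the journal log."""
        self.data[key] = value
        self.journal.append((timestamp, key, value))

    def snapshot(self):
        """Point-in-time copy of the fragment's data."""
        return dict(self.data)

def import_and_replay(snapshot, journal, snapshot_time):
    """Import the snapshot, then replay journal entries newer than the snapshot."""
    target = Shard(data=dict(snapshot))
    for timestamp, key, value in journal:
        if timestamp > snapshot_time:
            target.write(timestamp, key, value)
    return target

source = Shard()
source.write(1, "a", "v1")
source.write(2, "b", "v1")
snapshot_time = 2
snap = source.snapshot()
source.write(3, "a", "v2")  # writes that arrive after the snapshot was taken
source.write(4, "c", "v1")

target = import_and_replay(snap, source.journal, snapshot_time)
```

The source fragment can keep accepting writes while the snapshot is transferred, since everything written after the snapshot timestamp is recovered from the journal log.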
Optionally, fig. 16 shows another structural block diagram of the data processing apparatus according to the embodiment of the present invention, and in combination with fig. 15 and fig. 16, the data processing apparatus may further include:
the relationship updating module 500 is configured to update a data shard set corresponding to a target data shard from a first data shard set to a second data shard set in a corresponding relationship between the data shard and the data shard set.
Optionally, the correspondence between the data shards and the data shard set may include: the corresponding relation between the data fragment number and the data fragment set number;
accordingly, fig. 17 shows a further structural block diagram of the data processing apparatus according to the embodiment of the present invention, and in combination with fig. 16 and 17, the data processing apparatus may further include:
an access processing module 600, configured to obtain an access request, where the access request carries a keyword of data to be accessed; determining the target data fragment number of the data fragment corresponding to the data to be accessed according to the keyword; determining a data fragment set number corresponding to the target data fragment number according to the corresponding relation between the data fragment number and the data fragment set number; and acquiring the data fragment corresponding to the data to be accessed from the data fragment set corresponding to the data fragment set number.
Optionally, the access processing module 600 is configured to determine, according to the keyword, a target data fragment number of a data fragment corresponding to data to be accessed, and specifically includes:
performing hash processing on the keyword, and rounding the result to obtain a target integer;
taking the target integer modulo the set number of fragments to obtain a target modulus value; the target modulus value is the target data fragment number of the data fragment corresponding to the data to be accessed.
The functional modules described above may be implemented by corresponding programs loaded on a processing server, the hardware structure of the processing server may be as shown in fig. 3, and the processing server may at least include: at least one memory and at least one processor;
the memory stores programs, and the processor calls the programs stored in the memory;
the program is for:
detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and different storage servers correspond to different data fragment sets;
determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set;
moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A data processing method, comprising:
detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and different storage servers correspond to different data fragment sets;
determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set;
moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out;
wherein the moving the target data shard from the first data shard set to the second data shard set comprises:
acquiring a snapshot of a target data fragment in a first data fragment set;
determining the journal log entries recorded after the timestamp at which the snapshot was taken;
and importing the snapshot into the second data fragment set so that the second data fragment set has the target data fragment, and replaying those journal log entries on the target data fragment in the second data fragment set.
2. The data processing method of claim 1, wherein determining the first set of data shards and the second set of data shards from the data shard sets according to the data amount of the data shard sets comprises:
and determining a first data fragmentation set from the data fragmentation sets with the data volume larger than the data volume threshold value, and determining a second data fragmentation set from the data fragmentation sets with the data volume smaller than the data volume threshold value.
3. The data processing method of claim 2, wherein determining the first set of data shards from the set of data shards having the amount of data greater than the threshold amount of data comprises:
taking the data fragmentation set with the largest data volume in the data fragmentation sets with the data volume larger than the data volume threshold value as a first data fragmentation set;
or, the data fragment sets with the data volume larger than the data volume threshold are respectively used as first data fragment sets;
or, taking several data fragment sets with the largest data volumes among the data fragment sets whose data volume is greater than the data volume threshold as first data fragment sets, respectively.
4. The data processing method of claim 2, wherein determining the second set of data shards from the set of data shards having the amount of data less than the data amount threshold comprises:
determining a data volume difference value between the data volume of the first data fragment set and a data volume threshold;
determining a target data volume according to the data volume difference; wherein the target data amount is not less than the data amount difference;
determining, from the data fragment sets whose data volume is smaller than the data volume threshold, the data fragment sets for which the difference between the data volume threshold and the data volume is not smaller than the target data volume;
a second set of data shards is selected from the determined sets of data shards.
5. The data processing method of claim 4, wherein the determining a target data volume from the data volume difference comprises:
and determining at least one data fragment of which the data volume is not less than the data volume difference value and the data volume is closest to the data volume difference value in the first data fragment set, and taking the data volume of the at least one data fragment as a target data volume.
6. The data processing method according to claim 4, wherein the determining a target data fragment to be migrated in the first set of data fragments comprises:
and determining the data fragment corresponding to the target data volume from the first data fragment set as the target data fragment.
7. The data processing method of claim 4, wherein determining the second set of data shards from the set of data shards having the amount of data less than the data amount threshold further comprises:
if, among the data fragment sets whose data volume is smaller than the data volume threshold, there is no single data fragment set for which the difference between the data volume threshold and the data volume is not smaller than the target data volume, dividing the target data fragment into a plurality of sub-target data fragments;
and determining, from the data fragment sets whose data volume is smaller than the data volume threshold, a plurality of second data fragment sets for which the difference between the data volume threshold and the data volume is greater than or equal to the data volume of a sub-target data fragment.
8. The data processing method of claim 7, wherein the migrating the target data shard from a first set of data shards to a second set of data shards comprises:
moving at least one sub-target data fragment into a second data fragment set, and ensuring that the data volume of the second data fragment set after the sub-target data fragment is moved in is not greater than the data volume threshold.
9. The data processing method according to any one of claims 1 to 8, further comprising:
and in the corresponding relation between the data fragments and the data fragment set, updating the data fragment set corresponding to the target data fragment from the first data fragment set to the second data fragment set.
10. The data processing method according to claim 9, wherein the correspondence between the data shards and the data shard sets comprises: the corresponding relation between the data fragment number and the data fragment set number;
the method further comprises the following steps:
acquiring an access request, wherein the access request carries a keyword of data to be accessed;
determining the target data fragment number of the data fragment corresponding to the data to be accessed according to the keyword;
determining a data fragment set number corresponding to the target data fragment number according to the corresponding relation between the data fragment number and the data fragment set number;
and acquiring the data fragment corresponding to the data to be accessed from the data fragment set corresponding to the data fragment set number.
11. The data processing method according to claim 10, wherein the determining, according to the keyword, a target data fragment number of a data fragment corresponding to the data to be accessed comprises:
performing hash processing on the keyword, and rounding the result to obtain a target integer;
taking the target integer modulo the set number of fragments to obtain a target modulus value; the target modulus value is the target data fragment number of the data fragment corresponding to the data to be accessed.
12. A data processing apparatus, comprising:
the data volume detection module is used for detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and different storage servers correspond to different data fragment sets;
the fragment set determining module is used for determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
the target fragment determining module is used for determining a target data fragment to be moved in a first data fragment set, and the data volume of the target data fragment is smaller than that of the first data fragment set;
the data relocation module is used for moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out;
the data relocation module is specifically configured to:
acquiring a snapshot of a target data fragment in a first data fragment set;
determining the journal log entries recorded after the timestamp at which the snapshot was taken;
and importing the snapshot into the second data fragment set so that the second data fragment set has the target data fragment, and replaying those journal log entries on the target data fragment in the second data fragment set.
13. The data processing apparatus of claim 12, further comprising:
the relationship updating module is used for updating a data fragment set corresponding to the target data fragment from a first data fragment set to a second data fragment set in the corresponding relationship between the data fragment and the data fragment set; the corresponding relationship between the data shards and the data shard set comprises: the corresponding relation between the data fragment number and the data fragment set number;
the access processing module is used for acquiring an access request, and the access request carries a keyword of data to be accessed; determining the target data fragment number of the data fragment corresponding to the data to be accessed according to the keyword; determining a data fragment set number corresponding to the target data fragment number according to the corresponding relation between the data fragment number and the data fragment set number; and acquiring the data fragment corresponding to the data to be accessed from the data fragment set corresponding to the data fragment set number.
14. A processing server, comprising: at least one memory and at least one processor;
the memory stores programs, and the processor calls the programs stored in the memory;
the program is for:
detecting the data volume of each data fragment set; wherein one data fragment set corresponds to a plurality of data fragments, one storage server corresponds to at least one data fragment set, and different storage servers correspond to different data fragment sets;
determining a first data fragment set and a second data fragment set from each data fragment set according to the data volume of each data fragment set; the data volume of the first data fragment set is greater than that of the second data fragment set, and the first data fragment set and the second data fragment set correspond to different storage servers;
determining a target data fragment to be moved in a first data fragment set, wherein the data volume of the target data fragment is smaller than that of the first data fragment set;
moving the target data fragment from the first data fragment set to the second data fragment set; wherein the data volume of the second data fragment set after the target data fragment is moved in is not greater than the data volume of the first data fragment set before the target data fragment is moved out;
wherein the moving the target data shard from the first data shard set to the second data shard set comprises:
acquiring a snapshot of a target data fragment in a first data fragment set;
determining the journal log entries recorded after the timestamp at which the snapshot was taken;
and importing the snapshot into the second data fragment set so that the second data fragment set has the target data fragment, and replaying those journal log entries on the target data fragment in the second data fragment set.
CN201710378791.5A 2017-05-25 2017-05-25 Data processing method and device and processing server Active CN108932104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710378791.5A CN108932104B (en) 2017-05-25 2017-05-25 Data processing method and device and processing server

Publications (2)

Publication Number Publication Date
CN108932104A CN108932104A (en) 2018-12-04
CN108932104B true CN108932104B (en) 2021-06-25

Family

ID=64451495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710378791.5A Active CN108932104B (en) 2017-05-25 2017-05-25 Data processing method and device and processing server

Country Status (1)

Country Link
CN (1) CN108932104B (en)

Also Published As

Publication number Publication date
CN108932104A (en) 2018-12-04

Similar Documents

Publication Publication Date Title
US10394847B2 (en) Processing data in a distributed database across a plurality of clusters
CN111247518B (en) Method and system for database sharding
CN110147407B (en) Data processing method and device and database management server
US8214388B2 (en) System and method for adding a storage server in a distributed column chunk data store
US20130097402A1 (en) Data prefetching method for distributed hash table dht storage system, node, and system
CN112100293A (en) Data processing method, data access method, data processing device, data access device and computer equipment
CN106294421B (en) Data writing and reading method and device
KR20170054299A (en) Reference block aggregating into a reference set for deduplication in memory management
US20190179752A1 (en) Multi-level caching method and multi-level caching system for enhancing graph processing performance
CN107430551B (en) Data caching method, storage control device and storage equipment
CN111159436A (en) Method and device for recommending multimedia content and computing equipment
CN106909595B (en) Data migration method and device
JP2014232483A (en) Database system, retrieval method and program
US10771358B2 (en) Data acquisition device, data acquisition method and storage medium
US9658774B2 (en) Storage system and storage control method
US10915533B2 (en) Extreme value computation
CN112000467A (en) Data tilt processing method and device, terminal equipment and storage medium
CN108932104B (en) Data processing method and device and processing server
US20150081710A1 (en) Data typing with probabilistic maps having imbalanced error costs
US9898518B2 (en) Computer system, data allocation management method, and program
CN106682130B (en) Similar picture detection method and device
CN117009389A (en) Data caching method, device, electronic equipment and readable storage medium
CN112035498B (en) Data block scheduling method and device, scheduling layer node and storage layer node
CN115970295A (en) Request processing method and device and electronic equipment
CN113806354B (en) Method and device for realizing time sequence feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant