CN114625695A

CN114625695A - Data processing method and device

Info

Publication number: CN114625695A
Application number: CN202210319540.0A
Authority: CN
Inventors: 颜红波; 王志强; 毛耀宽; 张锋; 裴晓辉; 徐立
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2022-06-14

Abstract

The embodiments of this specification provide a data processing method and a data processing device. The data processing method comprises the following steps: receiving a data processing instruction, and acquiring a data block to be processed and a target data block based on the data processing instruction; reading the data to be processed stored in the data block to be processed, and determining an associated data block corresponding to the data to be processed according to the data to be processed; and reading the associated data corresponding to the data to be processed and stored in the associated data block, and migrating the data to be processed and the associated data to the target data block. In this way, the data to be processed and the associated data are managed together, and the data migration speed is improved.

Description

Data processing method and device

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a data processing method.

Background

In a file storage system, portions of the same file are stored in different locations, so the storage space occupied by the file is discontinuous. Because data is read many times, data fragmentation becomes severe and the performance of the file storage system suffers. Migrating fragmented data is difficult and slow, so a more effective data processing method is urgently needed to solve the above problems.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies of the prior art.

According to a first aspect of embodiments herein, there is provided a data processing method including:

receiving a data processing instruction, and acquiring a data block to be processed and a target data block based on the data processing instruction;

reading the data to be processed stored in the data block to be processed, and determining an associated data block corresponding to the data to be processed according to the data to be processed;

and reading the associated data corresponding to the data to be processed and stored in the associated data block, and migrating the data to be processed and the associated data to the target data block.

Optionally, the obtaining the to-be-processed data block based on the data processing instruction includes:

determining at least two data blocks in response to the data processing instruction;

calculating the fragment rate of each data block based on a preset fragment rate calculation strategy;

and selecting the data block to be processed from the at least two data blocks according to the fragmentation rate of each data block.

Optionally, the selecting a data block to be processed from the at least two data blocks according to the fragmentation rate of each data block includes:

sorting each data block according to the fragment rate of each data block, and selecting a preset number of data blocks as data blocks to be processed; or,

and determining the data blocks with the fragmentation rate larger than a preset fragmentation rate threshold value as the data blocks to be processed.

Optionally, the calculating the fragmentation rate of each data block based on the preset fragmentation rate calculation policy includes:

traversing data stored in each data block, and acquiring effective data information and data slice information stored in each data block;

the fragmentation rate of each data block is calculated based on the valid data information and the data slice information of each data block.

Optionally, the reading the to-be-processed data stored in the to-be-processed data block includes:

and under the condition that the data block to be processed is a data file to be processed, acquiring data of the file to be processed stored in the data file to be processed.

Optionally, the determining, according to the to-be-processed data, an associated data block corresponding to the to-be-processed data includes:

determining a user file according to the file data to be processed;

and determining an associated data file corresponding to the file data to be processed based on the user file.

Optionally, the determining the user file according to the file data to be processed includes:

acquiring file index information corresponding to the file data to be processed;

and determining the user file corresponding to the file data to be processed according to the file index information.

Optionally, the determining, based on the user file, an associated data file corresponding to the file data to be processed includes:

and determining an associated data file corresponding to the file data to be processed according to the file index information and the user file.

Optionally, after the step of migrating the to-be-processed data and the associated data to the target data file is executed, the method further includes:

and updating the file index information according to the target data block.

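Optionally, the reading the to-be-processed data stored in the to-be-processed data block includes:

and under the condition that the data block to be processed is a layer file to be processed, acquiring index data to be processed stored in the layer file to be processed.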

Optionally, the determining, according to the data to be processed, an associated data block corresponding to the data to be processed includes:

acquiring an index file corresponding to the index data to be processed;

and determining an associated layer file corresponding to the index data to be processed based on the index file.

Optionally, migrating the to-be-processed data and the associated data to the target data block, including:

traversing the layer file to be processed and the associated layer file based on a preset iteration strategy to obtain index data to be processed and associated index data;

and storing the index data to be processed and the associated index data to a target layer file.

According to a second aspect of embodiments of the present specification, there is provided a data processing apparatus comprising:

the receiving module is configured to receive a data processing instruction, and obtain a data block to be processed and a target data block based on the data processing instruction;

the reading module is configured to read the data to be processed stored in the data block to be processed, and determine an associated data block corresponding to the data to be processed according to the data to be processed;

and the migration module is configured to read the associated data corresponding to the data to be processed and stored in the associated data block, and migrate the data to be processed and the associated data to the target data block.

According to a third aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, and the computer-executable instructions realize the steps of the data processing method when being executed by the processor.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method described above.

According to a fifth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned data processing method.

In one embodiment of this specification, a data processing instruction is received, and a data block to be processed and a target data block are acquired based on the data processing instruction; the data to be processed stored in the data block to be processed is read, and an associated data block corresponding to the data to be processed is determined according to the data to be processed; and the associated data corresponding to the data to be processed and stored in the associated data block is read, and the data to be processed and the associated data are migrated to the target data block. In this way, the data to be processed and the associated data are managed together, and the data migration speed is improved.
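
The migration step summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: data blocks are modelled as Python lists of (user_file, logical_offset, payload) tuples, the function name migrate is hypothetical, and writing the collected data in logical-offset order is one plausible way to make the migrated file contiguous, not a detail stated in the specification.

```python
def migrate(pending_block, associated_blocks, target_block, user_file):
    """Move every extent of user_file from the source blocks into target_block.

    Each block is a list of (user_file, logical_offset, payload) tuples.
    Returns the number of extents moved.
    """
    sources = [pending_block, *associated_blocks]
    extents = []
    for block in sources:
        # Collect this user file's extents and leave everything else in place.
        extents.extend(e for e in block if e[0] == user_file)
        block[:] = [e for e in block if e[0] != user_file]
    # Assumption: writing in logical-offset order keeps the file contiguous.
    extents.sort(key=lambda e: e[1])
    target_block.extend(extents)
    return len(extents)
```

In this sketch the source blocks are edited in place; a real append-only store would more likely mark the old extents as invalid and reclaim them later.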

Drawings

FIG. 1 is a diagram illustrating a data processing method according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present specification;

FIG. 3 is a data migration flow diagram of a data processing method provided in one embodiment of the present specification;

FIG. 4 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure;

FIG. 5 is a flowchart illustrating a processing procedure of another data processing method according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;

FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of ..." or "when ..." or "in response to a determination", depending on the context.

First, the terms used in one or more embodiments of the present specification are explained.

Key-value store (key-value): data is stored as key-value pairs, where the key identifies an entry and the value is the data stored under it. A key-value distributed storage system offers fast queries, a large storage capacity, and high concurrency, and is suited to queries by primary key.

Append-Only: data records are written to the data file in append mode, which turns random writes into sequential writes.

NAS file storage: NAS (Network Attached Storage) transfers data over standard network protocols and provides file sharing and data backup for computers running various operating systems on the network.

File defragmentation: rearranging the data of a file, merging small pieces of data into larger pieces, and tidying the data distribution of the whole file.

KVServer: a metadata server used to store small data. It records the storage space occupied by the small data and sets a storage threshold; when the small data reaches the storage threshold, it is written to the data file together. The storage threshold may be 512 bytes.
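
The threshold behaviour described for the KVServer can be sketched as follows. This is only an illustration under stated assumptions: the class SmallWriteBuffer, its fields, and the in-memory list standing in for the data file are hypothetical, and only the 512-byte threshold value is taken from the text.

```python
from dataclasses import dataclass, field

STORAGE_THRESHOLD = 512  # bytes; the example threshold given in the text


@dataclass
class SmallWriteBuffer:
    """Hypothetical stand-in for the small-data staging done by the KVServer."""
    threshold: int = STORAGE_THRESHOLD
    pending: list = field(default_factory=list)    # small records not yet flushed
    pending_size: int = 0                          # bytes occupied by pending records
    data_file: list = field(default_factory=list)  # stand-in for the data file

    def put(self, record: bytes) -> bool:
        """Stage a small record and flush all staged records at the threshold.

        Returns True when this call triggered a flush.
        """
        self.pending.append(record)
        self.pending_size += len(record)
        if self.pending_size >= self.threshold:
            # Write the staged records to the data file as one chunk.
            self.data_file.append(b"".join(self.pending))
            self.pending.clear()
            self.pending_size = 0
            return True
        return False
```

Staging small writes this way means the data file receives one larger chunk instead of many tiny ones.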

Fig. 1 illustrates a schematic diagram of a data processing method provided according to an embodiment of the present disclosure. Fig. 1 shows how data is stored in a distributed file storage system. Data in the same NAS user file is stored, in units of data blocks, in multiple memory blocks of an in-memory data file; the data in the memory blocks is then stored in multiple data files, and index information is generated at the same time. The index information is recorded by multiple sub-layer files in a layer file L0, and each time the data is updated, the update record is also stored in a sub-layer file. The layer file L0 may be a disk and records index information corresponding to data with higher activity, while the layer file L1 records index information corresponding to data with lower activity. The data of one user file is thus ultimately saved in multiple data files, and those data files contain data fragments.

In the distributed file storage system, data is written in Append-Only mode and may be accessed many times after it is written. This fragments the data files, degrades data file access performance, complicates data defragmentation, and makes data relocation tasks more difficult. The main reason user file data becomes fragmented is that data is written in a relatively discrete manner. Discrete scenarios include: the front-end machine writes small data blocks directly, so that small data blocks are mixed with other data blocks in a data file and, in the extreme case, multiple clients write small data blocks directly through multiple front-end machines; large and small blocks of data are written in a mixed manner, with large blocks written directly by the front-end machine and small blocks written to the KVServer side and transferred from there; and overwrites fragment the data, especially persistent random writes. Therefore, this embodiment provides a data processing method that addresses data fragmentation in a distributed file storage system and achieves file defragmentation, thereby improving performance.
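
The effect of overwrites in an append-only layout can be seen in a small sketch. The model is an assumption made for illustration: the log is a list of (user_file, logical_offset, length) records, extent_map is a hypothetical dictionary from a logical range to its current physical position, and all function names are invented here.

```python
def append_write(log, extent_map, user_file, logical_offset, length):
    """Append a record at the tail of the log and repoint the logical range."""
    physical_offset = sum(rec[2] for rec in log)  # next free position at the tail
    log.append((user_file, logical_offset, length))
    extent_map[(user_file, logical_offset)] = (physical_offset, length)
    return physical_offset


def physical_layout(extent_map, user_file):
    """Physical (offset, length) of a user file's live extents, in logical order."""
    keys = sorted(k for k in extent_map if k[0] == user_file)
    return [extent_map[k] for k in keys]


def is_contiguous(layout):
    """True when each extent starts exactly where the previous one ends."""
    return all(
        prev_off + prev_len == off
        for (prev_off, prev_len), (off, _) in zip(layout, layout[1:])
    )
```

Writing three 4096-byte blocks in order leaves the file contiguous; overwriting the middle block appends a new copy at the tail, so reading the file in logical order now jumps between two regions of the log.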

In the present specification, a data processing method is provided, and the present specification simultaneously relates to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.

Referring to fig. 2, fig. 2 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.

Step 202: receiving a data processing instruction, and acquiring a data block to be processed and a target data block based on the data processing instruction.

Specifically, the data processing instruction is a computer command in response to which data processing is performed; it includes, but is not limited to, a data acquisition instruction, a data storage instruction, and the like. In this embodiment, a data block to be processed and a target data block are acquired in response to the data processing instruction. The data block to be processed is a data file or data storage unit that already stores data on which operations such as reading, modifying, and migrating need to be performed. The target data block is a data file or data unit used to store data migrated from the data block to be processed or from other data blocks.

Based on the above, after the data processing instruction is received, the data block to be processed, which needs to be processed, is acquired in response to the data processing instruction, and the target data block for storing the data migrated from the data block to be processed is acquired at the same time. In practical application, in a scenario of migrating data fragments in a data file, the data file is a data block to be processed, and a file for storing the migrated data fragments is a target data block.

Further, because the number of data blocks is large, the data blocks to be processed need to be selected from a plurality of data blocks when they are acquired based on the data processing instruction. The data blocks to be processed are selected according to the fragmentation rate of each data block, which is specifically implemented as follows:

determining at least two data blocks in response to the data processing instruction; calculating the fragment rate of each data block based on a preset fragment rate calculation strategy; and selecting the data block to be processed from the at least two data blocks according to the fragmentation rate of each data block.

Specifically, the data block refers to a data storage unit or a data file, and is used for storing data, and can perform operations such as reading, modifying, migrating and the like on the data stored in the data block; the fragment rate of the data block is a calculation result obtained by calculating the effective data and the fragment data stored in the data block by adopting a fragment rate calculation formula, and is used for expressing the proportion of the fragment data in the effective data stored in the data block; the preset fragment rate calculation strategy refers to a preset calculation method or formula for calculating the fragment rate of the data block.

Based on this, after the data processing instruction is received, at least two data blocks corresponding to the data processing instruction are determined among the plurality of data blocks in response to the data processing instruction, and the at least two data blocks are used as processing objects. For the at least two data blocks, the fragment data and the effective data stored in each data block are read, the fragment rate of each data block is calculated from the fragment data and the effective data based on a preset fragment rate calculation strategy, and a data block to be processed is selected from the at least two data blocks according to the fragment rate of each data block. It should be noted that the number of data blocks to be processed may be one or more, which is not specifically limited in this embodiment.

In summary, the data block to be processed is determined from the at least two data blocks by calculating the fragmentation rate of each data block, so that data processing is performed on the data block to be processed, the problem of data fragmentation is solved, and the system performance is improved.

Further, when the fragmentation rate of each data block is calculated based on the preset fragmentation rate calculation strategy, the amount of data stored in each data block differs, and so does the fragment data in each data block, that is, the number of data slices. The fragmentation rate may therefore be calculated based on the data slices and the data amount in each data block, which is specifically implemented as follows:

traversing data stored in each data block, and acquiring effective data information and data slice information stored in each data block; the fragmentation rate of each data block is calculated based on the valid data information and the data slice information of each data block.

Specifically, the valid data refers to data stored in the data block, and correspondingly, the valid data information refers to quantity information of the valid data, which can be represented by the size of the occupied storage space; the data slice refers to fragment data in the data block, and the data slice information refers to quantity information of the data slice, which can be expressed by the size of the occupied storage space, such as: 1GB data slice, 1MB data slice, etc., and may also be expressed by the number of data slices, such as: 1 piece of data, 100 pieces of data.

Based on the above, when the fragmentation rate of each data block is calculated based on the preset fragmentation rate calculation strategy, the data stored in each data block is traversed, the effective data information of the effective data stored in the data block is calculated, and the data slice information of the data slice stored in the data block is calculated. And calculating the effective data information and the data slice information corresponding to each data block to obtain the fragment rate of each data block.

In summary, the fragmentation rate of the data block is calculated by obtaining the valid data and the data slice of the data block, so that the accuracy of calculating the fragmentation rate of the data block is improved.
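
The traversal that gathers the two inputs of the fragmentation rate can be sketched as follows. The record layout, the live flag, and the rule that every live record counts as one data slice are assumptions made for illustration; the specification does not define how a data slice is recognized.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    user_file: str   # user file the record belongs to
    length: int      # size of the record in bytes
    live: bool       # False when the record has been superseded


class BlockStats(NamedTuple):
    effective_size: int  # valid data information, in bytes
    slice_count: int     # data slice information, as a count


def collect_block_stats(records: Iterable[Record]) -> BlockStats:
    """Traverse one data block and total its valid bytes and data slices."""
    effective_size = 0
    slice_count = 0
    for rec in records:
        if not rec.live:
            continue  # superseded records are not valid data
        effective_size += rec.length
        slice_count += 1  # assumption: each live record is one data slice
    return BlockStats(effective_size, slice_count)
```

The two totals correspond to the valid data information and the data slice information that the text feeds into the fragmentation rate calculation.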

Further, when the data block to be processed is selected according to the fragmentation rate of the data block, various selection modes can be adopted, and the following specific implementation can be achieved:

sorting each data block according to the fragment rate of each data block, and selecting a preset number of data blocks as data blocks to be processed; or determining the data block with the fragmentation rate larger than the preset fragmentation rate threshold value as the data block to be processed.

Specifically, the preset fragmentation rate threshold refers to a preset parameter value, and is used as a reference value of the fragmentation rate when selecting a data block to be processed according to the fragmentation rate of the data block, and the data block with the fragmentation rate greater than, less than, or equal to the preset fragmentation rate threshold is used as the data block to be processed.

Based on this, when the data block to be processed is selected according to the fragmentation rate of each data block, either of the following two methods may be used. In the first, the data blocks are sorted by fragmentation rate to obtain a data block sequence ordered from high to low or from low to high, the number of data blocks to be selected is preset, and that preset number of data blocks is selected from the sequence as the data blocks to be processed. In the second, a fragmentation rate threshold is preset, and data blocks whose fragmentation rate is greater than or less than the preset fragmentation rate threshold are selected as the data blocks to be processed.

For example, fig. 3 is a data migration flowchart of a data processing method provided in an embodiment of the present specification. As shown in fig. 3 (a), when the data block to be processed is a data file to be processed, the data fragments stored in the data file are processed after the data processing instruction is received: the data fragments stored in the data file are read and migrated to a target data file. Data file 1, data file 2, data file 3, and data file 4 are determined in response to the data read instruction. The data stored in each data file is traversed to acquire the effective data information and the fragment data information stored in each data file, and the fragment rate of each data file is calculated using fragment rate calculation formula (1).

In formula (1), frannum represents the number of pieces of fragment data written into a data file by a user file, EffectiveSize represents the effective data amount in the current data file, and fransizebenchmark is a fixed normalization factor of 4096. When the fragment data is 1024 pieces of data and the valid data is 2GB, the fragment rate is calculated to be 2/512 bytes, that is, there are 2 pieces of fragment data per 512 bytes. The fragmentation rates of data file 1, data file 2, data file 3, and data file 4 are calculated to be 1/512 byte, 2/512 byte, 1/512 byte, and 1/512 byte, respectively. The data files are sorted by fragment rate from high to low, and data file 2, which has the highest fragment rate, is selected as the data block to be processed; alternatively, the fragmentation threshold is set to 2/512 bytes, and data file 2, whose fragment rate is greater than or equal to the fragmentation threshold, is taken as the data block to be processed.

In summary, the data blocks to be processed are selected from the at least two data blocks by presetting the fragmentation rate threshold or sorting the fragmentation rate, so that the diversity of the selection modes of the data blocks to be processed is improved.
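
The two selection modes can be sketched as follows, using the per-file rates quoted in the example. Formula (1) itself is not reproduced in the text, so the fragmentation_rate function below is only one plausible reading of the three quantities described (slice count, effective size, and the 4096 normalization factor) and should be treated as an assumption; all function names are hypothetical.

```python
FRAG_SIZE_BENCHMARK = 4096  # normalization factor quoted in the text


def fragmentation_rate(slice_count, effective_size, benchmark=FRAG_SIZE_BENCHMARK):
    """Assumed rate: data slices per benchmark-sized unit of valid data."""
    if effective_size == 0:
        return 0.0
    return slice_count * benchmark / effective_size


def select_top_n(rates, n):
    """Sort by rate from high to low and keep the first n data blocks."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def select_by_threshold(rates, threshold):
    """Keep data blocks whose rate reaches the threshold.

    The comparison is >= to follow the worked example; the claim wording
    says "larger than", so a strict > is an equally valid choice.
    """
    return [name for name, rate in rates.items() if rate >= threshold]


# Rates taken from the example for data files 1 to 4.
rates = {
    "data_file_1": 1 / 512,
    "data_file_2": 2 / 512,
    "data_file_3": 1 / 512,
    "data_file_4": 1 / 512,
}
top = select_top_n(rates, 1)
over = select_by_threshold(rates, 2 / 512)
```

With these rates, both modes pick data file 2, which matches the outcome described in the example.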

Step 204: reading the data to be processed stored in the data block to be processed, and determining an associated data block corresponding to the data to be processed according to the data to be processed.

Specifically, the data to be processed refers to part or all of the data stored in the data block to be processed; the associated data block refers to a data file or a data unit for storing associated data having an association relationship with the data to be processed, where the association relationship between the associated data and the data to be processed may be that the associated data and the data to be processed come from the same user file.

Based on this, after the data block to be processed is obtained, the data to be processed stored in it is read. An associated data block having an association relationship with the data to be processed is then determined according to the data to be processed, so that the associated data stored in the associated data block can be read subsequently.

Further, the data block to be processed may be a data unit for storing various kinds of data. When the data block to be processed is a data file to be processed that stores file data to be processed, the file data to be processed needs to be acquired in order to process the data file, which is specifically implemented as follows:

and under the condition that the data block to be processed is a data file to be processed, acquiring data of the file to be processed stored in the data file to be processed.

Specifically, the data file to be processed refers to a file or a data storage unit that needs to process part or all of the data stored in the data file, and correspondingly, the data of the file to be processed refers to part or all of the data stored in the data file to be processed.

Based on the above, when the data block to be processed is the data file to be processed, the data stored in the data file to be processed is read, and all or part of the read data is used as the file data to be processed, so that data processing can subsequently continue on the file data to be processed.

In summary, when the data block to be processed is the data file to be processed, the data file to be processed is processed, so that the universality of data processing is realized.

Further, after the file data to be processed is determined, considering that data having an association relationship with the file data to be processed may be stored in a plurality of associated data blocks, in order to implement integrity of data processing, it is also necessary to determine an associated data file corresponding to the file data to be processed, which is specifically implemented as follows:

determining a user file according to the file data to be processed; and determining an associated data file corresponding to the file data to be processed based on the user file.

Specifically, the user file refers to a data file for storing data. In this embodiment, the data in the user file may be stored in a plurality of data files, and the data file to be processed is one or more of the plurality of data files. An associated data file corresponding to the data file to be processed may be determined according to the data to be processed stored in the data file to be processed, and the associated data file is another data file of the plurality of data files other than the data file to be processed.

Based on this, after the file data to be processed is determined, the file data to be processed corresponds to the user file, so that the user file corresponding to the file data to be processed can be determined based on the file data to be processed, the user file comprises the file data to be processed, and the user data except the file data to be processed is also included, and the user data is stored in one or more associated data files, so that the associated data file corresponding to the file data to be processed can be determined based on the user file, and the associated data in the associated data file, which has an associated relationship with the file data to be processed, can be read.

In summary, the user file can be determined through the file data to be processed, so that the associated data file is determined, the associated data stored in the associated data file is further read, the file data to be processed and the associated data corresponding to the user file are read, and the accuracy of data reading is improved.

Further, when determining the user file according to the file data to be processed, considering that the user file contains a large amount of data, and the data corresponding to the user file may be divided into a plurality of portions to be stored in different locations, file index information may be generated when storing the data, and the file index information is used for recording the data and the storage locations of the data, so as to facilitate subsequent data reading, and the specific implementation is as follows:

acquiring file index information corresponding to the file data to be processed; and determining the user file corresponding to the file data to be processed according to the file index information.

Specifically, the file index information is used to indicate a storage location of the file data to be processed and a data source condition. In this embodiment, the file index information is stored in the form of key value pairs, where in a key value pair, a key may represent a data name and an identifier corresponding to data, and a value may represent a storage location of data or specific data. According to the user file and the data corresponding to the user file, file index information is generated according to the data storage conditions of the data file to be processed and the associated data file, and is used for recording the corresponding relations between the data of the file to be processed and the user file and between the associated data and the user file, namely the storage position of the data of the file to be processed in the user file and the storage position of the data of the file to be processed in the data file to be processed.

Based on this, when the user file is determined according to the file data to be processed, the file index information corresponding to the file data to be processed is determined according to the file data to be processed, the user file corresponding to the file data to be processed can be obtained through the file index information, and the storage positions of other data corresponding to the user file in the associated data file are also recorded through the file index information, so that the user file corresponding to the file data to be processed can be determined according to the file index information.

In summary, the associated data file corresponding to the file data to be processed is determined by acquiring the file index information, so that the data corresponding to the user file can be acquired, and the integrity and the accuracy of data acquisition are improved.
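The key-value lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the layout used by the specification: the key is assumed to be a (user file, chunk identifier) pair and the value a storage location, and the names `Location`, `FileIndex`, and `user_file_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Location:
    data_file: str  # data file that physically holds the chunk
    offset: int     # byte offset of the chunk inside that data file
    length: int     # chunk length in bytes


# Hypothetical key: (user file name, chunk id); value: where the chunk is stored.
FileIndex = Dict[Tuple[str, int], Location]


def user_file_of(index: FileIndex, data_file: str, offset: int) -> Optional[str]:
    """Return the user file that owns the chunk stored at (data_file, offset)."""
    for (user_file, _chunk_id), loc in index.items():
        if loc.data_file == data_file and loc.offset == offset:
            return user_file
    return None


# Illustrative index: one user file split across three data files.
index: FileIndex = {
    ("report.bin", 0): Location("data_file_1", 0, 4096),
    ("report.bin", 1): Location("data_file_2", 0, 4096),
    ("report.bin", 2): Location("data_file_3", 8192, 8192),
}
```

A linear scan keeps the sketch short; a real index would more likely keep a reverse mapping from storage location to user file.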

Further, after the file data to be processed, the file index information, and the user file are determined, the associated data file also needs to be determined in order to ensure the integrity of data processing, which is specifically implemented as follows:

Based on this, the data corresponding to the user file, including the file data to be processed and other data, is determined by reading the file index information. The file data to be processed is stored in the data file to be processed, and the other data, which is the associated data, is stored in the associated data file. By reading the file index information, the storage position of each piece of data of the user file in the data file to be processed or in the associated data file is obtained, and the associated data file corresponding to the file data to be processed is thereby obtained.

Following the above example, as shown in fig. 3 (a), after it is determined that the data file 2 is a data block to be processed, that is, a data file to be processed, the 4K data block and the 8K data block stored in the data file 2 are read, file index information is obtained, a user file corresponding to the 4K data block or the 8K data block is determined according to the file index information, and then, an associated data file having an association relationship with the 4K data block or the 8K data block, that is, a data file 1, a data file 3, and a data file 4, is determined according to the file index information and the user file.

In summary, the associated data file is determined according to the file index information and the user file, so that data corresponding to the user file is acquired as much as possible, and the integrity of data processing is improved.
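As a rough sketch of this step, assuming a hypothetical index that maps each user file to the data files holding its chunks, the associated data files are the other data files that share a user file with the data file to be processed. The chunk placement below is invented for illustration and is not taken from fig. 3.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: user file -> list of (data file, chunk size in bytes).
FileIndex = Dict[str, List[Tuple[str, int]]]


def associated_data_files(index: FileIndex, pending_file: str) -> Set[str]:
    """Data files that share at least one user file with `pending_file`."""
    associated: Set[str] = set()
    for chunks in index.values():
        files = {data_file for data_file, _size in chunks}
        if pending_file in files:
            # Every other data file of this user file holds associated data.
            associated |= files - {pending_file}
    return associated


# Illustrative placement: data_file_2 holds a 4K chunk and an 8K chunk.
index: FileIndex = {
    "user_file_a": [("data_file_1", 4096), ("data_file_2", 4096), ("data_file_3", 4096)],
    "user_file_b": [("data_file_2", 8192), ("data_file_4", 8192)],
    "user_file_c": [("data_file_5", 4096)],
}
```

With this placement, treating `data_file_2` as the data file to be processed yields data files 1, 3, and 4 as associated data files, while `data_file_5` is left untouched because it shares no user file with it.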

Further, the data block to be processed may be a data unit for storing various kinds of data. When the data block to be processed is a layer file to be processed that stores index data to be processed, the index data to be processed needs to be acquired before the layer file to be processed is subjected to data processing, which is specifically implemented as follows:

Specifically, the layer file to be processed refers to a file or data storage unit in which part or all of the stored index data is to be processed; correspondingly, the index data to be processed refers to part or all of the index data stored in the layer file to be processed.

Based on the above, when the data block to be processed is the layer file to be processed, the index data stored in the layer file to be processed is read, and all or part of the read index data is used as the index data to be processed, so that data processing can subsequently be carried out on the index data to be processed.

In summary, when the data block to be processed is the layer file to be processed, the layer file to be processed is processed, so that the universality of data processing is realized.

Further, after the index data to be processed is obtained, the index file corresponding to the index data to be processed can be determined according to the index data to be processed, so that the association layer file is determined according to the index file, which is specifically implemented as follows:

acquiring an index file corresponding to the index data to be processed; and determining an associated layer file corresponding to the index data to be processed based on the index file.

Specifically, the index file refers to a file for storing index information, and in this embodiment, a storage location, actual data, and a data tag or a data name of index data to be processed are recorded in the index file. Meanwhile, the index file also records the related information of the related index data stored in the related layer file, so that the related layer file can be determined according to the index file.

Based on this, after the index data to be processed is determined, the index file is obtained according to the index data to be processed. The index file stores information related to the index data to be processed, including information on the layer file to be processed that corresponds to the index data to be processed and information on the associated layer file, such as the storage position and data name of the associated index data in the associated layer file.

In the above example, as shown in fig. 3 (b), when the data block to be processed is a layer file to be processed, the index data to be processed stored in the layer file to be processed is obtained, then the index file corresponding to the index data to be processed is obtained, and the associated index data stored in the associated layer file 1, the associated layer file 2, and the associated layer file 3 is determined according to the index file.

In summary, the associated layer file corresponding to the index data to be processed is determined according to the index file, so that the associated index data stored in the associated layer file is obtained, and the integrity of data processing is improved.
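The lookup from an index file to the associated index data can be illustrated with a short sketch. It assumes, hypothetically, that the index file is a flat list of entries, each recording a data name, the layer file that stores the data, and a storage position; `IndexEntry` and `associated_index_data` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    name: str        # data name or tag of the index data
    layer_file: str  # layer file that stores this piece of index data
    offset: int      # storage position inside the layer file


def associated_index_data(
    index_file: List[IndexEntry], pending_layer: str
) -> Dict[str, List[IndexEntry]]:
    """Group the entries stored outside `pending_layer` by their layer file."""
    grouped: Dict[str, List[IndexEntry]] = {}
    for entry in index_file:
        if entry.layer_file != pending_layer:
            grouped.setdefault(entry.layer_file, []).append(entry)
    return grouped
```

The keys of the returned mapping play the role of the associated layer files, and the values are the associated index data recorded for each of them.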

Step 206: reading the associated data corresponding to the data to be processed and stored in the associated data block, and migrating the data to be processed and the associated data to the target data block.

Specifically, the associated data refers to data having an association relationship with the data to be processed, and in this embodiment, the data to be processed and the associated data may be data from the same source, such as: the data to be processed and the associated data are uploaded to a file storage system by the same user, and the data to be processed and the associated data are stored in different data blocks by the file storage system.

Based on this, after the associated data block is determined, all data stored in the associated data block is read; either part of that data is determined to be associated data, or all of the data stored in the associated data block is associated data. The data to be processed and the associated data are then migrated into the target data block. The migration order may follow the data acquisition order, that is, the data to be processed is stored first and the associated data is stored afterwards; the data to be processed and the associated data may also be stored in parallel. The storage manner is not particularly limited in this embodiment.

In practical applications, when the data block to be processed is a data file to be processed, as shown in fig. 3 (a), data in the data blocks in the data files 1, 2, 3 and 4 are migrated into the target data file. In the case that the data block to be processed is the layer file to be processed, as shown in (b) in fig. 3, the index data to be processed stored in the layer file to be processed, and the associated index data stored in the associated layer file 1, the associated layer file 2, and the associated layer file 3 are migrated to the target layer file.

Further, in the case of a failure in data migration, the target data block is deleted. And after the data to be processed and the associated data are migrated to the target data block, deleting the data to be processed in the data block to be processed and deleting the associated data in the associated data block.
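The cleanup rule above (delete the target data block on failure, delete the migrated data from its source blocks on success) can be sketched with in-memory blocks. This is an assumption-laden illustration: blocks are modeled as lists of byte strings, a missing item stands in for any migration failure, and `migrate` is a hypothetical helper.

```python
from typing import Dict, List

Blocks = Dict[str, List[bytes]]  # block name -> data items it stores


def migrate(blocks: Blocks, sources: Dict[str, List[bytes]], target: str) -> bool:
    """Copy the listed items into `target`, then clean up according to the outcome.

    `sources` maps each source block (to-be-processed or associated) to the
    items that should leave it.
    """
    blocks[target] = []
    try:
        for name, items in sources.items():
            for item in items:
                if item not in blocks[name]:
                    raise KeyError(f"{item!r} not found in {name}")
                blocks[target].append(item)
    except KeyError:
        # Migration failed: delete the target block and leave the sources untouched.
        del blocks[target]
        return False
    # Migration succeeded: delete the migrated data from the source blocks.
    for name, items in sources.items():
        blocks[name] = [d for d in blocks[name] if d not in items]
    return True
```

Deleting source data only after every item has reached the target means a failed run leaves the original blocks readable.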

Further, when the data block to be processed is a data file to be processed, after the data migration is completed, the file index information needs to be updated because the data is migrated, which is specifically implemented as follows:

and updating the file index information according to the target data block.

Based on this, when the data block to be processed is the data file to be processed, after the file data to be processed in the data file to be processed and the associated file data in the associated data file have been migrated to the target data block, the file index information is updated according to the data migration result. That is, the data storage locations of the file data to be processed and the associated file data are updated, together with information such as the correspondence between the file data to be processed and the user file.

In practical application, when the data block to be processed is the data file to be processed, the file index information is updated after the data migration is completed according to the migration result, that is, the correspondence between the data blocks in the user file and the data blocks in the target data file is updated.

In summary, after the data migration is completed, the file index information is updated in time, so as to improve the data read-write speed after the migration.
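A minimal sketch of such an index update follows. It assumes, purely for illustration, that the migrated chunks of a user file are written back to back in the target data file in user-file order; the index layout and the name `rewrite_index` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical index: user file -> ordered list of (data file, offset, length).
FileIndex = Dict[str, List[Tuple[str, int, int]]]


def rewrite_index(index: FileIndex, user_file: str, target_file: str) -> None:
    """Point every chunk of `user_file` at consecutive offsets in `target_file`."""
    next_offset = 0
    relocated: List[Tuple[str, int, int]] = []
    for _old_file, _old_offset, length in index[user_file]:
        relocated.append((target_file, next_offset, length))
        next_offset += length
    index[user_file] = relocated
```

After the update, all chunks of the user file resolve to one data file at increasing offsets, which is what allows a later read to proceed sequentially.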

In addition, when the data block to be processed is the layer file to be processed, the layer file to be processed and the associated layer file may be traversed through a preset iteration strategy when the index data to be processed and the associated index data are acquired, which improves data acquisition efficiency and prevents data from being omitted. This is specifically implemented as follows:

traversing the layer file to be processed and the associated layer file based on a preset iteration strategy to obtain index data to be processed and associated index data; and storing the index data to be processed and the associated index data to a target layer file.

The preset iteration strategy is a predetermined data reading strategy, that is, a method for reading data. Based on this, the layer file to be processed and the associated layer file are traversed based on the preset iteration strategy, the index data to be processed is obtained from the layer file to be processed, and the associated index data is obtained from the associated layer file. The index data to be processed and the associated index data are then stored in the target layer file, either in the order in which they were obtained or in parallel.

In practical application, as shown in fig. 3 (b), when the data block to be processed is the layer file to be processed, after the layer file to be processed, the associated layer file 1, the associated layer file 2, and the associated layer file 3 are determined, these files are traversed iteratively to obtain the data blocks to be moved, and the data in those data blocks is moved to the target layer file.

In summary, the layer file to be processed and the associated layer file are traversed based on the preset iteration strategy, so that the acquisition speed of the index data to be processed and the associated index data is increased, and the index data to be processed and the associated index data are stored in the target layer file more quickly.
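The specification does not spell out the preset iteration strategy in this passage. As one possible reading, the sketch below simply walks the layer file to be processed first and each associated layer file after it, so that every entry is visited exactly once. The in-memory layer representation and the names `iterate_layers` and `move_to_target` are assumptions.

```python
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, bytes]  # (index key, index value)
Layers = Dict[str, List[Entry]]


def iterate_layers(layers: Layers, order: Iterable[str]) -> Iterator[Entry]:
    """Yield every entry of every listed layer file, one layer after another."""
    return chain.from_iterable(layers[name] for name in order)


def move_to_target(
    layers: Layers, pending: str, associated: List[str], target: str
) -> int:
    """Store all traversed entries in `target` and return how many were moved."""
    order = [pending, *associated]
    layers[target] = list(iterate_layers(layers, order))
    return len(layers[target])
```

A sequential walk stores entries in acquisition order; a strategy that reads the layer files in parallel would fit the same interface.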

In summary, a data processing instruction is received, and a data block to be processed and a target data block are obtained based on the data processing instruction; the data to be processed stored in the data block to be processed is read, and an associated data block corresponding to the data to be processed is determined according to the data to be processed; and the associated data corresponding to the data to be processed and stored in the associated data block is read, and the data to be processed and the associated data are migrated to the target data block. Data management of the data to be processed and the associated data is thereby realized, and the data migration speed is improved.

The data processing method provided in this specification is further described with reference to fig. 4 by taking an application of the data processing method in fragmentation data migration as an example. Fig. 4 shows a flowchart of a processing procedure of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.

Step 402: a data processing instruction is received.

Step 404: at least two data files are determined based on the data processing instructions.

Step 406: the data stored in each data file is traversed to obtain effective data information and data slice information.

Step 408: the fragmentation rate of each data file is calculated according to the effective data information and the data slice information.

Step 410: the data files are sorted according to the fragmentation rate of each data file, and a preset number of data files are selected as data files to be processed.

Step 412: the file data to be processed stored in the data file to be processed is acquired.

Step 414: file index information is determined according to the file data to be processed.

Step 416: the file index information is read, and the user file storing the file data to be processed is determined according to the file index information.

Step 418: the associated data file is determined according to the file index information and the user file.

Step 420: the associated data in the associated data file is read.

Step 422: the file data to be processed and the associated data are migrated to a target data file.

In summary, a data processing instruction is received, data files are acquired based on the data processing instruction, and the fragmentation rate of each data file is calculated; a data file to be processed is determined among the data files according to the fragmentation rate; the file data to be processed stored in the data file to be processed is read, and an associated data file corresponding to the file data to be processed is determined according to the file data to be processed; and the associated data corresponding to the file data to be processed and stored in the associated data file is read, and the file data to be processed and the associated data are migrated to a target data file. Data management of the file data to be processed and the associated data is thereby realized, and the data migration speed is improved.
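Steps 406 to 410 can be sketched as a ranking over per-file statistics. The formula for the fragmentation rate is not given in this passage, so the sketch uses a placeholder (the share of slice bytes that no longer hold effective data) and assumes that the most fragmented files are selected first. The field names and the functions `fragmentation_rate` and `select_pending` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFileStats:
    name: str
    valid_bytes: int  # effective data information (assumed: bytes still in use)
    slice_bytes: int  # data slice information (assumed: bytes of all slices)


def fragmentation_rate(stats: DataFileStats) -> float:
    """Placeholder formula: share of slice bytes that hold no effective data."""
    if stats.slice_bytes == 0:
        return 0.0
    return 1.0 - stats.valid_bytes / stats.slice_bytes


def select_pending(files: List[DataFileStats], count: int) -> List[str]:
    """Rank files by fragmentation rate and keep the `count` most fragmented."""
    ranked = sorted(files, key=fragmentation_rate, reverse=True)
    return [f.name for f in ranked[:count]]
```

Swapping in the actual preset calculation strategy only requires replacing `fragmentation_rate`; the sorting and selection are independent of the formula.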

The data processing method provided in this specification is further described with reference to fig. 5 by taking an application of the data processing method in index data migration as an example. Fig. 5 is a flowchart illustrating a processing procedure of another data processing method provided in an embodiment of the present specification, and specifically includes the following steps.

Step 502: a data processing instruction is received.

Step 504: at least two layer files are determined based on the data processing instructions.

Step 506: the index data stored in each layer file is traversed to obtain effective index data information and index data slice information.

Step 508: the fragmentation rate of each layer file is calculated according to the effective index data information and the index data slice information.

Step 510: the layer files are sorted according to the fragmentation rate of each layer file, and a preset number of layer files are selected as the layer files to be processed.

Step 512: the index data to be processed stored in the layer file to be processed is acquired.

Step 514: an index file corresponding to the index data to be processed is acquired.

Step 516: an associated layer file corresponding to the index data to be processed is determined according to the index file.

Step 518: the associated index data stored in the associated layer file is read.

Step 520: the index data to be processed and the associated index data are migrated to a target layer file.

In summary, by receiving a data processing instruction, obtaining at least two layer files based on the data processing instruction and calculating a fragmentation rate of each layer file, determining a layer file to be processed among the layer files according to the fragmentation rate, reading index data to be processed stored in the layer file to be processed, and determining an associated layer file corresponding to the index data to be processed according to the index data to be processed; and reading the associated index data corresponding to the index data to be processed and stored in the associated layer file, and transferring the index data to be processed and the associated index data to the target layer file. The data management of the index data to be processed and the associated index data is realized, and the data migration speed is improved.

Corresponding to the above method embodiment, the present specification further provides an embodiment of a data processing apparatus, and fig. 6 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:

a receiving module 602 configured to receive a data processing instruction, and obtain a to-be-processed data block and a target data block based on the data processing instruction;

a reading module 604, configured to read to-be-processed data stored in the to-be-processed data block, and determine, according to the to-be-processed data, an associated data block corresponding to the to-be-processed data;

a migration module 606 configured to read the associated data corresponding to the to-be-processed data stored in the associated data block, and migrate the to-be-processed data and the associated data to the target data block.

In an optional embodiment, the receiving module 602 is further configured to:

In an optional embodiment, the reading module 604 is further configured to:

and updating the file index information according to the target data block.

In an optional embodiment, the reading module 604 is further configured to:

The data processing apparatus provided by the present specification receives a data processing instruction, and acquires a data block to be processed and a target data block based on the data processing instruction; reads the data to be processed stored in the data block to be processed, and determines an associated data block corresponding to the data to be processed according to the data to be processed; and reads the associated data corresponding to the data to be processed and stored in the associated data block, and migrates the data to be processed and the associated data to the target data block. Data management of the data to be processed and the associated data is thereby realized, and the data migration speed is improved.
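The division of work among the three modules can be illustrated with a small in-memory sketch. It is not the apparatus itself: the instruction format, the `associations` mapping, and the class and method names are assumptions, and all data in an associated data block is treated as associated data.

```python
from typing import Dict, List, Tuple

Blocks = Dict[str, List[str]]        # block name -> data items
Associations = Dict[str, List[str]]  # data item -> blocks holding related data


class ReceivingModule:
    def receive(self, instruction: Dict[str, str]) -> Tuple[str, str]:
        """Return the to-be-processed block and target block named by the instruction."""
        return instruction["pending_block"], instruction["target_block"]


class ReadingModule:
    def __init__(self, blocks: Blocks, associations: Associations) -> None:
        self.blocks = blocks
        self.associations = associations

    def read(self, pending_block: str) -> Tuple[List[str], List[str]]:
        """Read the pending data and collect its associated blocks without duplicates."""
        data = list(self.blocks[pending_block])
        associated: List[str] = []
        for item in data:
            for block in self.associations.get(item, []):
                if block not in associated:
                    associated.append(block)
        return data, associated


class MigrationModule:
    def __init__(self, blocks: Blocks) -> None:
        self.blocks = blocks

    def migrate(self, data: List[str], associated_blocks: List[str], target: str) -> None:
        """Store the pending data first, then the data of each associated block."""
        moved = list(data)
        for block in associated_blocks:
            moved.extend(self.blocks[block])
        self.blocks[target] = moved
```

Chaining `receive`, `read`, and `migrate` reproduces the method flow of receiving the instruction, locating the associated data blocks, and migrating everything to the target data block.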

The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.

FIG. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.

Computing device 700 also includes access device 740, which enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of wired or wireless network interface, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.

Wherein the processor 720 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data processing method described above.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the data processing method described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the data processing method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of data processing, comprising:

2. The method of claim 1, the fetching of the block of data to be processed based on the data processing instruction, comprising:

3. The method of claim 2, wherein selecting the data block to be processed from the at least two data blocks according to the fragmentation rate of each data block comprises:

4. The method of claim 2, wherein the calculating the fragmentation rate of each data block based on a preset fragmentation rate calculation strategy comprises:

5. The method of claim 1, the reading the pending data stored by the pending data block, comprising:

6. The method of claim 5, the determining, from the to-be-processed data, an associated data block corresponding to the to-be-processed data, comprising:

determining a user file according to the file data to be processed;

7. The method of claim 6, the determining a user file from the pending file data, comprising:

8. The method of claim 7, the determining, based on the user file, an associated data file corresponding to the file data to be processed, comprising:

9. The method of claim 7, after the step of migrating the data to be processed and the associated data to the target data file is performed, further comprising:

and updating the file index information according to the target data block.

10. The method of claim 1, the reading the pending data stored by the pending data block, comprising:

11. The method of claim 10, the determining, from the to-be-processed data, an associated data block corresponding to the to-be-processed data, comprising:

acquiring an index file corresponding to the index data to be processed;

12. The method of claim 10, migrating the pending data and the associated data to the target data block, comprising:

13. A computing device, comprising:

a memory and a processor;

the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the data processing method of any one of claims 1 to 12.

14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 12.