CN117667873A - Log file processing method and device

Info

Publication number: CN117667873A
Application number: CN202211095916.0A
Authority: CN
Inventors: 孙磊; 曹强
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2022-09-08
Filing date: 2022-09-08
Publication date: 2024-03-08

Abstract

The application provides a log file processing method, which may comprise the following steps: client data is written into k log files, the address space of each log file being divided into n data blocks. When the target data block of each log file is full, check data is written into m target check blocks according to the data in the k target data blocks, where the m target check blocks are one check block in each of m check log files, and the address space of each check log file is divided into n check blocks. An encoding process whose minimum unit is the log file is thereby split into multiple encoding processes whose minimum unit is a data block within the log file, which helps reduce the computational complexity of encoding, reduce the computing resources it occupies, and improve the performance of the storage system.

Description

Log file processing method and device

Technical Field

The present disclosure relates to the field of computer data storage technologies, and in particular, to a method and an apparatus for processing a log file.

Background

Storage systems increasingly use log files to record continuously generated data. This approach fully exploits the high sequential-write performance of storage devices and can simplify system architecture design, including snapshot design, data consistency, cost optimization, data security assurance, and exception handling.

To improve the reliability of data in the log files, after a plurality of log files have been written, a check rule (such as an erasure coding technique) is used to generate check log files for the plurality of log files. Taking an erasure code ratio of K+M as an example, the data in K log files can be encoded to obtain M check log files, and the K log files and the M check log files are then stored. When any one log file is damaged, it can be reconstructed (or decoded) by reading the check log files and the other log files, which improves the reliability of the data in the log files.

Encoding a plurality of log files has high computational complexity and occupies a large amount of computing resources, which greatly reduces the performance of the storage system.

Disclosure of Invention

The application provides a log file processing method and device, which are used for reducing the computational resources occupied by a process of encoding a plurality of log files.

In a first aspect, a method for processing a log file is provided. The method may include: writing client data into k log files, where the address space of each log file is divided into n data blocks and k and n are positive integers; and, when the target data block of each log file is full, writing check data into m target check blocks according to the data in the k target data blocks, where the m target check blocks are one check block in each of m check log files, the address space of each check log file is divided into n check blocks, and m is a positive integer.

In this way, an encoding process whose minimum unit is the log file is split into multiple encoding processes whose minimum unit is a data block within the log file. This reduces the computational complexity of encoding, reduces the computing resources it occupies, and improves the performance of the storage system.
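The block-level encoding of the first aspect can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes m = 1 and substitutes a simple bytewise XOR parity for the erasure code, and the names `CheckGroup`, `BLOCK_SIZE`, and `xor_parity` are hypothetical.

```python
from functools import reduce

BLOCK_SIZE = 4  # bytes per data block (hypothetical, tiny for illustration)


def xor_parity(blocks):
    """Bytewise XOR of equally sized blocks (stand-in for an erasure code, m = 1)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class CheckGroup:
    """k append-only log files plus one check log file, each split into n blocks."""

    def __init__(self, k, n):
        self.k, self.n = k, n
        self.logs = [bytearray() for _ in range(k)]
        self.check = [None] * n  # check block j is filled once stripe j is full

    def append(self, log_index, data):
        """Append data to one log file, then encode every stripe that became full."""
        if len(self.logs[log_index]) + len(data) > self.n * BLOCK_SIZE:
            raise ValueError("log file address space exceeded")
        self.logs[log_index] += data
        self._encode_full_stripes()

    def _encode_full_stripes(self):
        for j in range(self.n):
            if self.check[j] is not None:
                continue  # this stripe was already encoded
            end = (j + 1) * BLOCK_SIZE
            if all(len(log) >= end for log in self.logs):
                blocks = [bytes(log[end - BLOCK_SIZE:end]) for log in self.logs]
                self.check[j] = xor_parity(blocks)
```

With k = 2, check block 0 stays empty after the first log file receives one block and is produced as soon as the second log file's block 0 is also full, so the encoding work is spread over the write phase rather than deferred until both files are complete.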

Optionally, the client data includes first data written as indicated by one or more write requests.

Optionally, the first data includes a plurality of data units arranged in sequence, and writing the first data into the k log files includes: writing the plurality of data units into the k log files in turn according to the size of the remaining address space of each log file. In this way, the first data is distributed across the k log files at the same pace, so that the time before the k log files are full is fully used to fill the m check log files.

Optionally, writing an i-th data unit of the plurality of data units into the k log files according to the size of the remaining address space of each log file includes: writing the i-th data unit into at least one log file with the largest remaining address space, where i is a positive integer less than or equal to k. As above, this keeps the first data progressing evenly across the k log files, so that the time before they are full is fully used to fill the m check log files.

Optionally, each of the data units is data indicated to be written by one of the one or more write requests.

Optionally, the size of each data unit is the size of one or more of the data blocks. The data written by a single write request may span a plurality of data blocks, and each data block is written into at least one log file with the largest remaining address space. As a result, check data for the written data can be obtained by the time the write request completes, which improves the efficiency of generating check data and the reliability of the data written by the write request.

Optionally, the client data further includes management data of each log file, and the management data of each log file includes metadata of a corresponding log file.

Optionally, the address space of each log file includes a first address space for writing the first data and a second address space for writing the management data of the corresponding log file, and writing the management data of the k log files into the k log files includes: writing the management data of a target log file into the second address space of the target log file after the first address space of the target log file has been filled with part of the first data.

Optionally, the address space of k log files and the address space of m check log files are divided into n check groups, each check group includes one data block of k log files and one check block of m check log files, and a target check group of n check groups includes k target data blocks and m target check blocks.

In a second aspect, there is provided a log file processing apparatus, including: a log file writing module, configured to write client data into k log files, where the address space of each log file is divided into n data blocks and k and n are positive integers; and a check log file writing module, configured to write check data into m target check blocks according to the data in the k target data blocks when the target data block of each log file is full, where the m target check blocks are one check block in each of m check log files, the address space of each check log file is divided into n check blocks, and m is a positive integer.

Optionally, the first data includes a plurality of data units that are sequentially arranged, and the log file writing module is specifically configured to sequentially write the plurality of data units into k log files according to a size of a remaining address space of each log file.

Optionally, the log file writing module is specifically configured to write the ith data unit into at least one log file with the largest remaining address space, where i is a positive integer less than or equal to k.

Optionally, the size of each data unit is the size of one or more of the data blocks.

Optionally, the address space of each log file includes a first address space for writing the first data and a second address space for writing management data of the corresponding log file, and the log file writing module is specifically configured to write the management data of the target log file into the second address space of the target log file after the first address space of the target log file is fully written by part of the data in the first data.

In a third aspect, the present application provides a computer device comprising a memory and a processor executing computer instructions stored in the memory, such that the computer device performs the method described in any one of the possible implementations of the first aspect.

In a fourth aspect, the present application provides a storage system comprising the processing apparatus of any one of the possible implementations of the second aspect.

Optionally, the storage system further comprises a client or an application server, which sends write requests or read requests to the processing apparatus of any one of the possible implementations of the second aspect. Optionally, the write request is a file archive signal and the read request is a file read signal.

In a fifth aspect, the present application provides a computer readable storage medium comprising instructions which, when run on a computer device, cause the computer device to perform the method described in any one of the possible implementations of the first aspect.

In a sixth aspect, the present application provides a computer program product comprising program code which, when executed by a computer device, implements a method as described in any one of the possible implementations of the first aspect.

Since each device provided in the present application may be configured to perform the steps of any one of the possible implementation manners of the first aspect, the technical effects obtained by each device in the present application may refer to the technical effects obtained by the foregoing method, and are not described herein in detail.

In view of the shortcomings of the prior art and the need for improvement, the invention provides a system and method for concurrently generating multiple logs and their check data, and a client thereof. The aim is as follows: by dividing each log file into a plurality of fine-grained data blocks, check data blocks are generated concurrently while the plurality of log files are being written concurrently, which improves the efficiency of generating check data and the overall reliability of the log data.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a method and system for concurrently generating multiple logs and check data, including: recording client writes using log file groups; dividing the logical address space inside each log file into a set of consecutive fixed-size blocks numbered from low address to high address; writing user data into each log file in an append-only manner starting from the low address; a data encoding method and a failed-data recovery method for log file check groups; a client write placement rule; a client write reading rule; and a client.

Preferably, a large number of client writes are appended to a set of log files within a log storage server, each log file having a unique ID and a fixed size. After a log file is filled, the metadata records of the log file, which contain the attributes and internal storage addresses of all client writes, are also stored in the log file; the log file then changes to the sealed state and becomes read-only. Check log files are generated for a plurality of sealed log files according to a given check rule, and the sealed log files and the check log files form a log file check group.

Preferably, the logical address space within each log file is divided into a set of contiguous data blocks. First, an empty log file check group comprising empty log files and empty check log files is created. Client writes are appended to the log files of the group according to a specific placement policy. When the corresponding data blocks in all log files of the group are full, the corresponding check data blocks are generated immediately according to a given check rule and written into the corresponding blocks of the check log files. After all the log files are full, the metadata of each log file is written to the tail of that log file and the log file is sealed; the system then generates a group of check data blocks for the data blocks that hold the log file metadata and writes them into the check log files. Finally, the log file check group as a whole is closed.

Preferably, user data is written into each log file in an append-only manner starting from the low address. Data written by a client is first written into the log file, and the storage location of that data within the log file is recorded. While the log file is not yet full, the log file metadata is kept in memory or in another storage area; once the log file is full, all log file metadata and log file attributes are written into the log file in a single operation and the log file is sealed.

Preferably, the logical address space inside each log file is divided into a set of consecutive fixed-size data blocks; the blocks of different log files have the same size and are numbered from low to high. Within one log file check group, the blocks with the same number in the log files and the check log files form a coding stripe. Take (n, k) Reed-Solomon coding as an example, where in this passage n denotes the number of log files and k the number of check log files in the log file group. The log files and the check log files have the same size, and the internal logical address space of each is divided into a set of consecutive fixed-size blocks. The blocks in each file are numbered in ascending order from the low address, and the blocks with the same number across the log file group form a coding stripe, in which the blocks of the log files store data and the blocks of the check log files store check data. Once all data blocks in a stripe are full, the check blocks are generated from the data blocks in the stripe and written to the corresponding positions in the check log files.

Preferably, a log file check group has a log file data recovery method. When one or more log files or check log files fail, within the given fault tolerance, the data of the failed files is recovered from the remaining log files and check log files according to the given check rule, and the recovered files replace the failed files.

Preferably, when only some data blocks of a log file or check log file in the log file check group fail, the failed data blocks can be recovered from the corresponding coding stripes, within the given fault tolerance.
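Block-level recovery from a coding stripe can be sketched as follows. This is only an illustration under a simplifying assumption: the stripe is protected by a single XOR parity block rather than the Reed-Solomon code mentioned above, so it tolerates exactly one lost block. The function names are hypothetical.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover_block(stripe, failed_index):
    """Rebuild one failed block of a stripe.

    stripe: list holding the k data blocks followed by one XOR parity block;
            the failed entry may be None.
    """
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    if any(b is None for b in survivors):
        raise ValueError("more than one block lost; XOR parity tolerates only one")
    # XOR of all surviving blocks (data and parity) yields the missing block.
    return xor_blocks(survivors)
```

Only the blocks of the affected stripe are read, so a partial failure does not require reconstructing an entire log file.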

Preferably, the client write placement policy assigns client writes to the different log files in a log file check group. Specifically, a log file group comprises n log files and k check log files; one or more write processes distribute all write requests over the n log files according to a specific client write placement rule, so that the data blocks of a coding stripe are filled at as nearly the same time as possible and the corresponding check data blocks can be generated quickly.

Preferably, the client write placement rule in an embodiment is as follows. The log files in the check group are sorted by remaining storage space. If the current client write is no larger than the largest remaining space, the write is allocated to the log file with the largest remaining space; if two log files tie for the largest remaining space, the one with the smaller number is chosen. If the current client write is larger than the largest remaining space but smaller than the sum of the remaining space of all log files, the client write is split and allocated starting from the log file with the largest remaining space until it is completely allocated. If the current client write is larger than the sum of the remaining space of all log files, the client write is split to fill all log files, a new log file check group is created, and the rest of the client write is written by the same algorithm. Each time a client write completes, the corresponding log file metadata is updated.
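The placement rule above can be sketched as a greedy allocation. This is a minimal illustration, assuming that remaining space is tracked as a list of byte counts and that ties are broken by the smaller log file index; `place_write` and its return format are hypothetical.

```python
def place_write(remaining, size):
    """Allocate a client write of `size` bytes across the log files of one check group.

    remaining: list of remaining bytes per log file (updated in place).
    Returns (allocations, leftover): allocations is a list of
    (log_index, length) pieces; leftover is the number of bytes that did not
    fit and would go to a newly created check group.
    """
    allocations = []
    while size > 0:
        # Largest remaining space first; ties go to the smaller index.
        target = max(range(len(remaining)), key=lambda i: (remaining[i], -i))
        if remaining[target] == 0:
            break  # every log file in this group is full
        length = min(size, remaining[target])
        allocations.append((target, length))
        remaining[target] -= length
        size -= length
    return allocations, size
```

A write that fits is placed whole in the emptiest log file; a larger write is split, starting from the emptiest file; a non-zero `leftover` signals that the caller should open a new check group and repeat the call there.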

Preferably, the client write placement rule in an embodiment is as follows. If the current client write uses the coding-stripe-priority mode, the client write is written horizontally, in turn, into the data blocks of the same coding stripe in the different log files. When one coding stripe is full, writing continues with the next coding stripe. Each time a client write completes, the corresponding log file metadata is updated.
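The coding-stripe-priority mode amounts to a row-major walk over the (stripe, log file) grid. The sketch below assumes, for simplicity, that client writes arrive as whole blocks and that free slots are counted from 0; the function name and the slot counter are hypothetical.

```python
def stripe_priority_positions(num_logs, first_free_slot, num_blocks):
    """Map consecutive blocks of a client write to (stripe, log_index) positions.

    Slots are numbered row by row: slot s lies in stripe s // num_logs,
    in log file s % num_logs.
    """
    return [divmod(slot, num_logs)
            for slot in range(first_free_slot, first_free_slot + num_blocks)]
```

For three log files, a four-block write starting at slot 0 fills stripe 0 across all three files and then begins stripe 1 in the first file, so stripe 0 can be encoded as soon as its third block lands.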

Preferably, the client write placement rule in an embodiment is as follows. When all log files in a log file check group are filled, the metadata of each log file is written into the tail data blocks of the corresponding log file and the log files are sealed. When the sealed log files have unequal lengths, check stripes are generated up to the highest-numbered data block of the longest sealed log file; if a shorter log file has no data block in a check stripe, or its data block is not full, the missing portion is filled with all 0s or all 1s before the check data block is generated and written into the check log file.
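The padding step can be sketched as follows, again with XOR parity standing in for the actual check rule. The helper name and the use of `None` for a block that does not exist in a shorter log file are assumptions made for illustration.

```python
def check_block_with_padding(blocks, block_size, fill=0x00):
    """Compute an XOR check block for one stripe, padding short or missing blocks.

    blocks: one entry per log file; None means the log file has no block
            in this stripe, and a short bytes object means a partly filled block.
    fill:   padding byte, 0x00 (all 0s) or 0xFF (all 1s).
    """
    parity = bytearray(block_size)
    for block in blocks:
        data = b"" if block is None else bytes(block)
        if len(data) > block_size:
            raise ValueError("block longer than block_size")
        padded = data + bytes([fill]) * (block_size - len(data))
        for i, byte in enumerate(padded):
            parity[i] ^= byte
    return bytes(parity)
```

Recovery must apply the same fill value, since the padding bytes take part in the check calculation.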

Preferably, the sealing rule of the log file group is as follows. All log file metadata of the current log file group is written to the tail of the log file, and the header of the log file is then appended. If the sum of the lengths of the client files and the metadata is less than the predefined length of the log file, the gap between the client files and the metadata is filled with invalid data. Once the log file metadata and the header have been written, the log file becomes read-only and can no longer be written, which completes the sealing of the log file.

Preferably, the reading process of a client file is as follows. Taking the reading of a client file whose keyword is A as an example, the keyword A is looked up in the index table. If the lookup fails, the client file does not exist and a failure is returned. If the lookup succeeds, all sub-files of the client file are read from the log file, spliced into a complete file, and returned.

Preferably, the reading process of a client write is as follows. The keyword of the client file is looked up in the index table. If the lookup fails, the client file does not exist and a failure is returned. If the lookup succeeds, all sub-files of the client file are read from the log file, spliced into a complete file, and returned.
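The lookup-and-splice read path can be sketched as follows. The layout of the index table is not specified in the text; here it is assumed to map a keyword to an ordered list of (log_id, start, length) extents, and all names are hypothetical.

```python
def read_client_file(index, logs, key):
    """Return the bytes of a client file, or None if the key is not indexed.

    index: dict mapping keyword -> ordered list of (log_id, start, length)
    logs:  dict mapping log_id -> bytes content of that log file
    """
    extents = index.get(key)
    if extents is None:
        return None  # lookup failed: the client file does not exist
    # Read every piece from its log file and splice the pieces in order.
    return b"".join(logs[log_id][start:start + length]
                    for log_id, start, length in extents)
```

A client file that was split across two log files during placement is thus reassembled transparently on read.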

To achieve the above object, according to a second aspect of the present invention, there is provided a client capable of communicating with the system for concurrently generating multiple logs and check data according to the first aspect of the present invention, and of sending file archiving signals and file reading signals to it.

In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:

The system and method for concurrently generating multiple logs and check data provided by the invention fully exploit block-level parallel checking of log files and balance the write speed of the log files against the generation speed of the check data. Because the check data is generated while the log files are being written, the large-scale disk reads and writes, and the accompanying system performance degradation, that arise when a conventional scheme generates a large amount of check data at once are avoided.

The foregoing description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, a detailed description is given below with reference to the accompanying drawings. Specific embodiments of the present invention are given in detail by the following examples and the accompanying drawings.

Drawings

FIG. 1 schematically illustrates a system architecture to which the present application is applicable;

FIG. 2 schematically illustrates a flow of a method of processing log files of the present application;

FIG. 3-1 schematically illustrates data in a full log file;

FIG. 3-2 schematically illustrates one possible flow of a client writing process;

FIGS. 4-1 through 4-10 schematically illustrate one specific example of the flow shown in FIG. 2;

FIGS. 5-1 through 5-5 schematically illustrate another specific example of the flow shown in FIG. 2;

FIG. 6 schematically illustrates a process of reading a client file;

FIG. 7 schematically illustrates one possible configuration of a computer device provided herein;

FIG. 8 schematically illustrates a possible structure of the log file processing apparatus provided in the present application.

Detailed Description

For the purposes of making apparent the objects, technical solutions and advantages of the present invention, the principle and features of the present invention are described below with reference to the attached drawings, the examples being given for the purpose of illustrating the invention only and not for limiting the scope of the invention. The invention is more particularly described by way of example in the following paragraphs with reference to the drawings. Advantages and features of the invention will become more apparent from the following description and from the claims. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Storage systems increasingly store continuously generated data by appending it to logs. This approach fully exploits the high sequential-write performance of storage devices and can simplify system architecture design, including snapshot design, data consistency, cost optimization, data security assurance, and exception handling. On the other hand, to ensure data reliability, storage systems often adopt a Redundant Array of Independent Disks (RAID): a plurality of independent hard disks are connected in a designed manner to form a logical hard disk group, a data block check group is constructed under a specific check rule, and check data blocks are generated from the data blocks in the group. When part of the data in the group (within the tolerable range) becomes inaccessible, it can be obtained by decoding the surviving data blocks and check blocks in the group, which yields higher reliability than a single hard disk.

FIG. 1 schematically illustrates a system architecture. Referring to FIG. 1, the system architecture includes a computer device and a first memory. The computer device may create a log file based on the storage resources of the first memory and write the client data into the log file. The type of the first memory is not limited: it may be a memory of the computer device, or a low-cost, high-capacity memory such as a hard disk. The number of memories in the first memory is not limited; optionally, the first memory may comprise one or more memories.

To increase the reliability of data in log files, the computer device may use a check rule (e.g., an erasure coding technique) to generate check log files for a plurality of log files. When any one log file is damaged, it is reconstructed (or decoded) by reading the check log files and the other log files, which improves the reliability of the data in the log files. For ease of description, the plurality of log files and the corresponding check log files are referred to as a log file check group. To improve reliability, the computer device may optionally create the files (log files or check log files) of the same log file check group in different memories of the first memory.

Optionally, the system architecture further comprises a second memory. The computer device may write the log file check group in the first memory to the second memory. Optionally, the second memory includes a plurality of memories, and to improve reliability the computer device may distribute the files of the log file check group across the plurality of memories. The type of the second memory is not limited; it may be a large-capacity, low-cost memory, for example a plurality of hard disks.

The type of computer device is not limited and may alternatively be a terminal device or an application server or a storage server. When the computer device shown in fig. 1 is a storage server (or referred to as a log storage server), optionally, referring to fig. 1, the system architecture corresponding to fig. 1 further includes one or more terminals or application servers, where the terminals or application servers send write requests or read requests to the storage server. The following description will take a computer device as an example of a storage server (or log storage server).

In one possible example, log writing with storage redundancy works as follows: a large number of client files are appended to a set of in-memory log files within a log storage server, each log file having a unique ID and a fixed size. After a log file is filled, it changes to the sealed state and becomes read-only. Check log files are generated for the plurality of sealed log files according to a given check rule, and the sealed log files and the check log files form a log file check group, which improves the reliability of the log data.

In the above example, the storage server first generates the log files and then generates the check log files, with a whole log file as the basic unit of the check calculation. This has three drawbacks. First, a large check calculation must be executed after all log files have been written, so the computing load during that period is especially heavy and affects other applications' use of the CPU. Second, until a log file has undergone the final check calculation its reliability cannot be guaranteed: data damaged during that time can be neither detected nor recovered, and that time is not short. Third, because the large check calculation is performed only after a whole group of log files has been written, the overall disk bandwidth is underutilized.

To solve the above technical problems, the following technical concept is proposed: the logical address space in each log file is divided into a group of consecutive data blocks, and data is distributed evenly over the log files so that the data blocks of a group fill together; as soon as a group of data blocks is full, the check calculation is performed on it, and this is iterated continuously, with each completed data block group written to the hard disk, which greatly improves concurrency. To further improve the efficiency of writing log files with check data, one key point is to improve the efficiency of generating and writing the check data, and thereby the overall efficiency.

Based on the technical conception, a method and a device for processing log files are described below.

Fig. 2 schematically shows one possible flow of the processing method of the log file. Referring to fig. 2, the method includes steps S201 to S203.

S201, creating an empty first log file check group;

Step S201 is an optional step. The storage server may create an empty log file check group (referred to as a first log file check group) comprising k empty log files and m empty check log files, where k and m are positive integers. Optionally, each file corresponds to a storage address space (abbreviated as address space) in the first memory, to which the storage server can write data. Here, "file" refers to a log file or a check log file.

Optionally, the address space of each log file is divided into n unit address spaces (referred to as data blocks), and the address space of each check log file is divided into n unit address spaces (referred to as check blocks), where n is a positive integer.
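The division of a file's address space into n blocks can be made concrete with a small address-mapping sketch. It assumes equally sized blocks and a file size that is an exact multiple of n; the function name `locate` is hypothetical.

```python
def locate(offset, file_size, n):
    """Map a byte offset in a log file to (block_number, offset_within_block).

    Assumes the address space of file_size bytes is divided into n equal blocks.
    """
    if file_size % n != 0:
        raise ValueError("file_size must be a multiple of n in this sketch")
    block_size = file_size // n
    if not 0 <= offset < file_size:
        raise ValueError("offset outside the file's address space")
    return divmod(offset, block_size)
```

Because check log files are divided the same way, block number j of every file in the group refers to the same relative position, which is what allows blocks to be grouped by number.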

Optionally, each log file and/or each verification log file has a unique identification. Optionally, the length (or size) of each log file and/or each verification log file is fixed.

S202, writing client data into k log files;

The storage server may write client data into the k log files. The type of the client data is not limited; examples are described below.

Optionally, the client data includes first data written as indicated by the one or more write requests. Each write request is used to instruct a storage server to store data (also known as write request data), and the first data includes write request data (or simply client write) to which each of the one or more write requests is to be written.

Optionally, the client data includes management data for each log file. The management data of each log file is used to write into the corresponding log file. Optionally, the management data of the log file is used to index the write request data stored in the log file. In some examples, the management data of the log file includes a header of the log file and/or metadata of the log file.

Optionally, the metadata of each log file records the attribute and the internal storage address of the write request data in the corresponding log file. Optionally, the attribute of the write request data may include an identification and/or creation time of the write request data, etc. The internal storage address includes the storage location (e.g., start address and length, etc.) of the write request data in the log file.

FIG. 3-1 schematically illustrates the first log file after it is full. Referring to FIG. 3-1, the address space of the log file contains, in order from low address to high address, client write A (i.e., the data written as indicated by write request A), client write B (i.e., the data written as indicated by write request B), client write C (i.e., the data written as indicated by write request C), ..., the metadata of client write A, the metadata of client write B, the metadata of client write C, and a header. Taking the metadata of client write A as an example, it includes an identification (e.g., a number) of client write A, the start address and length of client write A in the log file, the creation time of client write A, and the like. Optionally, the space between client write C and the metadata of client write A either contains no other data or is filled with fixed data (e.g., 0 or 1).
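A per-write metadata record of the kind described for client write A might be modeled as below. The field names and the `WriteRecord` type are hypothetical; the text only states that an identification, a start address and length, and a creation time are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    """Metadata for one client write stored in a log file (hypothetical layout)."""
    write_id: int      # identification (e.g., a number) of the client write
    start: int         # start address of the write within the log file
    length: int        # length of the write in bytes
    created_at: float  # creation time, e.g. seconds since the epoch


def read_write(log_bytes, record):
    """Return the bytes of one client write using its metadata record."""
    return log_bytes[record.start:record.start + record.length]
```

Given the sealed log file's bytes and a record, `read_write` slices out exactly the data that the corresponding write request stored.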

Optionally, the client data includes first data and management data for each log file. Accordingly, the address space of each log file may include a first address space for writing (a part of or all of) the first data and a second address space for writing management data of the corresponding log file.

Optionally, each log file is additionally written with data starting from a low address. The write request data is written to a low address area (i.e., a first address space) of the log file, and then metadata of the log file (including metadata corresponding to each write request data) and a header of the log file are written in a second address space.

Optionally, the total length of the log file is a fixed length, and if the sum of the length of the write request data, the metadata of the write request data, and the header is less than the fixed length, then all 0 or all 1 fills are used between the write request data and the metadata to fill the log file.
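The layout described above can be illustrated with a minimal sketch. The function name `build_sealed_log_file`, the tiny 16-byte file length and the byte strings standing in for metadata and header are all hypothetical and chosen only for illustration; the application does not prescribe any particular encoding.

```python
def build_sealed_log_file(writes, metadata, header, total_len, fill=b"\x00"):
    """Lay out a fixed-length log file image.

    Write request data goes to the low addresses, metadata and header go
    to the high addresses, and fixed fill bytes occupy any gap between them.
    """
    data = b"".join(writes)
    tail = b"".join(metadata) + header
    gap = total_len - len(data) - len(tail)
    if gap < 0:
        raise ValueError("contents exceed the fixed log file length")
    return data + fill * gap + tail


# Hypothetical example: two client writes, their metadata, and a header.
image = build_sealed_log_file(
    writes=[b"AAAA", b"BB"],
    metadata=[b"mA", b"mB"],
    header=b"HD",
    total_len=16,
)
```

In this sketch the fill is all 0; passing `fill=b"\xff"` would model the all-1 variant.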

The possible procedure by which the storage server writes client data into k log files is described below.

In some examples, the first data may be considered to include a plurality of data units arranged in sequence. Optionally, writing the first data into the k log files specifically includes: writing the plurality of data units into the k log files in turn according to the size of the remaining address space (or remaining storage space) of each log file. The aim is to ensure that the data blocks in the k log files are filled as synchronously as possible, so that the corresponding check data blocks can be generated quickly.

Optionally, each data unit is data indicated to be written by a write request of the one or more write requests. Alternatively, each data unit may be one or more data blocks in size, e.g., the storage server may segment the first data into a plurality of data units according to a target size, where the target size is the size of the one or more data blocks.

In some examples, "sequential writing" may refer to a physical write performed by one or more processes, or in some examples, "sequential writing" refers to a logical write (i.e., allocating memory space), and a physical write process may be performed by multiple processes in parallel.

Optionally, the size of the remaining address space of the log file refers to the remaining size of the first address space in the corresponding log file. In some examples, writing an i-th data unit of the plurality of data units to the k log files according to the size of the remaining address space of each log file includes: and writing the ith data unit into at least one log file with the largest remaining address space. Wherein i is a positive integer less than or equal to k.

Optionally, i is any positive integer less than or equal to k. The 1st data unit, the 2nd data unit, ……, the i-th data unit, ……, and the k-th data unit are written into the k log files in sequence, and writing the i-th data unit into the k log files includes: writing the i-th data unit into at least one log file with the largest remaining address space.

The writing of the management data of k log files to the k log files by the storage server includes: and writing management data of the target log file into the second address space of the target log file after the first address space of the target log file is fully written by part of data in the first data.

Optionally, after writing the management data into the target log file, the target log file may be sealed (or closed), and the target log file is set to a read-only state, so as to ensure the security of the data in the target log file.

Alternatively, when all log files in one log file check group are filled, the metadata of each log file is written to the tail data block of the corresponding log file, and the log file is sealed. When the lengths (or sizes) of the sealed log files are not equal, check stripes are generated up to the last data block of the longest sealed log file, that is, the length of the check log file is equal to the length of the longest sealed log file. If a shorter log file has no data block in a check stripe, or its data block is not full, the missing part is filled with all 0s or all 1s, so that the check data blocks can be generated and written into the check log files.

Optionally, the lengths of any two log files in the k log files are the same. Alternatively, the length may be a predefined length. In some examples, metadata for all log files of the current log file group is written to the end of the corresponding log file and the header of the corresponding log file is additionally written. For any one log file, if the sum of the length of the first data (referring to all or a portion of the first data, or the client file) and the management data is smaller than the predefined length of the log file, the insufficient portion may be filled with fixed data (or invalid data), for example, between the first data and the management data. When the writing of the metadata and the header of the log file is completed, the log file becomes a read-only non-writable state, and the sealing of the log file is completed.

The process by which the storage server writes client data into k log files will be described below by way of example and will not be described herein.

S203, when the target data block of each log file is fully written, writing check data into m target check blocks according to data in k target data blocks;

in the process that the storage server writes client data into k log files, when all or part of data in the first data is fully written into the target data block of each log file, the storage server can write check data into m target check blocks according to data in k target data blocks respectively.

The k target data blocks are respectively one data block of k log files, and the m target check blocks are respectively one check block of m check log files. In other words, the target data block of each log file is one data block of the corresponding log file, and the target check block of each check log file is one check block of the corresponding check log file.

The selection modes of the target check block and the target data block are not limited. Optionally, the storage server divides the address space of the k log files and the address space of the m check log files into n check groups, each check group including one data block of each log file and one check block of each check log file. The k target data blocks and the m target parity blocks correspond to the same parity group (referred to as a target parity group or target encoded stripe). In some examples, the target parity group may be any one of n parity groups.

Optionally, the internal logical address space of each log file and each check log file is divided into a group of continuous fixed-size data blocks or check blocks, respectively. The blocks are numbered from the low address to the high address, and blocks in different files have the same size. Within the same log file check group, the same-numbered blocks in the log files and the check log files form a code stripe (i.e., a check group). Taking (k, m) Reed-Solomon encoding as an example, the log file group comprises k log files and m check log files, the log files and the check log files have the same size, and the logical address space inside each log file and each check log file is divided into a group of continuous fixed blocks. The blocks in the same file are numbered in increasing order from the low address, and the blocks with the same number in the same log file group form a code stripe, wherein the blocks in the log files store data (data blocks) and the blocks in the check log files store check data (check blocks). After all the data blocks in one stripe are fully written, the check blocks are generated according to the data blocks in the stripe and written into the corresponding positions in the check log files.
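The block numbering and stripe membership described above can be sketched as follows. The names `block_number` and `stripe_members`, the 1KB block size, and the convention that log files are numbered 1 to k and check log files k+1 to k+m are assumptions made for illustration only.

```python
BLOCK_SIZE = 1024  # assumed block size in bytes


def block_number(offset, block_size=BLOCK_SIZE):
    """Return the 1-based number of the block containing a byte offset."""
    return offset // block_size + 1


def stripe_members(stripe_no, k, m):
    """List the (file number, block number) pairs forming one code stripe.

    Files 1..k are log files holding data blocks; files k+1..k+m are
    check log files holding check blocks (a numbering assumed here).
    """
    data = [(f, stripe_no) for f in range(1, k + 1)]
    check = [(f, stripe_no) for f in range(k + 1, k + m + 1)]
    return data, check


data_blocks, check_blocks = stripe_members(stripe_no=3, k=4, m=2)
```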

The manner in which the storage server detects a target check group whose data blocks are fully written is not limited. Optionally, the storage server may detect in real time, among the n check groups, a target check group whose data blocks are fully written, and when the target check group is detected, the storage server may immediately write the check data into the check blocks therein. Alternatively, the storage server performs this detection at fixed intervals, and when the target check group is detected, the storage server may write the check data into the check blocks therein.

The manner of determining the check data is not limited as long as the check data can be used to recover the data in the at least one target data block. Optionally, the storage server determines the verification data in each target verification block according to the verification rule. The type of the check rule is not limited and, alternatively, the check rule may be determined according to erasure coding techniques. For example, the check rule includes a check matrix (or generating matrix), the proportion of erasure codes is k+m, the storage server encodes data in k target data blocks according to the check matrix to obtain m pieces of check data, and the m pieces of check data are written into the m target check blocks respectively.
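As a simplified illustration of generating check data from the data blocks of one stripe, the sketch below uses single-parity XOR (that is, m = 1) in place of an erasure code with a check matrix. This is a deliberately swapped-in, simpler technique and not the k+m encoding described above; the helper name `xor_blocks` and the two-byte blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# k = 4 target data blocks of one stripe (tiny blocks for illustration).
data_blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xff\x00"]

# The single check block written to the check log file for this stripe.
check_block = xor_blocks(data_blocks)

# Simulate losing the third data block and rebuilding it from the rest.
survivors = data_blocks[:2] + data_blocks[3:]
rebuilt = xor_blocks(survivors + [check_block])
```

With XOR parity only one lost block per stripe can be rebuilt; tolerating m lost blocks requires an erasure code such as Reed-Solomon.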

The type of data in the target data block is not limited, and optionally the target data block includes the write request data and/or the management data of the log file described above.

When the target data blocks of each log file are fully written, the check data are written into m target check blocks according to the data in k target data blocks, so that partial check data can be written into m check log files before all the k log files are fully written. The method corresponding to fig. 2 considers the parallel verification of the log file at the data block level, and considers both the writing speed of the log file and the generation speed of verification data. The problems of large-scale calculation behavior and large-scale disk read-write behavior caused by generating a large amount of check data at one time in the generation process of the check log file and the accompanying system performance reduction are avoided, and the utilization rate of the disk bandwidth is improved.
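The idea of producing check data as soon as a stripe is complete, rather than after all log files are full, can be sketched with a small bookkeeping class. `StripeTracker` and its method names are hypothetical; the actual encoding step is left out and only the trigger condition is shown.

```python
class StripeTracker:
    """Track which data blocks of each stripe are full."""

    def __init__(self, k):
        self.k = k
        self.filled = {}    # stripe number -> set of log file numbers
        self.encoded = []   # stripes whose check data has been triggered

    def block_filled(self, log_file, block_no):
        """Record a full data block; return True if its stripe is now complete."""
        files = self.filled.setdefault(block_no, set())
        files.add(log_file)
        if len(files) == self.k and block_no not in self.encoded:
            self.encoded.append(block_no)  # check blocks would be written here
            return True
        return False


tracker = StripeTracker(k=4)
events = [tracker.block_filled(f, 1) for f in (1, 2, 3, 4)]
partial = tracker.block_filled(1, 2)
```

Here stripe 1 triggers encoding on the fourth full block, while stripe 2 is still incomplete.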

When the data in part of the blocks in the target check stripe is lost, the storage server can recover the data in the lost blocks according to the rest blocks, so that the reliability of the data written in the first log file check group is improved before the first log file check group is closed.

Specifically, in the log file check group, when a log file or a part of data blocks in the check file fail, the failed data blocks can be recovered from the corresponding encoded stripes with a given fault tolerance.

Step S201 is an optional step, and in another possible example, the storage server creates k log files before step S202, and creates m check log files after the target data block of each log file is written.

Optionally, the method corresponding to fig. 2 further includes step S204.

S204, when all log files and all check log files in the first log file check group are written fully, sealing the first log file check group.

S204 is an optional step, and step S204 is not limited to be performed after S203. When all log files and all check log files in the first log file check group are written, the storage server may seal (or otherwise close) the first log file check group.

Alternatively, after the storage server closes the first log file group, each file in the first log file group may be distributed and written into a plurality of memories in the second memory. Alternatively, after the storage server writes a parity group or stripe, the data in each file in the parity group may be distributed to multiple memories in the second memory.

When one or more log files or check files fail within a given fault tolerance capability, the storage server can recover the data in the log files or check files according to the remaining log files or check files to replace the failed files.

In the log file check group, when a log file or part of data blocks in the check file fail, the failed data blocks can be recovered from the corresponding coding stripes under the given fault tolerance.

If all of the log files in the first log file check group are full after the storage server has written only a portion of the first data to them, then, optionally, the storage server may create a second log file check group and write the rest of the first data (referred to as remaining data) to the log files in the second log file check group. The procedure may refer to the contents of steps S201 to S204, and will not be described here again.

One possible implementation of the processing method corresponding to fig. 2 will be described below taking as an example the data written as indicated by one of the one or more write requests per data unit.

Optionally, the storage server may sort the log files in the first log file check group according to the remaining storage space. The locality-prioritized client writing process is shown in fig. 3-2. If the current client write size is smaller than or equal to the maximum remaining space, the write is allocated to the log file with the largest remaining space; if two or more log files have the largest remaining space, the one with the smaller number is selected. If the current client write size is greater than the maximum remaining space but less than the sum of the remaining space of all log files, the client write is split and allocated starting from the log file with the maximum remaining space until the client write is completely allocated. If the current client write size is larger than the sum of the remaining spaces of all log files, the client write is split to fill all log files, a new log file check group is then created, and the remaining data of the client write is written according to the same algorithm. Each time a client write is completed, the corresponding log file metadata is updated.
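The three cases of the locality-prioritized allocation can be sketched with one loop. The function name `allocate_write` is hypothetical, sizes are in arbitrary units, and the sketch ignores the space reserved for metadata and headers, which a real implementation would have to account for.

```python
def allocate_write(remaining, size):
    """Allocate one client write across the log files of a check group.

    remaining: dict {log file number: free space}, updated in place.
    Returns (placements, leftover): placements is a list of
    (log file number, amount); leftover is the part that does not fit
    and would go to a newly created check group.
    """
    placements = []
    while size > 0:
        candidates = [n for n, free in remaining.items() if free > 0]
        if not candidates:
            break  # the whole group is full
        # Largest remaining space first; the smaller number wins a tie.
        target = min(candidates, key=lambda n: (-remaining[n], n))
        amount = min(size, remaining[target])
        remaining[target] -= amount
        placements.append((target, amount))
        size -= amount
    return placements, size


remaining = {1: 10, 2: 10, 3: 6, 4: 10}
fits = allocate_write(remaining, 4)       # fits in one log file
split = allocate_write(remaining, 25)     # split across several log files
overflow = allocate_write(remaining, 10)  # exceeds the group's free space
```

In this usage, the first write lands in log file 1 (the smallest number among the ties), the second is split across log files 2, 4 and 1, and the third fills the group and leaves 3 units for a new check group.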

Specifically, taking the use of (4, 2) Reed-Solomon encoding as an example, the total length of each log file is 1GB, the metadata corresponding to a single client write is 1KB long, the header of a log file is 1KB long, and each log file is divided into fixed data blocks of 1KB each. One log file group contains 4 log files with a length of 1GB and 2 check log files with a length of 1GB, each log file or check log file is divided into 1048576 fixed data blocks, and the fixed data blocks in each log file or check log file are numbered from 1 to 1048576 starting from the low address. Fixed data blocks with the same number in the log file group form a check stripe; for example, all the fixed data blocks numbered 1 in log files 1, 2, 3 and 4 and in the check log files form the check stripe numbered 1, and so on, so that the total number of check stripes is 1048576. Eight client writes (i.e., data indicated to be written by write requests) with lengths of 512MB, 768MB, 512MB, 512MB, 524285KB, 4KB, 1048566KB and 512MB are written in order, and the client writes are numbered 1 to 8, respectively.

One possible implementation of the method corresponding to fig. 2 is described below in connection with fig. 4-1 to fig. 4-10.

First, as shown in fig. 4-1, the storage server creates an empty first log file group (i.e., a first log file check group), where log file numbers 1, 2, 3, 4, and check log file numbers 5, 6. At this time, the maximum remaining space of the log files in the log file group is 1GB minus 1KB of reserved header size, and minus 1KB of reserved metadata (denoted as metadata p) size, which is equal to 1048574KB.

As shown in fig. 4-2, the size of the data written as indicated by write request 1 (abbreviated as client write 1) is smaller than the largest remaining space of the log files, so client write 1 is written into log file 1, which has the largest remaining space and the smallest number. At this time, the remaining space of log file 1 is 1048574KB minus the written client write size of 512MB, minus the reserved metadata size of 1KB, which is equal to 524285KB.

As shown in fig. 4-3, the size of the client write 2 is 768MB, which is smaller than the maximum remaining space of the log file, and the client write 2 is allocated to the log file 2 with the maximum remaining space and the minimum number.

As shown in fig. 4-4, the size of the client write 3 is 512MB, which is smaller than the maximum remaining space of the log file, and the client write 3 is allocated to the log file 3 with the maximum remaining space and the minimum number.

As shown in fig. 4-5, the size of the client write 4 is 512MB, which is smaller than the maximum remaining space of the log file, and the client write 4 is allocated to the log file 4 with the maximum remaining space and the minimum number. During the writing process of the client writing 4, the data blocks in the check stripes numbered 1 to 524288 in the log file group are filled in sequence, and immediately after the check stripe numbered 1 is filled, the check data blocks are calculated and written into the log check files 5 and 6, and so on.

As shown in fig. 4-6, the size of the client write 5 is 524285KB, which is smaller than or equal to the largest remaining space of the log file, and the client write 5 is allocated to the log file 1 with the largest remaining space and the smallest number, at this time, the size of the client write plus the reserved client write metadata plus the header in the log file 1 is exactly 1GB, and the log file 1 is fully written, so that the metadata and the header are written into the log file 1, and then the log file 1 is sealed (the rectangular box corresponding to the log file 1 is bolded).
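The space accounting for log file 1 in this example can be checked with a few lines of arithmetic. The variable names are hypothetical; the figures are those stated in the example (1GB log file, 1KB header, 1KB of metadata reserved per client write).

```python
KB_PER_MB = 1024
LOG_FILE_KB = 1024 * KB_PER_MB   # 1GB expressed in KB
HEADER_KB = 1                    # reserved header
METADATA_KB = 1                  # reserved per client write

# Empty log file: the header and one metadata entry are reserved up front.
initial_remaining = LOG_FILE_KB - HEADER_KB - METADATA_KB

# After client write 1 (512MB), metadata is reserved for the next write.
write_1_kb = 512 * KB_PER_MB
remaining_after_write_1 = initial_remaining - write_1_kb - METADATA_KB

# Client write 5 exactly fills what is left of log file 1.
write_5_kb = 524285
sealed_size = write_1_kb + write_5_kb + 2 * METADATA_KB + HEADER_KB
```

The two client writes, their two metadata entries and the header add up to exactly the 1GB length of the log file, which is why log file 1 can be sealed at this point.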

The client write 6 is 4KB in size, less than or equal to the largest remaining space of the log file, and the client write 6 is allocated to the log file 3 with the largest remaining space and the smallest number (e.g., the blank gray background rectangle in the log file 3 shown in FIGS. 4-7).

As shown in fig. 4-8, the size of client write 7 is 1048566KB, which is larger than the maximum remaining space of the log files. Log file 1 has already been sealed, the remaining writable space of log file 2 is 262141KB, that of log file 3 is 524281KB, and that of log file 4 is 524285KB. Client write 7 is therefore split. The first part, client write 7' with a size of 524285KB, is written into log file 4; log file 4 is then full, so metadata and a header are written into log file 4, and log file 4 is sealed. The second part, client write 7″ with a size of 524281KB, is written into log file 3; log file 3 is then full, so metadata and a header are written into log file 3, and log file 3 is sealed. During the writing of log file 3, the data blocks in the check stripes numbered 524289 to 786432 in the log file group are filled in sequence; immediately after the check stripe numbered 524289 is filled, the check data blocks are calculated and written into check log files 5 and 6, and so on.

The size of client write 8 is 512MB, which is larger than the maximum remaining space of the log files and larger than the sum of the remaining spaces of the log files, so client write 8 needs to be split into a client write 8' of 262141KB and a client write 8″ of 262147KB.

As shown in fig. 4-9, the client write 8' is written to the log file 2, the log file 2 is full, metadata and a header are written to the log file 2, and then the log file 2 is sealed. During the writing process of the log file 2, the data blocks in the check stripes numbered 786433 to 1048576 in the first log file group are filled in sequence, and immediately after the check stripe numbered 786433 is filled, the check data blocks are calculated and written into the log check files 5 and 6, and so on. After all the verification data are written into the verification log file, the whole log file group is completely written and sealed integrally.

As shown in fig. 4-10, a new log file group (referred to as a second log file check group) is created, in which the log files are numbered 7, 8, 9, 10 in order and the check log files are numbered 11 and 12 in order. The remaining 262147KB, client write 8″, is written to log file 7 of the second log file check group.

Another possible implementation of the processing method corresponding to fig. 2 is described below taking as an example the size of each data unit as one or more data blocks.

Alternatively, if the current client write is in a code stripe priority mode, the data of the client write is written horizontally, in turn, into the data blocks of the same code stripe in the different log files. When one code stripe is full, the next code stripe is written. Each time a client write is completed, the corresponding log file metadata is updated.
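The code stripe priority mode amounts to a round-robin placement of data blocks across the k log files. The sketch below illustrates this; the function name `place_blocks` and its cursor arguments are hypothetical, and space reserved for metadata and headers is ignored.

```python
def place_blocks(num_blocks, k, next_file=1, next_stripe=1):
    """Place blocks round-robin over log files 1..k, stripe by stripe.

    Returns (placements, completed, cursor): placements is a list of
    (block index, log file, stripe), completed lists the stripes that
    became full, and cursor is the (log file, stripe) for the next block.
    """
    placements, completed = [], []
    f, s = next_file, next_stripe
    for b in range(1, num_blocks + 1):
        placements.append((b, f, s))
        if f == k:
            completed.append(s)   # stripe is full: check blocks can be generated
            f, s = 1, s + 1
        else:
            f += 1
    return placements, completed, (f, s)


first = place_blocks(6, k=4)
second = place_blocks(3, k=4, next_file=first[2][0], next_stripe=first[2][1])
```

The second call resumes where the first stopped, which mirrors how a later client write continues in the partially filled stripe left by the previous one.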

Specifically, taking the use of (4, 2) Reed-Solomon encoding as an example, the total length of each log file is 1GB, the metadata corresponding to a single client write is 1KB long, the header of a log file is 1KB long, and each log file is divided into fixed data blocks of 1KB each. One log file group contains 4 log files with a length of 1GB and 2 check log files with a length of 1GB, each log file or check log file is divided into 1048576 fixed data blocks, and the fixed data blocks in each log file or check log file are numbered from 1 to 1048576 starting from the low address. Fixed data blocks with the same number in the log file group form a check stripe; for example, all the fixed data blocks numbered 1 in log files 1, 2, 3 and 4 and in the check log files form the check stripe numbered 1, and so on, so that the total number of check stripes is 1048576. Eight client writes with lengths of 512MB, 768MB, 512MB, 512MB, 524285KB, 4KB, 1048566KB and 512MB are written in sequence, and the client writes are numbered 1 to 8, respectively.

Another possible implementation of the method corresponding to fig. 2 is described below in connection with fig. 5-1 to 5-5.

First, as shown in fig. 4-1, the storage server creates an empty first log file group (i.e., a first log file check group), where log file numbers 1, 2, 3, 4, and check log file numbers 5, 6. At this time, the maximum remaining space of the log files in the log file group is 1GB subtracted by 1KB of the reserved header, and then 1KB of the reserved metadata (denoted as metadata p) is subtracted, which is equal to 1048574KB, and the total remaining space is 4194296KB.

Client write 1 includes 524288 (4 x 131072) data blocks (each denoted as B), which is less than the total remaining space of the first log file group. Data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log files 1 to 4 respectively; after this writing is completed, the data blocks in the data stripe numbered 1 in the first log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until all 524288 data blocks are written. The corresponding log file metadata is stored in the memory and, referring to fig. 5-1, each log file reserves 1KB for storing the metadata corresponding to client write 1.

Client write 2 includes 786432 (4 x 196608) data blocks, which is less than the total remaining space of the log file group. As shown in fig. 5-2, data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log files 1 to 4 respectively; after this writing is completed, the data blocks in the data stripe numbered 131073 in the first log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until all 786432 data blocks are written. The corresponding log file metadata is stored in the memory, and each log file reserves 1KB for storing the metadata corresponding to client write 2.

Client write 3 includes 524288 (4 x 131072) data blocks, which is less than the total remaining space of the log file group. As shown in fig. 5-3, data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log files 1 to 4 respectively; after this writing is completed, the data blocks in the data stripe numbered 327681 in the first log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until all 524288 data blocks are written. The corresponding log file metadata is stored in the memory, and each log file reserves 1KB for storing the metadata corresponding to client write 3.

Client write 4 includes 524288 (4 x 131072) data blocks, less than the total remaining space size of the log file group. The data blocks 1 (marked as B1) to 4 (marked as B4) are respectively written into the log files 1 to 4, after the writing is completed, the data blocks in the data strips with the number of 458753 in the log file group are filled, then the check data blocks are calculated, the check log files 5 and 6 are written, and the like until all the writing of the 524288 data blocks is completed, the corresponding log file metadata is stored in the memory, and each log file is reserved with 1KB for storing metadata corresponding to the client writing 4.
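The stripe numbers at which client writes 1 to 4 begin follow from the block counts given above, since every 4 data blocks fill one stripe. The short calculation below reproduces them; the variable names are hypothetical.

```python
from itertools import accumulate

k = 4
blocks_per_write = [524288, 786432, 524288, 524288]  # client writes 1 to 4

# Each of these writes is a multiple of k blocks, so it fills whole stripes.
stripes_per_write = [blocks // k for blocks in blocks_per_write]

# First stripe of write 1, then the first stripe after each completed write.
first_stripes = [1] + [1 + total for total in accumulate(stripes_per_write)]
```

The first four entries are the starting stripes of client writes 1 to 4, and the last entry is the stripe at which the next client write begins.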

Client write 5 includes 524285 (4 x 131071+1) data blocks, which is less than the total remaining space of the log file group. Data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log files 1 to 4 respectively; after this writing is completed, the data blocks in the data stripe numbered 589825 in the first log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until 524284 data blocks are written, at which point the data stripes numbered 1 to 720895 are full. The 524285th data block of the client write is then written into log file 1. The corresponding log file metadata is stored in the memory, and each log file reserves 1KB for storing the metadata corresponding to client write 5.

Client write 6 includes 4 data blocks, which is less than the total remaining space of the log file group. Data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log file 2, log file 3, log file 4 and log file 1 respectively. After the 3rd data block is written, the data blocks in the data stripe numbered 720896 in the log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. The corresponding log file metadata is stored in the memory, and each log file reserves 1KB for storing the metadata corresponding to client write 6.

Client write 7 includes 1048566 (4×262141+2) data blocks, which is less than the total remaining space of the log file group, and the data blocks are written into the different log files in turn. Data blocks 1 (denoted as B1) to 4 (denoted as B4) are written into log file 2, log file 3, log file 4 and log file 1 respectively. After data block 3 is written, the data blocks in the data stripe numbered 720897 in the log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. Data blocks 5, 6, 7 and 8 are written into log files 2, 3, 4 and 1 respectively. After data block 7 is written, the data blocks in the data stripe numbered 720898 in the log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until 1048563 data blocks are written, at which point the data stripes numbered 720897 to 983037 are full. The 1048564th, 1048565th and 1048566th data blocks of the client write are then written into log file 1, log file 2 and log file 3 respectively. The corresponding log file metadata is stored in the memory, and each log file reserves 1KB for storing the metadata corresponding to client write 7.

After client write 7 is completely written, the used space of log file 1 consists of 131072KB occupied by client write 1, 196608KB by client write 2, 131072KB by client write 3, 131072KB by client write 4, 131072KB by client write 5, 1KB by client write 6, 262141KB by client write 7, 7KB of reserved metadata space and 1KB of reserved header space, which is 983046KB in total. After a further 1KB of reserved metadata space is deducted, the remaining space of log file 1 is 65529KB. Similarly, the remaining space of log file 2 is 65529KB, the remaining space of log file 3 is 65529KB, the remaining space of log file 4 is 65530KB, and the total remaining space is 262117KB.
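The totals stated for log file 1 can be re-derived from the individual shares. The sketch below only repeats the arithmetic of the example; the dictionary and variable names are hypothetical.

```python
LOG_FILE_KB = 1024 * 1024   # 1GB expressed in KB
METADATA_KB = 1
HEADER_KB = 1

# Space (in KB) that each client write occupies in log file 1.
shares_kb = {1: 131072, 2: 196608, 3: 131072, 4: 131072,
             5: 131072, 6: 1, 7: 262141}

# Data plus one metadata entry per client write plus the header.
used_kb = sum(shares_kb.values()) + len(shares_kb) * METADATA_KB + HEADER_KB

# Reserve metadata for the next client write before computing free space.
remaining_kb = LOG_FILE_KB - used_kb - METADATA_KB

# Log files 1 to 3 have 65529KB left each, and log file 4 has 65530KB.
total_remaining_kb = 3 * remaining_kb + 65530
```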

Client write 8 includes 524288 (4 x 131072) data blocks, while the total remaining space of the log file group is 262117KB. Therefore, 262117KB (4 x 65529+1) of the client write is split off and written into the current log file group, in turn into log file 4, log file 1, log file 2 and log file 3. After data block 1 is written, the data blocks in the data stripe numbered 983039 in the log file group are full, so the check data blocks are calculated and written into check log files 5 and 6. And so on until 262117 data blocks are written, at which point the data stripes numbered 983039 to 1048567 are full. After the 262114th data block is written, the size of the client writes plus the reserved client write metadata plus the header in log file 1 is exactly 1GB, so log file 1 is full; the metadata and the header are written into log file 1, and log file 1 is sealed. After the 262115th data block is written, the size of the client writes plus the reserved client write metadata plus the header in log file 2 is exactly 1GB, so log file 2 is full; the metadata and the header are written into log file 2, and log file 2 is sealed. After the 262116th data block is written, the size of the client writes plus the reserved client write metadata plus the header in log file 3 is exactly 1GB, so log file 3 is full; the metadata and the header are written into log file 3, and log file 3 is sealed. After the 262117th data block is written, the size of the client writes plus the reserved client write metadata plus the header in log file 4 is exactly 1GB, so log file 4 is full; the metadata and the header are written into log file 4, log file 4 is sealed, and then the whole log file group is sealed, as shown in fig. 5-4.

In fig. 5-1 to 5-4, the client write p_q in the log file represents the data written in the log file in the client write p, p is any integer from 1 to 8, and q is any integer from 1 to 4.

As shown in fig. 5-5, a new log file group (referred to as a second log file group or a second log file check group) is created, in which the log files are numbered 7, 8, 9, 10 in order and the check log files are numbered 11, 12 in order. The remaining 262171KB (4 x 65542+3) of client write 8 is written into the new log file group. Data block 1, data block 2, data block 3 and data block 4 are written into log file 7, log file 8, log file 9 and log file 10 in sequence. After data block 4 is written, the data blocks in the data stripe numbered 1 in the second log file group are full, so the check data blocks are calculated and written into check log file 11 and check log file 12. And so on until the 262168th data block is written into log file 10, at which point the data stripes numbered 1 to 65542 are full. The 262169th, 262170th and 262171st data blocks of the client write are then written into log file 7, log file 8 and log file 9 respectively. In fig. 5-5, client writes 8_1', 8_2', 8_3' and 8_4' represent the parts of the remaining 262171KB of client write 8 that are written into log files 7 to 10, respectively.

After the storage server writes the first data into the first log file check group, data (referred to as a client file) may be read from the first log file check group according to a read request. To facilitate the read operation, the storage server may optionally generate an index table according to the management data of each file in the first log file check group. The organization of the index table is not limited; fig. 6 takes an index tree as an example. Fig. 6 schematically shows a reading process of a client file, taking reading the client file whose key is a as an example; by way of example and not limitation, the key in fig. 6 is the client file name. The client file name is queried in the index tree. If the query fails, the client file does not exist, and a failure is returned. If the query succeeds, all the subfiles of the client file are read from the respective log files, spliced into a complete file, and returned. For example, if the index tree has leaf nodes corresponding to the client file name, the corresponding data (referred to as a subfile) is read from the log file according to the index entry in each leaf node, and the plurality of subfiles are spliced into a complete file and returned.
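The read path can be sketched as a lookup followed by splicing. For brevity the sketch uses a plain dictionary in place of the index tree of fig. 6, and the function name `read_client_file`, the entry format (log file number, start address, length) and the sample contents are all hypothetical.

```python
def read_client_file(index, log_files, key):
    """Return the client file for key, or None if the key is not indexed.

    index: dict {client file name: [(log file number, start, length), ...]}
    log_files: dict {log file number: bytes content of that log file}
    """
    entries = index.get(key)
    if entries is None:
        return None  # query failed: the client file does not exist
    # Read each subfile from its log file and splice them in order.
    return b"".join(
        log_files[file_no][start:start + length]
        for file_no, start, length in entries
    )


log_files = {1: b"helloXXXX", 2: b"YY world"}
index = {"a": [(1, 0, 5), (2, 2, 6)]}

found = read_client_file(index, log_files, "a")
missing = read_client_file(index, log_files, "b")
```

The entry format mirrors the metadata described earlier, which records the start address and length of each piece of write request data in its log file.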

In summary, a large number of client files are written, by appending, to a set of in-memory log files within a log storage server; each log file has a unique ID and a fixed size. After a log file is filled, it enters the sealed state and becomes read-only. Corresponding check log files are generated from a plurality of sealed log files according to a given check rule, and the sealed log files and the check log files form a log file check group, which improves the reliability of the log data. The conventional approach generates the log files in sequence and only then generates the check log files. In contrast, the present method divides the logical address space inside each log file into a set of consecutive fixed-size blocks. When client files are written, an empty log file check group is created, and the client files are appended to the log files in the group according to a specific placement strategy. When the corresponding data blocks in the group of log files are all full, the corresponding check data blocks are immediately generated according to the given check rule and written into the corresponding blocks of the check log files. After the metadata is fully established and the log files are fully written and sealed, the last group of check data blocks is generated and the log file check group is closed as a whole. Because the method uses fine-grained data blocks, rather than whole log files, as the granularity of check generation, check data for individual blocks can be generated before the log files are full, which improves the overall reliability of the log data. In addition, the check data is generated promptly while the client write data is still cached in memory, which avoids reading the log files back to generate check data after they have been sealed as a whole, and thus improves overall processing efficiency.
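The block-granular check generation summarized above can be sketched as follows. This is a simplified illustration under explicit assumptions: single XOR parity (m = 1) stands in for the check rule, whereas the application gives erasure coding with m check log files as its example; blocks are placed round-robin; metadata is omitted; and the names `LogFileGroup`, `append_block`, `rebuild`, and `BLOCK_SIZE` are hypothetical.

```python
from typing import List

BLOCK_SIZE = 4  # bytes per data block; tiny value chosen only for the demo


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (single-parity check rule)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


class LogFileGroup:
    """Hypothetical check group: k log files plus one XOR check log file."""

    def __init__(self, k: int, n: int) -> None:
        self.k, self.n = k, n
        self.logs: List[List[bytes]] = [[] for _ in range(k)]  # data blocks
        self.check: List[bytes] = []                           # check blocks
        self.sealed = False

    def append_block(self, block: bytes) -> None:
        """Append one full data block, placing blocks round-robin (assumed)."""
        if self.sealed:
            raise RuntimeError("log file group is sealed (read-only)")
        assert len(block) == BLOCK_SIZE
        written = sum(len(log) for log in self.logs)
        self.logs[written % self.k].append(block)
        stripe = written // self.k
        # Once block `stripe` is full in all k log files, emit its check block
        # immediately, without waiting for the log files to fill up.
        if all(len(log) > stripe for log in self.logs):
            self.check.append(xor_blocks([log[stripe] for log in self.logs]))
        if len(self.check) == self.n:
            self.sealed = True  # all n stripes are checked: close the group

    def rebuild(self, lost_log: int, stripe: int) -> bytes:
        """Reconstruct one lost data block from the survivors and the check block."""
        survivors = [log[stripe] for i, log in enumerate(self.logs) if i != lost_log]
        return xor_blocks(survivors + [self.check[stripe]])


group = LogFileGroup(k=3, n=2)
for payload in (b"aaaa", b"bbbb", b"cccc", b"dddd"):
    group.append_block(payload)
```

In this run the first check block exists as soon as the third data block lands, while the fourth block is still waiting for its stripe to fill; `group.rebuild(1, 0)` recovers `b"bbbb"` from the other two data blocks and the check block. Writing two more blocks completes the second stripe and seals the group.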

Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device provided in the present application. As shown in fig. 7, the computer device 7 includes a processor 701 and a memory 702 that are interconnected. Optionally, the processor 701 and the memory 702 may be interconnected by an internal bus 703.

The processor 701 may consist of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The memory 702 may include volatile memory, such as random access memory (RAM); the memory 702 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 702 may also include a combination of the above.

The bus 703 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.

Optionally, the computer device 7 may further include a communication interface 704 connected to the processor 701; for example, as shown in fig. 7, the processor 701 and the communication interface 704 are connected by a bus.

The memory 702 stores computer instructions that, when executed by the processor 701, cause at least one of steps S201 to S204 to be performed. For specific implementations, refer to the corresponding description above; details are not repeated here.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a log file processing apparatus provided in the present application, where the processing apparatus may be the computer device shown in fig. 1. The computer device may be an application server, a terminal, or a storage server. As shown in fig. 8, the processing device 8 may include a log file writing module 801 and a verification log file writing module 802.

The log file writing module 801 is configured to write client data into k log files, where the address space of each log file is divided into n data blocks, and k and n are positive integers. For the specific implementation, refer to the description of step S202; details are not repeated here.

The verification log file writing module 802 is configured to write verification data into m target verification blocks according to the data in k target data blocks when the target data block of each log file is full, where the m target verification blocks are each one verification block in one of m verification log files, the address space of each verification log file is divided into n verification blocks, and m is a positive integer. For the specific implementation, refer to the description of step S203; details are not repeated here.

Optionally, the processing device 8 may further comprise a creation module 803 for performing step S201.

Optionally, the processing device 8 may further include a seal module 804 for executing step S204.

The modules in the processing device 8 can be interconnected for data transmission. Each module in the processing device 8 may be a software module, a hardware module, or a combination of software and hardware.

The configuration of the processing device 8 is merely an example and should not be construed as a specific limitation; modules in the processing device 8 may be added, removed, or combined as needed.

The present application also provides a storage system, which may be a centralized storage system or a distributed storage system, which may include, for example, the processing device 8 shown in fig. 8. In some examples, the storage system may include the corresponding system architecture of fig. 1.

The present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, can implement some or all of the steps recited in any of the above-described method embodiments. Computer readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.

The present application also provides a computer program comprising instructions which, when executed by a computer, cause the computer to perform some or all of the steps of any one of the method embodiments.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

References to "A and/or B" in this application are to be understood as covering both "A and B" and "A or B". The term "plurality" as used herein may be understood as two or more. Ordinal numbers such as "1", "2", "3", "4", "5", and "6" in this application are used to distinguish between a plurality of objects and do not define an order among them.

In the embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, a "connection" referred to in this application may be an indirect connection via some interface, device, or element.

The foregoing specific embodiments further describe the objects, technical solutions, and advantageous effects of the present application. It should be understood that the foregoing is merely a description of specific embodiments of the present application and is not intended to limit the scope of the present application.

Claims

1. A method for processing a log file, comprising:

writing client data into k log files, wherein the address space of each log file is divided into n data blocks, and k and n are positive integers;

when the target data blocks of each log file are fully written, writing check data into m target check blocks according to data in k target data blocks, wherein the m target check blocks are respectively one check block in m check log files, the address space of each check log file is divided into n check blocks, and m is a positive integer.

2. The method of claim 1, wherein the client data comprises first data that is indicated to be written by one or more write requests.

3. The method of claim 2, wherein the first data comprises a plurality of data units arranged in sequence, and writing the first data to k log files comprises:

writing the plurality of data units into the k log files in sequence according to the size of the remaining address space of each log file.

4. A method according to claim 3, wherein writing an i-th one of said plurality of data units into k of said log files according to the size of the remaining address space of each of said log files comprises:

writing the i-th data unit into at least one log file with the largest remaining address space, wherein i is a positive integer less than or equal to k.

5. A method according to claim 3 or 4, wherein the size of each data unit is the size of one or more of the data blocks.

6. The method of claim 3 or 4, wherein each of the data units is data indicated to be written by one of the one or more write requests.

7. The method of any one of claims 2 to 6, wherein the client data further comprises management data for each of the log files, the management data for each of the log files comprising metadata for the corresponding log file.

8. The method of claim 7, wherein the address space of each log file includes a first address space for writing the first data and a second address space for writing management data of the corresponding log file, and writing the management data of k log files into k log files includes:

writing the management data of a target log file into the second address space of the target log file after the first address space of the target log file is fully written with part of the first data.

9. A log file processing apparatus, comprising:

a log file writing module, configured to write client data into k log files, wherein the address space of each log file is divided into n data blocks, and k and n are positive integers;

a verification log file writing module, configured to write verification data into m target verification blocks according to data in k target data blocks when the target data blocks of each log file are fully written, wherein the m target verification blocks are respectively one verification block in m verification log files, the address space of each verification log file is divided into n verification blocks, and m is a positive integer.

10. The processing apparatus of claim 9, wherein the client data comprises first data indicated to be written by one or more write requests.

11. The processing device according to claim 10, wherein the first data comprises a plurality of data units arranged in sequence, and the log file writing module is specifically configured to sequentially write the plurality of data units into k log files according to a size of a remaining address space of each log file.

12. The processing device according to claim 11, wherein the log file writing module is specifically configured to write the ith data unit into at least one log file with the largest remaining address space, where i is a positive integer less than or equal to k.

13. A processing device according to claim 11 or 12, wherein the size of each data unit is the size of one or more of the data blocks.

14. The processing apparatus according to claim 11 or 12, wherein each of the data units is data indicated to be written by one of the one or more write requests.

15. The processing apparatus according to any one of claims 10 to 14, wherein the client data further comprises management data for each of the log files, the management data for each of the log files comprising metadata for the corresponding log file.

16. The processing device according to claim 15, wherein the address space of each log file comprises a first address space for writing the first data and a second address space for writing management data of the corresponding log file, the log file writing module being specifically configured to write the management data of the target log file to the second address space of the target log file after the first address space of the target log file is filled with a part of the data in the first data.

17. A computer device comprising a memory and a processor, wherein the processor executes computer instructions stored in the memory to cause the computer device to perform the method of any one of claims 1 to 8.

18. A storage system comprising the apparatus of any one of claims 9 to 16.

19. A computer readable storage medium comprising instructions which, when run on a computer device, cause the computer device to perform the method of any of claims 1 to 8.