WO2023020136A1 - Data storage method and apparatus in storage system - Google Patents

Data storage method and apparatus in storage system

Info

Publication number
WO2023020136A1
WO2023020136A1 PCT/CN2022/103574 CN2022103574W WO2023020136A1 WO 2023020136 A1 WO2023020136 A1 WO 2023020136A1 CN 2022103574 W CN2022103574 W CN 2022103574W WO 2023020136 A1 WO2023020136 A1 WO 2023020136A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
stripe
verification
size
storage
Prior art date
Application number
PCT/CN2022/103574
Other languages
French (fr)
Chinese (zh)
Inventor
吴祥
朱超
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023020136A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 - Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Definitions

  • the present application relates to the field of storage technologies, and in particular to a data storage method and device in a storage system.
  • A storage system can use an erasure coding (EC) verification mechanism to store data.
  • The EC verification mechanism divides the data to be stored into data fragments and calculates verification fragments of the data fragments according to a verification algorithm.
  • The data fragments and the verification fragments constitute an EC verification relationship, and the data fragments and the verification fragments are stored in a stripe that holds this EC verification relationship.
  • Storing the data fragments and the verification fragments specifically includes: storing the data fragments in the data stripes of the stripes, and storing the verification fragments in the verification stripes of the stripes. When data loss occurs in the data stripe where one of the data fragments is located, the lost data can be recovered by using the data in the remaining data stripes and the data in the parity stripe.
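The fragment-and-verify step described above can be sketched as follows. This is a minimal illustration, assuming a single-parity XOR code and zero-padding of the last fragment; the helper names `split_into_fragments` and `xor_parity` are hypothetical, and the application itself does not fix a particular verification algorithm.

```python
from functools import reduce
from operator import xor


def split_into_fragments(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size fragments, zero-padding the tail (assumption)."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\x00")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]


def xor_parity(fragments: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length fragments (a single-parity code, assumed here)."""
    return bytes(reduce(xor, column) for column in zip(*fragments))


# Four data fragments plus one verification fragment form one EC relationship.
fragments = split_into_fragments(b"hello storage system", 4)
parity = xor_parity(fragments)
```

XOR-ing all data fragments together with the verification fragment yields all zero bytes, which is the property that later makes recovery of one lost fragment possible.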
  • In one way of storing data with the EC verification mechanism, the storage system calculates the verification data of the data each time it receives data and stores the received data in the corresponding data stripe of the stripe; once all the data stripes in the stripe are full, the check data of these data stripes is stored in the check stripe.
  • This mechanism performs frequent calculations before the data fills up the data stripes in the stripe, occupying the computing resources of the storage system.
  • Embodiments of the present application provide a data storage method and device in a storage system, which are used to reduce computing resources consumed by the storage system during data storage.
  • the embodiment of the present application provides a data storage method in a storage system, including:
  • After first data is received, when the size of the first data is smaller than the size of a data stripe in a stripe, the first data is stored in the data stripe corresponding to the first data in the stripe, the check data of the first data is not calculated, and the first data is recorded in a log.
  • When the first data is smaller than the size of a data stripe, there is no need to perform a check calculation on the first data. In other words, a check calculation is not performed every time data is stored, which reduces the number of times the storage system calculates check data and thereby reduces the computing resources consumed by the storage system during the data storage process.
  • The first data can also be recorded in the log, so that if the process of storing the first data is abnormal, the first data in the log can be used to re-execute the process of storing the first data, which improves the reliability of the storage system.
  • The first data is written to the corresponding data stripe in the storage system for persistent storage, so that the data can be stored faster, thereby improving the efficiency with which the storage system stores data.
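The small-write path described above can be sketched as follows. This is an illustrative model only: `STRIP_SIZE`, `Stripe`, and `write_small` are hypothetical names, the in-memory dictionary stands in for persistent storage, and the order "log first, then store" is an assumption, since the text does not state an order.

```python
from dataclasses import dataclass, field

STRIP_SIZE = 8  # hypothetical size of one data stripe, in bytes


@dataclass
class Stripe:
    # Data stripe index -> stored bytes (stands in for persistent storage).
    data_strips: dict[int, bytes] = field(default_factory=dict)
    # Log of (data stripe index, data) entries kept for replay.
    log: list[tuple[int, bytes]] = field(default_factory=list)


def write_small(stripe: Stripe, index: int, data: bytes) -> None:
    """Store data smaller than one data stripe without any check calculation."""
    if len(data) >= STRIP_SIZE:
        raise ValueError("this path only handles data smaller than one data stripe")
    stripe.log.append((index, data))   # record in the log (order is an assumption)
    stripe.data_strips[index] = data   # store in the corresponding data stripe


stripe = Stripe()
write_small(stripe, 0, b"abc")
```

Note that the function never touches check data: the only work on this path is one log append and one store.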
  • Second data is received, and when the size of the second data is greater than or equal to the size of a data stripe in the stripe, first check data is calculated, where the first check data is the check data of the second data, and the second data is stored in the data stripe corresponding to the second data in the stripe.
  • If the size of the second data to be stored is greater than or equal to the size of a data stripe, the first check data of the second data can be calculated, and the second data can be stored in the corresponding data stripe.
  • The check data of the second data is calculated first, and a later check calculation can then be performed based on the check data of the second data and other data, which avoids a subsequent check calculation over multiple pieces of data at the same time, so the consumption of computing resources of the storage system can be relatively reduced.
  • If the sum of the sizes of the first data and the second data is less than the sum of the sizes of all the data stripes in the stripe, not all the data stripes in the stripe are full, so the calculated first check data of the second data may be recorded in a log, and the check data of the second data and other data may later be calculated by using the first check data in the log.
  • The size of the calculated check data is an integer multiple of the size of a data stripe, so in this implementation, recording the first check data in the log can relatively save the storage space occupied by the log compared with recording the second data in the log.
  • A method of calculating the check data of the first data and the second data is provided. Specifically, the first data recorded in the log can be read, and the second check data is obtained according to the first data in the log and the first check data of the second data.
  • Reading the first data from the log takes less time than reading the first data from the storage system, so the second check data can be calculated faster, which improves the efficiency with which the storage system calculates check data and further helps to improve the efficiency with which the storage system stores data.
  • The first data can also be cached, and the cached first data and the check data of the second data can be used to calculate the check data of the first data and the second data, to obtain the second check data.
  • the second check data can be calculated faster by using the cached first data to perform check calculation, which is beneficial to improve the efficiency of storing data.
  • The second check data and the data in all the data stripes form an EC check relationship, so the second check data can be stored in the check stripe of the stripe, thereby completing the process of filling up a stripe.
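The two-step check calculation described above can be illustrated with a small sketch. It assumes an XOR-based single-parity code, a stripe with three data stripes of 4 bytes each, and zero-padding of the short first data; none of these choices is specified by the application, and `xor_bytes` is a hypothetical helper.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


STRIP_SIZE = 4  # hypothetical size of one data stripe, in bytes

first_data = b"abc"        # smaller than one data stripe: stored and logged, no check data
second_data = b"12345678"  # spans two data stripes: first check data is calculated

# Step 1: first check data covers only the second data.
d2, d3 = second_data[:STRIP_SIZE], second_data[STRIP_SIZE:]
first_check = xor_bytes(d2, d3)

# Step 2: once the stripe is full, combine the logged first data with the
# first check data instead of recomputing over all data stripes.
d1 = first_data.ljust(STRIP_SIZE, b"\x00")  # zero-padding is an assumption
second_check = xor_bytes(first_check, d1)
```

Because XOR is associative, `second_check` equals the check data computed directly over all three data stripes, so the incremental route loses nothing. For Reed-Solomon style codes the same idea relies on the linearity of the code, but the arithmetic differs.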
  • the calculation of the verification data and the storage of the data can be performed simultaneously, which is beneficial to improving the efficiency of the storage of the data.
  • an embodiment of the present application provides a data storage device, where the device includes a communication interface and a processor.
  • the communication interface and the processor may be used to implement the data storage method in any storage system in the first aspect above.
  • the communication interface is used to receive first data;
  • the processor is used to, when the size of the first data is smaller than the size of a data stripe in a stripe, store the first data in the data stripe corresponding to the first data in the stripe without calculating the check data of the first data, and record the first data in a log.
  • the data storage device further includes other components, such as an antenna, an input/output module, an interface, and the like.
  • these components can be hardware, software, or a combination of software and hardware.
  • the embodiment of the present application provides a data storage device, the device includes a communication module and a processing module, and the communication module and the processing module can be used to implement the data storage method in any storage system in the first aspect above.
  • the communication module is used to receive the first data;
  • the processing module is used to, when the size of the first data is smaller than the size of a data stripe in a stripe, store the first data in the data stripe corresponding to the first data in the stripe without calculating the check data of the first data, and record the first data in a log.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and when the computer program runs on a computer, the computer executes the method described in any one of the first aspect.
  • an embodiment of the present application provides a computer program product, the computer program product stores a computer program, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes the method described in any one of the first aspect.
  • the present application provides a chip system, which includes a processor and may further include a memory, configured to implement the method described in the first aspect.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • an embodiment of the present application provides a storage system, the storage system includes the data storage device described in any one of the second aspects, or the storage system includes the data storage device described in any one of the third aspects.
  • FIG. 1A is a schematic diagram of an application scenario applicable to an embodiment of the present application
  • FIG. 1B is a schematic diagram of another application scenario applicable to the embodiment of the present application.
  • FIG. 2A is a schematic diagram of another application scenario applicable to the embodiment of the present application.
  • FIG. 2B is a schematic diagram of another application scenario applicable to the embodiment of the present application.
  • FIG. 2C is a schematic diagram of another application scenario applicable to the embodiment of the present application.
  • FIG. 3 is a schematic diagram of the distribution of logical layers in a data storage system applicable to an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a data storage method in a storage system provided in an embodiment of the present application
  • FIG. 5 is a schematic diagram of a process of storing data provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an example of a data storage device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another example of a data storage device provided by an embodiment of the present application.
  • The EC verification mechanism in the embodiment of this application involves data stripes and check stripes, where a data stripe is used to store data and a check stripe is used to store the check data of the data in the data stripes. The data stripes and the check stripes form a stripe; a data stripe and a check stripe are the same size.
  • If the data in a data stripe is lost, the data in that data stripe can be recovered by using the check data in the check stripe and the data in the data stripes without data loss.
  • EC check algorithms include array erasure code algorithms, Reed-Solomon (RS) erasure code algorithms, low-density parity-check (LDPC) erasure code algorithms, and the like. The check calculation in the embodiments of this application may use any EC check algorithm.
  • The EC verification mechanism in the embodiment of the present application includes a redundant array of independent disks (RAID) mechanism.
  • For example, a stripe includes data stripes d1, d2, and d3 and a check stripe y1, and d1, d2, d3, and y1 each correspond to storage space in the storage system. If the data in d1 is lost, the storage system can restore the data in d1 according to the check data in y1, the data in d2, and the data in d3.
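The d1/d2/d3/y1 example can be worked through in a few lines. This is a sketch under the assumption that y1 is a simple XOR parity of the three data stripes (one of many possible EC check algorithms); the byte values and the helper `xor_strips` are made up for illustration.

```python
from functools import reduce
from operator import xor


def xor_strips(*strips: bytes) -> bytes:
    """Bytewise XOR of equal-length strips."""
    return bytes(reduce(xor, column) for column in zip(*strips))


d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"  # illustrative data stripe contents
y1 = xor_strips(d1, d2, d3)             # check stripe (XOR parity, assumed)

# d1 is lost: rebuild it from the check stripe and the surviving data stripes.
recovered_d1 = xor_strips(y1, d2, d3)
```

With a single XOR check stripe only one lost data stripe can be rebuilt; tolerating more losses requires additional check stripes computed with a stronger code such as Reed-Solomon.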
  • Unless otherwise specified, a noun in this application covers both the singular and the plural, that is, "one or more". "At least one" means one or more, and "plurality" means two or more. "And/or" describes the association relationship of associated objects, indicating that there can be three types of relationships. For example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the contextual objects are in an "or" relationship. For example, A/B means: A or B. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • For example, at least one item (piece) of a, b, or c means: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b, and c can each be single or multiple.
  • ordinal numerals such as “first” and “second” mentioned in the embodiments of this application are used to distinguish multiple objects, and are not used to limit the order, timing, priority or importance of multiple objects.
  • first data and “second data” in the embodiments of the present application are used to represent two data, and do not limit the size of the two data or the order of receiving the two data.
  • first storage medium and “second storage medium” in the embodiments of the present application are used to represent two storage media, and do not limit the priority or importance of the two storage media.
  • an embodiment of the present application provides a data storage method in a storage system.
  • In this data storage method, when the size of the first data to be stored is smaller than the size of a data stripe, the first data is stored in the corresponding data stripe in the stripe, and there is no need to perform a check calculation on the first data. Compared with performing a check calculation each time data is stored, the data storage method in the embodiment of the present application reduces the number of check calculations, thereby reducing the amount of calculation in the data storage process.
  • The first data is stored in a corresponding data stripe in the stripe, so as to realize persistent storage of data and improve the efficiency of storing data.
  • If the process of storing the first data is abnormal, the write operation of the first data can be re-executed based on the first data in the log, which improves the reliability of the data stored in the storage system.
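Re-executing a write from the log can be sketched as below. The log format (a list of `(data stripe index, data)` entries) and the function `replay_log` are assumptions made for illustration; the application does not describe the log layout or how an abnormal store is detected, so this sketch simply compares the log against what is stored.

```python
def replay_log(log: list[tuple[int, bytes]], data_strips: dict[int, bytes]) -> list[int]:
    """Re-execute logged writes whose data is missing or differs in the stripe.

    Returns the data stripe indexes that had to be rewritten.
    """
    replayed = []
    for index, data in log:
        if data_strips.get(index) != data:
            data_strips[index] = data  # re-execute the write operation
            replayed.append(index)
    return replayed


log = [(0, b"abc"), (1, b"de")]
data_strips = {0: b"abc"}  # the write to data stripe 1 was interrupted
replayed = replay_log(log, data_strips)
```

Replay is idempotent in this sketch: running it a second time rewrites nothing, because every logged entry already matches the stored data.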
  • the data storage method in the embodiment of the present application can be applied to various centralized data storage systems and various distributed data storage systems.
  • the following first introduces an example of a centralized data storage system applicable to the embodiment of the present application.
  • both the centralized data storage system and the distributed data storage system hereinafter may be referred to as storage systems for short.
  • FIG. 1A is a schematic diagram of an application scenario applicable to the embodiment of the present application, or a schematic diagram of an architecture of a centralized data storage system with integrated disk and control.
  • the storage system 120 can communicate with the host 100 through the switch 110 .
  • the host 100 sends the data to be stored to the storage system 120 through the switch 110, and the storage system 120 performs a write operation on the data.
  • the switch 110 is, for example, a fiber optic switch.
  • the switch 110 is an optional device, for example, the host 100 can communicate with the storage system 120 through a network.
  • the storage system 120 includes an engine 121.
  • the engine 121 can be regarded as an entrance of the centralized data storage system, and all data from external devices must pass through the entrance. For example, when the engine 121 receives the data, it can also receive the address information of the data, such as the logical address at which the data is to be stored. When storing the data, the engine 121 can store the data in the corresponding data stripe.
  • the engine 121 includes one or more controllers.
  • FIG. 1A takes the engine 121 including two controllers (such as controller 0 and controller 1) as an example for illustration, and the number of controllers included in the engine 121 is not actually limited.
  • the controller 0 includes a central processing unit (central processing unit, CPU) 122 and a memory 123 .
  • the CPU 122 is used to process write requests or read requests that come from outside the storage system 120 (for example, from servers or other storage systems) or that are generated inside the storage system 120. The write request is used to request to write data into the storage system 120.
  • the read request is used to request to read data from the storage system 120 .
  • the controller 0 may include one or more CPUs 122, and one CPU 122 has one or more CPU cores, and the embodiment of the present application does not limit the number of CPUs and the number of CPU cores.
  • the memory 123 refers to an internal memory capable of directly exchanging data with the CPU 122.
  • the CPU 122 can perform write or read operations on the memory 123.
  • the memory 123 can cache data, and the engine 121 can subsequently read the data quickly from the memory 123 to speed up the subsequent process of calculating check data.
  • the memory 123 may include one or more types of memory, and the embodiment of the present application does not limit the quantity and type of the memory 123 .
  • the controller 0 may also include a front-end interface 124 and a back-end interface 125 .
  • the front-end interface 124 is used to communicate with the host 100 to provide storage services for the host 100 .
  • the backend interface 125 is used for communicating with the hard disk 126 to expand the capacity of the storage system 120 .
  • the engine 121 can be connected with more hard disks 126 through the back-end interface 125 .
  • the storage system 120 shown in FIG. 1A further includes a plurality of hard disks 126.
  • the hard disks 126 may be magnetic disks or other types of storage media, such as solid state disks or mechanical hard disks, which are not limited in the present application.
  • a plurality of hard disks 126 may be deployed in the hard disk slots of the engine 121, and at this time, the back-end interface 125 is an optional configuration. Alternatively, the hard disk 126 may communicate with the engine 121 through the backend interface 125 .
  • the physical storage space of the hard disks 126 in the storage system 120 shown in FIG. 1A provides the storage space for the data stripes and check stripes in a stripe.
  • When the engine 121 performs a write operation on data, it specifically stores the data in a corresponding data stripe, that is, stores the data in the corresponding physical storage space of the hard disk 126.
  • the engine 121 may also store the verification data corresponding to the data in the verification stripe, that is, store the verification data in the physical storage space corresponding to the hard disk 126 .
  • FIG. 1B is a schematic diagram of another application scenario applicable to the data storage method provided by the embodiment of the present application, or it can be regarded as a schematic diagram of an architecture of a centralized data storage system with separate disk control.
  • the storage system 120 can communicate with the host 100 through the switch 110 .
  • the storage system 120 includes an engine 121 , a CPU 122 , a memory 123 , a front-end interface 124 , a back-end interface 125 and a hard disk enclosure 130 .
  • the engine 121 includes a controller 0 and a controller 1 as shown in FIG. 1B . Wherein, the implementation and functions of the engine 121 , controller 0 , controller 1 , CPU 122 , memory 123 , front-end interface 124 , and back-end interface 125 can refer to the content discussed in FIG. 1A above.
  • the engine 121 shown in FIG. 1B needs to be connected to the hard disk enclosure 130 through a separate back-end interface 125, and the hard disk enclosure 130 can be provided with multiple hard disks.
  • the hard disk enclosure 130 includes a network card 131 , a control unit 132 and several hard disks 126 .
  • the network card 131 is used for communication between the hard disk enclosure 130 and the engine 121 .
  • the hard disk enclosure 130 may belong to a smart disk enclosure.
  • a smart disk enclosure refers to a hard disk enclosure that has computing resources and storage resources and can independently complete data processing functions.
  • the control unit 132 may include a CPU and a memory.
  • the CPU is used to perform operations such as address translation and reading and writing data.
  • the internal memory is used for temporarily storing data to be written into the hard disk 126 , or data read from the hard disk 126 to be sent to the control unit 132 .
  • the shape and quantity of the control unit 132 can be arbitrary, and this application does not limit it.
  • the physical storage space of the hard disks 126 in the storage system 120 shown in FIG. 1B provides the storage space for the data stripes and check stripes in a stripe.
  • When the engine 121 stores data, it writes the data into the corresponding data stripes in the stripe.
  • FIGS. 1A and 1B are architectural examples of a centralized data storage system.
  • An example of a distributed data storage system applicable to the embodiments of the present application is introduced below.
  • Distributed data storage systems include a distributed data storage system that separates storage from computing, a fully integrated distributed data storage system, and a distributed data storage system that integrates storage and computing. Examples are introduced below.
  • FIG. 2A is a schematic diagram of another application scenario applicable to the embodiment of the present application, or it can be regarded as a schematic diagram of an architecture of a distributed data storage system with separation of storage and computing.
  • the storage system includes computing node clusters and storage node clusters.
  • the computing node cluster includes one or more computing nodes 210
  • the storage node cluster includes one or more storage nodes 250 .
  • Each computing node 210 can communicate with each other.
  • two computing nodes 210 and two storage nodes 250 are taken as an example, and the number of computing nodes 210 and storage nodes 250 is not limited actually.
  • Any computing node 210 can access any storage node 250 in the storage node cluster through the network.
  • computing node 210 receives data to be stored, and sends the data to be stored to storage node 250, and storage node 250 may perform a write operation on the data.
  • the computing node 210 generally refers to a device having a computing function, such as a server, a desktop computer, or a controller of a storage array.
  • the computing node 210 includes at least a CPU 211 , a memory 212 and a network card 213 .
  • the CPU 211 may be used to process a write request or a read request from outside the computing node 210 , or a write request or a read request generated inside the computing node 210 .
  • the network card 213 is used for communicating with the storage node 250 .
  • computing node 210 may further include a bus, and the bus in FIG. 2A may be used for communication between components of computing node 210 . Only one CPU 211 is shown in FIG. 2A , but actually there may be one or more CPUs 211 .
  • the function of the memory 212 can refer to the function of the memory in FIG. 1A above, and will not be repeated here.
  • one storage node 250 includes one or more controllers 251 , network cards 252 , hard disks 253 and memory 254 .
  • the controller 251 is configured to write data into the hard disk 253 according to the data sent by the computing node 210 .
  • Network card 252 is used to communicate with computing node 210.
  • the memory 254 is used for temporarily storing data to be written into the hard disk 253 , or reading data from the hard disk 253 to be sent to the computing node 210 .
  • the controller 251 may have various forms, for example, the controller 251 includes a CPU.
  • the controller 251 may also include a memory, and the function of the memory may refer to the function of the memory in FIG. 1A above.
  • the controller 251 is a programmable electronic component, such as a data processing unit (DPU), a graphics processing unit (GPU), an embedded neural-network processing unit (NPU), or another processing chip.
  • the number of controllers 251 may be arbitrary, which is not limited in this embodiment of the present application.
  • the storage node 250 may not have a controller 251 inside.
  • the functions of the controller 251 may be offloaded to the network card 252.
  • the network card 252 may be used to complete data reading and writing, address translation and other computing functions.
  • the network card 252 can be an intelligent network card, and the network card 252 can include a CPU and a memory, and the CPU is used to perform operations such as address translation and reading and writing.
  • the network card 252 can receive the first data sent by the computing node 210 and store the first data in the corresponding hard disk 253.
  • For the function of the memory, refer to the foregoing description.
  • There may be no affiliation relationship between the network card 252 and the hard disk 253 in the storage node 250; that is, the network card 252 can access any hard disk 253 in the storage node 250.
  • the physical storage space in one hard disk 253 or a plurality of hard disks 253 in each storage node of the storage system shown in FIG. 2A provides the storage space for the data stripes and check stripes in a stripe.
  • When the storage node 250 executes a data write operation, the storage node 250 essentially stores the data in a corresponding data stripe, that is, stores the data in a corresponding hard disk of the storage node 250.
  • FIG. 2B is a schematic diagram of another application scenario applicable to the data storage method provided by the embodiment of the present application, or it can be regarded as a schematic diagram of an architecture of a fully integrated distributed data storage system.
  • the fully integrated distributed data storage system includes a server cluster, and the server cluster includes one or more servers 260, and each server 260 can communicate with each other.
  • the server 260 generally refers to devices with computing capabilities and storage capabilities, such as servers, desktop computers, and the like.
  • the server 260 may be implemented by an advanced RISC machine (ARM) server or an X86 server.
  • the server 260 can include a virtual machine (VM) 262. The computing resources required by the VM 262 come from the local processor and memory of the server, and the storage resources required by the VM 262 can come from the local hard disks of the server or from hard disks in other servers.
  • various applications can run in the VM 262, and users can trigger read/write requests through the applications in the virtual machine.
  • each server 260 may also include a processor 261 , a network card 252 , a hard disk 253 and a memory 254 .
  • the implementation forms and functions of the network card 252 and the memory 254 can refer to the contents discussed above in FIG. 2A .
  • the processor 261 can receive data to be stored, and perform a write operation on the data.
  • the physical storage space in one hard disk 253 or multiple hard disks 253 in each server provides the storage space for the data stripes and check stripes in a stripe.
  • When the server 260 executes a data writing operation, the server 260 essentially stores the data in a corresponding data stripe, that is, stores the data in a corresponding hard disk of the server 260.
  • FIG. 2C is a schematic diagram of another application scenario applicable to the data storage method provided by the embodiment of the present application, or it can be regarded as a schematic diagram of an architecture of a distributed data storage system integrating storage and computing.
  • the storage-computing integrated distributed data storage system includes a server cluster, and the server cluster includes one or more servers 260, and each server 260 can communicate with each other.
  • the server 260 includes at least a processor 261 , a memory 254 , a network card 252 and a hard disk 253 .
  • the processor 261 , the memory 254 , the network card 252 and the hard disk 253 are connected through a bus.
  • the processor 261 and the memory 254 may be used to provide computing resources.
  • the physical storage space in one hard disk 253 or multiple hard disks 253 in each server provides the storage space for the data stripes and check stripes in a stripe.
  • When the server 260 executes a data writing operation, the server 260 essentially stores the data in a corresponding data stripe, that is, stores the data in a corresponding hard disk of the server 260.
  • The difference between the storage-computing integrated distributed data storage system and the fully integrated distributed data storage system is that the servers in the storage-computing integrated system may not have virtual machines and do not run the corresponding applications.
  • The difference between the storage-computing integrated distributed data storage system and the storage-computing separated distributed data storage system is that a server in the integrated system is equivalent to combining the functions of the storage nodes and the computing nodes of the separated system.
  • FIG. 3 is a schematic diagram of logical layer distribution of a data storage system applicable to the data storage method provided in the embodiment of the present application.
  • the storage system 300 in FIG. 3 may be, for example, the storage system discussed in any one of FIG. 1A , FIG. 1B , FIG. 2A , FIG. 2B or FIG. 2C above.
  • the storage system 300 includes several hard disks 320 . Wherein, the type of the hard disk 320 can refer to the foregoing.
  • the storage system 300 also provides a client 310 , which may be understood as an entrance for accessing the storage system 300 .
  • the client 310 is used to provide logical storage space to the host.
  • the client 310 may be located on the host side.
  • the client 310 is located in the host shown in FIG. 1A or FIG. 1B.
  • the client 310 may be located in a storage node in the storage system, or the client 310 may also be located in the server shown in FIG. 2B or FIG. 2C.
  • each hard disk 320 is divided into several physical blocks (chunks) 321, and these physical chunks 321 are mapped into logical blocks 331 to form a storage pool 330. The storage pool 330 is used to provide storage space upwards, and the storage space actually comes from the hard disks 320 included in the system.
  • the storage system may include one or more storage pools 330 , and one storage pool 330 includes part or all of the hard disks 320 .
  • a plurality of logical chunks from different hard disks 320 or different storage nodes form a logical chunk group (chunk group) 340, and the logical chunk group 340 is the minimum allocation unit of the storage pool 330.
  • a logical block group 340 may include one or more stripes 341. As shown in Figure 3, a stripe 341 includes 4 data stripes (shown as 0-3 in Figure 3) and 2 check strips (shown as P1 and Q1 in Figure 3).
  • all hard disks in Figure 1A or Figure 1B form a storage pool
  • multiple hard disks on all storage nodes in Figure 2A form one or more storage pools
  • multiple hard disks on all servers in Figure 2B or Figure 2C can form one storage pool or multiple storage pools.
  • the storage pool 330 may provide one or more logical block groups to the storage service layer.
  • the storage service layer further virtualizes the storage space provided by the logical block group into a logical unit (logical unit, LU) 350, which is provided by the client 310 to the host for use.
  • a logical unit number (LUN) may be used to refer to the logical unit.
  • Each LUN has a LUN ID for identifying the LUN. The specific location of data within a LUN can be determined by the start address and the length of the data.
  • the start address may be called a logical block address (logical block address, LBA). It can be understood that the three factors of LUN ID, LBA and length identify a certain address segment.
  • the write request or read request generated by the client 310 usually carries the LUN ID, LBA and length of the data in the write request or read request.
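  • as a minimal sketch of how the three factors above identify an address segment, the following Python fragment models the address carried by a request. The class name `AddressSegment`, its fields and the `overlaps` helper are hypothetical names chosen for illustration and do not come from the application; the LBA and the length are assumed to be expressed in the same unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSegment:
    """Hypothetical model of the (LUN ID, LBA, length) triple carried by a request."""
    lun_id: int   # identifies the LUN
    lba: int      # start address within the LUN
    length: int   # length of the data, in the same unit as lba

    @property
    def end(self) -> int:
        # First address past the segment (exclusive bound).
        return self.lba + self.length

    def overlaps(self, other: "AddressSegment") -> bool:
        # Segments overlap only if they share a LUN and their ranges intersect.
        return (self.lun_id == other.lun_id
                and self.lba < other.end
                and other.lba < self.end)
```

  • two segments with the same LBA and length but different LUN IDs do not overlap in this model, which reflects that all three factors are needed to identify an address segment.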
  • the embodiment of the present application provides a data storage method in a storage system.
  • the data storage method is applicable to any storage system that uses an EC verification mechanism to store data, for example any one of the storage systems described above, including those in Figure 2B, Figure 2C or Figure 3.
  • the data storage method is executed by the storage system, specifically, it can be executed by a certain device in the storage system, and it can be specifically executed by a certain component in a certain device.
  • the data storage method can be implemented by the storage system in FIG. 1A or FIG. 1B (specifically, the engine in the storage system shown in FIG. 1A or FIG. 1B), the storage node in FIG. 2A (specifically, the controller in the storage node in FIG. 2A), or the server in FIG. 2B or FIG. 2C (specifically, the processor in the server shown in FIG. 2B or FIG. 2C), and can also be implemented by a chip system having the function of a data storage system.
  • FIG. 4 is a flow chart of a data storage method in a storage system provided by an embodiment of the present application.
  • the following takes as an example the case where the method is applied to the centralized data storage system shown in FIG. 1A or FIG. 1B and is executed by an engine in the centralized data storage system.
  • Step 401 the engine receives a first write request, where the first write request is used to request to write first data.
  • the first write request includes the first data, and the first write request can also include the address information that the first data needs to be written into.
  • the address information of the first data is, for example, the LBA shown in Figure 3, and the address information can also include the LUN ID and the length, which is not limited in this application.
  • Step 402 the engine stores the first data in a data stripe corresponding to the stripe.
  • the engine can store the first data in the corresponding physical block according to the address information in the first write request and the correspondence. Storing the first data in the physical block corresponds to storing the first data in the logical block, which is equivalent to storing the first data in a data stripe of the stripe.
  • the meaning of the data stripe can refer to the above.
  • if the size of the first data is less than or equal to the size of one data stripe, the first data can be stored in a single data stripe, so the engine may store the first data in that data stripe. If the size of the first data is larger than the size of one data stripe, a single data stripe cannot accommodate the first data, so the engine can split the first data into one part and another part, store one part in one data stripe, and store the other part in another data stripe.
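  • the splitting described above can be sketched as follows. This is only an illustration under stated assumptions: data is assumed to be appended sequentially into the data stripes of one stripe, `offset` is the assumed byte position within those concatenated data stripes, and the function name `split_across_data_stripes` is hypothetical.

```python
KB = 1024


def split_across_data_stripes(offset: int, size: int, data_stripe_size: int):
    """Return (data_stripe_index, offset_in_data_stripe, length) pieces for one write.

    `offset` is the assumed append position, in bytes, within the
    concatenated data stripes of a single stripe.
    """
    pieces = []
    while size > 0:
        index, in_stripe = divmod(offset, data_stripe_size)
        # Take as much as still fits in the current data stripe.
        length = min(size, data_stripe_size - in_stripe)
        pieces.append((index, in_stripe, length))
        offset += length
        size -= length
    return pieces
```

  • for example, with a 512KB data stripe, a 1KB write at offset 0 yields a single piece, while a 600KB write at offset 0 yields a 512KB piece in one data stripe and an 88KB piece in the next.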
  • FIG. 5 is a schematic diagram of a process of storing data provided by an embodiment of the present application.
  • the storage system includes 6 stripes, specifically 4 data stripes (the first data stripe 511, the second data stripe 512, the third data stripe 513 and the fourth data stripe 514 shown in (1) in Figure 5) and 2 check stripes (the first check stripe 515 and the second check stripe 516 shown in (1) in Figure 5).
  • the stripe size of the storage system is 3MB (6 × 512KB), and the size of each data stripe or check stripe is 512KB, that is, the stripe depth of the storage system is 512KB.
  • the size of the first data D1 is 1KB. Since the size of the first data D1 is smaller than the size of a data stripe, a data stripe is enough to accommodate the first data D1, so the first data D1 can be stored in the first data stripe 511.
  • writing the first data D1 into the first data stripe 511 is only an example; in practice the first data D1 can be written into any data stripe, which is not limited in this application.
  • the size of the second data D2 is 1KB. Since the sum of the sizes of the first data D1 and the second data D2 is smaller than the size of the first data stripe 511, the first data stripe 511 is enough to accommodate the second data D2, so the second data D2 can be stored in the first data stripe 511.
  • after the engine receives the third data D3, the size of the third data D3 is 1KB. Since the sum of the sizes of the first data D1, the second data D2 and the third data D3 is smaller than the size of the first data stripe 511, the first data stripe 511 is enough to accommodate the third data D3, so the third data D3 can be stored in the first data stripe 511.
  • Step 403 if the size of the first data is smaller than the size of one data stripe, the engine does not calculate the verification data of the first data, and records the first data in the log.
  • the calculation granularity of the EC check calculation (also called the encoding granularity) is the size of a stripe. That is, regardless of the size of the data itself, the size of the check data obtained by the EC check calculation is N times the size of a stripe, where N is a positive integer. If the size of the first data is smaller than one stripe, the size of the calculated check data of the first data will be larger than the size of the first data.
  • therefore, when the size of the first data is smaller than the size of a stripe, the engine does not calculate the check data of the first data but records the first data in the log. The engine thus does not need to perform a verification calculation for each write request, which reduces the number of times the engine calculates verification data.
  • in addition, the size of the first data is smaller than the size of the verification data of the first data, so recording the first data in the log occupies less physical storage space than recording the verification data of the first data in the log.
  • the engine may subsequently use the first data in the log to calculate verification data between the first data and other data.
  • the engine may use the first data recorded in the log to store the first data again, so as to ensure the reliability of the storage system.
  • the engine may also record the metadata of the first data in the log.
  • the metadata is data used to describe the data.
  • the metadata of the first data includes, for example, the size information or the type information of the first data.
  • the format of the log may be arbitrary, which is not limited in this application.
  • the log can be understood as a form of recording data, and the log is stored in the corresponding storage medium.
  • the log can be stored in a first storage medium. The first storage medium is, for example, a memory, such as the memory in the engine shown in Figure 1A or Figure 1B; or the first storage medium can be a hard disk in the storage system, such as the hard disk shown in FIG. 1A or FIG. 1B, specifically a verification disk, which can be understood as a hard disk that provides storage space for a check stripe in the storage system.
  • the engine determines that the size (1KB) of the first data D1, the size (1KB) of the second data D2 and the size (1KB) of the third data D3 are all smaller than the size of the first data stripe 511. As shown in (1) in FIG. 5, the engine may record the first data D1, the second data D2 and the third data D3 in the log 520.
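  • the decision in step 403 can be sketched as follows, using the sizes of the FIG. 5 example. The log is modeled as a plain Python list and the function name `record_small_write` is hypothetical; the application does not limit the log format, so this is only one possible shape.

```python
KB = 1024
DATA_STRIPE_SIZE = 512 * KB  # stripe depth used in the FIG. 5 example


def record_small_write(name: str, data: bytes, log: list) -> bool:
    """Log a write smaller than one data stripe; return True if it was logged.

    No check data is calculated here: a logged write is kept as raw data
    (plus minimal metadata) for a later check calculation.
    """
    if len(data) < DATA_STRIPE_SIZE:
        log.append({"name": name, "size": len(data), "data": data})
        return True
    return False


log = []
for name in ("D1", "D2", "D3"):
    record_small_write(name, bytes(1 * KB), log)
```

  • in this sketch the log holds 3KB of raw data for D1 to D3, whereas check data at the granularity of one 512KB stripe depth would be far larger than the data it protects.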
  • Step 404 when the size of the first data is smaller than the size of one data stripe, the engine caches the first data.
  • after the engine stores the first data in the corresponding data stripe of the stripe, the check calculation has not yet been performed on the first data.
  • the engine can also cache the first data. For example, the engine can cache the first data in the second storage medium, and then perform verification calculations based on the cached first data.
  • the reading speed of the second storage medium is greater than or equal to the reading speed of the first storage medium.
  • the second storage medium may be a memory, such as the memory in the engine shown in FIG. 1A or FIG. 1B .
  • the second storage medium may also be a solid-state disk (solid-state disk, SSD), etc., which is not limited in this embodiment of the present application.
  • the reading speed of the second storage medium is greater than that of the first storage medium.
  • when the reading speed of the second storage medium is greater than that of the first storage medium, the engine can subsequently use the first data in the second storage medium to perform verification calculation on the first data and other data. This relatively increases the speed at which the engine reads the first data, so the engine can determine the verification data of the first data and other data faster and, correspondingly, store the verification data faster. In this way, the data storage efficiency of the entire storage system can be improved.
  • this embodiment of the present application does not limit the writing speed of the first storage medium or the writing speed of the second storage medium. For example, the writing speed of the first storage medium may be lower than, greater than, or equal to the writing speed of the second storage medium.
  • the engine may also write the first data D1 , the second data D2 and the third data D3 into the memory 530 .
  • Step 405 when the size of the second data is greater than or equal to the size of one data stripe, the host calculates the first check data.
  • the first verification data is the verification data of the second data.
  • the client on the host side calculates the check data of the second data.
  • this embodiment takes as an example the case where the client on the host side in the storage system calculates the check data of the second data.
  • alternatively, the engine calculates the check data of the second data.
  • the order of calculating the verification data of the second data and generating the second write request may be arbitrary, and the present application does not limit this.
  • Step 406, the engine receives the second write request and the first verification data from the host, where the second write request is used to request to write the second data.
  • the client in the host sends the second write request and the first verification data to the engine.
  • the second write request includes the second data, and may further include address information of the second data, and the address information of the second data may refer to the above address information of the first data.
  • Step 406 takes as an example the case where the host sends the second write request and the first verification data to the engine at the same time.
  • the first verification data and the second write request can also be sent to the engine separately, which is not limited in this application.
  • step 405 is an optional step. That is to say, the check data of the second data may not be calculated.
  • if step 405 is not performed, the host only needs to send the second write request to the engine, and correspondingly, the engine may only receive the second write request from the host.
  • before the host sends the second data to the engine, the host can determine that the sum of the sizes of the first data and the second data is less than or equal to the sum of the sizes of all data stripes in the stripe, so as to ensure that the first data and the second data fit within the data stripes of the stripe.
  • Step 407 the engine stores the second data in the data stripe corresponding to the stripe.
  • the engine can determine the physical block corresponding to the second data according to the address information of the second data, and store the second data in the corresponding logical block, which is equivalent to storing the second data in the corresponding data stripe.
  • if the sum of the size of the second data and the size of the first data is less than or equal to the size of a data stripe, the data stripe that stores the first data can also accommodate the second data. In this case, the engine can store the second data in the data stripe that stores the first data; in other words, the first data and the second data are stored in the same data stripe.
  • if the sum of the size of the second data and the size of the first data is greater than the size of a data stripe, the engine can split the second data into one part and another part, store one part in the data stripe that stores the first data, and store the other part in other data stripes of the stripe; in other words, part of the second data and the first data are stored in the same data stripe.
  • the engine determines that the size of the second data is greater than or equal to the size of a data stripe, and may calculate the check data of the second data, that is, obtain the first check data.
  • the engine may record the first verification data in the log. Subsequently, the check data of the second data and the first data can be calculated according to the first check data.
  • the engine may record the second data in the log.
  • the engine can subsequently calculate the verification data of the first data and the second data according to the second data in the log.
  • the engine may also cache the second data.
  • the subsequent engine may perform check calculation according to the second data in the cache.
  • the engine can split the fourth data D4 into one part and another part. One part of the fourth data D4 has a size of 509KB, and the other part of the fourth data D4 has a size of 512KB.
  • the engine may store part of the fourth data D4 in the first data stripe 511 , and store another part of the fourth data D4 in the second data stripe 512 .
  • Step 408 the engine reads the cached first data.
  • if the engine has cached the first data, the engine may read the cached first data, that is, read the first data from the second storage medium, for use in the subsequent check calculation.
  • the engine may read the first data from a log if the engine has not cached the first data.
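  • the read path of step 408 can be sketched as a simple fallback. The cache (second storage medium) and the log are modeled here as Python dictionaries keyed by a data name; these structures and the function name `read_for_check_calculation` are hypothetical and only illustrate the order of lookup.

```python
def read_for_check_calculation(key: str, cache: dict, log: dict):
    """Prefer the cached copy; fall back to the log if the data was not cached.

    Returns the data together with the source it was read from.
    """
    if key in cache:
        return cache[key], "cache"
    return log[key], "log"
```

  • because the log always holds the small writes recorded in step 403, the fallback succeeds for any data that was logged, even when caching (step 404) was skipped.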
  • Step 409 the engine calculates the second verification data according to the first verification data and the first data.
  • the engine calculates the check data of the first data and the second data, that is, the second check data. Since the size of the second check data may be smaller than the sum of the sizes of the first data and the second data, caching the second check data can occupy less storage space than caching the first data and the second data.
  • the engine may calculate the verification data of the first data and the second data according to the second data and the first data, and obtain the second verification data.
  • the engine determines that the size (1021KB) of the fourth data D4 is greater than the size (512KB) of a data stripe, so the engine can read the first data D1, the second data D2 and the third data D3 from the memory 530, obtain the first verification data E1 of the fourth data D4 from the host, and calculate, according to the first verification data E1 and the first data D1, the second data D2 and the third data D3, the second verification data E2 of the first data D1, the second data D2, the third data D3 and the fourth data D4.
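  • the application does not specify the check algorithm. As a hedged illustration only, the sketch below assumes a plain XOR parity (as used for a single parity strip) to show why check data computed for one write can later be combined with other data instead of recomputing from scratch; a second check stripe would normally need Galois-field arithmetic, which is omitted. The constants, function names and byte values are hypothetical, the data stripe depth is shrunk to 8 bytes, and writes are assumed not to overlap.

```python
DATA_STRIPES = 4   # data stripes per stripe, as in FIG. 5
DEPTH = 8          # bytes per data stripe (tiny, for illustration only)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def xor_contribution(data: bytes, offset: int) -> bytes:
    """XOR parity contribution of `data` written at `offset` in the data area."""
    parity = bytearray(DEPTH)
    for i, byte in enumerate(data):
        # A byte at absolute position p affects parity position p % DEPTH.
        parity[(offset + i) % DEPTH] ^= byte
    return bytes(parity)


def xor_parity_of_stripe(writes) -> bytes:
    """Reference result: lay out all writes, zero-pad, XOR the data stripes."""
    area = bytearray(DATA_STRIPES * DEPTH)
    for offset, data in writes:
        area[offset:offset + len(data)] = data
    parity = bytes(DEPTH)
    for s in range(DATA_STRIPES):
        parity = xor_bytes(parity, bytes(area[s * DEPTH:(s + 1) * DEPTH]))
    return parity


small = b"abc"                 # stands in for D1, D2 and D3 at offsets 0..2
large = bytes(range(10, 23))   # stands in for D4, written at offset 3
e1 = xor_contribution(large, 3)                  # computed by the host in this sketch
e2 = xor_bytes(e1, xor_contribution(small, 0))   # combined by the engine
```

  • under this XOR assumption, `e2` equals the parity obtained by laying out all four pieces of data and encoding them together, so the engine only needs the small logged or cached data plus the check data supplied with the large write.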
  • Step 410 the engine records the second verification data in the log.
  • the engine may cache the second verification data, and the specific manner of caching the second verification data may refer to the contents of the cached first data above, which will not be repeated here.
  • the engine may record the second verification data E2 of the first data D1 , the second data D2 , the third data D3 and the fourth data D4 in the log 520 .
  • the engine may also record the second verification data E2 of the first data D1 , the second data D2 , the third data D3 and the fourth data D4 in the memory 530 .
  • step 410 is an optional step. For example, if the engine determines that the sum of the sizes of the first data and the second data is equal to the sum of the sizes of all data stripes in the stripe, the engine may store the second verification data in the check stripe of the stripe. In this way, the first data, the second data and the second verification data make up the stripe.
  • steps 408 to 410 are optional parts.
  • the engine may calculate the second check data under the condition that the sum of the sizes of the first data and the second data is equal to the sum of the sizes of all the data stripes in the stripe.
  • the engine performs final check calculation when all the data stripes in the stripe can be filled up. For example, the engine may determine whether the first data and the second data have filled up all the data stripes in the stripe according to the address information of the first data and the address information of the second data. In this way, the number of check calculations performed by the engine can be further reduced, and the consumption of computing resources of the engine can be reduced. In the case where the host calculates the first verification data, the engine performs fewer verification calculations during the process of filling up a stripe, which can further reduce the consumption of computing resources of the engine.
  • Step 411, if the sum of the sizes of the first data and the second data is equal to the sum of the sizes of all data stripes in a stripe, the engine stores the second check data in the check stripe of the stripe.
  • if the engine determines that the sum of the sizes of the first data and the second data is equal to the sum of the sizes of all the data stripes in a stripe, the first data and the second data have filled all the data stripes in the stripe. In this case, the engine can put the check data of the first data and the second data into the check stripe of the stripe.
  • the engine receives the fifth data D5 and the third verification data E3 of the fifth data D5 from the host, and the size of the fifth data D5 is 1024KB.
  • the engine reads the second verification data E2 from the memory, and calculates, according to the third verification data E3 and the second verification data E2, the fourth verification data E4 of the first data D1, the second data D2, the third data D3, the fourth data D4 and the fifth data D5.
  • the engine can store the fourth verification data E4 in the first check stripe 515 and the second check stripe 516.
  • the first data D1, the second data D2, the third data D3, the fourth data D4, the fifth data D5 and the fourth verification data E4 make up the stripe.
  • the first data D1, the second data D2, the third data D3, the fourth data D4 and the fifth data D5 are stored in the data stripes of the stripe, and the fourth verification data E4 is stored in the check stripes of the stripe.
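  • the full-stripe condition of step 411 can be checked with the sizes of the FIG. 5 example (D1 to D3 of 1KB each, D4 of 1021KB and D5 of 1024KB, with four data stripes of 512KB). The function name `stripe_is_full` is hypothetical.

```python
KB = 1024


def stripe_is_full(write_sizes, data_stripes: int, depth: int) -> bool:
    """True when the written data exactly fills every data stripe of the stripe."""
    return sum(write_sizes) == data_stripes * depth


fig5_sizes = [1 * KB, 1 * KB, 1 * KB, 1021 * KB, 1024 * KB]  # D1..D5
```

  • the five sizes add up to 2048KB, which equals 4 × 512KB, so the condition holds only once D5 has been written.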
  • after step 410, the engine may continue to process other write requests until the data fills all the data stripes in the stripe, and then step 411 is executed.
  • Step 412 the engine deletes the data recorded in the log for obtaining the stripes.
  • after the engine determines that the stripe is full, it can delete the data in the log that is related to obtaining the stripe, for example, the first data and the first verification data recorded in the log.
  • the engine can delete the data in the log in time, and quickly recover the physical storage space occupied by the log, so as to improve the utilization rate of the physical storage space occupied by the log.
  • the engine can also delete the data related to the stripe from the second storage medium, so as to quickly reclaim the second storage medium. The second storage medium can then be used again when other write requests are processed, which increases the space utilization of the second storage medium.
  • the engine may delete the first data D1 , the second data D2 , the third data D3 , the fourth data D4 , and the second verification data E2 recorded in the log 520 .
  • the engine can also delete the second verification data E2 recorded in the memory 530 .
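  • the cleanup in step 412 can be sketched as follows. The record layout, including the `stripe_id` field used to tie a record to a stripe, is a hypothetical assumption; the application does not describe how log or cache records are associated with a stripe.

```python
def reclaim_stripe_records(log: list, cache: dict, stripe_id: int) -> int:
    """Drop log and cache records of a completed stripe.

    Returns the number of log records that were removed.
    """
    before = len(log)
    # Keep only records that belong to other stripes (in-place update).
    log[:] = [rec for rec in log if rec["stripe_id"] != stripe_id]
    for key in [k for k, rec in cache.items() if rec["stripe_id"] == stripe_id]:
        del cache[key]
    return before - len(log)
```

  • records of stripes that are still being filled are left untouched, so only the space used for the completed stripe is reclaimed.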
  • step 412 is an optional step.
  • in one case, the size of each of the multiple pieces of data written into the data stripes of the stripe is smaller than the size of one data stripe, but the sum of the sizes of the multiple pieces of data is equal to the sum of the sizes of all the data stripes in the stripe.
  • the engine can calculate the check data of multiple data and store the check data of multiple data in the check stripe of the stripe.
  • the size of a stripe in the storage system is 1536KB, and the size of a data stripe is 512KB.
  • the fourth data (with a size of 311KB) is stored in the corresponding data stripe.
  • the process of writing the first data, the second data, the third data and the fourth data by the engine may refer to the process of steps 401-404 above.
  • the engine determines that the sum of the sizes of the first data, the second data, the third data and the fourth data equals the sum of the sizes of all the data stripes in the stripe. At this time, the engine can calculate the check data of these four pieces of data and store the check data in the check stripe.
  • steps 404 to 412 in FIG. 4 are optional steps, which are indicated by dotted lines in FIG. 4 .
  • the above is an example where the size of the first data is smaller than the size of one data stripe. In fact, any data whose size is smaller than a data stripe can be processed according to the above-mentioned process of processing the first data.
  • the above is an example where the size of the second data is greater than or equal to the size of a data stripe. In fact, any data whose size is greater than or equal to a data stripe can be processed according to the above process of processing the second data.
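  • in the embodiment of the present application, when the engine processes a write request, it stores the data in the corresponding data stripe, and if the size of the data is smaller than the size of one data stripe, it does not calculate the check data of the data. This reduces the number of check calculations and thus the computing resource consumption of the storage system during data storage.
  • if the size of the data is greater than or equal to the size of one data stripe, the check data of the data can be calculated, which is equivalent to performing check calculation during the process of storing the data.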
  • the verification calculation of the data can also be completed correspondingly, so that the verification data of the data can be obtained faster, and the stripes can be filled earlier to improve the efficiency of data storage.
  • the data required for the verification calculation can also be stored with the help of a second storage medium such as memory, so that the data can be read from the memory faster later, thereby improving the efficiency of calculating the verification data.
  • FIG. 4 is an example of the application of the data storage method in the storage system to the centralized data storage system shown in FIG. 1A.
  • the data storage method can also be applied to the centralized data storage system shown in FIG. 1B.
  • the engine in the centralized data storage system shown in Figure 1B can execute the data storage method shown in Figure 4, which is not described again here.
  • this data storage method can also be applied to the distributed data storage system shown in Figure 2A, Figure 2B or Figure 2C. In that case, the storage node in FIG. 2A, the server in FIG. 2B or the server in FIG. 2C can execute the data storage method in the storage system described above; for the process of executing the method, refer to the contents discussed for FIG. 4, which are not repeated here.
  • FIG. 6 shows a schematic structural diagram of a data storage device 600 .
  • the data storage device 600 can be a storage system, or a device in the storage system, and can realize the function of the storage system in the method provided by the embodiment of the application; the data storage device 600 can also be a device capable of supporting the storage system in implementing the function of the storage system in the method provided by the embodiment of the application.
  • the data storage device 600 may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the data storage device 600 may be implemented by a system on a chip. In the embodiment of the present application, the system-on-a-chip may be composed of chips, or may include chips and other discrete devices.
  • the data storage device 600 may include a communication module 601 and a processing module 602 .
  • the communication module 601 may be used to execute step 401 in the embodiment shown in FIG. 4 , and may also execute step 406 , and may also be used to support other processes of the technologies described herein.
  • the communication module 601 is used for the data storage device 600 to communicate with other modules, and it may be a circuit, device, interface, bus, software module, transceiver or any other device capable of realizing communication.
  • the processing module 602 can be used to execute step 402-step 403 in the embodiment shown in FIG. 4, and can also execute step 404-step 412 in FIG. 4, and can also be used to support other processes of the technology described herein.
  • the functional modules in each embodiment of the present application can be integrated into one processor, can exist physically separately, or two or more modules can be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • an embodiment of the present application further provides a data storage device 700. The data storage device 700 can be the storage system in the foregoing embodiment and can implement the functions of the storage system in the illustrated embodiment; the data storage device 700 may also be a device capable of supporting the storage system to implement the functions of the storage system in the method provided in the embodiment shown in FIG. 4 of the present application.
  • the data storage device 700 may be a system on a chip.
  • the system-on-a-chip may be composed of chips, or may include chips and other discrete devices.
  • the data storage device 700 includes at least one processor 701, which is used to implement, or to support the data storage device 700 in implementing, the function of the engine in the method described above.
  • the processor 701 may obtain the first data of the data to be stored, and store the first data in a corresponding data stripe.
  • the data storage device 700 may further include a communication interface 702 for communicating with other devices through a transmission medium, so that the data storage device 700 can communicate with other devices.
  • the other device may be a server.
  • the processor 701 can use the communication interface 702 to send and receive data.
  • the data storage device 700 may also include at least one memory 703 for storing program instructions and/or data.
  • the memory 703 is coupled to the processor 701 .
  • the coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • Processor 701 may cooperate with memory 703 .
  • Processor 701 may execute program instructions stored in memory 703 . At least one of the at least one memory 703 may be included in the processor 701 .
  • the processor 701 executes the program instructions in the memory 703, the data storage method in the storage system of any one of the embodiments shown in FIG. 4 may be implemented.
  • the memory 703 in FIG. 7 is an optional part, which is indicated by a dashed box in FIG. 7 .
  • a specific connection medium among the communication interface 702, the processor 701, and the memory 703 is not limited.
  • the communication interface 702, the processor 701, and the memory 703 are connected through a bus 704.
  • the bus is represented by a thick line in the figure, and the connection manner is not limited.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 7 , but it does not mean that there is only one bus or one type of bus.
  • the processor 701 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be implemented by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the memory 703 can be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), and can also be a volatile memory, for example a random-access memory (RAM).
  • the memory may be, but is not limited to, any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
  • An embodiment of the present application provides a storage system, where the storage system includes the data storage device in FIG. 6 , or, the storage system includes the data storage device in FIG. 7 .
  • the storage system may implement any data storage method in the storage system in the foregoing embodiments shown in FIG. 4 .
  • An embodiment of the present application also provides a computer-readable storage medium, which is used to store a computer program, and when the computer program is run on a computer, the computer is caused to perform the data storage method of any of the embodiments shown in FIG. 4 .
  • An embodiment of the present application also provides a computer program product, the computer program product stores a computer program, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to perform the data storage method of any of the embodiments shown in FIG. 4 .
  • An embodiment of the present application provides a system-on-a-chip, where the system-on-a-chip includes a processor and may further include a memory, configured to implement the function of the storage system in the foregoing method.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • When implemented using software, the methods may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. integrated with one or more available media.
  • the available medium can be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Abstract

The present application relates to the technical field of storage. Provided are a data storage method and apparatus in a storage system. In the data storage method, after first data is received, the first data can be stored in a corresponding data stripe for persistent storage, and when the size of the first data is less than one data stripe, there is no need to perform check calculation on the first data, such that the number of instances of calculating check data during a data storage process is reduced, and the consumption of computing resources of a storage system is relatively reduced.

Description

Data storage method and device in storage system
Technical field
The present application relates to the field of storage technologies, and in particular to a data storage method and device in a storage system.
Background
To ensure the reliability of stored data, a storage system can store data using an erasure coding (EC) check mechanism. Under the EC check mechanism, the data to be stored is divided into data fragments, and check fragments are calculated from the data fragments according to a check algorithm. The data fragments and the check fragments form an EC check relationship and are stored together in a stripe that holds that relationship. Specifically, the data fragments are stored in the data stripes of the stripe, and the check fragments are stored in the check stripes of the stripe. When data is lost in the data stripe that holds one of the data fragments, the lost data can be recovered from the data in the remaining data stripes and the data in the check stripes.
Currently, in one EC-based way of storing data, the storage system calculates check data each time it receives data and stores the received data in the corresponding data stripe of a stripe; once all data stripes in the stripe are full, the check data of those data stripes is stored in the check stripe. Because this mechanism performs check calculations frequently before the data stripes of the stripe are full, it occupies computing resources of the storage system.
Summary of the invention
Embodiments of the present application provide a data storage method and device in a storage system, which reduce the computing resources that the storage system consumes during data storage.
In a first aspect, an embodiment of the present application provides a data storage method in a storage system, including:
receiving first data; and when the size of the first data is smaller than the size of one data stripe in a stripe, storing the first data in the data stripe of the stripe that corresponds to the first data, not calculating check data of the first data, and recording the first data in a log.
In this embodiment of the present application, the first data is stored after it is received. When the first data is smaller than one data stripe, no check calculation needs to be performed on it. In other words, the storage system does not perform a check calculation every time it stores data, which reduces the number of times check data is calculated and therefore the computing resources consumed during data storage. In addition, the first data can be recorded in a log when it is stored, so that if the process of storing the first data later fails, that process can be re-executed using the first data in the log, which improves the reliability of the storage system. Furthermore, after the first data is received, it is persistently stored in the corresponding data stripe of a stripe in the storage system, so the data is stored sooner, which improves the efficiency with which the storage system stores data.
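The write path for small data described above can be sketched as follows. This is a minimal illustration, not the implementation in the application: the `Stripe` class, the 8-byte `DATA_STRIPE_SIZE`, the `check_calculations` counter, and the function name are all hypothetical, and the "persistent" data stripes and the log are plain in-memory containers.

```python
from dataclasses import dataclass, field

DATA_STRIPE_SIZE = 8  # bytes per data stripe; hypothetical, tiny for illustration


@dataclass
class Stripe:
    """Hypothetical in-memory stand-in for one stripe and its log."""
    data_stripes: dict = field(default_factory=dict)  # data stripe index -> bytes
    log: list = field(default_factory=list)           # (index, data) records
    check_calculations: int = 0                       # times check data was computed


def store_first_data(stripe: Stripe, index: int, first_data: bytes) -> None:
    """Store data smaller than one data stripe without any check calculation."""
    if len(first_data) >= DATA_STRIPE_SIZE:
        raise ValueError("this sketch covers only data smaller than one data stripe")
    stripe.data_stripes[index] = first_data  # write into the corresponding data stripe
    stripe.log.append((index, first_data))   # record the data itself in the log
    # No check data is computed on this path, so the counter stays unchanged.


stripe = Stripe()
store_first_data(stripe, 0, b"abc")
```

After the call, the data sits in data stripe 0 and in the log, and the check-calculation counter is still zero, which is the saving the paragraph above describes.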
In a possible implementation, second data is received; when the size of the second data is greater than or equal to the size of one data stripe in the stripe, first check data, which is the check data of the second data, is calculated, and the second data is stored in the data stripe of the stripe that corresponds to the second data.
In this implementation, if the size of the second data to be stored is greater than or equal to the size of one data stripe, the first check data of the second data can be calculated and the second data can be stored in the corresponding data stripe of the stripe. Because the second data is relatively large, its check data is calculated first, and a later check calculation can be based on that check data together with other data. This avoids a later check calculation over several large pieces of data at once and can relatively reduce the computing resource consumption of the storage system.
In a possible implementation, if the sum of the sizes of the first data and the second data is less than the sum of the sizes of all data stripes in the stripe, the data stripes of the stripe are not yet full. The calculated first check data of the second data can therefore be recorded in the log, and the first check data in the log can later be used to calculate the check data of the second data and other data. Moreover, because the granularity of a check calculation is generally the size of one data stripe, the size of the check data calculated for a piece of data is an integer multiple of the data stripe size regardless of the size of the data itself. In this implementation, recording the first check data in the log rather than the second data therefore relatively saves the storage space occupied by the log.
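The handling of large data can be sketched as below. The sketch assumes a single-parity XOR check, which is only the simplest possible EC algorithm and is not stated to be the one used here. The 4-byte `DATA_STRIPE_SIZE`, the helper names, and the `("check", ...)` log record format are hypothetical.

```python
DATA_STRIPE_SIZE = 4  # bytes per data stripe; hypothetical


def xor_check(data: bytes, unit: int = DATA_STRIPE_SIZE) -> bytes:
    """XOR all unit-sized pieces of data together, zero-padding the last piece."""
    check = bytearray(unit)
    for start in range(0, len(data), unit):
        piece = data[start:start + unit].ljust(unit, b"\x00")
        for i, value in enumerate(piece):
            check[i] ^= value
    return bytes(check)


def store_second_data(data_stripes: dict, log: list, first_index: int,
                      second_data: bytes) -> bytes:
    """Compute check data for large data, store the data, and log only the check data."""
    if len(second_data) < DATA_STRIPE_SIZE:
        raise ValueError("this sketch covers only data of at least one data stripe")
    first_check = xor_check(second_data)
    for n, start in enumerate(range(0, len(second_data), DATA_STRIPE_SIZE)):
        data_stripes[first_index + n] = second_data[start:start + DATA_STRIPE_SIZE]
    log.append(("check", first_check))  # the log holds check data, not the data itself
    return first_check


data_stripes, log = {}, []
second_data = bytes([1, 2, 3, 4, 5, 6, 7, 8])  # spans two data stripes
first_check = store_second_data(data_stripes, log, 1, second_data)
```

Here the 8-byte second data produces 4 bytes of check data, so the log entry is half the size it would be if the data itself were logged.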
In a possible implementation, a way of calculating the check data of the first data and the second data is provided. Specifically, the first data recorded in the log can be read, and second check data can be calculated from the first data in the log and the first check data of the second data.
In this implementation, the total amount of data stored in the log is generally smaller than the total amount of data stored in the data stripes of the storage system, so reading the first data from the log takes less time than reading it from the data stripes. The second check data can therefore be calculated sooner, which improves the efficiency with which the storage system calculates check data and, in turn, stores data.
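Under the same XOR assumption (an illustrative choice, not an algorithm named by the application), the second check data can be derived from the two log records alone, without reading any data stripe. The log record format and the function names below are hypothetical, and the first data is assumed to start at offset 0 of its data stripe.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def second_check_from_log(log: list, unit: int) -> bytes:
    """Combine the logged first data with the logged first check data."""
    first_data = next(value for kind, value in log if kind == "data")
    first_check = next(value for kind, value in log if kind == "check")
    padded = first_data.ljust(unit, b"\x00")  # pad the small data to one data stripe
    return xor_bytes(first_check, padded)


# Log contents: small first data, plus check data of second data [1..4] and [5..8].
log = [("data", bytes([1, 2])), ("check", bytes([4, 4, 4, 12]))]
second_check = second_check_from_log(log, unit=4)
```

The result equals the XOR of the three data stripes `[1, 2, 0, 0]`, `[1, 2, 3, 4]` and `[5, 6, 7, 8]` computed directly, so nothing is lost by working from the log.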
In a possible implementation, the first data can also be cached, and the cached first data and the check data of the second data can be used to calculate the check data of the first data and the second data, that is, the second check data.
In this implementation, because a cache can generally be read quickly, performing the check calculation on the cached first data yields the second check data sooner, which helps improve the efficiency of storing data.
In a possible implementation, when the first data and the second data fill all data stripes in the stripe, the second check data and the data in all the data stripes form an EC check relationship. The second check data can therefore be stored in the check stripe of the stripe, which completes the process of filling the stripe. In addition, calculating check data and storing data can proceed at the same time, which helps improve the efficiency of storing data.
In a second aspect, an embodiment of the present application provides a data storage device that includes a communication interface and a processor. The communication interface and the processor can be used to implement any of the data storage methods in a storage system of the first aspect. For example, the communication interface is configured to receive first data; the processor is configured to, when the size of the first data is smaller than the size of one data stripe in a stripe, store the first data in the data stripe of the stripe that corresponds to the first data, not calculate check data of the first data, and record the first data in a log.
Optionally, the data storage device further includes other components, such as an antenna, an input/output module, and an interface. These components may be hardware, software, or a combination of software and hardware.
In a third aspect, an embodiment of the present application provides a data storage device that includes a communication module and a processing module, which can be used to implement any of the data storage methods in a storage system of the first aspect. For example, the communication module is configured to receive first data; the processing module is configured to, when the size of the first data is smaller than the size of one data stripe in a stripe, store the first data in the data stripe of the stripe that corresponds to the first data, not calculate check data of the first data, and record the first data in a log.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium for storing a computer program; when the computer program runs on a computer, the computer is caused to perform the method of any implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product that stores a computer program; the computer program includes program instructions that, when executed by a computer, cause the computer to perform the method of any implementation of the first aspect.
In a sixth aspect, the present application provides a chip system that includes a processor and may further include a memory, and that is configured to implement the method of the first aspect. The chip system may consist of chips, or may include chips and other discrete devices.
In a seventh aspect, an embodiment of the present application provides a storage system that includes the data storage device of any implementation of the second aspect, or the data storage device of any implementation of the third aspect.
For the beneficial effects of the second to seventh aspects and their implementations, refer to the description of the beneficial effects of the method of the first aspect and its implementations.
Brief description of the drawings
FIG. 1A is a schematic diagram of an application scenario to which the embodiments of the present application are applicable;
FIG. 1B is a schematic diagram of another application scenario to which the embodiments of the present application are applicable;
FIG. 2A is a schematic diagram of yet another application scenario to which the embodiments of the present application are applicable;
FIG. 2B is a schematic diagram of a further application scenario to which the embodiments of the present application are applicable;
FIG. 2C is a schematic diagram of a further application scenario to which the embodiments of the present application are applicable;
FIG. 3 is a schematic diagram of the distribution of logical layers in a data storage system to which the embodiments of the present application are applicable;
FIG. 4 is a schematic flowchart of a data storage method in a storage system provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a data storage process provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of one example of a data storage device provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another example of a data storage device provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. The specific operations in the method embodiments can also be applied to the device embodiments or system embodiments.
Some terms used in the embodiments of the present application are explained below to help those skilled in the art understand them.
1. The EC check mechanism in the embodiments of the present application involves data stripes and check stripes. A data stripe stores data, and a check stripe stores the check data of the data in the data stripes; the data stripes and check stripes together form a stripe, and a data stripe and a check stripe have the same size. When the data in a data stripe is lost, it can be recovered from the check data in the check stripe and the data in the data stripes that have not lost data. EC check algorithms include array erasure code algorithms, Reed-Solomon (RS) erasure code algorithms, low density parity check (LDPC) erasure code algorithms, and so on; the check calculation in the embodiments of the present application can use any EC check algorithm. The EC check mechanism in the embodiments of the present application includes the redundant array of independent disks (RAID) mechanism. For example, a stripe contains data stripes d1, d2, and d3 and a check stripe y1, each of which corresponds to storage space in the storage system. If the data in d1 is lost, the storage system can recover it from the check data in y1, the data in d2, and the data in d3.
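The d1, d2, d3 and y1 example can be made concrete with a single XOR check stripe. XOR parity is chosen here only because it is the simplest EC scheme to show; the text allows any EC algorithm, and the byte values and the helper name `xor_stripes` are hypothetical.

```python
from functools import reduce


def xor_stripes(*stripes: bytes) -> bytes:
    """Byte-wise XOR of equally sized stripes."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*stripes))


d1, d2, d3 = b"\x11\x22", b"\x33\x44", b"\x55\x66"
y1 = xor_stripes(d1, d2, d3)            # check data written to the check stripe

# Suppose d1 is lost: rebuild it from the check stripe and the surviving data stripes.
recovered_d1 = xor_stripes(y1, d2, d3)
```

Because XOR is its own inverse, combining y1 with d2 and d3 cancels their contribution and leaves exactly the bytes of d1.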
In the present application, unless otherwise specified, a noun is to be read as either singular or plural, that is, as "one or more". "At least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that both A and B exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it; for example, A/B means A or B. "At least one of the following items" or a similar expression refers to any combination of the items, including a single item or any combination of plural items. For example, at least one of a, b, or c means: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
Unless otherwise specified, ordinal numerals such as "first" and "second" in the embodiments of the present application are used to distinguish between multiple objects and do not limit their order, timing, priority, or importance. For example, "first data" and "second data" denote two pieces of data and do not limit their sizes or the order in which they are received. Similarly, "first storage medium" and "second storage medium" denote two storage media and do not limit their priority or importance.
To reduce the computing resources that a storage system consumes during data storage, an embodiment of the present application provides a data storage method in a storage system. In this method, when the size of the first data to be stored is smaller than the size of a data stripe, the first data is stored in the corresponding data stripe of a stripe without a check calculation. Compared with performing a check calculation on data every time, this reduces the number of check calculations and therefore the amount of computation during data storage. In addition, after the first data is received, it is stored in the corresponding data stripe of the stripe, so the data is persisted and storage efficiency improves. The first data is also written to a log; if the process of storing the first data later fails, the write of the first data can be re-executed based on the first data in the log, which improves the reliability of data stored in the storage system.
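Re-executing a write from the log can be sketched as follows. The log record format `(index, data)`, the dictionary standing in for the data stripes, and the simulated partial write are all hypothetical; the application does not specify how the log is laid out or replayed.

```python
def replay_log(log: list, data_stripes: dict) -> None:
    """Re-execute every logged write so the data stripes match the log."""
    for index, data in log:
        data_stripes[index] = data


log = [(0, b"abc")]          # the first data was recorded in the log in full
data_stripes = {0: b"ab"}    # but the write to the data stripe was cut short

replay_log(log, data_stripes)
```

After the replay, data stripe 0 holds the complete first data again, without any check data having been needed to repair it.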
The data storage method in the embodiments of the present application is applicable to various centralized data storage systems and various distributed data storage systems. Centralized data storage systems to which the embodiments apply are introduced first by way of example. To simplify the description, both centralized and distributed data storage systems are referred to below simply as storage systems.
Refer to FIG. 1A, which is a schematic diagram of an application scenario to which the embodiments of the present application are applicable, and which can also be regarded as an architecture diagram of a centralized data storage system with integrated disks and controllers. The storage system 120 can communicate with the host 100 through the switch 110. For example, the host 100 sends data to be stored to the storage system 120 through the switch 110, and the storage system 120 performs a write operation on the data.
The switch 110 is, for example, a fiber optic switch. The switch 110 is optional; for example, the host 100 can instead communicate with the storage system 120 over a network.
The storage system 120 includes an engine 121, which can be regarded as the entry point of the centralized data storage system: all data from external devices passes through it. For example, together with the data, the engine 121 can receive address information of the data, such as the logical address at which the data is to be stored; when storing the data, the engine 121 can store it in the corresponding data stripe of a stripe according to the address information. The engine 121 includes one or more controllers. FIG. 1A uses an engine 121 with two controllers (controller 0 and controller 1) as an example, but the number of controllers in the engine 121 is not limited.
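One way an engine could pick a data stripe from a logical address is a plain division-based layout, sketched below. The text does not describe the actual mapping, so this layout, the function name `locate`, and the sizes used are assumptions made purely for illustration.

```python
def locate(logical_address: int, unit: int, data_stripes_per_stripe: int):
    """Map a byte address to (stripe number, data stripe number, offset)."""
    stripe_capacity = unit * data_stripes_per_stripe   # data bytes held by one stripe
    stripe_no, within_stripe = divmod(logical_address, stripe_capacity)
    data_stripe_no, offset = divmod(within_stripe, unit)
    return stripe_no, data_stripe_no, offset


# Hypothetical layout: 4-byte data stripes, three data stripes per stripe.
first = locate(10, unit=4, data_stripes_per_stripe=3)
second = locate(13, unit=4, data_stripes_per_stripe=3)
```

With this layout, address 10 falls in the third data stripe of stripe 0, and address 13 falls in the first data stripe of stripe 1.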
Controller 0 includes a central processing unit (CPU) 122 and a memory 123. The CPU 122 processes write requests and read requests that come from outside the storage system 120 (for example, from a server or another storage system) or are generated internally. A write request asks for data to be written into the storage system 120, and a read request asks for data to be read from the storage system 120. Controller 0 can include one or more CPUs 122, and one CPU 122 has one or more CPU cores; the embodiments of the present application do not limit the number of CPUs or CPU cores. The memory 123 is an internal memory that exchanges data directly with the CPU 122, and the CPU 122 can perform write or read operations on it. For example, the memory 123 can cache data so that the engine 121 can later read the data from the memory 123 quickly, which speeds up the subsequent calculation of check data. The memory 123 can include one or more types of memory; the embodiments of the present application do not limit the quantity or type of the memory 123.
In addition, controller 0 can include a front-end interface 124 and a back-end interface 125. The front-end interface 124 is used to communicate with the host 100 and provide storage services to it. The back-end interface 125 is used to communicate with the hard disks 126 to expand the capacity of the storage system 120; through the back-end interface 125, the engine 121 can connect to more hard disks 126.
As an example, the storage system 120 shown in FIG. 1A further includes multiple hard disks 126. A hard disk 126 can be a magnetic disk or another type of storage medium, such as a solid-state drive or a mechanical hard disk, which is not limited in the present application. The hard disks 126 can be deployed in hard disk slots of the engine 121, in which case the back-end interface 125 is optional. Alternatively, the hard disks 126 can communicate with the engine 121 through the back-end interface 125.
The physical storage space of the hard disks 126 in the storage system 120 shown in FIG. 1A provides the storage space for the data stripes and check stripes of a stripe. When the engine 121 performs a write operation on data, it stores the data in the corresponding data stripe, that is, in the corresponding physical storage space of a hard disk 126. Similarly, the engine 121 can store the check data of the data in a check stripe, that is, in the corresponding physical storage space of a hard disk 126.
Refer to FIG. 1B, which is a schematic diagram of another application scenario to which the data storage method provided by the embodiments of the present application is applicable, and which can also be regarded as an architecture diagram of a centralized data storage system with separate disks and controllers. The storage system 120 can communicate with the host 100 through the switch 110. The storage system 120 includes an engine 121, a CPU 122, a memory 123, a front-end interface 124, a back-end interface 125, and a hard disk enclosure 130. The engine 121 includes controller 0 and controller 1, as shown in FIG. 1B. For the implementation and functions of the engine 121, controller 0, controller 1, the CPU 122, the memory 123, the front-end interface 124, and the back-end interface 125, refer to the description of FIG. 1A above.
Unlike the centralized data storage system with integrated disks and controllers, the engine 121 shown in FIG. 1B must connect to the hard disk enclosure 130 through a separate back-end interface 125, and the hard disk enclosure 130 can hold multiple hard disks.
The hard disk enclosure 130 includes a network card 131, a control unit 132, and several hard disks 126. The network card 131 is used for communication between the hard disk enclosure 130 and the engine 121. The hard disk enclosure 130 may be a smart disk enclosure, that is, an enclosure that has its own computing and storage resources and can complete data processing independently. The control unit 132 may include a CPU and a memory. The CPU performs operations such as address translation and reading and writing data. The memory temporarily stores data that is about to be written to the hard disks 126, or data that has been read from the hard disks 126 and is about to be sent to the control unit 132. The form and number of control units 132 may be arbitrary, which is not limited in this application.
The physical storage space of the hard disks 126 in the storage system 120 shown in FIG. 1B provides the storage space for the data stripes and verification stripes that make up a stripe. When the engine 121 stores data, it writes the data into the corresponding data stripe of the stripe.
FIG. 1A and FIG. 1B above are example architectures of centralized data storage systems. The following introduces examples of distributed data storage systems to which the embodiments of this application are applicable. Distributed data storage systems include storage-compute separated, fully converged, and storage-compute integrated distributed data storage systems, each of which is described by example below.
FIG. 2A is a schematic diagram of yet another application scenario to which the embodiments of this application are applicable; it may also be regarded as an architecture diagram of a storage-compute separated distributed data storage system. The storage system includes a computing node cluster and a storage node cluster.
The computing node cluster includes one or more computing nodes 210, and the storage node cluster includes one or more storage nodes 250. The computing nodes 210 can communicate with one another. FIG. 2A uses two computing nodes 210 and two storage nodes 250 as an example; the actual numbers of computing nodes 210 and storage nodes 250 are not limited. Any computing node 210 can access any storage node 250 in the storage node cluster over the network. For example, a computing node 210 receives data to be stored and sends it to a storage node 250, and the storage node 250 can perform a write operation on the data.
The computing node 210 refers generally to a device with computing capability, such as a server, a desktop computer, or a controller of a storage array. As shown in FIG. 2A, the computing node 210 includes at least a CPU 211, a memory 212, and a network card 213. The CPU 211 can process write requests or read requests that come from outside the computing node 210 or that are generated inside the computing node 210. The network card 213 is used to communicate with the storage node 250. The computing node 210 may further include a bus; the bus in FIG. 2A can be used for communication between the components of the computing node 210. FIG. 2A shows only one CPU 211, but in practice there may be one or more CPUs 211. For the function of the memory 212, refer to the function of the memory in FIG. 1A above; details are not repeated here.
A storage node 250 includes one or more controllers 251, a network card 252, hard disks 253, and a memory 254. For example, the controller 251 writes data to the hard disk 253 according to the data sent by the computing node 210. The network card 252 is used to communicate with the computing node 210. For the implementation of the hard disk 253, refer to the preceding description. The memory 254 temporarily stores data that is about to be written to the hard disk 253, or data read from the hard disk 253 that is about to be sent to the computing node 210. In practice, the controller 251 may take various forms. For example, the controller 251 includes a CPU, and may further include a memory whose function is as described for the memory in FIG. 1A above. Alternatively, the controller 251 is a programmable electronic component, for example a processing chip such as a data processing unit (DPU), a graphics processing unit (GPU), or an embedded neural-network processing unit (NPU). The number of controllers 251 may be arbitrary, which is not limited in the embodiments of this application.
As an example, the storage node 250 may have no controller 251. For instance, the functions of the controller 251 may be offloaded to the network card 252, in which case the network card 252 performs data reading and writing, address translation, and other computing functions. In this case, the network card 252 may be a smart network card that contains a CPU and a memory, where the CPU performs operations such as address translation and reading and writing. The network card 252 can then receive the first data sent by the computing node 210 and store the first data on the corresponding hard disk 253. For the function of the memory, refer to the preceding description. In this case, there may be no ownership relationship between the network card 252 and the hard disks 253 in the storage node 250; that is, the network card 252 can access any hard disk 253 in the storage node 250.
In the data storage system architecture shown in FIG. 2A, the physical storage space of one or more hard disks 253 in each storage node 250 provides the storage space for the data stripes and verification stripes that make up a stripe. When a storage node 250 performs a data write operation, it in effect stores the data in the corresponding data stripe, that is, on the corresponding hard disk of the storage node 250.
Refer to FIG. 2B, which is a schematic diagram of yet another application scenario to which the data storage method provided in the embodiments of this application is applicable; it may also be regarded as an architecture diagram of a fully converged distributed data storage system. The fully converged distributed data storage system includes a server cluster, the server cluster includes one or more servers 260, and the servers 260 can communicate with one another.
The server 260 refers generally to a device with both computing and storage capability, such as a server or a desktop computer. For example, the server 260 may be implemented as an advanced RISC machine (ARM) server or an x86 server. In terms of software, the server 260 may host a virtual machine (VM) 262. The computing resources required by the VM 262 come from the local processor and memory of the server, and the storage resources required by the VM 262 may come from the local hard disks of the server or from hard disks in other servers. In addition, various applications can run in the VM 262, and a user can trigger read/write requests through the applications in the virtual machine.
In terms of hardware, each server 260 may further include a processor 261, a network card 252, hard disks 253, and a memory 254. For the implementation and functions of the network card 252 and the memory 254, refer to the description of FIG. 2A above. The processor 261 can receive data to be stored and perform a write operation on the data.
In the data storage system architecture shown in FIG. 2B, the physical storage space of one or more hard disks 253 in each server provides the storage space for the data stripes and verification stripes that make up a stripe. When the server 260 performs a data write operation, it in effect stores the data in the corresponding data stripe, that is, on the corresponding hard disk of the server 260.
Refer to FIG. 2C, which is a schematic diagram of a further application scenario to which the data storage method provided in the embodiments of this application is applicable; it may also be regarded as an architecture diagram of a storage-compute integrated distributed data storage system. The storage-compute integrated distributed data storage system includes a server cluster, the server cluster includes one or more servers 260, and the servers 260 can communicate with one another.
In terms of hardware, the server 260 includes at least a processor 261, a memory 254, a network card 252, and hard disks 253, which are connected through a bus. The processor 261 and the memory 254 can be used to provide computing resources.
Similarly, in the data storage system architecture shown in FIG. 2C, the physical storage space of one or more hard disks 253 in each server provides the storage space for the data stripes and verification stripes that make up a stripe. When the server 260 performs a data write operation, it in effect stores the data in the corresponding data stripe, that is, on the corresponding hard disk of the server 260.
It should be noted that the storage-compute integrated distributed data storage system differs from the fully converged distributed data storage system in that its servers may have no virtual machines and may not run the corresponding applications. It differs from the storage-compute separated distributed data storage system in that each of its servers in effect combines the functions of a storage node and a computing node of the storage-compute separated system.
Refer to FIG. 3, which is a schematic diagram of the logical layers of a data storage system to which the data storage method provided in the embodiments of this application is applicable. The storage system 300 in FIG. 3 may be, for example, the storage system described with reference to any of FIG. 1A, FIG. 1B, FIG. 2A, FIG. 2B, or FIG. 2C above. The storage system 300 includes several hard disks 320. For the types of the hard disks 320, refer to the preceding description.
In addition, the storage system 300 provides a client 310, which can be understood as the entry point for accessing the storage system 300. The client 310 is used to provide logical storage space to the host. The client 310 may be located on the host side, for example in the host shown in FIG. 1A or FIG. 1B; it may be located in a storage node of the storage system, for example in a storage node shown in FIG. 2A; or it may be located in the server shown in FIG. 2B or FIG. 2C.
As shown in FIG. 3, each hard disk 320 is divided into several physical chunks 321. These physical chunks 321 are mapped to logical chunks 331 to form a storage pool 330. The storage pool 330 provides storage space to the upper layers, and that space actually comes from the hard disks 320 included in the system. Not all hard disks 320 need to contribute space to the storage pool 330. The storage system may contain one or more storage pools 330, and one storage pool 330 includes some or all of the hard disks 320. Multiple logical chunks from different hard disks 320 or different storage nodes form a logical chunk group 340, which is the minimum allocation unit of the storage pool 330. A logical chunk group 340 may include one or more stripes 341. As shown in FIG. 3, one stripe 341 includes four data stripes (data stripes 0 to 3 in FIG. 3) and two verification stripes (P1 and Q1 in FIG. 3). For example, all hard disks in FIG. 1A or FIG. 1B form one storage pool; the hard disks on all storage nodes in FIG. 2A form one or more storage pools; and the hard disks on all servers in FIG. 2B or FIG. 2C can form one or more storage pools.
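The layering described above (hard disk, chunk, stripe) can be pictured with a minimal sketch. It is illustrative only: the class and function names are hypothetical, and it assumes that every member of a stripe is backed by a chunk on a different hard disk, which is one of the two placements the description allows.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    disk_id: int  # hard disk 320 that backs this chunk (hypothetical field)
    index: int    # position of the chunk on that disk (hypothetical field)


@dataclass
class Stripe:
    data: List[Chunk]          # chunks backing data stripes 0 to 3
    verification: List[Chunk]  # chunks backing verification stripes P1 and Q1


def build_stripe(disk_ids: List[int], chunk_index: int,
                 n_data: int = 4, n_verification: int = 2) -> Stripe:
    """Take one chunk from each of n_data + n_verification distinct disks."""
    width = n_data + n_verification
    distinct = list(dict.fromkeys(disk_ids))  # drop duplicates, keep order
    if len(distinct) < width:
        raise ValueError("need at least %d distinct disks" % width)
    chunks = [Chunk(disk_id=d, index=chunk_index) for d in distinct[:width]]
    return Stripe(data=chunks[:n_data], verification=chunks[n_data:])


stripe = build_stripe(disk_ids=[0, 1, 2, 3, 4, 5], chunk_index=0)
print(len(stripe.data), len(stripe.verification))
```

With six disks, the sketch yields the 4 + 2 layout of FIG. 3, and losing any single disk affects at most one member of the stripe.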
When the storage service layer requests storage space from the storage pool 330, the storage pool 330 can provide one or more logical chunk groups to the storage service layer. The storage service layer further virtualizes the storage space provided by the logical chunk groups into logical units (LUs) 350, which the client 310 provides to the host for use. Each logical unit has a unique logical unit number (LUN). Because the client 310 can perceive the logical unit number, the term LUN may be used to refer to the logical unit. Each LUN has a LUN ID that identifies it. The specific location of data within a LUN is determined by a start address and the length of the data. The start address may be called a logical block address (LBA). The three elements LUN ID, LBA, and length therefore identify a definite address segment. A write request or read request generated by the client 310 usually carries the LUN ID, LBA, and length of the data.
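As a small illustration of how the triple (LUN ID, LBA, length) pins down an address segment, the following sketch models the triple and checks whether two segments overlap. The class name and the assumption that LBA and length share the same unit are illustrative and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSegment:
    lun_id: int  # identifies the LUN
    lba: int     # start address inside the LUN
    length: int  # length of the data, assumed to use the same unit as lba

    def __post_init__(self) -> None:
        if self.lba < 0 or self.length <= 0:
            raise ValueError("lba must be >= 0 and length must be > 0")

    @property
    def end(self) -> int:
        """First address after the segment."""
        return self.lba + self.length

    def overlaps(self, other: "AddressSegment") -> bool:
        """Two segments overlap only if they are in the same LUN and intersect."""
        return (self.lun_id == other.lun_id
                and self.lba < other.end
                and other.lba < self.end)


a = AddressSegment(lun_id=7, lba=0, length=1024)
b = AddressSegment(lun_id=7, lba=512, length=1024)
c = AddressSegment(lun_id=8, lba=0, length=1024)
print(a.overlaps(b), a.overlaps(c))
```

Segments `a` and `b` intersect inside the same LUN, while `c` covers the same range in a different LUN and therefore never conflicts with `a`.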
The technical solutions provided in the embodiments of this application are described below with reference to the accompanying drawings.
An embodiment of this application provides a data storage method in a storage system. The method is applicable to any storage system that stores data using an EC verification mechanism, for example the storage system of any of FIG. 1A, FIG. 1B, FIG. 2A, FIG. 2B, FIG. 2C, or FIG. 3 above. The method is performed by the storage system, specifically by a device in the storage system, and more specifically by a component of that device. For example, the method may be performed by the storage system in FIG. 1A or FIG. 1B (specifically, the engine in the storage system shown in FIG. 1A or FIG. 1B), by the storage node in FIG. 2A (specifically, the controller in the storage node), or by the server in FIG. 2B or FIG. 2C (specifically, the processor in the server). It may also be implemented by a chip system that has the functions of the data storage system.
Refer to FIG. 4, which is a flowchart of a data storage method in a storage system according to an embodiment of this application. FIG. 4 is described using the example in which the method is applied to the centralized data storage system shown in FIG. 1A or FIG. 1B and is performed by the engine in that system.
Step 401: The engine receives a first write request, where the first write request is used to request writing of first data.
The first write request includes the first data and may further include address information indicating where the first data is to be written. The address information of the first data is, for example, the LBA described with reference to FIG. 3, and may further include a LUN ID and a length, which is not limited in this application.
Step 402: The engine stores the first data in the corresponding data stripe of a stripe.
As discussed with reference to FIG. 3 above, logical chunks correspond to physical chunks. In the embodiments of this application, after receiving the first write request, the engine can store the first data in a physical chunk according to the address information in the first write request and this correspondence. This is equivalent to storing the first data in the logical chunk, and therefore in a data stripe of the stripe. For the meaning of a data stripe, refer to the preceding description.
If the size of the first data is less than or equal to the size of one data stripe, the first data fits in one data stripe, so the engine can store the first data in that data stripe. If the size of the first data is greater than the size of one data stripe, a single data stripe cannot hold it. The engine can then split the first data into one part and another part, store the one part in one data stripe, and store the other part in another data stripe.
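The splitting rule can be sketched as follows. This is a minimal illustration under the assumption that data is cut at data-stripe boundaries starting from offset 0; the description does not specify where the cut points lie, and the function name is hypothetical.

```python
from typing import List

KB = 1024


def split_into_data_stripes(data: bytes, data_stripe_size: int) -> List[bytes]:
    """Cut data into parts that each fit into one data stripe."""
    if data_stripe_size <= 0:
        raise ValueError("data_stripe_size must be positive")
    return [data[i:i + data_stripe_size]
            for i in range(0, len(data), data_stripe_size)]


# 1 KB fits in one 512 KB data stripe; 600 KB needs two.
small = split_into_data_stripes(bytes(1 * KB), 512 * KB)
large = split_into_data_stripes(bytes(600 * KB), 512 * KB)
print(len(small), [len(part) // KB for part in large])
```

In this example the 600 KB payload is stored as a 512 KB part in one data stripe and an 88 KB part in another.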
For example, refer to FIG. 5, which is a schematic diagram of a data storage process according to an embodiment of this application. As shown in (1) of FIG. 5, a stripe in the storage system consists of six member stripes: four data stripes (the first data stripe 511, the second data stripe 512, the third data stripe 513, and the fourth data stripe 514) and two verification stripes (the first verification stripe 515 and the second verification stripe 516). For the meaning of verification stripe and data stripe, refer to the preceding description. The stripe size of the storage system is 3 MB and each data stripe or verification stripe is 512 KB; that is, the stripe depth of the storage system is 512 KB.
Taking (1) of FIG. 5 as an example, the engine receives first data D1 whose size is 1 KB. Because D1 is smaller than one data stripe, one data stripe is large enough to hold it, so D1 can be stored in the first data stripe 511. Writing D1 to the first data stripe 511 is only an example; in practice D1 may be written to any data stripe, which is not limited in this application.
Similarly, the engine then receives second data D2 whose size is 1 KB. Because the combined size of D1 and D2 is smaller than the size of the first data stripe 511, the first data stripe 511 is large enough to also hold D2, so D2 can be stored in the first data stripe 511. The engine then receives third data D3 whose size is 1 KB. Because the combined size of D1, D2, and D3 is smaller than the size of the first data stripe 511, the first data stripe 511 is large enough to also hold D3, so D3 can be stored in the first data stripe 511.
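The arithmetic of the FIG. 5 example can be checked with a short sketch: six 512 KB members give a 3 MB stripe, and three 1 KB writes are appended to the first data stripe at consecutive offsets. The append-at-next-free-offset policy and the class name are assumptions made for illustration; the description only states that the writes fit in the first data stripe.

```python
KB = 1024
STRIPE_DEPTH = 512 * KB          # size of each data or verification stripe
N_DATA, N_VERIFICATION = 4, 2    # layout shown in (1) of FIG. 5


class DataStripe:
    """Hypothetical bookkeeping for one data stripe."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.items = []  # (name, offset, size) of each stored write

    def fits(self, size: int) -> bool:
        return self.used + size <= self.capacity

    def append(self, name: str, size: int) -> int:
        """Place a write at the next free offset and return that offset."""
        if not self.fits(size):
            raise ValueError("%s does not fit in this data stripe" % name)
        offset = self.used
        self.items.append((name, offset, size))
        self.used += size
        return offset


stripe_size = (N_DATA + N_VERIFICATION) * STRIPE_DEPTH  # 6 x 512 KB
first = DataStripe(STRIPE_DEPTH)
offsets = [first.append(name, 1 * KB) for name in ("D1", "D2", "D3")]
print(stripe_size // KB, offsets, first.used)
```

After D1, D2, and D3 are stored, only 3 KB of the 512 KB data stripe is in use, which is why computing verification data per write would be disproportionate.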
Step 403: If the size of the first data is smaller than the size of one data stripe, the engine does not calculate verification data for the first data and instead records the first data in a log.
When data is stored using the EC verification mechanism, the computation granularity (also called the encoding granularity) of the EC verification calculation is the size of one member stripe (a single data stripe or verification stripe). In other words, regardless of the size of the data itself, the verification data obtained for it by EC verification calculation is N times the size of one member stripe, where N is a positive integer. If the first data is smaller than one member stripe, the verification data calculated for it would therefore be larger than the first data itself. For this reason, in the embodiments of this application, if the size of the first data is smaller than the size of one member stripe, the engine does not calculate verification data for the first data but records the first data in a log. The engine then does not need to perform a verification calculation for every write request, which reduces the number of verification calculations it performs. Moreover, because the first data is smaller than its verification data, recording the first data in the log occupies less physical storage space than recording its verification data would. The engine can later use the first data in the log to calculate verification data that covers the first data and other data. In addition, if an exception occurs while the first data is being stored, the engine can use the first data recorded in the log to store the first data again, which preserves the reliability of the storage system.
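The space argument can be made concrete with a small calculation. The sketch below compares the bytes a log entry would occupy if it held the small write itself against the bytes it would occupy if it held verification data of N member stripes. The function name is hypothetical, N = 1 is the smallest value the description permits, and per-entry log overhead is ignored.

```python
from typing import Tuple

KB = 1024


def log_footprint(data_size: int, member_stripe_size: int, n: int = 1) -> Tuple[int, int]:
    """Return (bytes if the data is logged, bytes if its verification data is logged)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return data_size, n * member_stripe_size


as_data, as_verification = log_footprint(1 * KB, 512 * KB)
print(as_data, as_verification, as_verification // as_data)
```

For the 1 KB writes of FIG. 5 and a 512 KB member stripe, logging the data takes 1 KB per write, whereas logging verification data would take at least 512 KB per write.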
Optionally, the engine may also record the metadata of the first data in the log. Metadata is data that describes data; the metadata of the first data includes, for example, size information or type information of the first data.
The format of the log may be arbitrary, which is not limited in this application. A log can be understood as a form of recording data, and it is stored in a storage medium. As an example, the log may be stored in a first storage medium. The first storage medium is, for example, a memory, such as the memory in the engine shown in FIG. 1A or FIG. 1B. Alternatively, the first storage medium may be a hard disk in the storage system, such as a hard disk shown in FIG. 1A or FIG. 1B, and specifically, for example, a verification disk, which can be understood as a hard disk in the storage system that provides storage space for verification stripes.
Continuing with the example of FIG. 5, the engine determines that the size of the first data D1 (1 KB), the size of the second data D2 (1 KB), and the size of the third data D3 (1 KB) are each smaller than the size of the first data stripe 511. As shown in (1) of FIG. 5, the engine can then record the first data D1, the second data D2, and the third data D3 in the log 520.
Step 404: When the size of the first data is smaller than the size of one data stripe, the engine caches the first data.
After the engine stores the first data in the corresponding data stripe of the stripe, no verification calculation has yet been performed on the first data. In the embodiments of this application, if the size of the first data is smaller than the size of one data stripe, the engine may cache the first data in addition to writing it to the log. For example, the engine may cache the first data in a second storage medium and later perform the verification calculation based on the cached first data.
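Steps 402 to 404 can be summarized for a small write in the following sketch. It is a hypothetical stand-in for the engine, not the implementation described in this application: the class, its attributes, and the return values are invented for illustration, and the branch for larger writes is reduced to a counter.

```python
from typing import Dict, List, Tuple

KB = 1024


class EngineSketch:
    """Hypothetical model of how the engine treats one write request."""

    def __init__(self, data_stripe_size: int = 512 * KB) -> None:
        self.data_stripe_size = data_stripe_size
        self.data_stripe: List[Tuple[str, bytes]] = []  # step 402: stored data
        self.log: List[Tuple[str, bytes]] = []          # step 403: first storage medium
        self.cache: Dict[str, bytes] = {}               # step 404: second storage medium
        self.immediate_verifications = 0                # writes that need verification data now

    def write(self, name: str, payload: bytes) -> str:
        self.data_stripe.append((name, payload))        # step 402
        if len(payload) < self.data_stripe_size:
            self.log.append((name, payload))            # step 403: log, skip verification
            self.cache[name] = payload                  # step 404: keep for later calculation
            return "logged"
        self.immediate_verifications += 1               # handled by steps 405 and 406
        return "verification_required"


engine = EngineSketch()
results = [engine.write(name, bytes(1 * KB)) for name in ("D1", "D2", "D3")]
print(results, engine.immediate_verifications)
```

For the three 1 KB writes of FIG. 5, every write is logged and cached, and no verification calculation is triggered.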
Optionally, the read speed of the second storage medium is greater than or equal to the read speed of the first storage medium.
As an example, the second storage medium may be a memory, such as the memory in the engine shown in FIG. 1A or FIG. 1B. Alternatively, the second storage medium may be a solid-state disk (SSD) or the like, which is not limited in the embodiments of this application. For example, when the first storage medium is a verification disk in the storage system and the second storage medium is a memory, the read speed of the second storage medium is greater than that of the first storage medium.
In the embodiments of this application, when the read speed of the second storage medium is greater than that of the first storage medium, the engine can later use the first data held in the second storage medium to perform the verification calculation on the first data and other data. This speeds up the engine's reads of the first data, so the engine can determine the verification data of the first data and the other data sooner and store that verification data in the verification stripe sooner, which improves the efficiency with which the whole storage system stores data.
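The deferred verification calculation over cached data can be sketched as follows. The description does not specify the EC algorithm, and its example uses two verification stripes; the sketch below substitutes a single XOR parity purely for illustration, pads each data stripe with zeros, and uses a hypothetical function name.

```python
from typing import List


def xor_verification(data_stripes: List[bytes], member_stripe_size: int) -> bytes:
    """XOR the zero-padded data stripes into one verification stripe.

    Single XOR parity is a stand-in here, not the EC code of the application.
    """
    out = bytearray(member_stripe_size)
    for buf in data_stripes:
        if len(buf) > member_stripe_size:
            raise ValueError("buffer larger than one member stripe")
        for i, byte in enumerate(buf.ljust(member_stripe_size, b"\x00")):
            out[i] ^= byte
    return bytes(out)


# Tiny 4-byte member stripes keep the example readable.
cached = [b"\x01\x02", b"\x03", b"", b""]
parity = xor_verification(cached, 4)
# Recover the first data stripe from the parity and the remaining stripes.
recovered = xor_verification([parity, cached[1], cached[2], cached[3]], 4)
print(parity, recovered)
```

Because the inputs are read from the cache rather than from the log, the calculation does not wait on the slower first storage medium in the case described above.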
In addition, the embodiments of this application do not limit the relationship between the write speed of the first storage medium and the write speed of the second storage medium. The write speed of the first storage medium may be less than, greater than, or equal to the write speed of the second storage medium.
Continuing with the example of FIG. 5, as shown in (1) of FIG. 5, the engine may also write the first data D1, the second data D2, and the third data D3 to the memory 530.
Step 405: When the size of the second data is greater than or equal to the size of one data stripe, the host calculates first verification data.
The first verification data is the verification data of the second data. In one implementation, a client located on the host side calculates the verification data of the second data; FIG. 4 uses this case, in which the client of the storage system on the host side performs the calculation, as the example. In another implementation, the engine calculates the verification data of the second data.
本申请实施例计算第二数据的校验数据与生成第二写请求的顺序可以是任意的,本申请对此不作限定。In this embodiment of the present application, the order of calculating the verification data of the second data and generating the second write request may be arbitrary, and the present application does not limit this.
在主机执行步骤405的情况下,还包括步骤406,引擎接收来自的第二写请求和第一校验数据,该第二写请求用于请求写入第二数据。When the host executes step 405, step 406 is also included, the engine receives the second write request and the first verification data from the host, the second write request is used to request to write the second data.
例如,主机中的客户端向引擎发送第二写请求和第一校验数据。其中,第二写请求包括第二数据,还可以包括第二数据的地址信息,第二数据的地址信息可以参照前文第一数据的地址信息。For example, the client in the host sends the second write request and the first verification data to the engine. Wherein, the second write request includes the second data, and may further include address information of the second data, and the address information of the second data may refer to the above address information of the first data.
步骤406是以主机将第二写请求和第一校验数据同时发送给引擎为例,实际上还可以将第一校验数据和第二写请求分别发送给引擎,本申请对此不做限定。Step 406 is an example where the host sends the second write request and the first verification data to the engine at the same time. In fact, the first verification data and the second write request can also be sent to the engine separately, which is not limited in this application .
作为一个示例,步骤405为可选的步骤。也就是说,可以不计算第二数据的校验数据。As an example, step 405 is an optional step. That is to say, the check data of the second data may not be calculated.
如果没有执行步骤405,那么主机只需向引擎发送第二写请求,对应地,引擎可以只接收来自主机的第二写请求。If step 405 is not performed, the host only needs to send the second write request to the engine, and correspondingly, the engine may only receive the second write request from the host.
作为一个示例,主机向引擎发送第二数据之前,可以确定第一数据和第二数据的大小之和小于或等于分条中的所有数据条带的大小之和,以保证第一数据和第二数据的大小之和不会大于该分条中的所有数据条带的大小。As an example, before the host sends the second data to the engine, it can determine that the sum of the sizes of the first data and the second data is less than or equal to the sum of the sizes of all data stripes in the stripe, so as to ensure that the first data and the second data The sum of the size of the data will not be greater than the size of all the data stripes in the stripe.
Step 407: the engine stores the second data in the corresponding data stripe of the stripe.
Based on the address information of the second data, the engine can determine the physical block corresponding to the second data and store the second data in the corresponding logical block, which amounts to storing the second data in the corresponding data stripe.
If the sum of the sizes of the second data and the first data is less than or equal to the size of one data stripe, the data stripe that already holds the first data can also hold the second data. In that case the second data can be stored in the data stripe used for the first data; in other words, the first data and the second data are stored in the same data stripe.
If the size of the second data is greater than the size of one data stripe, or if it is smaller than the size of one data stripe but the sum of the sizes of the second data and the first data is greater than the size of one data stripe, the data stripe that holds the first data cannot hold all of the second data. The engine can therefore split the second data into one part and another part, store the one part in the data stripe used for the first data, and store the other part in other data stripes of the stripe. In other words, part of the second data is stored in the same data stripe as the first data.
As an example, when the host has not performed step 405 and the engine determines that the size of the second data is greater than or equal to the size of one data stripe, the engine may calculate the check data of the second data, that is, obtain the first check data.
As an example, the engine may record the first check data in the log. The check data of the second data and the first data can later be calculated from this first check data.
As another example, when the engine has not obtained the first check data, it may record the second data in the log and later calculate the check data of the first data and the second data from the second data in the log. The engine may also cache the second data and later perform the check calculation from the cached second data.
Continuing the example of FIG. 5, as shown in (2) of FIG. 5, the size of the fourth data D4 is 1021 KB. The engine can split the fourth data D4 into one part of 509 KB and another part of 512 KB, store the 509 KB part in the first data stripe 511, and store the 512 KB part in the second data stripe 512.
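The placement rule of step 407 can be illustrated with a minimal sketch. It assumes an append-only layout in which a write starts where the previously stored data ends; the function name `place_write` and its parameters are hypothetical and are not taken from the application. The 3 KB already held by the first data stripe is inferred from the FIG. 5 numbers (512 KB minus 509 KB).

```python
KB = 1024

def place_write(offset, size, data_stripe_size):
    """Split a write into (data_stripe_index, offset_in_data_stripe, length) pieces.

    `offset` is the byte position in the stripe's data area at which the
    write starts (an assumed append-only layout, for illustration only).
    """
    pieces = []
    while size > 0:
        index, in_stripe = divmod(offset, data_stripe_size)
        # Take as much as still fits in the current data stripe.
        length = min(size, data_stripe_size - in_stripe)
        pieces.append((index, in_stripe, length))
        offset += length
        size -= length
    return pieces

# FIG. 5 numbers: 3 KB already stored, D4 is 1021 KB, data stripes are 512 KB.
d4_pieces = place_write(offset=3 * KB, size=1021 * KB, data_stripe_size=512 * KB)
for index, in_stripe, length in d4_pieces:
    print(index, in_stripe // KB, length // KB)
# prints:
# 0 3 509
# 1 0 512
```

A write that fits in the remaining space of the current data stripe comes back as a single piece, which corresponds to the case in which the first data and the second data share one data stripe.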
Step 408: the engine reads the cached first data.
The engine can read the cached first data, that is, read the first data out of the second storage medium, and can later perform the check calculation based on the cached first data.
As another example, if the engine has not cached the first data, it can read the first data from the log.
Step 409: the engine calculates second check data based on the first check data and the first data.
The engine calculates the check data of the first data and the second data, that is, the second check data. Because the second check data may be smaller than the first data and the second data combined, caching the second check data can take less storage space than caching the first data and the second data.
As another example, when the engine has not obtained the first check data, it can calculate the check data of the first data and the second data directly from the second data and the first data, obtaining the second check data.
Continuing the example of FIG. 5, as shown in (2) of FIG. 5, the engine determines that the size of the fourth data D4 (1021 KB) is greater than the size of one data stripe (512 KB). The engine can therefore read the first data D1, the second data D2, and the third data D3 from the memory 530, obtain the first check data E1 of the fourth data D4 from the host, and calculate, from E1 and the data D1, D2 and D3 read from the memory, the second check data E2 of the first data D1, the second data D2, the third data D3, and the fourth data D4.
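Why check data computed for one piece of data can later be combined with other data, as in step 409, can be illustrated under a simplifying assumption. The application does not name the check algorithm, and the two check stripes in FIG. 5 suggest a stronger code than simple parity; the sketch below assumes plain XOR parity purely for illustration, and the names `xor_contribution` and `merge` are hypothetical. With XOR parity the check data is linear, so a partial result for the second data can be merged with the contribution of the first data and gives the same result as a single pass over both.

```python
def xor_contribution(data, offset, data_stripe_size):
    """XOR-parity contribution of `data` written at `offset` in the data area.

    Byte i of the data lands at position (offset + i) % data_stripe_size
    inside its data stripe, so it affects that position of the check data.
    """
    parity = bytearray(data_stripe_size)
    for i, byte in enumerate(data):
        parity[(offset + i) % data_stripe_size] ^= byte
    return bytes(parity)

def merge(parity_a, parity_b):
    """Combine two partial parities of equal length byte by byte."""
    return bytes(a ^ b for a, b in zip(parity_a, parity_b))

# Toy sizes: 8-byte data stripes, 3 bytes of first data, 9 bytes of second data.
DATA_STRIPE_SIZE = 8
first = b"abc"
second = b"123456789"

# First check data: covers the second data only (computed by the host in FIG. 4).
e1 = xor_contribution(second, offset=len(first), data_stripe_size=DATA_STRIPE_SIZE)
# Second check data: the engine folds the cached first data into e1.
e2 = merge(e1, xor_contribution(first, offset=0, data_stripe_size=DATA_STRIPE_SIZE))

# A single pass over both pieces of data gives the same check data.
direct = xor_contribution(first + second, offset=0, data_stripe_size=DATA_STRIPE_SIZE)
assert e2 == direct
```

Under the same assumption, merging two partial results, as when the fourth check data is derived from the third and second check data in the FIG. 5 example, is a single call to `merge`.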
Step 410: the engine records the second check data in the log.
As an example, the engine may cache the second check data. For how the second check data is cached, refer to the earlier description of caching the first data; details are not repeated here.
Continuing the example of FIG. 5, the engine may record the second check data E2 of the first data D1, the second data D2, the third data D3, and the fourth data D4 in the log 520. The engine may also record the second check data E2 in the memory 530.
As an example, step 410 is optional. For example, if the engine determines that the sum of the sizes of the first data and the second data equals the sum of the sizes of all data stripes in the stripe, then after calculating the second check data the engine can store it in the check stripe of the stripe. The first data, the second data, and the second check data then fill the stripe.
As an embodiment, steps 408 to 410 are optional. For example, the engine may calculate the second check data once it determines that the sum of the sizes of the first data and the second data equals the sum of the sizes of all data stripes in the stripe.
In this embodiment, the engine performs the final check calculation when all data stripes in the stripe have been filled. For example, the engine can determine from the address information of the first data and the address information of the second data whether the first data and the second data have filled all the data stripes in the stripe. This further reduces the number of check calculations the engine performs and its consumption of computing resources. When the host calculates the first check data, the engine itself performs even fewer check calculations while the stripe is being filled, which reduces its consumption of computing resources further.
Step 411: if the sum of the sizes of the first data and the second data equals the sum of the sizes of all data stripes in a stripe, the engine stores the first check data in the check stripe of the stripe.
When the engine determines that the sum of the sizes of the first data and the second data equals the sum of the sizes of all data stripes in a stripe, the first data and the second data have filled all the data stripes in the stripe. In this case, the engine can store the check data of the first data and the second data in the check stripe of the stripe.
Continuing the example shown in FIG. 5, the engine receives the fifth data D5, whose size is 1024 KB, and receives the third check data E3 of the fifth data D5 from the host. The engine reads the second check data E2 from the memory and calculates, from the third check data E3 and the second check data E2, the fourth check data E4 of the first data D1, the second data D2, the third data D3, the fourth data D4, and the fifth data D5. Because the sizes of D1, D2, D3, D4 and D5 sum to the total size of all data stripes in the stripe, the engine can store the fourth check data E4 in the first check stripe 515 and the second check stripe 516. D1, D2, D3, D4, D5 and E4 then fill the stripe: D1 to D5 are stored in the data stripes of the stripe, and E4 is stored in its check stripes.
As an example, if the sum of the sizes of the first data and the second data is less than the sum of the sizes of all data stripes in the stripe, the data stripes in the stripe are not yet full. Other write requests can then be processed following the logic of steps 401 to 410 until the data fills all the data stripes in the stripe, after which step 411 is performed.
Step 412: the engine deletes the data recorded in the log that was used to obtain the stripe.
After determining that the stripe is full, the engine can delete the log data related to obtaining the stripe, for example the first data and the first check data recorded in the log.
In this embodiment, the engine can delete log data promptly and quickly reclaim the physical storage space the log occupies, which improves the utilization of that space.
As an example, the engine can also delete the data related to the stripe from the second storage medium, quickly reclaiming memory. The second storage medium can then be reused when other write requests are processed, which improves its space utilization.
Continuing the example shown in FIG. 5, the engine can delete the first data D1, the second data D2, the third data D3, the fourth data D4, and the second check data E2 recorded in the log 520. The engine can also delete the second check data E2 recorded in the memory 530.
As an example, step 412 is optional.
In a possible embodiment, each of several pieces of data written to the data stripes of the stripe is smaller than one data stripe, but their sizes sum to the total size of all data stripes in the stripe. In this case, the engine can calculate the check data of these pieces of data and store it in the check stripe of the stripe.
For example, a stripe in the storage system is 1536 KB and a data stripe is 512 KB. The engine stores, in turn, the first data (200 KB), the second data (312 KB), the third data (201 KB), and the fourth data (311 KB) in the corresponding data stripes. For the process by which the engine writes these four pieces of data, refer to steps 401 to 404 above. The engine determines that the sizes of the first, second, third and fourth data sum to the total size of all data stripes in the stripe. The engine can then calculate the check data of these four pieces of data from the four pieces of data themselves and store that check data in the check stripe.
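The arithmetic of this example can be checked with a short sketch. It assumes that the 1536 KB stripe consists of two 512 KB data stripes and one 512 KB check stripe, which is inferred from the sizes given and is not stated explicitly; the variable names are illustrative only.

```python
KB = 1024
STRIPE_SIZE = 1536 * KB
DATA_STRIPE_SIZE = 512 * KB
DATA_STRIPES = 2  # assumption: two data stripes plus one check stripe
data_capacity = DATA_STRIPES * DATA_STRIPE_SIZE

stored = 0
check_calculations = 0
for size_kb in [200, 312, 201, 311]:
    size = size_kb * KB
    # Every write is smaller than a data stripe, so none triggers a
    # check calculation of its own.
    assert size < DATA_STRIPE_SIZE
    stored += size
    if stored == data_capacity:
        # Only now is the check data computed, once, over all four writes.
        check_calculations += 1

print(stored // KB, check_calculations)  # prints: 1024 1
```

Four writes therefore cost one check calculation instead of four, which is the saving this embodiment describes.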
As an embodiment, steps 404 to 412 in FIG. 4 are optional and are shown with dashed lines in FIG. 4.
It should be noted that the above takes as its example first data whose size is smaller than one data stripe; any data smaller than a data stripe can be processed in the way described for the first data. Likewise, the above takes as its example second data whose size is greater than or equal to one data stripe; any data whose size is greater than or equal to a data stripe can be processed in the way described for the second data.
It should be noted that the first data and the second data in FIG. 4 may be received in any order; this application does not limit the order.
In the embodiment shown in FIG. 4, when the engine processes a write request, it stores the data in the corresponding data stripe. When the data is smaller than one data stripe, no check calculation need be performed on it, which reduces the number of check calculations the engine performs and the computing resources the storage system consumes while storing data. When the data is greater than or equal to one data stripe, its check data can be calculated, so that check calculation takes place while the data is being stored. As a result, when the data fills all the data stripes in the stripe, the check calculation has correspondingly been completed; the check data is obtained sooner and the stripe is filled sooner, which improves the efficiency of data storage. In addition, during calculation the engine can use a second storage medium such as memory to hold the data needed for the check calculation, so that this data can later be read from memory faster, which improves the efficiency of calculating check data.
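The overall write path summarized above can be sketched as a toy bookkeeping model. The class `StripeWriter` and all of its fields are hypothetical; the model counts one check calculation per large write without distinguishing host from engine, and it stores no real data. The 1 KB sizes for D1 to D3 are an assumed split of the 3 KB inferred from FIG. 5.

```python
from dataclasses import dataclass, field

KB = 1024

@dataclass
class StripeWriter:
    """Toy model of per-stripe bookkeeping; names are illustrative only."""
    data_stripe_size: int
    data_stripes: int
    stored: int = 0                 # bytes already placed in data stripes
    pending_small: int = 0          # small writes not yet covered by check data
    check_calculations: int = 0
    log: list = field(default_factory=list)
    check_stripe_written: bool = False

    @property
    def capacity(self):
        return self.data_stripe_size * self.data_stripes

    def write(self, name, size):
        if self.stored + size > self.capacity:
            raise ValueError("write does not fit in the current stripe")
        self.stored += size                       # data always goes to a data stripe
        if size < self.data_stripe_size:
            self.log.append(("data", name))       # log the data, skip the calculation
            self.pending_small += 1
        else:
            self.check_calculations += 1          # compute check data now and fold in
            self.pending_small = 0                # any cached small writes
            self.log.append(("check", name))
        if self.stored == self.capacity:
            if self.pending_small:                # stripe filled by small writes only
                self.check_calculations += 1
                self.pending_small = 0
            self.check_stripe_written = True      # corresponds to step 411
            self.log.clear()                      # corresponds to step 412

writer = StripeWriter(data_stripe_size=512 * KB, data_stripes=4)
for name, size_kb in [("D1", 1), ("D2", 1), ("D3", 1), ("D4", 1021), ("D5", 1024)]:
    writer.write(name, size_kb * KB)
print(writer.check_calculations, writer.check_stripe_written, len(writer.log))
# prints: 2 True 0
```

Five writes lead to two check calculations in this model, and the log is empty once the stripe is full.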
FIG. 4 illustrates the data storage method in a storage system as applied to the centralized data storage system shown in FIG. 1A. The method can also be applied to the centralized data storage system shown in FIG. 1B, in which case the engine in that system can perform the data storage method shown in FIG. 4; the details are not listed here one by one.
It should be noted that the data storage method can also be applied to the distributed data storage system shown in FIG. 2A, FIG. 2B or FIG. 2C. In that case, the storage node in FIG. 2A, the server in FIG. 2B, or the server in FIG. 2C can perform the data storage method in a storage system described above. For the process of performing the method, refer to the discussion of FIG. 4; the details are not listed here one by one.
FIG. 6 is a schematic structural diagram of a data storage apparatus 600. The data storage apparatus 600 may be applied to a storage system, or may be an apparatus in a storage system, and can implement the functions of the storage system in the method provided by the embodiments of this application. The data storage apparatus 600 may also be an apparatus that can support a storage system in implementing those functions. The data storage apparatus 600 may be a hardware structure, a software module, or a hardware structure plus a software module, and may be implemented by a chip system. In the embodiments of this application, a chip system may consist of a chip, or may include a chip and other discrete devices.
The data storage apparatus 600 may include a communication module 601 and a processing module 602.
The communication module 601 may be used to perform step 401 in the embodiment shown in FIG. 4, may also perform step 406, and may also support other processes of the techniques described herein. The communication module 601 is used by the data storage apparatus 600 to communicate with other modules; it may be a circuit, a component, an interface, a bus, a software module, a transceiver, or any other apparatus capable of communication.
The processing module 602 may be used to perform steps 402 and 403 in the embodiment shown in FIG. 4, may also perform steps 404 to 412 in FIG. 4, and may also support other processes of the techniques described herein.
All relevant content of the steps in the method embodiments above can be cited in the functional descriptions of the corresponding functional modules, and is not repeated here.
The division into modules in the embodiments of this application is schematic and is only a division by logical function; other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may each exist physically on their own, or two or more modules may be integrated into one module. An integrated module may be implemented in the form of hardware or in the form of a software functional module.
FIG. 7 shows a data storage apparatus 700 provided by an embodiment of this application. The data storage apparatus 700 may be the storage system in the embodiment shown in FIG. 4, or an apparatus in that storage system, and can implement the functions of the storage system in the embodiment shown in FIG. 4. The data storage apparatus 700 may also be an apparatus that can support a storage system in implementing the functions of the storage system in the method provided by the embodiment shown in FIG. 4. The data storage apparatus 700 may be a chip system. In the embodiments of this application, a chip system may consist of a chip, or may include a chip and other discrete devices.
The data storage apparatus 700 includes at least one processor 701, which is used to implement, or to support the data storage apparatus 700 in implementing, the functions of the engine in FIG. 4, or the functions of the storage node in FIG. 2A, the server in FIG. 2B, or the server in FIG. 2C. For example, the processor 701 may obtain the first data of the data to be stored and store the first data in the corresponding data stripe. For details, see the description in the method examples, which is not repeated here.
The data storage apparatus 700 may further include a communication interface 702 for communicating with other devices over a transmission medium, so that the data storage apparatus 700 can communicate with other devices. For example, the other device may be a server. The processor 701 can send and receive data through the communication interface 702.
The data storage apparatus 700 may further include at least one memory 703 for storing program instructions and/or data. The memory 703 is coupled to the processor 701. Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units or modules; it may be electrical, mechanical or of another form, and is used for exchanging information between apparatuses, units or modules. The processor 701 may operate in cooperation with the memory 703 and may execute the program instructions stored in the memory 703. At least one of the at least one memory 703 may be included in the processor 701. When the processor 701 executes the program instructions in the memory 703, any of the data storage methods in a storage system in the embodiment shown in FIG. 4 can be implemented.
As an example, the memory 703 in FIG. 7 is optional and is shown as a dashed box in FIG. 7. For example, the memory 703 is coupled to the processor 701.
The embodiments of this application do not limit the specific connection medium among the communication interface 702, the processor 701, and the memory 703. In FIG. 7, the communication interface 702, the processor 701, and the memory 703 are connected through a bus 704, which is drawn as a thick line; the way other components are connected is shown only schematically and is not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 7 uses a single thick line, but this does not mean that there is only one bus or only one type of bus.
In the embodiments of this application, the processor 701 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
In the embodiments of this application, the memory 703 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM). The memory may be, without limitation, any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of a storage function, and is used for storing program instructions and/or data.
An embodiment of this application provides a storage system that includes the data storage apparatus in FIG. 6 or the data storage apparatus in FIG. 7. The storage system can implement any of the data storage methods in a storage system in the embodiment shown in FIG. 4 above.
An embodiment of this application further provides a computer-readable storage medium for storing a computer program. When the computer program runs on a computer, it causes the computer to perform any of the data storage methods in a storage system in the embodiment shown in FIG. 4.
An embodiment of this application further provides a computer program product that stores a computer program. The computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the data storage methods in a storage system in the embodiment shown in FIG. 4.
An embodiment of this application provides a chip system that includes a processor, and may further include a memory, for implementing the functions of the storage system in the foregoing method. The chip system may consist of a chip, or may include a chip and other discrete devices.
The methods provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination of these. When software is used, the implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).
Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these changes and variations fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to cover them.

Claims (19)

  1. A data storage method in a storage system, comprising:
    receiving first data; and
    when the size of the first data is smaller than the size of one data strip in a stripe, storing the first data in the data strip corresponding to the first data in the stripe, not calculating check data of the first data, and recording the first data in a log.
  2. The method according to claim 1, wherein the method further comprises:
    receiving second data;
    when the size of the second data is greater than or equal to the size of one data strip in the stripe, calculating first check data, wherein the first check data is check data of the second data; and
    storing the second data in the data strip corresponding to the second data in the stripe.
  3. The method according to claim 2, wherein the method further comprises:
    recording the first check data in the log, wherein the sum of the size of the first data and the size of the second data is smaller than the sum of the sizes of all data strips in the stripe.
  4. The method according to claim 2, wherein the method further comprises:
    calculating second check data according to the first data in the log and the first check data, wherein the second check data is check data of the first data and the second data.
  5. The method according to claim 2, wherein after the storage system receives the first data and before the first data is stored in the data strip corresponding to the first data in the stripe, the method further comprises: caching the first data; and
    calculating second check data according to the cached first data and the first check data, wherein the second check data is check data of the first data and the second data.
  6. The method according to claim 4 or 5, wherein the method further comprises:
    storing the second check data in a check strip in the stripe, wherein the sum of the size of the first data and the size of the second data is equal to the sum of the sizes of all data strips in the stripe.
  7. A data storage apparatus, wherein the apparatus comprises a communication interface and a processor, wherein:
    the communication interface is configured to receive first data; and
    the processor is configured to: when the size of the first data is smaller than the size of one data strip in a stripe, store the first data in the data strip corresponding to the first data in the stripe, not calculate check data of the first data, and record the first data in a log.
  8. The apparatus according to claim 7, wherein the communication interface is further configured to receive second data; and
    the processor is further configured to: when the size of the second data is greater than or equal to the size of one data strip in the stripe, calculate first check data, wherein the first check data is check data of the second data, and store the second data in the data strip corresponding to the second data in the stripe.
  9. The apparatus according to claim 8, wherein the processor is further configured to:
    record the first check data in the log, wherein the sum of the size of the first data and the size of the second data is smaller than the sum of the sizes of all data strips in the stripe.
  10. The apparatus according to claim 8, wherein the processor is further configured to:
    calculate second check data according to the first data in the log and the first check data, wherein the second check data is check data of the first data and the second data.
  11. The apparatus according to claim 8, wherein the processor is further configured to:
    after the first data is received and before the first data is stored in the data strip corresponding to the first data in the stripe, cache the first data; and
    calculate second check data according to the cached first data and the first check data, wherein the second check data is check data of the first data and the second data.
  12. The apparatus according to claim 10 or 11, wherein the processor is further configured to:
    store the second check data in a check strip in the stripe, wherein the sum of the size of the first data and the size of the second data is equal to the sum of the sizes of all data strips in the stripe.
  13. A data storage apparatus, wherein the apparatus comprises a communication module and a processing module, wherein:
    the communication module is configured to receive first data; and
    the processing module is configured to: when the size of the first data is smaller than the size of one data strip in a stripe, store the first data in the data strip corresponding to the first data in the stripe, not calculate check data of the first data, and record the first data in a log.
  14. The apparatus according to claim 13, wherein the communication module is further configured to receive second data; and
    the processing module is further configured to: when the size of the second data is greater than or equal to the size of one data strip in the stripe, calculate first check data, wherein the first check data is check data of the second data, and store the second data in the data strip corresponding to the second data in the stripe.
  15. The apparatus according to claim 13, wherein the processing module is further configured to:
    record the first check data in the log, wherein the sum of the size of the first data and the size of the second data is smaller than the sum of the sizes of all data strips in the stripe.
  16. The apparatus according to claim 13, wherein the processing module is further configured to:
    calculate second check data according to the first data in the log and the first check data, wherein the second check data is check data of the first data and the second data.
  17. The apparatus according to claim 13, wherein the processing module is further configured to:
    after the first data is received and before the first data is stored in the data strip corresponding to the first data in the stripe, cache the first data; and
    calculate second check data according to the cached first data and the first check data, wherein the second check data is check data of the first data and the second data.
  18. The apparatus according to claim 16 or 17, wherein the processing module is further configured to:
    store the second check data in a check strip in the stripe, wherein the sum of the size of the first data and the size of the second data is equal to the sum of the sizes of all data strips in the stripe.
  19. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and when the computer program runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 6.
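
The write path recited in claims 1 to 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claims do not name a check algorithm, so plain XOR parity is assumed here, and the names `Stripe`, `contribution`, `STRIP_SIZE`, and `DATA_STRIPS`, the in-memory log format, and the append-only layout are all hypothetical. A write smaller than one data strip (the per-device unit inside a stripe) is stored and logged without any check computation. A write of at least one strip yields first check data, which is logged while the stripe is still partly empty. Once the data strips are full, the second check data is derived from the logged entries and stored in the check strip.

```python
from functools import reduce

STRIP_SIZE = 4    # bytes per data strip (assumed value)
DATA_STRIPS = 3   # data strips per stripe (assumed value)
CAPACITY = STRIP_SIZE * DATA_STRIPS


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def contribution(offset: int, data: bytes) -> bytes:
    """XOR-parity contribution of `data` placed at byte `offset` of the stripe."""
    out = bytearray(STRIP_SIZE)
    for i, byte in enumerate(data):
        out[(offset + i) % STRIP_SIZE] ^= byte
    return bytes(out)


class Stripe:
    """Hypothetical append-only stripe with a single XOR check strip."""

    def __init__(self) -> None:
        self.data = bytearray(CAPACITY)  # data strips laid end to end
        self.written = 0                 # bytes stored so far
        self.check_strip = None          # set once the stripe is full
        self.log = []                    # ("data", offset, bytes) or ("check", bytes)

    def write(self, payload: bytes) -> None:
        offset = self.written
        if offset + len(payload) > CAPACITY:
            raise ValueError("payload does not fit in this stripe")
        self.data[offset:offset + len(payload)] = payload
        self.written += len(payload)

        parts = []
        if len(payload) < STRIP_SIZE:
            # Small write: store and log the data, compute no check data.
            self.log.append(("data", offset, bytes(payload)))
        else:
            # Large write: compute first check data for this payload only.
            first_check = contribution(offset, payload)
            if self.written < CAPACITY:
                self.log.append(("check", first_check))
            else:
                parts.append(first_check)

        if self.written == CAPACITY:
            self._seal(parts)

    def _seal(self, parts: list) -> None:
        """Combine logged data and logged check data into the check strip."""
        for entry in self.log:
            if entry[0] == "data":
                parts.append(contribution(entry[1], entry[2]))
            else:
                parts.append(entry[1])
        self.check_strip = reduce(xor_bytes, parts, bytes(STRIP_SIZE))
        self.log.clear()


stripe = Stripe()
stripe.write(b"ab")        # smaller than one strip: logged, no check data
stripe.write(b"cdefghij")  # at least one strip: first check data is logged
stripe.write(b"kl")        # stripe is now full: check strip is computed
```

Because XOR parity is linear, combining the per-write contributions gives the same result as computing parity over the three full data strips at once, which is why the small writes can skip the check computation until the stripe fills. A real erasure code with several check strips would need a different combination step.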
PCT/CN2022/103574 2021-08-20 2022-07-04 Data storage method and apparatus in storage system WO2023020136A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110962195.8A CN115904795A (en) 2021-08-20 2021-08-20 Data storage method and device in storage system
CN202110962195.8 2021-08-20

Publications (1)

Publication Number Publication Date
WO2023020136A1 true WO2023020136A1 (en) 2023-02-23

Family

ID=85240022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103574 WO2023020136A1 (en) 2021-08-20 2022-07-04 Data storage method and apparatus in storage system

Country Status (2)

Country Link
CN (1) CN115904795A (en)
WO (1) WO2023020136A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101512491A (en) * 2005-04-15 2009-08-19 英特尔公司 Power-safe disk storage apparatus, systems, and methods
US20180107383A1 (en) * 2016-10-13 2018-04-19 International Business Machines Corporation Operating a raid array with unequal stripes
CN109814805A (en) * 2018-12-25 2019-05-28 华为技术有限公司 The method and slitting server that slitting recombinates in storage system
CN109947842A (en) * 2017-07-27 2019-06-28 杭州海康威视数字技术股份有限公司 Date storage method, apparatus and system in distributed memory system
CN110874284A (en) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 Data processing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117420969A (en) * 2023-12-19 2024-01-19 中电云计算技术有限公司 Distributed data storage method, device, equipment and storage medium
CN117420969B (en) * 2023-12-19 2024-04-16 中电云计算技术有限公司 Distributed data storage method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115904795A (en) 2023-04-04

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22857449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE