CN110442298B - Storage equipment abnormality detection method and device and distributed storage system - Google Patents

Info

Publication number
CN110442298B
CN110442298B CN201810411648.6A CN201810411648A CN110442298B CN 110442298 B CN110442298 B CN 110442298B CN 201810411648 A CN201810411648 A CN 201810411648A CN 110442298 B CN110442298 B CN 110442298B
Authority
CN
China
Prior art keywords
storage
data
storage area
preset
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810411648.6A
Other languages
Chinese (zh)
Other versions
CN110442298A (en)
Inventor
叶敏
林鹏
林起芊
汪渭春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision System Technology Co Ltd
Original Assignee
Hangzhou Hikvision System Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision System Technology Co Ltd filed Critical Hangzhou Hikvision System Technology Co Ltd
Priority to CN201810411648.6A priority Critical patent/CN110442298B/en
Priority to PCT/CN2019/085128 priority patent/WO2019210844A1/en
Publication of CN110442298A publication Critical patent/CN110442298A/en
Application granted granted Critical
Publication of CN110442298B publication Critical patent/CN110442298B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a storage device abnormality detection method and apparatus and a distributed storage system, and belongs to the field of storage technologies. The method comprises: writing n preset files into n storage areas of a storage device, wherein a first preset file is any one of the n preset files, is written into a first storage area, and comprises first target data and first check data thereof; when a read operation on the first storage area succeeds, generating second check data of second target data read from the first storage area; determining that the first storage area is abnormal when the second check data is different from third check data read from the first storage area; and determining that the storage device is abnormal when at least m of the n storage areas are determined to be abnormal, wherein 1 ≤ m ≤ n. The application solves the problem that a management device cannot know when a storage device is abnormal, and is used for detecting storage device abnormalities.

Description

Storage equipment abnormality detection method and device and distributed storage system
Technical Field
The present application relates to the field of storage technologies, and in particular, to a method and an apparatus for detecting an abnormality of a storage device, and a distributed storage system.
Background
With the development of network communication technology, distributed storage systems are widely used. The distributed storage system may include: a management device, and a plurality of storage devices managed by the management device. The client can store data in the distributed storage system through the management device and the storage device.
In the related art, the management device may access a plurality of storage devices through a network, and may store data in the plurality of storage devices, respectively. When a user needs to store data in the distributed storage system, the data to be stored can be sent to the management device through the client, so that the management device writes the data into the storage device.
In the related art, when a storage device becomes abnormal (for example, the storage device is disconnected or damaged), the management device cannot know that the storage device is abnormal. If the management device writes data into the storage device at this time, the data will be lost.
Disclosure of Invention
The application provides a storage device abnormality detection method and apparatus and a distributed storage system, which can solve the problem that a management device cannot know when a storage device is abnormal. The technical solution is as follows:
In one aspect, a storage device abnormality detection method is provided. The method is applied to a management device, a partial storage area of the storage device includes n storage areas, n ≥ 1, and the method includes:
writing n preset files into the n storage areas, wherein a first preset file is any one of the n preset files, the first preset file is written into a first storage area, and the first preset file includes: first target data and first check data thereof;
when a read operation on the first storage area succeeds, generating second check data of second target data read from the first storage area, wherein the second target data is data obtained by performing the read operation on the first target data in the first storage area;
determining that the first storage area is abnormal when the second check data is different from third check data, wherein the third check data is data obtained by performing the read operation on the first check data in the first storage area;
and determining that the storage device is abnormal when at least m of the n storage areas are determined to be abnormal, wherein 1 ≤ m ≤ n.
Optionally, the method further includes:
determining that the first storage area is abnormal when the third check data read from the first storage area is the same as the second check data and overwriting the first check data in the first storage area with the second check data fails.
Optionally, the method further includes:
determining that the first storage area is abnormal when a read operation to the first storage area fails.
Optionally, the first check data is obtained by processing the first target data in a preset processing manner, and the second check data is obtained by processing the second target data in the preset processing manner.
Optionally, n ≥ 3, the storage device has a plurality of consecutive storage addresses, and the n storage areas include the head, the middle, and the tail of the plurality of storage addresses.
Optionally, the storage device includes a redundant array of independent disks (RAID), and the size of each preset file is greater than or equal to the size of one stripe in the RAID.
Optionally, the storage device has other storage areas in addition to the n storage areas, and the method further includes:
performing the read operation on the first storage area once every preset time period;
or, when the other storage areas meet a judgment trigger condition, performing the read operation on the first storage area, where the judgment trigger condition includes: a read operation on the other storage areas fails, a file read from the other storage areas is erroneous, or a write operation on the other storage areas fails.
Optionally, m = n.
In another aspect, a storage device abnormality detection apparatus is provided. The apparatus is applied to a management device, a partial storage area of the storage device includes n storage areas, n ≥ 1, and the apparatus includes:
a writing module, configured to write n preset files into the n storage areas, where a first preset file is any one of the n preset files, the first preset file is written into a first storage area, and the first preset file includes: first target data and first check data thereof;
a generating module, configured to generate second check data of second target data when a read operation on the first storage area succeeds, where the second target data is data obtained by performing the read operation on the first target data in the first storage area;
a first determining module, configured to determine that the first storage area is abnormal when the second check data is different from third check data, where the third check data is data obtained by performing the read operation on the first check data in the first storage area;
and a second determining module, configured to determine that the storage device is abnormal when at least m of the n storage areas are determined to be abnormal, where 1 ≤ m ≤ n.
Optionally, the storage device abnormality detecting apparatus further includes:
a third determining module, configured to determine that the first storage area is abnormal when the third check data read from the first storage area is the same as the second check data and overwriting the first check data in the first storage area with the second check data fails.
Optionally, the storage device abnormality detection apparatus further includes:
a fourth determining module, configured to determine that the first storage area is abnormal when the read operation on the first storage area fails.
Optionally, the first check data is obtained by processing the first target data in a preset processing manner, and the second check data is obtained by processing the second target data in the preset processing manner.
Optionally, n ≥ 3, the storage device has a plurality of consecutive storage addresses, and the n storage areas include the head, the middle, and the tail of the plurality of storage addresses.
Optionally, the storage device includes a redundant array of independent disks (RAID), and the size of each preset file is greater than or equal to the size of one stripe in the RAID.
Optionally, the storage device has other storage areas in addition to the n storage areas, and the storage device abnormality detection apparatus further includes:
a first reading module, configured to perform the read operation on the first storage area once every preset time period;
or,
a second reading module, configured to perform the read operation on the first storage area when the other storage areas meet a judgment trigger condition, where the judgment trigger condition includes: a read operation on the other storage areas fails, a file read from the other storage areas is erroneous, or a write operation on the other storage areas fails.
Optionally, m = n.
In another aspect, a distributed storage system is provided, where the distributed storage system includes a management device and a plurality of storage devices, and the management device includes the above storage device abnormality detection apparatus.
The technical solution provided by this application brings at least the following beneficial effects:
In the storage device abnormality detection method, n preset files can be written into n storage areas in a partial storage area of the storage device, where a first preset file is any one of the n preset files, is written into a first storage area, and includes first target data and first check data thereof. When a read operation on the first storage area succeeds, second check data of second target data read from the first storage area is generated. Third check data is data obtained by performing the read operation on the first check data in the first storage area, and when the third check data is different from the second check data, it can be determined that the first storage area is abnormal. When at least m of the n storage areas are determined to be abnormal, the storage device is determined to be abnormal, thereby detecting the storage device abnormality.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a storage device abnormality detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a storage device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another storage device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another storage device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another storage device according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for determining whether a first storage area is abnormal according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a RAID according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a storage device abnormality detection apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another storage device abnormality detection apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a storage device abnormality detection apparatus according to another embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a storage device abnormality detection apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a storage device abnormality detection apparatus according to another embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
With the development of network communication technology, distributed storage systems are widely used. FIG. 1 shows a schematic structural diagram of a distributed storage system. The distributed storage system 10 may include: a management device 101 and a plurality of storage devices 102 managed by the management device 101. A client 103 can store data in the distributed storage system 10 through the management device 101 and the storage devices 102.
By way of example, the plurality of storage devices may include storage devices such as a disk, a solid state drive (SSD), a redundant array of independent disks (RAID), a storage area network (SAN, also called IP SAN), a Fibre Channel storage area network (FC SAN), and network attached storage (NAS). The client 103 may be deployed on a terminal, and the terminal may be an electronic device such as a mobile phone, a notebook computer, a desktop computer, a tablet computer, or a smart television. The management device 101 may be one server, or a server cluster composed of a plurality of servers.
It should be noted that FIG. 1 shows only three storage devices 102 in the distributed storage system 10. In practical applications, the distributed storage system may include four, five, or more storage devices, which is not limited in this embodiment of the present invention. The client and the management device may be connected through a wired network or a wireless network. The wired network may include, but is not limited to, Universal Serial Bus (USB); the wireless network may include, but is not limited to, Wireless Fidelity (Wi-Fi), Bluetooth, infrared, Zigbee, data, and the like.
When a user needs to store data in the distributed storage system 10, the user can send data to be stored to the management device 101 through the client 103, so that the management device 101 writes the data to the storage device 102. If a certain storage device is abnormal (for example, the storage device is disconnected or damaged), the management device cannot correctly read the data on the storage device, or cannot write the data into the storage device, resulting in data loss. Therefore, determining whether a storage device is abnormal is critical to a distributed storage system.
FIG. 2 is a flowchart of a storage device abnormality detection method according to an embodiment of the present invention. The method may be applied to the management device in FIG. 1. As shown in FIG. 2, the method includes:
Step 201, write n preset files into n storage areas in a partial storage area of the storage device, where n ≥ 1. Then perform step 202.
It should be noted that the storage device involved in the embodiment of the present invention may be any one of the storage devices in the distributed storage system shown in FIG. 1. The storage device may have a plurality of consecutive storage addresses, and these storage addresses may form a plurality of storage areas, each of which includes at least one storage address. The partial storage area may include n of the plurality of storage areas, where n ≥ 1. In step 201, the management device may write n preset files into the n storage areas, that is, one preset file into each of the n storage areas.
It should be noted that, assuming that a first preset file is any one of the n preset files, the first preset file is written into a first storage area of the n storage areas, and the first preset file may include first target data and first check data thereof, where the first check data is obtained by processing the first target data in a preset processing manner. For example, the preset processing manner may be Message-Digest Algorithm 5 (MD5). In step 201, the management device may process the first target data with MD5 to obtain the MD5 code of the first target data, that is, the first check data, and write both the first target data and the first check data into the first storage area. For example, the first target data may be written into a first area of the first storage area, and the first check data may be written into a second area of the first storage area.
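The following is a minimal sketch of how such a preset file could be assembled, assuming MD5 as the preset processing manner. The function name build_preset_file and the layout (check data placed directly after the target data) are illustrative assumptions and are not specified by the application.

```python
import hashlib

MD5_SIZE = 16  # an MD5 digest is always 16 bytes long


def build_preset_file(first_target_data: bytes) -> bytes:
    """Return the first target data followed by its first check data.

    Hypothetical layout: the "first area" holds the target data and the
    "second area" holds the 16-byte MD5 digest.
    """
    first_check_data = hashlib.md5(first_target_data).digest()
    return first_target_data + first_check_data


preset_file = build_preset_file(b"probe-pattern")
first_area = preset_file[:-MD5_SIZE]    # first target data
second_area = preset_file[-MD5_SIZE:]   # first check data
```

Keeping the check data in a fixed-size region makes it possible to read or overwrite it later without touching the target data.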
Optionally, when n is greater than or equal to 3, the management device may store the n preset files at the head, the middle, and the tail of the plurality of storage addresses, respectively. It should be noted that the head and the tail of the plurality of storage addresses are located at the two ends of the plurality of storage addresses, respectively, the middle is a part of the storage addresses located between the head and the tail, and any two adjacent ones of the head, the middle, and the tail are separated by at least one storage address.
For example, as shown in FIG. 3, the storage device 30 has a plurality of consecutive storage addresses a, and the management device may write three preset files (a preset file F1, a preset file F2, and a preset file F3) into a partial storage area of a plurality of storage areas (not shown in FIG. 3) formed by the plurality of storage addresses, so that the preset file F1 is stored at the head of the plurality of storage addresses, the preset file F2 is stored in the middle, and the preset file F3 is stored at the tail. FIG. 3 takes as an example the case in which the head, the middle, and the tail each include only one storage address and store only one preset file, and the three preset files are evenly distributed.
In practical applications, the three preset files may have different sizes, and the head, the middle, and the tail may each include more than one storage address. For example, as shown in FIG. 4, the preset file F1 is stored in two storage addresses at the head of the plurality of storage addresses, the preset file F3 is stored in two storage addresses at the tail, and the preset file F2 is stored in one storage address in the middle.
Alternatively, the plurality of preset files may not be evenly distributed among the plurality of storage addresses. For example, as shown in FIG. 5, the number of storage addresses between the preset file F1 and the preset file F2 may be greater than the number of storage addresses between the preset file F2 and the preset file F3.
Still alternatively, the management device may write only one preset file into the storage device. For example, the preset file F1 may be located at the head of the plurality of storage addresses of the storage device (as shown in FIG. 6), or at another part (such as the middle or the tail) of the plurality of storage addresses, which is not limited in this embodiment of the present invention.
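The placement options described above can be sketched as a small helper that picks which storage addresses receive a preset file. This is only one possible policy (evenly spaced, one address per file); the name probe_addresses and the even spacing are assumptions for illustration, since the application also allows uneven spacing and multi-address files.

```python
def probe_addresses(num_addresses: int, n: int) -> list[int]:
    """Pick n storage-address indices spread from head to tail.

    For n == 1 the single preset file is placed at the head; for n >= 2
    the first and last indices are the head and the tail, and the rest
    are spread evenly in between.
    """
    if not 1 <= n <= num_addresses:
        raise ValueError("require 1 <= n <= num_addresses")
    if n == 1:
        return [0]
    last = num_addresses - 1
    return [i * last // (n - 1) for i in range(n)]


# Nine consecutive addresses and three preset files: head, middle, tail.
print(probe_addresses(9, 3))  # [0, 4, 8]
```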
Step 202, determining whether at least m storage areas in the n storage areas are abnormal, wherein m is more than or equal to 1 and less than or equal to n. When at least m storage areas in the n storage areas are abnormal, executing step 203; when the number of abnormal storage areas in the n storage areas is less than m, the process continues to execute step 202.
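The m-of-n decision in step 202 can be sketched as follows. The function name device_is_abnormal and the list-of-booleans input are illustrative assumptions; the per-area results would come from the per-area check described in the text.

```python
def device_is_abnormal(area_abnormal: list[bool], m: int) -> bool:
    """Return True when at least m of the n probed storage areas are abnormal."""
    n = len(area_abnormal)
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    # True counts as 1 and False as 0 when summed.
    return sum(area_abnormal) >= m


# Three probed areas, two of them abnormal.
results = [True, False, True]
print(device_is_abnormal(results, m=2))  # True
print(device_is_abnormal(results, m=3))  # False (the m = n case)
```

Choosing m = n is the most conservative setting: the device is only declared abnormal when every probed area fails.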
For example, the first storage area is any one of the n storage areas, and as shown in FIG. 7, the process in which the management device determines whether the first storage area is abnormal may include:
Step 2021, perform a read operation on the first storage area. When the read operation succeeds, perform step 2022; when the read operation fails, perform step 2026.
The management device may perform a read operation on the first storage area, and determine whether the first storage area is abnormal according to whether the read operation is successful.
Step 2022, generate second check data of the second target data read from the first storage area. Then perform step 2023.
If the read operation on the first storage area succeeds, second target data and third check data can be read from the first storage area. The second target data is obtained by performing the read operation on the first target data in the first storage area, and the third check data is obtained by performing the read operation on the first check data in the first storage area. That is, the second target data is the data read from the first area of the first storage area, and the third check data is the data read from the second area of the first storage area. After reading the second target data from the first area, the management device may process it in the preset processing manner to obtain the second check data.
It should be noted that the second target data that is read may be the same as or different from the first target data that is actually stored, and the third check data that is read may be the same as or different from the first check data that is actually stored, which is not limited in this embodiment of the present invention.
Step 2023, determine whether the second check data is the same as the third check data. When the second check data is the same as the third check data, perform step 2024; when the second check data is different from the third check data, perform step 2026.
After obtaining the second check data, the management device may compare it with the third check data read from the second area to determine whether the two are the same, and thereby determine whether the data read from the first storage area is correct.
It should be noted that processing two identical pieces of data in the preset processing manner yields identical check data, and processing two different pieces of data yields different check data.
If the third check data is the same as the first check data that is actually stored (i.e., the data read from the second area of the first storage area is correct), whether the second target data is the same as the first target data that is actually stored can be determined according to whether the second check data is the same as the third check data. If the second check data is the same as the third check data, the second target data can be considered to be the same as the first target data, that is, the data read from the first area of the first storage area is correct, and it is thus determined that all the data read from the first storage area is correct; otherwise, the data read from the first area is considered to be wrong, and the data read from the first storage area is determined to be wrong.
If the second target data is the same as the first target data that is actually stored (i.e., the data read from the first area is correct), the second check data is the same as the first check data, so determining whether the second check data is the same as the third check data amounts to determining whether the first check data is the same as the third check data. If the first check data is the same as the third check data, the data read from the second area is determined to be correct, and it is thus determined that all the data read from the first storage area is correct; otherwise, the data read from the second area is considered to be wrong, and the data read from the first storage area is determined to be wrong.
If the second target data is different from the first target data and the third check data is also different from the first check data, the probability that the second check data is nevertheless the same as the third check data is very low. Therefore, in all cases, the data read from the first storage area can be considered correct when the second check data is the same as the third check data, and erroneous when the second check data is different from the third check data.
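The comparison in step 2023 and the cases discussed above can be sketched as follows, assuming MD5 as the preset processing manner. The function name read_data_is_correct and the sample byte strings are illustrative assumptions.

```python
import hashlib


def read_data_is_correct(second_target_data: bytes, third_check_data: bytes) -> bool:
    """Recompute check data from what was read and compare it with the stored check data."""
    second_check_data = hashlib.md5(second_target_data).digest()
    return second_check_data == third_check_data


first_target_data = b"probe-pattern"
first_check_data = hashlib.md5(first_target_data).digest()

# Both areas read back correctly.
print(read_data_is_correct(first_target_data, first_check_data))   # True
# The first area (target data) was corrupted on read.
print(read_data_is_correct(b"probe-pattErn", first_check_data))    # False
# The second area (check data) was corrupted on read.
print(read_data_is_correct(first_target_data, bytes(16)))          # False
```

A single comparison therefore catches corruption in either area, without the management device having to keep its own copy of the preset file.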
Step 2024, overwrite the first check data in the first storage area with the second check data. If the overwrite operation succeeds, perform step 2025; if the overwrite operation fails, perform step 2026.
It should be noted that, if the management device determines in step 2023 that the second check data is the same as the third check data, it can determine that the data read from the first storage area is correct, that is, that data can be read from the first storage area normally. The management device may then determine whether data can be written into the first storage area normally.
For example, the management device may perform a write operation on the first storage area: it may overwrite the first check data in the first storage area with the second check data, that is, write the second check data into the second area. If the overwrite operation fails, the management device may determine that the second check data cannot be written into the second area, and thus that data cannot be written into the first storage area; if the overwrite operation succeeds, the management device may determine that the second check data has been written into the second area, and thus that data can be written into the first storage area. At this time, since the second check data is the same as the third check data read from the second area, both the third check data and the second check data now written into the second area can be considered to be the same as the first check data that was actually stored, so the overwrite operation in step 2024 does not change the data stored in the first storage area.
Step 2025, determine the first storage area is normal.
When the read operation on the first storage area is successful, the data read from the first storage area is correct, and the write operation on the first storage area is successful, the management device may determine that the first storage area is normal.
Step 2026, determine the first storage area is abnormal.
The management device may determine that the first storage area is abnormal when a read operation to the first storage area fails, data read from the first storage area is erroneous, or a write operation to the first storage area fails.
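The per-area check described in steps 2023 to 2026 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `MemoryArea` class, the exception names, and the use of CRC-32 as the "preset processing manner" are all assumptions introduced here, since the text does not fix a particular check algorithm or storage interface.

```python
import zlib


class ReadError(Exception):
    """Hypothetical error raised when a read operation fails."""


class WriteError(Exception):
    """Hypothetical error raised when a write operation fails."""


def make_check(target: bytes) -> bytes:
    # Assumed preset processing manner: CRC-32 of the target data.
    return zlib.crc32(target).to_bytes(4, "big")


class MemoryArea:
    """Hypothetical in-memory stand-in for one storage area."""

    def __init__(self, target: bytes):
        self.target = target              # first target data
        self.check = make_check(target)   # first check data
        self.fail_read = False
        self.fail_write = False

    def read(self):
        if self.fail_read:
            raise ReadError("read operation failed")
        return self.target, self.check

    def overwrite_check(self, check: bytes) -> None:
        if self.fail_write:
            raise WriteError("write operation failed")
        self.check = check


def area_is_abnormal(area) -> bool:
    """Return True if the area fails the read, compare, or overwrite test."""
    try:
        second_target, third_check = area.read()
    except ReadError:
        return True                       # read failed: abnormal (step 2026)
    second_check = make_check(second_target)
    if second_check != third_check:
        return True                       # data read is erroneous (step 2026)
    try:
        area.overwrite_check(second_check)  # step 2024
    except WriteError:
        return True                       # write failed: abnormal (step 2026)
    return False                          # read and write both fine (step 2025)
```

Because the overwrite only happens when the second and third check data already match, a successful overwrite leaves the stored bytes unchanged, which is why the probe can be repeated indefinitely.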
It should be noted that after step 201, the management device may perform a read operation on the first storage area every preset time period, or perform a read operation on the first storage area when other storage areas except the n storage areas in the storage device satisfy the determination trigger condition. That is, in step 202, the management device may determine whether at least m storage areas of the n storage areas are abnormal every preset time period. The judgment trigger condition may include: a failure of a read operation to the other storage area, a file error read from the other storage area, and a failure of a write operation to the other storage area.
For example, if the management device fails to read the other storage area in the storage device, the management device may determine that the storage device satisfies the determination trigger condition. If the reading operation is successful, the management device can judge whether the read file is wrong, and if the read file is wrong, the management device can determine that the storage device meets the judgment triggering condition; if the read file is correct, the management device may determine that the storage device does not satisfy the determination trigger condition. In addition, if the management device fails to perform the write operation on the other storage area in the storage device, the management device may also determine that the storage device satisfies the determination trigger condition; if the write operation is successful, the management device may determine that the storage device does not satisfy the determination trigger condition.
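The two ways of starting a check (a preset time period elapsing, or an access to another storage area meeting the determination trigger condition) might be combined as in the sketch below. The `OtherAreaAccess` record and the `should_probe` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtherAreaAccess:
    """Hypothetical outcome of an access to a storage area outside the n areas."""
    read_failed: bool = False
    file_erroneous: bool = False
    write_failed: bool = False


def meets_trigger_condition(access: OtherAreaAccess) -> bool:
    # Any one of the three failures satisfies the determination trigger condition.
    return access.read_failed or access.file_erroneous or access.write_failed


def should_probe(now: float, last_probe: float, period: float,
                 access: Optional[OtherAreaAccess] = None) -> bool:
    """Decide whether to read the first storage area now."""
    if now - last_probe >= period:
        return True                       # preset time period has elapsed
    return access is not None and meets_trigger_condition(access)
```

A caller would typically invoke `should_probe` after every ordinary read or write, passing the outcome of that access, so that a failure elsewhere on the device triggers an immediate check instead of waiting for the next period.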
Step 203, determining that the storage device is abnormal.
When the management device determines that at least m of the n storage areas are abnormal, the management device may determine that the storage device is abnormal, and then the management device may prohibit data reading and writing on the storage device. Optionally, the management device may further notify a worker to maintain the storage device after determining that the storage device is abnormal, or send a prompt message to a client requesting to read and write data on the storage device, or the management device may further perform other operations, which is not limited in the embodiment of the present invention.
When m is 1, only one storage area in the storage device stores a preset file, and if the one storage area is abnormal, the management device may determine that the entire storage device is abnormal; when m is larger than or equal to 2 and smaller than or equal to n, the storage device has a plurality of storage areas which store preset files, when the management device determines that at least m of the n storage areas are abnormal, the management device can determine that the storage device is abnormal, and the value of m can be determined by a user. In addition, since the storage device may be further used when a part of the n storage areas is abnormal, m may also be equal to n, that is, the management device may determine that the storage device is abnormal when determining that all the n storage areas are abnormal, so as to improve the accuracy of determining whether the storage device is abnormal.
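The device-level decision reduces to counting abnormal areas against the threshold m. A minimal sketch, with the function name `device_is_abnormal` chosen here for illustration:

```python
from typing import Sequence


def device_is_abnormal(area_abnormal: Sequence[bool], m: int) -> bool:
    """Return True when at least m of the n probed areas are abnormal."""
    n = len(area_abnormal)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    return sum(area_abnormal) >= m
```

With m = 1 a single bad area marks the whole device abnormal, whereas m = n requires every probed area to fail before the device is marked abnormal.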
In the related art, a distributed storage system connects a management device to a plurality of storage devices through a network, and abnormal conditions of a storage device generally include: the storage device is offline (that is, disconnected from the management device), the storage device is damaged (that is, the storage device is unavailable), and the back-end storage of the storage device is abnormal (that is, the stored data is wrong: data can be read and written, but the data read or written is abnormal). When a storage device goes offline, is damaged, or has abnormal back-end storage, the management device cannot learn of the situation in time. In that case, if the management device needs to write data into the storage device, it writes the data to the address where the storage device was located before going offline, which causes data loss; if the management device needs to read data from the storage device, it performs the read operation at the address where the storage device was located before going offline, and either cannot read the data or reads wrong data, so the efficiency of reading correct data is low.
In the embodiment of the present invention, the management device may write n preset files into n storage areas in a partial storage area of the storage device, may determine whether at least m of the n storage areas are abnormal every preset time period or when the storage device satisfies a determination trigger condition, and may determine that the storage device is abnormal when at least m of the n storage areas are abnormal, thereby implementing detection of the storage device abnormality.
It should be noted that the storage device in the embodiment of the present invention may be any one of storage devices such as a disk, an SSD, a RAID, a SAN, an FC SAN, a NAS, cloud storage, and object storage. A RAID is a hard disk group (also called a logical hard disk) formed by combining a plurality of independent hard disks (also called physical hard disks) in different ways; it has higher storage performance than a single hard disk and can provide data backup. Each hard disk in a RAID may be referred to as a disk. In step 201, the management device writes n preset files into n storage areas in the RAID, where the size of each preset file needs to be greater than or equal to the size of one stripe in the RAID, so that whether the RAID is abnormal can be accurately determined by determining whether at least m of the n storage areas are abnormal.
It should be noted that, assuming the RAID includes x disks and each disk includes multiple storage blocks (also referred to as chunks), each stripe in the RAID may include x storage blocks, one from each of the x disks. Of the x storage blocks in a stripe, x-1 storage blocks are used for storing data and the remaining storage block is used for storing the verification information of the data; the size of the stripe is the size of the data stored in the x-1 storage blocks.
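The stripe size defined above is simply the capacity of the x-1 data blocks. A short sketch, where `block_size` (bytes per storage block) is a parameter introduced here for illustration:

```python
def stripe_size(x_disks: int, block_size: int) -> int:
    """Size of one stripe: the data held in the x-1 data blocks."""
    if x_disks < 2:
        raise ValueError("a stripe with one check block needs at least 2 disks")
    return (x_disks - 1) * block_size


def preset_file_large_enough(file_size: int, x_disks: int, block_size: int) -> bool:
    # The preset file must be at least one stripe in size.
    return file_size >= stripe_size(x_disks, block_size)
```

For instance, three disks with 64 KiB blocks give a stripe of 128 KiB, so a preset file on such an array would need to be at least 128 KiB.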
For example, each disk in a RAID may include i storage blocks, i ≥ 2. Referring to fig. 8, if the RAID is a RAID5 composed of three disks (RAID5 is one type of RAID), the three disks in the RAID5 are disks A, B and C shown in fig. 8. Assume that each disk in the RAID5 includes five storage blocks (i.e., i = 5): disk A includes storage blocks A1, A2, A3, A4 and A5, disk B includes storage blocks B1, B2, B3, B4 and B5, and disk C includes storage blocks C1, C2, C3, C4 and C5. It should be noted that the address of each storage block may be a storage address of the storage device, and the consecutive storage addresses of the RAID may be the addresses of A1, B1, C1, A2, B2, C2, A3, B3, C3, A4, B4, C4, A5, B5 and C5, in that order.
For example, assume that storage blocks C1, B2, A3, C4 and B5 in fig. 8 are used for storing verification information. The management device may write three preset files f1, f2 and f3 to other storage addresses in the RAID5, where the size of each of the three preset files is the size of one stripe. For example, the management device may write the preset file f1 to the storage addresses of storage blocks A1 and B1, the preset file f2 to the storage addresses of storage blocks B3 and C3, and the preset file f3 to the storage addresses of storage blocks A5 and C5. In this way, the preset file f1 is stored at the head of the plurality of storage addresses, the preset file f2 in the middle, and the preset file f3 at the tail. Optionally, the target data in each preset file may be stored in the storage addresses of two storage blocks.
In addition, the embodiment of the present invention is described only by taking as an example the case where the storage device is a RAID, the RAID is a RAID5 composed of three independent disks, each disk includes five storage blocks, and three preset files are written into the RAID. In practical applications, the storage device may further include four, five, or more disks, each disk may further include eight, nine, or more storage blocks, and four or five preset files may also be written into the storage device, which is not limited in the embodiment of the present invention.
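The block layout of this example can be reproduced with a few lines of code. The sketch below assumes that the check block moves one disk to the left on each successive stripe, a rotation rule chosen here only because it yields the check blocks named in the example; the function names are hypothetical.

```python
from string import ascii_uppercase


def parity_disk(stripe: int, x_disks: int = 3) -> int:
    # Assumed rotation: stripe 1 uses the last disk, then one disk left per stripe.
    return (x_disks - stripe) % x_disks


def raid5_layout(x_disks: int = 3, blocks_per_disk: int = 5):
    """Return (block names in consecutive address order, check block names)."""
    order, parity = [], []
    for stripe in range(1, blocks_per_disk + 1):
        for disk in range(x_disks):
            name = f"{ascii_uppercase[disk]}{stripe}"
            order.append(name)
            if disk == parity_disk(stripe, x_disks):
                parity.append(name)
    return order, parity


def data_blocks(stripe: int, x_disks: int = 3):
    """Blocks of one stripe that hold data (and so can hold a preset file)."""
    skip = parity_disk(stripe, x_disks)
    return [f"{ascii_uppercase[d]}{stripe}" for d in range(x_disks) if d != skip]


# Head, middle and tail stripes of the five-stripe example.
placement = {"f1": data_blocks(1), "f2": data_blocks(3), "f3": data_blocks(5)}
```

Under these assumptions `placement` maps f1 to A1 and B1, f2 to B3 and C3, and f3 to A5 and C5, matching the placement described in the text.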
When the management device writes data into the RAID, if the size of the data is larger than the size of one stripe, the management device may sequentially write the data into the plurality of storage addresses by dividing the data into a plurality of stripes.
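Dividing data that is larger than one stripe can be sketched as a simple slicing loop. The function name is hypothetical, and the sketch leaves a short final piece unpadded, which is a choice made here and not something the text specifies.

```python
from typing import List


def split_into_stripes(data: bytes, stripe_size: int) -> List[bytes]:
    """Split data into consecutive stripe-sized pieces, in write order."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    return [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
```

Each returned piece would then be written to the next stripe in the sequence of consecutive storage addresses.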
In summary, in the storage device abnormality detection method provided in the embodiment of the present invention, n preset files may be written into n storage areas in a partial storage area of a storage device, where a first preset file is any one of the n preset files and is written into a first storage area, and the first preset file includes first target data and first check data thereof. When the read operation on the first storage area is successful, second check data of second target data read from the first storage area is generated. Third check data is data obtained by performing a read operation on the first check data in the first storage area, and when the second check data is different from the third check data, it may be determined that the first storage area is abnormal. When at least m of the n storage areas are determined to be abnormal, where 1 ≤ m ≤ n, it may be determined that the storage device is abnormal, so that the storage device abnormality is detected. After determining that a storage device is abnormal, the management device may stop writing data into that storage device, which prevents data loss caused by writing data into an abnormal storage device.
Fig. 9 is a schematic structural diagram of an abnormality detection apparatus for a storage device according to an embodiment of the present invention. The storage device abnormality detection apparatus is applied to a management device, a part of storage regions in the storage device includes n storage regions, n ≧ 1, as shown in fig. 9, the storage device abnormality detection apparatus 90 includes:
a writing module 901, configured to write n preset files into n storage areas, where a first preset file is any one of the n preset files, and the first preset file is written into a first storage area, where the first preset file includes: the first target data and the first verification data thereof.
A generating module 902, configured to generate second check data of second target data read from the first storage area when the read operation on the first storage area is successful, where the second target data is data obtained by performing a read operation on the first target data in the first storage area;
the first determining module 903 is configured to determine that the first storage area is abnormal when the second parity data is different from third parity data, where the third parity data is data obtained by performing a read operation on the first parity data in the first storage area.
And a second determining module 904, configured to determine that the storage device is abnormal when at least m of the n storage areas are determined to be abnormal, where m is greater than or equal to 1 and less than or equal to n.
In summary, in the storage device abnormality detection apparatus provided in the embodiment of the present invention, the writing module may write n preset files in n storage areas of a partial storage area in the storage device, where the first preset file is any one of the n preset files, and the first preset file is written in the first storage area, where the first preset file includes: first target data and first verification data thereof; the generation module may generate second check-up data of second target data read from the first storage area when the read operation to the first storage area is successful. The third check data is data obtained by reading the first check data in the first storage area; the first determining module may determine that the first storage area is abnormal when the second check data is different from the third check data, and the second determining module determines that the storage device is abnormal when at least m storage areas of the n storage areas are determined to be abnormal, thereby implementing detection of the storage device abnormality.
Fig. 10 is a schematic structural diagram of another storage device abnormality detection apparatus according to an embodiment of the present invention. As shown in fig. 10, in addition to fig. 9, the storage device abnormality detection apparatus further includes:
the third determining module 905 is configured to determine that the first storage area is abnormal when the check data read from the first storage area is the same as the second check data and the overwriting of the second check data on the first check data in the first storage area fails.
Optionally, fig. 11 is a schematic structural diagram of another storage device abnormality detection apparatus according to an embodiment of the present invention. As shown in fig. 11, in addition to fig. 9, the storage device abnormality detection apparatus further includes:
a fourth determining module 906, configured to determine that the first storage area is abnormal when the read operation on the first storage area fails.
Optionally, the first check data is obtained by processing the first target data in a preset processing manner, and the second check data is obtained by processing the second target data in a preset processing manner.
Optionally, n ≧ 3, the storage device has a plurality of consecutive storage addresses, and the n storage regions include a head portion, a middle portion, and a tail portion of the plurality of storage addresses.
Optionally, the storage device includes a redundant array of independent disks RAID, and the size of the preset file is greater than or equal to the size of one stripe in the RAID.
Optionally, the storage device has other storage regions except for n storage regions, as shown in fig. 12, and on the basis of fig. 9, the storage device abnormality detection apparatus further includes:
the first reading module 907 is configured to perform a reading operation on the first storage area every preset time period.
Alternatively, as shown in fig. 12, the storage device abnormality detection apparatus further includes, in addition to fig. 9:
a second reading module 908, configured to perform a reading operation on the first storage area when the other storage areas meet a determination trigger condition, where the determination trigger condition includes: a failure of a read operation to the other storage area, a file error read from the other storage area, and a failure of a write operation to the other storage area.
Optionally, m = n.
In summary, in the storage device abnormality detection apparatus provided in the embodiment of the present invention, the writing module may write n preset files in n storage areas of a partial storage area in the storage device, where the first preset file is any one of the n preset files, and the first preset file is written in the first storage area, where the first preset file includes: first target data and first verification data thereof; the generation module may generate second check-up data of second target data read from the first storage area when the read operation to the first storage area is successful. The third check data is data obtained by reading the first check data in the first storage area; the first determining module may determine that the first storage area is abnormal when the second check data is different from the third check data, and the second determining module determines that the storage device is abnormal when at least m storage areas of the n storage areas are determined to be abnormal, thereby implementing detection of the storage device abnormality.
An embodiment of the present invention provides a distributed storage system, where the distributed storage system may be the distributed storage system shown in fig. 1, the distributed storage system includes a management device and a plurality of storage devices, and the management device may include the storage device abnormality detection apparatus shown in any one of fig. 9 to 13.
Fig. 14 is a schematic structural diagram of a computer device that can be used as a management device in a distributed storage system according to an embodiment of the present invention. As shown in fig. 14, the computer device 000 includes a Central Processing Unit (CPU) 001, a system memory 004 including a Random Access Memory (RAM) 002 and a Read Only Memory (ROM) 003, and a system bus 005 connecting the system memory 004 and the central processing unit 001. The computer device 000 (also referred to below as the server 000) further includes a basic input/output system (I/O system) 006 that facilitates the transfer of information between devices within the computer, and a mass storage device 007 for storing an operating system 013, application programs 014, and other program modules 015.
The basic input/output system 006 includes a display 008 for displaying information and an input device 009 such as a mouse, a keyboard, etc. for a user to input information. Wherein the display 008 and the input device 009 are both connected to the central processing unit 001 through an input-output controller 010 connected to the system bus 005. The basic input/output system 006 may also include an input/output controller 010 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input-output controller 010 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 007 is connected to the central processing unit 001 through a mass storage controller (not shown) connected to the system bus 005. The mass storage device 007 and its associated computer-readable media provide non-volatile storage for the server 000. That is, the mass storage device 007 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 004 and mass storage device 007 described above may be collectively referred to as memory.
According to various embodiments of the present invention, the server 000 may also be operated through a remote computer connected to a network such as the internet. That is, the server 000 may be connected to the network 012 through the network interface unit 011 connected to the system bus 005, or may be connected to another type of network or a remote computer system (not shown) using the network interface unit 011.
The memory further includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 001 implements the device abnormality detection method shown in fig. 2 by executing the one or more programs.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as a memory, including instructions executable by a processor of a server to perform the storage device anomaly detection methods shown in the various embodiments of the present invention is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It should be noted that, when the storage device abnormality detection apparatus provided in the above embodiment detects an abnormality of a storage device, the division into the above functional modules is merely used as an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the storage device abnormality detection apparatus may be divided into different functional modules to complete all or part of the functions described above.
It should be noted that the method embodiments and the corresponding apparatus embodiments provided in the embodiments of the present invention may be cross-referenced, which is not limited in the embodiments of the present invention. The sequence of the steps of the method embodiments provided in the embodiments of the present invention can be appropriately adjusted, and steps can be added or removed according to the situation. Any variation that can be easily conceived by those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application, and is therefore not described in detail.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A storage device abnormality detection method is applied to a management device, a partial storage area in the storage device comprises n storage areas, n is larger than or equal to 3, the storage device has a plurality of continuous storage addresses, and the n storage areas comprise a head part, a middle part and a tail part in the plurality of storage addresses, and the method comprises the following steps:
writing n preset files into the n storage areas, wherein a first preset file is any one of the n preset files, the first preset file is written into the first storage area, and the first preset file comprises: first target data and first check data thereof, and the n preset files are uniformly distributed in the plurality of storage addresses;
when the read operation on the first storage area is successful, generating second check data of second target data, wherein the second target data is data obtained by performing the read operation on the first target data in the first storage area;
determining that the first storage area is abnormal when the second check data is different from third check data, wherein the third check data is data obtained by performing read operation on the first check data in the first storage area;
determining that the first storage area is abnormal when the second check data is the same as the third check data and the overwriting of the second check data on the first check data in the first storage area fails;
and when at least m storage areas in the n storage areas are determined to be abnormal, determining that the storage equipment is abnormal, wherein m is more than or equal to 1 and less than or equal to n.
2. The method of claim 1, further comprising:
determining that the first storage area is abnormal when a read operation to the first storage area fails.
3. The method according to claim 1, wherein the first check data is obtained by processing the first target data in a preset processing manner, and the second check data is obtained by processing the second target data in the preset processing manner.
4. The method of claim 1, wherein the storage device comprises a Redundant Array of Independent Disks (RAID), and wherein the preset file has a size greater than or equal to a size of a stripe in the RAID.
5. The method of claim 1, wherein the storage device has a storage area other than the n storage areas, the method further comprising:
performing the read operation on the first storage area every preset time period;
or, when the other storage areas meet a judgment trigger condition, performing the read operation on the first storage area, where the judgment trigger condition includes: a read operation on the other storage area fails, a file read from the other storage area is erroneous, and a write operation on the other storage area fails.
6. The method of claim 1, wherein m = n.
7. A storage device abnormality detection apparatus applied to a management device, wherein a partial storage area in the storage device includes n storage areas, n is greater than or equal to 3, the storage device has a plurality of consecutive storage addresses, and the n storage areas include a head portion, a middle portion, and a tail portion of the plurality of storage addresses, the storage device abnormality detection apparatus comprising:
a writing module, configured to write n preset files into the n storage areas, where a first preset file is any one of the n preset files, and the first preset file is written into the first storage area, where the first preset file includes: first target data and first check data thereof, and the n preset files are uniformly distributed in the plurality of storage addresses;
a generating module, configured to generate second check data of second target data when the read operation on the first storage area is successful, where the second target data is data obtained by performing a read operation on the first target data in the first storage area;
a first determining module, configured to determine that the first storage area is abnormal when the second parity data is different from third parity data, where the third parity data is data obtained by performing a read operation on the first parity data in the first storage area;
a third determining module, configured to determine that the first storage area is abnormal when the second parity data is the same as the third parity data and the overwriting of the second parity data on the first parity data in the first storage area fails;
and the second determining module is used for determining that the storage equipment is abnormal when at least m storage areas in the n storage areas are determined to be abnormal, wherein m is more than or equal to 1 and less than or equal to n.
8. The storage device abnormality detection apparatus according to claim 7, characterized in that the storage device abnormality detection apparatus further comprises:
and the fourth determining module is used for determining that the first storage area is abnormal when the read operation on the first storage area fails.
9. The apparatus according to claim 7, wherein the first check data is obtained by processing the first target data in a preset processing manner, and the second check data is obtained by processing the second target data in the preset processing manner.
10. The apparatus according to claim 7, wherein the storage device comprises a redundant array of independent disks RAID, and the preset file has a size greater than or equal to a size of a stripe in the RAID.
11. The apparatus according to claim 7, wherein the storage device has a storage area other than the n storage areas, the apparatus further comprising:
a first reading module, configured to perform the read operation on the first storage area every preset time period;
or,
a second reading module, configured to perform the read operation on the first storage area when the other storage areas meet a determination trigger condition, where the determination trigger condition includes: a read operation on the other storage area fails, a file read from the other storage area is erroneous, and a write operation on the other storage area fails.
12. The storage device abnormality detection apparatus according to claim 7, wherein m = n.
13. A distributed storage system, comprising a management apparatus and a plurality of storage apparatuses, wherein the management apparatus comprises: the storage device abnormality detection apparatus according to any one of claims 7 to 12.
CN201810411648.6A 2018-05-02 2018-05-02 Storage equipment abnormality detection method and device and distributed storage system Active CN110442298B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810411648.6A CN110442298B (en) 2018-05-02 2018-05-02 Storage equipment abnormality detection method and device and distributed storage system
PCT/CN2019/085128 WO2019210844A1 (en) 2018-05-02 2019-04-30 Anomaly detection method and apparatus for storage device, and distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810411648.6A CN110442298B (en) 2018-05-02 2018-05-02 Storage equipment abnormality detection method and device and distributed storage system

Publications (2)

Publication Number Publication Date
CN110442298A CN110442298A (en) 2019-11-12
CN110442298B true CN110442298B (en) 2021-01-12

Family

ID=68386976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810411648.6A Active CN110442298B (en) 2018-05-02 2018-05-02 Storage equipment abnormality detection method and device and distributed storage system

Country Status (2)

Country Link
CN (1) CN110442298B (en)
WO (1) WO2019210844A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312294B (en) * 2020-02-27 2024-06-18 瑞昱半导体股份有限公司 Electronic device and communication method
CN112380046B (en) * 2020-11-10 2023-12-22 北京灵汐科技有限公司 Calculation result verification method, system, device, equipment and storage medium
WO2022100576A1 (en) * 2020-11-10 2022-05-19 北京灵汐科技有限公司 Verification method, system, and apparatus, computing chip, computer device, and medium
CN112395129A (en) * 2020-11-10 2021-02-23 北京灵汐科技有限公司 Storage verification method and device, computing chip, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1746854A (en) * 2004-09-10 2006-03-15 富士通株式会社 The device, method and the program that are used for control store
CN101060015A (en) * 2007-05-23 2007-10-24 北京芯技佳易微电子科技有限公司 A multi-bit flash memory and its error detection and remedy method
CN101783955A (en) * 2010-03-24 2010-07-21 杭州华三通信技术有限公司 Data recovering method when data is abnormal and equipment thereof
CN104914815A (en) * 2015-04-15 2015-09-16 北汽福田汽车股份有限公司 Processor monitoring method, device and system
CN105824717A (en) * 2016-03-16 2016-08-03 硅谷数模半导体(北京)有限公司 Method and device for controlling chip operation and chip
CN106373616A (en) * 2015-07-23 2017-02-01 深圳市中兴微电子技术有限公司 Method and apparatus for detecting failure of random access memory, and network processor
CN108121615A (en) * 2016-11-28 2018-06-05 中国科学院沈阳自动化研究所 Data storage method based on a redundancy fault-tolerance mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10335978B4 (en) * 2003-08-06 2006-02-16 Infineon Technologies Ag Hub module for connecting one or more memory modules
DE102006016499B4 (en) * 2006-04-07 2014-11-13 Qimonda Ag Memory module control, memory control and corresponding memory arrangement and method for error correction
KR100802059B1 (en) * 2006-09-06 2008-02-12 삼성전자주식회사 Memory system capable of suppressing generation of bad blocks due to read disturbance and operating method thereof
CN104269190B (en) * 2014-08-26 2017-10-17 上海华虹宏力半导体制造有限公司 Data verification method for memory

Also Published As

Publication number Publication date
WO2019210844A1 (en) 2019-11-07
CN110442298A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
US11163472B2 (en) Method and system for managing storage system
US9910748B2 (en) Rebuilding process for storage array
CN110442298B (en) Storage equipment abnormality detection method and device and distributed storage system
US9715436B2 (en) System and method for managing raid storage system having a hot spare drive
US9372743B1 (en) System and method for storage management
US20190317872A1 (en) Database cluster architecture based on dual port solid state disk
US9804923B2 (en) RAID-6 for storage system employing a hot spare drive
US9563524B2 (en) Multi level data recovery in storage disk arrays
US10572335B2 (en) Metadata recovery method and apparatus
CN110413208B (en) Method, apparatus and computer program product for managing a storage system
JP2006139478A (en) Disk array system
US20190129646A1 (en) Method, system, and computer program product for managing storage system
US10120790B1 (en) Automated analysis system and method
US10860224B2 (en) Method and system for delivering message in storage system
US10942826B2 (en) Method and device for managing storage system
US10095504B1 (en) Automated analysis system and method
US9256490B2 (en) Storage apparatus, storage system, and data management method
US9268640B1 (en) Limiting data loss on parity RAID groups
US11275513B2 (en) System and method for selecting a redundant array of independent disks (RAID) level for a storage device segment extent
US10416982B1 (en) Automated analysis system and method
JP6556980B2 (en) Storage control device, storage control method, and storage control program
US9633066B1 (en) Taking a consistent cut during replication for storage across multiple nodes without blocking input/output
US10353771B1 (en) Managing data storage
CN112328182A (en) RAID data management method, device and computer readable storage medium
JP2019159416A (en) Data management device, file system, data management method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant