CN116149557A - Manufacturing-oriented slow disk detection strategy system and method - Google Patents

Info

Publication number
CN116149557A
Authority
CN
China
Prior art keywords
delay
hard disk
disk
slow
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310143737.8A
Other languages
Chinese (zh)
Other versions
CN116149557B (en)
Inventor
冯筱柳
徐文豪
张凯
王弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiling Haina Technology Co ltd
Original Assignee
SmartX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SmartX Inc
Priority to CN202310143737.8A
Publication of CN116149557A
Application granted
Publication of CN116149557B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3423 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time where the assessed time is active or idle time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a manufacturing-oriented slow disk detection strategy system and method. The method comprises: obtaining attribute information of a plurality of hard disks in a storage device and recording the number of IO requests of the current hard disk; dynamically setting the capacity of an IO window according to the number of IO requests, and recording an IO delay data set under that IO window capacity; and performing slow disk detection according to the attribute information of the current hard disk and the IO delay data set. The system and method identify slow disks more efficiently and accurately.

Description

Manufacturing-oriented slow disk detection strategy system and method
Technical Field
The invention relates to the field of computer storage, in particular to a slow disk detection strategy system and method for manufacturing industry.
Background
In existing cloud computing and storage scenarios, QoS (quality of service) is critical for users, and one of the most common requirements is a short service response time. To guarantee quality of service, the input/output (IO) response time of data must be guaranteed. Storage systems therefore commonly introduce a hard disk detection service that performs slow disk detection to find hard disks in a sub-healthy state (slow response).
In existing slow disk detection technology, a slow disk is mainly identified from the response time and bandwidth of the hard disk. The hard disk response time generally denotes the time from when an IO request leaves the host generic block layer for the hard disk until the request completes and the host is notified. Bandwidth denotes the amount of IO handled by the hard disk per second. For hard disk response time, there are generally two judgment modes: (1) sample multiple times, count the proportion or number of IOs whose response timed out, and judge the disk slow if the proportion or number exceeds a threshold; (2) sample the average IO delay over a period of time, and judge the disk slow if the average exceeds a threshold. For hard disk bandwidth, the input/output operations per second (IOPS) of the hard disk are generally counted, and the disk is judged slow if the IOPS falls below a threshold. In practice, the denser the IO, the longer requests stay in the IO queue of the hard disk or of the host generic block layer, and the harder it is for the measured IO delay to reflect the actual performance of the hard disk (the time spent actually processing the IO, not the time spent waiting in the queue). Slow disk detection therefore dynamically adjusts the above thresholds (the timeout-response IO proportion or count threshold, and the IOPS threshold) according to the actual load to improve the accuracy of the judgment.
The conventional approach collects IO delay information of a hard disk over a period of time and computes the average of the IO delay data. However, IO request processing is delay sensitive: if even a small number of IOs have excessive delay, the disk should be judged slow, yet the average cannot reveal a small number of IOs with very long delay. Second, the average does not reflect how the distribution is concentrated; that is, the interval containing the average is not necessarily the interval containing the most samples, because the average is pulled by a few long delays. Only for a normally distributed quantity does the average represent the distribution, and hard disk IO delay rarely follows a normal distribution. Misjudging a hard disk is costly: if a slow disk is not detected, IO request processing suffers; if a healthy disk is judged slow, the available space in the cluster is reduced.
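As a small illustration of the limitation described above, the following sketch uses invented latency values (they do not come from the patent) to show how an average-only check can miss a disk that has a few very slow IOs. The names `latencies_ms` and `avg_threshold_ms` and the 20 ms and 100 ms figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical sample: 98 fast IOs and 2 very slow ones (values in ms).
latencies_ms = [1.0] * 98 + [500.0] * 2

avg = mean(latencies_ms)    # pulled upward by the two slow IOs
mid = median(latencies_ms)  # unaffected by the two slow IOs

# Share of IOs at or above a hypothetical 100 ms "very slow" mark.
slow_share = sum(1 for x in latencies_ms if x >= 100.0) / len(latencies_ms)

# Hypothetical average-only detector with a 20 ms threshold.
avg_threshold_ms = 20.0
flagged_by_average = avg > avg_threshold_ms

print(f"mean={avg:.2f} ms, median={mid:.2f} ms, slow share={slow_share:.2%}")
print("flagged by average-only check:", flagged_by_average)
```

With these invented numbers the mean is 10.98 ms and the median is 1 ms. No individual IO has a latency anywhere near the mean, and the average-only check does not flag the disk even though 2% of its IOs took 500 ms.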
Disclosure of Invention
The invention aims to provide a manufacturing-oriented slow disk detection strategy system and method that solve the technical problems described above.
The invention provides a manufacturing-oriented slow disk detection strategy system, which comprises a host end and a storage device;
the host end is electrically connected with the storage device;
the host end comprises a host generic block layer, a storage engine and a slow disk detection system;
the host generic block layer is connected with the slow disk detection system through the storage engine;
the host generic block layer is used for sending IO requests to the storage device;
the storage engine comprises a dynamic IO adjustment interface, an IO statistics module and a slow disk isolation module;
the storage device comprises a hard disk cluster consisting of a plurality of hard disks;
the slow disk detection system comprises a hard disk data collection module, a slow disk decision module, a delay monitor and a window adjustment task distribution module;
the hard disk data collection module is used for obtaining attribute information of the plurality of hard disks in the storage device, and for obtaining, according to the IO delay data set of each hard disk, an IO delay average threshold L_A, an IO delay median threshold L_M, and a tail-delay-range IO request proportion threshold P_T;
the IO statistics module is used for recording the number of all IO requests of each hard disk in the current system within the current time period T1; recording the IO delay data set in the current state when the dynamic IO adjustment interface is executed; and saving this information after processing completes so that the data is shared in the next time period T1;
the delay monitor is used for acquiring the number of IO requests from the IO statistics module and sending it to the window adjustment task distribution module; and for acquiring the IO delay data set recorded by the IO statistics module and sending it to the slow disk decision module;
the window adjustment task distribution module is used for determining the capacity of the IO window according to the number of IO requests;
the dynamic IO adjustment interface is used for dynamically adjusting the capacity of the IO window according to the instruction of the window adjustment task distribution module of the slow disk detection system;
the slow disk decision module is used for acquiring the hard disk attribute information from the hard disk data collection module and analyzing, according to the IO delay data set in the IO statistics module, the IO delay average threshold L_A, the IO delay median threshold L_M, and the tail-delay-range IO request proportion threshold P_T, whether the hard disk is a slow disk, and if so, marking it as a slow disk for subsequent isolation;
the slow disk isolation module is used for isolating the hard disk corresponding to the slow disk detected by the slow disk detection system;
the slow disk detection system is used for collecting the number of IO requests, controlling the capacity adjustment of an IO window of the storage engine, monitoring an IO delay data set and carrying out slow disk judgment;
The storage engine is used for sending the IO request from the host end to the storage device, recording the IO delay data set and executing dynamic IO window adjustment by controlling the capacity of the IO window.
Correspondingly, the invention also provides a manufacturing-oriented slow disk detection method, which comprises the following operation steps:
acquiring attribute information of all hard disks in the hard disk cluster, and recording, through the IO statistics module, the number of IO requests each hard disk completes in each time period T1; the delay monitor acquires the number of IO requests from the IO statistics module; if the number of IO requests of the current hard disk is larger than 0, the number of IO requests is sent to the window adjustment task distribution module for judgment;
the window adjustment task distribution module judges whether the number of IO requests in the current time period T1 is larger than k; if so, it sets the capacity of the IO window to a first window threshold W1, controls the IO issuing rate so that the number of IOs in the hard disk processing queue is less than or equal to W1, and controls the IO statistics module to record, within a time period t1, an IO delay data set U1 of the target IO requests processed;
if the window adjustment task distribution module judges that the number of IO requests in the current time period T1 is smaller than k, the capacity of the IO window is set to a second window threshold W2, and the IO statistics module is controlled to record the IO delay data set U1 within a time period t2;
a threshold n is preset for the number of times the IO delay data set U1 of the target hard disk is recorded; the IO delay data set U1 of the target IO requests is recorded in real time; if the number of recorded IO delay data sets U1 reaches n, whether the hard disk is a slow disk is judged through a slow disk decision algorithm based on the attribute information of the current hard disk and the IO delay data sets, and if the hard disk is judged to be a slow disk, it is marked and awaits isolation.
Compared with the prior art, the embodiment of the invention has at least the following technical advantages:
the technical scheme adopted by the embodiment of the invention performs slow disk detection with a dynamic IO window adjustment strategy built on the interaction between the slow disk detection system and the storage engine, and makes the slow disk decision from several statistics (the number of IO requests, the IO delay data set, and, for each class of hard disk, the tail delay threshold L_Tail, the IO delay average threshold L_A, the IO delay median threshold L_M, the tail-delay-range IO request proportion threshold P_T, and the IO delay threshold L_YW of the current hard disk);
the interaction between the slow disk detection system and the storage engine makes use of the IO request data function and the IO control function of the storage engine, so that tight coupling between the slow disk detection function and the IO function is avoided; through interaction with the storage engine, the IO delay data set is obtained directly and accurately, and accurate slow disk judgment can be performed based on the IO data;
the dynamic IO window adjustment strategy measures hard disk processing capability more accurately under different IO request densities; when IO requests are few, the IO window is essentially unrestricted, and the performance of the disk can be obtained directly from the IO delay data without affecting IO request processing; when IO requests are denser, adjusting the IO window capacity lets accurate hard disk IO delay information be measured within a short time window without affecting IO request processing, so that an accurate detection result is obtained;
the slow disk decision strategy considers several IO delay statistics, namely the average, the median, and the tail delay proportion, rather than the IO delay average alone, which prevents tail delay from skewing the overall IO delay level and reduces slow disk misjudgment (if a few tail delays are very long, the IO delay average cannot reflect them, which leads to misjudgment); in addition, the change in the tail delay proportion is considered, so that relatively slow IO requests are judged specifically through the tail delay proportion rather than only through the average IO delay level, which ties slow IOs directly to slow disks and makes the slow disk judgment intuitive; the embodiment combines the three IO delay statistics to detect slow disks more efficiently and accurately, reducing the misjudgment rate and improving the detection result.
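To make the role of the three statistics concrete, the sketch below computes the average, the median, and the tail delay proportion of one IO delay data set and compares each with its threshold. How the patent's decision algorithm actually combines the three results is not stated in this passage, so the rule used here (flag the disk if any one statistic exceeds its threshold) is purely an assumption, and the function names are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def tail_share(delays: Sequence[float], tail_start: float) -> float:
    """Fraction of IOs whose delay is at or above the tail delay threshold."""
    if not delays:
        raise ValueError("empty IO delay data set")
    return sum(1 for d in delays if d >= tail_start) / len(delays)


def looks_slow(delays: Sequence[float], l_a: float, l_m: float,
               tail_start: float, p_t: float) -> bool:
    """Compare average, median and tail proportion with L_A, L_M and P_T.

    Assumed combination rule (not taken from the patent): the disk is
    flagged when at least one statistic exceeds its threshold.
    """
    share = tail_share(delays, tail_start)  # also rejects an empty data set
    return mean(delays) > l_a or median(delays) > l_m or share > p_t


# Invented data: 2% of IOs are very slow, the rest are fast (values in ms).
delays_ms = [1.0] * 98 + [500.0] * 2
print(looks_slow(delays_ms, l_a=20.0, l_m=5.0, tail_start=100.0, p_t=0.01))
```

In this invented case neither the average nor the median crosses its threshold, and the disk is flagged only because the tail proportion (2%) exceeds the 1% limit, which is the situation the tail statistic is meant to catch.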
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the system architecture of a manufacturing-oriented slow disk detection strategy system according to a first embodiment of the present invention;
FIG. 2 is a flowchart of the operation steps of a manufacturing-oriented slow disk detection strategy method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of the operation steps of the slow disk decision algorithm in the manufacturing-oriented slow disk detection strategy method according to the second embodiment of the present invention;
FIG. 4 is a flowchart of the threshold determination step of the slow disk decision algorithm in the manufacturing-oriented slow disk detection strategy method according to the second embodiment of the present invention.
Reference numerals: host end 10; storage device 20; host generic block layer 13; storage engine 12; slow disk detection system 11; dynamic IO adjustment interface 121; IO statistics module 122; slow disk isolation module 123; hard disk 21; hard disk data collection module 111; slow disk decision module 112; delay monitor 113; window adjustment task distribution module 114.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings; the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The invention will now be described in further detail with reference to specific examples thereof in connection with the accompanying drawings.
Example 1
Referring to FIG. 1, the invention proposes a manufacturing-oriented slow disk detection strategy system, which comprises a host end 10 and a storage device 20;
the host end 10 is connected with the storage device 20;
the host end 10 comprises a host generic block layer 13, a storage engine 12 and a slow disk detection system 11;
wherein the slow disk detection system 11 is connected with the host generic block layer 13 through the storage engine 12;
wherein the host generic block layer 13 is configured to send IO requests to the storage device 20; the storage engine 12 comprises a dynamic IO adjustment interface 121, an IO statistics module 122 and a slow disk isolation module 123;
the storage device includes a hard disk cluster composed of a plurality of hard disks 21;
The slow disk detection system 11 comprises a hard disk data collection module 111, a slow disk decision module 112, a delay monitor 113 and a window adjustment task distribution module 114;
the hard disk data collection module 111 is configured to obtain attribute information of the plurality of hard disks in the storage device, and to obtain, according to the IO delay data set of each hard disk, an IO delay average threshold L_A, an IO delay median threshold L_M, and a tail-delay-range IO request proportion threshold P_T;
the IO statistics module 122 is configured to record the number of all IO requests of each hard disk in the current system within the current time period T1; to record the IO delay data set in the current state when the dynamic IO adjustment interface is executed; and to save this information after processing completes so that the data is shared in the next time period T1;
the delay monitor 113 is configured to obtain the number of IO requests from the IO statistics module 122 and send it to the window adjustment task distribution module 114; and to acquire the IO delay data set recorded by the IO statistics module 122 and send it to the slow disk decision module 112;
the window adjustment task distribution module 114 is configured to determine the capacity of the IO window according to the number of IO requests;
the dynamic IO adjustment interface 121 is configured to dynamically adjust the capacity of the IO window according to an instruction of the window adjustment task distribution module 114 of the slow disk detection system;
the slow disk decision module 112 is configured to obtain the hard disk attribute information from the hard disk data collection module 111 and to analyze, according to the IO delay data set in the IO statistics module 122, the IO delay average threshold L_A, the IO delay median threshold L_M, and the tail-delay-range IO request proportion threshold P_T, whether the hard disk is a slow disk, and if so, to mark it as a slow disk for subsequent isolation;
the slow disk isolation module 123 is configured to isolate a hard disk corresponding to a slow disk detected by the slow disk detection system;
the slow disk detection system 11 is used for collecting the number of IO requests, controlling the capacity adjustment of an IO window of the storage engine, monitoring an IO delay data set and carrying out slow disk judgment;
the storage engine 12 is configured to send an IO request from a host side to a storage device, record an IO delay data set, and perform dynamic IO window adjustment by controlling a capacity of an IO window.
In summary, in the manufacturing-oriented slow disk detection strategy system provided by the invention, the hard disk data collection module obtains attribute information of the plurality of hard disks in the storage device; the IO statistics module records the number of IO requests and the IO delay data set of the current hard disk and sends them to the delay monitor; after checking the count, the delay monitor sends the number of IO requests to the window adjustment task distribution module; the window adjustment task distribution module dynamically sets the capacity of the IO window according to the number of IO requests; under that IO window capacity, the IO statistics module records an IO delay data set, which the delay monitor sends to the slow disk decision module; the slow disk decision module judges according to the attribute information of the current hard disk and the IO delay data set, and if the disk is judged to be a slow disk, the corresponding hard disk information is sent to the slow disk isolation module, which isolates the disk.
Example 2
As shown in FIG. 2, the invention correspondingly provides a manufacturing-oriented slow disk detection strategy method, which comprises the following operation steps:
step S10: acquiring attribute information of all hard disks in the hard disk cluster, and recording, through the IO statistics module, the number of IO requests each hard disk completes in each time period T1; the delay monitor acquires the number of IO requests from the IO statistics module; if the number of IO requests of the current hard disk is larger than 0, the number of IO requests is sent to the window adjustment task distribution module for judgment;
it should be noted that the above operation is the IO delay statistics stage; the delay monitor in the slow disk detection system sets a timer whose trigger period is T1; every time period T1, the delay monitor acquires the IO request data from the IO statistics module; the IO statistics module records the number of IO requests each hard disk completes in each time period T1; if the number of IO requests of the current hard disk is greater than 0, the delay monitor sends the IO statistics to the window adjustment task distribution module for judgment;
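The periodic polling in step S10 can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the `read_counts` and `dispatch` callables, and the disk names are all hypothetical, and the timer is replaced by an injectable `wait` callable so the example runs instantly.

```python
from typing import Callable, Dict


def forward_active_disks(counts: Dict[str, int]) -> Dict[str, int]:
    """Keep only disks that completed at least one IO in the last period T1."""
    return {disk: n for disk, n in counts.items() if n > 0}


def run_monitor(read_counts: Callable[[], Dict[str, int]],
                dispatch: Callable[[str, int], None],
                rounds: int,
                wait: Callable[[], None] = lambda: None) -> None:
    """Poll the per-disk IO counts once per period and forward the non-zero ones."""
    for _ in range(rounds):
        wait()  # stands in for the T1 timer, e.g. lambda: time.sleep(T1)
        for disk, n in forward_active_disks(read_counts()).items():
            dispatch(disk, n)


# Two polling rounds over invented counts: "sdb" is idle and is never forwarded.
seen = []
run_monitor(lambda: {"sda": 7, "sdb": 0},
            lambda disk, n: seen.append((disk, n)),
            rounds=2)
print(seen)
```

Passing the wait function in as a parameter keeps the sketch testable without real delays; a deployment would presumably use an actual timer with period T1.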
step S20: the window adjustment task distribution module judges whether the number of IO requests in the current time period T1 is larger than k; if so, it sets the size of the IO window (the IO window is a "container" in which IO requests are placed for processing) to a first window threshold W1 and controls the IO issuing rate so that the number of IOs in the hard disk processing queue is less than or equal to W1; it also controls the IO statistics module to record, within a time period t1, an IO delay data set U1 of the target IO requests processed (the time period t1 is a preset fixed value smaller than the time period T1);
the time period t1 is a preset duration during which IO requests are processed while the number of IOs in the hard disk processing queue is kept less than or equal to the first window threshold W1;
wherein the first window threshold W1 is a preset maximum number of IO requests in the hard disk processing queue. For example, the storage engine keeps the number of IO requests in the hard disk processing queue less than or equal to 10 (the first window threshold W1) for 1 s, and the completion times of the IO requests processed during that 1 s (their number is not fixed in advance and is counted by the storage engine) form the IO delay data set U1 of the current hard disk for the time period t1. Throughout the time period t1, the first window threshold W1 is greater than or equal to the number of target IO requests in the current hard disk processing queue;
In addition, consider the following example: assume that 7 IO requests (IO requests 1 through 7) arrive in the time period T1. (The count of 7 is the number of IOs that reached the storage engine in the last time period; it is used to predict how many IOs will need to be processed in the coming period. The storage engine may be processing new IO requests at any moment without stopping, so it does not first count the IOs of a time period T1 and only then process them in the next period.)
To illustrate: assume that the storage engine received 7 IO requests in the last time period T1 (each IO is processed as it is received). The target IOs recorded afterwards are not necessarily those 7 IOs; the 7 only measures how dense the IO currently is and determines whether subsequent IOs need to be recorded. Let k = 4. Because 7 > k, the first window threshold applies, and here W1 = 3, so the number of IOs in the hard disk processing queue must be kept less than or equal to 3. Suppose the hard disk processing queue currently holds 4 IOs (the storage engine knows this number); the storage engine then stops issuing IO requests to the hard disk. Once 2 IOs have completed in the hard disk (the queue now holds 2 IOs, so one more can be accepted), the storage engine issues an IO request to the hard disk again, always keeping the number of IOs in the queue less than or equal to 3. Starting from that IO, the delays of the IOs completed within the time t1 are recorded.
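The worked example above (W1 = 3, with 4 IOs already in the hard disk processing queue) can be replayed with a small counter-based gate. This is only a sketch of the throttling idea; the class `WindowedIssuer` and its methods are hypothetical names, not part of the patent or of any storage engine API.

```python
class WindowedIssuer:
    """Tracks in-flight IOs for one disk and gates new submissions."""

    def __init__(self, window: int, in_flight: int = 0) -> None:
        self.window = window        # IO window capacity, e.g. W1
        self.in_flight = in_flight  # IOs currently in the disk's processing queue

    def can_issue(self) -> bool:
        # Issuing one more IO must keep the queue at or below the window.
        return self.in_flight < self.window

    def issue(self) -> bool:
        """Try to issue one IO; return False if the window is full."""
        if not self.can_issue():
            return False
        self.in_flight += 1
        return True

    def complete(self) -> None:
        """Account for one IO finishing on the disk."""
        if self.in_flight == 0:
            raise RuntimeError("no IO in flight")
        self.in_flight -= 1


issuer = WindowedIssuer(window=3, in_flight=4)  # W1 = 3, queue currently holds 4
trace = [issuer.issue()]      # refused: 4 IOs in flight
issuer.complete()             # 3 in flight
trace.append(issuer.issue())  # still refused: queue is already at W1
issuer.complete()             # 2 in flight
trace.append(issuer.issue())  # accepted: queue goes back to 3
trace.append(issuer.issue())  # refused again
print(trace)                  # [False, False, True, False]
```

As in the text, a new IO is accepted only after two completions bring the queue down to 2, and the queue never exceeds 3 once it has drained to the window.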
The delay data set U l Delay data of each target IO request (delay data of each target IO request is time (time required or delay data, delay time) required by all target IO request processing to finish when the number of IO requests is less than or equal to a first window threshold W1);
it should be noted that the above operation is the first case of the IO window dynamic adjustment stage, namely that the number of IO requests of the current hard disk is greater than k. The window regulation task distribution module makes the judgment: if the number of IO requests of the current hard disk is greater than k, the IO delay data of the current stage cannot reflect the real state of the storage device, so an instruction is sent to the dynamic IO adjusting module of the storage engine to set the IO window size to the first window threshold W1. That is, the maximum number of outstanding IO requests that the host end keeps pending on the current hard disk is W1; if the number of outstanding IO requests on the current hard disk is greater than W1, no further read-write requests are sent to the current hard disk. If the IO window size is already W1, it is not updated. Once the number of in-flight IO requests of the current hard disk is less than or equal to W1, the dynamic IO adjusting module sends an instruction to the IO statistics module to start IO delay statistics, and the IO statistics module collects the IO delay data set U1 over the time period t1 starting from that moment (the delay data of each target IO request is the time required to finish processing that request while the number of IO requests is less than or equal to W1). During the time period t1, the number of target IO requests in the current hard disk processing queue is kept less than or equal to W1. The next time the delay monitor fetches target IO request data from the IO statistics module, the delay data detected in the time period t1 is returned;
step S30: if the window regulation task distribution module judges that the number of IO requests in the current time period T1 is smaller than k (that is, greater than 0 and smaller than k), the capacity of the IO window is set to the second window threshold W2, and the IO statistics module is controlled to record the IO delay data set U1 in the time period t2 (the data set recorded in the time period t2 could equally be denoted U2; this embodiment keeps the notation U1 because steps S20 and S30 are parallel alternatives, either of which simply yields an IO delay data set, and distinguishing U1 from U2 has little effect on the steps described below);
The time period t2 is the time period spent processing IO requests while the number of IO requests in the hard disk processing queue is kept smaller than the second window threshold W2;
the second window threshold W2 > the first window threshold W1 (if the number of IO requests is greater than k, then, to reduce the amount of computation, no check is made as to whether it is also greater than W2, and the first window threshold W1 is set directly to measure the IO delay data set; if the number of IO requests is less than k, it is necessarily less than the second window threshold W2, and W2 is set directly to measure the IO delay data set);
it should be noted that, the operation is the second case of the dynamic adjustment stage of the IO window, that is, the number of IO requests of the current hard disk is smaller than k; judging by the window adjusting task distribution module: if the number of IO requests of the current hard disk is larger than 0 and smaller than k, the IO delay data in the time period T1 is considered to reflect the state of the real storage device, an instruction is sent to a dynamic IO adjusting module of the storage engine, and the IO window size is set to be a second window threshold W2;
in addition, the second window threshold W2 > the first window threshold W1; if the IO window size before adjustment is already the second window threshold W2, the window is not updated; the second window threshold W2 should be large enough to avoid affecting reads and writes to the hard disk; when the number of in-flight IO requests of the current hard disk is smaller than W2, the dynamic IO adjusting module sends an instruction to the IO statistics module to start IO delay statistics, and the IO statistics module collects the IO delay data set U1 within the time period t2 (time period t2 > time period t1); the next time the delay monitor fetches IO request data from the IO statistics module, the delay data of the time period t2 is returned;
the purpose of setting the window to W1 is to reduce the time IO requests wait in the hard disk queue and the general block layer queue, so that slow disks are measured more accurately (assume that the slow disk criterion is 3 s; if an IO request waits 2.5 s in the queue and the hard disk actually takes 1 s to process it, the processing time observed by the storage engine is 3.5 s and the disk is judged to be a slow disk; most of that 3.5 s was spent waiting for processing and does not reflect the actual processing capability of the hard disk, so misjudgment is likely; when too many IO requests are waiting, the waiting time of every IO request in the queue, whether in the general block layer or in the hardware processing queue, generally increases).
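The arithmetic in the example above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the 3 s criterion, 2.5 s wait and 1 s service time are the figures from the example, not measured values.

```python
SLOW_DISK_LIMIT_S = 3.0  # slow disk criterion taken from the example in the text


def observed_delay(queue_wait_s: float, service_s: float) -> float:
    """Delay seen by the storage engine: time queued plus time on the device."""
    return queue_wait_s + service_s


def is_misjudged(queue_wait_s: float, service_s: float,
                 limit_s: float = SLOW_DISK_LIMIT_S) -> bool:
    """True when queueing alone pushes a disk that is within the limit over it."""
    return (observed_delay(queue_wait_s, service_s) > limit_s
            and service_s <= limit_s)


print(observed_delay(2.5, 1.0))  # 3.5
print(is_misjudged(2.5, 1.0))    # True
```

With no queueing the same disk stays under the criterion, which is the situation the reduced window W1 is meant to approximate.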
As another example: in the time period T1 there are 3 IO requests (numbered 1, 2 and 3), with k = 4 and W2 = 5; the IO requests are placed directly into the hard disk processing queue in the order in which they were sent, and these IO requests are determined to be the target IO requests;
The IO statistics module is controlled to record the IO delay data set U1 for processing the target IO requests in the time period t2 (the IO requests numbered 1, 2 and 3 are all taken directly as target IO requests).
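The two cases of the IO window dynamic adjustment stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the storage engine's actual interface: `choose_window` and `WindowedDispatcher` are hypothetical names, the handling of a count exactly equal to k is an assumption (the text only covers counts above and below k), and the values k = 4, W1 = 3 and W2 = 5 are those of the examples.

```python
from collections import deque


def choose_window(io_count_t1: int, k: int, w1: int, w2: int) -> int:
    """Pick the IO window capacity from the request count observed in T1."""
    if not w2 > w1:
        raise ValueError("the text requires W2 > W1")
    # The text covers count > k (dense) and 0 < count < k (sparse).
    # Treating count == k as sparse is an assumption of this sketch.
    return w1 if io_count_t1 > k else w2


class WindowedDispatcher:
    """Hypothetical gate that keeps at most `window` IOs on the disk queue."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.in_flight = 0
        self.pending = deque()

    def submit(self, io_id):
        """Queue an IO and return the ids actually issued to the disk."""
        self.pending.append(io_id)
        return self._drain()

    def complete(self):
        """Record one completion and return any IOs released by it."""
        if self.in_flight == 0:
            raise RuntimeError("no IO in flight")
        self.in_flight -= 1
        return self._drain()

    def _drain(self):
        issued = []
        while self.pending and self.in_flight < self.window:
            issued.append(self.pending.popleft())
            self.in_flight += 1
        return issued
```

With 7 requests in T1 the sketch selects W1 = 3, so a fourth and fifth submission are held back until a completion frees a slot; with 3 requests it selects W2 = 5 and nothing is held back.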
Step S40: a count threshold n is preset for the number of times the IO delay data set U1 of the target IO requests is recorded, and the number of recordings of U1 is tracked in real time (the delay monitor keeps the number of IO delay data sets recorded for the current hard disk and increments it by 1 each time a data set is recorded, until it reaches n); if the number of recordings of U1 reaches n (the count threshold; it is preset so that slow disk detection is performed on the current hard disk only after n IO delay data sets have been obtained, which prevents too few or too many recordings from distorting the detection result), whether the current hard disk is a slow disk is judged by a slow disk decision algorithm based on the attribute information of the current hard disk and the IO delay data sets, and if it is judged to be a slow disk, the current hard disk is marked and awaits isolation.
The attribute information of the current hard disk comprises the hard disk type, the hard disk model and the service type of the current hard disk; generally speaking, hard disks of different types process IO requests at different speeds (for example, if the current hard disk serves game and drawing workloads and is a mechanical hard disk of model Exos2X14 with a high processing speed, slow disk detection needs to be performed against the normalized processing speed);
it should be noted that when the number of delay data sets recorded in the delay monitor for a given hard disk reaches n, the slow disk decision module is triggered to perform slow disk judgment; the judgment is made by the slow disk decision algorithm based on the hardware-related information of the disk and the IO delay data sets; if the result is a slow disk, the disk is marked as a slow disk; otherwise, all recorded data are cleared and recording starts again.
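The counting behaviour of the delay monitor in step S40 can be sketched as below. The class and its `decide` callback are hypothetical stand-ins for the delay monitor and the slow disk decision module; clearing the records after a slow verdict as well is an assumption, since the text only states that a non-slow result clears the data and restarts recording.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


class DelayMonitor:
    """Collects IO delay data sets per disk and triggers a decision at n sets."""

    def __init__(self, n: int,
                 decide: Callable[[str, List[List[float]]], bool]) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.decide = decide
        self.records: Dict[str, List[List[float]]] = {}
        self.marked_slow: Set[str] = set()

    def record(self, disk_id: str,
               delay_set: Sequence[float]) -> Optional[bool]:
        """Store one data set; return None until n sets exist, then the verdict."""
        sets = self.records.setdefault(disk_id, [])
        sets.append(list(delay_set))
        if len(sets) < self.n:
            return None
        is_slow = self.decide(disk_id, sets)
        if is_slow:
            self.marked_slow.add(disk_id)  # marked, awaiting isolation
        # Assumption: records are dropped after every verdict, so a disk that
        # was not slow starts a fresh round of n recordings.
        del self.records[disk_id]
        return is_slow
```

Passing the decision logic in as a callback keeps the monitor independent of the thresholds, mirroring the separation between the delay monitor and the slow disk decision module.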
IO request processing leads to differences in the read-write pattern of the hard disk, that is, differences in the number of IO requests in different time periods; the delay of IO requests of different density differs in how accurately it reflects hard disk performance; the denser the IO requests, the less their delay typically reflects hard disk performance; taking a Linux system as an example, the iostat command can report svctm (the average service time) for a storage device, which represents the average delay of the storage device over a given time period; this delay is the device delay, that is, the delay from the arrival of an IO request at the block device layer to the completion of that IO request; when the number of IO requests is small, the storage device can process the IO requests in the device queue immediately, and the delay accurately reflects the IO processing state and health of the storage device; an increase in IO request load causes IO requests to stay too long in the hard disk processing queue of the storage device or in the IO queue of the general block layer of the host, so the delay no longer reflects the capability and state of the storage device in actually processing IO; the usual remedy is therefore to adjust the IO delay threshold used for slow disk judgment; however, that approach is not general, relies on accumulated experience and requires a large amount of testing for verification;
The IO request patterns can be divided into two types according to the number of IO requests in the current time period T: a small number of IO requests, a large number of IO requests; under the condition of a small number of IO requests, the IO delay data set can reflect the real performance condition of the hard disk, and the data can be directly used; because the IO delay data set under a large number of IO requests cannot explain the delay condition of the hard disk, the capacity of the IO window is reduced for a period of time, and the IO delay data set under a short time window is obtained; the size of the window is dynamically controlled, the window is reduced only in a short time period, and the capacity of the IO window is large in most of the time period, so that IO request processing is not affected; the dynamic capacity adjustment mode of the IO window can utilize the existing IO delay data set, avoid slow-disk misjudgment caused by delay increase of IO requests in a waiting queue under high load, obtain more accurate hardware delay by limiting load of IO request processing, and meanwhile, avoid influence on IO request processing; the method limits the number of queuing IOs in a hardware queue by controlling the number of IO requests issued to hardware, namely the capacity of an IO window, so that IO request delay capable of reflecting the hardware state is obtained.
Specifically, referring to fig. 3, in step S40, a count threshold n is preset for the number of times the IO delay data set U1 of the target IO requests is recorded, and the number of recordings of U1 is tracked in real time; if the number of recordings of U1 reaches n, whether the disk is a slow disk is judged by the slow disk decision algorithm based on the attribute information of the current hard disk and the IO delay data set, and if it is judged to be a slow disk, it is marked and awaits isolation; this comprises the following operation steps:
step S41: the hard disk data collection module acquires the attribute information of all hard disks in the hard disk cluster and classifies the hard disks accordingly, and the IO request proportion threshold P_T of the tail delay range corresponding to each class of hard disk is preset;
According to the IO delay data set U1, the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range at this time are calculated (the proportion of IO requests in the tail delay range is, among the IO delay data detected in the order in which the IO requests were sent, the ratio of the IO requests near the tail end to the target IO requests; this definition applies in both cases, namely when the number of IO requests is greater than k and when it is less than k);
and the IO delay threshold L_YW is preset.
The above operation is the threshold construction phase: when the cluster is created, the hard disk data collection module classifies all hard disks in the cluster by function and medium (for example, if among hard disks A, B and C the hard disks A and B both process picture and video data, so that their IO requests all concern picture and video information, then A and B are placed in the same class), and calculates, from the IO delay information in the cluster, the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range at this time.
Step S42: judging whether the IO delay data set of the current hard disk exceeds the IO delay threshold L_YW of the type to which the current hard disk belongs; if so, the hard disk is marked as a slow disk and awaits isolation (for example, if the IO delay threshold L_YW for the current type of hard disk is 10 s per IO request and, in actual detection, the delay of any one IO request in the IO delay data set of the current hard disk exceeds 10 s, the IO delay data of the current hard disk exceeds L_YW, and the current hard disk is marked as a slow disk and isolated);
Step S43: if the IO delay data of the current hard disk is judged not to exceed the IO delay threshold L_YW, the average value of the IO delay data set of the current hard disk is calculated (that is, the average time required to process one IO request in the time period t1 or t2) together with its median (the middle value obtained after sorting the IO delay data set from high to low), and it is further judged whether the average value is greater than x% of the IO delay average value threshold L_A and whether the median is greater than x% of the IO delay median threshold L_M; if both conditions hold at the same time (together they form the hard screening condition), it is further judged whether the tail delay proportion of the IO delay data is greater than y% of the IO request proportion threshold P_T of the tail delay range; if so, the disk is marked as a slow disk and awaits isolation;
x and y are generally constants;
the above operation judges aged slow disks: if the IO delay data of the current hard disk is judged not to exceed the IO delay threshold L_YW of the type to which the current hard disk belongs, the IO delay data set is sorted from small to large and divided into two batches at the tail delay threshold L_Tail; the average and the median of the first batch are calculated; if the average and the median of the first batch differ by x% from the IO delay average L_A and median L_M of that hard disk type obtained in step S41, the disk has aged, and the average/median slow disk judgment flag, flag1, is set to 1; the proportion of the number of IO requests in the second batch to the number of all IO requests is calculated, which is the proportion of IO requests in the tail delay range of the current data; if that proportion exceeds y% of the preset threshold P_T, the tail delay slow disk judgment flag, flag2, is set to 1; if flag1 and flag2 are both 1, the disk is a slow disk;
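Steps S42 and S43 can be sketched as a single function. This is an illustrative reading rather than the patented algorithm itself: the function name and parameter names are hypothetical, and both x% and y% are interpreted here as a relative excess over the class threshold (for example, x = 20 means "more than 20% above L_A"), which is an assumption because the text can also be read as "x% of the threshold".

```python
from statistics import mean, median
from typing import Sequence


def is_slow_disk(delays: Sequence[float], l_yw: float, l_tail: float,
                 l_a: float, l_m: float, p_t: float,
                 x: float, y: float) -> bool:
    """Two-stage slow disk check for one disk's IO delay data set."""
    if not delays:
        raise ValueError("empty IO delay data set")
    # Step S42: any single delay above L_YW marks the disk immediately.
    if max(delays) > l_yw:
        return True
    # Step S43: split the data at L_Tail into a first batch and a tail batch.
    # Filtering by value gives the same split as sorting and cutting at L_Tail.
    first = [d for d in delays if d <= l_tail]
    tail_ratio = (len(delays) - len(first)) / len(delays)
    # flag1: average and median of the first batch both exceed the class values.
    # Assumption: x% and y% are read as a relative excess over the threshold.
    flag1 = (bool(first)
             and mean(first) > l_a * (1 + x / 100)
             and median(first) > l_m * (1 + x / 100))
    # flag2: share of tail-range requests exceeds the class proportion threshold.
    flag2 = tail_ratio > p_t * (1 + y / 100)
    return flag1 and flag2
```

Requiring both flags keeps the aged-disk test strict: a disk with a few slow outliers but a normal median, or with a raised median but no heavier tail, is not marked.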
step S44: judging whether a new disk is added in the current cluster or whether the current hard disk is not decided in a time period T4 (the time period T4 is the period time required by the total detection of a slow disk);
If a new disk has been added, the data of that disk is added and the above steps are repeated to reconstruct the thresholds (reconstructing the thresholds means resetting the tail delay threshold L_Tail of each class of hard disk; recalculating, from the IO delay data set U1, the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range at this time; and presetting the IO delay threshold L_YW, which is set according to an empirical value); if no decision has been made for the current hard disk within the time T4, the current hard disk is judged to have been pulled out, its data is deleted, and the above steps are repeated to reconstruct the thresholds.
The above operation is a threshold adjustment phase; when a new disk is added or a disk is pulled out from the cluster, performing threshold adjustment by using the historical data in the step S43; when a new disk is added into the cluster and a slow disk decision is made, the slow disk decision module judges that the disk is a disk which never triggers the slow disk decision; the slow disc decision module sends the hard disc related decision information to the hard disc data collection module, so that the hard disc data collection module builds related threshold value data again according to the flow in the step S41; when the disk is unplugged, the slow disk decision module does not process the slow disk decision of a certain hard disk in the time of T4, the disk information is sent to the hard disk data collection module, the data collection module finds that the disk is unplugged and deletes the data of the disk, and the relevant threshold is built again according to the flow of the step S41.
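The threshold adjustment phase can be sketched as a small registry that tracks when each disk last went through a slow disk decision. Everything here is hypothetical scaffolding: the class name, the `rebuild` callback standing in for the step S41 flow, and the use of plain numeric timestamps are assumptions made only to show the add and removal triggers.

```python
from typing import Callable, Dict, List


class DiskRegistry:
    """Tracks the last decision time per disk and triggers threshold rebuilds."""

    def __init__(self, t4: float,
                 rebuild: Callable[[List[str]], None]) -> None:
        self.t4 = t4
        self.rebuild = rebuild
        self.last_decision: Dict[str, float] = {}

    def on_decision(self, disk_id: str, now: float) -> None:
        """Called whenever a slow disk decision is made for a disk."""
        is_new = disk_id not in self.last_decision
        self.last_decision[disk_id] = now
        if is_new:
            # A disk that never triggered a decision before counts as added.
            self.rebuild(sorted(self.last_decision))

    def sweep(self, now: float) -> List[str]:
        """Drop disks with no decision within T4 and rebuild if any were dropped."""
        stale = [d for d, t in self.last_decision.items() if now - t > self.t4]
        for disk_id in stale:
            del self.last_decision[disk_id]
        if stale:
            self.rebuild(sorted(self.last_decision))
        return stale
```

A periodic call to `sweep` plays the role of the T4 check; the rebuild callback receives the current disk list so that thresholds are recomputed over exactly the disks still present.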
As can be seen from the analysis of the technical scheme of the embodiment of the invention, the IO delay data set is obtained first; the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range at this time are constructed from the hard disk attribute information; delay slow disk judgment and aging slow disk judgment are performed using the IO delay data set together with L_A, L_M and P_T to reach a slow disk decision, and the hard disk identified as a slow disk is isolated; when a new disk is added or an old disk is pulled out, L_A, L_M and P_T are reconstructed and the slow disk decision is made again.
Specifically, referring to FIG. 4, in step S41, calculating, according to the IO delay data set U1 of the target IO requests, the IO delay average value threshold L_A of the target IO requests, the IO delay median threshold L_M of the target IO requests and the average value threshold P_T of the target IO request proportion of the tail delay range at this time specifically comprises the following operation steps:
step S411: continuously recording an IO delay data set of a target IO request of each hard disk in a time period T3 for n3 times; recording the IO delay average value of the target IO requests of each hard disk, the IO delay median value of the target IO requests and the target IO request proportion of the tail delay range at the moment;
Step S412: calculating the average value of the IO delay median value of the target IO requests of each type of hard disk;
step S413: excluding, in each type, the hard disks whose IO delay median of target IO requests exceeds the average of the IO delay medians of that type of hard disk by m%, and calculating, over the remaining hard disks of each type, the average IO delay of the target IO requests, the IO delay median of the target IO requests and the average target IO request proportion of the tail delay range at this time (these values are the basic thresholds of the type);
finally, the IO delay average value of the target IO requests is taken as the IO delay average value threshold L_A, the IO delay median of the target IO requests is taken as the IO delay median threshold L_M, and the average value of the target IO request proportion of the tail delay range at this time is taken as the threshold P_T;
m is generally a constant; n3 is a constant;
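Steps S411 to S413 can be sketched for a single class of hard disks as follows. The function and type names are hypothetical; the sketch assumes that the per-disk average, median and tail proportion from step S411 are already available, reads "exceeds the average by m%" as a relative excess, and takes L_M as the mean of the remaining disks' medians, all of which are interpretations rather than statements of the original method.

```python
from statistics import mean
from typing import Dict, Tuple

# Per-disk statistics from step S411: (mean delay, median delay, tail ratio).
DiskStats = Tuple[float, float, float]


def build_class_thresholds(disk_stats: Dict[str, DiskStats],
                           m: float) -> Tuple[float, float, float]:
    """Return (L_A, L_M, P_T) for one class of hard disks."""
    if not disk_stats:
        raise ValueError("no disks in this class")
    # Step S412: average of the per-disk medians in the class.
    avg_median = mean(s[1] for s in disk_stats.values())
    # Step S413: drop disks whose median exceeds that average by m%.
    cutoff = avg_median * (1 + m / 100)
    kept = [s for s in disk_stats.values() if s[1] <= cutoff]
    # kept is non-empty for non-negative delays and m >= 0, because the
    # smallest median can never exceed the average of the medians.
    l_a = mean(s[0] for s in kept)
    l_m = mean(s[1] for s in kept)
    p_t = mean(s[2] for s in kept)
    return l_a, l_m, p_t
```

Excluding outlier disks before averaging keeps an already degraded disk from inflating the baseline that later decisions are compared against.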
the IO delay threshold L_YW is used to judge the delay data set that reflects the speed at which the current hard disk processes IO requests, and thus whether the current hard disk is a slow disk (if the current hard disk processes IO requests so slowly that the read-write service of the hard disk is affected, it is determined to be a slow disk); the tail delay threshold L_Tail and the IO delay median threshold L_M are used to judge the delay data of an aging hard disk, and thus whether the current hard disk is an aged slow disk;
it should be noted that in existing slow disk judging techniques, the criterion is generally whether the number or proportion of slow IO requests within a certain time exceeds a set threshold; when the IO load is heavy, the threshold is adjusted dynamically according to the load; the typical comparison index is the average IO delay; however, the average IO delay cannot show how the IO delays are concentrated and is skewed by tail delay; alternatively, the IO delay of a hard disk is assumed to follow a normal distribution, and the IO delay data is sampled to judge whether its distribution is consistent with that assumption; however, IO delay does not necessarily follow a normal distribution under different workloads; therefore, the proposed slow disk decision algorithm considers several statistical indexes and performs slow disk judgment through the changes in the corresponding statistical indexes of IO delay;
the IO delay of a slow disk is worse than that of a normal disk in two main respects: on the one hand, the IO tail delay becomes longer and excessively large; on the other hand, the ordinary IO delay falls outside the normal threshold, showing a further degradation in performance; a slow disk whose IO tail delay has become markedly longer is simple to judge: as long as an IO delay on the disk is higher than the tolerance threshold for IO request processing, the disk can be judged to be a slow disk or even a bad disk; the other kind of slow disk needs careful screening with relatively strict judgment conditions, so that normal disks are not misjudged; therefore, the first judgment condition of the slow disk judgment algorithm is an IO delay threshold beyond which IO request processing is affected, and if an IO delay of the hard disk exceeds this threshold, the hard disk is judged to be a slow disk; the other judgment condition compares the hard disk against the related data information of its class, namely the normal average value, the median range, the excessive IO delay range and the proportion threshold, to judge the proportion by which the average and the median of the concentrated distribution have risen or by which the number of excessively long IO delays has increased.
In summary, the slow disk detection strategy system and method for manufacturing industry provided by the embodiment of the invention are based on the interaction between the detection system and the storage engine, perform slow disk detection by using the IO window dynamic adjustment strategy, and perform slow disk discrimination decision based on various statistics;
the interaction between the slow disk detection system and the storage engine can utilize the IO data statistics function and the IO control function of the storage engine, so that the high coupling between the slow disk detection function and the IO function is avoided; through interaction of the storage engine, IO statistical data is obtained directly and accurately, and accurate slow disk discrimination can be performed based on the IO data;
the IO window dynamic regulation strategy can measure hard disk processing capability more accurately under different densities of IO requests; when there are few IO requests, the IO window is essentially unrestricted, IO request processing is not affected, and the performance of the disk can be obtained simply from the IO delay data; when IO requests are denser, accurate hard disk IO delay information can be measured within a short time window by adjusting the IO window, without affecting IO request processing, so that accurate hard disk detection results are obtained;
the slow disk judging strategy considers several statistics related to IO delay, including the average, the median and the tail delay proportion, rather than the average IO delay alone, so that the skew that tail delay introduces into the ordinary IO delay level is avoided and slow disk misjudgment is reduced; in addition, the change in the tail delay proportion is considered: the relatively slow IO requests are singled out for a tail delay proportion judgment instead of being judged only through the average IO delay level, so that slow IO and slow disks are linked directly and the slow disk can be judged intuitively; combining the three IO delay statistics yields a more accurate slow disk discrimination.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; modifications of the technical solutions described in the foregoing embodiments, or equivalent substitutions of some or all of the technical features thereof, may be made by those of ordinary skill in the art; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The slow disk detection strategy system for manufacturing industry is characterized by comprising a host side and storage equipment;
the host end is electrically connected with the storage device;
the host side comprises a host universal block layer, a storage engine and a slow disk detection system;
the slow disk detection system is connected with the host universal block layer through a storage engine;
the host universal block layer is used for sending an IO request to the storage device;
the storage engine comprises a dynamic IO adjusting interface, an IO statistics module and a slow disk isolation module;
the storage device comprises a hard disk cluster consisting of a plurality of hard disks;
the slow disk detection system comprises a hard disk data collection module, a slow disk decision module, a delay monitor and a window regulation task distribution module;
The hard disk data collection module is used for obtaining attribute information of a plurality of hard disks in the storage device, and for obtaining, according to the IO delay data set of the hard disk, the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range;
The IO statistics module is used for recording the quantity of all IO requests of the hard disk in the current system in the current time period T1; recording an IO delay data set in a current state when the dynamic IO adjustment interface is executed; the information is saved after the processing is completed, and the data is shared in the next time period T1;
the delay monitor is used for acquiring the number of IO requests in the IO statistics module and sending the number of IO requests to the window regulation task distribution module; acquiring an IO delay data set recorded by an IO statistics module, and transmitting the IO delay data set to a slow disk decision module;
the window regulation task distribution module is used for determining the capacity of the IO window according to the number of IO requests;
the dynamic IO adjusting interface is used for dynamically adjusting the capacity of the IO window according to the instruction of the window adjusting task distribution module of the slow disc detection system;
the slow disk decision module is used for acquiring the hard disk attribute information in the hard disk data collection module and analyzing, according to the IO request data set in the IO statistics module together with the IO delay average value threshold L_A, the IO delay median threshold L_M and the IO request proportion threshold P_T of the tail delay range, whether the hard disk is a slow disk, and if so, marking the slow disk for subsequent isolation;
the slow disk isolation module is used for isolating the hard disk corresponding to the slow disk detected by the slow disk detection system;
the slow disk detection system is used for collecting the number of IO requests, controlling the capacity adjustment of an IO window of the storage engine, monitoring an IO delay data set and carrying out slow disk judgment;
the storage engine is used for sending the IO request from the host end to the storage device, recording the IO delay data set and executing dynamic IO window adjustment by controlling the capacity of the IO window.
2. The manufacturing-oriented slow disk detection strategy system according to claim 1, wherein the hard disk data collection module is configured to obtain attribute information of a plurality of hard disks in the storage device; the IO statistics module records the IO request quantity and the IO delay data set of the current hard disk; the IO request quantity and the IO delay data set of the current hard disk are sent to a delay monitor;
the delay monitor is used for sending the number of IO requests to the window regulation task distribution module, acquiring an IO delay data set recorded by the IO statistics module and sending the IO delay data set to the slow disk decision module;
the window regulation task distribution module is used for dynamically setting the capacity of the IO window according to the number of IO requests;
the IO statistics module is used for detecting an IO delay data set based on the capacity of the IO window and sending the IO delay data set to the delay monitor;
and the slow disk decision module is used for judging whether the hard disk is a slow disk according to the attribute information of the current hard disk and the IO delay data set, and if the hard disk is judged to be the slow disk, the corresponding hard disk information is sent to the slow disk isolation module, and the slow disk isolation module is used for isolating the hard disk.
3. A manufacturing-oriented slow disc detection strategy method processed by the manufacturing-oriented slow disc detection strategy system according to any one of claims 1-2, comprising the following steps:
acquiring attribute information of a plurality of hard disks in a storage device, and recording the IO request quantity of the current hard disk;
dynamically setting the capacity of an IO window according to the number of IO requests, and detecting an IO delay data set according to the capacity of the IO window;
and carrying out slow disk detection processing according to the attribute information of the current hard disk and the IO delay data set.
4. The manufacturing-oriented slow disk detection strategy method according to claim 3, wherein the obtaining attribute information of a plurality of hard disks in the storage device records the number of IO requests of the current hard disk; dynamically setting the capacity of an IO window according to the number of IO requests, and detecting an IO delay data set according to the capacity of the IO window; and carrying out slow disk detection processing according to the attribute information of the current hard disk and the IO delay data set, wherein the slow disk detection processing comprises the following steps of:
acquiring attribute information of all hard disks in the hard disk cluster, and recording, through the IO statistics module, the number of IO requests completed by each hard disk in a time period T1; the delay monitor acquires the number of IO requests in the IO statistics module; if the number of IO requests of the current hard disk is larger than 0, the number of IO requests is sent to the window regulation task distribution module for judgment;
the window regulation task distribution module judges whether the number of IO requests in the current time period T1 is larger than k; if yes, the capacity of the IO window is set to a first window threshold W1, and the issuing speed of IO is controlled so that the number of IOs in the hard disk processing queue is smaller than or equal to W1; and the IO statistics module is controlled to record an IO delay data set U_l for processing the target IO requests in a time period t1;
if the window regulation task distribution module judges that the number of IO requests in the current time period T1 is smaller than k, the capacity of the IO window is set to a second window threshold W2, and the IO statistics module is controlled to record an IO delay data set U_l in the time period t2;
presetting a threshold n for the number of times the IO delay data set U_l of the target hard disk is recorded, and recording the IO delay data set U_l of the target hard disk in real time; if the number of times the IO delay data set U_l has been recorded reaches n, judging whether the hard disk is a slow disk through a slow disk decision algorithm based on the attribute information of the current hard disk and the IO delay data set, and if the hard disk is judged to be a slow disk, marking the hard disk and waiting for isolation.
5. The manufacturing-oriented slow disk detection strategy method according to claim 4, wherein judging whether the hard disk is a slow disk through the slow disk decision algorithm, and marking the hard disk and waiting for isolation if it is judged to be a slow disk, specifically comprises the following steps:
the hard disk data collection module acquires attribute information of all hard disks in the hard disk cluster and classifies the hard disks, and a tail delay judgment threshold L_Tail corresponding to each class of hard disk is preset;
calculating, according to the IO delay data set U_l, the IO delay average value threshold L_A, the IO delay median threshold L_M, and the threshold P_T of the proportion of IO requests that at this time exceed the tail delay judgment threshold L_Tail;
and presetting an IO delay threshold L_YW;
judging whether the IO delay data set of the current hard disk exceeds the IO delay threshold L_YW of the type to which the current hard disk belongs; if yes, marking the disk as a slow disk and waiting for isolation;
if the IO delay data of the current hard disk is judged not to exceed the IO delay threshold L_YW, calculating the average value of the IO delay data set of the current hard disk and the median value of the IO delay data of the current hard disk, and further judging whether the average value of the IO delay data is larger than x times the IO delay average value threshold L_A and whether the median value of the IO delay data is larger than x times the IO delay median threshold L_M; if the two conditions are satisfied at the same time, further judging whether the tail delay range proportion of the IO delay data is greater than y times the tail delay range proportion threshold P_T; if yes, marking the disk as a slow disk and waiting for isolation;
and when a new disk is added to the current cluster or the current hard disk is pulled out, recalculating the IO delay average value threshold L_A, the IO delay median threshold L_M, and the IO request proportion threshold P_T of the tail delay range at this time, and then performing slow disk detection;
wherein x is a constant and y is a constant.
6. The manufacturing-oriented slow disk detection strategy method of claim 5, wherein calculating, according to the IO delay data set U_l of the target IO requests, the IO delay average value threshold L_A of the target IO requests, the IO delay median threshold L_M of the target IO requests, and the threshold P_T of the target IO request proportion of the tail delay range at this time specifically comprises the following operation steps:
continuously recording an IO delay data set of a target IO request of each hard disk in n3 time periods T3; recording the IO delay average value of the target IO requests of each hard disk, the IO delay median value of the target IO requests and the target IO request proportion of the tail delay range at the moment;
calculating the average value of IO delay median of target IO requests of each type of hard disk;
screening out the hard disk with the IO delay median of the target IO requests of the hard disk in each type exceeding the average value m% of the IO delay median of the target IO requests of the corresponding type of hard disk, and calculating the average value of the IO delay average value of the target IO requests of the remaining hard disks of each type, the IO delay median of the target IO requests and the average value of the target IO request proportion of the tail delay range at the moment;
finally determining the average of the IO delay average values of the target IO requests as the IO delay average value threshold L_A of the target IO requests, determining the average of the IO delay medians of the target IO requests as the IO delay median threshold L_M, and determining the average of the target IO request proportions of the tail delay range at this time as the threshold P_T of the target IO request proportion of the tail delay range at this time;
n3 is a constant; and m is a constant.
7. The manufacturing-oriented slow disk detection strategy method as claimed in claim 6, wherein the IO request proportion of the tail delay range at this time is the proportion of IO requests in the IO delay data set U_l whose IO delay time is larger than the tail delay judgment threshold L_Tail.
8. The manufacturing-oriented slow disk detection strategy method according to claim 7, wherein the first window threshold W1 is a preset threshold for the number of IO requests in a fixed hard disk processing queue, and the first window threshold W1 in the time period t1 is greater than or equal to the number of IO requests in the current hard disk processing queue; the second window threshold W2 is a threshold value for presetting the number of IO requests in a fixed hard disk processing queue; and the second window threshold W2 is greater than the first window threshold W1.
9. The manufacturing-oriented slow disk detection strategy method according to claim 8, wherein the time period t1 is a preset time period used for processing IO requests while the number of IO requests in the hard disk processing queue is kept smaller than the first window threshold W1; the time period t2 is a preset time period used for processing IO requests while the number of IO requests in the hard disk processing queue is kept smaller than the second window threshold W2.
10. The manufacturing-oriented slow disk detection strategy method according to claim 9, wherein the attribute information of the current hard disk includes a hard disk type of the current hard disk, a hard disk model number, and a service type handled by the current hard disk.
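
As an illustrative aid only, the window selection of claim 4, the decision rule of claims 5 and 7, and the threshold calculation of claim 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The names Thresholds, choose_window, tail_ratio, is_slow_disk and calibrate are hypothetical. The sketch also assumes that "exceeds L_YW" means the mean delay exceeds L_YW, that x and y act as multipliers on the thresholds, and that the m% screen removes disks whose median exceeds the class average median by m%.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Sequence


@dataclass
class Thresholds:
    """Per-disk-class thresholds, named after the symbols in the claims."""
    l_a: float     # IO delay average value threshold L_A
    l_m: float     # IO delay median threshold L_M
    p_t: float     # tail delay range IO request proportion threshold P_T
    l_tail: float  # tail delay judgment threshold L_Tail (preset)
    l_yw: float    # IO delay threshold L_YW (preset)


def choose_window(io_count: int, k: int, w1: int, w2: int) -> int:
    """Claim 4: more than k requests in T1 selects W1, otherwise W2."""
    return w1 if io_count > k else w2


def tail_ratio(delays: Sequence[float], l_tail: float) -> float:
    """Claim 7: proportion of requests in U_l whose delay exceeds L_Tail."""
    return sum(d > l_tail for d in delays) / len(delays)


def is_slow_disk(delays: Sequence[float], th: Thresholds,
                 x: float, y: float) -> bool:
    """Claim 5 decision rule for one disk's IO delay data set U_l."""
    if not delays:
        return False  # no IO recorded, nothing to judge
    avg = mean(delays)
    # Assumption: the data set "exceeds L_YW" when its mean does.
    if avg > th.l_yw:
        return True
    # Assumption: x and y are multipliers applied to the thresholds.
    if avg > x * th.l_a and median(delays) > x * th.l_m:
        return tail_ratio(delays, th.l_tail) > y * th.p_t
    return False


def calibrate(disks: Sequence[Sequence[float]], m: float,
              l_tail: float, l_yw: float) -> Thresholds:
    """Claim 6: derive L_A, L_M and P_T for one class of disks."""
    medians: List[float] = [median(d) for d in disks]
    # Assumption: drop disks whose median exceeds the class average by m%.
    cutoff = mean(medians) * (1 + m / 100)
    kept = [d for d, med in zip(disks, medians) if med <= cutoff]
    return Thresholds(
        l_a=mean([mean(d) for d in kept]),
        l_m=mean([median(d) for d in kept]),
        p_t=mean([tail_ratio(d, l_tail) for d in kept]),
        l_tail=l_tail,
        l_yw=l_yw,
    )
```

In this sketch, calibrate would be rerun whenever a disk is added to or removed from the cluster, which mirrors the recalculation step of claim 5. The rule that U_l must be recorded n times before a decision is made is left to the caller.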
CN202310143737.8A 2023-02-21 2023-02-21 Manufacturing-oriented slow disk detection strategy system and method Active CN116149557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310143737.8A CN116149557B (en) 2023-02-21 2023-02-21 Manufacturing-oriented slow disk detection strategy system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310143737.8A CN116149557B (en) 2023-02-21 2023-02-21 Manufacturing-oriented slow disk detection strategy system and method

Publications (2)

Publication Number Publication Date
CN116149557A true CN116149557A (en) 2023-05-23
CN116149557B CN116149557B (en) 2023-07-18

Family

ID=86361568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310143737.8A Active CN116149557B (en) 2023-02-21 2023-02-21 Manufacturing-oriented slow disk detection strategy system and method

Country Status (1)

Country Link
CN (1) CN116149557B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100106820A1 (en) * 2008-10-28 2010-04-29 Vmware, Inc. Quality of service management
CN103488544A (en) * 2013-09-26 2014-01-01 华为技术有限公司 Processing method and device for detecting slow disk
CN109684140A (en) * 2018-12-11 2019-04-26 广东浪潮大数据研究有限公司 A kind of slow disk detection method, device, equipment and computer readable storage medium
CN111045881A (en) * 2018-10-15 2020-04-21 深信服科技股份有限公司 Slow disk detection method and system
CN114327266A (en) * 2021-12-24 2022-04-12 深信服科技股份有限公司 Card slow identification method, device and medium of storage device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806890A (en) * 2024-02-28 2024-04-02 四川省华存智谷科技有限责任公司 Slow disk detection processing method based on distributed storage
CN117806890B (en) * 2024-02-28 2024-05-03 四川省华存智谷科技有限责任公司 Slow disk detection processing method based on distributed storage

Also Published As

Publication number Publication date
CN116149557B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
EP0560343B1 (en) Input-output control system and input-output control method in the system
US8521986B2 (en) Allocating storage memory based on future file size or use estimates
WO2017012392A1 (en) Disk check method and apparatus
CN116149557B (en) Manufacturing-oriented slow disk detection strategy system and method
US8458409B2 (en) Access controller
US9396087B2 (en) Method and apparatus for collecting performance data, and system for managing performance data
CN109522176B (en) Monitoring method and device of storage system, electronic equipment and storage medium
EP1965385A1 (en) Storage device control apparatus, storage device, and data storage control method
US8281102B2 (en) Computer-readable recording medium storing management program, management apparatus, and management method
US10360127B1 (en) Techniques for identifying I/O workload patterns using key performance indicators
CN110688360A (en) Distributed file system storage management method, device, equipment and storage medium
US7890958B2 (en) Automatic adjustment of time a consumer waits to access data from queue during a waiting phase and transmission phase at the queue
JP3812405B2 (en) Disk array system
CN107360050B (en) Automatic testing method and device for performance of video cloud storage node
US11960724B2 (en) Device for detecting zone parallelity of a solid state drive and operating method thereof
CN114327266B (en) Method, device and medium for slowly identifying card of storage device
CN115470059A (en) Disk detection method, device, equipment and storage medium
US11334421B2 (en) Method and apparatus to identify a problem area in an information handling system based on latencies
CN109992217B (en) Service quality control method and device, electronic equipment and storage medium
CN110874192B (en) Storage management apparatus and storage management method
CN117648287B (en) On-chip data processing system, method, server and electronic equipment
CN117119201B (en) Compressed video transmission method, device, equipment and storage medium
JP6852697B2 (en) Master device, control method of master device, information processing program, and recording medium
US20080205220A1 (en) Recording apparatus and recording method
CN117032986A (en) Storage system flow control method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086

Patentee after: Beijing Zhiling Haina Technology Co.,Ltd.

Country or region after: China

Address before: 8b, building 1, No. 48, Zhichun Road, Haidian District, Beijing 100086

Patentee before: Beijing zhilinghaina Technology Co.,Ltd.

Country or region before: China