CN111400122A - Hard disk health degree assessment method and device - Google Patents

Hard disk health degree assessment method and device

Publication number
CN111400122A
Authority
CN
China
Prior art keywords
hard disk
clustering
server cluster
smart information
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910000902.8A
Other languages
Chinese (zh)
Other versions
CN111400122B (en)
Inventor
马建华
马奇凤
李青懋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN201910000902.8A
Publication of CN111400122A
Application granted
Publication of CN111400122B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for evaluating the health degree of a hard disk. The method comprises the following steps: initializing a server cluster configured with hard disks, and determining hard disk sample data of the server cluster; acquiring Self-Monitoring, Analysis and Reporting Technology (SMART) information of the hard disk sample data in a sampling interval; clustering the SMART information, and determining corresponding clustering parameters; and evaluating the health degree of the hard disks of the server cluster according to the clustering parameters. By initializing the server cluster to determine hard disk sample data that can serve as samples, clustering the SMART information of that sample data to obtain clustering parameters, and evaluating the health degree of the hard disks according to those parameters, the embodiment of the invention can accurately evaluate the overall health degree of the hard disks in the server cluster and predict hard disks that may fail.

Description

Hard disk health degree assessment method and device
Technical Field
The invention relates to the technical field of servers, in particular to a method and a device for evaluating health degree of a hard disk.
Background
With the rapid development of the cloud computing and big data industries, the demand for server hardware resources in industries such as information technology (IT), internet, finance, and government is increasing. As data grows on a large scale, mass storage systems become larger and more complex, and building a storage system with high reliability and high availability becomes a major challenge for enterprise operation. Ensuring the safety and integrity of data is essential to the operation and survival of an enterprise. Therefore, predicting the health degree and failure of hard disks in data center servers has become a common concern of many university researchers and operation and maintenance personnel. Under the existing international standard, the monitorable indicators of a hard disk, i.e. its detection attribute (SMART) information, have become an important data basis for such prediction.
At present, methods for predicting hard disk health and failure work as follows: hard disk SMART log data are labeled as normal samples and failed-disk samples, the samples are divided by attribute value into several possibly unrelated subsets, and a prediction model is built with a machine learning algorithm. The current operating data of a hard disk is then used as the input, and the model is judged by its accuracy and false alarm rate in predicting hard disk failure. However, data center servers come from different suppliers, hard disk models and brands are not uniform, and servers are racked and put into production in different batches, all of which makes hard disk health and failure prediction harder. Take as an example a data center with 2000 servers supplied by 1-2 suppliers. The servers fall into two categories, a computing-analysis type and a storage type, and correspond to at least three types of hard disk: SSD system disks, conventional-capacity high-speed data disks, and large-capacity low-speed storage disks. The servers are racked and put into production in 2-3 stages, 6 months apart. Even in the favorable case, about 2000-3000 hard disks of the same batch run at the same time and 10-20 hard disks fail per year on average. If these failed disks are used as negative samples to build a prediction model, the ratio of positive to negative samples exceeds 100:1, so the number of negative training samples is insufficient, and the differing failure causes among the negative samples further weaken training accuracy.
Disclosure of Invention
The invention provides a method and a device for evaluating the health degree of a hard disk, which solve the problem of poor prediction accuracy in prior-art hard disk health and failure prediction methods.
The embodiment of the invention provides a hard disk health degree evaluation method, which comprises the following steps:
initializing a server cluster configured with a hard disk, and determining hard disk sample data of the server cluster;
acquiring detection attribute SMART information of hard disk sample data in a sampling interval;
clustering the SMART information, and determining corresponding clustering parameters;
and evaluating the health degree of the hard disk of the server cluster according to the clustering parameters.
An embodiment of the present invention further provides a hard disk health degree evaluation device, including:
the initialization module is used for initializing a server cluster configured with a hard disk and determining hard disk sample data of the server cluster;
the first acquisition module is used for acquiring detection attribute SMART information of hard disk sample data in a sampling interval;
the clustering module is used for clustering the SMART information and determining corresponding clustering parameters;
and the evaluation module is used for evaluating the health degree of the hard disk of the server cluster according to the clustering parameters.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above hard disk health degree assessment method.
The technical scheme of the invention has the beneficial effects that: initializing a server cluster to determine hard disk sample data which can be used as a sample, clustering SMART information of the determined hard disk sample data to obtain a clustering parameter, and evaluating the health degree of hard disks in the server cluster according to the clustering parameter, so that the normal hard disks are used as the sample, the number of samples is large, the overall health degree of the hard disks in the server cluster can be accurately evaluated, and hard disks which possibly have faults can be predicted. Furthermore, the embodiment of the invention can also utilize the determined hard disk sample data and the physical environment monitoring data to construct a PCA algorithm model, and can determine the reasons influencing the health degree of the hard disk based on the PCA algorithm model, so that the deployment strategy of the hard disk can be optimized based on the influence reasons.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for evaluating health of a hard disk according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a hard disk health assessment apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
As shown in fig. 1, an embodiment of the present invention provides a method for evaluating health of a hard disk, which specifically includes the following steps:
Step 11: initializing a server cluster configured with a hard disk, and determining hard disk sample data of the server cluster.
The initialization in the embodiment of the present invention refers to an initialization process when a hard disk is put into use, and particularly refers to an initialization process when a new hard disk is put into use. In general, a new hard disk has been tested when it is shipped from a factory, and has good performance. The server cluster comprises at least one server, and at least one hard disk is configured on one server. The hard disk sample data refers to a hard disk sample meeting preset requirements, for example, a hard disk sample with good hard disk health.
Step 12: acquiring detection attribute SMART information of the hard disk sample data in a sampling interval.
The SMART information, also called the monitorable indicators, is an important data basis for prediction and evaluation. Sampling may be periodic or continuous. For example, SMART information of the hard disk sample data may be acquired every hour, day, week, or month; alternatively, it may be acquired continuously.
The full attribute mapping of the SMART information can be established as follows. Text data describing hard disk operation is collected automatically at the monitoring level and named according to the server type, SN, timestamp, RAID group to which the hard disk belongs, hard disk installation slot, hard disk interface protocol, and hard disk type, for example ZTE_219119783989_1537171201_c0_3_SATA_HDD. The collected text data is pushed to a log platform. Because the name of the text data identifies the hard disk it belongs to, the hard disk can be located quickly and an asset library can be established. SMART is an automatic hard disk state detection and early warning system and specification. As an industry standard, S.M.A.R.T. specifies the attribute items that hard disk manufacturers should provide, each identified by an ID code (up to 254), although the attribute items actually provided may differ between manufacturers. Common SMART attribute items and their ID codes are listed in Table 1.
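The naming scheme above can be illustrated with a minimal Python sketch that splits such a file name back into its fields. The field order is assumed to follow the order in which the attributes are listed in the text, the timestamp is assumed to be Unix epoch seconds, and the class and function names (`DiskLogName`, `parse_log_name`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DiskLogName:
    server_type: str
    serial_number: str
    timestamp: datetime
    raid: str
    slot: int
    interface: str
    disk_type: str


def parse_log_name(name: str) -> DiskLogName:
    """Split a SMART log file name into its seven underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {name!r}")
    server_type, sn, ts, raid, slot, interface, disk_type = parts
    return DiskLogName(
        server_type=server_type,
        serial_number=sn,
        # Assumption: the timestamp field is Unix epoch seconds.
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
        raid=raid,
        slot=int(slot),
        interface=interface,
        disk_type=disk_type,
    )


info = parse_log_name("ZTE_219119783989_1537171201_c0_3_SATA_HDD")
print(info.serial_number, info.raid, info.slot, info.disk_type)
```

A parsed record of this kind could serve as the key under which the SMART key-value attributes of one disk are stored in the asset library.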
TABLE 1 [table images not reproduced]
Taking the SMART information ID code as 1 as an example, the complete description of the SMART attribute value can be seen in the following table 2:
TABLE 2 [table image not reproduced]
Wherein, the parameter items corresponding to different ID codes in the SMART information can be the same or different. The SMART attribute values of the parameter items corresponding to different ID codes can be the same or different.
It is worth pointing out that, in the embodiment of the present invention, each piece of SMART information of the hard disk sample data acquired in the sampling interval corresponds to a timestamp.
Step 13: clustering the SMART information and determining corresponding clustering parameters.
Assuming that the sampling interval is [t1, t2], the SMART information acquired in [t1, t2] is clustered separately for each ID code of each manufacturer, and the corresponding clustering parameters are determined. For example, the SMART information in Table 1 above is clustered so that the parameter items corresponding to different ID codes of manufacturer A fall into different categories.
Further, all SMART information collected in [t1, t2] is marked with the same abscissa timestamp T1, and the clustering parameters are then calculated iteratively over the time series following T1.
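As a rough illustration of this grouping, the sketch below collects the SMART values that fall inside a sampling interval and buckets them by manufacturer and ID code before any clustering is run. The record field names (`vendor`, `attr_id`, `ts`, `value`) are hypothetical, and using t1 itself as the shared abscissa label is an assumption, since the text only says that all samples of the interval share one timestamp T1.

```python
from collections import defaultdict


def group_interval(records, t1, t2):
    """Group SMART values sampled in [t1, t2] by (manufacturer, attribute ID)."""
    groups = defaultdict(list)
    for rec in records:
        if t1 <= rec["ts"] <= t2:
            groups[(rec["vendor"], rec["attr_id"])].append(rec["value"])
    # Assumption: every sample of the interval shares one abscissa label (t1 here).
    return {"T": t1, "groups": dict(groups)}


records = [
    {"vendor": "A", "attr_id": 1, "ts": 10, "value": 100},
    {"vendor": "A", "attr_id": 1, "ts": 20, "value": 98},
    {"vendor": "A", "attr_id": 5, "ts": 15, "value": 100},
    {"vendor": "B", "attr_id": 1, "ts": 12, "value": 200},
    {"vendor": "A", "attr_id": 1, "ts": 99, "value": 50},  # outside the interval
]
snapshot = group_interval(records, 10, 30)
print(snapshot["groups"][("A", 1)])
```

Each bucket can then be clustered independently, and repeating the call for later intervals yields the time series of clustering parameters.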
Step 14: evaluating the health degree of the hard disks of the server cluster according to the clustering parameters.
The clustering parameters are obtained by clustering the SMART information of the initialized hard disk sample data. Because the initialized sample is large and reflects the overall health of the server cluster, the health degree of the hard disks of the server cluster can be accurately evaluated according to the clustering parameters.
Further, step 11 includes: acquiring initial SMART information of the hard disks in the server cluster; performing normal distribution detection on the initial SMART information; and determining, as the hard disk sample data, the hard disks that fall within a confidence interval of a preset percentage within a preset standard deviation.
The initial SMART information of the hard disks in the server cluster is acquired in a manner similar to the SMART information of the hard disk sample data in step 12. After the SMART text information of the hard disks is acquired, the SMART text is parsed and its key-value attributes are stored in a relational database to facilitate retrieval, querying, and data analysis. For the initial data of server hard disks of a given batch or type, normal distribution detection is performed on each attribute within the initial sampling interval [t01, t02], an initial mean value is determined, and the total number of samples within a confidence interval of a preset percentage (such as 90%) within a given standard deviation sigma is determined. In this way, the initial point of each attribute value can be calculated as the starting coordinate of the cluster, and it can be checked whether hard disks of the same specification differ excessively in their characteristics.
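The screening described above might be sketched as follows for a single attribute. This is a simplified stand-in, not the patented procedure: it computes the initial mean and standard deviation and keeps the disks inside a band of `k_sigma` standard deviations, then checks that the kept fraction reaches the preset percentage. It does not run a formal normality test, and the names `screen_initial_samples`, `k_sigma`, and `min_fraction` are hypothetical.

```python
import statistics


def screen_initial_samples(values, k_sigma=2.0, min_fraction=0.90):
    """Keep disks whose attribute value lies within mean +/- k_sigma * sigma."""
    mean = statistics.fmean(values.values())
    sigma = statistics.pstdev(values.values())
    low, high = mean - k_sigma * sigma, mean + k_sigma * sigma
    kept = {disk: v for disk, v in values.items() if low <= v <= high}
    fraction = len(kept) / len(values)
    # The batch is accepted only if enough disks fall inside the band.
    return mean, sigma, kept, fraction >= min_fraction


# Illustrative initial values of one attribute for ten disks (made-up data).
initial = {f"d{i}": v for i, v in enumerate([100, 101, 99, 100, 102, 98, 100, 101, 99, 60])}
mean, sigma, kept, ok = screen_initial_samples(initial)
print(sorted(set(initial) - set(kept)), ok)
```

The mean of the kept disks could then serve as the starting coordinate of the cluster for that attribute.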
Further, step 13 includes: clustering key attribute information in the SMART information to obtain clustering parameters of the key attribute information, wherein the key attribute information is at least one of the performance attributes contained in the SMART information. A performance attribute is an attribute item that characterizes the operating performance of the hard disk. Specifically, the implementation of step 13 includes, but is not limited to, clustering at least one performance attribute in the SMART information to obtain the corresponding clustering parameters, so that the health degree of the hard disk is evaluated according to the at least one performance attribute. Further, the SMART information also includes at least one of the following attribute information: server type, service node SN, timestamp, RAID group to which the hard disk belongs, hard disk installation slot, hard disk interface protocol, and hard disk type.
Further, the clustering parameters determined by cluster analysis of the SMART information include, but are not limited to, at least one of: the number of cluster categories, the category center points, and singular sample points whose distance from the category center point meets a preset requirement.
The number of cluster categories describes the current differentiation trend of health degree among groups for a SMART attribute. For example, the original single category {a} splits into two categories {a1, a2}, reflecting that the physical characteristics of a subset of hard disks have changed in the same way. The more categories there are, the more pronounced the differentiation of the health degree of the sample hard disks from that of the whole population. Accordingly, step 14 includes: evaluating the health degree differentiation of the hard disks in the server cluster according to the change in the number of cluster categories in the clustering parameters.
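To make the change in the number of cluster categories concrete, the sketch below clusters one attribute at two points in time and compares the category counts. The text does not name a clustering algorithm, so a simple one-dimensional gap-based split is used here purely as an assumption; `cluster_1d`, `centers`, and `max_gap` are hypothetical names, and the sample values are made up.

```python
def cluster_1d(values, max_gap):
    """Split sorted values into clusters wherever neighbours differ by more than max_gap."""
    if not values:
        raise ValueError("no values to cluster")
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])  # gap too large: start a new category
        else:
            clusters[-1].append(v)
    return clusters


def centers(clusters):
    """Category center point of each cluster (arithmetic mean)."""
    return [sum(c) / len(c) for c in clusters]


earlier = [100, 101, 99, 100, 102, 98]
later = [99, 100, 98, 99, 90, 89, 91]

before = cluster_1d(earlier, max_gap=5)
after = cluster_1d(later, max_gap=5)
print(len(before), len(after), centers(after))
```

Here the single category {a} at the earlier time has split into two categories at the later time, which is the kind of differentiation the text describes.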
The category center point of a cluster category, written here as $\bar{X}_a$ for a category {a}, describes the current trend of the mean value of the SMART attribute, that is, the rate of change of the overall attribute of the cluster group. When the number of categories increases, the center point $\bar{X}_a$ of {a} becomes the center points $\bar{X}_{a1}$ and $\bar{X}_{a2}$ of {a1, a2}. The size relationship among these center points indicates whether the health degree of the hard disks has declined and how the decline is changing. Since a larger attribute value in the SMART information means better hard disk operating performance, when a first condition on the center points holds [formula image not reproduced], the operating characteristics of the hard disks show a declining health trend and at least one cluster group is declining at an accelerated rate. When a second condition holds [formula image not reproduced], one cluster group of hard disks shows a declining health trend and that cluster group is declining at an accelerated rate. If the number of cluster categories is unchanged, a quantity derived from the center point [formula image not reproduced] can be used to describe the rate of change of the health degree.
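The use of category center points can be sketched as follows. Because the exact conditions appear only as formula images in the source, the two rules below are assumptions chosen to match the prose: with "larger is better" attribute values, a sub-category whose center lies below the parent center is treated as declining, and when the number of categories is unchanged the rate of change is taken as the difference of centers divided by the elapsed time. All function names are hypothetical.

```python
def center_change_rate(prev_center, curr_center, dt):
    """Rate of change of a category center between two sampling timestamps."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return (curr_center - prev_center) / dt


def declining_children(parent_center, child_centers):
    """Sub-category centers that lie below the center of the category they split from."""
    return [c for c in child_centers if c < parent_center]


# Unchanged number of categories: the center drops from 100 to 97 over 3 intervals.
rate = center_change_rate(100.0, 97.0, 3.0)

# Category {a} with center 100 splits into {a1, a2} with centers 90 and 99.
both = declining_children(100.0, [90.0, 99.0])
# With a lower parent center, only one sub-category is below it.
one = declining_children(95.0, [90.0, 99.0])
print(rate, both, one)
```

A negative rate, or any non-empty list of declining sub-categories, would flag the attribute for closer inspection under these assumed rules.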
A singular sample point Xmax is a sample point whose distance from the category center point meets a preset requirement, such as the sample point farthest from the category center point or a sample point whose distance from the category center point exceeds a preset distance. If a first condition on Xmax holds [formula image not reproduced], the overall health trend is downward. If a second condition holds [formula image not reproduced], the attribute value of the singular sample point deviates seriously and the hard disk may be faulty. To further predict whether the singular sample point is faulty, it can be evaluated against the factory reference threshold VALUE to decide whether it should be defined as a suspected potential fault value.
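A minimal sketch of singular sample point detection is given below. It covers both variants named in the text (the farthest point and points beyond a preset distance) and then compares the flagged values with a vendor threshold. Treating a value at or below the threshold as a suspected fault is an assumption based on the "larger is better" convention stated earlier; the disk names, values, and threshold are made up.

```python
def singular_points(cluster, max_distance):
    """Return the center, the farthest disk, and the disks beyond max_distance."""
    center = sum(cluster.values()) / len(cluster)
    farthest = max(cluster, key=lambda d: abs(cluster[d] - center))
    beyond = {d: v for d, v in cluster.items() if abs(v - center) > max_distance}
    return center, farthest, beyond


def suspected_faults(points, vendor_threshold):
    """Disks whose attribute value is at or below the vendor reference threshold."""
    return [d for d, v in points.items() if v <= vendor_threshold]


cluster = {"d1": 100, "d2": 99, "d3": 101, "d4": 100, "d5": 40}
center, farthest, beyond = singular_points(cluster, max_distance=20)
suspects = suspected_faults(beyond, vendor_threshold=51)
print(center, farthest, suspects)
```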
In addition, in the embodiment of the present invention, step 14 further includes: acquiring physical environment monitoring data corresponding to the hard disk sample data in the sampling interval. The physical environment monitoring data represents the operating environment of the hard disk and includes at least one of: the inlet air temperature, humidity, noise, and on-board voltage of the server, and the hard disk input/output (IO) throughput read at the operating system (OS) layer.
After the steps of acquiring the SMART information of the hard disk sample data in the sampling interval and acquiring the corresponding physical environment monitoring data in the sampling interval, the method further includes: constructing a PCA algorithm model according to the SMART information of the hard disk sample data and the physical environment monitoring data; and calculating the principal component scores of the SMART information using the PCA algorithm model to determine the causes affecting the health degree of the hard disk.
Specifically, the physical environment monitoring data closest to the timestamp of the sampling interval is used, and it may be obtained through, but is not limited to, the server's IPMI. The step of constructing the PCA algorithm model based on the SMART information of the hard disk sample data and the corresponding physical environment monitoring data may include:
1. Standardize (also called center) the SMART information of the hard disk sample data.
2. Compute the correlation coefficient matrix.
3. Compute the eigenvalues and eigenvectors of the SMART information.
5. Analyze the principal component loadings of the SMART information.
6. Compute the principal component scores from the principal component loadings.
Finally, the scores are calculated through principal component analysis, attributes with low scores are removed to reduce dimensionality, and the main factors affecting the hard disk attribute values are then determined. Based on these influencing factors, the physical environment can be optimized or the software-level read-write strategy adjusted, which slows the health decline of some hard disks, balances hard disk life cycles, and keeps the hard disks in a highly available state.
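The listed PCA steps can be illustrated with a small pure-Python sketch. It is deliberately limited: only the leading principal component is extracted, using power iteration as a stand-in for a full eigendecomposition, and the three columns (a SMART attribute value, inlet air temperature, IO throughput) hold made-up numbers. All function names are hypothetical.

```python
import math


def standardize(rows):
    """Step 1: z-score each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    if any(s == 0 for s in stds):
        raise ValueError("a constant column cannot be standardized")
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def correlation_matrix(z):
    """Step 2: correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Step 3 (leading pair only): power iteration on a symmetric matrix."""
    # A non-uniform start avoids the most obvious degenerate starting vectors.
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigval = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigval, vec


# Columns: SMART attribute value, inlet air temperature, IO throughput (illustrative).
rows = [
    (100, 20, 50), (98, 22, 55), (95, 25, 52),
    (90, 30, 51), (85, 34, 54), (80, 38, 53),
]
z = standardize(rows)
eigval, eigvec = leading_eigenpair(correlation_matrix(z))
# Step 5: loadings of the leading component on each variable.
loadings = [v * math.sqrt(eigval) for v in eigvec]
# Step 6: score of each sample on the leading component.
scores = [sum(a * b for a, b in zip(row, eigvec)) for row in z]
print([round(x, 2) for x in loadings])
```

In this toy data the attribute value and the inlet temperature load strongly and with opposite signs on the leading component, while IO throughput loads weakly, which is the kind of evidence that would point to temperature as an influencing factor.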
In the hard disk health degree evaluation method provided by the embodiment of the invention, the server cluster is initialized to determine the hard disk sample data which can be used as a sample, then the SMART information of the determined hard disk sample data is clustered to obtain the clustering parameter, and the hard disk health degree of the hard disks in the server cluster is evaluated according to the clustering parameter, so that the normal hard disks are used as the sample, the number of the samples is large, the overall health degree of the hard disks in the server cluster can be accurately evaluated, and the hard disks which possibly have faults can be predicted. Furthermore, the embodiment of the invention can also utilize the determined hard disk sample data and the physical environment monitoring data to construct a PCA algorithm model, and can determine the reasons influencing the health degree of the hard disk based on the PCA algorithm model, so that the deployment strategy of the hard disk can be optimized based on the influence reasons.
The above embodiments are described with respect to the hard disk health degree evaluation method of the present invention, and the corresponding apparatuses will be further described with reference to the accompanying drawings.
Specifically, as shown in fig. 2, the hard disk health assessment apparatus according to the embodiment of the present invention includes:
an initialization module 210, configured to initialize a server cluster configured with a hard disk, and determine hard disk sample data of the server cluster;
a first obtaining module 220, configured to obtain detection attribute SMART information of hard disk sample data in a sampling interval;
a clustering module 230, configured to cluster SMART information and determine corresponding clustering parameters;
and the evaluation module 240 is configured to evaluate the health degree of the hard disk of the server cluster according to the clustering parameter.
The initialization module 210 includes:
the first acquisition submodule is used for acquiring initial SMART information of a hard disk in the server cluster;
and the determining submodule is used for performing normal distribution detection on the initial SMART information and determining the hard disk sample data with a preset percentage confidence interval in a preset standard deviation.
Wherein, the clustering module 230 comprises:
the clustering submodule is used for clustering key attribute information in the SMART information to obtain clustering parameters of the key attribute information; wherein the key attribute information is at least one of performance attributes contained in the SMART information.
Wherein the SMART information further comprises at least one of the following attribute information: server type, service node SN, timestamp, RAID group to which the hard disk belongs, hard disk installation slot, hard disk interface protocol, and hard disk type.
Wherein the clustering parameters include at least one of: the number of cluster categories, the category center points, and singular sample points whose distance from the category center point meets a preset requirement.
Wherein the evaluation module 240 comprises at least one of:
the first evaluation submodule is used for evaluating the health degree differentiation condition of the hard disks in the server cluster according to the change in the number of cluster categories in the clustering parameters;
the second evaluation submodule is used for evaluating the health degree reduction rate of the hard disks in the server cluster according to the change condition of the category center points of different clustering categories;
and the prediction submodule is used for predicting the fault hard disk in the server cluster according to the singular sample points.
Wherein, the device still includes:
the second acquisition module is used for acquiring physical environment monitoring data corresponding to the hard disk sample data in a sampling interval;
the building module is used for building a PCA algorithm model according to SMART information of hard disk sample data and physical environment monitoring data;
and the determining module is used for calculating the principal component score of the SMART information by utilizing a PCA algorithm model and determining the reason influencing the health degree of the hard disk.
Wherein the physical environment monitoring data comprises at least one of: the inlet air temperature, humidity, noise, and on-board voltage of the server, and the hard disk input/output (IO) throughput read at the operating system (OS) layer.
The embodiment of the device of the invention is corresponding to the embodiment of the method, all the implementation means in the embodiment of the method are suitable for the embodiment of the device, and the same technical effect can be achieved. The device initializes the server cluster to determine the hard disk sample data which can be used as a sample, clusters the SMART information of the determined hard disk sample data to obtain a clustering parameter, and evaluates the health degree of the hard disks in the server cluster according to the clustering parameter, so that the normal hard disks are used as the sample, the number of samples is large, the overall health degree of the hard disks in the server cluster can be accurately evaluated, and the hard disks which possibly have faults can be predicted. Furthermore, the embodiment of the invention can also utilize the determined hard disk sample data and the physical environment monitoring data to construct a PCA algorithm model, and can determine the reasons influencing the health degree of the hard disk based on the PCA algorithm model, so that the deployment strategy of the hard disk can be optimized based on the influence reasons.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above hard disk health degree assessment method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The objects of the invention may therefore also be achieved merely by providing a program product containing program code that implements the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. The storage medium may be any known storage medium or any storage medium developed in the future.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A hard disk health assessment method, characterized by comprising the following steps:
initializing a server cluster configured with a hard disk, and determining hard disk sample data of the server cluster;
acquiring detection attribute SMART information of the hard disk sample data within a sampling interval;
clustering the SMART information, and determining corresponding clustering parameters;
and evaluating the health degree of the hard disk of the server cluster according to the clustering parameters.
2. The method according to claim 1, wherein initializing a server cluster configured with a hard disk, and determining hard disk sample data of the server cluster comprises:
acquiring initial SMART information of a hard disk in the server cluster;
and performing normal distribution detection on the initial SMART information, and determining, as the hard disk sample data, the data that lies within a confidence interval of a preset percentage within a preset standard deviation.
3. The hard disk health assessment method according to claim 1, wherein the step of clustering the SMART information and determining the corresponding clustering parameters comprises:
clustering key attribute information in the SMART information to obtain clustering parameters of the key attribute information; wherein the key attribute information is at least one of performance attributes contained in the SMART information.
4. The hard disk health assessment method according to claim 3, wherein the SMART information further comprises at least one of the following attribute information: server type, service node SN, timestamp, RAID to which the hard disk belongs, hard disk installation slot, hard disk interface protocol, and hard disk type.
5. The hard disk health assessment method according to claim 1, wherein the clustering parameters comprise at least one of: the number of clustering categories, the category center points, and singular sample points whose distance from a category center point meets a preset requirement.
6. The method according to claim 5, wherein the step of evaluating the health of the hard disk of the server cluster according to the clustering parameter comprises at least one of:
evaluating the health degree differentiation of the hard disks in the server cluster according to the change in the number of clustering categories in the clustering parameters;
evaluating the health degree reduction rate of the hard disks in the server cluster according to the change condition of the category center points of different clustering categories;
and predicting the fault hard disk in the server cluster according to the singular sample points.
7. The method according to claim 1, wherein before the step of evaluating the health of the hard disks of the server cluster according to the clustering parameters, the method further comprises:
acquiring physical environment monitoring data corresponding to the hard disk sample data in the sampling interval;
after the step of obtaining the detection attribute SMART information of the hard disk sample data in the sampling interval and the step of obtaining the physical environment monitoring data corresponding to the hard disk sample data in the sampling interval, the method further includes:
according to the SMART information of the hard disk sample data and the physical environment monitoring data, a PCA algorithm model is constructed;
and calculating the principal component score of the SMART information by utilizing the PCA algorithm model, and determining the reason influencing the health degree of the hard disk.
8. The hard disk health assessment method of claim 7, wherein the physical environment monitoring data comprises at least one of: the inlet air temperature, the humidity, the noise, the onboard voltage, and the input/output (IO) throughput of the hard disk as read at the operating system (OS) layer running on the server.
9. A hard disk health assessment apparatus, comprising:
the initialization module is used for initializing a server cluster configured with a hard disk and determining hard disk sample data of the server cluster;
the first acquisition module is used for acquiring detection attribute SMART information of the hard disk sample data within a sampling interval;
the clustering module is used for clustering the SMART information and determining corresponding clustering parameters;
and the evaluation module is used for evaluating the health degree of the hard disk of the server cluster according to the clustering parameters.
10. The hard disk health assessment device of claim 9, wherein said initialization module comprises:
the first obtaining submodule is used for obtaining initial SMART information of a hard disk in the server cluster;
and the determining submodule is used for performing normal distribution detection on the initial SMART information and determining, as the hard disk sample data, the data that lies within a confidence interval of a preset percentage within a preset standard deviation.
11. The hard disk health assessment apparatus according to claim 9, wherein the clustering module comprises:
the clustering submodule is used for clustering key attribute information in the SMART information to obtain clustering parameters of the key attribute information; wherein the key attribute information is at least one of performance attributes contained in the SMART information.
12. The hard disk health assessment apparatus according to claim 11, wherein the SMART information further comprises at least one of the following attribute information: server type, service node SN, timestamp, RAID to which the hard disk belongs, hard disk installation slot, hard disk interface protocol, and hard disk type.
13. The hard disk health assessment apparatus according to claim 9, wherein the clustering parameters comprise at least one of: the number of clustering categories, the category center points, and singular sample points whose distance from a category center point meets a preset requirement.
14. The hard disk health assessment apparatus of claim 13, wherein said assessment module comprises at least one of:
the first evaluation submodule is used for evaluating the health degree differentiation of the hard disks in the server cluster according to the change in the number of clustering categories in the clustering parameters;
the second evaluation submodule is used for evaluating the health degree reduction rate of the hard disks in the server cluster according to the change condition of the category center points of different clustering categories;
and the prediction submodule is used for predicting the fault hard disk in the server cluster according to the singular sample points.
15. The hard disk health assessment apparatus according to claim 9, further comprising:
the second acquisition module is used for acquiring physical environment monitoring data corresponding to the hard disk sample data in the sampling interval;
the building module is used for building a PCA algorithm model according to the SMART information of the hard disk sample data and the physical environment monitoring data;
and the determining module is used for calculating the principal component score of the SMART information by utilizing the PCA algorithm model and determining reasons influencing the health degree of the hard disk.
16. The hard disk health assessment device of claim 15, wherein said physical environment monitoring data comprises at least one of: the inlet air temperature, the humidity, the noise, the onboard voltage, and the input/output (IO) throughput of the hard disk as read at the operating system (OS) layer running on the server.
17. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the hard disk health assessment method according to any one of claims 1 to 8.
CN201910000902.8A 2019-01-02 2019-01-02 Hard disk health degree assessment method and device Active CN111400122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910000902.8A CN111400122B (en) 2019-01-02 2019-01-02 Hard disk health degree assessment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910000902.8A CN111400122B (en) 2019-01-02 2019-01-02 Hard disk health degree assessment method and device

Publications (2)

Publication Number Publication Date
CN111400122A (en) 2020-07-10
CN111400122B (en) 2023-11-10

Family

ID=71435886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910000902.8A Active CN111400122B (en) 2019-01-02 2019-01-02 Hard disk health degree assessment method and device

Country Status (1)

Country Link
CN (1) CN111400122B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150293992A1 (en) * 2011-01-03 2015-10-15 Stephen W. Meehan Cluster processing and ranking methods including methods applicable to cluster developed through density based merging
US20150277797A1 (en) * 2014-03-31 2015-10-01 Emc Corporation Monitoring health condition of a hard disk
CN108874640A (en) * 2018-05-07 2018-11-23 北京京东尚科信息技术有限公司 A kind of appraisal procedure and device of clustering performance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王勇; 王李福; 饶勤菲; 邹辉: "K-medoids clustering algorithm with radius-adaptive initial center point selection" (半径自适应的初始中心点选择K-medoids聚类算法) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306747A (en) * 2020-09-29 2021-02-02 新华三技术有限公司合肥分公司 RAID card fault processing method and device
CN112306747B (en) * 2020-09-29 2023-04-11 新华三技术有限公司合肥分公司 RAID card fault processing method and device
CN113311923A (en) * 2021-04-21 2021-08-27 北京字节跳动网络技术有限公司 Temperature control method, device and equipment
CN113311923B (en) * 2021-04-21 2023-12-01 北京字节跳动网络技术有限公司 Temperature control method, device and equipment

Also Published As

Publication number Publication date
CN111400122B (en) 2023-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant