CN113176962B - Computer room IT equipment fault accurate detection method and system for data center - Google Patents

Info

Publication number
CN113176962B
Authority
CN
China
Prior art keywords
target
performance data
equipment
preset
fault type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110400918.5A
Other languages
Chinese (zh)
Other versions
CN113176962A (en
Inventor
赵希峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongda Kehui Technology Development Co ltd
Original Assignee
Beijing Zhongda Kehui Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongda Kehui Technology Development Co ltd
Priority to CN202110400918.5A
Publication of CN113176962A
Application granted
Publication of CN113176962B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a system for accurately detecting faults of computer room IT equipment for a data center, wherein the method comprises the following steps: acquiring performance index information of target IT equipment according to a target period, obtaining a periodic performance data sequence, carrying out normalization processing on the periodic performance data sequence, clustering the processed periodic performance data according to preset fault types, obtaining a clustering result, calculating a target abnormal value score of a periodic performance data subsequence corresponding to each preset fault type in the clustering result, and judging whether the target IT equipment has faults and specific fault information according to the target abnormal value score corresponding to each preset fault type. Whether the target IT equipment fails or not and the specific failure information when the target IT equipment fails can be intelligently determined according to the calculated scores, manual fault detection is not needed one by one, the specific failure of the target IT equipment is rapidly and accurately determined, the labor cost is saved, and maintenance personnel can rapidly conduct subsequent maintenance.

Description

Computer room IT equipment fault accurate detection method and system for data center
Technical Field
The invention relates to the technical field of equipment management, in particular to a method and a system for accurately detecting faults of computer room IT equipment for a data center.
Background
With the rapid development of the information society, modern information technology and automation equipment are growing explosively, data centers keep expanding in scale, and the requirements on their safety and stability keep rising. The machine room is a core part of the data center, so detecting and eliminating faults in it is critical. Hidden equipment hazards in a machine room arise mainly from prolonged powered-on operation, equipment aging, and manual misoperation of electrical equipment such as network equipment, storage equipment, and servers. If a fault of asset equipment caused by these conditions does not raise a timely alarm that points out the abnormal condition in machine room operation, the information data is exposed to potential safety hazards.
The existing fault detection method for machine room IT equipment collects the working parameters of the equipment and then checks for faults manually, one by one, against the collected parameters. This seriously wastes labor cost and greatly reduces subsequent maintenance efficiency, so periodic maintenance of machine room equipment is not carried out properly and the service life of the equipment is shortened.
Disclosure of Invention
In view of the above problems, the invention provides a method and a system for accurately detecting faults of machine room IT equipment for a data center. They address the problems described in the background: checking for faults manually, one by one, against collected working parameters seriously wastes labor cost and greatly reduces subsequent maintenance efficiency, so that periodic maintenance of machine room equipment is not carried out properly and the service life of the equipment is shortened.
A machine room IT equipment fault accurate detection method for a data center comprises the following steps:
Acquiring performance index information of target IT equipment according to a target period to obtain a periodic performance data sequence;
normalizing the periodic performance data sequence;
clustering the processed periodic performance data according to a preset fault type to obtain a clustering result;
calculating a target abnormal value score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
And judging whether the target IT equipment fails or not and specific failure information when the failure occurs according to the target abnormal value score corresponding to each preset failure type.
Preferably, the collecting performance index information of the target IT device according to the target period to obtain a periodic performance data sequence includes:
Determining a used time length of the target IT device;
determining a performance detection period of target IT equipment according to the used time length, and determining the performance detection period as the target period;
Acquiring working parameters of the target IT equipment according to the target period;
combining target working parameters corresponding to each performance index with each other;
After the combination is finished, sub-performance index values of each performance index under different dimensions are collected;
generating a performance data subsequence of each performance index according to the sub-performance index values of each performance index in different dimensions;
and generating a periodic performance data sequence of the target IT equipment according to the performance data subsequences of all the performance indexes.
Preferably, the normalizing the periodic performance data sequence includes: and carrying out maximum and minimum normalization processing on the periodic performance data sequence.
Preferably, the clustering the processed periodic performance data according to the preset fault type to obtain a clustering result includes:
Determining a target performance index associated with each preset fault type;
Obtaining target performance data corresponding to the target performance index from the periodic performance data;
and clustering the processed periodic performance data according to the target performance data corresponding to each preset fault type to obtain a clustering result.
Preferably, the calculating the target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result includes:
Determining a working function level value of the target IT device;
Confirming whether the working function level value is greater than or equal to a preset threshold value, if so, confirming that the accuracy of the abnormal value score of the periodic performance data subsequence in each clustering result is 100, otherwise, confirming that the accuracy of the abnormal value score of the periodic performance data subsequence in each clustering result is 90;
Dividing and calculating according to the first subsequence value of the periodic performance data subsequence corresponding to each preset fault type and the second subsequence value of the performance index of the target IT equipment in the normal working state to obtain the first abnormal value fraction of the periodic performance data subsequence corresponding to each preset fault type;
when the accuracy is confirmed to be 100, confirming the first abnormal value score of the periodic performance data subsequence corresponding to each preset fault type as a target abnormal value score of the periodic performance data subsequence in the clustering result;
And when the accuracy is confirmed to be 90, multiplying the first outlier score of the periodic performance data subsequence corresponding to each preset fault type by a preset proportion to obtain a second outlier score of the periodic performance data subsequence corresponding to each preset fault type, and confirming the second outlier score of the periodic performance data subsequence corresponding to each preset fault type as a target outlier score of the periodic performance data subsequence corresponding to the preset fault type.
Preferably, the determining whether the target IT device fails or not and the specific failure information when the failure occurs according to the target abnormal value score corresponding to each preset failure type includes:
confirming whether the target abnormal value score corresponding to each preset fault type is larger than or equal to the preset abnormal value score corresponding to the preset fault type, if so, confirming that the target IT equipment has no fault, otherwise, confirming that the target IT equipment has fault;
And determining a target fault type with the target abnormal value score smaller than the preset abnormal value score, and determining specific fault information of the target IT equipment according to the target fault type and the performance index information of the target IT equipment.
Preferably, the method further comprises:
Generating a fault code of the target IT equipment according to the specific fault information of the target IT equipment;
Searching a fault point corresponding to the fault code, positioning the fault point, and obtaining an electronic positioning result;
inquiring a fault solution corresponding to the fault point;
And displaying the fault solution and the electronic positioning result.
Preferably, after the processed periodic performance data are clustered according to the preset fault type to obtain the clustering result, the method further includes:
Acquiring a target business operation related to the clustering result, acquiring a business input instruction related to the target business operation, and calling first data A_1 and second data A_2 related to the business input instruction from an equipment database and a management database;
Determining a first capacity of the first data A_1, determining a second capacity of the second data A_2, judging whether the first capacity and the second capacity are empty, and if yes, performing invalid feedback to the sub-component of the target IT device;
otherwise, determining a first code χ_1 of the first data A_1 and performing a first classification process;
Meanwhile, determining a second code χ_2 of the second data A_2, and performing second classification processing;
Wherein i = 1, 2, and the classification uses the following quantities: the cumulative product of the index values χ_ij over the different feature indexes j = 1, 2, ..., n in the i-th data; the feature code value r_{i,j+1} of the (j+1)-th feature index in the i-th data; the feature code value r_ij of the j-th feature index in the i-th data; the number k1 of feature coding sequences of the j-th feature index in the i-th data; the sequence value of the k-th feature coding sequence in the j-th feature index in the i-th data; the sequence weight α_k of the k-th feature coding sequence in the j-th feature index in the i-th data; and the number n of feature indexes;
And according to the classification areas corresponding to the first classification result S_1 and the second classification result S_2, valid feedback is given to the corresponding sub-components in the target IT equipment, and the corresponding feedback information is sent for display.
A machine room IT equipment fault accurate detection system for a data center, the system comprising:
The acquisition module is used for acquiring the performance index information of the target IT equipment according to the target period to obtain a periodic performance data sequence;
the processing module is used for carrying out normalization processing on the periodic performance data sequence;
The clustering module is used for clustering the processed periodic performance data according to a preset fault type to obtain a clustering result;
the calculation module is used for calculating the target abnormal value score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
And the judging module is used for judging whether the target IT equipment fails or not and specific failure information when the failure occurs according to the target abnormal value score corresponding to each preset failure type.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a workflow diagram of a method for accurately detecting faults of computer room IT equipment in a data center;
FIG. 2 is another workflow diagram of a method for accurately detecting faults of machine room IT equipment in a data center according to the present invention;
FIG. 3 is a further workflow diagram of a method for accurately detecting a failure of a machine room IT device in a data center according to the present invention;
FIG. 4 is a schematic structural diagram of a system for accurately detecting faults of computer room IT equipment for a data center according to the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
With the rapid development of the information society, modern information technology and automation equipment are growing explosively, data centers keep expanding in scale, and the requirements on their safety and stability keep rising. The machine room is a core part of the data center, so detecting and eliminating faults in it is critical. Hidden equipment hazards in a machine room arise mainly from prolonged powered-on operation, equipment aging, and manual misoperation of electrical equipment such as network equipment, storage equipment, and servers. If a fault of asset equipment caused by these conditions does not raise a timely alarm that points out the abnormal condition in machine room operation, the information data is exposed to potential safety hazards.
The existing fault detection method for machine room IT equipment collects the working parameters of the equipment and then checks for faults manually, one by one, against the collected parameters. This seriously wastes labor cost and greatly reduces subsequent maintenance efficiency, so periodic maintenance of machine room equipment is not carried out properly and the service life of the equipment is shortened. To solve these problems, the embodiment discloses a method for accurately detecting faults of machine room IT equipment for a data center.
A machine room IT equipment fault accurate detection method for a data center, as shown in FIG. 1, comprises the following steps:
step S101, acquiring performance index information of target IT equipment according to a target period to obtain a periodic performance data sequence;
Step S102, carrying out normalization processing on the periodic performance data sequence;
step S103, clustering the processed periodic performance data according to a preset fault type to obtain a clustering result;
step S104, calculating a target abnormal value score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
and step S105, judging whether the target IT equipment fails or not and specific failure information when the failure occurs according to the target abnormal value score corresponding to each preset failure type.
The working principle of the technical scheme is as follows: acquiring performance index information of target IT equipment according to a target period, obtaining a periodic performance data sequence, carrying out normalization processing on the periodic performance data sequence, clustering the processed periodic performance data according to preset fault types, obtaining a clustering result, calculating a target abnormal value score of a periodic performance data subsequence corresponding to each preset fault type in the clustering result, and judging whether the target IT equipment has faults or not and specific fault information when the faults occur according to the target abnormal value score corresponding to each preset fault type.
The beneficial effects of the technical scheme are as follows: by clustering the periodic performance data sequence and calculating the target abnormal value score corresponding to each preset fault type, whether the target IT equipment has failed, and the specific fault information when it has, can be determined automatically from the calculated scores. No manual one-by-one fault checking is needed, the specific fault of the target IT equipment is determined quickly and accurately, labor cost is saved, maintenance staff can carry out subsequent maintenance quickly, and the service life of the target IT equipment is prolonged. This solves the problems in the prior art that manual one-by-one fault checking against collected working parameters seriously wastes labor cost and greatly reduces subsequent maintenance efficiency, so that periodic maintenance of machine room equipment is not carried out properly and the service life of the equipment is shortened.
In one embodiment, the collecting the performance index information of the target IT device according to the target period to obtain the periodic performance data sequence includes:
Determining a used time length of the target IT device;
determining a performance detection period of target IT equipment according to the used time length, and determining the performance detection period as the target period;
Acquiring working parameters of the target IT equipment according to the target period;
combining target working parameters corresponding to each performance index with each other;
After the combination is finished, sub-performance index values of each performance index under different dimensions are collected;
generating a performance data subsequence of each performance index according to the sub-performance index values of each performance index in different dimensions;
and generating a periodic performance data sequence of the target IT equipment according to the performance data subsequences of all the performance indexes.
The beneficial effects of the technical scheme are as follows: the periodic performance data sequence of the target IT equipment is more accurate and practical, which provides a data basis for the subsequent fault judging process. In addition, because the detection period is set according to the used time length of the target IT equipment, its length can be chosen flexibly, so that the target IT equipment can be checked for faults frequently and its service life is further prolonged.
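The period selection and sequence assembly described in this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the age thresholds, the period lengths, the assumption that older equipment is checked more often, and the names `target_period_hours` and `build_periodic_sequence` are all hypothetical, since the text gives no concrete values.

```python
# Hypothetical mapping from device age (years) to detection period (hours).
# The patent gives no numbers; these thresholds are illustrative only.
AGE_TO_PERIOD_HOURS = [
    (1.0, 24.0),  # in service for under 1 year: check daily
    (3.0, 12.0),  # 1 to 3 years: check twice a day
    (5.0, 6.0),   # 3 to 5 years: check every 6 hours
]
DEFAULT_PERIOD_HOURS = 1.0  # older devices: check hourly


def target_period_hours(used_years: float) -> float:
    """Pick a detection period that shrinks as the device ages (assumed policy)."""
    for max_years, period in AGE_TO_PERIOD_HOURS:
        if used_years < max_years:
            return period
    return DEFAULT_PERIOD_HOURS


def build_periodic_sequence(
    samples: dict[str, dict[str, float]],
) -> dict[str, list[float]]:
    """Turn {performance index: {dimension: value}} into one subsequence per index.

    Dimensions are sorted by name so that the subsequence order is stable.
    """
    return {
        index: [dims[name] for name in sorted(dims)]
        for index, dims in samples.items()
    }
```

The returned dictionary plays the role of the periodic performance data sequence, with one performance data subsequence per performance index.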
In one embodiment, the normalizing the periodic performance data sequence includes: and carrying out maximum and minimum normalization processing on the periodic performance data sequence.
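As a minimal sketch of maximum and minimum (min-max) normalization, each value can be rescaled to the range 0 to 1 using the smallest and largest values of its subsequence. The function name `min_max_normalize` and the handling of empty or constant subsequences are assumptions made for illustration; the text does not specify them.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to [0, 1] with x' = (x - min) / (max - min)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant subsequence: the formula would divide by zero, so map
        # every value to 0.0 (an assumed convention, not from the source).
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]
```

Applying the function to each performance data subsequence separately keeps indexes with different units, such as temperature and latency, on a comparable scale.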
In one embodiment, as shown in fig. 2, the clustering the processed periodic performance data according to the preset fault type to obtain a clustering result includes:
Step S201, determining a target performance index related to each preset fault type;
step S202, obtaining target performance data corresponding to the target performance index from the periodic performance data;
step S203, clustering the processed periodic performance data according to the target performance data corresponding to each preset fault type to obtain a clustering result.
The beneficial effects of the technical scheme are as follows: the performance data corresponding to each preset fault type in the periodic performance data can be quickly and intuitively determined by classifying the performance data, so that the classification process is simpler and more convenient.
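Steps S201 to S203 can be read as grouping the normalized data by the performance indexes associated with each preset fault type. The sketch below implements that reading as a plain dictionary grouping rather than a statistical clustering algorithm; the fault types, the index names, and the function `group_by_fault_type` are hypothetical examples, not taken from the source.

```python
# Hypothetical association between preset fault types and performance indexes.
FAULT_TYPE_INDEXES: dict[str, list[str]] = {
    "overheating": ["cpu_temp", "fan_speed"],
    "disk_failure": ["disk_io_errors", "disk_latency"],
}


def group_by_fault_type(
    periodic_data: dict[str, list[float]],
    fault_type_indexes: dict[str, list[str]] = FAULT_TYPE_INDEXES,
) -> dict[str, dict[str, list[float]]]:
    """Collect, for each preset fault type, the subsequences of its indexes.

    Indexes that were not collected in this period are skipped.
    """
    clusters: dict[str, dict[str, list[float]]] = {}
    for fault_type, indexes in fault_type_indexes.items():
        clusters[fault_type] = {
            name: periodic_data[name]
            for name in indexes
            if name in periodic_data
        }
    return clusters
```

Each entry of the result is the periodic performance data subsequence for one preset fault type, which is the unit the later scoring step works on.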
In one embodiment, the calculating the target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result includes:
Determining a working function level value of the target IT device;
Confirming whether the working function level value is greater than or equal to a preset threshold value, if so, confirming that the accuracy of the abnormal value score of the periodic performance data subsequence in each clustering result is 100, otherwise, confirming that the accuracy of the abnormal value score of the periodic performance data subsequence in each clustering result is 90;
Dividing and calculating according to the first subsequence value of the periodic performance data subsequence corresponding to each preset fault type and the second subsequence value of the performance index of the target IT equipment in the normal working state to obtain the first abnormal value fraction of the periodic performance data subsequence corresponding to each preset fault type;
when the accuracy is confirmed to be 100, confirming the first abnormal value score of the periodic performance data subsequence corresponding to each preset fault type as a target abnormal value score of the periodic performance data subsequence in the clustering result;
And when the accuracy is confirmed to be 90, multiplying the first outlier score of the periodic performance data subsequence corresponding to each preset fault type by a preset proportion to obtain a second outlier score of the periodic performance data subsequence corresponding to each preset fault type, and confirming the second outlier score of the periodic performance data subsequence corresponding to each preset fault type as a target outlier score of the periodic performance data subsequence corresponding to the preset fault type.
The beneficial effects of the technical scheme are as follows: by adjusting the calculated outlier score according to the working function level value of the target IT device, it is possible to evaluate realistically whether the target IT device is faulty, taking the working capability of the device itself into account.
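The scoring rule of this embodiment can be sketched as below. Several points are assumptions: each subsequence is taken to be already reduced to a single value, the division is read as the first subsequence value divided by the normal-state value, and the function and parameter names are hypothetical. The text fixes only the accuracy values 100 and 90 and the multiplication by a preset proportion.

```python
def target_outlier_score(
    first_value: float,
    normal_value: float,
    function_level: float,
    level_threshold: float,
    preset_ratio: float,
) -> float:
    """Compute the target outlier score for one preset fault type.

    first_value:  value of the subsequence for this fault type
    normal_value: value of the same index in the normal working state
    """
    if normal_value == 0:
        raise ValueError("normal-state value must be non-zero")
    # First outlier score: ratio of observed value to normal-state value
    # (the direction of the division is an assumption).
    first_score = first_value / normal_value
    # Accuracy is 100 when the working function level reaches the threshold,
    # otherwise 90.
    accuracy = 100 if function_level >= level_threshold else 90
    if accuracy == 100:
        return first_score
    # Accuracy 90: scale the first score by the preset proportion.
    return first_score * preset_ratio
```

Keeping the accuracy as an explicit intermediate value mirrors the wording of the embodiment, even though only the two branches matter for the result.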
In one embodiment, the determining whether the target IT device fails and specific failure information when the failure occurs according to the target outlier score corresponding to each preset failure type includes:
confirming whether the target abnormal value score corresponding to each preset fault type is larger than or equal to the preset abnormal value score corresponding to the preset fault type, if so, confirming that the target IT equipment has no fault, otherwise, confirming that the target IT equipment has fault;
And determining a target fault type with the target abnormal value score smaller than the preset abnormal value score, and determining specific fault information of the target IT equipment according to the target fault type and the performance index information of the target IT equipment.
The beneficial effects of the technical scheme are as follows: comparing scores determines directly whether the target IT equipment has failed, which makes the result intuitive and improves judging efficiency.
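The comparison in this embodiment reduces to a per-fault-type threshold check: a fault type is flagged when its target outlier score is below the preset score. A minimal sketch, with the hypothetical name `judge_faults` and example fault types that are not from the source:

```python
def judge_faults(
    target_scores: dict[str, float],
    preset_scores: dict[str, float],
) -> list[str]:
    """Return the fault types whose target score is below the preset score.

    An empty list means that no fault was detected.
    """
    return [
        fault_type
        for fault_type, score in target_scores.items()
        if score < preset_scores[fault_type]
    ]
```

For example, `judge_faults({"overheating": 0.7}, {"overheating": 0.8})` flags `"overheating"`, and the flagged type is then combined with the collected performance index information to describe the specific fault.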
In one embodiment, as shown in fig. 3, the method further comprises:
step S301, generating a fault code of the target IT equipment according to the specific fault information of the target IT equipment;
Step S302, searching a fault point corresponding to the fault code, positioning the fault point and obtaining an electronic positioning result;
Step S303, inquiring a fault solution corresponding to the fault point;
And step S304, displaying the fault solution and the electronic positioning result.
The beneficial effects of the technical scheme are as follows: the maintenance personnel can quickly and accurately know the fault point of the fault of the target IT equipment and a specific maintenance scheme, and further can quickly maintain the target IT equipment so as to ensure the working efficiency and the service quality of the target IT equipment, and the experience of a user is further improved.
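Steps S301 to S304 amount to a chain of lookups from fault information to fault code, fault point, and solution. The sketch below uses in-memory dictionaries for illustration; the codes, fault points, solutions, and the function `fault_report` are all hypothetical, and a real system would presumably query a maintenance database instead.

```python
from typing import Optional

# Hypothetical lookup tables; none of these values come from the source.
FAULT_CODES = {"overheating": "E101", "disk_failure": "E202"}
FAULT_POINTS = {"E101": "cooling fan", "E202": "hard disk"}
FAULT_SOLUTIONS = {
    "cooling fan": "Replace the fan and clean the air duct.",
    "hard disk": "Swap the disk and rebuild the array.",
}


def fault_report(fault_type: str, location: str) -> Optional[dict[str, str]]:
    """Build the record shown to maintenance staff, or None if unknown."""
    code = FAULT_CODES.get(fault_type)  # S301: generate the fault code
    if code is None:
        return None
    point = FAULT_POINTS[code]  # S302: find the fault point
    return {
        "fault_code": code,
        "fault_point": point,
        "position": location,  # stands in for the electronic positioning result
        "solution": FAULT_SOLUTIONS[point],  # S303: query the solution
    }
```

Displaying the returned record corresponds to step S304.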
In one embodiment, after the processed periodic performance data are clustered according to the preset fault type to obtain the clustering result, the method further includes:
Acquiring a target business operation related to the clustering result, acquiring a business input instruction related to the target business operation, and calling first data A_1 and second data A_2 related to the business input instruction from an equipment database and a management database;
Determining a first capacity of the first data A_1, determining a second capacity of the second data A_2, judging whether the first capacity and the second capacity are empty, and if yes, performing invalid feedback to the sub-component of the target IT device;
otherwise, determining a first code χ_1 of the first data A_1 and performing a first classification process;
Meanwhile, determining a second code χ_2 of the second data A_2, and performing second classification processing;
Wherein i = 1, 2, and the classification uses the following quantities: the cumulative product of the index values χ_ij over the different feature indexes j = 1, 2, ..., n in the i-th data; the feature code value r_{i,j+1} of the (j+1)-th feature index in the i-th data; the feature code value r_ij of the j-th feature index in the i-th data; the number k1 of feature coding sequences of the j-th feature index in the i-th data; the sequence value of the k-th feature coding sequence in the j-th feature index in the i-th data; the sequence weight α_k of the k-th feature coding sequence in the j-th feature index in the i-th data; and the number n of feature indexes;
And according to the classification areas corresponding to the first classification result S_1 and the second classification result S_2, valid feedback is given to the corresponding sub-components in the target IT equipment, and the corresponding feedback information is sent for display.
The beneficial effects of the technical scheme are as follows: whether the classification of the clustering result is effective, and whether it has a reasonable effect on the target IT equipment, can be accurately confirmed. Furthermore, by giving valid or invalid feedback to the corresponding sub-components in the target IT equipment, whether the target IT equipment has a given fault can be accurately confirmed from the feedback result. This provides a second way of judging faults of the target IT equipment and improves the accuracy of fault judgment.
The embodiment also discloses a machine room IT equipment fault accurate detection system for a data center, as shown in FIG. 4, the system includes:
The acquisition module 401 is configured to acquire performance index information of the target IT device according to a target period, and obtain a periodic performance data sequence;
A processing module 402, configured to normalize the periodic performance data sequence;
The clustering module 403 is configured to cluster the processed periodic performance data according to a preset fault type, and obtain a clustering result;
a calculation module 404, configured to calculate a target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
And the judging module 405 is configured to judge whether the target IT device fails or not and specific failure information when the failure occurs according to the target outlier score corresponding to each preset failure type.
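The module structure of FIG. 4 can be sketched as a small class that chains five callables, one per module. This is an illustrative composition under assumed interfaces: the class name `FaultDetectionSystem`, the `PerfData` alias, and the callable signatures are hypothetical and are not specified by the source.

```python
from typing import Callable

# Assumed data shape: one list of values per performance index.
PerfData = dict[str, list[float]]


class FaultDetectionSystem:
    """Chains five callables that stand in for modules 401 to 405."""

    def __init__(
        self,
        acquire: Callable[[str], PerfData],
        normalize: Callable[[PerfData], PerfData],
        cluster: Callable[[PerfData], dict[str, PerfData]],
        score: Callable[[dict[str, PerfData]], dict[str, float]],
        judge: Callable[[dict[str, float]], list[str]],
    ) -> None:
        self.acquire = acquire      # acquisition module 401
        self.normalize = normalize  # processing module 402
        self.cluster = cluster      # clustering module 403
        self.score = score          # calculation module 404
        self.judge = judge          # judging module 405

    def detect(self, device_id: str) -> list[str]:
        """Run one detection pass and return the detected fault types."""
        raw = self.acquire(device_id)
        normalized = self.normalize(raw)
        clusters = self.cluster(normalized)
        scores = self.score(clusters)
        return self.judge(scores)
```

Passing the modules in as callables keeps each stage replaceable, for example when a different normalization or scoring rule is to be tried without touching the rest of the chain.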
The working principle and the beneficial effects of the above technical solution are described in the method claims, and are not repeated here.
It will be appreciated by those skilled in the art that the first and second aspects of the present invention refer to different phases of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A method for accurately detecting faults of machine room IT equipment in a data center, characterized by comprising the following steps:
acquiring performance index information of target IT equipment according to a target period to obtain a periodic performance data sequence;
normalizing the periodic performance data sequence;
clustering the processed periodic performance data according to preset fault types to obtain a clustering result;
calculating a target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
judging, according to the target outlier score corresponding to each preset fault type, whether the target IT equipment has failed and, if so, the specific fault information;
wherein calculating the target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result comprises:
determining a working function level value of the target IT equipment;
confirming whether the working function level value is greater than or equal to a preset threshold; if so, confirming that the accuracy of the outlier score of the periodic performance data subsequence in each clustering result is 100; otherwise, confirming that the accuracy of the outlier score of the periodic performance data subsequence in each clustering result is 90;
performing a division calculation on the first subsequence value of the periodic performance data subsequence corresponding to each preset fault type and the second subsequence value of the performance index of the target IT equipment in the normal working state to obtain a first outlier score of the periodic performance data subsequence corresponding to each preset fault type;
when the accuracy is confirmed to be 100, confirming the first outlier score of the periodic performance data subsequence corresponding to each preset fault type as the target outlier score of the periodic performance data subsequence in the clustering result;
and when the accuracy is confirmed to be 90, multiplying the first outlier score of the periodic performance data subsequence corresponding to each preset fault type by a preset proportion to obtain a second outlier score of the periodic performance data subsequence corresponding to each preset fault type, and confirming the second outlier score as the target outlier score of the periodic performance data subsequence corresponding to that preset fault type.
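The score calculation recited in claim 1 can be sketched as follows. This is an illustrative reading under stated assumptions: the function and parameter names are hypothetical, and the division is assumed to be the observed (first) subsequence value over the normal-state (second) value, since the claim does not state the order of the operands.

```python
def target_outlier_score(
    observed_value: float,     # first subsequence value (for one preset fault type)
    normal_value: float,       # second subsequence value (normal working state)
    function_level: float,     # working function level value of the equipment
    level_threshold: float,    # preset threshold
    preset_proportion: float,  # preset proportion applied when accuracy is 90
) -> float:
    """Return the target outlier score for one preset fault type."""
    if normal_value == 0:
        raise ValueError("normal-state value must be non-zero")

    # Assumed operand order: observed / normal.
    first_score = observed_value / normal_value

    # Accuracy is 100 when the level value reaches the threshold, otherwise 90.
    accuracy = 100 if function_level >= level_threshold else 90

    # At accuracy 100 the first score is used directly; at 90 it is scaled.
    return first_score if accuracy == 100 else first_score * preset_proportion
```

For example, with an observed value of 8.0 and a normal-state value of 10.0, the first score is 0.8; it is returned unchanged when the level value meets the threshold and multiplied by the preset proportion otherwise.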
2. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, wherein acquiring the performance index information of the target IT equipment according to the target period to obtain the periodic performance data sequence comprises:
determining the length of time for which the target IT equipment has been in use;
determining a performance detection period of the target IT equipment according to the length of time in use, and taking the performance detection period as the target period;
acquiring working parameters of the target IT equipment according to the target period;
combining the target working parameters corresponding to each performance index with one another;
after the combination is finished, collecting the sub-performance index values of each performance index in different dimensions;
generating a performance data subsequence of each performance index according to the sub-performance index values of that performance index in the different dimensions;
and generating the periodic performance data sequence of the target IT equipment according to the performance data subsequences of all the performance indexes.
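A minimal sketch of the two ends of claim 2 is given below: choosing the target period from the time in use, and assembling the periodic sequence from per-dimension values. The tier boundaries (1 and 3 years), the period lengths in hours, and all names are hypothetical, as is the assumption that older equipment is checked more often; the claim only states that the period depends on the time in use.

```python
from typing import Dict, List


def target_period_hours(used_years: float) -> int:
    """Map time in use to a detection period (hypothetical tiers)."""
    if used_years < 1:
        return 24
    if used_years < 3:
        return 12
    return 6


def build_periodic_sequence(
    sub_values: Dict[str, Dict[str, float]],
) -> Dict[str, List[float]]:
    """Turn {performance index: {dimension: value}} into one subsequence per index.

    Dimensions are ordered by name so that every period yields the same layout.
    """
    return {
        index: [dims[d] for d in sorted(dims)]
        for index, dims in sub_values.items()
    }
```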
3. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, wherein normalizing the periodic performance data sequence comprises: performing max-min normalization on the periodic performance data sequence.
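Max-min normalization rescales each value to the range [0, 1] using (x - min) / (max - min). A short sketch follows; the function name and the handling of empty or constant sequences are assumptions, since the claim does not address those cases.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant sequence maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, the sequence 2.0, 4.0, 6.0 has a minimum of 2.0 and a range of 4.0, so it maps to 0.0, 0.5, 1.0.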
4. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, wherein clustering the processed periodic performance data according to the preset fault types to obtain the clustering result comprises:
determining the target performance indexes associated with each preset fault type;
obtaining the target performance data corresponding to the target performance indexes from the periodic performance data;
and clustering the processed periodic performance data according to the target performance data corresponding to each preset fault type to obtain the clustering result.
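The claim does not name a clustering algorithm. As a simplified stand-in, the sketch below only partitions the normalized data by the performance indexes associated with each preset fault type; a real implementation might apply a clustering algorithm to each partition. The function name and the mapping `fault_indexes` are hypothetical.

```python
from typing import Dict, List

PerfData = Dict[str, List[float]]  # performance index name -> normalized values


def group_by_fault_type(
    periodic_data: PerfData,
    fault_indexes: Dict[str, List[str]],  # preset fault type -> associated indexes
) -> Dict[str, PerfData]:
    """Collect, per preset fault type, the data of its associated indexes."""
    clusters: Dict[str, PerfData] = {}
    for fault, indexes in fault_indexes.items():
        # Indexes with no data collected in this period are skipped.
        clusters[fault] = {
            name: periodic_data[name] for name in indexes if name in periodic_data
        }
    return clusters
```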
5. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, wherein judging, according to the target outlier score corresponding to each preset fault type, whether the target IT equipment has failed and, if so, the specific fault information comprises:
confirming whether the target outlier score corresponding to each preset fault type is greater than or equal to the preset outlier score corresponding to that preset fault type; if so, confirming that the target IT equipment has no fault; otherwise, confirming that the target IT equipment has a fault;
and determining the target fault type whose target outlier score is smaller than the preset outlier score, and determining the specific fault information of the target IT equipment according to the target fault type and the performance index information of the target IT equipment.
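The judgment in claim 5 is a per-fault-type threshold comparison in which a score below the preset value indicates a fault. A minimal sketch, with hypothetical names:

```python
from typing import Dict, List


def judge_faults(
    target_scores: Dict[str, float],  # preset fault type -> target outlier score
    preset_scores: Dict[str, float],  # preset fault type -> preset outlier score
) -> List[str]:
    """Return the fault types whose target score is below the preset score.

    An empty list means every score met its preset value, i.e. no fault.
    """
    return [
        fault
        for fault, score in target_scores.items()
        if score < preset_scores[fault]
    ]
```

The returned fault types would then be combined with the performance index information to produce the specific fault information.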
6. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, further comprising:
generating a fault code of the target IT equipment according to the specific fault information of the target IT equipment;
searching for the fault point corresponding to the fault code and locating the fault point to obtain an electronic positioning result;
querying the fault solution corresponding to the fault point;
and displaying the fault solution and the electronic positioning result.
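The lookup chain of claim 6 (fault information to fault code, fault code to fault point and solution) can be sketched with a simple table. The code format, the `FaultRecord` type, and the function names are all hypothetical; the claim does not specify how fault codes are encoded or stored.

```python
from typing import Dict, NamedTuple, Optional


class FaultRecord(NamedTuple):
    location: str  # electronic positioning result for the fault point
    solution: str  # fault solution associated with the fault point


def make_fault_code(device_id: str, fault_type: str) -> str:
    """Build a fault code from the device and fault type (assumed format)."""
    return f"{device_id}:{fault_type}".upper()


def resolve_fault(code: str, table: Dict[str, FaultRecord]) -> Optional[FaultRecord]:
    """Look up the fault point and its solution; None if the code is unknown."""
    return table.get(code)
```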
7. The method for accurately detecting faults of machine room IT equipment in a data center according to claim 1, wherein after clustering the processed periodic performance data according to the preset fault types to obtain the clustering result, the method further comprises:
acquiring a target business operation related to the clustering result, acquiring a business input instruction related to the target business operation, and retrieving first data A_1 and second data A_2 related to the business input instruction from an equipment database and a management database;
determining a first capacity of the first data A_1 and a second capacity of the second data A_2, and judging whether the first capacity and the second capacity are empty; if so, giving invalid feedback to the sub-components of the target IT equipment;
otherwise, determining a first code of the first data A_1 and performing first classification processing;
at the same time, determining a second code of the second data A_2 and performing second classification processing;
where i = 1, 2; the cumulative product is taken over the index values χ_ij of the different feature indexes j = 1, 2, ..., n in the i-th data; r_i(j+1) denotes the feature code value of the (j+1)-th feature index in the i-th data; r_ij denotes the feature code value of the j-th feature index in the i-th data; k1 denotes the number of feature coding sequences of the j-th feature index in the i-th data; the sequence value is that of the k-th feature coding sequence of the j-th feature index in the i-th data; α_k denotes the sequence weight of the k-th feature coding sequence of the j-th feature index in the i-th data; and n denotes the number of feature indexes;
and according to the classification areas corresponding to the first classification result S_1 and the second classification result S_2, giving valid feedback to the corresponding sub-components of the target IT equipment and sending the corresponding feedback information for display.
8. A system for accurately detecting faults of machine room IT equipment in a data center, characterized in that the system comprises:
an acquisition module, configured to acquire performance index information of target IT equipment according to a target period to obtain a periodic performance data sequence;
a processing module, configured to normalize the periodic performance data sequence;
a clustering module, configured to cluster the processed periodic performance data according to preset fault types to obtain a clustering result;
a calculation module, configured to calculate a target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result;
and a judging module, configured to judge, according to the target outlier score corresponding to each preset fault type, whether the target IT equipment has failed and, if so, the specific fault information;
wherein the calculation module calculating the target outlier score of the periodic performance data subsequence corresponding to each preset fault type in the clustering result comprises:
determining a working function level value of the target IT equipment;
confirming whether the working function level value is greater than or equal to a preset threshold; if so, confirming that the accuracy of the outlier score of the periodic performance data subsequence in each clustering result is 100; otherwise, confirming that the accuracy of the outlier score of the periodic performance data subsequence in each clustering result is 90;
performing a division calculation on the first subsequence value of the periodic performance data subsequence corresponding to each preset fault type and the second subsequence value of the performance index of the target IT equipment in the normal working state to obtain a first outlier score of the periodic performance data subsequence corresponding to each preset fault type;
when the accuracy is confirmed to be 100, confirming the first outlier score of the periodic performance data subsequence corresponding to each preset fault type as the target outlier score of the periodic performance data subsequence in the clustering result;
and when the accuracy is confirmed to be 90, multiplying the first outlier score of the periodic performance data subsequence corresponding to each preset fault type by a preset proportion to obtain a second outlier score of the periodic performance data subsequence corresponding to each preset fault type, and confirming the second outlier score as the target outlier score of the periodic performance data subsequence corresponding to that preset fault type.
CN202110400918.5A 2021-04-14 2021-04-14 Computer room IT equipment fault accurate detection method and system for data center Active CN113176962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110400918.5A CN113176962B (en) 2021-04-14 2021-04-14 Computer room IT equipment fault accurate detection method and system for data center

Publications (2)

Publication Number Publication Date
CN113176962A CN113176962A (en) 2021-07-27
CN113176962B true CN113176962B (en) 2024-05-07

Family

ID=76923379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110400918.5A Active CN113176962B (en) 2021-04-14 2021-04-14 Computer room IT equipment fault accurate detection method and system for data center

Country Status (1)

Country Link
CN (1) CN113176962B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391982B (en) * 2021-08-17 2021-11-23 云智慧(北京)科技有限公司 Monitoring data anomaly detection method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205111A (en) * 2015-09-01 2015-12-30 西安交通大学 System and method for mining failure modes of time series data
CN106649438A (en) * 2016-09-09 2017-05-10 西安交通大学 Time series data unexpected fault detection method
KR101761781B1 (en) * 2016-12-30 2017-07-26 강원석 Big data processing method for applying integrated management framework for the open source database
CN110826648A (en) * 2020-01-09 2020-02-21 浙江鹏信信息科技股份有限公司 Method for realizing fault detection by utilizing time sequence clustering algorithm
KR102141391B1 (en) * 2019-12-16 2020-08-05 주식회사 한국가스기술공사 Failure data management method based on cluster estimation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8731724B2 (en) * 2009-06-22 2014-05-20 Johnson Controls Technology Company Automated fault detection and diagnostics in a building management system


Also Published As

Publication number Publication date
CN113176962A (en) 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant