CN115437886A - Fault early warning method, device and equipment based on storage and calculation integrated chip and storage - Google Patents

Publication number
CN115437886A
CN115437886A CN202211105067.2A CN202211105067A CN115437886A CN 115437886 A CN115437886 A CN 115437886A CN 202211105067 A CN202211105067 A CN 202211105067A CN 115437886 A CN115437886 A CN 115437886A
Authority
CN
China
Prior art keywords
data
model
storage
target
integrated chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211105067.2A
Other languages
Chinese (zh)
Inventor
黄志兰
刘荣凯
林显成
陆钢
樊勇兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202211105067.2A
Publication of CN115437886A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a fault early warning method, device, equipment and storage medium based on a storage and computation integrated chip, and relates to the field of emerging information technology. The method comprises: acquiring input observation data; inputting the input observation data into a preset first model and outputting inference target data, wherein the preset first model is solidified in a storage and computation integrated chip; acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in the storage and computation integrated chip; and determining fault early warning information according to the target deviation difference information. Because the pre-trained first model and the pre-trained second model are solidified in the storage and computation integrated chip, and the fault early warning information is derived from input observation data acquired in real time, efficient server fault early warning is achieved without consuming additional computing power.

Description

Fault early warning method, device and equipment based on storage and calculation integrated chip and storage
Technical Field
The disclosure relates to the field of emerging information technology, and in particular to a fault early warning method, device, equipment and storage medium based on a storage and computation integrated chip.
Background
As the main carrier of computing power in a cloud resource pool, the server occupies a key position in the cloud data center, and its health has an important influence on the overall running state and service reliability of the cloud resource pool. Continuously monitoring the running state of the server, discovering potential faults in advance and issuing early warnings therefore play an important role in improving the running quality of the server and reducing the impact of faults on services.
In the prior art, the running state of a server is monitored mainly in two modes: in-band and out-of-band. In-band monitoring is implemented by installing an agent program in the server operating system, but this consumes additional resources of the server's Central Processing Unit (CPU), so that a considerable share of the overall computing power of the cloud resource pool is consumed and the cost is high. Out-of-band monitoring is implemented by a Baseboard Management Controller (BMC), but because the processing capability of the microkernel on the BMC is limited, running a fault prediction program on it limits the scale of the computation model, yields low running efficiency, and affects other services of the BMC.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides a fault early warning method, device, equipment and storage medium based on a storage and computation integrated chip, which solve, at least to a certain extent, the problem of high computing cost in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to one aspect of the disclosure, a fault early warning method based on a storage and computation integrated chip is provided, which includes: acquiring input observation data; inputting the input observation data into a preset first model and outputting inference target data, wherein the preset first model is solidified in a storage and computation integrated chip; acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in the storage and computation integrated chip; and determining fault early warning information according to the target deviation difference information.
In one embodiment of the present disclosure, the method further comprises: sending the fault early warning information to a network management system through a standard interface.
In one embodiment of the present disclosure, the target deviation difference information includes the distance between the inference target data and the observation target data.
In an embodiment of the present disclosure, the preset first model is obtained by: acquiring input observation data and target observation data, wherein the input observation data are sensor data and the inference target data are data correlated with the sensor data; and training a neural network model with the input observation data as input values and the target observation data as target values to determine the first model.
In an embodiment of the present disclosure, the preset second model is obtained by: acquiring a server historical data set, wherein the server historical data set comprises first observation target data recorded when a server fails and second observation target data recorded when the server is normal; and determining the second model according to deviation difference information between the first observation target data and the second observation target data.
In an embodiment of the present disclosure, determining the fault early warning information according to the target deviation distance includes: looking up the fault early warning information in a preset relation table between early warning levels and deviation distances according to the target deviation distance.
In one embodiment of the present disclosure, the input observation data includes CPU utilization and fan speed; the observation target data includes CPU temperature and power supply temperature.
According to still another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the above fault early warning method based on a storage and computation integrated chip by executing the executable instructions.
According to yet another aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the above fault early warning method based on a storage and computation integrated chip.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer instructions stored in a computer-readable storage medium, wherein the computer instructions, when executed by a processor, implement the steps of any one of the above fault early warning methods based on a storage and computation integrated chip.
The embodiments of the disclosure provide a fault early warning method, device, equipment and storage medium based on a storage and computation integrated chip, in which: input observation data are acquired; the input observation data are input into a preset first model and inference target data are output, wherein the preset first model is solidified in a storage and computation integrated chip; observation target data are acquired, the inference target data and the observation target data are input into a preset second model, and target deviation difference information is output, wherein the preset second model is solidified in the storage and computation integrated chip; and fault early warning information is determined according to the target deviation difference information. In the embodiments of the disclosure, the pre-trained first model and the pre-trained second model are solidified in the storage and computation integrated chip, and the fault early warning information is derived from input observation data acquired in real time, so that efficient server fault early warning is achieved without consuming additional computing power.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
Fig. 1 shows a flowchart of a fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 2 shows a flowchart of a further fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 3 shows a flowchart of yet another fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 4 shows a flowchart of another fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 5 shows a flowchart of a specific example of a fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of a fault early warning device based on a storage and computation integrated chip in an embodiment of the present disclosure;
Fig. 7 shows a block diagram of an electronic device in an embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of a computer-readable storage medium in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The present exemplary embodiment will be described in detail below with reference to the drawings and examples.
First, the embodiment of the present disclosure provides a fault early warning method based on a storage and computation integrated chip, which can be executed by any electronic device with computation processing capability.
Fig. 1 shows a flowchart of a fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure. As shown in Fig. 1, the method provided in the embodiment of the present disclosure includes the following steps:
S102, acquiring input observation data.
The input observation data may be various monitoring index data acquired by a sensor or other monitoring methods, for example, the input observation data may be CPU utilization information, fan rotation speed information, or the like.
S104, inputting the input observation data into a preset first model, and outputting inference target data, wherein the preset first model is solidified in the storage and computation integrated chip.
It should be noted that the first model may be a neural network model. A Neural Network (NN) is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. Neural networks offer massive parallelism, distributed storage and processing, self-organization, self-adaptation and self-learning, and can handle imprecise and fuzzy information processing problems in which many factors and conditions must be considered simultaneously. The storage and computation integrated chip is a novel memory chip with integrated computing capability. Unlike the von Neumann architecture, in which storage and computation are separated, it integrates computation and storage so that each storage unit has a computing function and large-scale parallel computation can be supported. It is mainly used as a training and inference chip for artificial intelligence models and is characterized by high computing speed and low power consumption.
For example, various monitoring index data acquired by a sensor or other monitoring methods, such as CPU utilization, fan rotation speed, and the like, are used as input data of the first model, and the first model outputs inference target data.
S106, acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in the storage and calculation integrated chip.
It should be noted that the observation target data is monitoring data acquired by a sensor or other monitoring method, such as a CPU temperature, a power supply temperature, and the like.
S108, determining fault early warning information according to the target deviation difference information.
It should be noted that the target deviation difference may be obtained by calculating a distance or the like, and represents the deviation between the actual state and the theoretical state. The fault early warning information may be a fault early warning level; for example, the larger the target deviation distance, the higher the fault early warning level.
For example, in a specific example, the distance calculation model (equivalent to the second model) compares and analyzes the inference target data and the current actual observation target data, and obtains the deviation between the actual state and the theoretical state in a manner of calculating the distance and the like, so as to obtain the fault early warning information.
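The distance calculation described above can be sketched as follows. This is a minimal illustration only: the disclosure speaks of "calculating the distance and the like" without fixing a metric, so the Euclidean distance, the function name `deviation_distance` and the index names `cpu_temp` and `psu_temp` are all assumptions.

```python
import math


def deviation_distance(inferred, observed):
    """Euclidean distance between inference target data and observation target data.

    Both arguments map an index name to a value. The Euclidean metric is an
    assumption; the disclosure does not name a specific distance.
    """
    if inferred.keys() != observed.keys():
        raise ValueError("inferred and observed data must cover the same indexes")
    return math.sqrt(sum((observed[k] - inferred[k]) ** 2 for k in inferred))


# Hypothetical values: the first model expects 55 and 40 degrees C,
# while the sensors report 58 and 44 degrees C.
inferred = {"cpu_temp": 55.0, "psu_temp": 40.0}
observed = {"cpu_temp": 58.0, "psu_temp": 44.0}
print(deviation_distance(inferred, observed))  # 5.0
```

A larger returned value corresponds to a larger deviation of the actual state from the theoretical state.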
In specific implementation, the first model (inference model) of the present disclosure establishes the functional relationship among the various sensing information through a neural network algorithm, and this functional relationship is solidified in the computing circuit of the storage and computation integrated chip. At run time, the inference target data are obtained from the input observation data; the second model (distance calculation model) then compares the inference target data with the current actual observation target data and obtains the deviation between the actual state and the theoretical state by calculating a distance or the like, thereby obtaining the fault early warning information. The distance calculation model is likewise solidified in the computing logic of the storage and computation integrated chip. Because the pre-trained first and second models are solidified in the chip and the fault early warning information is derived from input observation data acquired in real time, efficient server fault early warning is achieved without consuming additional computing power.
In an embodiment of the present disclosure, as shown in Fig. 2, the fault early warning method based on a storage and computation integrated chip may send, and thus quickly issue, the fault early warning information through the following step:
S202, sending the fault early warning information to a network management system through a standard interface.
It should be noted that the standard interface may be an interface widely used in the computer and communication industries; such communication interfaces are commonly used to connect multiple systems and can reliably connect a computer to a remote computer subsystem. Examples of device management interfaces include the Intelligent Platform Management Interface (IPMI) and the Redfish interface, where Redfish is a management standard based on HTTPS (Hypertext Transfer Protocol Secure) services that implements device management through a RESTful interface.
For example, in a specific example, a storage and computation integrated chip is introduced into the server; it acquires the sensor data from the BMC storage unit in real time, performs the fault early warning analysis, and sends the fault early warning information through IPMI. The BMC is an independent controller module integrated on the server mainboard and is used for remote management of the server. It can communicate with other physical components in the server through interfaces such as I2C and PCI, and collects state data and fault data of physical components such as the CPU, memory, fans and power supply. It can also collect operation data and alarm data from the operating system of the server, and provides these data to an external network management system over an independent management network using standard interfaces such as IPMI and Redfish.
By using standard interfaces to transmit the early warning information, the present disclosure achieves interoperability, lowers design cost and shortens design time.
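As a rough sketch of what the early warning information handed to a standard interface might look like, the following serializes a warning into a JSON body of the kind a RESTful interface could carry. The disclosure does not define a message format, so the class `FaultWarning`, its field names and the sample values are purely hypothetical, and no real IPMI or Redfish call is made here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FaultWarning:
    # Hypothetical fields; the disclosure does not specify a message layout.
    server_id: str
    warning_level: int
    deviation_distance: float


def encode_warning(warning):
    """Serialize a warning to UTF-8 JSON bytes, leaving the transport abstract."""
    return json.dumps(asdict(warning), sort_keys=True).encode("utf-8")


message = encode_warning(FaultWarning("server-01", 2, 5.0))
print(message.decode("utf-8"))
```

Keeping the encoding separate from the transport reflects the point made above: the same warning can be delivered over whichever standard interface the network management system already supports.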
In one embodiment of the present disclosure, the target deviation difference information includes the distance between the inference target data and the observation target data.
It should be noted that the observation target data may be monitoring data acquired by a sensor or other monitoring method, such as a CPU temperature, a power supply temperature, and the like.
In an embodiment of the present disclosure, as shown in Fig. 3, the fault early warning method based on a storage and computation integrated chip may train the preset first model, so that it outputs target observation data from input observation data, through the following steps:
S302, acquiring input observation data and target observation data, wherein the input observation data are sensor data, and the inference target data are data correlated with the sensor data;
S304, training the neural network model with the input observation data as input values and the target observation data as target values to determine the first model.
In specific implementation, the offline model preparation of the present disclosure mainly comprises the learning and training of the inference model and of the distance calculation model. Training the inference model relies on input observation data and target observation data collected offline: several groups of sensor data may be selected as the input observation data, and sensor data correlated with them are then selected as the inference target data. The monitoring indexes of a server are correlated with one another, so some indexes can be inferred from others, and abnormal indexes usually appear before a server fails; for example, the CPU utilization and the fan speed both affect the CPU temperature. The neural network model relating the input values to the target values is trained on data collected under normal working conditions.
In a specific example, offline server monitoring data are used to train the neural network and establish the model relation between the input observation data and the observation target data, and the resulting inference model is solidified in the storage and computation integrated chip.
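The offline training of the first model can be illustrated with a very small neural network. The sketch below is not the network of the disclosure, whose architecture is not specified: the single tanh hidden layer, the class `TinyMLP`, the synthetic relation between CPU utilization, fan speed and CPU temperature, and all hyperparameters are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def make_samples(n, rng):
    """Synthetic normal-condition samples: [cpu_util, fan_speed] -> cpu_temp, scaled to [0, 1]."""
    samples = []
    for _ in range(n):
        util, fan = rng.random(), rng.random()
        # Assumed relation for illustration only: load heats, fan cools.
        samples.append(([util, fan], 0.3 + 0.5 * util - 0.2 * fan))
    return samples


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by full-batch gradient descent."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def loss(self, samples):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    def train_epoch(self, samples, lr):
        n = len(samples)
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [0.0] * len(self.w2)
        g_b2 = 0.0
        for x, t in samples:
            hidden, y = self.forward(x)
            dy = 2.0 * (y - t) / n  # gradient of the mean squared error w.r.t. y
            g_b2 += dy
            for j, h in enumerate(hidden):
                g_w2[j] += dy * h
                dh = dy * self.w2[j] * (1.0 - h * h)  # back through tanh
                g_b1[j] += dh
                for i, xi in enumerate(x):
                    g_w1[j][i] += dh * xi
        # Apply the accumulated gradients only after the full pass.
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j]
            self.b1[j] -= lr * g_b1[j]
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * g_w1[j][i]
        self.b2 -= lr * g_b2


def train_first_model(epochs=300, seed=0):
    rng = random.Random(seed)
    samples = make_samples(100, rng)
    model = TinyMLP(2, 4, rng)
    initial = model.loss(samples)
    for _ in range(epochs):
        model.train_epoch(samples, lr=0.1)
    return model, initial, model.loss(samples)


model, initial_loss, final_loss = train_first_model()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In the scheme of the disclosure the trained relation would then be solidified in the chip; here the trained weights simply remain in the `model` object.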
The storage and computation integrated chip acquires the sensing observation data held in the BMC storage chip and analyzes the acquired data. Its computing functions comprise the following.
Inference model: training and modeling are completed offline to establish the functional relationship among the various sensing information; this relationship is solidified in the computing circuit of the storage and computation integrated chip, and at run time the inference target data are computed from the input observation data.
Early warning analysis: the inference target data are compared with the current actual observation target data, and the deviation between the actual state and the theoretical state is obtained by calculating a distance or the like, yielding the fault early warning information; this analysis model is likewise solidified in the computing logic of the storage and computation integrated chip.
The microkernel on the BMC has limited processing capability and can hardly run a complex inference model; it can only apply simple threshold checks to state data and produce rough warnings. By exploiting the efficient computing capability of the storage and computation integrated chip, the fault early warning analysis that would otherwise run on the BMC is offloaded to the chip, so that complex prediction functions can be realized.
In an embodiment of the present disclosure, as shown in Fig. 4, the fault early warning method based on a storage and computation integrated chip may train the preset second model through the following steps, that is, determine deviation distance information from first observation target data recorded when a server fails and second observation target data recorded when the server is normal:
S402, acquiring a server historical data set, wherein the server historical data set comprises first observation target data recorded when the server fails and second observation target data recorded when the server is normal.
It should be noted that the first observation target data may be historical data recorded during server failures, and the second observation target data may be historical data recorded while the server was normal.
S404, determining the second model according to deviation difference information between the first observation target data and the second observation target data.
In specific implementation, training the distance calculation model relies on target observation data collected offline under fault conditions and under normal conditions. The relation between the target data under fault conditions and under normal conditions is analyzed, and indexes such as distance are used to measure the degree of data offset and the relation between the offset distance and faults (for example, the CPU temperature being too high under light load).
The inference model of the disclosure infers target observation data from input observation data rather than directly inferring fault early warning information; the fault early warning information is obtained by comparing the observed target data with the inferred target data. Both the inference of the target data and the calculation of the difference between the inference target data and the target observation data are completed on the storage and computation integrated chip rather than on the BMC, so the normal operation of the BMC is not affected and the BMC does not need to be redesigned.
In a specific example, the server fault historical data is used for calculating an offset relation model of the observation target data and a normal value under the fault condition, and the distance calculation model is solidified to the storage and calculation integrated chip.
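One possible way to derive such an offset relation from historical data is sketched below. The disclosure does not prescribe a fitting procedure, so everything here is an assumption: deviations are scaled by the per-index standard deviation of the normal data, and a single threshold is placed midway between the largest normal distance and the smallest fault distance, which only makes sense when the two groups are separable. The function names and sample temperatures are hypothetical.

```python
import math
import statistics


def scaled_distance(reference, row, scale):
    """Distance between a row and a reference, with each index divided by its scale."""
    return math.sqrt(sum(((v - r) / s) ** 2 for v, r, s in zip(row, reference, scale)))


def fit_distance_model(normal_rows, fault_rows):
    """Derive per-index centers, scales and a fault threshold from historical data."""
    columns = list(zip(*normal_rows))
    center = [statistics.mean(c) for c in columns]
    scale = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero
    normal_max = max(scaled_distance(center, r, scale) for r in normal_rows)
    fault_min = min(scaled_distance(center, r, scale) for r in fault_rows)
    return {"center": center, "scale": scale, "threshold": (normal_max + fault_min) / 2.0}


# Hypothetical history of [cpu_temp, psu_temp] in degrees C.
normal_history = [[50.0, 38.0], [52.0, 40.0], [54.0, 42.0]]
fault_history = [[70.0, 55.0], [75.0, 60.0]]
distance_model = fit_distance_model(normal_history, fault_history)
print(distance_model["threshold"])
```

At run time the reference passed to `scaled_distance` would be the inference target data produced by the first model rather than the historical center used during fitting.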
In one embodiment of the present disclosure, determining the fault early warning information according to the target deviation distance includes: looking up the fault early warning information in a preset relation table between early warning levels and deviation distances according to the target deviation distance.
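A lookup in such a relation table can be sketched as a search over sorted distance bounds. The bounds and level names below are hypothetical placeholders, since the disclosure does not give concrete table entries.

```python
from bisect import bisect_right

# Hypothetical relation table: a distance at or above a bound moves to the next level.
LEVEL_BOUNDS = [2.0, 5.0, 10.0]
LEVEL_NAMES = ["normal", "minor", "major", "critical"]


def warning_level(distance):
    """Map a target deviation distance to an early warning level."""
    return LEVEL_NAMES[bisect_right(LEVEL_BOUNDS, distance)]


for d in (0.5, 2.0, 7.5, 12.0):
    print(d, warning_level(d))
```

Because the bounds are sorted, a larger deviation distance can never map to a lower level, which matches the statement that a larger target deviation distance means a higher fault early warning level.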
In one embodiment of the present disclosure, the input observation data includes CPU utilization and fan speed; the observation target data includes CPU temperature and power supply temperature.
The method obtains, through offline training, the inference model relating input observation data to target observation data, and obtains, through offline analysis, a difference model between target observation data under fault conditions and target observation data under ideal conditions. Both models are solidified in the storage and computation integrated chip. At run time, the chip acquires the stored sensor observation data from the storage unit of the BMC, infers the inference target data from the input observation data, computes the target deviation distance from the inference target data and the observation target data, and derives the fault early warning level from the target deviation distance. The fault early warning information is sent to a cloud management system through the IPMI bus. In this way, intelligent, efficient and comprehensive server fault early warning is achieved without consuming additional CPU or BMC computing power.
Fig. 5 shows a flowchart of a specific example of the fault early warning method based on a storage and computation integrated chip in an embodiment of the present disclosure. As shown in Fig. 5, the main online fault early warning process comprises two parts, target data inference and fault early warning analysis, and specifically includes the following steps:
S501, the sensors collect the various observation data into the BMC storage unit;
S502, the storage and computation integrated chip acquires the sensing observation data from the BMC storage unit;
S503, the target data in the observation data are stored into input end A of the distance calculation circuit for fault early warning analysis;
S504, the remaining observation data are stored into the input end of the target inference circuit for inference analysis;
S505, the neural network inference model performs inference to obtain the inference target data;
S506, the inference target data are stored into input end B of the distance calculation circuit;
S507, the distance calculation circuit calculates the distance between the observation target data and the inference target data;
S508, the fault early warning information is obtained according to the target deviation distance;
S509, the fault early warning information is sent to a network management system through the IPMI bus.
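Steps S503 to S509 can be pictured in software as one pass over a single observation record. This is only a functional sketch: in the disclosure these steps run in the circuits of the storage and computation integrated chip, whereas here the inference model, the level mapping and the sender are passed in as plain callables, and the key names, the stand-in linear model and the threshold of 3.0 are all hypothetical.

```python
import math

TARGET_KEYS = ("cpu_temp", "psu_temp")   # routed to input end A of the distance circuit
INPUT_KEYS = ("cpu_util", "fan_speed")   # routed to the target inference circuit


def run_once(observation, infer, level_for, send):
    """Process one observation record read from the BMC storage unit."""
    observed = [observation[k] for k in TARGET_KEYS]   # S503
    inputs = [observation[k] for k in INPUT_KEYS]      # S504
    inferred = infer(inputs)                           # S505, S506 (input end B)
    distance = math.dist(observed, inferred)           # S507
    warning = {"distance": distance, "level": level_for(distance)}  # S508
    send(warning)                                      # S509
    return warning


# Stand-ins for the solidified models and the IPMI sender (assumptions).
def stand_in_inference(inputs):
    cpu_util, _fan_speed = inputs
    return [40.0 + 0.2 * cpu_util, 35.0 + 0.1 * cpu_util]


sent = []
record = {"cpu_util": 50.0, "fan_speed": 3000.0, "cpu_temp": 53.0, "psu_temp": 44.0}
result = run_once(record, stand_in_inference,
                  lambda d: "warning" if d > 3.0 else "normal", sent.append)
print(result)
```

Passing the models in as callables keeps the routing of target data and input data visible, which is the part of the flow that the figure emphasizes.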
The method can be used for operation and maintenance monitoring of servers in a cloud data center, where a storage and computation integrated chip is introduced into each server for fault early warning analysis. This improves the operation and maintenance quality of a single server, and the chip can also be linked with an external operation and maintenance monitoring system to build a fault prediction model covering the whole cloud data center. Before the server leaves the factory, a neural network is trained by an offline machine learning method to obtain the model relation between the input observation data and the observation target data, and this relation is solidified in the storage and computation integrated chip as the inference model. After the server goes online, the chip continuously acquires sensor data from the BMC storage unit, feeds them to the inference model to compute the inferred target data, and then uses the distance calculation model to compute the deviation between the inferred target data and the observation target data, thereby realizing fault early warning.
Based on the same inventive concept, the embodiment of the present disclosure further provides a fault early warning device based on a storage and computation integrated chip, as described in the following embodiments. Because the device embodiment solves the problem on a principle similar to that of the method embodiment, the device embodiment may be implemented with reference to the method embodiment, and repeated details are not described again.
Fig. 6 shows a schematic diagram of a fault early warning device based on a storage and computation integrated chip in an embodiment of the present disclosure. As shown in Fig. 6, the device includes: an input observation data acquisition module 61, an inference target data output module 62, a target deviation difference information output module 63, a fault early warning information determination module 64, and a fault early warning information sending module 65.
The input observation data acquisition module 61 is configured to acquire input observation data.
The inference target data output module 62 is configured to input the input observation data into a preset first model and output inference target data, where the preset first model is solidified in the storage and computation integrated chip.
The target deviation difference information output module 63 is configured to acquire observation target data, input the inference target data and the observation target data into a preset second model, and output target deviation difference information, where the preset second model is solidified in the storage and computation integrated chip.
The fault early warning information determination module 64 is configured to determine fault early warning information according to the target deviation difference information.
In an embodiment of the present disclosure, the fault early warning device based on the storage and computation integrated chip further includes the fault early warning information sending module 65, which is configured to send the fault early warning information to a network management system through a standard interface.
In an embodiment of the present disclosure, the target deviation difference information in the fault early warning device based on the storage and computation integrated chip includes distance information between the inference target data and the observation target data.
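The disclosure does not fix a particular distance metric. As one possible reading, assuming the Euclidean distance, with $\mathbf{y}$ denoting the observation target data, $\hat{\mathbf{y}}$ the inference target data, and $n$ the number of target quantities (all three symbols are introduced here for illustration), the distance information could be written as:

```latex
d(\mathbf{y}, \hat{\mathbf{y}})
  = \lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2
  = \sqrt{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

For example, with hypothetical values $\mathbf{y} = (64, 51)$ and $\hat{\mathbf{y}} = (60, 48)$, the distance is $\sqrt{4^2 + 3^2} = 5$.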
In an embodiment of the present disclosure, the first model preset in the fault early warning device based on the storage and computation integrated chip is obtained by: acquiring input observation data and target observation data, wherein the input observation data are sensor data, and the inference target data are data correlated with the sensor data; and training a neural network model with the input observation data as the input value and the target observation data as the target value to determine the first model.
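The offline training step can be illustrated with a deliberately small stand-in. The sketch below fits a single linear layer by batch gradient descent on mean squared error, which is far simpler than a neural network and is only meant to show the roles of the input value and the target value. The function name, the normalized feature ranges, the learning rate, and the synthetic noise-free data are all assumptions made for this example, not values from the disclosure.

```python
from typing import List, Sequence, Tuple


def train_linear_model(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.3,
    epochs: int = 3000,
) -> Tuple[List[float], float]:
    """Fit target ~ weights . input + bias by batch gradient descent on mean squared error."""
    n_features = len(inputs[0])
    n = len(inputs)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            # Prediction error for this sample with the current parameters.
            err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j, xi in enumerate(x):
                grad_w[j] += 2.0 * err * xi / n
            grad_b += 2.0 * err / n
        # One parameter update per pass over the whole data set.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Synthetic, noise-free data (illustrative only): normalized CPU utilization u and
# fan speed f in [0, 1]; the "observed" CPU temperature is generated as 40 + 30u - 10f.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
samples = [[u, f] for u in levels for f in levels]
temps = [40.0 + 30.0 * u - 10.0 * f for u, f in samples]

weights, bias = train_linear_model(samples, temps)
```

Because the synthetic data are exactly linear, the fitted parameters approach the generating coefficients (30, -10) and the offset 40. A real deployment would use recorded sensor data and a model matched to what the chip can hold.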
In an embodiment of the present disclosure, the second model preset in the fault early warning device based on the storage and computation integrated chip is obtained by: acquiring a server historical data set, wherein the server historical data set comprises first observation target data recorded when a server fails and second observation target data recorded when the server is normal; and determining the second model according to the deviation difference information between the first observation target data and the second observation target data.
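The disclosure does not spell out how the deviation between failure-time and normal-time data determines the second model. One possible interpretation, offered only as a sketch, is to calibrate a distance threshold from the historical data set: take the centroid of the normal samples as a reference and place the threshold midway between the farthest normal sample and the nearest failure sample. The function name, the centroid reference, the midpoint rule, and the sample values are assumptions for illustration.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def calibrate_threshold(
    normal: Sequence[Sequence[float]],
    failed: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Return the centroid of normal samples and a distance threshold separating failures."""
    center = [fmean(column) for column in zip(*normal)]
    max_normal = max(math.dist(sample, center) for sample in normal)
    min_failed = min(math.dist(sample, center) for sample in failed)
    if min_failed <= max_normal:
        raise ValueError("normal and failure samples overlap; one threshold cannot separate them")
    return center, (max_normal + min_failed) / 2.0


# Hypothetical (CPU temperature, power supply temperature) pairs.
normal_samples = [[60.0, 45.0], [62.0, 47.0], [58.0, 43.0]]
failure_samples = [[75.0, 45.0], [60.0, 65.0]]

center, threshold = calibrate_threshold(normal_samples, failure_samples)
```

With these made-up samples the farthest normal sample lies about 2.83 from the centroid and the nearest failure sample lies 15 away, so the threshold lands near 8.9.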
In an embodiment of the present disclosure, the fault early warning information determination module 64 is further configured to determine the fault early warning information by looking up the target deviation distance in a preset relation table of early warning levels and deviation distances.
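A relation table of this kind can be sketched as a sorted list of distance bounds searched with binary search. The bounds and level names below are hypothetical; the disclosure does not give concrete values.

```python
from bisect import bisect_right

# Hypothetical relation table: a deviation distance at or above LEVEL_BOUNDS[i]
# maps to LEVEL_NAMES[i + 1]; anything below the first bound is "normal".
LEVEL_BOUNDS = [5.0, 10.0, 20.0]
LEVEL_NAMES = ["normal", "minor", "major", "critical"]


def warning_level(deviation_distance: float) -> str:
    """Look up the early warning level for a deviation distance."""
    if deviation_distance < 0:
        raise ValueError("a distance cannot be negative")
    return LEVEL_NAMES[bisect_right(LEVEL_BOUNDS, deviation_distance)]
```

For instance, `warning_level(12.0)` falls between the second and third bounds and maps to `"major"`. Keeping the bounds in one sorted list makes the table easy to replace when thresholds are recalibrated.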
In an embodiment of the present disclosure, the input observation data in the fault early warning device based on the storage and computation integrated chip includes a CPU utilization rate and a fan rotation speed; the observation target data includes a CPU temperature and a power supply temperature.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 700 according to this embodiment of the disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, and a bus 730 that couples various system components including the storage unit 720 and the processing unit 710.
The storage unit 720 stores program code that is executable by the processing unit 710 to cause the processing unit 710 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
For example, the processing unit 710 may perform the following steps of the above method embodiment: acquiring input observation data; inputting input observation data into a preset first model, and outputting inference target data, wherein the preset first model is solidified in a storage and computation integrated chip; acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in a storage and calculation integrated chip; and determining fault early warning information according to the target deviation difference.
For example, the processing unit 710 may perform the following steps of the above method embodiment: and sending the fault early warning information to a network management system through a standard interface.
For example, the processing unit 710 may perform the following steps of the above method embodiment: the target deviation difference information includes distance information of the inference target data and the observation target data.
For example, the processing unit 710 may perform the following steps of the above method embodiment: the preset first model is obtained by: acquiring input observation data and target observation data, wherein the input observation data are sensor data, and the inference target data are data correlated with the sensor data; and training a neural network model with the input observation data as the input value and the target observation data as the target value to determine the first model.
For example, the processing unit 710 may perform the following steps of the above method embodiment: the preset second model is obtained by: acquiring a server historical data set, wherein the server historical data set comprises first observation target data recorded when a server fails and second observation target data recorded when the server is normal; and determining the second model according to the deviation difference information between the first observation target data and the second observation target data.
For example, the processing unit 710 may perform the following step of the above method embodiment: determining the fault early warning information by looking up the target deviation distance in a preset relation table of early warning levels and deviation distances.
For example, the processing unit 710 may perform the above method embodiment in which the input observation data includes a CPU utilization rate and a fan rotation speed, and the observation target data includes a CPU temperature and a power supply temperature.
The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit bus or local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 740 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 over the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium, which may be a readable signal medium or a readable storage medium. Fig. 7 is a schematic diagram of a computer-readable storage medium in an embodiment of the disclosure, and as shown in fig. 7, the computer-readable storage medium 700 has a program product stored thereon, which is capable of implementing the above-mentioned method of the disclosure. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method comprising: acquiring input observation data; inputting the input observation data into a preset first model, and outputting inference target data, wherein the preset first model is solidified in a storage and computation integrated chip; acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in the storage and computation integrated chip; and determining fault early warning information according to the target deviation difference information.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method comprising: sending the fault early warning information to a network management system through a standard interface.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method in which the target deviation difference information includes distance information between the inference target data and the observation target data.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method in which the preset first model is obtained by: acquiring input observation data and target observation data, wherein the input observation data are sensor data, and the inference target data are data correlated with the sensor data; and training a neural network model with the input observation data as the input value and the target observation data as the target value to determine the first model.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method in which the preset second model is obtained by: acquiring a server historical data set, wherein the server historical data set comprises first observation target data recorded when a server fails and second observation target data recorded when the server is normal; and determining the second model according to the deviation difference information between the first observation target data and the second observation target data.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method comprising: determining the fault early warning information by looking up the target deviation distance in a preset relation table of early warning levels and deviation distances.
In one embodiment, the program product in the embodiments of the present disclosure, when executed by a processor, implements a method in which the input observation data includes a CPU utilization rate and a fan rotation speed, and the observation target data includes a CPU temperature and a power supply temperature.
More specific examples of the computer-readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may include a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Alternatively, program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
In particular implementations, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A fault early warning method based on a storage and calculation integrated chip is characterized by comprising the following steps:
acquiring input observation data;
inputting the input observation data into a preset first model, and outputting inference target data, wherein the preset first model is solidified in a storage and computation integrated chip;
acquiring observation target data, inputting the inference target data and the observation target data into a preset second model, and outputting target deviation difference information, wherein the preset second model is solidified in the storage and calculation integrated chip;
and determining fault early warning information according to the target deviation difference.
2. The fault pre-warning method based on the storage and computation integrated chip as claimed in claim 1, wherein the method further comprises:
and sending the fault early warning information to a network management system through a standard interface.
3. The fault pre-warning method based on the storage and computation integrated chip as claimed in claim 1, wherein the target deviation difference information includes distance information between the inference target data and observation target data.
4. The fault pre-warning method based on the storage and computation integrated chip as claimed in claim 1, wherein the preset first model is:
acquiring input observation data and target observation data, wherein the input observation data are sensor data, and the inference target data are data which are related to the sensor data;
and training a neural network model by taking the input observation data as an input value and the target observation data as a target value to determine a first model.
5. The fault pre-warning method based on the storage and computation integrated chip as claimed in claim 1, wherein the preset second model is:
acquiring a server historical data set, wherein the server historical data set comprises first observation target data when a server fails and second observation target data when the server is normal;
and determining a second model according to the deviation difference information between the first observation target data and the second observation target data.
6. The fault early warning method based on the storage and computation integrated chip as claimed in claim 1, wherein determining fault early warning information according to the target deviation difference comprises:
and determining fault early warning information in a preset relation table of early warning levels and deviation distances according to the target deviation distances.
7. The fault pre-warning method based on the storage and computation integrated chip as claimed in claim 1, wherein the input observation data comprises CPU utilization rate and fan rotation speed; the observation target data comprises a CPU temperature and a power supply temperature.
8. A fault early warning device based on a storage and computation integrated chip, characterized by comprising:
an input observation data acquisition module, configured to acquire input observation data;
an inference target data output module, configured to input the input observation data into a preset first model and output inference target data, wherein the preset first model is solidified in the storage and computation integrated chip;
a target deviation difference information output module, configured to acquire observation target data, input the inference target data and the observation target data into a preset second model, and output target deviation difference information, wherein the preset second model is solidified in the storage and computation integrated chip;
and a fault early warning information determination module, configured to determine fault early warning information according to the target deviation difference.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute, by executing the executable instructions, the fault early warning method based on the storage and computation integrated chip according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for fault pre-warning based on a storage-and-computation integrated chip according to any one of claims 1 to 7.
CN202211105067.2A 2022-09-09 2022-09-09 Fault early warning method, device and equipment based on storage and calculation integrated chip and storage Withdrawn CN115437886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211105067.2A CN115437886A (en) 2022-09-09 2022-09-09 Fault early warning method, device and equipment based on storage and calculation integrated chip and storage

Publications (1)

Publication Number Publication Date
CN115437886A true CN115437886A (en) 2022-12-06

Family

ID=84246434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211105067.2A Withdrawn CN115437886A (en) 2022-09-09 2022-09-09 Fault early warning method, device and equipment based on storage and calculation integrated chip and storage

Country Status (1)

Country Link
CN (1) CN115437886A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221206