CN115712550A - Floating point calculation performance monitoring device and monitoring method thereof - Google Patents

Floating point calculation performance monitoring device and monitoring method thereof

Info

Publication number
CN115712550A
Authority
CN
China
Prior art keywords
data
floating point
point calculation
information
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211493429.XA
Other languages
Chinese (zh)
Inventor
甘润东
龙玉江
卫薇
王策
卢仁猛
钟掖
王杰峰
陈卿
袁捷
吴忠
李洵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202211493429.XA priority Critical patent/CN115712550A/en
Publication of CN115712550A publication Critical patent/CN115712550A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a floating point calculation performance monitoring device and a monitoring method thereof, relating to the technical field of computers. The device comprises: a data acquisition module, which receives an instruction sent by a client and acquires, in real time, the floating point calculation data information generated while a CPU runs; a feature extraction module, which comprises an extraction unit for extracting floating point calculation data information characteristics; a data scheduling module, which comprises a queue scheduler and a queue manager, the queue scheduler being connected with the queue manager and a floating point register; a data monitoring module, which monitors the floating point calculation data information characteristics; and a data recovery module, which comprises a first statistical unit for obtaining a target floating point calculation data information data set and a second statistical unit for obtaining a called floating point calculation data information data set. By using the data monitoring module to monitor a plurality of data queues, the invention improves monitoring efficiency, and lost floating point calculation data information can be recovered.

Description

Floating point calculation performance monitoring device and monitoring method thereof
Technical Field
The invention relates to a floating point calculation performance monitoring device and a monitoring method thereof, and belongs to the technical field of floating point calculation performance monitoring.
Background
As deep learning models grow deeper, their huge parameter counts make the models ever larger and raise the amount of computation. In practical engineering applications, most deep learning models do not need 64-bit, or even 32-bit, floating point precision. To increase computation speed and reduce the space occupied by the model, floating point numbers in the BF16 (BFloat16) format have emerged and have gradually become a standard for deep learning.
Because of the limits on computer storage space and word length, most computers execute scientific calculations (such as numerical nuclear reactor simulation programs) under the IEEE 754 floating-point arithmetic standard of the Institute of Electrical and Electronics Engineers. Rounding errors are inevitable in floating point calculations, and their cumulative effect may seriously affect the calculation results and even cause disastrous consequences, so the floating point performance of the computer needs to be monitored.
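The cumulative effect of rounding errors mentioned above can be illustrated with a short Python sketch. It is not part of the disclosed device; it only shows, using the standard library, how a plain running sum of the value 0.1 drifts from the correctly rounded result that `math.fsum` returns. The helper name `naive_sum` is chosen for illustration.

```python
import math

def naive_sum(values):
    """Add the values one by one; each addition rounds to the nearest double."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 10          # 0.1 has no exact binary representation
naive = naive_sum(values)    # rounding error accumulates at every step
exact = math.fsum(values)    # fsum tracks the lost low-order bits

print(naive == 1.0)          # False
print(exact == 1.0)          # True
print(abs(naive - exact))    # size of the accumulated error
```

Over ten additions the error is tiny, but the same mechanism compounds over the millions of operations in a long-running scientific calculation.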
An existing monitoring method for floating point numbers generally proceeds as follows: first, floating point performance data are acquired in real time; second, a monitoring device determines whether the floating point performance data are abnormal; then, the node position of the abnormal floating point performance data is identified so that the data can be repaired subsequently, and an alarm is raised for the abnormal data. When several abnormal data items occur consecutively, an alarm is raised for each one. Alarming on every single abnormal data item, however, makes alarms too frequent and wastes alarm resources. In addition, when the floating point performance data are collected, data stored in the server cannot be collected if, for example, the server is disconnected, so part of the floating point calculation data information is lost, the monitored data are incomplete, and the accuracy of the monitoring result is low. It is therefore necessary to provide a floating point calculation performance monitoring device and a monitoring method thereof.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a floating point calculation performance monitoring device and a monitoring method thereof that effectively solve the problems described in the background, namely that existing floating point monitoring methods raise alarms too frequently and waste alarm resources, and that data stored in the server cannot be acquired when floating point performance data are collected, so that part of the floating point calculation data information is lost, the monitored data are incomplete, and the accuracy of the monitoring result is low.
The technical scheme adopted by the invention is as follows: a floating point computing performance monitoring device, comprising:
a data acquisition module: receiving an instruction sent by a client, and acquiring floating point calculation data information generated when a CPU runs in real time;
a feature extraction module: comprising an extraction unit, wherein the extraction unit extracts floating point calculation data information characteristics from the floating point calculation data information acquired by the data acquisition module; a floating point register is preset, a data queue is set in the floating point register, a plurality of main cells are set in the data queue, and the floating point calculation data information characteristics are stored by category in the corresponding main cells of the data queue;
a data scheduling module: comprising a queue scheduler and a queue manager, wherein the queue scheduler is connected with the queue manager and the floating point register, the queue scheduler sends a scheduling request to the queue manager, and the queue manager generates scheduling information and transmits it to the floating point register so that the corresponding floating point calculation data information characteristic data are called;
a data monitoring module: comprising a monitoring unit that monitors the floating point calculation data information characteristics, wherein a threshold is set in the monitoring unit, whether the floating point calculation data information is abnormal is judged by comparing the floating point calculation data information characteristics with the threshold, and the abnormal floating point calculation data information is transmitted to a server database for storage;
a data recovery module: comprising a first statistical unit and a second statistical unit, wherein the first statistical unit is used for obtaining a target floating point calculation data information data set, the second statistical unit is used for obtaining a called floating point calculation data information data set, the first statistical unit is connected with the floating point register, the second statistical unit is connected with the queue manager, missing information data are obtained by comparing the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit, and a transmission unit is provided that inserts the missing information data into the main cell of the corresponding queue.
Preferably, the floating point calculation performance monitoring device further includes a data analysis module, where the data analysis module is configured to analyze and process floating point calculation data information, and transmit the processed data to the feature extraction module through an interface.
Preferably, the floating point calculation data information characteristics include the sign bit, exponent bits, mantissa bits, and floating point calculation duration of the floating point calculation data information. There are multiple groups of data queues, which store the exponent bits in groups according to the sign bit of the floating point calculation data, so that the exponent bits are placed in the corresponding main cells. Each group of data queues is provided with two sub-data queues, each sub-data queue is provided with multiple sub-cells arranged in sequence, the mantissa bits and the floating point calculation duration of the floating point calculation data information are stored by category in the sub-cells, and each sub-cell is associated with the corresponding main cell.
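The feature layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 mantissa bits), models each queue group as a dictionary of lists, and associates a sub-cell with its main cell simply by sharing the same list position. The names `extract_features`, `new_queue_group`, and `store` are hypothetical.

```python
import struct
from collections import defaultdict

def extract_features(value, duration_s):
    """Split a float (as IEEE 754 single precision) into its bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return {
        "sign": bits >> 31,                # 1 bit
        "exponent": (bits >> 23) & 0xFF,   # 8 bits, biased by 127
        "mantissa": bits & 0x7FFFFF,       # 23 bits
        "duration": duration_s,            # measured calculation duration
    }

def new_queue_group():
    # One main-cell list plus two sub-data queues (mantissa, duration).
    return {"main_cells": [], "mantissa_subcells": [], "duration_subcells": []}

def store(queue_groups, features):
    """File the features under the queue group selected by the sign bit."""
    group = queue_groups[features["sign"]]
    group["main_cells"].append(features["exponent"])
    # Sub-cells share the position of their main cell (assumed association).
    group["mantissa_subcells"].append(features["mantissa"])
    group["duration_subcells"].append(features["duration"])

queue_groups = defaultdict(new_queue_group)
store(queue_groups, extract_features(1.0, 0.002))
store(queue_groups, extract_features(-2.5, 0.005))
```

Here `1.0` lands in the group for sign bit 0 with exponent 127, and `-2.5` lands in the group for sign bit 1 with exponent 128.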
Preferably, the monitoring unit is provided with a marking part and a comparing part, the marking part is used for marking each data queue and establishing four sub-index numbers corresponding to the information characteristics of the floating point calculation data, correlation is arranged between two adjacent sub-index numbers, and the comparing part is used for judging whether the floating point calculation time length is greater than a threshold value.
Preferably, the extraction module establishes a secondary index number corresponding to the plurality of data queues, four sub-index numbers of each floating point calculation data information feature are all associated with the secondary index number of the data queue where the floating point calculation data information feature is located, a bitmap is arranged in the data recovery module, the bitmap comprises a plurality of bitmap units, each bitmap unit has a unique primary index number, each primary index number is associated with the secondary index number, and missing information data is stored in the corresponding bitmap unit.
Preferably, the data recovery module further includes a comparison unit and a calculation unit, the comparison unit is configured to compare the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit, and the calculation unit calculates missing information data according to the comparison information and calculates a correlation between the missing information data and a bitmap unit in a bitmap.
Preferably, the floating point calculation performance monitoring device further includes an information query module, where the information query module includes a configuration unit, a range determination unit, a conversion unit, and an export unit. The configuration unit is configured to configure the matching relationship between source data and target query data in the server database, together with a data check rule, to generate a configuration file. The range determination unit is configured to read the configuration file and the corresponding data selection instruction and to determine a target data range in the server database. The conversion unit is configured to execute a conversion instruction to match the format of the target data with the target query data, and the export unit is configured to execute an export instruction to export the target data matched with the target query data.
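The four units of the information query module can be sketched as a small configuration-driven pipeline. The sketch below is illustrative only: the configuration keys (`field_map`, `check`, `select`), the record fields (`exp`, `dur`, `log_class`), and the CSV export format are all assumptions, since the source does not specify a file format or schema.

```python
import csv
import io
import json

# Configuration unit: a hypothetical configuration "file", held as JSON text.
CONFIG_TEXT = json.dumps({
    "field_map": {"exp": "exponent", "dur": "duration_s"},  # source -> target
    "check": {"field": "dur", "min": 0.0},                  # data check rule
    "select": {"field": "log_class", "equals": 1},          # selection rule
})

def determine_range(database, config):
    """Range determination unit: keep rows that are selected and pass the check."""
    sel, chk = config["select"], config["check"]
    return [
        row for row in database
        if row[sel["field"]] == sel["equals"] and row[chk["field"]] >= chk["min"]
    ]

def convert(rows, config):
    """Conversion unit: rename source fields to the target query format."""
    mapping = config["field_map"]
    return [{target: row[source] for source, target in mapping.items()} for row in rows]

def export(rows, config):
    """Export unit: write the converted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(config["field_map"].values()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

config = json.loads(CONFIG_TEXT)
database = [
    {"exp": 127, "dur": 0.8, "log_class": 1},
    {"exp": 130, "dur": 0.1, "log_class": 2},   # not selected
    {"exp": 128, "dur": -1.0, "log_class": 1},  # fails the check rule
]
exported = export(convert(determine_range(database, config), config), config)
print(exported)
```

Keeping the rules in data rather than in code mirrors the role the source gives to the configuration file: the selection and conversion steps only read it.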
A monitoring method of a floating point calculation performance monitoring device comprises the following steps:
S1, in use, a user sends an instruction through a client; the data acquisition module receives the instruction and acquires, in real time, the floating point calculation data information generated while the CPU runs; the data analysis module processes the floating point calculation data information and transmits the processed data to the feature extraction module through an interface;
S2, the extraction unit in the feature extraction module extracts the characteristics of the floating point calculation data information, and the exponent bits, mantissa bits, and floating point calculation duration of the floating point calculation data information are stored by category in the corresponding data queue, main cell, and sub-cell according to the sign bit of the floating point calculation data;
S3, the queue scheduler in the data scheduling module sends a scheduling request to the queue manager; the queue manager generates scheduling information according to the scheduling request and transmits it to the floating point register, and the data in each data queue in the floating point register are scheduled;
S4, the monitoring unit in the data monitoring module monitors the floating point calculation data information; the marking part marks each data queue and establishes four sub-index numbers corresponding to the floating point calculation data information characteristics; the comparison part judges whether the highest floating point calculation duration in each data queue is greater than a set threshold; if it is greater than the set threshold, the floating point calculation data information corresponding to the highest floating point calculation duration in the data queue is transmitted to the server database and stored as a class-one log file; if it is smaller than the set threshold, the floating point calculation data information in the data queue is transmitted to the server database and stored as a class-two log file;
S5, when abnormal floating point calculation data information is queried, the configuration unit in the information query module configures the matching relationship between source data and target query data in the server database, together with a data check rule, to generate a configuration file; the range determination unit reads the configuration file and the corresponding data selection instruction and determines a target data range in the class-one files in the server database; the conversion unit executes a conversion instruction to match the format of the target data with the target query data; and the export unit executes an export instruction to export the target data matched with the target query data;
S6, when the server is disconnected and part of the floating point calculation data information is lost, the comparison unit in the data recovery module compares the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit; the calculation unit calculates the missing information data according to the comparison information and calculates the correlation between the missing information data and the bitmap units in the bitmap; the missing information data are stored in the corresponding bitmap unit; the transmission unit transmits the missing information data to the secondary index number that is associated with the bitmap unit holding the missing information data and matches its index number; and the missing information data are accurately inserted into the main cell and sub-cell of the corresponding data queue.
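The recovery flow of step S6 can be sketched as a set comparison followed by an index lookup. This is a simplified sketch under stated assumptions: each record carries a hypothetical `id` and `sign` field, the bitmap is modelled as a dictionary keyed by primary index number, and the bitmap unit for a missing record is chosen directly from its sign bit. The source instead derives that association from a correlation computed by a preset training model, which is not reproduced here.

```python
def recover(target, called, queues, primary_to_secondary):
    """Find records present in the target set but never called, then reinsert them."""
    called_ids = {item["id"] for item in called}
    # Comparison and calculation units: records in target but not in called.
    missing = [item for item in target if item["id"] not in called_ids]

    # Bitmap: one unit per primary index number (assumed keyed by sign bit).
    bitmap = {primary: [] for primary in primary_to_secondary}
    for item in missing:
        bitmap[item["sign"]].append(item)

    # Transmission unit: follow primary -> secondary index to the target queue.
    for primary, items in bitmap.items():
        queue = queues[primary_to_secondary[primary]]
        queue.extend(items)
    return missing

target = [
    {"id": 1, "sign": 0, "exponent": 127},
    {"id": 2, "sign": 1, "exponent": 128},
    {"id": 3, "sign": 0, "exponent": 130},
]
called = [target[0]]                      # records 2 and 3 were lost
queues = {"q_pos": [target[0]], "q_neg": []}
primary_to_secondary = {0: "q_pos", 1: "q_neg"}

missing = recover(target, called, queues, primary_to_secondary)
print([m["id"] for m in missing])
```

After the call, record 3 is back in `q_pos` and record 2 is in `q_neg`, so the queues again cover the full target data set.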
The invention has the beneficial effects that: compared with the prior art, the invention has the following effects:
1) A data acquisition module is provided to receive the client instruction and acquire, in real time, the floating point calculation data information generated while the CPU runs, and a data analysis module processes the floating point calculation data information. A feature extraction module is provided to extract the characteristics of the floating point calculation data information and, according to the sign bit of the floating point calculation data, to store the exponent bits, mantissa bits, and floating point calculation duration by category in the main cells and sub-cells of each data queue. The floating point calculation durations of the floating point calculation data information are arranged in sequence in the sub-cells of each data queue, so the highest floating point calculation duration can be conveniently extracted for subsequent monitoring;
2) A data scheduling module is provided. The queue scheduler sends a scheduling request to the queue manager, the queue manager generates scheduling information and transmits it to the floating point register, and the corresponding floating point calculation data information characteristic data are called. While the information data are being called, the amount of scheduling information is distributed evenly across the data queues, which reduces the scheduling load of the queue scheduler;
3) A data monitoring module is provided. The marking part in the monitoring unit marks each data queue and establishes four sub-index numbers corresponding to the floating point calculation data information characteristics, and the comparison part judges whether the highest floating point calculation duration in each data queue is greater than a set threshold. If it is greater than the set threshold, the floating point calculation data information in the data queue is abnormal: the floating point calculation data information corresponding to the highest floating point calculation duration is transmitted to the server database and stored as a class-one log file, and is deleted from the corresponding data queue in the floating point register. The highest floating point calculation duration remaining in the data queue is then compared with the set threshold again. If it is smaller than the set threshold, the floating point calculation data information in the data queue is normal: all of the floating point calculation data information in the data queue is transmitted to the server database and stored as a class-two log file, and the original floating point calculation data information is deleted from the data queue. Each data queue is monitored in this way, so that monitoring resources are not wasted;
4) A data recovery module is provided. When the data queue information in the floating point register is called, the first statistical unit obtains the target floating point calculation data information data set and the second statistical unit obtains the called floating point calculation data information data set. The comparison unit compares the two data sets, and the calculation unit calculates the missing information data according to the comparison information and calculates the correlation between the missing information data and the bitmap units in the bitmap, so that the missing information data are stored in the corresponding bitmap unit. The transmission unit transmits the missing information data to the secondary index number that is associated with the bitmap unit holding the missing information data and matches its index number, and the missing information data are accurately inserted into the main cell and sub-cell of the corresponding data queue. When part of the floating point calculation data information is lost because the server is disconnected or for other reasons, the lost information can therefore be recovered, which completes the monitoring data and improves the accuracy of the monitoring result.
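The even distribution of scheduling information described in effect 2) can be sketched with a round-robin policy. The source does not name a scheduling policy, so round-robin, the `batch_size` parameter, and the function names `make_schedule` and `dispatch` are all assumptions made for illustration.

```python
from collections import deque

def make_schedule(queues, batch_size):
    """Queue manager: emit (queue name, count) entries, visiting queues in turn."""
    schedule = []
    pending = {name: len(queue) for name, queue in queues.items()}
    while any(pending.values()):
        for name in queues:
            take = min(batch_size, pending[name])
            if take:
                schedule.append((name, take))
                pending[name] -= take
    return schedule

def dispatch(queues, schedule):
    """Floating point register side: call the scheduled items out of each queue."""
    called = []
    for name, count in schedule:
        for _ in range(count):
            called.append(queues[name].popleft())
    return called

queues = {"pos": deque([1, 2, 3]), "neg": deque([4])}
schedule = make_schedule(queues, batch_size=2)
called = dispatch(queues, schedule)
print(schedule)
print(called)
```

Because no queue is served for more than `batch_size` items before the next one gets a turn, a long queue cannot monopolise the scheduler.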
Drawings
FIG. 1 is a system diagram of a floating point computing performance monitoring apparatus according to the present invention;
FIG. 2 is a flow chart of a monitoring method of a floating point calculation performance monitoring apparatus according to the present invention;
In the figures: 1. data acquisition module; 2. feature extraction module; 3. data scheduling module; 4. data monitoring module; 5. data recovery module; 6. data analysis module; 7. information query module.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific embodiments.
Example 1: as shown in fig. 1-2, a floating point calculation performance monitoring apparatus includes:
the data acquisition module 1: receiving an instruction sent by a client, and acquiring floating point calculation data information generated when a CPU runs in real time;
the feature extraction module 2: comprising an extraction unit, wherein the extraction unit extracts floating point calculation data information characteristics from the floating point calculation data information acquired by the data acquisition module 1; a floating point register is preset, a data queue is set in the floating point register, a plurality of main cells are set in the data queue, and the floating point calculation data information characteristics are stored by category in the corresponding main cells of the data queue;
wherein secondary index numbers corresponding to the plurality of data queues are established, and the four sub-index numbers of each floating point calculation data information characteristic are all associated with the secondary index number of the data queue in which that characteristic is located (the floating point register sends a wireless signal to the data queue to generate a plurality of independent first signaling, and each sub-index number is associated with the secondary index number through the corresponding first signaling);
the floating point calculation data information characteristics comprise the sign bit, exponent bits, mantissa bits, and floating point calculation duration of the floating point calculation data information; there are a plurality of groups of data queues, which store the exponent bits in groups according to the sign bit of the floating point calculation data so that the exponent bits are placed in the corresponding main cells; two sub-data queues are set in each group of data queues, a plurality of sub-cells arranged in sequence are set in each sub-data queue, the mantissa bits and the floating point calculation duration of the floating point calculation data information are stored by category in the sub-cells, and each sub-cell is associated with the corresponding main cell;
the data scheduling module 3: comprising a queue scheduler and a queue manager, wherein the queue scheduler is connected with the queue manager and the floating point register, the queue scheduler sends a scheduling request to the queue manager, and the queue manager generates scheduling information and transmits it to the floating point register so that the corresponding floating point calculation data information characteristic data are called;
the data monitoring module 4: comprising a monitoring unit that monitors the floating point calculation data information characteristics, wherein a threshold is set in the monitoring unit, whether the floating point calculation data information is abnormal is judged by comparing the floating point calculation data information characteristics with the threshold, and the abnormal floating point calculation data information is transmitted to a server database for storage;
the monitoring unit is provided with a marking part and a comparison part, the marking part is used for marking each data queue and establishing four sub-index numbers corresponding to the information characteristics of the floating point calculation data, correlation is arranged between two adjacent sub-index numbers, and the comparison part is used for judging whether the floating point calculation time length is greater than a threshold value or not;
the data recovery module 5: the device comprises a first statistical unit and a second statistical unit, wherein the first statistical unit is used for obtaining a target floating point calculation data information data set, the second statistical unit is used for obtaining a called floating point calculation data information data set, the first statistical unit is connected with a floating point register, the second statistical unit is connected with a queue manager, missing information data are obtained by comparing the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit, and a transmission unit is arranged and inserts the missing information data into a main cell of a corresponding queue.
The data recovery module 5 is provided with a bitmap, the bitmap comprises a plurality of bitmap units, each bitmap unit has a unique primary index number, each primary index number is associated with a secondary index number (the bitmap sends a signal instruction to each bitmap unit, a plurality of independent second signaling is established, each primary index number is associated with the secondary index number through the second signaling), and missing information data is stored in the corresponding bitmap unit;
the data recovery module 5 further includes a comparison unit and a calculation unit. The comparison unit is configured to compare the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit (the comparison unit extracts the data in the two data sets to generate a first number pair and a second number pair, the data in the first number pair and the second number pair are arranged in sequence, the first number pair is compared with the second number pair, and the data that differ in the comparison are the missing information data). The calculation unit calculates the missing information data according to the comparison information and calculates the correlation between the missing information data and the bitmap units in the bitmap (the calculation unit in the data recovery module receives a signal sent by a computer terminal and generates an instruction, calls the missing information data and generates a corresponding first character string, calls the bitmap unit from the bitmap and generates a corresponding second character string, and analyzes the features of the first character string and the second character string through a training model preset in the calculation unit);
the device further comprises a data analysis module 6, wherein the data analysis module 6 is used for analyzing and processing the floating point calculation data information (an analysis unit in the data analysis module analyzes the data to generate a data table matched with the floating point calculation data information characteristics, from which the extraction unit in the feature extraction module extracts the data) and transmitting the processed data to the feature extraction module 2 through an interface.
The device further comprises an information query module 7, wherein the information query module 7 comprises a configuration unit, a range determination unit, a conversion unit, and an export unit. The configuration unit is used for configuring the matching relationship between source data and target query data in the server database, together with a data check rule, to generate a configuration file; the range determination unit is used for reading the configuration file and the corresponding data selection instruction and determining a target data range in the server database; the conversion unit is used for executing a conversion instruction to match the format of the target data with the target query data; and the export unit is used for executing an export instruction to export the target data matched with the target query data.
Example 2: as shown in fig. 2, a monitoring method of a floating-point computing performance monitoring apparatus includes the following steps:
S1, in use, a user sends an instruction through a client; the data acquisition module 1 receives the instruction and acquires, in real time, the floating point calculation data information generated while the CPU runs; the data analysis module 6 processes the floating point calculation data information and transmits the processed data to the feature extraction module 2 through an interface;
S2, the extraction unit in the feature extraction module 2 extracts the characteristics of the floating point calculation data information, and the exponent bits, mantissa bits, and floating point calculation duration of the floating point calculation data information are stored by category in the corresponding data queue, main cell, and sub-cell according to the sign bit of the floating point calculation data;
S3, the queue scheduler in the data scheduling module 3 sends a scheduling request to the queue manager; the queue manager generates scheduling information according to the scheduling request and transmits it to the floating point register, and the data in each data queue in the floating point register are scheduled;
s4, monitoring floating point calculation data information through a monitoring unit in a data monitoring module 4, marking each data queue through a marking part, establishing four sub-index numbers corresponding to characteristics of the floating point calculation data information, judging whether the highest floating point calculation time length in each data queue is larger than a set threshold through a comparison part (a comparison unit extracts data in a target floating point calculation data information dataset and an invoked floating point calculation data information dataset to generate a first number pair and a second number pair, the data in the first number pair and the data in the second number pair are arranged in sequence, the first number pair and the second number pair are compared, different data generated during comparison are missing information data), when the highest floating point calculation time length is larger than the set threshold, the floating point calculation data information in the data queue is abnormal, transmitting the floating point calculation data information corresponding to the highest floating point calculation time length in the data queue to a server database to be stored as a log file of one class, and if the highest floating point calculation time length is smaller than the set threshold, transmitting the floating point calculation data information in the data queue to the server database to be stored as a log file of two classes; the second type of log files are stored as historical data, so that subsequent searching is facilitated;
a. when the highest floating point calculation duration is greater than the set threshold, the floating point calculation data information corresponding to the highest floating point calculation duration in the data queue is transmitted to the server database and stored as a class-one log file, and is deleted from the corresponding data queue in the floating point register; the highest floating point calculation duration remaining in the data queue is then compared with the set threshold again, so that every item of floating point calculation data information in the data queue is compared according to these steps;
b. when the highest floating point calculation duration is smaller than the set threshold, the floating point calculation data information in the data queue is normal; all the floating point calculation data information in the data queue is transmitted to the server database and stored as a class-two log file, and the original floating point calculation data information is deleted from the data queue;
judging whether the floating point calculation data information is abnormal or not through the steps a and b;
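The loop formed by steps a and b can be sketched in Python as follows. This is a minimal illustration, not the device's implementation: the record layout (a dictionary with a "duration" key), the function name classify_queue and the two returned lists standing in for the class-one and class-two log files are all hypothetical. The description does not say how a duration exactly equal to the threshold is handled; the sketch assumes it is treated as normal.

```python
def classify_queue(queue, threshold):
    """Drain `queue` and split its records into two log classes.

    queue:     list of dict records, each with a 'duration' key (assumed layout)
    threshold: the set threshold for the floating point calculation duration
    Returns (class_one, class_two): abnormal records and normal records.
    """
    class_one, class_two = [], []
    while queue:
        # Find the record with the highest duration still in the queue.
        worst = max(queue, key=lambda record: record["duration"])
        if worst["duration"] > threshold:
            # Step a: log the abnormal record, delete it, then compare again.
            class_one.append(worst)
            queue.remove(worst)
        else:
            # Step b: everything left is normal; log it all and empty the queue.
            class_two.extend(queue)
            queue.clear()
    return class_one, class_two
```

Because only the current maximum is compared on each pass, the loop stops testing as soon as the largest remaining duration no longer exceeds the threshold, at which point the rest of the queue is known to be normal.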
S5, when abnormal floating point calculation data information is queried, the configuration unit in the information query module 7 configures the matching relation between the source data in the server database and the target query data, together with a data check rule, to generate a configuration file; the range determining unit reads the configuration file and, in response to a data selection instruction, determines the target data range in the class-one files in the server database; the conversion unit executes a conversion instruction to match the format of the target data with the target query data; and the export unit executes an export instruction to export the target data matched with the target query data;
S6, when the server is disconnected and part of the floating point calculation data information is lost, the comparison unit in the data recovery module compares the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit; the calculation unit calculates the missing information data according to the comparison information, calculates the correlation between the missing information data and the bitmap units in the bitmap, and stores the missing information data in the corresponding bitmap unit; the transmission unit transmits the missing information data to the secondary index number that is associated with the bitmap unit holding the missing information data and matches its index number, so that the missing information data is accurately inserted into the main cell and sub cell of the corresponding data queue.
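The recovery in step S6 can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions: each record carries a hypothetical "key" that identifies it and a hypothetical "queue_id" that plays the role of the secondary index number, missing data is taken to be whatever appears in the target data set but not in the called data set, and the bitmap is modelled as a plain dictionary from primary index number to record. None of these names come from the description.

```python
def find_missing(target, called):
    """Return records present in the target data set but absent from the called one."""
    called_keys = {record["key"] for record in called}
    missing = [record for record in target if record["key"] not in called_keys]
    # Arrange the differing data in order, as the comparison step describes.
    return sorted(missing, key=lambda record: record["key"])


def recover(queues, target, called):
    """Store missing records in bitmap units, then reinsert them into their queues.

    queues: dict mapping queue_id (secondary index, assumed) to a list of records
    Returns the bitmap as a dict: primary index number -> missing record.
    """
    bitmap = {}
    for primary_index, record in enumerate(find_missing(target, called)):
        bitmap[primary_index] = record

    for record in bitmap.values():
        # Follow the record's secondary index to its data queue.
        queue = queues.setdefault(record["queue_id"], [])
        queue.append(record)
        # Keep the queue ordered so the record lands in its original position.
        queue.sort(key=lambda r: r["key"])
    return bitmap
```

Comparing by key rather than by whole record keeps the difference cheap to compute; whether the real device compares full records or identifiers is not stated in the description.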
The above is only an embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present invention shall fall within that scope of protection. The scope of protection of the present invention is therefore determined by the claims.

Claims (8)

1. A floating point calculation performance monitoring device, characterized by comprising:
a data acquisition module (1): receiving an instruction sent by a client, and acquiring floating point calculation data information generated when a CPU runs in real time;
a feature extraction module (2): comprising an extraction unit for extracting floating point calculation data information features from the floating point calculation data information acquired by the data acquisition module (1), wherein a floating point register is preset, a data queue is set in the floating point register, a plurality of main cells are set in the data queue, and the floating point calculation data information features are stored by category in the corresponding main cells of the data queue;
a data scheduling module (3): comprising a queue scheduler and a queue manager, wherein the queue scheduler is connected with the queue manager and the floating point register, the queue scheduler sends a scheduling request to the queue manager, and the queue manager generates scheduling information and transmits it to the floating point register to call the corresponding floating point calculation data information feature data;
a data monitoring module (4): comprising a monitoring unit that monitors the floating point calculation data information features, wherein a threshold is set in the monitoring unit, whether the floating point calculation data information is abnormal is judged by comparing the floating point calculation data information features with the threshold, and abnormal floating point calculation data information is transmitted to a server database for storage;
a data recovery module (5): comprising a first statistical unit for obtaining a target floating point calculation data information data set and a second statistical unit for obtaining a called floating point calculation data information data set, wherein the first statistical unit is connected with the floating point register, the second statistical unit is connected with the queue manager, missing information data is obtained by comparing the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit, and a transmission unit is provided to insert the missing information data into the main cell of the data queue corresponding to the missing information data.
2. The floating point calculation performance monitoring device of claim 1, wherein: the device further comprises a data analysis module (6), and the data analysis module (6) is used for analyzing and processing the floating point calculation data information and transmitting the processed data to the feature extraction module (2) through an interface.
3. The floating point calculation performance monitoring device of claim 2, wherein: the features of the floating point calculation data information comprise a sign bit, exponent bits, mantissa bits and a floating point calculation duration of the floating point calculation data information; a plurality of groups of data queues are provided; the exponent bits of the floating point calculation data information are grouped according to the sign bit of the floating point calculation data information and placed in the corresponding main cells of the data queues; two sub data queues are set in each group of data queues, each sub data queue contains a plurality of sub cells arranged in sequence, in which the mantissa bits and the floating point calculation duration of the floating point calculation data information are stored by category; and an association is established between each sub cell and the corresponding main cell.
4. The floating point calculation performance monitoring device of claim 3, wherein: the monitoring unit is provided with a marking part and a comparison part; the marking part is used for marking each data queue and establishing four sub-index numbers corresponding to the floating point calculation data information features, with an association between every two adjacent sub-index numbers; and the comparison part is used for judging whether the floating point calculation duration is greater than the threshold.
5. The floating point calculation performance monitoring device of claim 4, wherein: a secondary index number is established for each data queue; the four sub-index numbers of each floating point calculation data information feature are associated with the secondary index number of the data queue in which the feature is located; a bitmap is provided in the data recovery module (5), the bitmap comprising a plurality of bitmap units, each bitmap unit having a unique primary index number and each primary index number being associated with a secondary index number; and the missing information data is stored in the corresponding bitmap unit.
6. The floating point calculation performance monitoring device of claim 1 or 5, wherein: the data recovery module (5) further comprises a comparison unit and a calculation unit; the comparison unit is used for comparing the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit; and the calculation unit calculates the missing information data according to the comparison information and calculates the correlation between the missing information data and the bitmap units in the bitmap.
7. The floating point calculation performance monitoring device of claim 6, wherein: the device further comprises an information query module (7), and the information query module (7) comprises a configuration unit, a range determining unit, a conversion unit and an export unit; the configuration unit is used for configuring the matching relation between the source data in the server database and the target query data, together with a data check rule, to generate a configuration file; the range determining unit is used for reading the configuration file and, in response to a data selection instruction, determining the target data range in the server database; the conversion unit is used for executing a conversion instruction to match the format of the target data with the target query data; and the export unit is used for executing an export instruction to export the target data matched with the target query data.
8. A monitoring method of a floating point calculation performance monitoring device, characterized by comprising the following steps:
S1, in use, a user sends an instruction through a client; the data acquisition module (1) receives the instruction and acquires floating point calculation data information generated when a CPU runs in real time; the data analysis module (6) processes the floating point calculation data information and transmits the processed data to the feature extraction module (2) through an interface;
S2, the extraction unit in the feature extraction module (2) extracts the features of the floating point calculation data information and, according to the sign bit of the floating point calculation data, stores the exponent bits, mantissa bits and floating point calculation duration by category in the corresponding data queue, main cell and sub cell;
S3, the queue scheduler in the data scheduling module (3) sends a scheduling request to the queue manager; the queue manager generates scheduling information according to the scheduling request and transmits it to the floating point register, and the data in each data queue in the floating point register is scheduled;
S4, the monitoring unit in the data monitoring module (4) monitors the floating point calculation data information; the marking part marks each data queue and establishes four sub-index numbers corresponding to the features of the floating point calculation data information; the comparison part judges whether the highest floating point calculation duration in each data queue is greater than a set threshold; when the highest floating point calculation duration is greater than the set threshold, the floating point calculation data information corresponding to the highest floating point calculation duration in the data queue is transmitted to the server database and stored as a class-one log file; and if the highest floating point calculation duration is smaller than the set threshold, the floating point calculation data information in the data queue is transmitted to the server database and stored as a class-two log file;
S5, when abnormal floating point calculation data information is queried, the configuration unit in the information query module (7) configures the matching relation between the source data in the server database and the target query data, together with a data check rule, to generate a configuration file; the range determining unit reads the configuration file and, in response to a data selection instruction, determines the target data range in the class-one files in the server database; the conversion unit executes a conversion instruction to match the format of the target data with the target query data; and the export unit executes an export instruction to export the target data matched with the target query data;
S6, when the server is disconnected and part of the floating point calculation data information is lost, the comparison unit in the data recovery module (5) compares the target floating point calculation data information data set in the first statistical unit with the called floating point calculation data information data set in the second statistical unit; the calculation unit calculates the missing information data according to the comparison information, calculates the correlation between the missing information data and the bitmap units in the bitmap, and stores the missing information data in the corresponding bitmap unit; the transmission unit transmits the missing information data to the secondary index number that is associated with the bitmap unit holding the missing information data and matches its index number, so that the missing information data is accurately inserted into the main cell and sub cell of the corresponding data queue.
CN202211493429.XA 2022-11-25 2022-11-25 Floating point calculation performance monitoring device and monitoring method thereof Pending CN115712550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211493429.XA CN115712550A (en) 2022-11-25 2022-11-25 Floating point calculation performance monitoring device and monitoring method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211493429.XA CN115712550A (en) 2022-11-25 2022-11-25 Floating point calculation performance monitoring device and monitoring method thereof

Publications (1)

Publication Number Publication Date
CN115712550A true CN115712550A (en) 2023-02-24

Family

ID=85234832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211493429.XA Pending CN115712550A (en) 2022-11-25 2022-11-25 Floating point calculation performance monitoring device and monitoring method thereof

Country Status (1)

Country Link
CN (1) CN115712550A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination