CN117593330B - Visual real-time vibration monitoring method - Google Patents


Info

Publication number
CN117593330B
Authority
CN
China
Prior art keywords
feature
candidate key
points
tracker
vibration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311574036.6A
Other languages
Chinese (zh)
Other versions
CN117593330A (en)
Inventor
秦迪梅
廖遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Huanyu Zhongheng Technology Co ltd
Original Assignee
Sichuan Huanyu Zhongheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Huanyu Zhongheng Technology Co ltd filed Critical Sichuan Huanyu Zhongheng Technology Co ltd
Priority to CN202311574036.6A
Publication of CN117593330A
Application granted
Publication of CN117593330B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/207Analysis of motion for motion estimation over a hierarchy of resolutions
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01HMEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H9/00Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves by using radiation-sensitive means, e.g. optical means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/254Analysis of motion involving subtraction of images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of vibration measurement and provides a visual real-time vibration monitoring method, which comprises the following steps: preprocessing a video frame image and then extracting feature points of a target object to form feature vectors; calculating similarity scores between the feature vectors and a feature template dictionary P-N and judging the category of the feature points to obtain candidate key points; creating a tracker list for the candidate key points of the previous frame and matching candidate key points in the current frame based on the tracker list; if a tracker of the previous frame is not matched with any candidate key point, estimating the position of that candidate key point in the current frame; if unmatched candidate key points exist in the current frame, creating new trackers for them and adding them to the tracker list, so as to form the final key points; analyzing the vibration signals of the key points in the tracker list to obtain the vibration frequency and vibration amplitude of the target object; and forming a visual report of the vibration frequency and vibration amplitude of the target object.

Description

Visual real-time vibration monitoring method
Technical Field
The invention relates to the technical field of vibration measurement, in particular to a visual real-time vibration monitoring method.
Background
Key industrial equipment and objects generate varying degrees of vibration during operation, caused by factors such as the structural characteristics of the equipment, its working parameters and external disturbances. If the vibration exceeds the normal range, it may cause damage, malfunction or accidents and lead to serious industrial consequences. It is therefore necessary and important to monitor and analyze the vibration of key equipment and machines in real time, so that potential problems can be found and prevented in time and the reliability and service life of the equipment improved. At present, vibration monitoring methods mainly include the following:
One approach is to use acceleration or other types of sensors fixed to the target object, collect its vibration signals, and transmit them to a data processing center for analysis, either by wire or wirelessly. The advantage of this method is that continuous acquisition and real-time analysis of the vibration signal can be achieved, but there are also some drawbacks, such as: the installation of the sensor occupies equipment space and can influence the normal operation of the equipment; the sensor itself is also affected by factors such as ambient temperature, humidity, electromagnetic interference, etc., resulting in degradation of signal quality; the number of the sensors is limited, and all the parts which can possibly generate abnormal vibration cannot be covered; the data transmission mode may have problems of delay, loss, interference and the like; the data processing center requires a large amount of memory and computing resources.
Another method is to use a hand-held or portable vibration detection instrument to detect the target object periodically or aperiodically. The method has the advantages that the detection time and the detection position can be flexibly selected, and the important detection can be carried out on different parts of the equipment according to the needs, but the method also has the disadvantages such as: the detection process needs manual operation, so that manpower and material resources are consumed; the detection frequency and coverage are limited by personnel arrangement, equipment number and other factors; the detection result needs to be manually judged and recorded, and subjective errors and omission are easy to occur; the detection result cannot be fed back to related personnel and departments in time, and quick response and processing are difficult to achieve.
In summary, the existing vibration detection methods have certain limitations and defects, and cannot meet the requirements for real-time monitoring and analysis of vibration states of equipment during operation. Therefore, a method is needed to improve the efficiency and accuracy of real-time detection of device vibrations.
Disclosure of Invention
The invention aims to detect the vibration condition of equipment in real time and provides a visual real-time vibration monitoring method.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
a visual real-time vibration monitoring method comprising the steps of:
Step 1, shooting a video of a target object, preprocessing the video frame images, extracting feature points of the target object, and forming feature vectors; constructing a feature template dictionary P-N, calculating a similarity score between each feature vector and the feature template dictionary P-N, judging the category of the feature point according to the similarity score, the category comprising positive, negative and uncertain, and selecting candidate key points according to the category;
Step 2, creating a tracker list of candidate key points of the previous frame, and matching the candidate key points in the current frame based on the tracker list; if the tracker of the previous frame is not matched with the candidate key point, estimating the position of the candidate key point of the current frame; if the candidate key points which are not matched exist in the current frame, a new tracker is created for the candidate key points and added into a tracker list, so that a final key point is formed;
step 3, analyzing vibration signals of key points based on the key points in the tracker list to obtain the vibration frequency and the vibration amplitude of the target object;
And 4, forming a visual report of the vibration frequency and the vibration amplitude of the target object.
In the step 1, the step of calculating the similarity score between the feature vector and the feature template dictionary P-N includes:
The calculation expression of the similarity score is as follows:

score_i = λ1·KL(V_i, D_{p,n}) + λ2·ED(V_i, D_{p,n}) + λ3·CS(V_i, D_{p,n})

wherein V_i represents the feature vector of the i-th feature point, there being N' feature points in total, i ∈ N'; D_{p,n} denotes the feature vector of the sample nearest to V_i in the feature template dictionary P-N; λ1, λ2, λ3 are weight coefficients satisfying λ1 + λ2 + λ3 = 1; KL() represents the KL divergence; ED() represents the Euclidean distance; CS() represents the cosine similarity;

the expression of the KL divergence is as follows:

KL(V_i, D_{p,n}) = Σ_u V_i(u) · log( V_i(u) / D_{p,n}(u) )

wherein V_i and D_{p,n} are treated as two discrete probability distributions and u is a possible value;

the expression of the Euclidean distance is as follows:

ED(V_i, D_{p,n}) = sqrt( Σ_{k=1}^{m} ( V_{i,k} − D_{p,n,k} )² )

the expression of the cosine similarity is as follows:

CS(V_i, D_{p,n}) = ( Σ_{k=1}^{m} V_{i,k} · D_{p,n,k} ) / ( ‖V_i‖ · ‖D_{p,n}‖ )

wherein V_i and D_{p,n} are two m-dimensional vectors; V_{i,k} is the component of V_i in the k-th dimension, D_{p,n,k} is the component of D_{p,n} in the k-th dimension, k ∈ m.
In the step 1, the step of judging the category of the feature point according to the similarity score, wherein the category comprises positive, negative and uncertain, and the step of taking the category as the candidate key point comprises the following steps:
Setting a threshold h, and combining the feature vector V_i of each feature point with the feature template dictionary P-N, the category of the feature point (positive, negative or uncertain) is obtained through the following calculation:

S1: calculating the minimum similarity score score_DP between the feature vector V_i and the samples in the positive sample dictionary DP;

S2: calculating the minimum similarity score score_DN between the feature vector V_i and the samples in the negative sample dictionary DN;

S3: if score_DP < score_DN and score_DP < h, the category of the feature point is y_i = 1, i.e. the category is positive;

S4: if score_DP > score_DN and score_DN > h, the category of the feature point is y_i = -1, i.e. the category is negative;

S5: otherwise, the category of the feature point is y_i = 0, i.e. the category is uncertain;

the feature points of category y_i = -1 are excluded, and the feature points of categories y_i = 1 and y_i = 0 are used as candidate key points.
In the step 2, a tracker list of candidate key points of the previous frame is created, and the step of matching the candidate key points in the current frame based on the tracker list includes:
step1: creating a corresponding tracker list according to the detected candidate key points For storing the displacement time sequence of the same candidate key point of two adjacent frames, wherein/>A tracker indicating that the ith candidate key point is at the t frame, storing a coordinate sequence of the 1 st to the t frames of the ith candidate key point,Representing coordinates of the ith candidate key point in the t frame;
step2: matching optimal and non-repeated candidate key points in the current frame based on a tracker of the previous frame;
The coordinates in the tracker of the previous frame and the coordinates of each candidate key point of the current frame are used, and the similarity of the candidate key points of the previous frame and the next frame is matched by the similarity expression as follows:
Wherein sim i,j is the similarity of candidate key points of the two frames before and after; t i is the coordinate sequence of the ith tracker in the previous frame; k j is the coordinate of the j-th candidate key point in the current frame; y i and y j are the categories corresponding to the candidate keypoints of the previous frame and the current frame, respectively; s is the scale of the candidate keypoints; ED () is Euclidean distance; kd () is a kronecker function;
step3: if the ith tracker of the previous frame is successfully matched with a candidate key point in the current frame, adding the coordinates of the candidate key point Into the ith tracker;
a tracker representing the ith candidate keypoint at the t-1 th frame;
step4: and repeating Step2-Step3 until all trackers of the previous frame are matched with the candidate key points of the current frame or all candidate key points of the current frame are matched by the trackers of the previous frame.
Compared with the prior art, the invention has the beneficial effects that:
The method and the device can acquire the vibration information of the target object through key point detection, matching and tracking on different types of equipment, acquire the vibration amplitude and the vibration frequency, and display the vibration amplitude and the vibration frequency to a user through a real-time visual graphical interface so as to intuitively judge, discover and process faults or anomalies in time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of candidate keypoints according to an embodiment of the present invention, where (a) in fig. 2 is 4 candidate keypoints on the vibrating table, and (b) in fig. 2 is an image after the graying process and the binarizing process.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1:
the invention is realized by the following technical scheme, as shown in fig. 1, a visual real-time vibration monitoring method comprises the following steps:
Step 1, shooting a video of a target object, preprocessing the video frame images, extracting feature points of the target object, and forming feature vectors; constructing a feature template dictionary P-N, calculating a similarity score between each feature vector and the feature template dictionary P-N, judging the category of the feature point according to the similarity score, the category comprising positive, negative and uncertain, and selecting candidate key points according to the category.
The target object is represented by a series of key points, and its vibration frequency and vibration amplitude are analyzed based on these key points. This has the advantages of representing the whole by a few parts and requiring little computation, which makes it suitable for vibration detection.
A camera is used to shoot the video of the target object, and preprocessing of the video frame images comprises graying and binarization. The acquired video frame images are converted to grayscale using a weighted average method; according to the different sensitivities of the human eye to colors, the green channel is given the highest weight, the red channel a lower weight and the blue channel the lowest, and the formula is as follows:

Gray = 0.299·R + 0.587·G + 0.114·B

wherein Gray represents the grayed image and R, G, B represent the red, green and blue channels of the color image, respectively.

Then, binarization is performed on the grayscale image to obtain a binarized image. Binarization can be implemented with different thresholding methods, including a global threshold, an adaptive threshold, a local threshold and the like. In this scheme a fixed global threshold T is used: each pixel of the grayscale image is compared with T, and if its gray value is greater than or equal to T the pixel is set to white (255), otherwise it is set to black (0). The binarization formula is as follows:

B(x, y) = 255 if Gray(x, y) ≥ T, and B(x, y) = 0 otherwise

where B(x, y) is the pixel value at coordinates (x, y) in the binarized image and Gray(x, y) is the pixel value at coordinates (x, y) in the grayscale image.
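As an illustration of the preprocessing described above, the following is a minimal sketch using OpenCV; the channel weights (0.299, 0.587, 0.114) and the default threshold value are assumptions for demonstration, not values fixed by this scheme.

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Gray the frame with a weighted average and binarize it with a fixed global threshold.

    The channel weights and the threshold value are illustrative assumptions.
    """
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    gray = 0.299 * r + 0.587 * g + 0.114 * b          # weighted-average graying
    gray = gray.astype(np.uint8)
    # Pixels >= threshold become white (255), the rest black (0)
    _, binary = cv2.threshold(gray, threshold - 1, 255, cv2.THRESH_BINARY)
    return binary
```

A call such as binary = preprocess_frame(frame) would then feed the contour-based feature point extraction described next.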
Feature points are then detected and extracted from the binarized image. The feature points can be the contour of the target object: a curve connecting points with the same gray value represents the contour information and describes the shape and boundary of the target object. Descriptors of the feature points of the current frame can also be computed according to different retrieval modes and approximation methods, including but not limited to: a list containing all contour information, an array representing the contour hierarchy, SURF descriptors and SIFT descriptors. That is, a descriptor describes the information of a feature point. Each feature point detected in the current frame generates an 11-dimensional feature vector V_i ∈ R^{1×11}; through repeated experimental comparison, the best-performing feature vector V_i consists of the following 11 elements (i denotes the i-th feature point, N' is the total number of feature points, i ∈ N'):

V_i = { abscissa x_i of the feature point center, ordinate y_i of the feature point center, length of the minimum bounding rectangle of the feature point, width of the minimum bounding rectangle of the feature point, number of points on the target object contour, hierarchical information of the target object contour (a vector of length 4 giving the indexes of the next contour, the previous contour, the first child contour and the parent contour of the current contour, with -1 if the corresponding contour does not exist), average pixel value inside the feature point contour, average pixel value in a band extending 10 pixels outward from the feature point contour }.
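A minimal sketch of how such an 11-dimensional vector could be assembled from OpenCV contours is given below; the cv2.findContours parameters, the mask-based 10-pixel band and the hierarchy layout [next, previous, first child, parent] are assumptions made for illustration.

```python
import cv2
import numpy as np

def build_feature_vectors(binary: np.ndarray) -> list:
    """Build one 11-dimensional feature vector per detected contour (feature point)."""
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    vectors = []
    for i, cnt in enumerate(contours):
        x, y, w, h = cv2.boundingRect(cnt)                     # minimum upright bounding rectangle
        cx, cy = x + w / 2.0, y + h / 2.0                      # feature point (contour) center
        hier = hierarchy[0][i]                                 # [next, previous, first_child, parent]
        inner_mask = np.zeros(binary.shape, np.uint8)
        cv2.drawContours(inner_mask, [cnt], -1, 255, thickness=-1)
        dilated = cv2.dilate(inner_mask, np.ones((21, 21), np.uint8))
        outer_mask = dilated - inner_mask                      # ~10-pixel band around the contour
        mean_inner = cv2.mean(binary, mask=inner_mask)[0]
        mean_outer = cv2.mean(binary, mask=outer_mask)[0] if outer_mask.any() else 0.0
        v = np.array([cx, cy, w, h, len(cnt), *hier, mean_inner, mean_outer], dtype=np.float32)
        vectors.append(v)                                      # shape (11,)
    return vectors
```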
The scheme constructs a feature template dictionary P-N, calculates the similarity score between it and the feature vector V_i, and judges whether the feature point is selected as a candidate key point. The established feature template dictionary P-N comprises a positive sample dictionary DP containing key-point objects and a negative sample dictionary DN containing non-key-point objects such as background and noise.
The scheme improves the calculation of the similarity score by taking a weighted combination of the KL divergence, the Euclidean distance and the cosine similarity; the improved expression is as follows:

score_i = λ1·KL(V_i, D_{p,n}) + λ2·ED(V_i, D_{p,n}) + λ3·CS(V_i, D_{p,n})

wherein V_i represents the feature vector of the i-th feature point, there being N' feature points in total, i ∈ N'; D_{p,n} denotes the feature vector of the sample nearest to V_i in the feature template dictionary P-N; λ1, λ2, λ3 are weight coefficients satisfying λ1 + λ2 + λ3 = 1; KL() represents the KL divergence, used to measure the difference between two probability distributions; ED() represents the Euclidean distance, used to measure the spatial distance between two vectors; CS() represents the cosine similarity, used to measure the directional similarity of two vectors.

The expression of the KL divergence is as follows:

KL(V_i, D_{p,n}) = Σ_u V_i(u) · log( V_i(u) / D_{p,n}(u) )

wherein V_i and D_{p,n} are treated as two discrete probability distributions and u is a possible value. The smaller the KL divergence, the closer the two distributions; it measures the amount of information needed to approximate the distribution D_{p,n} starting from the distribution V_i, i.e. the loss of information entropy.

The expression of the Euclidean distance is as follows:

ED(V_i, D_{p,n}) = sqrt( Σ_{k=1}^{m} ( V_{i,k} − D_{p,n,k} )² )

wherein V_i and D_{p,n} are two m-dimensional vectors; V_{i,k} is the component of V_i in the k-th dimension, D_{p,n,k} is the component of D_{p,n} in the k-th dimension, k ∈ m. The Euclidean distance is the straight-line distance between the two vectors in Euclidean space; the smaller it is, the closer the two vectors.

The expression of the cosine similarity is as follows:

CS(V_i, D_{p,n}) = ( Σ_{k=1}^{m} V_{i,k} · D_{p,n,k} ) / ( ‖V_i‖ · ‖D_{p,n}‖ )

The cosine similarity is the cosine of the angle between the two vectors, i.e. the inner product of their projections on the unit circle. The closer it is to 1, the more parallel the two vectors; the closer it is to -1, the more opposite; the closer it is to 0, the more orthogonal.

The meaning of the similarity score is that the differences between the feature vector of the feature point and the feature vector of the sample in the feature template dictionary P-N are considered comprehensively in terms of distribution, distance and direction, giving a single comprehensive score: the lower the score, the more similar the two are; the higher the score, the less similar.
A threshold h is set; taking the feature vector V_i of a feature point and the feature template dictionary P-N as inputs, the category of the feature point is output after the following calculation. The category is one of three: positive (1), negative (-1) and uncertain (0). Uncertain feature points typically correspond to noise or interference from irrelevant areas caused by changing environments or backgrounds and are discarded during the candidate key-point matching phase. The category is judged as follows:

S1: calculate the minimum similarity score score_DP between the feature vector V_i and the samples in the positive sample dictionary DP;

S2: calculate the minimum similarity score score_DN between the feature vector V_i and the samples in the negative sample dictionary DN;

S3: if score_DP < score_DN and score_DP < h, the category of the feature point is y_i = 1;

S4: if score_DP > score_DN and score_DN > h, the category of the feature point is y_i = -1;

S5: otherwise, the category of the feature point is y_i = 0.

Through repeated experiments, a threshold of h = 22 is sufficient to exclude the feature points of non-key-point objects while ensuring that the detection accuracy of the feature points reaches more than 95%. The threshold h can be adjusted dynamically for different scenes; Table 1 lists the code flow for judging the category of the feature points.
Table 1 Feature point detection code
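A minimal Python sketch of the S1-S5 classification flow described above is as follows; the equal weights λ1 = λ2 = λ3 = 1/3 used in the similarity function are an illustrative assumption rather than values fixed by the scheme.

```python
import numpy as np

def similarity_score(v: np.ndarray, d: np.ndarray, lambdas=(1/3, 1/3, 1/3)) -> float:
    """Weighted combination of KL divergence, Euclidean distance and cosine similarity (lower = more similar)."""
    p = np.abs(v) / (np.abs(v).sum() + 1e-12)          # treat vectors as discrete distributions for KL
    q = np.abs(d) / (np.abs(d).sum() + 1e-12)
    kl = float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
    ed = float(np.linalg.norm(v - d))
    cs = float(np.dot(v, d) / (np.linalg.norm(v) * np.linalg.norm(d) + 1e-12))
    l1, l2, l3 = lambdas
    return l1 * kl + l2 * ed + l3 * cs

def classify_feature_point(v: np.ndarray, dp: list, dn: list, h: float = 22.0) -> int:
    """Return 1 (positive), -1 (negative) or 0 (uncertain), following steps S1-S5."""
    score_dp = min(similarity_score(v, d) for d in dp)   # S1: closest positive sample
    score_dn = min(similarity_score(v, d) for d in dn)   # S2: closest negative sample
    if score_dp < score_dn and score_dp < h:             # S3
        return 1
    if score_dp > score_dn and score_dn > h:             # S4
        return -1
    return 0                                             # S5
```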
Feature points of category y_i = -1 are excluded, and feature points of categories y_i = 1 and y_i = 0 are taken as candidate key points, which then enter the key-point matching and tracking stage of step 2.
As shown in fig. 2 (a), this embodiment screens out 4 candidate key points (black dots) on the vibrating table; fig. 2 (b) is the image after graying and binarization. As long as the threshold h is set properly, most of the irrelevant information can be filtered out, leaving the candidate key points and part of the noise. However, the candidate key points retained in this step may still contain irrelevant information, so non-key-point samples need to be further excluded when the key points are matched.
Step 2, creating a tracker list of candidate key points of the previous frame, and matching the candidate key points in the current frame based on the tracker list; if the tracker of the previous frame is not matched with the candidate key point, estimating the position of the candidate key point of the current frame; if the candidate key points which are not matched exist in the current frame, a new tracker is created for the candidate key points and added to a tracker list, so that a final key point is formed.
In order to match and track the new position of each candidate key point in the current frame as accurately as possible, the scheme balances speed against precision: based on the candidate key-point detections of the previous frame and of the current frame, the displacement parameters of the candidate key points are estimated from the detection pairs of the two consecutive frames.
Step1: creating a corresponding tracker list {T_i^t} according to the detected candidate key points, for storing the displacement time sequence of the same candidate key point over adjacent frames, wherein T_i^t denotes the tracker of the i-th candidate key point at frame t, storing the coordinate sequence of the i-th candidate key point from frame 1 to frame t, and (x_i^t, y_i^t) denotes the coordinates of the i-th candidate key point in frame t.
Step2: the optimal and non-repeated candidate key points in the current frame are matched based on the tracker of the previous frame.
Spatial distance and semantic consistency are considered jointly: the coordinates stored in the trackers of the previous frame, the coordinates of each candidate key point of the current frame and the agreement of the corresponding categories are used to match the candidate key points of the two frames through the similarity sim_{i,j}, which combines the Euclidean distance ED(T_i, K_j), normalized by the scale s, with the Kronecker function kd(y_i, y_j) of the two categories;

wherein sim_{i,j} is the similarity of the candidate key points of the previous and current frames; T_i is the coordinate sequence of the i-th tracker in the previous frame; K_j is the coordinate of the j-th candidate key point in the current frame; y_i and y_j are the categories corresponding to the candidate key points of the previous frame and the current frame, respectively; s is the scale of the candidate key point (i.e. the square root of the area of its minimum circumscribed rectangle); ED() is the Euclidean distance; kd() is the Kronecker delta function, which checks whether the two categories are identical and equals 1 if they are and 0 otherwise. The meaning of the similarity sim_{i,j} is that the spatial proximity and the semantic consistency of the candidate key points in the two frames are considered comprehensively to obtain a single similarity value: the higher the value, the better the match; the lower the value, the worse the match.
Step3: if the i-th tracker of the previous frame is successfully matched with a candidate key point in the current frame, the coordinates (x_i^t, y_i^t) of that candidate key point are appended to the i-th tracker, i.e. T_i^t = [T_i^{t-1}, (x_i^t, y_i^t)], where T_i^{t-1} denotes the tracker of the i-th candidate key point at frame t-1.
Step4: and repeating Step2-Step3 until all trackers of the previous frame are matched with the candidate key points of the current frame or all candidate key points of the current frame are matched by the trackers of the previous frame.
Step5: for trackers which are not matched with the candidate key or are lost due to factors such as shielding, blurring and the like in the image, the scheme does not directly reject the candidate key points from a tracker list, but provides the following three available strategies for further judgment:
Strategy 1: the most likely location of the missing candidate keypoints is predicted using kalman filtering, and the most similar candidate keypoints are searched around the location, if found, the tracker is updated, otherwise the prediction state is maintained until a suitable candidate keypoint is found or the maximum allowed number of missing frames is exceeded.
Kalman filtering is a recursive filtering algorithm based on a state space model, and can utilize a dynamic model and an observation model of a system to optimally estimate the state of the system so as to predict the most probable candidate key points. The Kalman filtering algorithm firstly predicts the state of the previous moment according to a dynamic model of the system to obtain prior estimation; and then correcting the prior estimation by using the observation value at the current moment according to the observation model to obtain posterior estimation. The Kalman filtering has the advantage of being capable of effectively processing uncertainty and noise interference of the system to improve the accuracy of state estimation.
Assuming that the position (x_t, y_t) of a lost candidate key point follows a two-dimensional Gaussian distribution, the dynamic model and the observation model of the system are defined by taking the position of the candidate key point as the system state and the pixel values in the image as the observations.

Dynamic model:

x_t = x_{t-1} + ẋ_{t-1}·Δt + w_x
y_t = y_{t-1} + ẏ_{t-1}·Δt + w_y

wherein Δt is the time interval between two frames; w_x and w_y are system noise following a Gaussian distribution.

Observation model:

z_t = [ h_x·x_t + v_x , h_y·y_t + v_y ]ᵀ

wherein z_t is the observation at the current moment, i.e. the estimated candidate key point in the current frame image; h_x and h_y are elements of the observation matrix and represent the mapping relation between the candidate key point and the pixel values; v_x and v_y are observation noise following a Gaussian distribution. According to the Kalman filter algorithm flow, a prediction step and an update step can then be carried out for the candidate key point.
Prediction step:

x̂_t⁻ = F·x̂_{t-1}
P_t⁻ = F·P_{t-1}·Fᵀ + Q

wherein x̂_t⁻ is the prior estimate and x̂_{t-1} is the posterior estimate of the previous moment; P_t⁻ is the prior error covariance matrix; P_{t-1} is the posterior error covariance matrix of the previous moment; F is the state transition matrix; Q is the system noise covariance matrix.

Update step:

The posterior estimate and the posterior error covariance matrix are obtained according to the observation model and the observation at the current moment:

x̂_t = x̂_t⁻ + K_t·(z_t − H·x̂_t⁻)
P_t = (I − K_t·H)·P_t⁻

wherein x̂_t is the posterior estimate; K_t is the Kalman gain matrix; H is the observation matrix; I is the identity matrix. The Kalman gain matrix is calculated as follows:

K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + R)⁻¹

wherein Hᵀ is the transpose of H and R is the observation noise covariance matrix.
The optimal estimate and estimation error of the candidate keypoint location is obtained by a kalman filter, and when the candidate keypoint is lost, the prior estimate can be used as a predicted location, and the most similar candidate keypoint can be searched near the location. If so, the tracker is updated, otherwise the prediction state is maintained until a suitable match is found or the maximum allowable number of lost frames is exceeded.
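A minimal sketch of Strategy 1, using a constant-velocity Kalman filter over the image-plane position, is given below; the state layout, noise magnitudes and initial covariance are assumptions for illustration.

```python
import numpy as np

class KeypointKalman:
    """Constant-velocity Kalman filter over (x, y, vx, vy) for predicting a lost key point's position."""

    def __init__(self, x0: float, y0: float, dt: float = 1.0):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)        # state: position and velocity
        self.P = np.eye(4) * 10.0                                  # posterior error covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                                  # system noise covariance
        self.R = np.eye(2) * 1.0                                   # observation noise covariance

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x                                   # prior estimate
        self.P = self.F @ self.P @ self.F.T + self.Q               # prior error covariance
        return self.x[:2]                                          # predicted (x, y)

    def update(self, z: np.ndarray) -> None:
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                   # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)                # posterior estimate
        self.P = (np.eye(4) - K @ self.H) @ self.P                 # posterior error covariance
```

When a tracker loses its key point, predict() gives the search position; if a sufficiently similar candidate is found near it, update() is called with that coordinate, otherwise the predicted state is kept until the maximum allowed number of lost frames is exceeded.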
Strategy 2: the motion direction and speed of the lost candidate key points are estimated by using a light flow method, the most similar candidate key points are found along the motion track in the direction, if the candidate key points are found, the tracker is updated, otherwise, the motion state is kept until the suitable candidate key points are found to match or the maximum allowable lost frame number is exceeded.
The optical flow method computes the motion speed and direction of pixels in an image based on the assumption of brightness (gray level or color) constancy: if the time interval between two frames is small, the gray value of a pixel belonging to the same target object does not change noticeably between the two frames, i.e.

I(x, y, t) = I(x + u·δt, y + v·δt, t + δt)

wherein I(x, y, t) is the gray value at position (x, y) in the image at time t; u and v are the displacements of the pixel at (x, y) in the horizontal and vertical directions per unit time, i.e. the optical flow vector.
According to the Taylor expansion, I(x + u·δt, y + v·δt, t + δt) can be approximated as:

I(x, y, t) + (∂I/∂x)·u·δt + (∂I/∂y)·v·δt + (∂I/∂t)·δt + higher-order terms

Ignoring the higher-order terms gives the optical flow constraint equation:

I_x·u + I_y·v + I_t = 0

or, in matrix form:

∇I · [u, v]ᵀ = −I_t

wherein ∇I = (I_x, I_y) is the gradient vector of the image.

The optical flow equation relates the motion speed and direction of each pixel to the gradient and the temporal change rate of the image. However, it contains the two unknowns u and v and therefore cannot be solved directly; this is the aperture problem. To resolve it, the scheme introduces an additional constraint: within a small neighborhood all pixels are assumed to share the same motion speed and direction, i.e. u(x, y) = u and v(x, y) = v, so that the optical flow constraint equation can be applied to every pixel in the neighborhood and solved as a least-squares problem:

min_{u,v} Σ_{(x,y)∈M} ( I_x(x, y)·u + I_y(x, y)·v + I_t(x, y) )²

where M is the set of pixels in the neighborhood. Solving this least-squares problem gives the optimal solution:

[u, v]ᵀ = (AᵀA)⁻¹·Aᵀ·b, where the rows of A are [I_x(x, y), I_y(x, y)] and the entries of b are −I_t(x, y) for (x, y) ∈ M.
This optical-flow constraint is computationally simple and suited to small motion displacements, which makes it appropriate for the target-object vibration detection of this scheme. It is used to estimate the motion direction and speed of the lost candidate key point and to search for the most similar candidate key point along the motion trajectory in that direction; if one is found, the tracker is updated, otherwise the motion state is maintained until a suitable candidate key point is matched or the maximum allowed number of lost frames is exceeded.
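A minimal sketch of Strategy 2 using OpenCV's pyramidal Lucas-Kanade tracker is shown below; the window size and pyramid depth are illustrative assumptions.

```python
import cv2
import numpy as np

def recover_by_optical_flow(prev_gray: np.ndarray, curr_gray: np.ndarray, lost_point):
    """Estimate where a lost candidate key point moved to, using pyramidal Lucas-Kanade optical flow.

    Returns the predicted (x, y) in the current frame, or None if the flow could not be tracked.
    """
    p0 = np.array([[lost_point]], dtype=np.float32)               # shape (1, 1, 2) as expected by OpenCV
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, p0, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    if status[0][0] == 1:                                          # flow found for this point
        x, y = p1[0][0]
        return float(x), float(y)                                  # predicted position in the current frame
    return None                                                    # keep the motion state / retry next frame
```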
Strategy 3: a neural network model capable of identifying and recovering lost candidate key points is trained by using a deep learning method, the neural network model is utilized to generate the most probable candidate key points in the current frame, if the generation is successful, the tracker is updated, otherwise, the generation state is kept until the appropriate candidate key points are generated to match or exceed the maximum allowable lost frame number.
The deep learning method is a machine learning method based on multi-layer neural networks; it can automatically learn features and rules from large amounts of data and realize complex functions and tasks. Its advantage is the ability to handle high-dimensional, nonlinear and unstructured data, which makes it suitable for fields such as images, speech and text. The scheme uses a deep learning method to train a neural network model capable of identifying and recovering lost candidate key points and uses this model to generate the most probable candidate key point in the current frame. The mean square error (Mean Squared Error, MSE) is used as the position loss function, the cross entropy (Cross Entropy) as the label loss function, and their sum as the total loss function. A stochastic gradient descent (Stochastic Gradient Descent, SGD) algorithm is used to optimize the parameters of the neural network model. The most probable position of the lost candidate key point is obtained by the deep learning method and the tracker is updated until a suitable match is generated or the maximum allowed number of lost frames is exceeded.
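The loss construction described above can be sketched as follows in PyTorch; the network architecture is not specified in this text, so the small fully-connected model here is purely an illustrative assumption.

```python
import torch
import torch.nn as nn

class KeypointRecoveryNet(nn.Module):
    """Toy network predicting a lost key point's position (2 values) and its category logits (3 classes)."""

    def __init__(self, in_dim: int = 11, num_classes: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.pos_head = nn.Linear(64, 2)
        self.cls_head = nn.Linear(64, num_classes)

    def forward(self, x):
        h = self.backbone(x)
        return self.pos_head(h), self.cls_head(h)

model = KeypointRecoveryNet()
mse_loss = nn.MSELoss()                      # position loss
ce_loss = nn.CrossEntropyLoss()              # label loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def training_step(features, true_pos, true_label):
    optimizer.zero_grad()
    pred_pos, pred_logits = model(features)
    loss = mse_loss(pred_pos, true_pos) + ce_loss(pred_logits, true_label)   # total loss = MSE + cross entropy
    loss.backward()
    optimizer.step()
    return loss.item()
```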
In all three strategies, the maximum allowed number of lost frames is set to 5: if a tracker fails to track its candidate key point, the position of the candidate key point is estimated through one of the three strategies for up to 5 consecutive frames. If after more than 5 frames the candidate key-point position still cannot be recovered, the tracker is deleted.
Step6: for candidate key points which are not successfully matched in the current frame, a new tracker is created for the candidate key pointsAnd adds it to the tracker list.
Through the steps, the tracker which is not matched with the candidate key points and the candidate key points which are not matched with the tracker are further processed to obtain final key points, and the next vibration analysis is carried out on the key points.
And step 3, analyzing vibration signals of the key points based on the key points in the tracker list to obtain the vibration frequency and the vibration amplitude of the target object.
Since the camera captures two-dimensional images, at most the vibration information in the x and y directions can be acquired directly; vibration information in the z-axis direction can be obtained by adding a camera viewing from another direction (for example a plane perpendicular to the x and y axes) or by using a depth camera to capture the images.

The acquired vibration signal h(t) of a key point can be expressed as the time sequence

h(t) = P(x, y, z, t), t1 ≤ t ≤ t2

wherein h(t) is the acquired time-sequence vibration signal; t1 and t2 are the start and end times of vibration signal acquisition; P(x, y, z, t) represents the vibration information of the spatial position (x, y, z) at time t.
The vibration signal is analyzed with one or more analysis algorithms to obtain its vibration frequency and vibration amplitude. The analysis algorithms include, but are not limited to, the Fourier transform, the wavelet transform, power spectral density estimation or other algorithms that can estimate the vibration frequency; the dominant frequency is then found by algorithms such as peak detection, after which the vibration amplitude is obtained.
The present embodiment uses the Fourier transform. The Fourier series expression is:

f(t) = a_0/2 + Σ_{n=1}^{∞} [ a_n·cos(n·w·t) + b_n·sin(n·w·t) ]

wherein f(t) is the time-domain signal; w is the fundamental frequency; a_0, a_n, b_n are the Fourier coefficients. When the period of the periodic function tends to infinity, the Fourier series, combined with Euler's formula, becomes the Fourier transform:

F(jw) = ∫_{-∞}^{+∞} f(t)·e^{-jwt} dt

wherein j is the imaginary unit; wt is the amplitude angle (argument) of the complex exponential on the complex plane, i.e. the angle between the complex number and the positive real axis; F(jw) is the frequency-domain representation; w is the frequency; e is Euler's number.
If the camera is used to record the vibration information, a key point can be selected, its length l (in pixels) in the image measured, and its displacement d (in pixels) in the horizontal and vertical directions calculated by comparing images at different time instants. Then, using the principle of similar triangles, the actual displacement A (in millimeters) of the key point is calculated from the actual size s (in millimeters) represented by the image and the resolution r (in pixels per millimeter) of the image, by converting the pixel displacement d to physical units through the pixel-to-millimeter scale of the image;

where A is the vibration amplitude, i.e. the maximum offset distance of the target object in real space.
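As an illustration of step 3, the following sketch estimates the dominant vibration frequency of a key point's displacement time series with an FFT and converts the peak pixel displacement to millimeters; the pixels-per-millimeter value is an assumed calibration input.

```python
import numpy as np

def analyze_vibration(displacement_px: np.ndarray, fps: float, px_per_mm: float):
    """Return (dominant vibration frequency in Hz, vibration amplitude in mm) from a 1-D pixel displacement series."""
    x = displacement_px - displacement_px.mean()           # remove the static offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    dominant = float(freqs[np.argmax(spectrum[1:]) + 1])   # skip the DC bin, pick the spectral peak
    amplitude_mm = float(np.max(np.abs(x)) / px_per_mm)    # max pixel offset converted by the image scale
    return dominant, amplitude_mm
```

For example, analyze_vibration(d_y, fps=240, px_per_mm=8.0) would report the vertical vibration of one key point under those assumed camera settings.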
And 4, forming a visual report of the vibration frequency and the vibration amplitude of the target object.
On the basis of obtaining the vibration frequency and the vibration amplitude, a real-time visual report is generated in one or more modes, and real-time display is performed. The user can view the vibration condition of the target object in real time through the visualized graphical interface. Visual reports may include, but are not limited to:
1. the distribution of the vibration frequency and the vibration amplitude is shown in the form of a waveform chart, a bar chart, a pie chart, or the like.
2. The abnormal or dangerous areas are identified in the form of colors, sizes, shapes, etc.
3. The user is reminded of paying attention or taking measures in a sound, text, icon or the like manner.
4. The change in vibration frequency and vibration amplitude with time or space is displayed in the form of a graph, a scatter diagram, a thermodynamic diagram, or the like.
5. The real-time state of the target object is displayed in the forms of a three-dimensional model, virtual reality, augmented reality and the like, and the user experience is enhanced in the modes of animation, special effects, interaction and the like.
6. Vibration conditions of different positions or different axes are displayed in the forms of diagrams, tables and the like, and the functions of sorting, screening, comparing and the like are used for helping a user analyze vibration data.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A visual real-time vibration monitoring method is characterized in that: the method comprises the following steps:
Step 1, shooting a video of a target object, preprocessing the video frame images, extracting feature points of the target object, and forming feature vectors; constructing a feature template dictionary P-N, calculating a similarity score between each feature vector and the feature template dictionary P-N, judging the category of the feature point according to the similarity score, the category comprising positive, negative and uncertain, and selecting candidate key points according to the category;
Step 2, creating a tracker list of candidate key points of the previous frame, and matching the candidate key points in the current frame based on the tracker list; if the tracker of the previous frame is not matched with the candidate key point, estimating the position of the candidate key point of the current frame; if the candidate key points which are not matched exist in the current frame, a new tracker is created for the candidate key points and added into a tracker list, so that a final key point is formed;
in the step 2, a tracker list of candidate key points of the previous frame is created, and the step of matching the candidate key points in the current frame based on the tracker list includes:
step1: creating a corresponding tracker list {T_i^t} according to the detected candidate key points, for storing the displacement time sequence of the same candidate key point over adjacent frames, wherein T_i^t denotes the tracker of the i-th candidate key point at frame t, storing the coordinate sequence of the i-th candidate key point from frame 1 to frame t, and (x_i^t, y_i^t) denotes the coordinates of the i-th candidate key point in frame t;
step2: matching optimal and non-repeated candidate key points in the current frame based on a tracker of the previous frame;
using the coordinates stored in the trackers of the previous frame and the coordinates of each candidate key point of the current frame, the candidate key points of the two frames are matched by the similarity sim_{i,j}, which combines the Euclidean distance ED(T_i, K_j) between the tracker coordinates and the candidate key-point coordinates, normalized by the scale s, with the Kronecker function kd(y_i, y_j) of the two categories;

wherein sim_{i,j} is the similarity of the candidate key points of the previous and current frames; T_i is the coordinate sequence of the i-th tracker in the previous frame; K_j is the coordinate of the j-th candidate key point in the current frame; y_i and y_j are the categories corresponding to the candidate key points of the previous frame and the current frame, respectively; s is the scale of the candidate key points; ED() is the Euclidean distance; kd() is the Kronecker function;

step3: if the i-th tracker of the previous frame is successfully matched with a candidate key point in the current frame, the coordinates (x_i^t, y_i^t) of that candidate key point are appended to the i-th tracker, i.e. T_i^t = [T_i^{t-1}, (x_i^t, y_i^t)], where T_i^{t-1} denotes the tracker of the i-th candidate key point at frame t-1;
step4: repeating Step2-Step3 until all trackers of the previous frame are matched with the candidate key points of the current frame or all candidate key points of the current frame are matched by the trackers of the previous frame;
step 3, analyzing vibration signals of key points based on the key points in the tracker list to obtain the vibration frequency and the vibration amplitude of the target object;
In the step 3, the step of analyzing the vibration signal of the key point based on the key point in the tracker list includes:
adding the direction perpendicular to the plane of the x axis and the y axis as the z axis, the vibration signal h(t) of the key point is expressed as the time sequence

h(t) = P(x, y, z, t), t1 ≤ t ≤ t2

wherein h(t) is the acquired time-sequence vibration signal; t1 and t2 are the start and end times of vibration signal acquisition; P(x, y, z, t) represents the vibration information of the spatial position (x, y, z) at time t;
in the step 3, the step of obtaining the vibration frequency of the target object includes:
the Fourier series expression is:

f(t) = a_0/2 + Σ_{n=1}^{∞} [ a_n·cos(n·w·t) + b_n·sin(n·w·t) ]

wherein f(t) is the time-domain signal; t is time; w is the fundamental frequency; a_0, a_n, b_n are the Fourier coefficients; when the period tends to infinity, the Fourier series becomes the Fourier transform

F(jw) = ∫_{-∞}^{+∞} f(t)·e^{-jwt} dt

wherein j is the imaginary unit; wt is the amplitude angle (argument) of the complex number on the complex plane; F(jw) is the frequency-domain representation; w is the frequency; e is Euler's number;
in the step 3, the step of obtaining the vibration amplitude of the target object includes:
selecting a key point, measuring the length l of the key point in the image, and calculating the displacement d of the key point in the horizontal and vertical directions by comparing images at different time instants; calculating the actual displacement A of the key point from the actual size s represented by the image and the resolution r of the image by the principle of similar triangles, converting the pixel displacement d into physical units through the pixel-to-millimeter scale of the image;

wherein A is the vibration amplitude, i.e. the maximum offset distance of the target object in actual space;
And 4, forming a visual report of the vibration frequency and the vibration amplitude of the target object.
2. A method of visual real-time vibration monitoring according to claim 1, wherein: in the step 1, a video of a target object is shot, feature points of the target object are extracted after preprocessing a video frame image, and feature vectors are formed, and the method comprises the steps of:
Shooting a video of a target object by using a camera, wherein preprocessing on a video frame image comprises graying processing and binarizing processing;
graying the acquired video frame images by a weighted average method:

Gray = 0.299·R + 0.587·G + 0.114·B

wherein Gray represents the grayed image and R, G, B represent the red, green and blue channels of the color image, respectively;

binarizing the grayscale image to obtain a binarized image, wherein the binarization uses a fixed global threshold T: each pixel of the grayscale image is compared with T, and if its gray value is greater than or equal to T the pixel is set to white (255), otherwise it is set to black (0); the binarization formula is as follows:

B(x, y) = 255 if Gray(x, y) ≥ T, and B(x, y) = 0 otherwise

wherein B(x, y) represents the pixel value at coordinates (x, y) in the binarized image and Gray(x, y) represents the pixel value at coordinates (x, y) in the grayscale image;
Detecting and extracting feature points from the binarized image, wherein the feature points are the outline of the target object, and a curve formed by connecting the feature points with the same gray value represents outline information;
generating an 11-dimensional feature vector V_i ∈ R^{1×11} for each feature point detected in the current frame, wherein the feature vector V_i consists of the following 11 elements, i represents the i-th feature point, N' is the total number of feature points, and i ∈ N':

V_i = { abscissa x_i of the feature point center, ordinate y_i of the feature point center, length of the minimum bounding rectangle of the feature point, width of the minimum bounding rectangle of the feature point, number of points on the target object contour, indexes of the next contour, the previous contour, the first child contour and the parent contour of the target object contour, average pixel value inside the feature point contour, average pixel value in a band extending 10 pixels outward from the feature point contour }.
3. A method of visual real-time vibration monitoring according to claim 1, wherein: in the step 1, the constructed feature template dictionary P-N comprises a positive sample dictionary DP and a negative sample dictionary DN, wherein the positive sample dictionary DP contains key-point objects and the negative sample dictionary DN contains non-key-point objects.
4. A visual real-time vibration monitoring method according to claim 3, wherein: in the step 1, the step of calculating the similarity score between the feature vector and the feature template dictionary P-N includes:
The calculation expression of the similarity score is as follows:

score_i = λ1·KL(V_i, D_{p,n}) + λ2·ED(V_i, D_{p,n}) + λ3·CS(V_i, D_{p,n})

wherein V_i represents the feature vector of the i-th feature point, there being N' feature points in total, i ∈ N'; D_{p,n} denotes the feature vector of the sample nearest to V_i in the feature template dictionary P-N; λ1, λ2, λ3 are weight coefficients satisfying λ1 + λ2 + λ3 = 1; KL() represents the KL divergence; ED() represents the Euclidean distance; CS() represents the cosine similarity;

the expression of the KL divergence is as follows:

KL(V_i, D_{p,n}) = Σ_u V_i(u) · log( V_i(u) / D_{p,n}(u) )

the expression of the Euclidean distance is as follows:

ED(V_i, D_{p,n}) = sqrt( Σ_{k=1}^{m} ( V_{i,k} − D_{p,n,k} )² )

the expression of the cosine similarity is as follows:

CS(V_i, D_{p,n}) = ( Σ_{k=1}^{m} V_{i,k} · D_{p,n,k} ) / ( ‖V_i‖ · ‖D_{p,n}‖ )

wherein V_i and D_{p,n} are two m-dimensional vectors; V_{i,k} is the component of V_i in the k-th dimension, D_{p,n,k} is the component of D_{p,n} in the k-th dimension, k ∈ m.
5. A method of visual real-time vibration monitoring according to claim 4, wherein: in the step 1, the step of judging the category of the feature point according to the similarity score, wherein the category comprises positive, negative and uncertain, and the step of taking the category as the candidate key point comprises the following steps:
setting a threshold h, and combining the feature vector V_i of each feature point with the feature template dictionary P-N, the category of the feature point is obtained through the following calculation, the category comprising positive, negative and uncertain:

S1: calculating the minimum similarity score score_DP between the feature vector V_i and the samples in the positive sample dictionary DP;

S2: calculating the minimum similarity score score_DN between the feature vector V_i and the samples in the negative sample dictionary DN;

S3: if score_DP < score_DN and score_DP < h, the category of the feature point is y_i = 1, i.e. the category is positive;

S4: if score_DP > score_DN and score_DN > h, the category of the feature point is y_i = -1, i.e. the category is negative;

S5: otherwise, the category of the feature point is y_i = 0, i.e. the category is uncertain;

the feature points of category y_i = -1 are excluded, and the feature points of categories y_i = 1 and y_i = 0 are used as candidate key points.
CN202311574036.6A 2023-11-23 2023-11-23 Visual real-time vibration monitoring method Active CN117593330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311574036.6A CN117593330B (en) 2023-11-23 2023-11-23 Visual real-time vibration monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311574036.6A CN117593330B (en) 2023-11-23 2023-11-23 Visual real-time vibration monitoring method

Publications (2)

Publication Number Publication Date
CN117593330A CN117593330A (en) 2024-02-23
CN117593330B true CN117593330B (en) 2024-06-14

Family

ID=89909389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311574036.6A Active CN117593330B (en) 2023-11-23 2023-11-23 Visual real-time vibration monitoring method

Country Status (1)

Country Link
CN (1) CN117593330B (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104112282B (en) * 2014-07-14 2017-01-11 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN109903311A (en) * 2019-01-24 2019-06-18 中国矿业大学 It is a kind of improve TLD mine under video target tracking method
CN109949340A (en) * 2019-03-04 2019-06-28 湖北三江航天万峰科技发展有限公司 Target scale adaptive tracking method based on OpenCV
CN110068388A (en) * 2019-03-29 2019-07-30 南京航空航天大学 A kind of method for detecting vibration of view-based access control model and blind source separating
CN110956645B (en) * 2019-08-28 2023-10-31 深圳市广宁股份有限公司 Intelligent vibration detection method and device for multimode output
CN110569793B (en) * 2019-09-09 2022-06-07 西南交通大学 Target tracking method for unsupervised similarity discrimination learning
KR102614895B1 (en) * 2021-02-09 2023-12-19 주식회사 라온버드 Real-time object tracking system and method in moving camera video
CN113379794B (en) * 2021-05-19 2023-07-25 重庆邮电大学 Single-target tracking system and method based on attention-key point prediction model
CN114235072B (en) * 2021-12-17 2023-04-18 电子科技大学 Zero-crossing detection-based Coriolis flowmeter phase difference calculation method
CN114529584A (en) * 2022-02-21 2022-05-24 沈阳理工大学 Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography
CN115908485A (en) * 2022-10-31 2023-04-04 之江实验室 Real-time pose tracking method and system for non-cooperative target in space
CN115950521A (en) * 2022-12-19 2023-04-11 西安讯飞超脑信息科技有限公司 Vibration frequency detection method and device, computer device and storage medium
CN116993601A (en) * 2023-06-27 2023-11-03 合肥工业大学 Visual vibration measurement method, system and device based on Canny-Harris point tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on fault diagnosis of turbine based on similarity measures between vague sets; Ye Jun; Web of Science; 20060131; full text *
Research on target tracking and trajectory prediction of intelligent vehicles based on vision and radar fusion; Song Shiping; China Doctoral Dissertations Full-text Database; 20220415; full text *

Also Published As

Publication number Publication date
CN117593330A (en) 2024-02-23

Similar Documents

Publication Publication Date Title
EP2801078B1 (en) Context aware moving object detection
EP1995691B1 (en) Method and apparatus for segmenting a motion area
CN103279765B (en) Steel wire rope surface damage detection method based on images match
US7822275B2 (en) Method for detecting water regions in video
CN101167086A (en) Human detection and tracking for security applications
WO2014092552A2 (en) Method for non-static foreground feature extraction and classification
CN101957325A (en) Substation equipment appearance abnormality recognition method based on substation inspection robot
CN104268505A (en) Automatic cloth defect point detection and recognition device and method based on machine vision
Prasetyo et al. A comparison of yolo and mask r-cnn for segmenting head and tail of fish
EP3151160B1 (en) Visual attention detector and visual attention detection method
CN115620212B (en) Behavior identification method and system based on monitoring video
CN108171119B (en) SAR image change detection method based on residual error network
CN102346854A (en) Method and device for carrying out detection on foreground objects
Tsintotas et al. DOSeqSLAM: Dynamic on-line sequence based loop closure detection algorithm for SLAM
Gal Automatic obstacle detection for USV’s navigation using vision sensors
Salimpour et al. Self-calibrating anomaly and change detection for autonomous inspection robots
Kostavelis et al. Supervised traversability learning for robot navigation
CN117114420B (en) Image recognition-based industrial and trade safety accident risk management and control system and method
CN116703895B Small-sample 3D visual detection method and system based on a generative adversarial network
CN117593330B (en) Visual real-time vibration monitoring method
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium
CN117274623A (en) Identification method and system for intelligent retrieval and identification of forbidden articles
CN109558771B (en) Behavior state identification method, device and equipment of marine ship and storage medium
JP6893812B2 (en) Object detector
CN108710886A (en) A kind of multiimage matching process based on SIFT algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant