CN109064444B - Track slab disease detection method based on significance analysis - Google Patents

Track slab disease detection method based on significance analysis

Info

Publication number
CN109064444B
Authority
CN
China
Prior art keywords
frame
video
value
color
monitoring
Prior art date
Legal status
Active
Application number
CN201810687093.8A
Other languages
Chinese (zh)
Other versions
CN109064444A (en)
Inventor
姚莉
吴琼颖
吴含前
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201810687093.8A
Publication of CN109064444A
Application granted
Publication of CN109064444B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G06T7/0008 Industrial image inspection checking presence/absence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The invention discloses a track slab disease detection method based on significance analysis, which mainly comprises the following steps. (1) The video is preprocessed for illumination, and the spatial saliency of the vehicle-mounted video is extracted by fusing color, brightness and direction features. (2) On one hand, the track video of the monitoring period is sparsely sampled, the current monitoring video frame is compared with the corresponding sampled video frame, and frames with large changes are selected as reliable frames; on the other hand, track video frames whose features change abruptly relative to the previous and next frames are also selected as reliable frames. (3) A rough space-time saliency map is obtained by fusing the spatial saliency map and the temporal saliency map, and is then refined by taking background prior, center prior and spatial compactness factors into account, yielding an improved space-time saliency map. (4) A track disease saliency weight is derived from the saliency map, and the high-speed railway track diseases are identified and classified with this weight using a semi-supervised classification method based on a saliency weighting model. The invention can detect track diseases in real time.

Description

Track slab disease detection method based on significance analysis
Technical Field
The invention relates to the field of computer vision and image processing, in particular to a track slab disease detection method based on significance analysis.
Background
Smooth operation of high-speed trains relies on high-quality high-speed railway tracks; track quality is therefore directly related to operational safety. However, high-speed railway tracks are frequently damaged because they are exposed outdoors and affected by various environmental conditions. Moreover, high-speed railways have been put into large-scale operation only in recent decades, so the development of related disease detection work has lagged behind, and in many places disease detection still relies on costly manual inspection. In recent years, the continuous development of intelligent detection systems has made intelligent track disease detection methods possible. In China, the GJ-6 track inspection system has been developed using laser-camera measurement technology, but such detection methods have low identification accuracy for surface-layer diseases of high-speed railway tracks.
At present, rail disease detection methods at home and abroad still have problems to be solved:
(1) Manual inspection and traditional detection systems cannot capture track disease characteristics in a timely manner, and the results obtained at high cost do not present the diseases well.
(2) Existing detection methods that process video data examine every video frame; they have high complexity, are time-consuming, are difficult to apply in scenarios with strict real-time requirements, and fail to highlight the key disease areas.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a track slab disease detection method based on significance analysis.
The technical scheme is as follows: the track slab disease detection method based on significance analysis comprises the following steps:
(1) acquiring a track slab monitoring video in a monitoring period, and performing illumination preprocessing on the monitoring video so that its illumination intensity is uniform;
(2) extracting color, brightness and direction features from the preprocessed monitoring video frames of the monitoring period with the Itti algorithm, and fusing them to obtain a spatial saliency map;
(3) sparsely sampling the historical monitoring video, computing the difference between the sampled video frames and the monitoring-period video frames preprocessed in step (1), and selecting the monitoring-period frames whose difference exceeds a preset threshold as reliable frames;
(4) for the monitoring-period video preprocessed in step (1), selecting frames whose features change abruptly relative to the previous and next frames as reliable frames;
(5) computing a temporal saliency value for all reliable frames with an adaptive transmission method to form a temporal saliency map;
(6) fusing the spatial saliency map and the temporal saliency map to obtain a rough space-time saliency map;
(7) refining the rough space-time saliency map by taking background prior, center prior and spatial compactness factors into account, obtaining an improved space-time saliency map;
(8) deriving a track disease saliency weight from the improved space-time saliency map, and identifying and classifying the high-speed railway track diseases with this weight using a semi-supervised classification method based on a saliency weighting model.
Further, the step (1) specifically comprises:
(1.1) acquiring a monitoring video of the track slab;
(1.2) cropping the image of the key focus position from each frame of the monitoring video with the Range function of the OpenCV library, ignoring the dark areas formed by shadows at the bottom of the frame;
(1.3) equalizing the histogram of each cropped image with the cvEqualizeHist function of the OpenCV library so that the illumination intensity becomes uniform.
Further, the step (3) specifically comprises:
(3.1) carrying out sparse sampling on the historical monitoring video to obtain a plurality of historical sampling video frames;
(3.2) dividing the preprocessed monitoring-period video into several video segments, obtaining a video segment set V = {v_i}, i = 1, …, n, where n is the number of video segments;
(3.3) for each video segment v_i, screening out the significant blocks from all of its frames;
(3.4) projecting the RGB color values of each significant block into a 10 × 10 × 10 three-dimensional color model, and performing Gaussian diffusion on the model;
(3.5) forming a one-dimensional vector from the color values of all points of the three-dimensional model and taking it as the color distribution vector of the current significant block;
(3.6) arranging the color distribution vectors of all significant blocks belonging to one video frame in order to form a color matrix, and performing sparse low-rank matrix decomposition on the color matrix with RPCA to obtain a sparse matrix;
(3.7) calculating the deviation between the sparse matrix obtained by the decomposition and each historical sampled video frame, accumulating the deviations, and selecting the video frame whose accumulated deviation E_SUM exceeds a preset threshold as the reliable frame of video segment v_i, i = 1, …, n.
Further, the step (3.3) specifically comprises:
(3.3.1) dividing each frame of video segment v_i into image blocks of a × a pixels, where a is the block size and 0 < a < 10;
(3.3.2) eliminating the image blocks located at the boundary;
(3.3.3) for each image block remaining after step (3.3.2), calculating its average correlation difference with its four adjacent image blocks;
(3.3.4) taking the image blocks whose average correlation difference exceeds a preset threshold as significant blocks.
Further, the step (3.4) specifically comprises:
(3.4.1) equally dividing the R, G, B value range 0-255 of the RGB color model into 10 intervals each, thereby forming a 10 × 10 × 10 three-dimensional color model, i.e. a 10 × 10 × 10 cube with a total of 1000 points.
(3.4.2) projecting the RGB color values of each pixel of the saliency block into the three-dimensional space color model and three-dimensionally diffusing the projections using a Gaussian sphere model.
Wherein, the coordinate conversion formula of the projection is as follows:
x_p = ceil(R_p / 25.6), y_p = ceil(G_p / 25.6), z_p = ceil(B_p / 25.6)
where (R_p, G_p, B_p) is the RGB color value of pixel p in the significant block, (x_p, y_p, z_p) is the coordinate to which pixel p is projected in the three-dimensional color model, C_(x_p, y_p, z_p) is the color value at coordinate (x_p, y_p, z_p) in the model, and ceil() denotes the ceiling function.
Since 10 × 10 × 10 is far larger than the number of pixels in a significant block, the color distribution would otherwise be sparse; the projection is therefore diffused in three dimensions with a Gaussian sphere model: the color value C_pos at each projected position pos is spread to the surrounding coordinates with a Gaussian weight, so that the diffused color value C_(x,y,z) at coordinate (x, y, z) decreases with the Manhattan distance d between position pos and position (x, y, z).
Further, the step (4) specifically comprises:
(4.1) acquiring the monitoring video in the monitoring period preprocessed in the step (1);
(4.2) for each frame of the monitoring video in the monitoring period, respectively calculating the brightness difference value, the color difference value and the texture difference value of the frame with the previous frame and the next frame according to the following formulas:
L_i = (l(P_i) - l(P_(i-1))) + (l(P_i) - l(P_(i+1)))
C_i = (c(P_i) - c(P_(i-1))) + (c(P_i) - c(P_(i+1)))
T_i = (t(P_i) - t(P_(i-1))) + (t(P_i) - t(P_(i+1)))
where P_i denotes the current video frame, P_(i-1) and P_(i+1) denote the previous and next frames, L_i, C_i, T_i denote the brightness, color and texture differences of the current frame P_i with respect to the previous and next frames, and l(·), c(·), t(·) denote the brightness value, color value and texture value of a video frame;
(4.3) normalizing the brightness, color and texture differences of each frame with respect to its previous and next frames, and combining them into the overall feature difference of the frame
D_i = δ_L·L_i + δ_C·C_i + δ_T·T_i
where D_i denotes the overall feature difference of video frame P_i from its previous and next frames, and δ_L, δ_C, δ_T are the weights of the brightness, color and texture features, respectively;
(4.4) when the overall feature difference D_i exceeds a preset threshold, the video frame is considered to have an abrupt feature change and is taken as a reliable frame.
Further, the step (5) specifically comprises:
(5.1) acquiring all reliable frames, and calculating for each reliable frame the accumulated deviation E_(SUM,j) from the historical frames and the overall feature difference D_j from its previous and next frames;
(5.2) with the adaptive transmission method, calculating the temporal saliency value of each reliable frame as
S_j = W_j / WSUM
where S_j is the temporal saliency value of the j-th reliable frame, WSUM = Σ_k W_k is the final normalization weight, and W_k is the weight of the k-th reliable frame, determined from its accumulated deviation E_(SUM,k) and overall feature difference D_k.
(5.3) the temporal saliency values of all reliable frames form a temporal saliency map.
Beneficial effects: compared with the prior art, the invention has the following remarkable advantages:
(1) The invention detects track diseases by analyzing video saliency, so that the track disease characteristics are highlighted and intelligent track disease detection is achieved.
(2) The method improves the traditional video saliency algorithm, ensures that the significant disease areas are extracted, and improves the computational efficiency and detection rate for the specific track diseases.
(3) The fused rough space-time saliency map is optimized: by taking background prior, center prior and spatial compactness factors into account, a result map is obtained that better matches human visual attention and highlights the disease characteristics.
(4) The invention provides a semi-supervised classification method based on a saliency weighting model for identifying and classifying the diseases.
Drawings
FIG. 1 is a schematic flow diagram of one embodiment of the present invention.
Detailed Description
This embodiment provides a track slab disease detection method based on significance analysis. As shown in FIG. 1, the method includes the following steps.
Step one: acquiring a track slab monitoring video in a monitoring period and performing illumination preprocessing on the monitoring video so that its illumination intensity is uniform. This step specifically comprises:
(1.1) acquiring a monitoring video of the track slab;
(1.2) cropping the image of the key focus position from each frame of the monitoring video with the Range function of the OpenCV library, ignoring the dark areas formed by shadows at the bottom of the frame;
(1.3) equalizing the histogram of each cropped image with the cvEqualizeHist function of the OpenCV library so that the illumination intensity becomes uniform.
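For illustration only, a minimal sketch of this preprocessing step is given below. Python with the OpenCV bindings is assumed here, and the crop rows and the video file name are placeholder assumptions rather than values fixed by the disclosure:

```python
import cv2

def preprocess_frame(frame, roi_top=0, roi_bottom=400):
    """Crop the key focus region and equalize its histogram (illustrative sketch)."""
    # Crop the focus region and drop the dark shadow band at the bottom of the frame;
    # the row range is a placeholder, not a value taken from the disclosure.
    roi = frame[roi_top:roi_bottom, :]
    # Histogram equalization operates on one channel, so convert to grayscale first.
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # cv2.equalizeHist is the modern counterpart of the cvEqualizeHist function.
    return cv2.equalizeHist(gray)

# Example: preprocess every frame of a monitoring video.
cap = cv2.VideoCapture("track_monitoring.avi")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(preprocess_frame(frame))
cap.release()
```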
Step two: extracting color, brightness and direction features from the preprocessed monitoring video frames of the monitoring period with the Itti algorithm, and fusing them to obtain a spatial saliency map.
The main steps can be summarized into three main modules:
(2.1) extraction of local features
First, the input video image is represented as a 9-level Gaussian pyramid, where layer 0 is the input image and layers 1 to 8 are obtained by filtering the input image with a 5 × 5 Gaussian filter and downsampling, their sizes ranging from 1/2 to 1/256 of the input image. Then the intensity feature I, the color features R, G, B, Y and four orientation features (0°, 45°, 90°, 135°) are extracted for each pyramid layer, forming an intensity pyramid, color pyramids and orientation pyramids. Color and brightness histograms are used to reflect the main distribution of color and brightness in the image when extracting the color and brightness features.
(2.2) calculation of multi-scale center-surround contrast
The Itti algorithm simulates a center-surround structure and takes differences between different scales of the feature pyramid for each feature.
Suppose the center corresponds to the feature-map pixels at scale c (c ∈ {2, 3, 4}) and the surround corresponds to the feature-map pixels at scale s (s = c + δ, δ ∈ {3, 4}). Since feature maps at different scales have different resolutions, the coarser map is first interpolated to the same size; the across-scale point-to-point subtraction is denoted by Θ. For example, I(c, s) denotes the brightness contrast map, and the other features are expressed similarly. The feature contrast between the center (scale c) and the surround (scale s) is then obtained as
I(c, s) = |I(c) Θ I(s)|
(2.3) fusion of the contrast maps
To fuse the feature maps of different scales and different features obtained in (2.2), the Itti model uses a normalization operator N(·). First, the saliency values of all pixels in a map are normalized to a fixed interval [0, M], eliminating the influence of the different value ranges of different features; second, the global maximum M of the map is located and the mean m of all the other local maxima is computed; finally, every position of the map is multiplied by (M - m)². This amplifies the potentially salient regions in each feature map so that their saliency values stand out more clearly against the background.
The feature maps of the different scales of each feature are first normalized and combined into a conspicuity map for that feature, and the conspicuity maps of the different features are then normalized and fused into the final visual saliency map.
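As an illustrative sketch (not part of the original disclosure), the normalization operator N(·) described above might be implemented as follows, with M = 1 and a simple maximum filter standing in for the local-maximum search:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def itti_normalize(feature_map, M=1.0):
    """Itti-style N(.) operator: amplify maps that contain few strong peaks (sketch)."""
    fmap = feature_map.astype(np.float64)
    rng = fmap.max() - fmap.min()
    if rng > 0:
        # 1. Normalize the map to the fixed interval [0, M].
        fmap = M * (fmap - fmap.min()) / rng
    # 2. Locate local maxima with a simple maximum filter and average all local
    #    maxima other than the global maximum M.
    local_max = (fmap == maximum_filter(fmap, size=7))
    peaks = fmap[local_max]
    peaks = peaks[peaks < M]          # exclude the global maximum
    m_bar = peaks.mean() if peaks.size else 0.0
    # 3. Scale the whole map by (M - m_bar)^2, so a map with one dominant peak is
    #    promoted relative to a map with many similar peaks.
    return fmap * (M - m_bar) ** 2
```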
Step three: sparsely sampling the historical monitoring video, computing the difference between the sampled video frames and the monitoring-period video frames preprocessed in step (1), and selecting the monitoring-period frames whose difference exceeds a preset threshold as reliable frames.
In this step the current monitoring video frames are compared with the historical video frames at the corresponding positions: the monitoring-period video is sparsely sampled, and the globally gradual nature of track diseases is exploited to treat them in a targeted way. In addition, selecting reliable frames reduces the amount of computation and improves the computational efficiency.
The method specifically comprises the following steps:
(3.1) carrying out sparse sampling on the historical monitoring video to obtain a plurality of historical sampling video frames;
(3.2) dividing the preprocessed monitoring-period video into several video segments, obtaining a video segment set V = {v_i}, i = 1, …, n, where n is the number of video segments;
(3.3) for each video segment v_i, screening out the significant blocks from all of its frames;
The specific process for screening the significant blocks is as follows: (3.3.1) dividing each frame of video segment v_i into image blocks of a × a pixels, where a is the block size and 0 < a < 10; in this embodiment a = 4. (3.3.2) eliminating the image blocks located at the boundary. (3.3.3) for each image block remaining after step (3.3.2), calculating its average correlation difference with its four adjacent image blocks. (3.3.4) taking the image blocks whose average correlation difference exceeds a preset threshold as significant blocks.
These significant blocks need further processing because they contain the foreground. For a sequence of video frames the foreground objects are generally unchanged throughout the sequence, so their variation in color space is relatively small; the color distribution of their foreground part can therefore be considered to have a low-rank characteristic over the whole video sequence. Video frames with low sparsity in the sequence can then be regarded as giving a more accurate preliminary result than the surrounding frames.
(3.4) projecting the RGB color values of each significant block into a 10 × 10 × 10 three-dimensional color model and performing Gaussian diffusion on the model.
The step (3.4) specifically comprises the following steps:
(3.4.1) equally dividing the R, G, B value range 0-255 of the RGB color model into 10 intervals each, thereby forming a 10 × 10 × 10 three-dimensional color model, i.e. a 10 × 10 × 10 cube with a total of 1000 points.
(3.4.2) projecting the RGB color values of each pixel of the saliency block into the three-dimensional space color model and three-dimensionally diffusing the projections using a Gaussian sphere model.
Wherein, the coordinate conversion formula of the projection is as follows:
x_p = ceil(R_p / 25.6), y_p = ceil(G_p / 25.6), z_p = ceil(B_p / 25.6)
where (R_p, G_p, B_p) is the RGB color value of pixel p in the significant block, (x_p, y_p, z_p) is the coordinate to which pixel p is projected in the three-dimensional color model, C_(x_p, y_p, z_p) is the color value at coordinate (x_p, y_p, z_p) in the model, and ceil() denotes the ceiling function.
Since 10 × 10 × 10 is far larger than the number of pixels in a significant block, the color distribution would otherwise be sparse; the projection is therefore diffused in three dimensions with a Gaussian sphere model: the color value C_pos at each projected position pos is spread to the surrounding coordinates with a Gaussian weight, so that the diffused color value C_(x,y,z) at coordinate (x, y, z) decreases with the Manhattan distance d between position pos and position (x, y, z).
And (3.5) forming a one-dimensional vector by using the color values of all points in the three-dimensional space, and taking the one-dimensional vector as the color distribution vector of the current significant block.
Since the three-dimensional color model is 10 × 10 × 10, there are 1000 points, and their statistics form a 1000-dimensional vector.
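Steps (3.4)-(3.5) for one significant block can be sketched as below; the bin width 25.6, the diffusion radius and the bandwidth sigma are assumptions of this sketch, since the disclosure only fixes the 10 × 10 × 10 grid, the ceiling projection and the Gaussian diffusion over the Manhattan distance:

```python
import numpy as np
from math import ceil

def color_distribution_vector(block_rgb, sigma=1.0, radius=2):
    """Project a block's RGB values into a 10x10x10 color cube, diffuse every projection
    with a Gaussian over the Manhattan distance, and flatten to 1000 dimensions (sketch)."""
    cube = np.zeros((10, 10, 10))
    for r, g, b in block_rgb.reshape(-1, 3):
        # Map 0-255 into 10 equal intervals with the ceiling function (indices 0..9).
        x = min(max(ceil(r / 25.6), 1), 10) - 1
        y = min(max(ceil(g / 25.6), 1), 10) - 1
        z = min(max(ceil(b / 25.6), 1), 10) - 1
        # Gaussian sphere diffusion around the projected point.
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                for dz in range(-radius, radius + 1):
                    px, py, pz = x + dx, y + dy, z + dz
                    if 0 <= px < 10 and 0 <= py < 10 and 0 <= pz < 10:
                        d = abs(dx) + abs(dy) + abs(dz)   # Manhattan distance
                        cube[px, py, pz] += np.exp(-d ** 2 / (2 * sigma ** 2))
    return cube.ravel()   # 1000-dimensional color distribution vector
```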
(3.6) arranging the color distribution vectors of all significant blocks belonging to one video frame in order to form a color matrix, and performing sparse low-rank matrix decomposition on the color matrix with RPCA (Robust Principal Component Analysis) to obtain a sparse matrix;
For example, if a video frame has 3 significant blocks K1, K2 and K3 with corresponding color distribution vectors cl1, cl2 and cl3, the resulting color matrix is G = [cl1, cl2, cl3].
RPCA is a matrix decomposition method that decomposes a matrix G as G = A + B, where A is the low-rank part and B is the sparse part.
(3.7) calculating the deviation between the sparse matrix obtained by the decomposition and each historical sampled video frame, accumulating the deviations, and selecting the video frame whose accumulated deviation E_SUM exceeds a preset threshold as the reliable frame of video segment v_i, i = 1, …, n.
Here the accumulated deviation is
E_SUM = E_1 + E_2 + … + E_LM
where E_l denotes the deviation from the l-th historical sampled video frame and LM is the number of historical sampled video frames.
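A brief sketch of the reliable-frame selection in step (3.7) follows; it assumes the RPCA decomposition G = A + B has already produced the sparse part B of each frame's color matrix, that the historical sampled frames are represented by color matrices of the same shape, and uses the Frobenius norm as the (otherwise unspecified) deviation measure:

```python
import numpy as np

def select_reliable_frames(sparse_parts, hist_matrices, threshold):
    """Accumulate each frame's deviation from every historical sampled frame and keep
    frames whose accumulated deviation E_SUM exceeds the threshold (sketch)."""
    reliable = []
    for idx, B in enumerate(sparse_parts):        # B: sparse color matrix of one frame
        # E_SUM = E_1 + ... + E_LM, with the Frobenius norm as the assumed deviation.
        e_sum = sum(np.linalg.norm(B - H) for H in hist_matrices)
        if e_sum > threshold:
            reliable.append(idx)
    return reliable
```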
Step four: for the preprocessed monitoring video of the monitoring period, selecting frames whose features change abruptly relative to the previous and next frames as reliable frames. This step specifically comprises:
(4.1) acquiring the monitoring video in the monitoring period preprocessed in the step (1);
(4.2) for each frame of the monitoring video in the monitoring period, respectively calculating the brightness difference value, the color difference value and the texture difference value of the frame with the previous frame and the next frame according to the following formulas:
L_i = (l(P_i) - l(P_(i-1))) + (l(P_i) - l(P_(i+1)))
C_i = (c(P_i) - c(P_(i-1))) + (c(P_i) - c(P_(i+1)))
T_i = (t(P_i) - t(P_(i-1))) + (t(P_i) - t(P_(i+1)))
where P_i denotes the current video frame, P_(i-1) and P_(i+1) denote the previous and next frames, L_i, C_i, T_i denote the brightness, color and texture differences of the current frame P_i with respect to the previous and next frames, and l(·), c(·), t(·) denote the brightness value, color value and texture value of a video frame;
(4.3) normalizing the brightness, color and texture differences of each frame with respect to its previous and next frames, and combining them into the overall feature difference of the frame
D_i = δ_L·L_i + δ_C·C_i + δ_T·T_i
where D_i denotes the overall feature difference of video frame P_i from its previous and next frames, and δ_L, δ_C, δ_T are the weights of the brightness, color and texture features, respectively;
(4.4) when the overall feature difference D_i exceeds a preset threshold, the video frame is considered to have an abrupt feature change and is taken as a reliable frame.
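An illustrative sketch of steps (4.2)-(4.4) is given below; the concrete definitions of l(·), c(·), t(·) (mean gray level, mean hue, mean Laplacian magnitude) and the numeric weights are assumptions of this sketch, not details fixed by the disclosure:

```python
import cv2
import numpy as np

def frame_features(frame_bgr):
    """Scalar brightness, color and texture descriptors of one frame (assumed definitions)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    l = gray.mean()                                      # brightness value l(P)
    c = hsv[:, :, 0].mean()                              # color value c(P)
    t = np.abs(cv2.Laplacian(gray, cv2.CV_64F)).mean()   # texture value t(P)
    return l, c, t

def overall_difference(prev, cur, nxt, weights=(0.4, 0.3, 0.3)):
    """D_i = dL*L_i + dC*C_i + dT*T_i for a frame and its previous/next frames (sketch)."""
    (lp, cp, tp), (li, ci, ti), (ln, cn, tn) = map(frame_features, (prev, cur, nxt))
    L = (li - lp) + (li - ln)
    C = (ci - cp) + (ci - cn)
    T = (ti - tp) + (ti - tn)
    return weights[0] * abs(L) + weights[1] * abs(C) + weights[2] * abs(T)

# A frame is taken as a reliable frame when overall_difference(...) exceeds a preset threshold.
```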
Step five: computing the temporal saliency value of each reliable frame with an adaptive transmission method to form a temporal saliency map. This step specifically comprises:
(5.1) acquiring all reliable frames, and calculating for each reliable frame the accumulated deviation E_(SUM,j) from the historical frames and the overall feature difference D_j from its previous and next frames;
(5.2) with the adaptive transmission method, calculating the temporal saliency value of each reliable frame as
S_j = W_j / WSUM
where S_j is the temporal saliency value of the j-th reliable frame, WSUM = Σ_k W_k is the final normalization weight, and W_k is the weight of the k-th reliable frame, determined from its accumulated deviation E_(SUM,k) and overall feature difference D_k.
(5.3) the temporal saliency values of all reliable frames form a temporal saliency map.
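A sketch of the adaptive weighting in step five follows; how E_SUM,j and D_j are combined into the per-frame weight W_j is not fixed above, so the normalized sum used here is only an assumption:

```python
import numpy as np

def temporal_saliency(e_sum, d_feat):
    """Temporal saliency S_j = W_j / WSUM, with W_j built from E_SUM,j and D_j (sketch)."""
    e_sum = np.asarray(e_sum, dtype=float)
    d_feat = np.asarray(d_feat, dtype=float)
    # Per-frame weight; adding the two normalized terms is an assumed combination.
    w = e_sum / (e_sum.max() + 1e-12) + d_feat / (d_feat.max() + 1e-12)
    return w / w.sum()      # S_j for every reliable frame, normalized by WSUM
```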
Step six: fusing the spatial saliency map and the temporal saliency map to obtain a rough space-time saliency map.
Step seven: refining the rough space-time saliency map by taking background prior, center prior and spatial compactness factors into account, obtaining an improved space-time saliency map.
(1) Background prior factor:
Suppressing background areas is an important means of obtaining accurate saliency values. The difference between the background region and the other regions can be measured: the larger the difference, the more salient a region is, and vice versa. An absorbing Markov chain is used to compute the background prior value of each superpixel: the edge superpixels are taken as absorbing nodes, and the average absorption time of the other nodes is computed as their saliency value.
A Markov process is described by the state transition probabilities p_ij, which reflect the dynamic change of the system state and give the probability of moving from state i to state j in one step. The matrix whose elements are the state transition probabilities is called the one-step state transition probability matrix of the Markov chain, or transition matrix for short. If p_ii = 1, state i is called an absorbing state, and a Markov chain with at least one absorbing state is called an absorbing chain. Assume the absorbing chain has r absorbing states and t transient states. The standard form of the transition matrix is:
P = [ Q  R ]
    [ 0  I ]
where Q ∈ [0,1]^(t×t) contains the transition probabilities between transient states, R ∈ [0,1]^(t×r) contains the transition probabilities from the transient states to the absorbing states, 0 denotes the r × t zero matrix, and I is the r × r identity matrix describing the transitions between absorbing states. The fundamental matrix of P is then:
N = (I - Q)^(-1)
where the element N_ij of N can be regarded as the expected number of times the chain visits transient state j when starting from transient state i, so the expected time for the chain to reach an absorbing state from each transient state can be expressed as:
Y = N × c
where c is a t-dimensional column vector of ones. The weight of the edge between neighbouring nodes i and j is defined as:
w_ij = exp(-||x_i - x_j|| / δ²)
where x_i and x_j denote the mean feature values of superpixels i and j in the CIELab space, and δ is a constant controlling the weight. From this the elements of the affinity matrix A are defined as:
a_ij = w_ij if j ∈ N(i); a_ij = 1 if i = j; a_ij = 0 otherwise
where N (i) is the set of nodes connected to node i. The diagonal matrix D represents the sum of the weights connecting each node, defined as:
D = diag(Σ_j a_ij)
the transition matrix can thus be expressed as:
P = D^(-1) · A
From N = (I - Q)^(-1) the value of N is obtained, and the average absorption time of each superpixel, i.e. the expected value S_ia, quantifies its background prior value, yielding the background prior map.
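The absorbed-time computation described above can be sketched as follows; the inputs (superpixel CIELab means, adjacency lists and the indices of the boundary superpixels) are assumed to come from a prior superpixel segmentation that is not detailed here:

```python
import numpy as np

def background_prior(features, neighbors, boundary, delta=0.1):
    """Average absorption time of each non-boundary superpixel in an absorbing Markov
    chain whose absorbing nodes are the boundary superpixels (illustrative sketch)."""
    n = len(features)
    A = np.eye(n)                                  # a_ii = 1
    for i in range(n):
        for j in neighbors[i]:
            # w_ij = exp(-||x_i - x_j|| / delta^2) on the CIELab feature means
            A[i, j] = np.exp(-np.linalg.norm(features[i] - features[j]) / delta ** 2)
    D = np.diag(A.sum(axis=1))
    P = np.linalg.inv(D) @ A                       # P = D^-1 * A
    transient = [i for i in range(n) if i not in set(boundary)]
    Q = P[np.ix_(transient, transient)]            # transitions among transient states
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix N = (I - Q)^-1
    y = N @ np.ones(len(transient))                # expected absorption time Y = N * c
    prior = np.zeros(n)
    prior[transient] = y                           # S_ia: larger time, farther from background
    return prior / (prior.max() + 1e-12)
```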
(2) Central prior factor:
An important measure for the center prior is the distance of each pixel from the central pixel; the following distance is used to measure the center prior:
d_i = sqrt((x_i - x_0)² + (y_i - y_0)²)
where (x_0, y_0) = (W/2, H/2) is the center of the image. When the image is divided into superpixels, the positional saliency of a superpixel is calculated with an isotropic Gaussian kernel:
S_ib(i) = exp(-((x_c - x_0)² / (2σ_x²) + (y_c - y_0)² / (2σ_y²)))
where (x_c, y_c) is the center of region r_i, (x_0, y_0) is the center of the visual field, and σ_x and σ_y control the spread in the x and y directions.
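A sketch of the centre prior for superpixels follows; the Gaussian form matches the isotropic kernel described above, while the default bandwidths are assumptions of this sketch:

```python
import numpy as np

def center_prior(centers, width, height, sigma_x=None, sigma_y=None):
    """Positional saliency S_ib of superpixels from a Gaussian around the image
    centre (x0, y0) = (W/2, H/2) (illustrative sketch)."""
    x0, y0 = width / 2.0, height / 2.0
    sigma_x = sigma_x if sigma_x is not None else width / 3.0    # assumed bandwidth
    sigma_y = sigma_y if sigma_y is not None else height / 3.0   # assumed bandwidth
    centers = np.asarray(centers, dtype=float)     # (x_c, y_c) of each region r_i
    dx2 = (centers[:, 0] - x0) ** 2 / (2 * sigma_x ** 2)
    dy2 = (centers[:, 1] - y0) ** 2 / (2 * sigma_y ** 2)
    return np.exp(-(dx2 + dy2))                    # S_ib for every superpixel
```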
(3) Calculation of the spatial compactness:
The spatial compactness reflects the spatial distribution of the superpixels in the image, and the salient object is obtained from this distribution. For each superpixel v_i a compact spatial density value Ds(i) is computed. A vertex is placed at each of the four corners of the original depth map boundary (upper left, upper right, lower left and lower right); the unnormalized Laplacian matrix at these four vertices and the number of pixels n_j inside each superpixel v_j are used to define the spatial mean of the superpixels, from which Ds(i) is obtained.
In conclusion, the background prior factor suppresses the background area well, while the center prior factor and the spatial compactness highlight the salient object. The three factors are therefore applied to the rough space-time saliency map to obtain the final space-time saliency value:
S(i) = norm(S_ia · exp(1 - norm(S_ib + S_ic)))
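The fusion of the three factors in the formula above can be sketched directly as:

```python
import numpy as np

def norm01(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def fuse_priors(s_ia, s_ib, s_ic):
    """Final space-time saliency S(i) = norm(S_ia * exp(1 - norm(S_ib + S_ic))) (sketch)."""
    return norm01(np.asarray(s_ia) * np.exp(1 - norm01(np.asarray(s_ib) + np.asarray(s_ic))))
```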
and step eight, obtaining a track disease significance weight according to the improved space-time significance diagram, and identifying and classifying the track diseases of the high-speed railway by using the weight and adopting a semi-supervised classification method based on a significance weighting model.
The preceding analysis of the spatio-temporal saliency of the track video yields the improved space-time saliency map. The saliency map is used to divide the regions into weight levels: for regions with high saliency the classification contrast weight is increased accordingly, and the most salient disease regions are considered most similar to the disease maps of the existing classes. The extracted diseases are identified with a trained classifier: the probability p(Y|X) of the disease to be classified belonging to each category is calculated from its color histogram, texture features and SIFT features, and classification is then performed according to p(Y|X): the sample is assigned to the class under whose distribution its posterior probability p(Y|X) is largest.
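One possible realization of the saliency-weighted classification described here is sketched below; the Gaussian class-conditional model and the way the saliency weight sharpens the distance term are assumptions of this sketch, not details fixed by the disclosure:

```python
import numpy as np

def classify_defect(region_features, saliency_weight, class_models):
    """Assign a candidate disease region to the class with the largest posterior p(Y|X),
    with the saliency weight sharpening the comparison of highly salient regions (sketch).

    class_models maps a class name to a (mean, covariance) pair fitted on labelled and
    pseudo-labelled samples (the semi-supervised fitting is assumed to happen elsewhere)."""
    log_posts = {}
    for name, (mu, cov) in class_models.items():
        diff = np.asarray(region_features, dtype=float) - mu
        maha = diff @ np.linalg.solve(cov, diff)          # Mahalanobis distance term
        # More salient regions receive a larger contrast weight, as described above.
        log_posts[name] = -0.5 * saliency_weight * maha - 0.5 * np.linalg.slogdet(cov)[1]
    return max(log_posts, key=log_posts.get)              # class with largest posterior
```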
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (6)

1. A track slab disease detection method based on significance analysis, characterized by comprising the following steps:
(1) acquiring a track slab monitoring video in a monitoring period, and performing illumination preprocessing on the monitoring video to enable the illumination intensity of the monitoring video to be uniform;
(2) extracting color, brightness and direction features from the preprocessed monitoring video frames of the monitoring period with the Itti algorithm, and fusing them to obtain a spatial saliency map;
(3) sparsely sampling the historical monitoring video, computing the difference between the sampled video frames and the monitoring-period video frames preprocessed in step (1), and selecting the monitoring-period frames whose difference exceeds a preset threshold as reliable frames;
(4) for the monitoring-period video preprocessed in step (1), selecting frames whose features change abruptly relative to the previous and next frames as reliable frames; this step specifically comprises:
(4.1) acquiring the monitoring video in the monitoring period preprocessed in the step (1);
(4.2) for each frame of the monitoring video in the monitoring period, respectively calculating the brightness difference value, the color difference value and the texture difference value of the frame with the previous frame and the next frame according to the following formulas:
L_i = (l(P_i) - l(P_(i-1))) + (l(P_i) - l(P_(i+1)))
C_i = (c(P_i) - c(P_(i-1))) + (c(P_i) - c(P_(i+1)))
T_i = (t(P_i) - t(P_(i-1))) + (t(P_i) - t(P_(i+1)))
where P_i denotes the current video frame, P_(i-1) and P_(i+1) denote the previous and next frames, L_i, C_i, T_i denote the brightness, color and texture differences of the current frame P_i with respect to the previous and next frames, and l(·), c(·), t(·) denote the brightness value, color value and texture value of a video frame;
(4.3) normalizing the brightness, color and texture differences of each frame with respect to its previous and next frames, and combining them into the overall feature difference of the frame
D_i = δ_L·L_i + δ_C·C_i + δ_T·T_i
where D_i denotes the overall feature difference of video frame P_i from its previous and next frames, and δ_L, δ_C, δ_T are the weights of the brightness, color and texture features, respectively;
(4.4) considering a video frame whose overall feature difference D_i exceeds a preset threshold as having an abrupt feature change, and taking the video frames with abrupt feature changes as reliable frames;
(5) computing the temporal saliency value of each reliable frame with an adaptive transmission method to form a temporal saliency map;
(6) fusing the spatial saliency map and the temporal saliency map to obtain a rough space-time saliency map;
(7) refining the rough space-time saliency map by taking background prior, center prior and spatial compactness factors into account, obtaining an improved space-time saliency map, wherein the space-time saliency value S(i) in the improved space-time saliency map is:
S(i) = norm(S_ia · exp(1 - norm(S_ib + S_ic)))
where S_ia is the average absorption time of a superpixel, i.e. the quantified background prior value, S_ib is the positional saliency value of the superpixel, and S_ic is the spatial compactness value of the superpixel;
(8) deriving a track disease saliency weight from the improved space-time saliency map, and identifying and classifying the high-speed railway track diseases with this weight using a semi-supervised classification method based on a saliency weighting model.
2. The track slab disease detection method based on significance analysis according to claim 1, characterized in that: the step (1) specifically comprises the following steps:
(1.1) acquiring a monitoring video of the track slab;
(1.2) cropping the image of the key focus position from each frame of the monitoring video with the Range function of the OpenCV library, ignoring the dark areas formed by shadows at the bottom of the frame;
(1.3) equalizing the histogram of each cropped image with the cvEqualizeHist function of the OpenCV library so that the illumination intensity becomes uniform.
3. The track slab disease detection method based on significance analysis according to claim 1, characterized in that: the step (3) specifically comprises the following steps:
(3.1) carrying out sparse sampling on the historical monitoring video to obtain a plurality of historical sampling video frames;
(3.2) dividing the preprocessed monitoring-period video into several video segments, obtaining a video segment set V = {v_i}, i = 1, …, n, where n is the number of video segments;
(3.3) for each video segment v_i, screening out the significant blocks from all of its frames;
(3.4) projecting the RGB color values of each significant block into a 10 × 10 × 10 three-dimensional color model, and performing Gaussian diffusion on the model;
(3.5) forming a one-dimensional vector from the color values of all points of the three-dimensional model and taking it as the color distribution vector of the current significant block;
(3.6) arranging the color distribution vectors of all significant blocks belonging to one video frame in order to form a color matrix, and performing sparse low-rank matrix decomposition on the color matrix with RPCA to obtain a sparse matrix;
(3.7) calculating the deviation between the sparse matrix obtained by the decomposition and each historical sampled video frame, accumulating the deviations, and selecting the video frame whose accumulated deviation E_SUM exceeds a preset threshold as the reliable frame of video segment v_i, i = 1, …, n.
4. The track slab disease detection method based on significance analysis according to claim 3, characterized in that: the step (3.3) specifically comprises the following steps:
(3.3.1) dividing each frame of video segment v_i into image blocks of a × a pixels, where a is the block size and 0 < a < 10;
(3.3.2) eliminating the image blocks located at the boundary;
(3.3.3) for each image block remaining after step (3.3.2), calculating its average correlation difference with its four adjacent image blocks;
(3.3.4) taking the image blocks whose average correlation difference exceeds a preset threshold as significant blocks.
5. The track slab disease detection method based on significance analysis according to claim 3, characterized in that: the step (3.4) specifically comprises the following steps:
(3.4.1) equally dividing the R, G, B value range 0-255 of the RGB color model into 10 intervals each, thereby forming a 10 × 10 × 10 three-dimensional color model;
(3.4.2) projecting the RGB color values of each pixel of the saliency block into the three-dimensional space color model and three-dimensionally diffusing the projections using a Gaussian sphere model;
wherein, the coordinate conversion formula of the projection is as follows:
x_p = ceil(R_p / 25.6), y_p = ceil(G_p / 25.6), z_p = ceil(B_p / 25.6)
where (R_p, G_p, B_p) is the RGB color value of pixel p in the significant block, (x_p, y_p, z_p) is the coordinate to which pixel p is projected in the three-dimensional color model, C_(x_p, y_p, z_p) is the color value at coordinate (x_p, y_p, z_p) in the model, and ceil() denotes the ceiling function;
wherein the three-dimensional diffusion spreads the color value C_pos at each projected position pos to the surrounding coordinates with a Gaussian weight, so that the diffused color value C_(x,y,z) at coordinate (x, y, z) decreases with the Manhattan distance d between position pos and position (x, y, z).
6. The track slab disease detection method based on significance analysis according to claim 1, characterized in that: the step (5) specifically comprises the following steps:
(5.1) acquiring all reliable frames, and calculating for each reliable frame the accumulated deviation E_(SUM,j) from the historical frames and the overall feature difference D_j from its previous and next frames;
(5.2) with the adaptive transmission method, calculating the temporal saliency value of each reliable frame as
S_j = W_j / WSUM
where S_j is the temporal saliency value of the j-th reliable frame, WSUM = Σ_k W_k is the final normalization weight, and W_k is the weight of the k-th reliable frame, determined from its accumulated deviation E_(SUM,k) and overall feature difference D_k;
(5.3) the temporal saliency values of all reliable frames form a temporal saliency map.
CN201810687093.8A 2018-06-28 2018-06-28 Track slab disease detection method based on significance analysis Active CN109064444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810687093.8A CN109064444B (en) 2018-06-28 2018-06-28 Track slab disease detection method based on significance analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810687093.8A CN109064444B (en) 2018-06-28 2018-06-28 Track slab disease detection method based on significance analysis

Publications (2)

Publication Number Publication Date
CN109064444A CN109064444A (en) 2018-12-21
CN109064444B true CN109064444B (en) 2021-09-28

Family

ID=64817728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810687093.8A Active CN109064444B (en) 2018-06-28 2018-06-28 Track slab disease detection method based on significance analysis

Country Status (1)

Country Link
CN (1) CN109064444B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635790A (en) * 2019-01-28 2019-04-16 杭州电子科技大学 A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution
CN110827193B (en) * 2019-10-21 2023-05-09 国家广播电视总局广播电视规划院 Panoramic video significance detection method based on multichannel characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101103378A (en) * 2005-01-10 2008-01-09 汤姆森许可贸易公司 Device and method for creating a saliency map of an image
CN103020965A (en) * 2012-11-29 2013-04-03 奇瑞汽车股份有限公司 Foreground segmentation method based on significance detection
CN105303571A (en) * 2015-10-23 2016-02-03 苏州大学 Time-space saliency detection method for video processing
CN106611427A (en) * 2015-10-21 2017-05-03 中国人民解放军理工大学 A video saliency detection method based on candidate area merging

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101103378A (en) * 2005-01-10 2008-01-09 汤姆森许可贸易公司 Device and method for creating a saliency map of an image
CN103020965A (en) * 2012-11-29 2013-04-03 奇瑞汽车股份有限公司 Foreground segmentation method based on significance detection
CN106611427A (en) * 2015-10-21 2017-05-03 中国人民解放军理工大学 A video saliency detection method based on candidate area merging
CN105303571A (en) * 2015-10-23 2016-02-03 苏州大学 Time-space saliency detection method for video processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Salient object detection fusing background prior and center prior; Zhou Shuaijun et al.; Journal of Image and Graphics; 2017-05-16 (No. 05); full text *

Also Published As

Publication number Publication date
CN109064444A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN110598610B (en) Target significance detection method based on neural selection attention
Dudhane et al. Cdnet: Single image de-hazing using unpaired adversarial training
Matsushita et al. Illumination normalization with time-dependent intrinsic images for video surveillance
Dudhane et al. C^ 2msnet: A novel approach for single image haze removal
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN111723693B (en) Crowd counting method based on small sample learning
CN105930822A (en) Human face snapshot method and system
CN115294117B (en) Defect detection method and related device for LED lamp beads
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN114758288A (en) Power distribution network engineering safety control detection method and device
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
Wang et al. Background extraction based on joint gaussian conditional random fields
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN109064444B (en) Track slab disease detection method based on significance analysis
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
Gardziński et al. Crowd density estimation based on voxel model in multi-view surveillance systems
CN111832508A (en) DIE _ GA-based low-illumination target detection method
Bolten et al. Evaluation of Deep Learning based 3D-Point-Cloud Processing Techniques for Semantic Segmentation of Neuromorphic Vision Sensor Event-streams.
Zhu Image quality assessment model based on multi-feature fusion of energy Internet of Things
Zhu et al. Multi-feature fusion algorithm in VR panoramic image detail enhancement processing
CN111862098B (en) Individual matching method, device, equipment and medium based on light field semantics
Nair et al. Benchmarking single image dehazing methods
Zhang et al. A unified saliency detection framework for visible and infrared images
CN117078608B (en) Double-mask guide-based high-reflection leather surface defect detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant