CN113362338A - Rail segmentation method, device, computer equipment and rail segmentation processing system - Google Patents

Info

Publication number
CN113362338A
Authority
CN
China
Prior art keywords
wavelet
rail
image
target
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110565485.9A
Other languages
Chinese (zh)
Other versions
CN113362338B (en)
Inventor
张斌
卓卉
李雅稚
孟宪洪
王宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Guoneng Shuohuang Railway Development Co Ltd
Original Assignee
Beihang University
Guoneng Shuohuang Railway Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, Guoneng Shuohuang Railway Development Co Ltd filed Critical Beihang University
Priority to CN202110565485.9A priority Critical patent/CN113362338B/en
Publication of CN113362338A publication Critical patent/CN113362338A/en
Application granted granted Critical
Publication of CN113362338B publication Critical patent/CN113362338B/en
Legal status: Active

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/25 - Fusion techniques
                • G06F 18/253 - Fusion techniques of extracted features
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
              • G06N 3/08 - Learning methods
                • G06N 3/084 - Backpropagation, e.g. using gradient descent
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 - Image enhancement or restoration
            • G06T 5/10 - Image enhancement or restoration using non-spatial domain filtering
            • G06T 5/70 - Denoising; Smoothing
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
              • G06T 7/11 - Region-based segmentation
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 - Image acquisition modality
              • G06T 2207/10004 - Still image; Photographic image
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20048 - Transform domain processing
                • G06T 2207/20064 - Wavelet transform [DWT]
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
            • G06T 2207/30 - Subject of image; Context of image processing
              • G06T 2207/30248 - Vehicle exterior or interior
                • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle
                  • G06T 2207/30256 - Lane; Road marking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a rail segmentation method and apparatus, computer equipment, and a rail segmentation processing system. The method acquires target images, each containing a rail, from multiple viewing angles. The target image at each viewing angle is then denoised by wavelet-transform filtering, which removes noise signals, makes the data more effective for subsequent image processing, and improves processing efficiency. For each filtered image, a neural network model produces feature maps at several different spatial resolutions, and each of these feature maps is up-sampled to a target spatial resolution. The learning capability of the neural network yields high-quality features, and fusing the up-sampled feature maps from all convolution levels preserves high-level semantic information while predicting details well, so that higher accuracy is obtained.

Description

Rail segmentation method, device, computer equipment and rail segmentation processing system
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a rail segmentation method, an apparatus, a computer device, and a rail segmentation processing system.
Background
The acquisition and the segmentation of the rail images have important significance for rail damage detection.
In conventional practice, a fixed camera captures images of the rails from a single viewing angle. However, the inventors found that when images acquired in this way are processed and features are extracted, the resulting rail segmentation images are of poor quality and differ greatly from the actual condition of the rail.
Disclosure of Invention
In view of the above, it is necessary to provide a rail segmentation method, a rail segmentation apparatus, a computer device, and a rail segmentation processing system capable of improving both the accuracy of rail segmentation images and the processing efficiency.
A method of rail segmentation, the method comprising:
acquiring target images at a plurality of viewing angles, wherein a target image is a scene image containing a rail;
performing wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle;
inputting the filtered images into a neural network model to obtain feature maps at a plurality of different spatial resolutions;
up-sampling the feature maps at the plurality of spatial resolutions to obtain up-sampled feature maps of a plurality of levels at a target spatial resolution; and
performing feature fusion and semantic segmentation on the up-sampled feature maps corresponding to the respective viewing angles to obtain a rail segmentation result.
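The five steps above can be sketched end-to-end. The following is a minimal, illustrative pipeline in pure Python with placeholder stages; the stage names, the zero-valued placeholder feature maps, nearest-neighbour up-sampling, and additive fusion are all assumptions for illustration, not the patented implementation (which would use real wavelet filtering and a trained CNN).

```python
# Illustrative sketch of the claimed pipeline; each stage is a stub.

def wavelet_denoise(image):
    # placeholder for step 2: return the image unchanged
    return image

def extract_feature_maps(image):
    # placeholder for step 3: feature maps at 1/8, 1/16, 1/32 of original size
    h, w = len(image), len(image[0])
    return {s: [[0.0] * (w // s) for _ in range(h // s)] for s in (8, 16, 32)}

def upsample_to(fmap, h, w):
    # step 4: nearest-neighbour up-sampling to the target spatial resolution
    fh, fw = len(fmap), len(fmap[0])
    return [[fmap[r * fh // h][c * fw // w] for c in range(w)] for r in range(h)]

def segment_rails(views):
    h, w = len(views[0]), len(views[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for img in views:                        # step 1: multi-view target images
        filtered = wavelet_denoise(img)      # step 2: wavelet denoising
        fmaps = extract_feature_maps(filtered)
        for fmap in fmaps.values():          # step 4: up-sample each level
            up = upsample_to(fmap, h, w)
            for r in range(h):               # step 5: additive feature fusion
                for c in range(w):
                    fused[r][c] += up[r][c]
    return fused

views = [[[1.0] * 64 for _ in range(64)] for _ in range(3)]  # 3 viewing angles
mask = segment_rails(views)
```

The final `fused` map would normally be passed through semantic segmentation (pixel classification) to produce the rail mask.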
In one embodiment, the step of performing wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle includes:
performing wavelet decomposition on the target image at each viewing angle to obtain decomposed signals;
calculating a wavelet threshold for each viewing angle;
taking the average of the wavelet thresholds over the viewing angles as the final threshold; and
performing wavelet threshold denoising on the decomposed signals according to the final threshold, and performing wavelet reconstruction on the denoised signals to obtain the filtered image at each viewing angle.
In one embodiment, the step of performing wavelet threshold denoising on the wavelet-decomposed signals according to the final threshold to obtain the filtered image at each viewing angle includes:
removing the decomposed signals whose wavelet coefficient values are smaller than the final threshold, and retaining those whose wavelet coefficient values are larger than the final threshold, to generate the filtered image at each viewing angle.
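A minimal sketch of this thresholding scheme on a 1-D signal, using a single-level Haar decomposition. The patent does not fix the wavelet family or the per-view threshold values; Haar and the sample thresholds below are chosen purely for illustration, and the final threshold is the average of the per-view thresholds as the embodiment describes.

```python
import math

def haar_decompose(signal):
    # single-level Haar decomposition (even-length signal assumed)
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / math.sqrt(2))
        out.append((a - d) / math.sqrt(2))
    return out

def hard_threshold(coeffs, thr):
    # keep coefficients whose magnitude exceeds the threshold, zero the rest
    return [c if abs(c) > thr else 0.0 for c in coeffs]

def denoise(signal, thresholds):
    # per the embodiment: average the per-view thresholds into the final one
    final_thr = sum(thresholds) / len(thresholds)
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, hard_threshold(detail, final_thr))

noisy = [10.0, 10.2, 10.1, 9.9, 0.0, 0.1, -0.1, 0.05]
clean = denoise(noisy, thresholds=[0.3, 0.5])  # final threshold = 0.4
```

After thresholding, the small pairwise fluctuations (detail coefficients below 0.4) disappear while the overall signal level survives.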
In one embodiment, the step of inputting the filtered images into the neural network model to obtain feature maps at a plurality of different spatial resolutions includes:
uniformly processing each filtered image into a feature map at the target resolution and inputting it into the neural network model; and
performing convolution at each convolutional layer in a preset convolution mode, and down-sampling with a preset pooling function, to obtain feature maps at a plurality of different spatial resolutions.
The step of up-sampling the feature maps at the plurality of spatial resolutions to obtain up-sampled feature maps of a plurality of levels at the target spatial resolution includes:
pooling the feature maps at the different spatial resolutions, and convolving the pooled results, to obtain the up-sampled feature maps of the plurality of levels at the target spatial resolution.
In one embodiment, the neural network model has 7 convolutional layers and 5 pooling layers, and the feature maps at different spatial resolutions include feature maps at 1/8, 1/16 and 1/32 of the original size.
The step of pooling the feature maps at the different spatial resolutions, and convolving the pooled results to obtain the up-sampled feature maps of the plurality of levels at the target spatial resolution, includes:
up-sampling the feature map at 1/16 of the original size, combining the result with the fourth pooling layer, and up-sampling again to obtain an up-sampled feature map at the target spatial resolution;
up-sampling the feature map at 1/8 of the original size, combining the result with the third pooling layer, and up-sampling again to obtain an up-sampled feature map at the target spatial resolution; and
up-sampling the feature map at 1/32 of the original size directly to an up-sampled feature map at the target spatial resolution.
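These three branches resemble the FCN-8s skip architecture. Below is a toy sketch under that assumption, tracking single-channel maps with nearest-neighbour up-sampling and element-wise addition as the "combine" step; the actual layer wiring, channel counts, and learned up-sampling in the patent may differ.

```python
# Toy skip-level up-sampling sketch (FCN-style), single-channel maps only.

def upsample2x(fmap):
    # nearest-neighbour 2x up-sampling: double rows and columns
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    # element-wise combination of two equal-size maps
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def ones(n):
    return [[1.0] * n for _ in range(n)]

target = 64                                   # assumed network input size
f32 = ones(target // 32)                      # deepest feature map
pool3, pool4 = ones(target // 8), ones(target // 16)

fused16 = add(upsample2x(f32), pool4)         # 1/32 -> 1/16, merge with pool4
fused8 = add(upsample2x(fused16), pool3)      # 1/16 -> 1/8, merge with pool3
out = fused8
while len(out) < target:                      # up-sample to input resolution
    out = upsample2x(out)
```

Merging at matching resolutions before the final up-sampling is what lets low-level detail survive alongside high-level semantics.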
In one embodiment, the step of performing feature fusion and semantic segmentation on the up-sampled feature maps corresponding to the respective viewing angles to obtain the rail segmentation result includes:
performing feature fusion on the multi-level up-sampled feature maps at the target spatial resolution to obtain a feature-fused image;
determining, with a classification model, which pixels of the feature-fused image belong to the rail, and marking those pixels; and
mapping the marked rail pixels back to the target image to obtain the rail segmentation result.
In one embodiment, the step of determining, with the classification model, which pixels of the feature-fused image belong to the rail includes:
mapping each pixel of the feature-fused image into (0, 1) with a sigmoid function; and
then, according to how the mapped value compares with a classification threshold, discarding the pixels that do not belong to the rail, leaving the pixels that do.
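A minimal sketch of this sigmoid-plus-threshold pixel classification. The score values and the 0.5 threshold below are illustrative; the patent leaves the classification threshold unspecified.

```python
import math

def sigmoid(x):
    # maps any real score into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def rail_mask(scores, threshold=0.5):
    # keep pixels whose mapped value exceeds the classification threshold
    return [[1 if sigmoid(s) > threshold else 0 for s in row] for row in scores]

scores = [[2.3, -1.7],
          [0.4, -0.2]]
mask = rail_mask(scores)   # [[1, 0], [1, 0]]
```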
A rail segmentation apparatus, the apparatus comprising:
an image acquisition module for acquiring target images at multiple viewing angles, wherein a target image is a scene image containing a rail;
a filtering module for performing wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle;
a machine learning module for inputting the filtered images into the neural network model to obtain feature maps at a plurality of different spatial resolutions;
an up-sampling module for up-sampling the feature maps at the plurality of spatial resolutions to obtain up-sampled feature maps of a plurality of levels at the target spatial resolution; and
a rail segmentation execution module for performing feature fusion and semantic segmentation on the up-sampled feature maps corresponding to the respective viewing angles to obtain a rail segmentation result.
A computer device comprising a memory storing a computer program and a processor that implements the steps of the method when executing the computer program.
A rail segmentation processing system, the system comprising:
a multi-image acquisition platform for acquiring target images at different viewing angles, wherein a target image is a scene image containing a rail; and
computer equipment in communication with the multi-image acquisition platform, the computer equipment comprising a memory storing a computer program and a processor that implements the steps of the method when executing the computer program.
The rail segmentation method, apparatus, computer equipment and rail segmentation processing system described above acquire target images, each containing a rail, from multiple viewing angles. The target image at each viewing angle is denoised by wavelet-transform filtering, which removes noise signals, makes the data more effective for subsequent image processing, and improves processing efficiency. For each filtered image, a neural network model produces feature maps of several different levels (that is, feature maps of different spatial resolutions, and hence of different sizes), and each is up-sampled to an up-sampled feature map at the target spatial resolution; the learning capability of the neural network yields high-quality features. Finally, fusing the up-sampled feature maps of all convolution levels preserves high-level semantic information while predicting details well, so that higher precision is obtained; semantic segmentation of the feature-fused image therefore yields a highly accurate rail segmentation result (that is, the rail portions segmented out of the target image).
Drawings
FIG. 1 is a diagram of an exemplary rail segmentation method;
FIG. 2 is a schematic flow chart of a rail segmentation method according to an embodiment;
FIG. 3 is a schematic flow chart of a rail segmentation method according to another embodiment;
FIG. 4 is a schematic diagram of the neural network architecture and feature-layer level selection in another embodiment;
FIG. 5 is a schematic flow chart of a rail segmentation method in yet another embodiment;
FIG. 6 is a block diagram of a rail splitting apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Wavelet denoising is a mature technology that uses a wavelet function to filter a signal, retaining the useful components and filtering out the noise. In addition, feature fusion can be used during image processing: fusing different features improves segmentation performance. The method obtains multi-layer features with a CNN (Convolutional Neural Network) and fuses them, exploiting the fact that high-level features carry strong semantics at low resolution while low-level features carry weaker semantics but richer detail; in this way both detail and denoising quality can be preserved. All of this can be used to improve accuracy when implementing rail segmentation. To solve the problems described in the background, and in view of the characteristics of the scene in which a rail sits, the embodiments of the present application provide a rail segmentation method.
Feature fusion can be divided into early fusion and late fusion, according to whether fusion happens before or after prediction.
Early fusion fuses multiple layers of features first and then trains a predictor on the fused features (detection is performed only once fusion is complete). Methods of this type are also called skip connections and use the concat and add operations. Concat is series feature fusion: the two features are connected directly, so if the input features x and y have dimensions p and q, the output feature z has dimension p + q. Add is a parallel strategy that combines the two feature vectors; it can be viewed as forming the complex vector z = x + iy for input features x and y, where i is the imaginary unit. Representative methods of this kind are Inside-Outside Net (ION) and HyperNet.
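The two early-fusion operations can be illustrated on small per-pixel feature vectors, with plain Python lists standing in for tensors (the vectors and dimensions are illustrative):

```python
def concat_fuse(x, y):
    # series fusion: output dimension is p + q
    return x + y

def add_fuse(x, y):
    # parallel fusion: element-wise sum, dimensions must match
    return [a + b for a, b in zip(x, y)]

x = [1.0, 2.0, 3.0]            # p = 3
y = [4.0, 5.0, 6.0]            # q = 3
z_concat = concat_fuse(x, y)   # dimension p + q = 6
z_add = add_fuse(x, y)         # dimension 3, values [5.0, 7.0, 9.0]
```

Concat preserves both inputs at the cost of a wider feature; add keeps the dimension fixed but mixes the two sources irreversibly.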
Late fusion improves detection performance by combining the detection results of different layers (detection starts on partially fused layers before the final fusion is complete; several layers may be detected, and the detection results are fused at the end). Two typical lines of research exist: first, predict from the multi-scale features separately without fusing them, then integrate the prediction results, as in the Single Shot MultiBox Detector (SSD) and Multi-scale CNN (MS-CNN); second, fuse the features into a pyramid and predict afterwards, as in the Feature Pyramid Network (FPN).
Semantic segmentation is pixel-level classification: pixels belonging to the same class are grouped together, so the image is understood at the pixel level. For example, in a photograph of someone riding a motorcycle, the pixels belonging to the person form one class, the pixels belonging to the motorcycle form another, and the background pixels form a third.
Early segmentation algorithms were mainly traditional methods such as gray-scale segmentation and conditional random fields. Before deep learning became popular, semantic segmentation methods such as TextonForest and random forest classifiers were widely used. With the continuous development of deep learning, deep models have come to play a major role in semantic segmentation tasks and achieve strong performance. From FCN, the first truly deep-learning semantic segmentation model, through the classic SegNet, the DeepLab series and DenseASPP, to the NAS-based methods that have been a research focus in recent years, semantic segmentation has developed in under six years, and the leaderboard of every segmentation dataset is continually being refreshed.
The rail segmentation method provided by the present application can be applied in the environment shown in FIG. 1, in which a multi-image acquisition platform 200 communicates with a computer device 400 over a network. The multi-image acquisition platforms 200 are arranged according to the specific environment of the rail 600; several platforms 200 can be placed in the same environment, and each can carry multiple cameras with different shooting angles. The cameras on a platform 200 acquire images containing the rail at different viewing angles and transmit them to the computer device 400 as target images; the image processing capability of the computer device 400 is then used to segment the rail portion accurately out of the target images, providing an accurate data basis for subsequent rail health monitoring. The computer device 400 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device; it may also be a centralized control device in a station monitoring room, or an independent server or a server cluster composed of multiple servers. The periphery of the rail is photographed from different angles through the cooperation of the multi-image acquisition platforms 200.
In one embodiment, a rail segmentation method is provided, as shown in FIG. 2, the method comprising:
S100: acquiring target images at a plurality of viewing angles, where a target image is a scene image containing a rail.
A description from a single viewing angle is often one-sided and limited, and does not reflect all the characteristics of an object. Because multi-view data are collected from different sources and their features are extracted in different ways, the views are partly interdependent and partly independent; by fully and reasonably mining the information in multi-view data, the characteristics of the rail and of its background can be understood and analysed in depth, and image segmentation achieved. Acquiring target images at multiple viewing angles is therefore an important step in solving this technical problem. As noted above, a target image is a scene image containing the rail together with its background; by acquiring only images that contain a rail, subsequent computation becomes more targeted, the waste of computing resources on rail-free scenes is eliminated, the amount of invalid computation is reduced, computational efficiency rises, and the overall efficiency of rail segmentation improves.
S300: performing wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle.
A wavelet is a small, localized wave: it takes non-zero values only on a very limited interval, rather than extending without end as sine and cosine waves do. A wavelet can be translated along the time axis and stretched or compressed in scale to obtain low-frequency and high-frequency wavelets, and the constructed wavelet function can be used to filter or compress signals, extracting the useful signal from noisy observations. A wavelet function is a finite-energy oscillating function whose dilations and translations can be made into an orthonormal basis of the square-integrable function space L²(R).
Wavelet denoising here refers to filtering the noise out of the target image with a wavelet function. The process may be: perform wavelet decomposition of the target image signal; threshold-quantize the high-frequency coefficients produced by the decomposition, filtering the noise by setting a reasonable threshold; and reconstruct the denoised image signal by wavelet reconstruction from the low-frequency coefficients of the last decomposition level and the quantized high-frequency coefficients of every level. Because threshold selection in the wavelet decomposition denoises every noisy region of the image, the filtered image is of good quality throughout. Performing subsequent machine learning on the filtered image signal therefore guarantees high-quality extracted features and improves both computational precision and the learning result.
S500: inputting the filtered images into a neural network model to obtain feature maps at a plurality of different spatial resolutions.
A neural network extracts features layer by layer, and the spatial resolution of the feature maps keeps decreasing (that is, the feature maps keep shrinking) as the network deepens. Low-level features have higher resolution and contain more position and detail information, but having passed through fewer convolutions they carry weaker semantics and more noise. High-level features carry stronger semantic information but have low resolution and poor perception of detail. By selecting feature maps at several spatial resolutions, both good semantics and rich position and detail information are retained, supplying high-level semantic information and sufficient detail features for the subsequent image segmentation.
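For a concrete sense of the shrinking resolutions, the spatial size after each of the five 2x pooling stages of the embodiment can be tabulated; the 512x512 input size here is an assumption for illustration.

```python
# Feature-map size after each 2x pooling stage, for an assumed 512x512 input.
size = 512
sizes = []
for stage in range(5):          # the embodiment uses 5 pooling layers
    size //= 2
    sizes.append(size)
# sizes is [256, 128, 64, 32, 16]; the last three correspond to the
# 1/8, 1/16 and 1/32 feature maps used by the method.
```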
S700: and upsampling the feature maps with the plurality of spatial resolutions to obtain a plurality of levels of upsampled feature maps with the target spatial resolution.
The target spatial resolution is the resolution of the picture fed into the neural network; that is, the feature maps at the several spatial resolutions are restored to the size of the network's input picture. To obtain feature maps of the original target-image size, the feature maps at the several spatial resolutions must be up-sampled, yielding up-sampled feature maps of several levels at the target resolution. To limit the information loss that directly up-sampling a feature map back to the input size would cause, a skip-level connection strategy can be used to obtain the multi-level up-sampled feature maps at the target spatial resolution, letting the network predict details better while retaining high-level semantic information.
S900: and performing feature fusion and semantic segmentation on the up-sampling feature maps corresponding to all the visual angles to obtain a rail segmentation result.
The purpose of feature fusion is to merge the features extracted from an image into a feature more discriminative than any of the inputs. In much work, fusing different features is an important means of improving segmentation performance. The preceding neural network learning stage takes full account of the strong semantics of high-level features and the rich detail of low-level features, retains the up-sampled feature maps of the different levels, and performs semantic segmentation on the fused feature image, which secures the accuracy of the rail segmentation result.
Specifically, target images are acquired at multiple viewing angles and wavelet-denoised at each viewing angle; machine learning is then performed on the denoised filtered images, and the feature maps at several different spatial resolutions output by the neural network model are collected so as to balance semantic quality against richness of detail. Each feature map is up-sampled independently to the target spatial resolution; the resulting multi-level up-sampled feature maps are fused into a feature-fusion image that combines detailed features with high-quality semantic information; and semantic segmentation of that image guarantees an accurate and reliable rail segmentation result.
This contrasts with the traditional approach, which uses images shot from a single viewing angle as the data source for rail segmentation. Through multi-view cooperation, the present application uses multiple image acquisition platforms working together to photograph the surroundings of the rail from different angles, yielding a variety of photographs of the rail scene at different viewing angles. The rail on the ground is thus guaranteed to be covered from enough viewing angles, and the large amount of information contained in the features can be fully exploited. This cooperative arrangement of multiple acquisition platforms greatly improves the reliability of the semantic segmentation result, while making full use of the images acquired at multiple viewing angles to improve segmentation precision. The complementary nature of the multi-view feature information further improves segmentation performance. With sufficient multi-view images collected, a better segmentation effect is obtained and working efficiency rises.
The basic idea of wavelet threshold denoising is that after a wavelet transform, the wavelet coefficients carry the signal's important information: after decomposition, the coefficients generated by the signal are large while those generated by noise are small. By choosing a suitable threshold, coefficients larger than the threshold are deemed to come from the useful signal and retained, while coefficients smaller than the threshold are deemed to come from noise and set to zero, achieving denoising.
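The patent does not fix how the threshold itself is chosen. A standard choice consistent with this idea is the Donoho-Johnstone universal threshold, sketched here with a median-based noise estimate; both the rule and the sample coefficients are illustrative, not the patented scheme.

```python
import math

def universal_threshold(detail_coeffs):
    # Donoho-Johnstone universal threshold: sigma * sqrt(2 * ln(n)),
    # with sigma estimated from the median absolute detail coefficient
    n = len(detail_coeffs)
    sigma = sorted(abs(c) for c in detail_coeffs)[n // 2] / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

coeffs = [0.1, -0.05, 4.0, 0.08, -0.12, 0.02, 3.5, -0.07]
thr = universal_threshold(coeffs)
kept = [c for c in coeffs if abs(c) > thr]   # large coefficients survive
```

Here the many small coefficients set the noise scale, so only the two large signal coefficients survive the threshold.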
From a signal-processing point of view, wavelet denoising is a signal filtering problem. Although wavelet denoising can largely be regarded as low-pass filtering, it is superior to a conventional low-pass filter in that signal features are preserved after denoising. Wavelet denoising is therefore in effect a combination of feature extraction and low-pass filtering. In one embodiment, as shown in fig. 3, the step S300 of performing wavelet denoising processing on the target image under each view angle to obtain a filtered image under each view angle includes:
S310: performing wavelet decomposition on the target image at each viewing angle to obtain decomposed signals.
Wavelet decomposition refers to selecting a mother wavelet, determining the number of decomposition levels N, and then performing N-level wavelet decomposition on the target image signal. Unlike the well-known Fourier transform, the wavelet function is not fixed; many candidate families exist, so the most suitable wavelet must be selected as the mother wavelet. The selected mother wavelet, together with its translations and dilations, should constitute an orthonormal basis.
In one embodiment, the method may include: and selecting a proper wavelet function from a wavelet base constructed in advance as a mother wavelet, and performing wavelet decomposition on the target image under each view angle by using the mother wavelet to obtain a decomposed signal.
This approach of building a wavelet library from existing wavelets and selecting the most suitable function from it may be called a search method. The selection process may be: preset a wavelet library and an evaluation parameter, compute one by one the parameter value obtained when the observed noise-contaminated signal is decomposed by each wavelet in the library, and select the best-scoring wavelet as the mother wavelet. Alternatively, a correlation-based wavelet selection (CBWS) method may be used, which computes the correlation coefficient between each candidate wavelet function and the observed signal and selects the wavelet corresponding to the maximum correlation coefficient as the mother wavelet.
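The correlation-based selection idea can be sketched as follows; this is a minimal example assuming only two candidate mother wavelets (a Haar-like step and a Mexican hat, both with closed-form expressions) and a plain correlation coefficient as the score:

```python
import numpy as np

def mexican_hat(t):
    # (unnormalised) Mexican-hat mother wavelet
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def corr_score(signal, wavelet_samples):
    """Absolute correlation coefficient between observed signal and a candidate wavelet."""
    s = signal - signal.mean()
    w = wavelet_samples - wavelet_samples.mean()
    return abs(np.dot(s, w)) / (np.linalg.norm(s) * np.linalg.norm(w))

t = np.linspace(-4.0, 4.0, 64)
candidates = {
    "mexh": mexican_hat(t),
    "haar": np.where(t < 0, 1.0, -1.0),   # Haar-like step on the same grid
}

rng = np.random.default_rng(0)
observed = mexican_hat(t) + 0.05 * rng.standard_normal(t.size)  # noisy observation
scores = {name: corr_score(observed, w) for name, w in candidates.items()}
chosen = max(scores, key=scores.get)      # wavelet with the largest correlation
```

Since the observation is shaped like a Mexican hat, that candidate wins by a wide margin; in practice the library would contain many more families.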
The mother wavelet may be a Haar wavelet, a Daubechies wavelet (abbreviated dbN, where N is the order), a Biorthogonal wavelet, a Coiflets wavelet, a Symlets wavelet, a Morlet wavelet, a Mexican-hat wavelet, a Meyer wavelet (fast convergence), a Gaussian wavelet, a discrete Meyer (Dmeyer) wavelet, a reverse biorthogonal (ReverseBior) wavelet, a complex Gaussian (Cgau) wavelet, a complex Morlet (Cmor) wavelet, a frequency B-spline (Fbsp) wavelet, or a Shannon (Shan) wavelet.
In one embodiment, a Haar wavelet basis function is used for wavelet decomposition. The Haar mother wavelet is:

ψ(x) = 1 for 0 ≤ x < 1/2; ψ(x) = −1 for 1/2 ≤ x < 1; ψ(x) = 0 otherwise (1)
The wavelet series expansion (here with j = 5) is as follows:

f(x) = Σ_k c_j(k) φ_{j,k}(x) + Σ_{j'=j}^∞ Σ_k d_{j'}(k) ψ_{j',k}(x) (2)

c_j(k) = &lt;f(x), φ_{j,k}(x)&gt; = ∫ f(x) φ_{j,k}(x) dx (3)

d_j(k) = &lt;f(x), ψ_{j,k}(x)&gt; = ∫ f(x) ψ_{j,k}(x) dx (4)

where f(x) belongs to L²(R), the space of square-integrable functions, c_j(k) are the scale (approximation) coefficients and d_j(k) are the wavelet (detail) coefficients.
The scale function and the wavelet function can then be obtained from the two-scale relations:

φ(x) = Σ_n h[n] √2 φ(2x − n) (5)

ψ(x) = Σ_n g[n] √2 φ(2x − n) (6)

g[n] = (−1)^n h[1 − n] (7)

where h[n] and g[n] are the low-pass and high-pass (mirror) filter coefficients, respectively.
the wavelet decomposition of the target image signal can be completed by integrating the formulas (1) to (7), and the wavelet coefficient of the wavelet signal is obtained.
S330: calculating the wavelet threshold at each viewing angle.
S350: taking the average of the wavelet thresholds over all viewing angles as the final threshold.
After wavelet decomposition, a threshold must be set to denoise the decomposed signals. If the wavelet coefficient of a decomposed signal is larger than the threshold, the signal is regarded as useful and retained; conversely, if the wavelet coefficient is smaller than the threshold, the signal is regarded as noise and removed. Selecting an appropriate threshold is therefore important. For a single viewing angle, the boundary between signal and noise is affected by conditions specific to that viewpoint, such as illumination, angle and texture, so a threshold derived from a single viewing angle cannot discriminate accurately. The invention therefore introduces multi-view cooperation, synthesizing the image information under multiple viewing angles, i.e. under multiple conditions, and solving for the threshold at a higher level and dimensionality.
In one embodiment, an adaptive threshold selection method based on Stein's unbiased risk estimation (SURE) principle may be used. The thresholds obtained at the multiple viewing angles are then averaged to obtain the final threshold.
In one embodiment, the calculation process of the final threshold may be:
First, for the t-th viewing angle (t = 1, …, m, where m is the number of viewing angles), take the absolute values of the wavelet coefficient vector used to estimate the threshold (of length n), sort them in ascending order, and square each element to obtain the vector to be estimated, I_t.

For each element index k of I_t, the risk vector Risk_t(k) is computed as:

Risk_t(k) = [n − 2k + Σ_{i=1}^{k} I_t(i) + (n − k) I_t(k)] / n (8)

The index k corresponding to the minimum of the risk vector gives the denoising threshold T_t for the image at the t-th viewing angle:

T_t = sqrt(I_t(argmin_k Risk_t(k))) (9)

Averaging the thresholds over the m viewing angles yields the final threshold T:

T = (1/m) Σ_{t=1}^{m} T_t (10)
S370: performing wavelet threshold denoising on the wavelet-decomposed signals according to the final threshold, and performing wavelet reconstruction on the denoised signals to obtain the filtered image at each viewing angle. Finally, with the wavelet decomposition done and the final threshold selected, the decomposed signals are denoised with the final threshold: the useful signals are retained, i.e. the noisy regions of the picture are processed. Wavelet reconstruction of the denoised signals then yields the image with the noise removed, i.e. the filtered image.
In practical engineering the wavelet transform must be discretized; the decomposition of the discrete wavelet transform can be realized by the following formulas:

a_{j+1}[p] = Σ_n h[n − 2p] a_j[n]

d_{j+1}[p] = Σ_n g[n − 2p] a_j[n]

where a_j[p] is the p-th scale coefficient of layer j, d_j[p] is the p-th wavelet coefficient of layer j, n is the summation variable, h[n − 2p] denotes the low-pass filter coefficients, and g[n − 2p] denotes the high-pass filter coefficients, with g[n] = (−1)^n h[1 − n] (the mirror filter relation). Thus the scale and wavelet coefficients of the next layer can be derived from the scale coefficients of a given layer; repeating this until the specified layer is reached yields the scale and wavelet coefficients of all layers.
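The layer-by-layer recursion can be sketched with the two-tap Haar analysis filters, for which the inner sum reduces to a pairwise product; the input signal here is a placeholder:

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (Haar) analysis filter
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass mirror filter

def analysis_step(a):
    """One decomposition step: next-layer scale and wavelet coefficients."""
    pairs = a.reshape(-1, 2)          # a_j[2p], a_j[2p+1]
    return pairs @ h, pairs @ g       # a_{j+1}[p], d_{j+1}[p]

def wavelet_decompose(signal, levels):
    """Repeat the step down to the requested layer, collecting the details."""
    a = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        a, d = analysis_step(a)
        details.append(d)
    return a, details

approx, details = wavelet_decompose(np.ones(16), levels=3)
```

A constant signal produces zero detail coefficients at every layer, and the transform preserves energy, so the final two approximation coefficients carry all of it.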
For a wavelet decomposition of level N, threshold quantization of the high-frequency coefficients obtained after decomposition may proceed by selecting a threshold and applying threshold quantization to the high-frequency coefficients (in the three directions) of each layer from layer 1 to layer N. The selection of the threshold and of the quantization rule (i.e. the threshold function, the rule by which the wavelet coefficients are modified; different threshold functions embody different strategies for processing the coefficients) plays an important role in the rail segmentation result. The choice of threshold directly affects the denoising effect, and different choices yield different results. The main thresholds currently in use are the universal threshold (VisuShrink), the SureShrink threshold, the Minimax threshold and the BayesShrink threshold.
The threshold functions most commonly used in the quantization process are the hard threshold function and the soft threshold function; there is also the Garrote function, intermediate between the two. The hard threshold function adopts a "kill or keep" strategy for the wavelet coefficients, with the expression:

S[m] = F[m] for |F[m]| ≥ λ; S[m] = 0 for |F[m]| < λ

where S[m] is the estimated wavelet coefficient and F[m] is the high-frequency coefficient; coefficients whose magnitude is below the threshold λ are set to zero to achieve denoising. In addition to the hard threshold function, Donoho constructed another threshold function, the soft threshold function ρ_s:

S[m] = sgn(F[m]) (|F[m]| − T) for |F[m]| ≥ T; S[m] = 0 for |F[m]| < T

where T is the threshold. In contrast to the "kill or keep" strategy of the hard threshold function, the soft threshold function adopts "shrink or kill". These are the two most common threshold functions. It is generally considered that hard thresholding preserves local features such as signal edges well, while soft thresholding is smoother but causes distortions such as edge blurring. The skilled person can choose according to the actual scene in which the rail is located. In one embodiment, a hard threshold function is used for wavelet threshold denoising.
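The two strategies can be written out directly; the coefficient vector and threshold value here are illustrative only:

```python
import numpy as np

def hard_threshold(coeffs, lam):
    """'Kill or keep': zero every coefficient whose magnitude is below lam."""
    return np.where(np.abs(coeffs) >= lam, coeffs, 0.0)

def soft_threshold(coeffs, lam):
    """'Shrink or kill': also pull surviving coefficients toward zero by lam."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

c = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])
hard = hard_threshold(c, 1.0)   # [-3., 0., 0., 1., 4.]
soft = soft_threshold(c, 1.0)   # [-2., 0., 0., 0., 3.]
```

Note how soft thresholding shrinks the surviving coefficients by the threshold amount, which is what smooths the result but blurs edges.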
In one embodiment, the step S370 of performing wavelet threshold denoising on the wavelet decomposed signal according to the final threshold comprises:
removing the wavelet-decomposed signals whose wavelet coefficient values are smaller than the final threshold, and keeping those whose wavelet coefficient values are larger than the final threshold. Denoising is achieved by comparing the wavelet coefficients of the decomposed signals with the final threshold: coefficients larger than the final threshold are retained, and coefficients smaller than it are set to zero. For example, the final threshold may be used as λ in the hard threshold function described above, and the wavelet coefficients denoised according to their magnitude relative to it.
In one embodiment, the step S500 of inputting the filtered image into the neural network model to obtain feature maps with different spatial resolutions includes:
S510: uniformly processing each filtered image into a feature map with the target resolution and inputting it into the neural network model.
To better describe how feature maps with different spatial resolutions are obtained in the embodiments of the present application, a convolution operation using the "same" convolution mode in the neural network model is taken as an example. In feature-fusion-based semantic segmentation, all input picture sizes may be unified to 224 × 224 × 3, that is, all filtered images are processed into feature maps with the same resolution, and a feature map of the same 224 × 224 × 3 size is output through network learning.
Each convolutional layer uses "same" convolution, zero-padding the input (Padding) so that the feature map size is unchanged before and after convolution, i.e. W1 = W2 (equal widths before and after convolution) and H1 = H2 (equal heights before and after convolution). Taking the width as an example, W2 is computed as:

(W1-F+2P)/S+1=W2 (11)

where F is the kernel size, S the stride and P the padding. Setting W1 = W2 allows the required padding value P to be solved.
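Solving formula (11) for P under W1 = W2 can be sketched as follows (stride 1 is assumed, since a same-size output with a larger stride is not generally possible; the function name is for the example only):

```python
def same_padding(width, kernel, stride=1):
    """Solve (W1 - F + 2P)/S + 1 = W2 for P with W1 == W2 == width."""
    assert stride == 1, "same-size output generally requires stride 1"
    double_p = (width - 1) * stride - width + kernel
    assert double_p % 2 == 0, "this kernel would need asymmetric padding"
    return double_p // 2

# a 3x3 kernel needs P = 1 to keep a 224-wide feature map at width 224
assert same_padding(224, 3) == 1
assert (224 - 3 + 2 * same_padding(224, 3)) // 1 + 1 == 224
```

For stride 1 the result reduces to P = (F − 1)/2 independently of the width, which is why odd kernel sizes are the usual choice.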
S530: performing convolution at each convolutional layer using a preset convolution mode and downsampling using a preset pooling function to obtain a plurality of feature maps with different spatial resolutions.
The preset convolution mode may be the "same" convolution mode, so the image size does not change across a convolutional layer. The pooling layers of the neural network may use 2 × 2 max pooling, so the width and height of the image are halved after each pooling layer. In one embodiment, a neural network with 7 convolutional layers and 5 pooling layers may be used, as shown in FIG. 4. Through the convolutional and pooling layers, a plurality of feature maps with different spatial resolutions, i.e. different sizes, are obtained. These feature maps contain both information of high semantic strength and finer detail features.
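The size progression through the five 2 × 2 pooling layers can be checked with a few lines; the input size 224 is the assumed example from above:

```python
def sizes_after_pools(size=224, pools=5):
    """Spatial side length after each 2x2 max-pooling layer (halved each time)."""
    sizes = []
    for _ in range(pools):
        size //= 2
        sizes.append(size)
    return sizes

# 224 -> 112 -> 56 -> 28 -> 14 -> 7: the pool5 output is 1/32 of the input side
print(sizes_after_pools())   # [112, 56, 28, 14, 7]
```

This is where the 1/8, 1/16 and 1/32 feature-map sizes mentioned later come from: the outputs of pool3, pool4 and pool5 respectively.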
The step S700 of upsampling the feature maps of the plurality of spatial resolutions to obtain a plurality of levels of upsampled feature maps having the target spatial resolution includes:
S710: performing unpooling on the plurality of feature maps with different spatial resolutions, and convolving the unpooled result to obtain upsampled feature maps of a plurality of levels having the target spatial resolution.
To restore the feature maps from each source to the size of the original image (the size of the image input to the neural network model), a deconvolution approach is adopted: first, zeros are inserted between the neurons of the feature map, i.e. unpooling; then a convolution operation is performed, whose size formula is the same as formula (11), from which the Padding value can likewise be derived.
Taking a neural network with 5 pooling layers (pool) and 7 convolutional layers (conv) as an example, the feature map output by conv7 requires an upsampling stride of 32; if the prediction were restored to the original image size (the size of the image input to the neural network) in a single step, too much information would be lost and the result would not be fine enough. To solve this, a skip-connection strategy is adopted. First, the last convolutional layer is upsampled (the first source of the feature map with the target spatial resolution). Then the prediction of the last convolutional layer is combined with that of pool4, and the combination is upsampled and restored to the original image size, allowing the network to predict details better while retaining high-level semantic information (the second source of the feature map with the target spatial resolution). Similarly, the output of conv5 is upsampled and then combined with pool3, and the combined result is upsampled again to the original size (the third source of the feature map with the target spatial resolution), with which higher accuracy can be obtained.
Finally, the feature maps from the three sources are combined to obtain the final feature map, i.e. the upsampled feature map with the target spatial resolution. The loss function of the neural network is the sum over all pixels of the softmax loss at the last layer, and back-propagation training with gradient descent finally yields a classification model with excellent performance.
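The skip-connection fusion can be sketched shape-wise in NumPy, using nearest-neighbour upsampling in place of learned deconvolution (an assumption made for this example) and random arrays standing in for the pool3, pool4 and conv7 outputs:

```python
import numpy as np

def upsample(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map by an integer factor."""
    return np.kron(fmap, np.ones((factor, factor)))

rng = np.random.default_rng(2)
pool3 = rng.standard_normal((28, 28))   # 1/8 of a 224x224 input
pool4 = rng.standard_normal((14, 14))   # 1/16
conv7 = rng.standard_normal((7, 7))     # 1/32

# skip-connection fusion: upsample the deep map step by step, adding shallower maps
fused16 = upsample(conv7, 2) + pool4    # 14x14
fused8 = upsample(fused16, 2) + pool3   # 28x28
score = upsample(fused8, 8)             # back to the 224x224 input size
```

Each addition injects finer spatial detail from an earlier layer while keeping the deep layer's semantics, which is exactly the trade-off the skip-level strategy targets.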
Each pixel of the resulting feature map is classified independently using a sigmoid function. Since this is a binary classification problem, the threshold is set to 0.5: each pixel belongs either to the rail or to the background. Pixels of the same category are given the same identifier (for example the same color, although the identification need not be limited to color) and are mapped back to the corresponding pixels of the original image, completing the rail segmentation.
In one embodiment, as shown in FIGS. 4-5, the neural network model has 5 pooling layers and 7 convolutional layers; the feature maps with different spatial resolutions comprise 1/8 feature maps with the size of an original, 1/16 feature maps with the size of the original and 1/32 feature maps with the size of the original;
the step S710 of pooling a plurality of feature maps with different spatial resolutions and convolving the pooled result to obtain a plurality of levels of upsampled feature maps with the target spatial resolution includes:
upsampling the feature map of 1/16 the original size, combining the upsampling result with the fourth pooling layer, and upsampling the combination to obtain an upsampled feature map with the target spatial resolution;
upsampling the feature map of 1/8 the original size, combining the upsampling result with the third pooling layer, and upsampling the combination to obtain an upsampled feature map with the target spatial resolution;
upsampling the feature map of 1/32 the original size to an upsampled feature map with the target spatial resolution.
Assuming that target images at three viewing angles are acquired, one frame is input at each viewing angle, and three images are input in total.
Wavelet decomposition is performed on each of the three pictures, decomposing each image into a superposition of wavelet signals; the denoising thresholds for the three viewing angles are computed according to formulas (8) to (10), and their average is taken as the final threshold. Taking a specific image as an example, signal and noise are distinguished according to the computed final threshold: wavelet component coefficients larger than the threshold are regarded as valid signal and retained; coefficients smaller than the threshold are regarded as noise and set to zero. This completes the wavelet denoising process.
The denoised, filtered pictures are then unified to a size of 224 × 224 × 3 and input into the convolutional neural network, which outputs upsampled feature maps of the same 224 × 224 × 3 size at the target spatial resolution. The feature map has three sources: the first restores the output of conv7 (the 1/32-original-size feature map) to 224 × 224 × 3 by upsampling with stride 32; the second upsamples the output of conv7 with stride 2 (the 1/16-original-size feature map), combines it with the prediction of pool4, and upsamples the result with stride 16 back to the original size; the third upsamples the output of conv5 with stride 2 (the 1/8-original-size feature map), combines it with the output of pool3, and finally upsamples with stride 8 to the original size.
It should be noted that, in the implementation process of semantic segmentation based on feature fusion, the feature maps of the above several levels are not limited to be adopted, and feature maps of other layers may be upsampled and then fused, and the level selection that can fully retain the detail features and give consideration to the semantic performance is within the scope of the present application.
In one embodiment, the step of performing feature fusion and semantic segmentation on the upsampled feature maps corresponding to the respective views to obtain the rail segmentation result includes:
and performing feature fusion on the multi-level up-sampling feature map with the target spatial resolution to obtain a feature fusion image.
And determining pixel points belonging to the rail in the pixel points of the characteristic fusion image by using the classification model, and identifying the pixel points belonging to the rail.
And mapping the identified pixel points belonging to the rail to a target image to obtain a rail segmentation result.
The classification model is taken as a softmax function, and the identification mode is color identification. And performing feature fusion on the obtained up-sampling feature maps of all levels, judging the category of each pixel point in the fused feature maps by using a softmax function, and giving the same color to the pixel points belonging to the rail to finish the segmentation of the rail.
In one embodiment, the step of determining, by using the classification model, pixel points belonging to the rail among the pixel points of the feature fusion image includes:
mapping each pixel point in the feature fusion image to (0, 1) by using a sigmoid function;
and then, according to the magnitude relation between the mapped value and the classification threshold value, eliminating the pixel points which do not belong to the rail to obtain the pixel points which belong to the rail.
In one embodiment, the classification threshold may be 0.5.
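The sigmoid-plus-0.5 decision can be sketched as follows; the logit values are placeholders standing in for per-pixel network outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_pixels(logits, threshold=0.5):
    """Map per-pixel scores into (0, 1); scores at or above the threshold are rail."""
    return sigmoid(logits) >= threshold

logits = np.array([[-2.0, 0.1],
                   [3.0, -0.4]])        # placeholder per-pixel network outputs
mask = classify_pixels(logits)          # True marks pixels classified as rail
```

Since sigmoid(x) ≥ 0.5 exactly when x ≥ 0, thresholding at 0.5 is equivalent to taking the sign of the logit.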
According to the rail segmentation method provided by the embodiments of the present application, target images at multiple viewing angles are obtained, each containing a rail; denoising is then performed on the target image at each viewing angle, with filtering realized by the wavelet transform to remove noise signals, which improves the effectiveness of the data for subsequent image processing and the processing efficiency. For the filtered image, a plurality of feature maps of different levels (i.e. different spatial resolutions, i.e. different sizes) are obtained using a neural network model, and the feature map at each spatial resolution is upsampled to an upsampled feature map with the target spatial resolution, so that high-quality features are obtained from the learning capability of the network. Finally, feature fusion over the upsampled feature maps of all convolution levels retains high-level semantic information while predicting details well, achieving higher precision; semantic segmentation of the feature-fused image then yields a highly accurate rail segmentation result (i.e. the rail portion is segmented out of the target image).
It should be understood that although the steps in the flowcharts of figs. 2, 3 and 5 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2, 3 and 5 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
A rail splitting apparatus, as shown in figure 6, comprising:
the image acquisition module 10 is configured to acquire target images at multiple viewing angles, where the target images are scene images including rails;
the filtering module 20 is configured to perform wavelet denoising processing on the target image at each view angle to obtain a filtered image at each view angle;
a machine learning module 30, configured to input the filtered image into a neural network model, so as to obtain feature maps with multiple different spatial resolutions;
an upsampling module 40, configured to upsample the feature maps with the multiple spatial resolutions to obtain a plurality of levels of upsampled feature maps with a target spatial resolution;
and the rail segmentation execution module 50 is configured to perform feature fusion and semantic segmentation on the upsampled feature maps corresponding to the respective viewing angles to obtain a rail segmentation result.
In addition to the method steps listed herein, the functional modules in the rail splitting device provided in the embodiment of the present application may also perform other method steps in any of the rail splitting methods described above to achieve corresponding beneficial effects, and are not described herein again.
For specific definition of the rail splitting device, reference may be made to the above definition of the rail splitting method, which is not described herein again. The modules in the rail splitting apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as classification threshold values, neural network models and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a rail segmentation method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
s100: acquiring target images under a plurality of visual angles, wherein the target images refer to scene images containing rails;
s300: performing wavelet denoising processing on the target image under each visual angle to obtain a filtering image under each visual angle;
s500: inputting the filtering image into a neural network model to obtain a feature map with a plurality of different spatial resolutions;
s700: the characteristic graphs of a plurality of spatial resolutions are up-sampled, and up-sampled characteristic graphs of a plurality of levels with target spatial resolutions are obtained;
s900: and performing feature fusion and semantic segmentation on the up-sampling feature maps corresponding to all the visual angles to obtain a rail segmentation result.
A rail division processing system, the system comprising:
the multi-image acquisition platform 200 is used for acquiring target images under different viewing angles, and the target images are scene images containing rails 600;
a computer device 400, the computer device 400 being communicatively connected to a multi-image acquisition platform 200, the computer device 400 comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
The multi-image capturing platform 200 may be a platform provided with a plurality of cameras for capturing images, or may be a device in which the cameras can rotate to capture target images at different viewing angles. According to the rail segmentation processing system, by utilizing the computing capability of the computer equipment 400 and the image acquisition capability of the multi-image acquisition platform 200, more rail features are mined by utilizing images under multiple visual angles, and algorithm improvement is combined, so that the dual guarantee of high semantic performance and rich detail features of the images is realized, and the accuracy of rail image segmentation results is improved.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
s100: acquiring target images under a plurality of visual angles, wherein the target images refer to scene images containing rails;
s300: performing wavelet denoising processing on the target image under each visual angle to obtain a filtering image under each visual angle;
s500: inputting the filtering image into a neural network model to obtain a feature map with a plurality of different spatial resolutions;
s700: the characteristic graphs of a plurality of spatial resolutions are up-sampled, and up-sampled characteristic graphs of a plurality of levels with target spatial resolutions are obtained;
s900: and performing feature fusion and semantic segmentation on the up-sampling feature maps corresponding to all the visual angles to obtain a rail segmentation result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A rail segmentation method, characterized in that the method comprises:
acquiring target images at multiple viewing angles, wherein a target image is a scene image containing a rail;
performing wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle;
inputting the filtered images into a neural network model to obtain feature maps with a plurality of different spatial resolutions;
upsampling the feature maps of the plurality of spatial resolutions to obtain multiple levels of upsampled feature maps with a target spatial resolution; and
performing feature fusion and semantic segmentation on the upsampled feature maps corresponding to all viewing angles to obtain a rail segmentation result.
2. The method according to claim 1, wherein the step of performing wavelet denoising on the target image at each viewing angle to obtain the filtered image at each viewing angle comprises:
performing wavelet decomposition on the target image at each viewing angle to obtain decomposed signals;
calculating a wavelet threshold at each viewing angle;
taking the average of the wavelet thresholds over all viewing angles as a final threshold; and
performing wavelet-threshold denoising on the wavelet-decomposed signals according to the final threshold, and performing wavelet reconstruction on the denoised signals to obtain the filtered image at each viewing angle.
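By way of illustration, the per-view thresholding and averaging recited in claim 2 can be sketched in Python. The claim does not fix the wavelet basis or the threshold rule, so a one-level Haar decomposition and the VisuShrink universal threshold (sigma·sqrt(2·ln N), with sigma estimated from the median absolute deviation of the detail coefficients) are assumptions made here for the sketch:

```python
import math

def haar_decompose(signal):
    """One-level 1-D Haar wavelet decomposition (orthonormal)."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

def universal_threshold(detail, n):
    """VisuShrink threshold sigma*sqrt(2 ln n); sigma estimated from the
    median absolute deviation (MAD) of the detail coefficients."""
    mad = sorted(abs(d) for d in detail)[len(detail) // 2]
    sigma = mad / 0.6745
    return sigma * math.sqrt(2 * math.log(n))

# Toy 1-D "rows" standing in for images from two viewing angles.
views = [
    [1.0, 2.0, 1.0, 2.0, 5.0, 1.0, 2.0, 1.0],
    [2.0, 2.0, 3.0, 2.0, 2.0, 6.0, 2.0, 2.0],
]
per_view = []
for v in views:
    _, detail = haar_decompose(v)
    per_view.append(universal_threshold(detail, len(v)))
# The final threshold is the average over all viewing angles.
final_threshold = sum(per_view) / len(per_view)
```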
3. The method according to claim 2, wherein the step of performing wavelet-threshold denoising on the wavelet-decomposed signals according to the final threshold comprises:
discarding wavelet-decomposed signals whose wavelet coefficient values are smaller than the final threshold, and retaining wavelet-decomposed signals whose wavelet coefficient values are larger than the final threshold.
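For illustration, the removal rule of claim 3 amounts to hard thresholding of the wavelet coefficients. A minimal Python sketch follows; the treatment of coefficients exactly equal to the threshold is not fixed by the claim, so keeping them is an assumption:

```python
def hard_threshold(coeffs, final_threshold):
    """Hard thresholding: zero out wavelet coefficients whose magnitude
    is below the final threshold; keep those at or above it."""
    return [c if abs(c) >= final_threshold else 0.0 for c in coeffs]

# Coefficients below the final threshold (here 1.0) are removed.
denoised = hard_threshold([0.2, -3.1, 0.9, 4.0], 1.0)
```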
4. The method according to claim 1, wherein the step of inputting the filtered images into a neural network model to obtain feature maps with a plurality of different spatial resolutions comprises:
uniformly processing each filtered image into a feature map with a target resolution and inputting it into the neural network model; and
performing convolution at each convolutional layer in a preset convolution mode and downsampling with a preset pooling function to obtain a plurality of feature maps with different spatial resolutions;
and wherein the step of upsampling the feature maps of the plurality of spatial resolutions to obtain multiple levels of upsampled feature maps with a target spatial resolution comprises:
pooling the feature maps with different spatial resolutions, and convolving the pooling result to obtain multiple levels of upsampled feature maps with a target spatial resolution.
5. The method according to claim 4, wherein the neural network model has 7 convolutional layers and 5 pooling layers, and the feature maps with different spatial resolutions comprise a feature map at 1/8 of the original image size, a feature map at 1/16 of the original image size, and a feature map at 1/32 of the original image size;
and wherein the step of pooling the feature maps with different spatial resolutions and convolving the pooling result to obtain multiple levels of upsampled feature maps with a target spatial resolution comprises:
upsampling the feature map at 1/16 of the original image size, combining the upsampling result with the output of the fourth pooling layer, and then upsampling the combination into an upsampled feature map with the target spatial resolution;
upsampling the feature map at 1/8 of the original image size, combining the upsampling result with the output of the third pooling layer, and then upsampling the combination into an upsampled feature map with the target spatial resolution; and
upsampling the feature map at 1/32 of the original image size into an upsampled feature map with the target spatial resolution.
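As a toy illustration of the upsample-and-combine scheme in claim 5 (in the style of FCN skip connections): nearest-neighbour 2x upsampling and element-wise addition are assumptions here, since the claim does not specify the interpolation method or the fusion operator.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def combine(a, b):
    """Element-wise sum of two equally sized maps (skip-connection fusion)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Upsample a toy 1/16-scale map and merge it with a 1/8-scale
# pooling-layer output before further upsampling toward the target size.
f16 = [[1.0, 2.0], [3.0, 4.0]]
pool4 = [[0.5, 0.5, 0.5, 0.5] for _ in range(4)]
merged = combine(upsample2x(f16), pool4)
```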
6. The method according to any one of claims 1 to 5, wherein the step of performing feature fusion and semantic segmentation on the upsampled feature maps corresponding to the respective viewing angles to obtain the rail segmentation result comprises:
performing feature fusion on the multiple levels of upsampled feature maps with the target spatial resolution to obtain a feature-fused image;
determining, with a classification model, the pixels of the feature-fused image that belong to the rail, and marking those pixels; and
mapping the marked rail pixels onto the target image to obtain the rail segmentation result.
7. The method according to claim 6, wherein the step of using the classification model to determine the pixels of the feature-fused image that belong to the rail comprises:
mapping the value of each pixel in the feature-fused image into (0, 1) with a sigmoid function; and
eliminating, according to the relation between the mapped value and a classification threshold, the pixels that do not belong to the rail, thereby obtaining the pixels that belong to the rail.
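By way of illustration, the sigmoid mapping and threshold comparison of claim 7 can be sketched as follows; the classification threshold of 0.5 is an assumed default, not a value fixed by the claim:

```python
import math

def sigmoid(x):
    """Map a raw pixel score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rail_mask(scores, classification_threshold=0.5):
    """Keep only pixels whose mapped value reaches the threshold;
    the rest are eliminated as non-rail pixels."""
    return [[sigmoid(s) >= classification_threshold for s in row]
            for row in scores]

# Toy 2x2 score map: positive scores map above 0.5, negative below.
mask = rail_mask([[4.0, -4.0], [0.3, -0.3]])
```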
8. A rail segmentation apparatus, characterized in that the apparatus comprises:
an image acquisition module, configured to acquire target images at multiple viewing angles, wherein a target image is a scene image containing a rail;
a filtering module, configured to perform wavelet denoising on the target image at each viewing angle to obtain a filtered image at each viewing angle;
a machine learning module, configured to input the filtered images into a neural network model to obtain feature maps with a plurality of different spatial resolutions;
an upsampling module, configured to upsample the feature maps of the plurality of spatial resolutions to obtain multiple levels of upsampled feature maps with a target spatial resolution; and
a rail segmentation execution module, configured to perform feature fusion and semantic segmentation on the upsampled feature maps corresponding to all viewing angles to obtain a rail segmentation result.
9. A computer device, characterized in that the computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A rail segmentation processing system, characterized in that the system comprises:
a multi-image acquisition platform, configured to acquire target images at different viewing angles, wherein a target image is a scene image containing a rail; and
a computer device communicatively connected to the multi-image acquisition platform, the computer device comprising a memory storing a computer program and a processor that implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
CN202110565485.9A 2021-05-24 2021-05-24 Rail segmentation method, device, computer equipment and rail segmentation processing system Active CN113362338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110565485.9A CN113362338B (en) 2021-05-24 2021-05-24 Rail segmentation method, device, computer equipment and rail segmentation processing system


Publications (2)

Publication Number Publication Date
CN113362338A true CN113362338A (en) 2021-09-07
CN113362338B CN113362338B (en) 2022-07-29

Family

ID=77527422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110565485.9A Active CN113362338B (en) 2021-05-24 2021-05-24 Rail segmentation method, device, computer equipment and rail segmentation processing system

Country Status (1)

Country Link
CN (1) CN113362338B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821506A (en) * 2022-05-11 2022-07-29 北京地平线机器人技术研发有限公司 Multi-view semantic segmentation method and device, electronic equipment and storage medium
CN115049820A (en) * 2022-05-11 2022-09-13 北京地平线机器人技术研发有限公司 Determination method and device of occlusion region and training method of segmentation model
CN116563285A (en) * 2023-07-10 2023-08-08 邦世科技(南京)有限公司 Focus characteristic identifying and dividing method and system based on full neural network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 A kind of image partition method based on deep learning, system and electronic equipment
CN109389056A (en) * 2018-09-21 2019-02-26 北京航空航天大学 A kind of track surrounding enviroment detection method of space base multi-angle of view collaboration
CN110276354A (en) * 2019-05-27 2019-09-24 东南大学 A kind of training of high-resolution Streetscape picture semantic segmentation and real time method for segmenting
CN110598166A (en) * 2019-09-18 2019-12-20 河海大学 Wavelet denoising method for adaptively determining wavelet hierarchical level
CN111028217A (en) * 2019-12-10 2020-04-17 南京航空航天大学 Image crack segmentation method based on full convolution neural network
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112651973A (en) * 2020-12-14 2021-04-13 南京理工大学 Semantic segmentation method based on cascade of feature pyramid attention and mixed attention


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TSUNG-YI LIN et al.: "Feature Pyramid Networks for Object Detection", arXiv *
LIU Zhiyu et al.: "Padding-Weighted Convolutional Neural Network", New Generation Information Technology *


Also Published As

Publication number Publication date
CN113362338B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
US12008797B2 (en) Image segmentation method and image processing apparatus
CN113362338B (en) Rail segmentation method, device, computer equipment and rail segmentation processing system
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
WO2021164234A1 (en) Image processing method and image processing device
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN111861894B (en) Image motion blur removing method based on generation type countermeasure network
WO2022134971A1 (en) Noise reduction model training method and related apparatus
CN111105352A (en) Super-resolution image reconstruction method, system, computer device and storage medium
CN106056553B (en) Image restoration method based on tight frame feature dictionary
CN113450288B (en) Single image rain removing method and system based on deep convolutional neural network and storage medium
Wang et al. MAGAN: Unsupervised low-light image enhancement guided by mixed-attention
Ni et al. Example-driven manifold priors for image deconvolution
CN112634163A (en) Method for removing image motion blur based on improved cycle generation countermeasure network
CN110809126A (en) Video frame interpolation method and system based on adaptive deformable convolution
CN112446835A (en) Image recovery method, image recovery network training method, device and storage medium
Zhu et al. PNEN: Pyramid non-local enhanced networks
Shi et al. Multi-scale deep networks for image compressed sensing
CN113222855A (en) Image recovery method, device and equipment
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
WO2022233252A1 (en) Image processing method and apparatus, and computer device and storage medium
CN116977200A (en) Processing method and device of video denoising model, computer equipment and storage medium
CN116385281A (en) Remote sensing image denoising method based on real noise model and generated countermeasure network
CN114841873A (en) Image motion blur blind removal method based on generation countermeasure network
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant