CN115393585A - Moving target detection method based on super-pixel fusion network - Google Patents
Moving target detection method based on super-pixel fusion network
- Publication number
- CN115393585A (application CN202210962818.6A)
- Authority
- CN
- China
- Prior art keywords
- layer
- pixel
- super
- image
- histogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06T7/11 — Image analysis; region-based segmentation
- G06V10/36 — Image preprocessing; applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; non-linear local filtering operations, e.g. median filtering
- G06V10/50 — Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
The invention relates to the technical field of target detection, in particular to a moving target detection method based on a superpixel fusion network. A candidate foreground is first extracted by median filtering and taken as the pixel features; histogram features of the candidate foreground superpixels are then extracted, and the pixel features and the superpixel features are respectively taken as the inputs of a superpixel fusion network. The whole process runs fast and is highly robust.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a moving target detection method based on a super-pixel fusion network.
Background
Moving object detection is one of the applications of image processing. In general, a background model is obtained by statistical methods and updated in real time to adapt to illumination changes and changes of the scene; morphological operations and connected-component area checks are used as post-processing to eliminate the influence of noise and background disturbance, and shadows are detected in the HSV chromaticity space to obtain an accurate moving object. In complex scenes, moving object detection remains a challenging task. Existing deep-learning methods mainly adopt the U-Net network and achieve impressive results; however, they ignore the local continuity between pixels, so detection performance needs further improvement. In addition, such networks encode information of the scene itself, so their generalization ability also needs further improvement.
Disclosure of Invention
In view of the above situation, an object of the present invention is to provide a moving object detection method based on a super-pixel fusion network.
The technical purpose of the invention is realized by the following technical scheme:
a moving target detection method based on a super-pixel fusion network comprises two parts: (1) a training stage; (2) a detection stage;
the training stage comprises:
step 1, inputting a color image sequence R1, averaging 3 channel numerical values to perform image graying, and obtaining a grayed image sequence G1;
step 2, performing median filtering on the image sequence to obtain a background image B1, and performing difference on the image sequence G1 and the background image B1 to obtain a candidate foreground sequence which is marked as a pixel characteristic F1;
step 3, performing superpixel segmentation on the color image sequence R1 to obtain region information C1;
step 4, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 5, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram as a super-pixel characteristic F2;
step 6, constructing a network;
step 7, training the model;
step 8, outputting the trained network model M;
the detection phase comprises:
step 9, inputting an image sequence R2, and averaging 3-channel numerical values to perform image graying if the image is a color image to obtain a grayed image sequence G2; if the image is a gray image, directly enabling G2= R2;
step 10, performing median filtering on the image sequence to obtain a background image B2, and performing difference on the image sequence G2 and the background image B2 to obtain a candidate foreground sequence which is recorded as a pixel characteristic F3;
step 11, performing superpixel segmentation on the color image sequence R2 to obtain region information C2;
step 12, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 13, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram as a super-pixel characteristic F4;
step 14, taking the super pixel characteristics F4 and the pixel characteristics F3 as the input of the trained network model M;
and step 15, outputting a detection result.
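As a concrete illustration, steps 1-2 (and their detection-stage counterparts, steps 9-10) can be sketched in NumPy. This is a minimal sketch under my own naming, not the patented implementation; in particular, interpreting "median filtering on the image sequence" as a temporal median over the frames is my reading of the text:

```python
import numpy as np

def pixel_features(frames):
    """Candidate-foreground pixel features (cf. steps 1-2 / 9-10).

    frames: (T, H, W, 3) color sequence or (T, H, W) grayscale
            sequence, values assumed to lie in [0, 1].
    Returns (gray, background, features), where features = gray - background.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Step 1: graying by averaging the 3 channel values (or G = R for gray input).
    gray = frames.mean(axis=-1) if frames.ndim == 4 else frames
    # Step 2: median filtering over the sequence gives the background image ...
    background = np.median(gray, axis=0)
    # ... and differencing against it gives the candidate foreground (pixel features).
    features = gray - background
    return gray, background, features
```

Since gray values lie in [0, 1], the difference lies in [-1, 1], which matches the histogram range used in step 4.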
Further, the specific method of step 6 is as follows:
(1) constructing an encoder:
the convolutional neural network comprises an input layer, a hidden layer and an output layer;
the input layer comprises two inputs, each with a resolution of 240 × 320; the number of channels of the encoder input corresponding to the superpixel feature F2 is 21, the number of channels of the encoder input corresponding to the pixel feature F1 is 1, and the convolution size throughout the convolutional neural network is 3 × 3;
layer 1 of the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 8 convolutions are used for generating 8 feature maps;
layer 2 of the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 16 convolutions are used for generating 16 feature maps;
the 3 rd layer in the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 32 convolutions are used for generating 32 feature maps;
the 4 th layer in the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 64 feature maps are generated by using 64 convolutions;
(2) constructing a connecting layer:
the 5th layer in the hidden layers is a connection layer, which connects the outputs of the two encoders by concatenation;
(3) constructing a decoder:
the 6th layer in the hidden layers adopts convolution, batch normalization and an activation layer (Conv + BN + Relu), and uses 128 convolutions to generate 64 feature maps;
the 7th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 64 convolutions to generate 32 feature maps;
the 8th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 32 convolutions to generate 16 feature maps;
the 9th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 8 convolutions to generate 8 feature maps;
the 10th layer in the hidden layers adopts deconvolution, batch normalization and a clipped activation layer (Deconv + BN + ClippedRelu), and uses 1 convolution to generate 1 feature map;
the output layer comprises a regression layer;
the superpixel features and the pixel features are taken as the inputs of the network, and the output is the ground truth (foreground mask) of the corresponding input image.
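The resolutions implied by this layout can be sanity-checked with a short pure-Python sketch. The layer widths below are transcribed from the description above; the helper functions and their names are my own, and stride-2 pooling/deconvolution is my assumption about how the resolution changes between layers:

```python
# Layer widths transcribed from the description: four Conv+BN+Relu+Maxpool
# encoder layers per branch, a concatenation bottleneck, then one Conv layer
# and four Deconv layers back up to full resolution.
ENCODER_MAPS = [8, 16, 32, 64]       # feature maps of hidden layers 1-4
DECODER_MAPS = [64, 32, 16, 8, 1]    # feature maps of hidden layers 6-10

def encoder_resolutions(h=240, w=320):
    """(H, W) after each assumed 2x2 max-pooling layer."""
    sizes = []
    for _ in ENCODER_MAPS:
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

def decoder_resolutions(h=15, w=20):
    """(H, W) after layer 6 (resolution-preserving Conv) and after each
    assumed stride-2 deconvolution in layers 7-10."""
    sizes = [(h, w)]
    for _ in DECODER_MAPS[1:]:
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes
```

Under these assumptions the bottleneck sits at 15 × 20 with 64 + 64 = 128 concatenated channels, which is consistent with layer 6 using 128 convolutions, and the four deconvolutions restore the 240 × 320 output resolution.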
In conclusion, the invention has the following beneficial effects:
the invention firstly uses median filtering to obtain candidate foreground, then judges whether the pixel is a foreground pixel or not through a super-pixel fusion network, and only relates to simple multiplication of a matrix when detecting, so the invention has small time complexity, and the processing speed of a training stage and a detection stage is high.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this application, illustrate rather than limit the invention:
FIG. 1 is a schematic diagram of the steps of the present invention.
Detailed Description
The foregoing and other objects, features and advantages of the present invention will become apparent from the following detailed description of the embodiments, read in conjunction with the accompanying FIG. 1. The structural contents mentioned in the following embodiments all refer to the accompanying drawings of the specification.
Exemplary embodiments of the present invention will be described below with reference to the accompanying drawings.
Example 1: a moving target detection method based on a super-pixel fusion network comprises two parts: (1) a training stage; (2) a detection stage;
the training stage comprises:
step 1, inputting a color image sequence R1, averaging 3-channel numerical values to perform image graying, and obtaining a grayed image sequence G1;
step 2, performing median filtering on the image sequence to obtain a background image B1, and performing difference on the image sequence G1 and the background image B1 to obtain a candidate foreground sequence which is marked as a pixel characteristic F1;
step 3, performing superpixel segmentation on the color image sequence R1 to obtain region information C1;
step 4, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 5, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram of each region as a super-pixel characteristic F2;
step 6, constructing a network:
(1) constructing an encoder:
the convolutional neural network comprises an input layer, a hidden layer and an output layer;
the input layer comprises two inputs, each with a resolution of 240 × 320; the number of channels of the encoder input corresponding to the superpixel feature F2 is 21, the number of channels of the encoder input corresponding to the pixel feature F1 is 1, and the convolution size throughout the convolutional neural network is 3 × 3;
the layer 1 in the hidden layer adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 8 feature maps are generated by using 8 convolutions;
the layer 2 in the hidden layer adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 16 feature maps are generated by using 16 convolutions;
the 3 rd layer in the hidden layer adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 32 feature maps are generated by using 32 convolutions;
layer 4 of the hidden layer adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 64 feature maps are generated by using 64 convolutions;
(2) constructing a connecting layer:
the 5th layer in the hidden layers is a connection layer, which connects the outputs of the two encoders by concatenation;
(3) constructing a decoder:
the 6th layer in the hidden layers adopts convolution, batch normalization and an activation layer (Conv + BN + Relu), and uses 128 convolutions to generate 64 feature maps;
the 7th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 64 convolutions to generate 32 feature maps;
the 8th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 32 convolutions to generate 16 feature maps;
the 9th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 8 convolutions to generate 8 feature maps;
the 10th layer in the hidden layers adopts deconvolution, batch normalization and a clipped activation layer (Deconv + BN + ClippedRelu), and uses 1 convolution to generate 1 feature map;
the output layer comprises a regression layer;
the superpixel features and the pixel features are used as the inputs of the network, and the output is the ground truth (foreground mask) of the corresponding input image;
step 7, training the model;
step 8, outputting the trained network model M;
the detection phase comprises:
step 9, inputting an image sequence R2, and averaging 3-channel numerical values to perform image graying if the image is a color image to obtain a grayed image sequence G2; if the image is a gray image, directly enabling G2= R2;
step 10, performing median filtering on the image sequence to obtain a background image B2, and performing difference on the image sequence G2 and the background image B2 to obtain a candidate foreground sequence which is marked as a pixel characteristic F3;
step 11, performing superpixel segmentation on the color image sequence R2 to obtain region information C2;
step 12, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 13, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram as a super-pixel characteristic F4;
step 14, taking the super pixel characteristics F4 and the pixel characteristics F3 as the input of the trained network model M;
and step 15, outputting a detection result.
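The superpixel-histogram features of steps 12-13 (and 4-5) can be sketched with NumPy as follows. The 21 bins match the 21 input channels of the superpixel encoder; placing the bins at centers -1.0, -0.9, ..., 1.0 (width 0.1) and normalizing each histogram are my assumptions, since the text only fixes the range [-1, 1] and the 0.1 interval:

```python
import numpy as np

# 21 bins of width 0.1, centered at -1.0, -0.9, ..., 1.0 (assumed layout).
BIN_EDGES = np.linspace(-1.05, 1.05, 22)

def superpixel_features(pixel_feat, labels):
    """Per-superpixel histogram features (cf. steps 4-5 / 12-13).

    pixel_feat: (H, W) candidate-foreground values in [-1, 1]
    labels:     (H, W) integer superpixel labels from the segmentation
    Returns (H, W, 21): every pixel carries the (normalized) histogram
    of the superpixel it belongs to.
    """
    out = np.zeros(pixel_feat.shape + (21,))
    for lab in np.unique(labels):
        mask = labels == lab
        hist, _ = np.histogram(pixel_feat[mask], bins=BIN_EDGES)
        out[mask] = hist / hist.sum()   # region histogram shared by all its pixels
    return out
```

Broadcasting the region histogram onto every pixel of the region is what lets the network consume the superpixel features as an ordinary 21-channel image alongside the 1-channel pixel features.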
The invention extracts a candidate foreground with median filtering (referred to as the pixel features), performs superpixel segmentation on the image sequence, extracts histogram features of the candidate-foreground superpixels (referred to as the superpixel features), and then takes the pixel features and the superpixel features as the two inputs of the superpixel fusion network.
The whole detection process involves only simple matrix multiplications, so the time complexity is low and both the training stage and the detection stage run fast; moreover, because dynamic characteristics are taken into account, the fused superpixel features effectively suppress dynamic backgrounds.
Experiments show that the superpixel fusion network performs well on 34 image sequences of CDNet 2014: superpixel fusion removes more background noise, and the network has stronger expressive power than a plain network of the same depth.
While the invention has been described in further detail with reference to specific embodiments, it is not limited to those embodiments; for those skilled in the art to which the invention pertains, extensions, variations of the operation method and data replacements based on the technical solution of the invention shall fall within the protection scope of the invention.
Claims (2)
1. A moving target detection method based on a super-pixel fusion network is characterized by comprising two stages: (1) a training stage; (2) a detection stage;
the training stage comprises the following steps:
step 1, inputting a color image sequence R1, averaging 3-channel numerical values to perform image graying, and obtaining a grayed image sequence G1;
step 2, performing median filtering on the image sequence to obtain a background image B1, and performing difference on the image sequence G1 and the background image B1 to obtain a candidate foreground sequence which is marked as a pixel characteristic F1;
step 3, performing superpixel segmentation on the color image sequence R1 to obtain region information C1;
step 4, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 5, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram as a super-pixel characteristic F2;
step 6, constructing a network;
step 7, training the model;
step 8, outputting the trained network model M;
the detection phase comprises:
step 9, inputting an image sequence R2, and averaging 3-channel numerical values to perform image graying if the image is a color image to obtain a grayed image sequence G2; if the image is a gray image, directly enabling G2= R2;
step 10, performing median filtering on the image sequence to obtain a background image B2, and performing difference on the image sequence G2 and the background image B2 to obtain a candidate foreground sequence which is marked as a pixel characteristic F3;
step 11, performing superpixel segmentation on the color image sequence R2 to obtain region information C2;
step 12, calculating, according to the region information, a histogram of the candidate foreground pixels within each superpixel, the histogram being taken over the range [-1, 1] with a bin interval of 0.1;
step 13, taking the histogram of each region as the characteristics of all pixels in the region, and recording the histogram as a super-pixel characteristic F4;
step 14, using the super pixel characteristics F4 and the pixel characteristics F3 as the input of the trained network model M;
and step 15, outputting a detection result.
2. The moving object detection method based on the super-pixel fusion network according to claim 1, wherein the specific method in step 6 is as follows:
(1) constructing an encoder:
the convolutional neural network comprises an input layer, a hidden layer and an output layer;
the input layer comprises two inputs, each with a resolution of 240 × 320; the number of channels of the encoder input corresponding to the superpixel feature F2 is 21, the number of channels of the encoder input corresponding to the pixel feature F1 is 1, and the convolution size throughout the convolutional neural network is 3 × 3;
layer 1 of the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 8 convolutions are used for generating 8 feature maps;
the layer 2 in the hidden layer adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 16 convolutions are used for generating 16 feature maps;
layer 3 of the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 32 convolutions are used for generating 32 feature maps;
layer 4 of the hidden layers adopts convolution, batch normalization, an activation layer and a pooling layer Conv + BN + Relu + Maxpool, and 64 feature maps are generated by using 64 convolutions;
(2) constructing a connecting layer:
the 5th layer in the hidden layers is a connection layer, which connects the outputs of the two encoders by concatenation;
(3) constructing a decoder:
the 6th layer in the hidden layers adopts convolution, batch normalization and an activation layer (Conv + BN + Relu), and uses 128 convolutions to generate 64 feature maps;
the 7th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 64 convolutions to generate 32 feature maps;
the 8th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 32 convolutions to generate 16 feature maps;
the 9th layer in the hidden layers adopts deconvolution, batch normalization and an activation layer (Deconv + BN + Relu), and uses 8 convolutions to generate 8 feature maps;
the 10th layer in the hidden layers adopts deconvolution, batch normalization and a clipped activation layer (Deconv + BN + ClippedRelu), and uses 1 convolution to generate 1 feature map;
the output layer comprises a regression layer;
the superpixel features and the pixel features are taken as the inputs of the network, and the output is the ground truth (foreground mask) of the corresponding input image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210962818.6A CN115393585B (en) | 2022-08-11 | 2022-08-11 | Moving object detection method based on super-pixel fusion network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115393585A true CN115393585A (en) | 2022-11-25 |
CN115393585B CN115393585B (en) | 2023-05-12 |
Family
ID=84119085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210962818.6A Active CN115393585B (en) | 2022-08-11 | 2022-08-11 | Moving object detection method based on super-pixel fusion network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115393585B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102637253A (en) * | 2011-12-30 | 2012-08-15 | 清华大学 | Video foreground object extracting method based on visual saliency and superpixel division |
CN103578119A (en) * | 2013-10-31 | 2014-02-12 | 苏州大学 | Target detection method in Codebook dynamic scene based on superpixels |
CN105809716A (en) * | 2016-03-07 | 2016-07-27 | 南京邮电大学 | Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method |
US20170140231A1 (en) * | 2015-11-13 | 2017-05-18 | Honda Motor Co., Ltd. | Method and system for moving object detection with single camera |
CN111881915A (en) * | 2020-07-15 | 2020-11-03 | 武汉大学 | Satellite video target intelligent detection method based on multiple prior information constraints |
CN112561949A (en) * | 2020-12-23 | 2021-03-26 | 江苏信息职业技术学院 | Fast moving target detection algorithm based on RPCA and support vector machine |
CN112802054A (en) * | 2021-02-04 | 2021-05-14 | 重庆大学 | Mixed Gaussian model foreground detection method fusing image segmentation |
CN112926466A (en) * | 2021-03-02 | 2021-06-08 | 江苏信息职业技术学院 | Moving target detection method based on differential convolutional neural network |
CN114841941A (en) * | 2022-04-24 | 2022-08-02 | 江苏信息职业技术学院 | Moving target detection algorithm based on depth and color image fusion |
- 2022-08-11: application CN202210962818.6A filed; granted as CN115393585B (status: active)
Non-Patent Citations (5)
Title |
---|
ZHI LIU et al.: "Superpixel-Based Spatiotemporal Saliency Detection" *
于洪洋 et al.: "Video moving target detection algorithm based on superpixel-consistent saliency" *
云红全 et al.: "Moving target detection algorithm based on superpixel spatiotemporal saliency" *
李阳: "Fast moving target detection algorithm based on RPCA and SVM" *
邢晴: "Research on maritime target detection method based on visual saliency" *
Also Published As
Publication number | Publication date |
---|---|
CN115393585B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20080112606A1 (en) | Method for moving cell detection from temporal image sequence model estimation | |
US20100128789A1 (en) | Method and apparatus for processing video sequences | |
CN110782477A (en) | Moving target rapid detection method based on sequence image and computer vision system | |
CN111145209A (en) | Medical image segmentation method, device, equipment and storage medium | |
US11042986B2 (en) | Method for thinning and connection in linear object extraction from an image | |
CN112364865B (en) | Method for detecting small moving target in complex scene | |
Xue et al. | Boundary-induced and scene-aggregated network for monocular depth prediction | |
Boiangiu et al. | Voting-based image segmentation | |
CN113395415A (en) | Camera data processing method and system based on noise reduction technology | |
Katkar et al. | A novel approach for medical image segmentation using PCA and K-means clustering | |
CN109785357B (en) | Robot intelligent panoramic photoelectric reconnaissance method suitable for battlefield environment | |
CN111028263A (en) | Moving object segmentation method and system based on optical flow color clustering | |
Zhang et al. | Local stereo matching: An adaptive weighted guided image filtering-based approach | |
CN115830064B (en) | Weak and small target tracking method and device based on infrared pulse signals | |
CN115393585A (en) | Moving target detection method based on super-pixel fusion network | |
CN110880183A (en) | Image segmentation method, device and computer-readable storage medium | |
CN111145121B (en) | Confidence term filter target tracking method for strengthening multi-feature fusion | |
CN110490877B (en) | Target segmentation method for binocular stereo image based on Graph Cuts | |
CN110942420B (en) | Method and device for eliminating image captions | |
CN113989263A (en) | Image area saliency detection method based on super-pixel segmentation | |
CN108510525B (en) | Template method for tracing, device, augmented reality system and storage medium | |
Singh et al. | Performance Evaluation of the masking based Watershed Segmentation | |
KR101711929B1 (en) | Method and apparatus for extraction of edge in image based on multi-color and multi-direction | |
Tajudin et al. | Microbleeds detection using watershed-driven active contour | |
CN111476821B (en) | Target tracking method based on online learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20231024. Patentee after: Jiangsu botu Electrical Engineering Co.,Ltd., Rooms 7 # 211 and 213, No. 200 Shenhu Road, Suzhou Industrial Park, Suzhou Area, China (Jiangsu) Pilot Free Trade Zone, Wuxi City, Jiangsu Province, 215000. Patentee before: JIANGSU VOCATIONAL College OF INFORMATION TECHNOLOGY, No.1 qianou Road, Wuxi City, Jiangsu Province, 214000. ||
TR01 | Transfer of patent right |