CN108805897B - Improved moving target detection VIBE method - Google Patents


Info

Publication number
CN108805897B
Authority
CN
China
Prior art keywords
pixel
frame
value
background model
background
Legal status
Expired - Fee Related
Application number
CN201810498273.1A
Other languages
Chinese (zh)
Other versions
CN108805897A (en
Inventor
方贤勇
曹明军
李薛剑
孙恒飞
傅张军
孙皆安
王华彬
汪粼波
周健
李同芝
陶宗祥
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Application filed by Anhui University
Priority to CN201810498273.1A
Publication of CN108805897A
Application granted
Publication of CN108805897B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/254Analysis of motion involving subtraction of images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The classic VIBE foreground-extraction algorithm suffers from a ghost phenomenon in subsequent detection when the first input frame contains a moving target, and its fixed discrimination radius discriminates pixels poorly in complex scenes. To address these problems, the invention provides an improved VIBE moving-target-detection algorithm. It distinguishes whether a stationary region is a ghost region or a stationary target, describes the degree of difference between the current pixel and the samples in the background model, and dynamically adjusts the discrimination radius according to this descriptor during pixel discrimination. As a result, more foreground points are detected when the scene changes little, while pixels with small fluctuations are kept out of the foreground when the scene changes strongly, reducing noise in the detection result. Beneficial technical effects: compared with the original VIBE algorithm, the method removes ghosts within fewer frames and detects moving targets more accurately.

Description

Improved moving target detection VIBE method
Technical Field
The invention relates to an image/video processing technology, in particular to an improved moving object detection VIBE algorithm.
Background Art
Moving object detection [1] means obtaining, by some means, the moving objects in a video frame sequence (excluding the background information of the scene); it is common in video processing, traffic supervision, public security, and related fields. At present, gait recognition [2], target tracking [3], and video abnormal-behavior analysis [4] are all active research areas; although these fields differ, they share the fact that their research objects are video information. How to obtain the information of interest from video therefore becomes a primary task in these fields. Moving object detection methods [5-7] can be divided into three categories: frame differencing, optical flow, and background subtraction. The frame-difference method takes the difference between adjacent frames of a video (or frames a fixed interval apart) and then separates foreground and background with a threshold. Its principle is easy to understand, the code is simple to implement, and its running speed meets real-time requirements; however, the contour it extracts shows a double-shadow phenomenon, and when the interval between the two frames is small, the area where the object overlaps in both frames subtracts to a hollow region. The optical-flow method describes a motion field by computing optical flow and then extracts the foreground by thresholding the flow magnitude. Although its detection result is relatively accurate, computing optical flow is expensive in time, so the method is unsuitable for real-time video; moreover, noise such as shadows and occlusion cannot be avoided during the flow computation.
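The frame-difference method above, including its characteristic hollow region, can be sketched in a few lines of numpy; the function name and threshold value are illustrative, not from the patent.

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Classic frame differencing: pixels whose absolute change between
    two grayscale frames exceeds `threshold` are marked as foreground."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = (diff > threshold).astype(np.uint8) * 255  # 255 = foreground
    return mask

# Toy example: a bright 2x2 "object" moves one pixel to the right.
prev_f = np.zeros((5, 5), dtype=np.uint8)
curr_f = np.zeros((5, 5), dtype=np.uint8)
prev_f[1:3, 1:3] = 200
curr_f[1:3, 2:4] = 200
mask = frame_difference(prev_f, curr_f)
```

In the toy example the trailing and leading edges of the object are detected, but the column where the object overlaps itself in both frames subtracts to zero: exactly the hollow region the text describes.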
Background subtraction constructs a background model from the first frame (or first few frames) of the input video and compares the current frame with this model to decide each pixel's type. Its advantage is that, through timely updates of the background model, it adapts to continuous scene changes and obtains better detection results in complex scenes. The Gaussian mixture model [8] (GMM) is a common background-subtraction algorithm: it learns slowly changing pixels as background and quickly changing pixels as foreground, thereby separating the two. However, GMM has a long initialization process and slow parameter estimation, and is not well suited to real-time video.
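A minimal sketch of the idea GMM builds on, reduced to a single running Gaussian per pixel (a one-component stand-in for the mixture; the class name, learning rate, and match threshold are illustrative, not from the cited work):

```python
import numpy as np

class RunningGaussianBG:
    """Per-pixel single-Gaussian background model: slowly changing pixels
    are absorbed into the model, quickly changing pixels become foreground."""
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 15.0 ** 2)
        self.alpha = alpha  # learning rate
        self.k = k          # match threshold in standard deviations

    def apply(self, frame):
        frame = frame.astype(np.float64)
        dist2 = (frame - self.mean) ** 2
        fg = dist2 > (self.k ** 2) * self.var   # fast-changing pixels -> foreground
        bg = ~fg
        # Only matched (background) pixels update the model.
        self.mean[bg] += self.alpha * (frame[bg] - self.mean[bg])
        self.var[bg] += self.alpha * (dist2[bg] - self.var[bg])
        return fg.astype(np.uint8) * 255

bg = RunningGaussianBG(np.full((4, 4), 100, dtype=np.uint8))
static_mask = bg.apply(np.full((4, 4), 100, dtype=np.uint8))  # unchanged scene
jump_mask = bg.apply(np.full((4, 4), 200, dtype=np.uint8))    # sudden change
```

An unchanged scene yields an empty foreground mask, while a sudden intensity jump is flagged everywhere, which is the behavior the text attributes to GMM-style background subtraction.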
To address these problems, Olivier Barnich et al. [9] proposed in 2009 a background-subtraction algorithm without parameter estimation, the VIBE algorithm. It detects well and runs fast, has been widely applied in recent years, and has gradually developed into a general background-subtraction algorithm [10]. The classic VIBE algorithm nevertheless has two shortcomings: 1) when the first frame contains a moving target, a ghost phenomenon appears in the detection; 2) the VIBE background model uses a fixed discrimination radius, and this fixed threshold cannot adapt well to foreground detection in dynamic scenes.
To keep the detection effect of the VIBE algorithm satisfactory in complex scenes, many scholars have made improvements on its basis.
For the ghost problem, current approaches fall into two classes. The first detects ghosts from the motion attributes of foreground pixels. For example, Yang et al. [11] use a frame-difference method to compare a pixel between the current and previous frames and describe each pixel's motion state with a motion factor; when the motion factor is 0, the pixel is considered static and judged to be a ghost. The second class builds an automatic update function into the background model. For example, Stauffer et al. [12] modify the background model at every frame, gradually replacing the ghost pixels introduced when the model was initialized.
There is also work on the fixed discrimination radius of the VIBE background model. Document [13] adjusts the threshold according to the variance of the current pixel's samples when judging the pixel's category; this achieves a certain effect, but the large amount of variance computation seriously hurts the program's execution efficiency. Document [14] computes the background complexity of each frame and dynamically adjusts the discrimination radius accordingly, improving the accuracy of the detection result to a certain extent. Document [15] adapts the discrimination radius using the difference between the maximum and minimum values among a pixel's neighbors, which improves detection accuracy, but when the neighborhood contains many noisy pixels the detection result suffers.
In view of the foregoing, an improved image-processing method is needed.
Disclosure of Invention
To address the problems listed in the background art, the invention provides an improved moving-object-detection VIBE algorithm, whose specific steps follow:
the improved algorithm adds a ghost-removal method and an adaptive-threshold method on top of the existing moving-object-detection VIBE algorithm.
Further, the improved moving-object-detection VIBE algorithm proceeds as follows:
Step 1: the video inputs the current Nth frame image; judge whether the current frame is the first frame;
if yes, initialize the background model BGM with the first frame;
if not, go to step 2;
Step 2: judge whether the current frame number equals the last frame number + 1;
if yes, end the operation;
if not, go to step 3;
Step 3: according to the initialized background model BGM, traverse the pixel points of the current frame and judge, with the improved adaptive threshold, whether each pixel is a background point or a foreground point;
if it is a background point, go to step 4;
if it is a foreground point, go to step 5;
Step 4: set the value at the corresponding position of the frame's mask matrix to 0, apply the appropriate update strategy to the background model BGM according to whether the pixel lies inside the bounding rectangle of an actual target, and go to step 9;
Step 5: set the value at the corresponding position of the frame's mask matrix to 255, and judge whether the frame has been fully traversed and the current frame number is divisible by a preset positive integer;
if not, go to step 3;
if yes, go to step 6;
Step 6: obtain the stationary regions in the current frame and judge each region's attribute;
if the stationary region is a ghost region, go to step 7;
if the stationary region is an actual moving-target region, go to step 8;
Step 7: update the background model BGM with the local replacement strategy to accelerate ghost elimination, and go to step 9;
Step 8: save the position information of the actual moving target's bounding rectangle, and return to step 1;
Step 9: after the background model BGM has been updated, return to step 1.
Advantageous technical effects
For the ghost-removal problem: although existing methods suppress the generation of ghosts to some extent or accelerate their elimination, most of them bolt extra machinery onto the original background model, making it complex and bloated, slowing it at run time, and making it hard to reuse in practical applications. The method provided by the invention detects ghosts within a few frames, does not affect the real-time performance of the VIBE algorithm, and is usable in practice. Specifically, the invention provides a ghost-removal method based on contour-similarity comparison: it distinguishes whether a stationary region is a ghost region or a stationary target by comparing the similarity between the contours extracted by a Canny operator from the stationary region and from the corresponding grayscale region, and then handles the two cases separately.
For the adaptive-threshold problem: although existing methods achieve a certain detection effect, their algorithmic complexity is high; with the variance method, for example, the large amount of variance computation hurts execution efficiency. The method provided by the invention borrows the construction of the LBP operator, exploiting its high computation speed, to build an LBP-T descriptor, which makes the VIBE detection result more accurate without affecting the algorithm's execution efficiency. The LBP-T descriptor describes the degree of difference between the current pixel and the samples in the background model, which directly reflects the degree of scene change in the video. The discrimination radius is adjusted dynamically according to the descriptor during pixel discrimination, so that more foreground points are detected when the scene changes little, while pixels with small fluctuations are not detected as foreground when the change is large, reducing noise in the detection result.
Experimental results show that compared with the original VIBE algorithm, the improved contour similarity comparison-based method can remove the ghost in a smaller number of frames; the adaptive threshold based on the LBP-T descriptor enables the detected moving target to be more accurate.
The invention's ghost-removal method based on contour-similarity comparison does not add complexity to the model. It relies on the essential fact that a stationary moving-target region and a ghost region differ greatly in the content of the corresponding actual frame. Using a Canny operator, it extracts the contour of each stationary region and the contour of the corresponding actual-frame region: a ghost region's contour differs greatly from that of the corresponding actual-frame region, whereas an actual moving target's contour has a certain similarity to it; the stationary target's attribute follows. Once a ghost region is identified, a local background model is reinitialized from the pixels of that region in the current frame and replaces the corresponding positions in the original background model, bringing the model closer to reality and accelerating ghost elimination.
In fig. 4, e) and f) show that when the current stationary region is a ghost, the contours extracted by the Canny operator from the ghost and from the corresponding region of the current frame differ greatly.
In fig. 4, g) and h) show that when the current stationary region is an actual moving target, the contours extracted by the Canny operator from the target's binary image and from the corresponding region of the current frame have a certain similarity.
FIG. 5 illustrates the re-initialization of the local model of the ghost region to replace the corresponding positions in the original model, i.e., the local-model replacement strategy. On the left is the initialized background model containing the moving object; in the middle, the local background model reinitialized from the actual-frame area corresponding to the ghost; on the right, the model after replacement. The replaced model no longer contains the moving-object information, so it is closer to the actual background and ghost elimination is accelerated.
FIG. 6 compares the ghost-elimination speed of the present invention with that of the original VIBE algorithm: the left shows the frame at which the ghost appears, the middle the frame at which the original VIBE algorithm eliminates it, and the right the frame at which the present invention eliminates it; the invention eliminates the ghost within a small number of frames.
The invention further provides an adaptive-threshold VIBE improvement strategy based on the LBP-T descriptor. So as not to affect the execution efficiency of the overall VIBE algorithm, the method borrows the principle of the LBP operator: the LBP-T descriptor of a pixel is constructed jointly from the mean of the background model, all neighbors of the current pixel, and a multi-threshold structure. By construction, the LBP-T descriptor describes the difference between the current pixel to be classified and the background-model sample pixels. When the difference is small, the scene changes little and the discrimination radius R can be reduced appropriately so that more motion information is detected; when the difference is large, the scene changes strongly and R must be increased to avoid detecting complex scene content as foreground, which would add noise to the detection result. Experiments show that the adaptive threshold based on the LBP-T descriptor makes the moving-target results detected by VIBE more accurate.
Fig. 13 compares the invention's LBP-T-based adaptive-threshold VIBE improvement with the original VIBE algorithm: the left is the actual input frame, the middle the original VIBE detection result, and the right the invention's result; the invention's result is more accurate than the original VIBE detection result.
Fig. 16 compares the detection accuracy of several different moving-object detection methods, including the present invention, on the Canoe data set. As fig. 16 shows, the original VIBE algorithm detects much better than the GMM algorithm, but its fixed discrimination radius cannot adapt to scenes with different fluctuation amplitudes; the improved method's adaptive threshold automatically adjusts the discrimination radius with the degree of scene change, so the detection accuracy improves and the overall detection effect is better than that of the conventional VIBE algorithm.
Drawings
Fig. 1 is a radius threshold judgment schematic diagram of the VIBE algorithm.
FIG. 2 is the effect of "ghosting" on a moving object.
Fig. 3 is a schematic diagram of stationary area detection.
Fig. 4 is the result of the Canny operator for the static region and the corresponding gray scale region.
FIG. 5 is a diagram illustrating the re-initialization of the local model of the "ghost" region to replace the corresponding position of the original model.
FIG. 6 is a comparison of the inventive method with the original VIBE algorithm: the lower three images compare the ghost-removal speed, and the upper three compare the detection results of the LBP-T-based adaptive threshold with those of the original VIBE.
Fig. 7 is a graph showing the effect of different discrimination radii on the results of VIBE detection.
Fig. 8 is an LBP operator construction process.
Fig. 9 is a different case where the same LBP value corresponds to a local pixel distribution.
Fig. 10 is a background model mean.
Fig. 11 is a statistical result of the difference between the pixel to be classified and the background model mean.
FIG. 12 is a schematic diagram of a multi-threshold architecture.
FIG. 13 is a schematic diagram of the improved adaptive threshold provided by the present invention compared to the results of other methods.
Fig. 14 shows that when the current stationary region is a ghost, the contours extracted by the Canny operator from the ghost and from the corresponding region of the current frame differ greatly.
Fig. 15 shows that when the current stationary region is an actual moving target, the contours extracted by the Canny operator from the target's binary image and from the corresponding region of the current frame have a certain similarity.
Fig. 16 is a schematic diagram comparing the detection accuracy of several different moving-object detection methods, including the present invention, on the Canoe data set.
FIG. 17 is a block flow diagram of the present invention.
Detailed Description
An improved moving object detection VIBE algorithm is additionally provided with a ghost image removing method and an adaptive threshold value method on the basis of the existing moving object detection VIBE algorithm.
Referring to fig. 17, the improved moving object detection VIBE algorithm according to the present invention proceeds as follows:
Step 1: the video inputs the current Nth frame image; judge whether the current frame is the first frame;
if yes, initialize the background model BGM with the first frame;
if not, go to step 2;
Step 2: judge whether the current frame number equals the last frame number + 1;
if yes, end the operation;
if not, go to step 3;
Step 3: according to the initialized background model BGM, traverse the pixel points of the current frame and judge, with the improved adaptive threshold, whether each pixel is a background point or a foreground point;
if it is a background point, go to step 4;
if it is a foreground point, go to step 5;
Step 4: set the value at the corresponding position of the frame's mask matrix to 0, apply the appropriate update strategy to the background model BGM according to whether the pixel lies inside the bounding rectangle of an actual target, and go to step 9;
Step 5: set the value at the corresponding position of the frame's mask matrix to 255, and judge whether the frame has been fully traversed and the current frame number is divisible by a preset positive integer;
if not, go to step 3;
if yes, go to step 6;
Step 6: obtain the stationary regions in the current frame and judge each region's attribute;
if the stationary region is a ghost region, go to step 7;
if the stationary region is an actual moving-target region, go to step 8;
Step 7: update the background model BGM with the local replacement strategy to accelerate ghost elimination, and go to step 9;
Step 8: save the position information of the actual moving target's bounding rectangle, and return to step 1;
Step 9: after the background model BGM has been updated, return to step 1.
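The control flow of steps 1 through 9 can be summarized in python-style pseudocode; the helper functions are placeholders standing in for the operations described above, not part of the patent:

```
def improved_vibe(frames, check_interval):
    bgm = None
    target_boxes = []                                     # bounding rectangles of confirmed targets (step 8)
    masks = []
    for n, frame in enumerate(frames):
        if n == 0:
            bgm = init_background_model(frame)            # step 1
            continue
        mask = classify_and_update(frame, bgm, target_boxes)  # steps 3, 4, 5, 9
        masks.append(mask)
        if (n + 1) % check_interval == 0:                 # step 5: check every T frames
            for region in stationary_regions(masks):      # step 6
                if is_ghost_region(region, frame):
                    local_replace(bgm, region, frame)     # step 7: purge the ghost locally
                else:
                    target_boxes.append(bounding_rect(region))  # step 8
    return masks
```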
Further, the specific steps of step 1 are:
inputting the current Nth frame of image by the video, and judging whether the current frame is the first frame of the video;
if yes, initializing a background model BGM by using a first frame, wherein a specific initialization strategy is as follows:
traversing all pixel points of the first frame: for any pixel v(x), its background model contains N samples, recorded as M(x) = {v_1, v_2, ..., v_k, ..., v_N}, with N taken as 20; each sample in a pixel's background model is chosen by a random strategy, selecting N times a random pixel value from the 8-neighborhood of the pixel as a sample value; when the first frame has been traversed, the background model BGM is initialized;
if not, go to step 2.
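The initialization strategy above can be sketched directly; border pixels are handled here by edge padding, which is an implementation choice not specified in the text.

```python
import random
import numpy as np

N_SAMPLES = 20  # N in the text

def init_background_model(first_frame):
    """ViBe-style initialization as described in step 1: each pixel's model
    holds N samples drawn at random from its 8-neighborhood in the first frame."""
    h, w = first_frame.shape
    bgm = np.empty((h, w, N_SAMPLES), dtype=np.uint8)
    padded = np.pad(first_frame, 1, mode='edge')  # replicate edges for border pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(h):
        for x in range(w):
            for i in range(N_SAMPLES):
                dy, dx = random.choice(offsets)   # random 8-neighbor, N times
                bgm[y, x, i] = padded[y + 1 + dy, x + 1 + dx]
    return bgm

frame = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
bgm = init_background_model(frame)
```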
Further, whether the pixel point is a background point or a foreground point is judged, namely, the specific process of pixel classification is as follows:
in Euclidean space, define a sphere S_R(v(x)) centered at v(x) with radius equal to the threshold R; this sphere represents the set of points whose distance from the center v(x) is less than R;
count the number of samples in the background model M(x) whose distance from the current pixel v(x) is less than R; if this number exceeds a given threshold D_min (set to 2), the current pixel is considered close to the background samples and is classified as background (its value is set to 0); otherwise it is classified as a foreground point (its value is set to 1); the calculation formula is:
#{ S_R(v(x)) ∩ {v_1, v_2, ..., v_N} } ≥ D_min
the detection process of the VIBE algorithm has three main parameters: the number of samples N, the matching threshold D_min, and the discrimination radius R. Here N = 20 and D_min = 2; the threshold R for the distance-proximity decision no longer uses a globally fixed value, since each pixel should have its own discrimination radius, and so an adaptive threshold based on the LBP-T descriptor is proposed.
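For grayscale pixels the Euclidean distance in the formula reduces to an absolute intensity difference, so the per-pixel decision can be sketched as follows (function name and sample values are illustrative):

```python
import numpy as np

D_MIN = 2  # minimum number of close samples for a background decision

def classify_pixel(pixel_value, samples, radius):
    """Return True (background) when at least D_MIN samples of the pixel's
    background model lie within `radius` of the current value, i.e. the
    cardinality test #{S_R(v(x)) ∩ {v_1..v_N}} >= D_min above."""
    close = np.count_nonzero(
        np.abs(samples.astype(np.int16) - int(pixel_value)) < radius)
    return close >= D_MIN

samples = np.array([100, 103, 98, 150, 99], dtype=np.uint8)
is_bg_near = classify_pixel(101, samples, radius=5)  # 4 samples within radius
is_bg_far = classify_pixel(180, samples, radius=5)   # no samples within radius
```

The `radius` argument is exactly where the adaptive threshold plugs in: a per-pixel R replaces the fixed value.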
Further, in step 4 the corresponding position in the mask matrix of the frame is set to 0, and different update strategies are applied to the background model BGM according to whether the pixel lies inside the bounding rectangle of an actual target:
if the pixel lies inside the bounding rectangle of an actual target, it does not update its own background model;
if the pixel lies outside the bounding rectangle, then with probability 1/φ its pixel value replaces one value of the samples M(x) in its own background model, where φ ranges from 2 to 128 and is preferably taken as φ = 16; at the same time, also with probability 1/φ, its pixel value replaces one value in the background-model samples of one of its neighborhood pixels; subsequently, go to step 9;
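The conservative update of step 4 can be sketched as below, assuming the standard ViBe subsampling factor φ = 16 (the text allows 2 to 128); the function name and the neighbor-clamping at the border are illustrative choices.

```python
import random
import numpy as np

PHI = 16  # subsampling factor: each update happens with probability 1/PHI

def update_background(bgm, frame, y, x, in_target_box):
    """Update strategy for a pixel just classified as background (step 4):
    pixels inside a confirmed real target's bounding rectangle never update."""
    if in_target_box:
        return
    if random.randrange(PHI) == 0:          # probability 1/PHI
        i = random.randrange(bgm.shape[2])
        bgm[y, x, i] = frame[y, x]          # refresh one of the pixel's own samples
    if random.randrange(PHI) == 0:          # also propagate to a random neighbor
        ny = min(max(y + random.choice([-1, 0, 1]), 0), bgm.shape[0] - 1)
        nx = min(max(x + random.choice([-1, 0, 1]), 0), bgm.shape[1] - 1)
        i = random.randrange(bgm.shape[2])
        bgm[ny, nx, i] = frame[y, x]        # replace one neighbor sample
```

Freezing updates inside target rectangles is what keeps a genuinely stationary target from being absorbed into the background.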
further, a specific method for determining whether the attribute of the stationary region in the current frame is a ghost, that is, a "ghost" detection method, is as follows: the method based on the contour similarity comparison comprises the following two steps:
step one, calculating a static foreground area in a video sequence of a VIBE algorithm detection result;
secondly, extract the contours of the stationary foreground region and of the grayscale image region at the same position in the current frame with a Canny operator, compute the similarity of the contours with a given contour-similarity formula, and compare the similarities to obtain the attribute of the stationary foreground region.
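The patent does not reproduce its contour-similarity formula here, so the sketch below substitutes a simple Jaccard overlap of two binary edge maps (e.g., Canny outputs) as an illustrative stand-in; the 0.5 decision threshold is likewise an assumption.

```python
import numpy as np

def contour_similarity(edges_a, edges_b):
    """Illustrative stand-in for the contour-similarity formula:
    Jaccard overlap of two binary edge maps, such as the Canny edges of the
    stationary region and of the corresponding current-frame region."""
    a = edges_a.astype(bool)
    b = edges_b.astype(bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        return 1.0  # two empty contours are trivially identical
    return np.count_nonzero(a & b) / union

def is_ghost_region(edges_region, edges_frame, sim_threshold=0.5):
    # Ghost contours differ strongly from the live frame's contours; a real
    # (merely stationary) target's contours stay similar (assumed threshold).
    return contour_similarity(edges_region, edges_frame) < sim_threshold

square = np.zeros((6, 6), dtype=np.uint8)
square[1:5, 1] = 1
square[1:5, 4] = 1
empty = np.zeros((6, 6), dtype=np.uint8)
```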
Further, the processing strategy of step 7, "update the background model BGM with a local replacement strategy to accelerate ghost elimination," is: when a stationary region is judged to be a ghost region, the elimination of the ghost must be accelerated.
Firstly, selecting an actual background area and carrying out model initialization by using a VIBE algorithm again;
then replacing the pixel model of the 'ghost' area in the original model;
finally, a 'clean' background model only containing background pixel initialization is obtained.
Further, the adaptive threshold method in step 3 is based on LBP-T descriptor, and specifically includes the following three steps:
step one, calculate the mean of the background model of the pixel to be classified, obtain the pixel's 8 neighborhood pixels, and place the background-model mean at the current pixel position;
step two, counting a difference value distribution rule of a plurality of pixels to be classified and a background model mean value, and constructing a multi-threshold structure chart;
and step three, calculating the LBP-T value of the current pixel, and using the value as the radius value when the current pixel is classified.
The discrimination radius R used in the original VIBE algorithm's background-model update strategy is fixed: all pixels share the single discrimination threshold. A fixed discrimination radius is not conducive to background detection and cannot adapt to changes in a complex scene. To adapt, the fixed threshold is improved into an adaptive one: a descriptor must be found that captures the degree of scene change in the video, and then a discrimination radius is set for each pixel according to this descriptor, replacing the fixed discrimination threshold of the original VIBE model.
In the pixel-discrimination stage, following the construction of the LBP operator, the invention builds an LBP-T descriptor from all neighbor pixels of the current pixel to be classified and the sample value being compared in the background model; the descriptor captures the difference between the current pixel's neighbors and a sample pixel of the background model. Because pixel distributions are continuous, this difference indirectly reflects the difference between the pixel to be classified and the background-model sample. When the difference is small, the scene changes little and the discrimination radius R can be reduced appropriately so that more motion information is detected; when the difference is large, the scene changes strongly and R must be increased to avoid detecting complex scene content as foreground and adding noise to the result. The fixed discrimination radius R of the VIBE model can therefore be set according to the size of the descriptor.
Referring to fig. 10, step one of the "adaptive threshold" is implemented as follows:
Here X is the current pixel to be classified and nei_k (k = 1, 2, ..., 8) is a pixel in the 8-neighbourhood of X. N is the number of samples in the background model, and the mean of the background samples is

μ = (1/N) Σ_{i=1}^{N} N_i

where N_i denotes the i-th background sample.
Referring to fig. 11, step two of the "adaptive threshold" is as follows: while detecting several videos, the differences between many pixels and the mean of the background model to be compared were counted; the statistics are shown in fig. 11.
As can be seen from fig. 11, the differences between the pixels to be classified and the background-model mean mostly fall in the interval [0, 40], and the sub-interval [0, 20] accounts for more than half of the whole interval.
Based on the above discussion, the present invention constructs a multi-threshold structure diagram, as shown in fig. 12:
Here Th_i, i = 1, 2, ..., 8, are the thresholds, placed in clockwise order (matching the order in which the LBP forms its binary string), with the largest threshold at the highest bit of the binary string and the smallest at the lowest bit. The specific values of Th_i are shown in fig. 12(b). The multi-threshold structure diagram compares the difference between each neighbour of the current pixel and the background-model mean against the threshold at the corresponding position.
Step three of the "adaptive threshold" is implemented as follows:
When δ_k = |nei_k − μ| ≤ Th_k, the corresponding position in the structure diagram is set to 0; otherwise it is set to 1.
The LBP-T descriptor can be described by the formula

LBP-T(x_c, y_c) = Σ_{k=1}^{8} D(δ_k) · C^{k−1}

where (x_c, y_c) is the central element of the 3×3 neighbourhood, k indexes the 8 neighbours, and C is a constant with value 2 in the experiments. δ_k is the difference between the k-th neighbour of (x_c, y_c) and the background-model mean μ. D(·) is the indicator function

D(δ_k) = 0 if δ_k ≤ Th_k, and 1 if δ_k > Th_k.
With

R(x_c, y_c) = η × LBP-T(x_c, y_c)

as the new radius in the original VIBE model, each pixel has its own radius during discrimination instead of the globally fixed value R. η is a scale factor, set to 1/3 in the experiments.
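As a concrete sketch of the descriptor and per-pixel radius computation above, the following fragment (assuming NumPy) implements LBP-T on a 3×3 neighbourhood. The threshold values Th_1..Th_8 are hypothetical placeholders, since the patent gives the actual values only in a figure; the clockwise neighbour order starting at the top-left is likewise an assumed convention.

```python
import numpy as np

# Hypothetical Th_1..Th_8 spanning the observed [0, 40] difference range;
# the largest threshold sits at the highest bit, the smallest at the lowest.
TH = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])

# Clockwise 8-neighbour offsets, starting at the top-left (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_t(gray, x, y, mu, C=2.0):
    """LBP-T at (x, y): each neighbour's distance to the background-sample
    mean mu is compared with the threshold at its bit position; bits where
    delta_k > Th_k contribute C**k (C = 2 gives an 8-bit code)."""
    value = 0.0
    for k, (dx, dy) in enumerate(OFFSETS):
        delta_k = abs(float(gray[x + dx, y + dy]) - mu)
        if delta_k > TH[k]:          # D(delta_k) = 1, else 0
            value += C ** k
    return value

def adaptive_radius(gray, x, y, mu, eta=1.0 / 3.0):
    """Per-pixel discrimination radius R = eta * LBP-T, with eta = 1/3."""
    return eta * lbp_t(gray, x, y, mu)
```

On a uniform patch whose neighbours all equal μ, LBP-T is 0 and the radius collapses; when every neighbour differs from μ by more than the largest threshold, LBP-T reaches 255 and the radius is 85.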
It follows from this construction that when the descriptor LBP-T is larger, the neighbours of the current pixel differ more from the sample pixels in the background model and the scene changes more strongly; the threshold should then be set larger, so that noise in the scene is not detected as foreground points. Conversely, when LBP-T is smaller, the neighbours of the current pixel are closer to the background-model sample pixels and the scene changes less; the threshold should then be set smaller, so that more pixels are detected as foreground.
Further, the detailed steps of the ghost-removal method proposed in step 6 are as follows. The method is an elimination method based on contour-similarity comparison and comprises the following three steps:
Step one: compute the static foreground regions in the video sequence of VIBE detection results. Specifically: initialize two all-zero matrices f_c and s_r; whenever the current pixel is detected as a foreground point, take the value at the corresponding position of the f_c matrix, add 1, and store it back; when the number of the frame currently being processed is divisible by P (P = 30, i.e. a check is made every 30 frames), find the positions in the f_c matrix whose value is greater than or equal to Q (Q = 20) and set the corresponding positions of the s_r matrix to 255; find the circumscribed rectangles of the connected regions of s_r that meet the condition, discarding a rectangle of small area and keeping one of large area, where "small area" means fewer than 20 pixels and "large area" the opposite; the positions of the remaining rectangles are the regions where static objects are located. Finally, reset the two matrices f_c and s_r to all-zero matrices. Preferably, P and Q are both positive integers.
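The counting scheme of step one can be sketched as follows; this is a minimal NumPy sketch, and the class and method names are illustrative choices rather than anything from the patent.

```python
import numpy as np

P, Q = 30, 20  # check every P frames; "static" = foreground in >= Q of them

class StaticRegionDetector:
    """Accumulate per-pixel foreground counts (f_c) and flag persistently
    foreground pixels (s_r), as in step one of the ghost-removal method."""

    def __init__(self, shape):
        self.f_c = np.zeros(shape, dtype=np.int32)
        self.s_r = np.zeros(shape, dtype=np.uint8)
        self.frame_no = 0

    def update(self, fg_mask):
        """fg_mask: boolean foreground mask of the current frame. Returns the
        candidate static-region mask every P frames, otherwise None."""
        self.frame_no += 1
        self.f_c[fg_mask] += 1
        if self.frame_no % P == 0:
            self.s_r[self.f_c >= Q] = 255   # persistently-foreground pixels
            result = self.s_r.copy()
            self.f_c[:] = 0                 # reset both matrices to all zero
            self.s_r[:] = 0
            return result
        return None
```

A pixel flagged in the returned mask has been foreground in at least Q of the last P frames, i.e. it belongs to a candidate static region.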
Step two: extract contours of the static foreground region and of the grey-level image region at the same position in the current frame with the Canny operator, compute the similarity of the contours with the given contour-similarity formula, and from the comparison obtain the attribute of the static foreground region. Specifically: after the static region is obtained, Canny edge extraction is applied to both the static region and its corresponding grey-level region, and the attribute of the current static region is judged by computing the contour similarity of the binary images; the attribute indicates whether the static region is a ghost or a true static target. The steps are: 1. extract the contour of the static region with the Canny operator and denote it C0; 2. find the corresponding position in the grey-level image according to the static region, extract its contour with the Canny operator, and denote it C1; 3. compute the similarity of C0 and C1, defining the similarity of the binary images C0 and C1 as follows:
sim(C0, C1) = fgCount(C0 ∩ C1) / size(C0 ∩ C1)

where size(C0 ∩ C1) is the length-by-width product of the result image C0 ∩ C1 and fgCount(C0 ∩ C1) is the number of foreground points (value 255) in it. The attribute SR of the static region can be expressed as

SR = static target, if sim(C0, C1) > T_0; ghost, otherwise.
where T_0 is the threshold, taken as T_0 = 0.02 in the experiments. After the attribute of the static region is determined, the region needs corresponding processing.
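The similarity formula and attribute decision of step two reduce to a few lines. In practice C0 and C1 would be binary edge maps produced by a Canny detector (e.g. OpenCV's); here they are taken as precomputed inputs.

```python
import numpy as np

T0 = 0.02  # similarity threshold from the experiments

def contour_similarity(c0, c1):
    """sim(C0, C1) = fgCount(C0 & C1) / size(C0 & C1), where c0 and c1 are
    binary contour images (255 = contour point) of identical shape."""
    both = np.logical_and(c0 == 255, c1 == 255)
    size = both.size                       # length x width of the result image
    return np.count_nonzero(both) / float(size)

def region_attribute(c0, c1):
    """'static' if the two contours agree (sim > T0), else 'ghost'."""
    return "static" if contour_similarity(c0, c1) > T0 else "ghost"
```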
Step three: after the attribute of the static region is determined, i.e. ghost or actual static moving target, the static foreground region is processed differently according to its attribute. Specifically:
If the static region is an actual moving object at rest, then, to suppress the disappearance of the moving object, whenever the VIBE algorithm detects the current pixel as a background point it is checked whether the point lies inside the circumscribed rectangle of the static region. If so, the point's update factor for the background model is reduced; here the pixel is simply chosen not to update the background model at all.
If the static region is a ghost region, the elimination of the ghost needs to be accelerated: an actual background region is selected, model initialization is performed again with the VIBE algorithm, and the pixel models of the ghost region in the original model are replaced, yielding a "clean" background model initialized from background pixels only.
To better illustrate and compare the technical advantages of the present invention, an explanation from another angle is now provided as follows:
referring to fig. 17, the specific flow of the present invention is shown and is alternatively described as follows:
First: classify the pixels in each frame, i.e. as background or foreground pixels, using the adaptive-threshold VIBE improvement strategy based on the LBP-T descriptor. The detection process of the VIBE algorithm mainly involves three parameters: the number of samples N, set to N = 20; the matching threshold #min, set to #min = 2; and the distance-proximity threshold R, which no longer uses a globally fixed value; instead a separate discrimination radius is set for each pixel. With

R(x_c, y_c) = η × LBP-T(x_c, y_c)

as the new radius value in the original VIBE model, each pixel has its own radius during discrimination instead of the globally fixed value R. η is a scale factor, set to 1/3 in the experiments.
Second: when a pixel to be classified is judged to be a background point, first check whether it lies inside the circumscribed rectangle of an actual moving target. If so, the pixel does not update the model; if not, the model is updated with the original update factor: with probability 1/φ the pixel's value replaces one value of the sample set M(x) in the background model, and likewise with probability 1/φ the pixel's value replaces one value in the background-model sample set of a random neighbourhood pixel.
Third: when a pixel to be classified is judged to be a foreground point, check whether the current frame has been fully processed and whether the frame number is divisible by a positive integer P (P = 30). If the condition is not met, continue traversing as before; if it is met, detect the stationary points, i.e. the pixels detected as foreground at least Q (Q = 20) times within the P frames.
Fourth: extract with the Canny operator the contours of the detected static region and of the corresponding region in the actual frame, and compute their similarity. When the similarity is greater than 0.02, the static region is considered an actual static target: its position information is stored, and within that region pixels detected as background points do not update the background model. When the similarity is less than 0.02, the region is considered a ghost region: a local background model is re-initialized from the region and replaces the values at the corresponding positions of the original background model.
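The update-suppression check used in the second and fourth steps above, namely that a background-classified pixel inside the circumscribed rectangle of an actual static target skips the model update, can be sketched minimally; the (x0, y0, x1, y1) rectangle representation is an illustrative choice.

```python
def inside_any(pt, rects):
    """True if point pt = (x, y) lies in any rectangle (x0, y0, x1, y1),
    bounds inclusive."""
    x, y = pt
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in rects)

def should_update_model(is_background, pt, static_rects):
    """A pixel updates the background model only if it was classified as
    background and lies outside every stored static-target rectangle."""
    return is_background and not inside_any(pt, static_rects)
```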
Regarding the working principle of the VIBE algorithm:
the VIBE algorithm adopts pixel-level background modeling and foreground detection: a background model is initialized from the first frame, foreground/background discrimination is then performed on the pixels of each new frame, and pixels judged to be background update the samples in the background model with a certain probability.
The VIBE algorithm framework mainly comprises three modules: 1) initializing a background model; 2) detecting a foreground; 3) and updating the background model.
1) Model initialization: the VIBE algorithm establishes a background model for every pixel v(x) of the first video frame; together these per-pixel models form the background model of the whole algorithm. The background model of each pixel contains N samples, denoted M(x) = {v_1, v_2, ..., v_k, ..., v_N}, where N usually takes the value 20. The samples of each pixel's background model are drawn by a random strategy: N times, one pixel value is randomly selected from the 8-neighbourhood of the pixel as one value of the sample set.
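A sketch of this initialization under the stated rule (N = 20 samples drawn from the 8-neighbourhood of each pixel of the first frame). Clamping at the image border is a simplification assumed here, since the text does not spell out border handling.

```python
import numpy as np

N = 20  # samples per pixel

def init_background_model(first_frame, seed=None):
    """VIBE initialisation: each pixel's N samples are drawn at random from
    its 8-neighbourhood in the first frame (borders clamped for simplicity)."""
    rng = np.random.default_rng(seed)
    h, w = first_frame.shape
    model = np.empty((h, w, N), dtype=first_frame.dtype)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    for x in range(h):
        for y in range(w):
            for i in range(N):
                dx, dy = offs[rng.integers(8)]
                nx = min(max(x + dx, 0), h - 1)  # clamp to the image
                ny = min(max(y + dy, 0), w - 1)
                model[x, y, i] = first_frame[nx, ny]
    return model
```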
2) Foreground detection: in Euclidean space, a circle S_R(v(x)) with v(x) as the centre and the threshold R as the radius is defined. This circle represents the set of points whose distance from the centre v(x) is less than the threshold R, as shown in fig. 1.
Count the number of samples of the background model M(x) whose distance to v(x) is less than R. If this count is at least a given threshold #min (value 2), the current pixel is considered close to the background samples and is classified as background; otherwise it is classified as a foreground point. The calculation is:

classify v(x) as background if #{ S_R(v(x)) ∩ M(x) } ≥ #min, otherwise as foreground.
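For greyscale pixels the Euclidean distance reduces to an absolute difference, and the classification rule becomes the short sketch below (the function name and the 1-D distance are illustrative assumptions).

```python
import numpy as np

def classify_pixel(v, samples, R, n_min=2):
    """Return 0 (background) if at least n_min samples of M(x) lie within
    distance R of v(x), else 1 (foreground). For greyscale pixels the
    Euclidean distance is just the absolute grey-level difference."""
    close = np.abs(samples.astype(float) - float(v)) < R
    return 0 if np.count_nonzero(close) >= n_min else 1
```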
3) Background model update strategy: when a pixel is classified as a background pixel, it must update the background model of the VIBE algorithm. The VIBE algorithm adopts a random update mechanism: a pixel v(x) judged to be background has probability 1/φ of having its value replace one value of the sample set M(x) in its background model, and likewise probability 1/φ of having its value replace one value in the background-model sample set of a random neighbourhood pixel.
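The random update can be sketched as follows; the subsampling factor φ = 16 is an assumed value, common in VIBE implementations but not stated in the text above.

```python
import numpy as np

PHI = 16  # time subsampling factor (assumed value)

def update_models(model, x, y, v, rng):
    """A pixel judged background replaces, each with probability 1/PHI, one
    random sample of its own model M(x) and one random sample of a random
    8-neighbour's model (borders clamped)."""
    h, w, n = model.shape
    if rng.integers(PHI) == 0:                       # probability 1/PHI
        model[x, y, rng.integers(n)] = v
    if rng.integers(PHI) == 0:                       # probability 1/PHI
        offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]
        dx, dy = offs[rng.integers(8)]
        nx = min(max(x + dx, 0), h - 1)
        ny = min(max(y + dy, 0), w - 1)
        model[nx, ny, rng.integers(n)] = v
```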
The VIBE algorithm is simple to implement, has good real-time performance and detection results, and is widely applied in fields such as foreground extraction and moving target detection, but it has certain shortcomings. First, it removes ghosts only through automatic model updating, which consumes a large number of video frames; second, it uses a fixed threshold when discriminating pixels, so some pixels are falsely detected in complex scenes. Improvements are made in both respects below.
The improved VIBE algorithm of the present invention
Aiming at the shortcomings of the VIBE algorithm, the invention proposes a ghost-removal method based on contour-similarity comparison, an adaptive-threshold method based on the LBP-T descriptor, and a moving-target hole-filling algorithm combining superpixel segmentation and saliency. For the proposed ghost-removal method based on contour-similarity comparison, the static foreground regions in the VIBE detection result are first detected; contours are then extracted with the Canny operator from each static foreground region and its corresponding grey-level region; the attribute of the static foreground region is obtained by comparing the contour similarity; and finally the static foreground region is processed differently according to its attribute. In the LBP-T-based adaptive-threshold method, a descriptor describing the degree of difference between the current pixel and the sample pixels of the background model is constructed in the pixel-classification stage, following the construction of the LBP operator, and the fixed discrimination radius of the VIBE algorithm is changed into an adaptive threshold according to the size of the descriptor.
Ghost detection and removal based on contour similarity comparison
Ghost problem analysis: the ghost problem is a troublesome issue common to foreground-extraction algorithms such as VIBE. A ghost forms because, when the VIBE background model is initialized, the first frame already contains the moving object, so values that differ greatly from the actual background are introduced into the background models of the pixels in the region occupied by the object. Once the object starts to move, the area it initially occupied is detected as foreground even though no real moving object is actually there: the so-called "ghost" phenomenon. The foreground inside the red rectangle of fig. 2(b) is such a ghost. Long-lived ghosts inevitably interfere with acquiring the real foreground target and also negatively affect related computer-vision research such as target detection, gait recognition and target tracking.
FIG. 2 shows frame detection results obtained with the VIBE algorithm on the streetLight data set [16]. As can be seen from fig. 2(b), a ghost appears in a form similar to a moving object and is difficult to distinguish from one. Its presence affects the initialization and update of the background models of the pixels in the ghost area and thereby affects real target detection.
The VIBE background-update principle implies that pixels near the ghost area that are detected as background slowly erode the ghost: when such a pixel is judged to be background, it updates the sample values of its neighbourhood with a certain probability, so the ghost values in the sample models are gradually replaced by background values, but this process is slow. As shown in fig. 2(c), the ghost area is not completely removed until frame 170; clearly the automatic update process of the VIBE algorithm consumes a large number of frames.
The algorithm improvement principle is as follows: the original VIBE algorithm needs many frames before the ghost areas are removed cleanly. The goal of the improved algorithm here is to speed up the disappearance of ghosts while suppressing the disappearance of actually stationary targets, since an actually stationary target is still part of a moving object and is only temporarily at rest. For example, in anomaly detection for a surveillance-video system, an abnormal stationary target must still trigger a warning, so it must be ensured that actually stationary targets remain detected.
First, the VIBE algorithm is used to detect the pixels that are detected as foreground at least Q times within P frames (Q < P), and after processing, the position of the static region is obtained; then the contour information of the static region and of the corresponding region of the current frame's grey-level image is obtained with the Canny operator, and the attribute of the static region (ghost or static target) is judged by computing the similarity of the two contours; finally, the pixels of the static region are processed differently according to its attribute.
The detailed process of the improved method is as follows. It is known that a "Ghost" region and an actually stationary target region (Object) appear identically in the VIBE detection result: in both cases the same positions are detected as foreground pixels across many video frames. It is therefore first necessary to detect the regions that are persistently detected as foreground in the current VIBE output, i.e. the static regions. The detection steps are as follows:
1) initialize two all-zero matrices f_c and s_r;
2) when the current pixel is detected as a foreground point, take the value at the corresponding position of the f_c matrix, add 1, and store it back;
3) when the frame number currently being processed is divisible by P (P = 30, i.e. a check every 30 frames), find the positions in the f_c matrix whose value is greater than or equal to Q (Q = 20) and set the corresponding positions of the s_r matrix to 255;
4) find the circumscribed rectangles of the connected regions of s_r that meet the condition (rectangles of small area are discarded); the positions of the rectangles are the regions where the static targets are located, marked on the original image as shown in fig. 3(c);
5) reset the two matrices f_c and s_r to all-zero matrices in preparation for the next cycle.
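Step 4's connected-region bounding boxes with small-area rejection (fewer than 20 pixels, per the earlier description) can be sketched with a plain BFS labelling; a library routine such as OpenCV's connected-components analysis would normally be used instead.

```python
import numpy as np
from collections import deque

MIN_AREA = 20  # regions covering fewer than 20 pixels are rejected

def static_region_boxes(s_r):
    """Circumscribed rectangles (x0, y0, x1, y1) of the 4-connected regions
    of 255-valued pixels in s_r whose pixel count is at least MIN_AREA."""
    h, w = s_r.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for x in range(h):
        for y in range(w):
            if s_r[x, y] != 255 or seen[x, y]:
                continue
            # BFS flood fill of one connected region
            q = deque([(x, y)])
            seen[x, y] = True
            pts = []
            while q:
                cx, cy = q.popleft()
                pts.append((cx, cy))
                for nx, ny in ((cx - 1, cy), (cx + 1, cy),
                               (cx, cy - 1), (cx, cy + 1)):
                    if (0 <= nx < h and 0 <= ny < w
                            and s_r[nx, ny] == 255 and not seen[nx, ny]):
                        seen[nx, ny] = True
                        q.append((nx, ny))
            if len(pts) >= MIN_AREA:       # small-area rectangles discarded
                xs = [p[0] for p in pts]
                ys = [p[1] for p in pts]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```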
After obtaining the above static area, the present document uses Canny edge extraction to perform contour detection on both the static area and the gray map area corresponding to the static area, and the contour detection result is shown in fig. 4.
The content of the actual frame under a ghost area differs from that under an actually stationary target area. Under a ghost area the actual frame generally shows background, as in fig. 4(b), and a background area generally has a smooth pixel distribution. An actually stationary target area, by contrast, corresponds to the actual stationary moving object, as in fig. 4(d): the object and its surroundings show a contour whose shape closely matches the contour of the current static area's binary image. Based on this observation, the attribute of the current static area (ghost or real static target) is judged here by computing the contour similarity of the binary images. The calculation steps are as follows:
1) extract the contour of the static region with the Canny operator and denote it C0;
2) find the corresponding position in the grey-level image according to the static region, extract its contour with the Canny operator, and denote it C1;
3) compute the similarity of C0 and C1, defining the similarity of the binary images C0 and C1 as follows:
sim(C0, C1) = fgCount(C0 ∩ C1) / size(C0 ∩ C1)

where size(C0 ∩ C1) is the length-by-width product of the result image C0 ∩ C1 and fgCount(C0 ∩ C1) is the number of foreground points (value 255) in it. The attribute SR of the static region can be expressed as

SR = static target, if sim(C0, C1) > T_0; ghost, otherwise.
where T_0 is the threshold, taken as T_0 = 0.02 in the experiments.
After the attribute determination of the static area is completed, corresponding processing needs to be performed on the static area.
If the static region is an actual moving object at rest, then, to suppress the disappearance of the moving object, whenever the VIBE algorithm detects the current pixel as a background point it is checked whether the point lies inside the circumscribed rectangle of the static region. If so, the point's update factor for the background model is reduced; here the point is simply chosen not to update the background model at all.
If the static region is a ghost region, the elimination of the ghost must be accelerated: an actual background region is selected, model initialization is performed again with the VIBE algorithm, and the pixel models of the ghost region in the original model are replaced. As shown in fig. 5(c), a "clean" background model is obtained that is initialized from background pixels only.
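A simplified sketch of the ghost-region model replacement: the patent re-runs VIBE initialization on a clean background region, while here, for brevity, the clean background value itself fills all N samples of each pixel inside the ghost rectangle.

```python
import numpy as np

def reinit_ghost_region(model, box, background_frame):
    """Replace the pixel models inside a ghost rectangle (x0, y0, x1, y1)
    with samples taken from the actual (ghost-free) background frame,
    yielding a 'clean' model containing only background values there."""
    x0, y0, x1, y1 = box
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            # simplification: use the clean background value for all samples
            model[x, y, :] = background_frame[x, y]
    return model
```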
Verification and analysis of the improvement: comparison experiments between the proposed contour-similarity-based ghost-removal method and the original VIBE slow-update removal were run on two videos from the data sets PETS2006 and streetLight; the results are shown in fig. 6.
In fig. 6, (a) and (d) mark the ghost areas with red rectangles; (b) and (e) show the frames consumed by the original VIBE algorithm's slow update, and (c) and (f) the frames consumed by the improved removal. In video 1 the slow update consumes 120 frames, while with the improved method the ghost has almost disappeared after only 53 frames; in video 2 the slow update consumes 170 frames, while the improved method needs only 74. The experiments show that the improved method removes ghosts faster and more effectively.
Adaptive threshold based on LBP-T descriptor
Threshold problem analysis: the choice of threshold is very important for the foreground-detection result. If the threshold is small, the result contains many noise points; if it is large, large parts of the target are missed.
Fig. 7 compares two groups of experiments with two different discrimination radii for the VIBE algorithm. As can be seen from fig. 7(a) and 7(c), when the discrimination radius is set small, noise in the scene is detected as foreground; as shown in fig. 7(b) and 7(d), when the radius is set large, parts of the moving object's contour are missed, leaving it unclear and incomplete.
It can be concluded from fig. 7 that no fixed discrimination radius of the VIBE algorithm is optimal; ideally it should adjust with scene changes to obtain the best detection result.
The algorithm improvement principle is as follows: the discrimination radius R used in the background-model update strategy of the original VIBE algorithm is fixed, with a single discrimination threshold shared by all pixels. A fixed discrimination radius hinders background detection and cannot adapt to changes in complex scenes. To adapt to such changes, the fixed threshold is improved to an adaptive one: a descriptor capable of describing the degree of scene change in the video is found, a discrimination radius is then set for each pixel according to that descriptor, and the fixed discrimination threshold of the original VIBE model is replaced.
In the pixel discrimination stage, imitating the construction of the LBP operator [17], an LBP-T descriptor is built from all neighbours of the current pixel to be classified and the sample value to be compared in the background model; the descriptor captures the difference between the neighbours of the current pixel and a given sample pixel in the background model. Because pixel distributions are locally continuous, this difference indirectly reflects the difference between the current pixel and the background-model sample. When the difference is small, the scene changes little, and the discrimination radius R can be reduced appropriately so that more motion information is detected; when the difference is large, the scene changes strongly, and R must be increased to avoid detecting complex scene content as foreground and introducing noise into the result. The fixed discrimination radius R of the VIBE model can therefore be replaced according to the size of the descriptor.
The detailed process of the improved method is as follows. Simple computation and the ability to analyse images in real time are characteristic of the LBP operator. As shown in fig. 8, the conventional LBP is defined on a 3×3 pixel neighbourhood, using the middle pixel as the threshold: each neighbouring pixel value is compared with the value at the middle position, and the neighbour's position is marked 1 if its value is greater than the central pixel value, otherwise 0. Comparing the 8 pixels of the 3×3 neighbourhood thus yields an eight-bit binary number, formed into a binary string in clockwise order and then converted to a decimal number, which is the LBP value of the pixel.
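The conventional LBP described above can be written directly; taking the top-left neighbour as the lowest bit is one of several equivalent conventions, assumed here.

```python
import numpy as np

def lbp_3x3(patch):
    """Classic LBP on a 3x3 patch: neighbours greater than the centre are 1,
    else 0; the bits are assembled clockwise into an 8-bit number, with the
    top-left neighbour taken as the lowest bit (a convention choice)."""
    c = patch[1, 1]
    # clockwise walk around the centre, starting at the top-left
    ring = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum((1 if v > c else 0) << k for k, v in enumerate(ring))
```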
The construction above shows that the LBP operator describes the relation of a pixel to its surrounding pixels, but its value does not directly indicate the degree of difference between the neighbours and the middle pixel, as shown in fig. 9. The LBP values of fig. 9(a) and 9(d) are the same, yet the neighbour-to-centre differences differ: in (d) the middle pixel is similar to its neighbours, whereas in (a) it differs from them considerably. Suppose the middle pixel of fig. 9(a) is the sample mean of the background model at the corresponding position when the VIBE algorithm performs pixel discrimination, and its 8 surrounding pixels are the neighbours of the current pixel; then all neighbours of the current pixel differ greatly from the sample value, and the scene change is more complex. The distribution of pixels in an image is not arbitrary: there is local regularity, and the current pixel has a certain similarity in spatial distribution to its neighbours.
When most neighbourhood pixels of the current pixel differ little from the pixel values in the background model, the current pixel differs little from the background model, which indirectly reflects a low degree of scene change; the discrimination threshold is then reduced appropriately, so that more foreground pixels are detected. When most neighbourhood pixels differ greatly from the background-model values, the current pixel differs greatly from the background model, which indirectly reflects a higher degree of scene change; the discrimination radius is then increased appropriately, so that slightly fluctuating pixels are not detected as foreground points, reducing noise in the detection result.
Therefore, it is necessary to find a descriptor that can describe the degree of difference between the 8 neighboring pixels of the current pixel and the sample value of the background model, i.e., the degree of scene change.
Following the definition of the LBP operator, the invention constructs an LBP-T (threshold value) descriptor. The construction method comprises the following steps:
(1) background model mean calculation
Where X is the current pixel to be classified, neik( k 1, 2.. 8) is a pixel in the 8 neighborhoods of the current pixel X. N is the number of samples in the background model,
Figure BDA0001669327310000121
Niis the mean of the background samples.
(2) Construction of a Multi-threshold Structure graph
In the method, the differences between many pixels and the mean of the background model to be compared were counted over multiple video detections; the statistics are shown in fig. 11.
As can be seen from fig. 11, the difference between the pixel to be classified and the background model mean value is mostly distributed in the interval [0, 40], and the sub-interval [0, 20] in the interval [0, 40] occupies more than half of the whole interval.
A multi-threshold structure diagram is constructed. In fig. 12, Th_i, i = 1, 2, ..., 8, are the thresholds, placed in clockwise order (matching the order in which the LBP forms its binary string), with the largest threshold at the highest bit of the binary string and the smallest at the lowest bit. The specific values of Th_i are shown in fig. 12(b). The multi-threshold structure diagram compares the difference between each neighbour of the current pixel and the background-model mean against the threshold at the corresponding position.
(3) Computing LBP-T descriptors
When δ_k = |nei_k − μ| ≤ Th_k, the corresponding position in the structure diagram is set to 0; otherwise it is set to 1.
The LBP-T descriptor can be described by the formula

LBP-T(x_c, y_c) = Σ_{k=1}^{8} D(δ_k) · C^{k−1}

where (x_c, y_c) is the central element of the 3×3 neighbourhood, k indexes the 8 neighbours, and C is a constant with value 2 in the experiments. δ_k is the difference between the k-th neighbour of (x_c, y_c) and the background-model mean μ. D(·) is the indicator function

D(δ_k) = 0 if δ_k ≤ Th_k, and 1 if δ_k > Th_k.
Taking

R(x_c, y_c) = η × LBP-T(x_c, y_c)    (6)

as the new radius value in the original VIBE model, each pixel has its own radius value during discrimination instead of the globally fixed value R. η is a scaling factor whose experimental value is 1/3.
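Steps (1)–(3) and formula (6) can be sketched as follows. The eight threshold values below are hypothetical placeholders for those given in fig. 12(b), while C = 2 and η = 1/3 follow the experimental values stated above:

```python
def lbp_t(neighbors, mu, thresholds, C=2):
    """LBP-T descriptor: bit k is d(delta_k - Th_k) with delta_k = |nei_k - mu|,
    weighted by C**k so that larger thresholds sit at higher bits."""
    value = 0
    for k, (nei, th) in enumerate(zip(neighbors, thresholds)):
        delta = abs(nei - mu)
        value += (1 if delta > th else 0) * (C ** k)  # d(x) = 1 if x > 0 else 0
    return value

def adaptive_radius(neighbors, mu, thresholds, eta=1/3):
    """Per-pixel discrimination radius R = eta * LBP-T (formula (6))."""
    return eta * lbp_t(neighbors, mu, thresholds)

# Hypothetical 8-neighborhood, background mean and thresholds
# (the thresholds stand in for the values of fig. 12(b)).
neighbors = [105, 130, 98, 101, 150, 99, 102, 100]
thresholds = [5, 10, 15, 20, 25, 30, 35, 40]
mu = 100.0
R = adaptive_radius(neighbors, mu, thresholds)
```

With these values only the second and fifth neighbors exceed their thresholds, so most bits are zero and the resulting radius stays small, matching the intent that a calm neighborhood yields a tight discrimination radius.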
The construction shows that a larger LBP-T value means a larger difference between the current pixel's neighbors and the sample pixels in the background model, i.e., a higher degree of scene change; the threshold is then set larger, preventing noise points in the scene from being detected as foreground points. Conversely, a smaller LBP-T value means the current pixel's neighbors are closer to the background model samples and the degree of scene change is lower; the threshold is then set smaller, so that more pixel points are detected as foreground points.
Comparison and analysis of results: in three videos (the pedestrians and canoe data sets and an OpenCV sample video), the same input frames are selected, and detection results are obtained and compared for the GMM algorithm, the original VIBE algorithm, and the improved adaptive-threshold VIBE algorithm based on the LBP-T descriptor. Fig. 13 shows 3 sets of experimental results. The column (a) images are, from top to bottom, the 520th, 958th and 435th input frames of the pedestrians, canoe and OpenCV sample videos. The column (b) images are the detection results of the Gaussian mixture (GMM) algorithm on the corresponding input frames; many foreground points are falsely detected as background points. The column (c) images are the detection results of the original VIBE algorithm, which are significantly better than those of the GMM algorithm. The column (d) images are the detection results of the improved VIBE algorithm based on the LBP-T descriptor; the experiments show that the improved adaptive threshold yields better foreground detection: the results are fuller than those of the original VIBE algorithm, contain more foreground pixel points, and the foreground is more faithful.
In conclusion, aiming at the "ghost" problem of the classic VIBE foreground extraction algorithm, the invention provides a ghost detection method based on contour similarity comparison; compared with letting VIBE remove ghosts through its own automatic updates, the method uses fewer frames and removes ghosts faster. Aiming at the poor detection of complex backgrounds caused by the fixed discrimination radius in the VIBE background model, an improved VIBE method with an adaptive threshold based on the LBP-T descriptor is provided, so that the improved results are better than the original detection results. The improved method lays a feasible foundation for subsequent research such as gait recognition and anomaly detection.
6 References
1.Hu,W.,et al.,Moving object detection using tensor-based low-rank and saliently fused-sparse decomposition.IEEE Transactions on Image Processing,2017.26(2):p.724-737.
2.Man,J.and B.Bhanu,Individual recognition using gait energy image. IEEE transactions on pattern analysis and machine intelligence, 2006.28(2):p.316-322.
3.Leal-Taixé,L.,et al.,Tracking the trackers:an analysis of the state of the art in multiple object tracking.arXiv preprint arXiv:1704.02781,2017.
4.Bala,R.and V.Monga,Video Anomaly Detection.Computer Vision and Imaging in Intelligent Transportation Systems,2017:p.227-256.
5.Bouwmans,T.,F.El Baf,and B.Vachon,Background modeling using mixture of gaussians for foreground detection-a survey.Recent Patents on Computer Science,2008.1(3):p.219-237.
6.Bouwmans,T.,Traditional and recent approaches in background modeling for foreground detection:An overview.Computer Science Review,2014.11:p.31-66.
7.Kim,K.,et al.,Real-time foreground–background segmentation using codebook model.Real-time Imaging,2005.11(3):p.172-185.
8.Reynolds,D.,Gaussian mixture models.Encyclopedia of biometrics, 2015:p.827-832.
9.Barnich,O.and M.Van Droogenbroeck.ViBe:a powerful random technique to estimate the background in video sequences.in Acoustics,Speech and Signal Processing,2009.ICASSP 2009.IEEE International Conference on.2009.IEEE.
10.Barnich,O.and M.Van Droogenbroeck,ViBe:A universal background subtraction algorithm for video sequences.IEEE Transactions on Image processing,2011.20(6):p.1709-1724.
11.Li,Y.,W.Chen,and R.Jiang.The integration adjacent frame difference of improved ViBe for foreground object detection.in 2011 7th International Conference on Wireless Communications, Networking and Mobile Computing.2011.
12.Stauffer,C.and W.E.L.Grimson,Learning patterns of activity using real-time tracking.IEEE Transactions on pattern analysis and machine intelligence,2000.22(8):p.747-757.
13.Brutzer,S.,B.Höferlin,and G.Heidemann.Evaluation of background subtraction techniques for video surveillance.in Computer Vision and Pattern Recognition(CVPR),2011 IEEE Conference on.2011.IEEE.
14.Hofmann,M.,P.Tiefenbacher,and G.Rigoll.Background segmentation with feedback:The pixel-based adaptive segmenter. in Computer Vision and Pattern Recognition Workshops(CVPRW),2012 IEEE Computer Society Conference on.2012.IEEE.
15.Su,Y.,A.Li,and K.Jiang,Improved visual background extractor model for moving objects detecting algorithm.J.Comput.Aided Des. Comput.Graph,2014.26(2):p.232-240.
16.PETS 2012 Benchmark Data http://pets2012.net/.
17.Wang,X.,T.X.Han,and S.Yan.An HOG-LBP human detector with partial occlusion handling.in Computer Vision,2009 IEEE 12th International Conference on.2009.IEEE.
18.Van Droogenbroeck,M.and O.Paquot.Background subtraction: Experiments and improvements for ViBe.in Computer Vision and Pattern Recognition Workshops(CVPRW),2012 IEEE Computer Society Conference on.2012.IEEE.

Claims (4)

1. An improved moving object detection (VIBE) method, characterized by comprising the following steps:
step 1: inputting the current Nth frame image from the video, and judging whether the current frame is the first frame of the video:
if yes, initializing the background model BGM with the first frame: traversing all pixel points of the first frame; for any pixel point v(x), the background model of each pixel contains N samples, recorded as M(x) = {v_1, v_2, …, v_k, …, v_N}, where N takes the value 20; a random strategy is adopted for the samples in each pixel's background model, randomly selecting a pixel value from the 8-neighborhood of the pixel point N times as the values of the pixel's samples; when the first frame has been traversed, the background model BGM is initialized;
if not, entering the step 2;
step 2: judging whether the current frame number equals the last frame number + 1;
if so, ending the operation;
if not, entering step 3;
and step 3: according to the initialized background model BGM, traversing each pixel point in the current frame and judging with the improved adaptive threshold whether the pixel point is a background point or a foreground point; the adaptive threshold method in this step is based on the LBP-T descriptor, specifically:
step one, calculating the mean value of the background model of the pixel to be classified, obtaining the 8 neighborhood pixels of the pixel, and placing the background model mean at the current pixel position;
step two, counting the distribution rule of the differences between many pixels to be classified and the background model mean, and constructing a multi-threshold structure diagram;
step three, calculating the LBP-T value of the current pixel and using this value as the radius value when classifying the current pixel;
if the pixel is a background point, entering step 4;
if the pixel is a foreground point, entering step 5;
step 4: setting the value of the corresponding position in the mask matrix of the corresponding frame to 0, applying different updating strategies to the background model BGM according to whether the pixel point position lies in the bounding rectangle of the actual target, and entering step 9;
step 5: setting the value of the corresponding position in the mask matrix of the corresponding frame to 255, and judging whether the frame has been fully traversed and whether the current frame number is divisible by a given positive integer;
if not, entering step 3;
if yes, entering step 6;
step 6: obtaining a static area in a current frame and judging the attribute of the static area;
if the static area is a ghost area, entering step 7;
if the static area is the actual moving target area, entering the step 8;
step 7: updating the background model BGM with a local replacement strategy to accelerate the elimination of ghosts, and entering step 9;
step 8: saving the position information of the bounding rectangle of the actual moving target, and entering step 1;
step 9: after the background model BGM is updated, entering step 1.
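A minimal sketch of the first-frame initialization in step 1 (a grayscale frame is assumed; clamping coordinates at the image border is an added assumption, since the claim does not specify border handling):

```python
import numpy as np

def init_background_model(frame, N=20, rng=None):
    """Build the BGM: for every pixel, draw N samples at random from its
    8-neighborhood (coordinates clamped at the image border, an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape
    bgm = np.empty((h, w, N), dtype=frame.dtype)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(h):
        for x in range(w):
            for n in range(N):
                dy, dx = offsets[rng.integers(8)]  # random neighbor, N times
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                bgm[y, x, n] = frame[yy, xx]
    return bgm

# Toy 5x5 first frame; every sample of an interior pixel must come
# from one of its 8 neighbors.
frame = np.arange(25, dtype=np.uint8).reshape(5, 5)
bgm = init_background_model(frame, N=20, rng=np.random.default_rng(0))
```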
2. The improved moving object detection VIBE method of claim 1, wherein in step 3 the specific process of judging whether a pixel point is a background point or a foreground point, i.e. pixel classification, is as follows:
in Euclidean space, defining a sphere S_R(v(x)) centered at v(x) with the threshold R as radius; this sphere represents the set of points whose distance to the center v(x) is less than the threshold R;
counting the number of samples of the background model M(x) whose distance to the current pixel v(x) is less than R; if this number is greater than a given threshold D_min, the current pixel is close to the background samples and is classified as background; otherwise it is classified as a foreground point; the calculation formula is:

#{ S_R(v(x)) ∩ {v_1, v_2, …, v_N} } ≥ D_min
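The classification rule of claim 2 can be sketched as follows; here R would be the per-pixel adaptive radius from step 3, D_min the minimum match count, and the sample values are hypothetical:

```python
def classify_pixel(v, samples, R, D_min=2):
    """Background if at least D_min samples of the model M(x) lie within
    distance R of the current pixel value v, foreground otherwise."""
    matches = sum(1 for s in samples if abs(v - s) < R)
    return "background" if matches >= D_min else "foreground"

# Hypothetical background model samples for one pixel.
samples = [100, 102, 98, 101, 99, 150, 160, 155, 100, 103]
label_close = classify_pixel(101, samples, R=5)  # near most samples
label_far = classify_pixel(200, samples, R=5)    # far from every sample
```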
3. the improved moving object detection VIBE method of claim 1, wherein in step 4, the corresponding position in the mask matrix of the corresponding frame is set to 0, and different update strategies are made for the background model BGM according to whether the pixel position is in the actual target bounding rectangle frame:
if the pixel point is in the actual target external rectangular frame, the pixel point does not update the background model of the pixel point;
if the pixel point is not in the bounding rectangle of the actual target, then with probability 1/φ the pixel value of the pixel point replaces one value of the sample M(x) in its background model, where φ ranges from 2 to 128; at the same time, also with probability 1/φ, the pixel value of the pixel point replaces one value in the background model sample of a certain neighborhood pixel point of the pixel point; subsequently, step 9 is entered.
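The conservative update of claim 3 can be sketched as follows; φ = 1 is used in the demo only to force both updates, and the choice of neighbor and the border clamping are added assumptions:

```python
import random

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def update_background(bgm, y, x, pixel_value, phi=16, rng=None):
    """With probability 1/phi, overwrite a random sample of the pixel's own
    model; independently, with probability 1/phi, overwrite a random sample
    of a random 8-neighbor's model (coordinates clamped at the border)."""
    rng = rng or random.Random()
    h, w = len(bgm), len(bgm[0])
    if rng.random() < 1.0 / phi:
        bgm[y][x][rng.randrange(len(bgm[y][x]))] = pixel_value
    if rng.random() < 1.0 / phi:
        dy, dx = rng.choice(OFFSETS)
        ny = min(max(y + dy, 0), h - 1)
        nx = min(max(x + dx, 0), w - 1)
        bgm[ny][nx][rng.randrange(len(bgm[ny][nx]))] = pixel_value

# 3x3 grid of per-pixel models with 20 zero samples each; phi = 1 forces
# both the own-model update and one neighbor-model update.
bgm = [[[0] * 20 for _ in range(3)] for _ in range(3)]
update_background(bgm, 1, 1, 7, phi=1, rng=random.Random(0))
```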
4. The improved moving object detection VIBE method of claim 1, wherein the specific method for judging whether the attribute of a static area in the current frame is a ghost, i.e. the ghost detection method, is based on contour similarity comparison and comprises the following two steps:
step one, calculating a static foreground area in a video sequence of a VIBE algorithm detection result;
step two, extracting contours of the static foreground region and of the gray-image region at the same position in the current frame with the Canny operator, calculating the contour similarity with a given contour similarity formula, and comparing the contour similarities to obtain the attribute of the static foreground region.
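The comparison of claim 4 can be sketched with a hypothetical similarity measure, intersection-over-union of edge pixels (the patent's own contour similarity formula is not reproduced here), applied to precomputed binary contour maps such as Canny outputs:

```python
import numpy as np

def contour_similarity(c1, c2):
    """Hypothetical similarity of two binary contour maps: intersection over
    union of their edge pixels. This is a stand-in for the patent's formula."""
    a, b = c1.astype(bool), c2.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

# A real static object leaves matching edges in the current frame;
# a ghost region's contour finds no counterpart there.
real = np.zeros((5, 5), np.uint8); real[1:4, 1] = 1
ghost = np.zeros((5, 5), np.uint8); ghost[1:4, 3] = 1
sim_real = contour_similarity(real, real.copy())   # high -> real object
sim_ghost = contour_similarity(real, ghost)        # low -> ghost
```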
CN201810498273.1A 2018-05-22 2018-05-22 Improved moving target detection VIBE method Expired - Fee Related CN108805897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810498273.1A CN108805897B (en) 2018-05-22 2018-05-22 Improved moving target detection VIBE method


Publications (2)

Publication Number Publication Date
CN108805897A CN108805897A (en) 2018-11-13
CN108805897B true CN108805897B (en) 2022-01-21

Family

ID=64091508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810498273.1A Expired - Fee Related CN108805897B (en) 2018-05-22 2018-05-22 Improved moving target detection VIBE method

Country Status (1)

Country Link
CN (1) CN108805897B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636866A (en) * 2018-12-13 2019-04-16 福建工程学院 The foreground extracting method of stochastic clustering statistics based on resolution decreasing
CN109785562B (en) * 2018-12-29 2023-08-15 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Vertical photoelectric ground threat alert system and suspicious target identification method
CN110084129B (en) * 2019-04-01 2022-11-29 昆明理工大学 Machine vision-based real-time detection method for river floating objects
CN110363197B (en) * 2019-06-22 2023-05-23 东北电力大学 Video region of interest extraction method based on improved visual background extraction model
CN110288630B (en) * 2019-06-27 2021-11-09 浙江工业大学 Moving target ghost suppression method for background modeling
CN111104870B (en) * 2019-11-27 2024-02-13 珠海欧比特卫星大数据有限公司 Motion detection method, device, equipment and storage medium based on satellite video
CN111080562B (en) * 2019-12-06 2022-12-20 合肥科大智能机器人技术有限公司 Substation suspender identification method based on enhanced image contrast
CN113129331B (en) * 2019-12-31 2024-01-30 中移(成都)信息通信科技有限公司 Target movement track detection method, device, equipment and computer storage medium
CN111259866B (en) * 2020-03-06 2023-07-28 大连科技学院 Marine ship target detection method based on improved background difference method
CN111553931B (en) * 2020-04-03 2022-06-24 中国地质大学(武汉) ViBe-ID foreground detection method for indoor real-time monitoring
CN111985314B (en) * 2020-07-09 2024-04-30 东南大学 Smoke detection method based on ViBe and improved LBP
CN112084880A (en) * 2020-08-14 2020-12-15 江铃汽车股份有限公司 Image processing method, device, storage medium and equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104899839A (en) * 2015-06-05 2015-09-09 河海大学 Ghost quick-inhibition method based on ViBe algorithm
CN106157332A (en) * 2016-07-07 2016-11-23 合肥工业大学 A kind of motion inspection optimization method based on ViBe algorithm
WO2017028029A1 (en) * 2015-08-14 2017-02-23 富士通株式会社 Background model extracting method and apparatus and image processing device
CN106651782A (en) * 2016-09-26 2017-05-10 江苏科海智能***有限公司 ViBe-oriented foreground ghosting removal method


Non-Patent Citations (3)

Title
Tian, Yumin, et al. "Moving Object Detection with Vibe and Texture Feature." PCM 2016; 2016-11-27; pp. 150-159. *
Wang, Hui, et al. "An adaptive threshold-based Vibe object detection algorithm" (in Chinese). Computer Science; 2015-12-31; pp. 154-157. *
Wu, Erjie. "Research on multi-object detection and tracking in surveillance video" (in Chinese). China Master's Theses Full-text Database, Information Science and Technology; 2016-05-15 (No. 5); Chapter 3, pp. 19-32. *


Similar Documents

Publication Publication Date Title
CN108805897B (en) Improved moving target detection VIBE method
CN107872644B (en) Video monitoring method and device
Chung et al. An efficient hand gesture recognition system based on deep CNN
Van Droogenbroeck et al. Background subtraction: Experiments and improvements for ViBe
CN108470354B (en) Video target tracking method and device and implementation device
CN111062974B (en) Method and system for extracting foreground target by removing ghost
CN111681249B (en) Grabcut-based improved segmentation algorithm research of sand particles
WO2018052587A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
JP2006209755A (en) Method for tracing moving object inside frame sequence acquired from scene
CN109255326B (en) Traffic scene smoke intelligent detection method based on multi-dimensional information feature fusion
JP2008243187A (en) Computer implemented method for tracking object in video frame sequence
CN108280409B (en) Large-space video smoke detection method based on multi-feature fusion
CN105741319B (en) Improvement visual background extracting method based on blindly more new strategy and foreground model
CN109902578B (en) Infrared target detection and tracking method
CN109785356B (en) Background modeling method for video image
Iraei et al. Object tracking with occlusion handling using mean shift, Kalman filter and edge histogram
CN107832732B (en) Lane line detection method based on treble traversal
Ma et al. Background subtraction based on multi-channel SILTP
Zhang et al. Multi-scale fusion of texture and color for background modeling
Wang et al. AMBER: Adapting multi-resolution background extractor
Huynh-The et al. Locally statistical dual-mode background subtraction approach
Cheng et al. A novel improved ViBe algorithm to accelerate the ghost suppression
Chun-Hyok et al. A novel motion detection approach based on the improved ViBe algorithm
Yang et al. A hierarchical approach for background modeling and moving objects detection
Renno et al. Shadow Classification and Evaluation for Soccer Player Detection.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220121
