CN102340620B

CN102340620B - Mahalanobis-distance-based video image background detection method

Info

Publication number: CN102340620B
Application number: CN 201110328046
Authority: CN
Inventors: 杨梦宁; 洪明坚; 徐玲; 张小洪; 杨丹; 霍东海; 葛永新; 陈远; 胡海波
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2011-10-25
Filing date: 2011-10-25
Publication date: 2013-06-19
Anticipated expiration: 2031-10-25
Also published as: CN102340620A

Abstract

The invention provides a Mahalanobis-distance-based video image background detection method. Red, green and blue (RGB) component distribution characteristics of pixels in a video image are taken into account and analyzed, and are measured according to the RGB component distribution characteristics, discovered by researches and analysis, of the pixels in the video image by utilizing a Mahalanobis distance algorithm to acquire a real football-shaped RGB component distribution outline of background pixels in the video image, and background detection is performed in combination with a threshold method, so background detection accuracy is improved; even though a small number of noises exist in a background detection result, the noises are mainly distributed in the vicinity of foreground pixels, so the requirements of video image background identification and foreground capturing on practicability and accuracy in actual application can be completely met; and simultaneously, operation efficiency equivalent to that of a codebook background modeling detection method is substantially kept by the method, and the method is highly real-time and robust.

Description

Video image background detection method based on Mahalanobis distance

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a Mahalanobis distance-based video image background detection method.

Background

With the wide application of video monitoring cameras in the field of intelligent monitoring and the rapid development of intelligent video analysis technology, the video abstraction technology gradually enters the eye curtains of people. The video abstraction technology is a technology for compressing mass video data for a long time into a controllable time period, and is convenient for people to browse videos. In the video summarization technology, an efficient background detection method is required to be used for detecting the background of a video image, and then a foreground moving object in a video is captured as an object to be tracked by the video summarization, so that the background detection technology becomes a research hotspot in the video summarization technology. Meanwhile, in other video image processing technical fields, such as the technical field of face recognition, the technical field of video compression processing, and the like, the background detection technology is also needed to distinguish the background from the foreground. Therefore, the background detection technology has an important position in video image processing technologies of various applications, and is a mainstream research direction in the field of image processing technology at present. Background detection methods widely used at present mainly include a background difference method, a nuclear density estimation detection method, a mixed Gaussian background modeling detection method and a codebook background modeling detection method.

Heikkila et al in the literature "Heikkila, J.and O.silver.A real-time system for monitoring of cyclists and polypeptides.in: collins, Colorado: IEEE, 1999.74-81 "and the literature" picchardi, m.background subset technologies: in: IEEEInternational Conference on Systems, Man and Cybernetics 2004, The Hague, Netherlands: the background difference method is proposed in IEEE 2004.3099-3104 vol.4 ", and the algorithm uses a background subtraction method, that is, a background image which is given in advance is subtracted from a video image sequence to be processed, and then a binarization method is applied to distinguish a moving foreground. The algorithm has the advantages of easy implementation, low algorithm complexity, almost no consumption of computing resources, and rapid acquisition of the motion foreground, and is further applied to a real-time video abstract generation system. However, the algorithm needs to give a complete background image in advance, and the complete background image is not easy to obtain, depends on external input, and cannot be updated over time, so that a large error occurs in the work at the later stage of video processing.

Elgamma et al, in the literature "Picchardi, M.Background and subset technologies: in: IEEE International Conference on Systems, Man and Cybernetics 2004, The Hague, Netherlands: IEEE 2004.3099-3104 vol.4 "and the literature" Elgammal, a., d.hartwood, and l.davis, Non-parametric model for background Vision, computer Vision ECCV 2000, 2000: p.751-767 "proposes a nonparametric background modeling method, which estimates the probability density of background pixel values in a time sequence, and estimates the probability that the pixel values belong to the foreground or the background through a window with a set length and a window function, thereby determining whether each pixel in an image is a background pixel. The algorithm has the advantages of being convenient to adapt to the addition of new training samples and providing convenience for online learning of density estimation. However, the algorithm is too complex to be applied in a real-time motion detection system, and is not robust under dynamic background and light abrupt change conditions.

Wren et al, in "Wren, c.r., et al, pfinder: real-time tracking of the human body IEEEtransactions on Pattern Analysis and Machine understanding, 1997.19 (7): p.780-785, a single Gaussian model is used for modeling the background, the limitation that the background needs to be input externally is overcome, and the detection effect is good in indoor and other single-peak environments, but in complex multi-peak environments such as a fluctuating lake surface and swinging leaves, the model is difficult to be used for accurately modeling the background environment. In order to solve these problems, Stauffer et al propose a mixed gaussian background modeling detection method (abbreviated as MOG method), which considers the temporal continuity of the pixel points, assumes that the distribution of the background pixels on the time series is a mixed gaussian model, assumes that the three components of the RGB space are mutually independent, and gives the distribution characteristics of the background pixels, i.e., the background pixels present a spherical distribution in the RGB space. However, the three components in the RGB space are not independent from each other, so the MOG method is not very accurate in describing the distribution characteristics of the background pixels, resulting in an increase in the detection error of the moving foreground.

Kim et al in "Kim, K., et al," Real-time for background segmentation using codebook model. Real-time imaging, 2005.11 (3): p.172-185 "and the document" Chalidabhongse, T.H., et al.A. circulation method for evaluating background analysis algorithms in: joint IEEEInternational work on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.2003.Nice, France: citeseer proposes a structured codebook background modeling detection method, and obtains a better image background detection effect in a multimodal environment. Wu et al, in "Wu, M.and X.Peng, spread-temporal context for coded-based dynamic background section, AEU-International Journal of Electronics and Communications, 2010.64 (8): p.739-747 "and Qiu et al in the literature" Tu, Q., Y.xu, and M.Zhou.Box-based codebook model for real-time object detection. in: 7th World consistency on Intelligent Control and Automation. Chongqing, China: IEEE 2008.7621-7625 "partially improves the codebook background modeling detection method proposed by Kim, etc., and further improves the accuracy of background detection to a certain extent. A codebook background modeling detection method is based on brightness change of pixels of a video image, and realizes the distinguishing of a background and a foreground by defining the upper and lower boundaries of the brightness of a background model, thereby reducing the influence of global and local illumination change on the distinguishing of the background, having better background detection effect compared with an MOG method, compressing the background model by operation on the basis of not influencing the foreground detection effect, greatly reducing the demand and the calculated amount on an internal memory, and having better processing effect and operation efficiency than the three background detection methods. However, the codebook background modeling detection method is proposed based on luminance statistical observation of the video image pixel points, and the RGB component distribution of the video image pixel points is not considered, so that the distinction of the image background and the image foreground is not accurate enough in many cases, which causes more noise in background detection and foreground capture.

Disclosure of Invention

Aiming at the problems in the prior art, RGB component distribution characteristics of video image pixel points are considered and analyzed, and the RGB component distribution characteristics of the video image pixel points are measured by using a Mahalanobis distance algorithm, so that the video image background detection method based on the Mahalanobis distance is higher in background detection accuracy.

In order to achieve the purpose, the invention adopts the following technical means:

the method for detecting the background of the video image based on the Mahalanobis distance comprises the following steps:

a) extracting F frame background images from the video to serve as a training sample set, wherein F is more than or equal to 80 and less than or equal to L, and L represents the total frame number of the video;

b) respectively solving an RGB mean matrix of each pixel point of the image in the training sample set:

{\overset{&OverBar;}{X}}_{k} = \frac{1}{F} Σ_{i = 1}^{F} X_{k} (i), k = 1,2, . . ., (M \times N);

wherein k represents the serial number of a pixel point in an image of the training sample set, and M multiplied by N represents the resolution of a video image;

representing an RGB mean matrix of a k-th pixel point of an image in a training sample set; x_k(i)＝[R_k(i)，G_k(i)，B_k(i)]An RGB matrix representing the kth pixel point of the ith frame of image in the training sample set, i is more than or equal to 1 and less than or equal to F, R_k(i)、G_k(i) And B_k(i) Respectively representing the red component value and the green component of the kth pixel point of the ith frame of image in the training sample setA value and a blue component value;

c) respectively solving RGB covariance matrixes of all pixel points of images in a training sample set:

cov (X_{k}) = \frac{1}{F - 1} Σ_{i = 1}^{F} [(X_{k} (i) - {\overset{&OverBar;}{X}}_{k}) \times {(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T}], k = 1,2, . . ., (M \times N);

wherein, cov (X)_k) Representing an RGB covariance matrix of a k-th pixel point of an image in a training sample set;t is a matrix transposition symbol;

d) respectively determining the background boundary threshold of each pixel point according to the RGB mean matrix and the RGB covariance matrix of each pixel point of the image in the training sample set:

{TH}_{k} = \max [Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) | i &Element; {1,2, . . ., F}], k = 1,2, . . ., (M \times N);

wherein TH is_kRepresenting a background boundary threshold value of a k-th pixel point of the video image;

RGB matrix X for expressing kth pixel point of ith frame image in training sample set_k(i) RGB mean matrix of k-th pixel point of image in training sample set

Mahalanobis distance of (a):

Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (i) - {\overset{&OverBar;}{X}}_{k})};

t is a matrix transposition symbol;

representing the RGB matrix of the k-th pixel point of each frame of image in the training sample set and the RGB mean matrix of the k-th pixel point of the image in the training sample set

Maximum among mahalanobis distances of;

e) for J frames of images serving as background detection objects in the video, J is more than or equal to 1 and less than or equal to L, and Mahalanobis distances between RGB matrixes of pixel points of each frame of image serving as the background detection objects and RGB mean matrixes of pixel points of corresponding sequence numbers of images in the training sample set are respectively obtained:

Dis (X_{k} (j), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (j) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (j) - {\overset{&OverBar;}{X}}_{k})},

k＝1，2，…，(M×N)，j＝1，2，…，J；

wherein,

RGB matrix X for expressing kth pixel point of jth frame image as background detection object_k(j) RGB mean matrix of k-th pixel point of image in training sample set

Mahalanobis distance of; x_k(j)＝[R_k(j)，G_k(j)，B_k(j)]RGB matrix, R, representing k pixel point of j frame image as background detection object_k(j)、G_k(j) And B_k(j) Respectively representing a red component value, a green component value and a blue component value of a kth pixel point of a jth frame image serving as a background detection object; t is a matrix transposition symbol;

f) for J frame image as background detection object in video, if

Judging that the kth pixel point of the jth frame image as a background detection object is a background pixel point; otherwise, judging that the kth pixel point of the jth frame image as a background detection object is a foreground pixel point; therefore, whether each pixel point of the J frame image serving as a background detection object in the video is a background pixel point is detected, and background detection of the J frame image serving as the background detection object is completed.

Compared with the prior art, the invention has the following beneficial effects:

1. according to the background detection method of the video image based on the Mahalanobis distance, RGB component distribution characteristics of the video image pixel points are considered and analyzed, the RGB component distribution characteristics of the video image pixel points are measured by using the Mahalanobis distance algorithm according to the RGB component distribution characteristics of the pixel points in the video image found through research and analysis, the real rugby-ball-shaped RGB component distribution outline of the background pixel points in the video image is obtained, and background detection is performed by combining a threshold value method, so that the accuracy of the background detection is improved.

2. The operation amount of the Mahalanobis distance algorithm adopted in the background detection method of the video image based on the Mahalanobis distance is basically equivalent to that of the algorithm used in the codebook background modeling detection method, so that the background detection method of the invention basically keeps the operation efficiency equivalent to that of the codebook background modeling detection method, and still has good real-time performance and robustness.

3. The background detection method of the video image based on the Mahalanobis distance has less noise points of background detection, the detection result is closer to the actual situation of distinguishing the background and the foreground, and the background detection precision is obviously improved compared with the prior art; even if a small amount of noise exists in the background detection result, the noise is mainly distributed near the foreground pixel point, and the practical accuracy requirements of video image background identification and foreground capture in practical application can be completely met, so that the method is particularly suitable for practical application technologies needing background identification and foreground capture, such as a surveillance video abstract tracking technology, a face identification technology and the like.

Drawings

FIG. 1 is a block diagram of a flow chart of a Mahalanobis distance-based video image background detection method according to the present invention;

fig. 2 is a diagram showing a distribution situation in which RGB components of four different pixel points in an obtained image sample are projected to an RGB three-dimensional coordinate space after a Wallflower video is sampled;

fig. 3 is a comparison graph of detection results obtained by respectively performing background detection on a sampled image in a segment of self-portrait video by using the mahalanobis distance-based video image background detection method, the codebook background modeling detection method and the gaussian mixture background modeling detection method in the laboratory example.

Detailed Description

The invention provides a background detection method of a video image based on Mahalanobis distance, which aims at the problem that the existing background detection method is not high enough in accuracy, considers and analyzes RGB component distribution characteristics of pixels of the video image, adopts background and foreground boundary conditions based on the RGB component distribution characteristics, measures the RGB component distribution characteristics of the pixels of the video image by using the Mahalanobis distance algorithm, realizes more accurate background detection, and simultaneously ensures good real-time property and robustness.

Firstly, the RGB component distribution characteristics of the video image pixel points.

The invention relates to a video image background detection method based on Mahalanobis distance, which considers and analyzes RGB component distribution characteristics of video image pixel points, and measures the RGB component distribution characteristics of the video image pixel points by utilizing the Mahalanobis distance algorithm on the basis of analyzing the common RGB component distribution characteristics of the image pixel points.

In the codebook background modeling detection method, the distinction between the background and the foreground is realized by defining the upper and lower boundaries of the brightness of a background model based on the brightness change of pixels of a video image. The brightness bri (X) of pixel X is calculated as follows:

bri (X) = \sqrt{R^{2} + G^{2} + B^{2}};

r, G, B respectively represents the red component value, the green component value and the blue component value of the pixel point X. Because the brightness calculation is equivalent to defining the distribution profile of RGB three components of the background pixel point in the RGB space as a cylindrical body, the actual RGB component distribution condition of the video image pixel point is not considered, and the pixel points with similar different hues and gray levels are easy to be mistaken as the same pixel point and difficult to distinguish due to the equivalent brightness value, the background detection accuracy of the codebook background modeling detection method is influenced.

In order to improve the accuracy of background detection, the RGB three components of the video image pixel points are taken into consideration as three main components of background detection, so that a large amount of video image data are collected to carry out RGB component distribution statistics so as to know the universal distribution condition of the RGB components of the video image pixel points.

The general distribution of RGB three components of pixels in a video image is described by observing the distribution of RGB three components of pixels in each frame of the video by taking a Wallflower video (see the documents "Toyama K, Krumm J, Brumitt B, Meyers B. Wallflower: Principles and practice of background main. in: Proceeding of the 7th IEEEInternational Conference on Computer Vision. Corfu, Greece: IEEE.1999.255") as an example. After the Wallflower video is sampled, a distribution situation group diagram of projecting RGB components of four different pixel points in an obtained image sample to an RGB three-dimensional coordinate space is shown in fig. 2, the resolution of the Wallflower video image is 120 (lines) × 160 (columns), row-column coordinates of the four pixel points in the video image are (10, 10), (24, 142), (50, 50) and (112, 50), and specific positions of the four pixel points in the Wallflower video image are shown in fig. 2A, 2B, 2C and 2D of fig. 2. Wherein: 1) the pixel point with the row and column coordinates of (10, 10) has RGB component values distributed in a narrow area of the RGB three-dimensional coordinate space in each frame image of the video (as shown in fig. 2E), and approximately obeys the same gaussian distribution on R, G, B three components (as shown in fig. 2I, 2M, and 2Q of fig. 2 respectively), such RGB component distribution appears as a rugby shape in the RGB three-dimensional coordinate space (as shown in fig. 2E of fig. 2); 2) pixel points with row and column coordinates (24, 142) and (112, 50), because the image texture of the positions of the pixel points is complex, and the influence of light change is large, the RGB component values of the two pixel points change within a certain range, but the distribution of R, G, B three components within the change range better obeys gaussian distribution (as shown in 2J, 2L, 2N, 2P, 2R, 2T diagrams of fig. 2), so that the RGB component distributions of the pixel points with row and column coordinates (24, 142) and (112, 50) all appear as rugby shape in the RGB three-dimensional coordinate space (as shown in 2F, 2H diagrams of fig. 2); 3) the pixel point with the row and column coordinates (50, 50) is located at the position of the twill in the middle of the image, so that the pixel value sometimes presents a deeper twill pixel, sometimes presents a shallower sky pixel, and sometimes presents an intermediate state value of the two cases, so that the RGB component distribution presents a multi-modal state (as shown in fig. 2G), and the distribution of the three components R, G, B can be seen to be in a multi-modal distribution (as shown in fig. 2K, 2O, and 2S), but the peak profile as a whole still has a certain gaussian distribution characteristic, so that the RGB component distribution of the pixel point with the row and column coordinates (50, 50) presents a rectangular rugby shape (as shown in fig. 2G of fig. 2) in the RGB three-dimensional coordinate space.

Since R, G, B three components of a pixel point in a video image sequence are approximately in gaussian distribution respectively, the partial outline of the RGB components distributed in the RGB three-dimensional coordinate space is in the shape of football in the RGB three-dimensional coordinate space, which is not only the example of RGB component distribution shown in the above example, but also through a large number of experiments, the distribution characteristics are almost ubiquitous in all pixel points of the video image, and are reflected by the real RGB component distribution characteristics of the pixel points in the video image sequence. The background detection method of the video image based on the Mahalanobis distance distinguishes the background and the foreground of the video image by utilizing the RGB component distribution characteristic of the pixel point of the video image, measures the RGB component distribution characteristic of the pixel point of the video image by utilizing the Mahalanobis distance algorithm, realizes background detection and obtains higher background detection precision.

And secondly, a video image background detection method based on the Mahalanobis distance.

In order to overcome the limitations of the mixed Gaussian background modeling detection method on the independence of three components in RGB space and the limitation of RGB component distribution profile of a video image background pixel point defined by a cylindrical body in a codebook background modeling detection method in the aspect of detection accuracy, the invention considers and analyzes the RGB component distribution characteristics of the video image pixel point, measures the RGB component distribution characteristics of the video image pixel point by using a Mahalanobis distance algorithm according to the RGB component distribution characteristics of the pixel point in the video image discovered by the research and analysis, obtains the real rugby-shaped RGB component distribution profile of the background pixel point in the video image, and performs background detection by combining a threshold method, thereby not only changing the boundary condition of the background and the foreground, realizing more accurate background detection effect than the codebook background modeling detection method, but also basically keeping the operational efficiency equivalent to the codebook background modeling detection method, the method has good real-time performance and robustness, and achieves good effect in a series of experiments.

The background images of different video data are different, the RGB component distribution profile of background pixel points in the video image needs to be obtained through background training, then a background boundary threshold value is determined according to the RGB component distribution characteristics of the video image pixel points, and background detection is carried out by means of the background boundary threshold value of the background training. Therefore, the background detection method of the video image based on the Mahalanobis distance mainly comprises a background training stage and a background detection stage.

The following specifically describes the detection process of the mahalanobis distance-based video image background detection method of the present invention.

The flow chart of the video image background detection method based on the Mahalanobis distance is shown in FIG. 1, and the specific steps are as follows:

A. a background training stage:

step a): extracting F frame background images from the video to serve as a training sample set, wherein F is more than or equal to 80 and less than or equal to L, and L represents the total frame number of the video;

the method comprises the steps of selecting a training sample set from a video to serve as a detection and identification basis for whether each pixel point in a video image is a background pixel point. The images extracted as the training sample set are all background images in the video; the background image described in the invention refers to an image in which each pixel point in a video is displayed as a background scene. The specific identification mode of the background image can be that all pixel points in the video are identified as background pixel points through prior detection, namely the background image is identified, and the background image can also be identified through artificial naked eyes. As for the specific position of the F frame background image extracted as the training sample set in the video, the specific position can be determined according to the actual situation of the background image in the video; in most cases, the F frame images that are continuous at the beginning of the video can be directly regarded as background images and selected as a training sample set, and certainly, the F frame background images can also be selected as the training sample set in the video through prior detection or artificial identification. However, the number F of the background image frames of the training sample set needs to be at least 80 frames, so that the training sample set can be ensured to embody the real RGB component distribution profile of the video image background pixel points; if the number of the background image frames of the training sample set is too small, the RGB component distribution profile of the background pixel points of the video image is difficult to accurately obtain, and the accuracy of background detection is inevitably influenced. Of course, since there is 80 ≦ F ≦ J, that is, the total number of frames of the video for which the method of the present invention is directed also needs to be greater than 80 frames and contains at least 80 background images. Videos smaller than 80 frames are too short, and actual needs of background identification and foreground extraction are not separately carried out; if the background image contained in the video is less than 80 frames, the RGB component distribution profile of the background pixel points in the image is difficult to obtain accurately, and the effect of background detection is affected to a certain extent.

Step b): respectively solving an RGB mean matrix of each pixel point of the image in the training sample set:

{\overset{&OverBar;}{X}}_{k} = \frac{1}{F} Σ_{i = 1}^{F} X_{k} (i), k = 1,2, . . ., (M \times N);

representing an RGB mean matrix of a k-th pixel point of an image in a training sample set; x_k(i)＝[R_k(i)，G_k(i)，B_k(i)]An RGB matrix representing the kth pixel point of the ith frame of image in the training sample set, i is more than or equal to 1 and less than or equal to F, R_k(i)、G_k(i) And B_k(i) Respectively representing a red component value, a green component value and a blue component value of the kth pixel point of the ith frame of image in the training sample set.

The invention relates to a video image background detection method based on Mahalanobis distance, which takes pixel points of a video image as detection identification objects and judges whether the pixel points are background pixel points, thereby realizing the video image background detection, and therefore, the background training also takes the pixel points as training objects. In the step, the value of k is taken from 1 to MXN, so that RGB mean value matrixes of all pixel points of images in a training sample set are respectively obtained, and the method aims to measure the actual RGB component distribution characteristics of each pixel point by taking the RGB mean value matrix of each pixel point as the distribution center of RGB components of the pixel point and by taking the RGB component distribution center of each pixel point, and further determine the boundary conditions of the background and the foreground.

Step c): respectively solving RGB covariance matrixes of all pixel points of images in a training sample set:

cov (X_{k}) = \frac{1}{F - 1} Σ_{i = 1}^{F} [(X_{k} (i) - {\overset{&OverBar;}{X}}_{k}) \times {(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T}], k = 1,2, . . ., (M \times N);

wherein, cov (X)_k) Watch (A)Showing an RGB covariance matrix of a k-th pixel point of an image in a training sample set; t is a matrix transposition symbol.

In this step, each of the resulting RGB covariance matrices cov (X)_k) Is a data matrix with 3 rows and 3 columns; and taking the value of k from 1 to MXN, and respectively calculating the RGB covariance matrix of each pixel point of the image in the training sample set so as to prepare for determining the background boundary threshold of each pixel point of the video image in the next step.

Step d): respectively determining the background boundary threshold of each pixel point according to the RGB mean matrix and the RGB covariance matrix of each pixel point of the image in the training sample set:

{TH}_{k} = \max [Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) | i &Element; {1,2, . . ., F}], k = 1,2, . . ., (M \times N);

RGB matrix X for expressing kth pixel point of ith frame image in training sample set_k(i) RGB mean matrix of k-th pixel point of image in training sample setMahalanobis distance of (a):

Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (i) - {\overset{&OverBar;}{X}}_{k})};

t is a matrix transposition symbol;representing the RGB matrix of the k-th pixel point of each frame of image in the training sample set and the RGB mean matrix of the k-th pixel point of the image in the training sample set

Is measured in the form of a maximum value among mahalanobis distances.

The Mahalanobis distance between the RGB matrix of the image pixel points in the training sample set and the RGB mean matrix of the pixel points with the corresponding sequence numbers of the images in the training sample set is obtained, the aim is that the Mahalanobis distance algorithm can fully reflect the independence and the relevance among three components of the pixel point R, G, B, each pixel point of the images in the training sample set is a real background pixel point in the video image, the RGB mean matrix of each pixel point is used as the distribution center of RGB components of the pixel point, therefore, the mahalanobis distance between the RGB matrix of the image pixel points in the training sample set and the RGB mean matrix of the pixel points of the corresponding sequence numbers of the images in the training sample set well reflects the distribution difference between the RGB component distribution characteristics of the background pixel points in the video image and the RGB component distribution centers of the pixel points of the corresponding sequence numbers, and is equivalent to measure the RGB component distribution characteristics of the background pixel points in the video image by using the mahalanobis distance algorithm. For any k-th pixel point in the video image, the mahalanobis distance

The larger the value of (A) is, the RGB matrix X of the kth pixel point of the ith frame image in the training sample set is indicated_k(i) The RGB component distribution center of the kth pixel point of the image

The larger the distribution difference is, the more the RGB matrix of the kth pixel point of each frame of image in the training sample set and the RGB mean matrix of the kth pixel point of the image in the training sample set are taken

Maximum value among mahalanobis distances of

As a background boundary threshold value of a k-th pixel point of the video image, the RGB component distribution boundary of the background image in the video at the k-th pixel point position is truly embodied. Therefore, in the step, the value of k is taken from 1 to MXN, and the background boundary threshold of each pixel point of the video image is respectively obtained to determine the RGB component distribution profile of the background image at each pixel point position in the video.

At this point, the processing steps of the background training phase are completed. Next, a background detection stage is performed for each frame image in the video that is the object of background detection.

B. And (3) background detection stage:

step e): for J frames of images serving as background detection objects in the video, J is more than or equal to 1 and less than or equal to L, and Mahalanobis distances between RGB matrixes of pixel points of each frame of image serving as the background detection objects and RGB mean matrixes of pixel points of corresponding sequence numbers of images in the training sample set are respectively obtained:

Dis (X_{k} (j), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (j) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (j) - {\overset{&OverBar;}{X}}_{k})};

k＝1，2，…，(M×N)，j＝1，2，…，J；

wherein,

RGB matrix X for expressing kth pixel point of jth frame image as background detection object_k(j) RGB mean matrix of k-th pixel point of image in training sample setMahalanobis distance of; x_k(j)＝[R_k(j)，G_k(j)，B_k(j)]RGB matrix, R, representing k pixel point of j frame image as background detection object_k(j)、G_k(j) And B_k(j) Respectively representing a red component value, a green component value and a blue component value of a kth pixel point of a jth frame image serving as a background detection object; t is a matrix transposition symbol.

The number of image frames J to be background-detected is determined entirely by the actual need to perform background detection, and the image to be background-detected may be an arbitrary one-frame image in the video (corresponding to J being 1), may be a continuous or discrete multi-frame image in the video (corresponding to 1 < J < L), or may even have all the frame images of the video as background-detected objects (corresponding to J being L).

In the step, J is taken from 1 to J to ensure that the operation is performed on each frame image serving as a background detection object, and for each value of J, k is taken from 1 to MXN, so that the Mahalanobis distance between the RGB matrix of each pixel point of each frame image serving as the background detection object and the RGB mean matrix of the corresponding sequence number pixel points of the images in the training sample set is respectively obtained. For any k-th pixel point of any ith frame image of the background detection object, the Mahalanobis distance

The larger the value of (A) is, the RGB matrix X of the kth pixel point of the jth frame image as a background detection object is indicated_k(j) The RGB component distribution center of the kth pixel point of the image

The larger the distribution difference is, the more the RGB component distribution difference is measured whether the RGB component distribution difference exceeds the RGB component distribution boundary of the background pixel point.

Step f): for J frame image as background detection object in video, if

Judging that the kth pixel point of the jth frame image as a background detection object is a background pixel point; otherwise, judging that the kth pixel point of the jth frame image as a background detection object is a foreground pixel point; thereby detecting each pixel point of J frame image as background detection object in videoAnd if the pixel points are background pixel points, finishing the background detection of the J frame image serving as the background detection object.

In the step, for any k-TH pixel point of any j-TH frame image as a background detection object, a background boundary threshold value TH of the k-TH pixel point of the image is used_kAs boundary conditions of background and foreground, if

Indicating that the RGB component distribution difference between the kth pixel point of the jth frame image as the background detection object and the corresponding actual background pixel point does not exceed the range of the background boundary condition, and therefore judging that the kth pixel point of the jth frame image as the background detection object is the background pixel point; if it is not

Then, it indicates that the RGB component distribution difference between the kth pixel point of the jth frame image as the background detection object and the actual background pixel point corresponding thereto has exceeded the range of the background boundary condition and has satisfied the foreground identification condition, so that the kth pixel point of the jth frame image as the background detection object is determined to be the foreground pixel point. Therefore, the background detection of the J frame image as the background detection object can be completed through the background/foreground detection of each pixel point of each frame image as the background detection object.

As can be seen from the steps of the background detection method, the operation amount of the Mahalanobis distance algorithm is basically equivalent to that of the algorithm used in the codebook background modeling detection method, so that the background detection method basically keeps the operation efficiency equivalent to that of the codebook background modeling detection method, and still has good real-time performance and robustness. In different applications, the number and the sequence of the image frames required to be subjected to background detection in the video are different, but for any frame image serving as a background image in the video, the background detection can be completed according to the steps. For example, if the background detection is to be performed on 21 to 50 frames of images in the video, in steps e) to f) of the background detection stage of the method of the present invention, the 21 st to 50 th frames of images in the video are used as the background detection object to perform the background detection operation; if the background detection is to be performed on all the images of the video, in steps e) to f) of the background detection stage of the method of the present invention, the background detection operation is performed using 1 to L frame images of the video as the background detection object, where L represents the total frame number of the video.

If a plurality of video clips with different background images exist in the same video, a new background image in the video clip can be used as a new training sample set to perform background training again (namely updating the training sample aiming at the new background) according to the steps a) to d) of the background training stage of the method of the invention, and then the video clip where the new background image is located is subjected to background detection according to the steps e) to f) of the background detection stage of the method of the invention; thereby performing background detection on video clips with different background images respectively. The application method expands the application range of the background detection method to a certain extent.

Third, experimental example.

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

In order to verify the actual effect of the Mahalanobis distance-based video image background detection method, the method is realized under a Matlab 2010b programming tool, and the experimental hardware environment is an Intel Pentium (R)4 processor with a main frequency of 3.0GHz and a memory of 2.0 GB. In order to embody the advantages of the method under the background condition with complex textures, a sampling image in a self-shooting video is selected as a background detection object, the image resolution is 120 (rows) × 160 (columns), the background image of the self-shooting video comprises background objects such as sky, trees and floors with complex textures, partial buildings and the like, a 2800 multi-frame image of the self-shooting video comprises 386 frames of background images, and the rest images have moving character foregrounds in front of the background. In the experiment, the background detection of the sampled image in the self-shooting video is performed by using the mahalanobis distance-based video image background detection method (hereinafter, referred to as the present invention method), the codebook background modeling detection method (hereinafter, referred to as the codebook method) and the gaussian mixture background modeling detection method (hereinafter, referred to as the MOG method), respectively, 80 frames of background images in the Wallflower video are selected as training sample sets, the parameter n in the present invention method is 3, the control parameters α is 0.4, β is 1.5, the background boundary radius ξ is 100 (see the literature, "Kim, k., et al," Real-time for feedback and estimation using. Real-time acquisition, 2005.11(3 p.172-185:, "and so on), the gaussian mixture parameter α is 3, and the learning rate α is 0.005, c.r., et al.pfinder: real-time tracking, IEEE Transactions on Pattern Analysis and Machine Analysis, 1997.19 (7): p.780-785. "), setting the RGB component values of the detected background pixels to be 0, 0, 0 (black), and the RGB component values of the foreground pixels to be 255, 255, 255 (white), so as to distinguish, and the background detection result is as shown in fig. 3. In fig. 3, a picture 3A is a sampled image original image in the self-shooting video, and two characters in the sampled image original image are foreground; the 3B picture is an actual background and foreground distinguishing contrast picture, the 3C picture is a background detection distinguishing contrast picture of the method, the 3D picture is a background detection distinguishing contrast picture of the codebook method, and the 3E picture is a background detection distinguishing contrast picture of the MOG method. As can be seen from comparison of the 3B, 3C, 3D, and 3E graphs in fig. 3, compared with the actual background and foreground distinguishing and comparing graphs, the noise points of the MOG method and the codebook method for background detection are relatively more (the noise points are background pixel points which are falsely detected as the foreground or foreground pixel points which are falsely detected as the background), and the noise points are distributed in a messy manner, and the messy noise point distribution in the actual application easily affects the accuracy of background identification and foreground capture; compared with an MOG method and a codebook method, the background detection method has fewer noise points, the detection result is closer to the actual situation of distinguishing the background and the foreground, the background detection precision is obviously improved compared with the prior art, and if the number of frames of the image extracted as the training sample set is more, the detection accuracy can be higher and the noise points are less; in the background detection result shown in fig. 3C obtained by the method of the present invention, although some noise points still exist, the noise points are mainly distributed near the foreground pixel points, which completely meets the practical accuracy requirements of video image background recognition and foreground capture in practical applications, and is particularly suitable for practical application technologies requiring background recognition and foreground capture, such as surveillance video abstract tracking technology, face recognition technology, etc.

Finally, the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all of them should be covered in the claims of the present invention.

Claims

1. The Mahalanobis distance-based video image background detection method is characterized by comprising the following steps of:

{\overset{&OverBar;}{X}}_{k} = \frac{1}{F} Σ_{i = 1}^{F} X_{k} (i), k = 1,2, . . ., (M \times N);

representing an RGB mean matrix of a k-th pixel point of an image in a training sample set; x_k(i)＝[R_k(i)，G_k(i)，B_k(i)]An RGB matrix representing the kth pixel point of the ith frame of image in the training sample set, i is more than or equal to 1 and less than or equal to F, R_k(i)、G_k(i) And B_k(i) Respectively representing a red component value, a green component value and a blue component value of a kth pixel point of an ith frame of image in a training sample set;

cov (X_{k}) = \frac{1}{F - 1} Σ_{i = 1}^{F} [(X_{k} (i) - {\overset{&OverBar;}{X}}_{k}) \times {(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T}], k = 1,2, . . ., (M \times N);

wherein, cov (X)_k) Representing an RGB covariance matrix of a k-th pixel point of an image in a training sample set; t is a matrix transposition symbol;

{TH}_{k} = \max [Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) | i &Element; {1,2, . . ., F}], k = 1,2, . . ., (M \times N);

Mahalanobis distance of (a):

Dis (X_{k} (i), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (i) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (i) - {\overset{&OverBar;}{X}}_{k})};

t is a matrix transposition symbol;

Maximum among mahalanobis distances of;

Dis (X_{k} (j), {\overset{&OverBar;}{X}}_{k}) = \sqrt{{(X_{k} (j) - {\overset{&OverBar;}{X}}_{k})}^{T} cov {(X_{k})}^{- 1} (X_{k} (j) - {\overset{&OverBar;}{X}}_{k})},

k＝1，2，…，(M×N)，j＝1，2，…，J；

wherein,

Mahalanobis distance of; x_k(j)＝[R_k(j)，G_k(j)，B_k(j)]RGB matrix, R, representing k pixel point of j frame image as background detection object_k(j)、G_k(j) And B_k(j) Are respectively shown asThe red component value, the green component value and the blue component value of the kth pixel point of the jth frame image of the background detection object; t is a matrix transposition symbol;

f) for J frame image as background detection object in video, if