CN102340620B - Mahalanobis-distance-based video image background detection method


Info

Publication number
CN102340620B
CN102340620B CN201110328046A
Authority
CN
China
Prior art keywords
background
image
pixel point
rgb
training sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110328046
Other languages
Chinese (zh)
Other versions
CN102340620A (en)
Inventor
杨梦宁
洪明坚
徐玲
张小洪
杨丹
霍东海
葛永新
陈远
胡海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN 201110328046 priority Critical patent/CN102340620B/en
Publication of CN102340620A publication Critical patent/CN102340620A/en
Application granted granted Critical
Publication of CN102340620B publication Critical patent/CN102340620B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a Mahalanobis-distance-based video image background detection method. The red, green and blue (RGB) component distribution characteristics of the pixel points in a video image are considered and analyzed, and, in accordance with the RGB component distribution characteristics discovered through research and analysis, those characteristics are measured with the Mahalanobis distance algorithm to obtain the true rugby-ball-shaped (ellipsoidal) RGB component distribution profile of the background pixel points in the video image; background detection is then performed in combination with a threshold method, so the accuracy of background detection is improved. Even though a small amount of noise exists in the background detection result, the noise is mainly distributed in the vicinity of the foreground pixel points, so the practicality and accuracy requirements of video image background identification and foreground capture in real applications can be fully met. At the same time, the method essentially retains an operating efficiency equivalent to that of the codebook background modeling detection method, with good real-time performance and robustness.

Description

Video image background detection method based on Mahalanobis distance
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a Mahalanobis distance-based video image background detection method.
Background
With the wide application of video surveillance cameras in the field of intelligent monitoring and the rapid development of intelligent video analysis technology, video summarization technology has gradually come into public view. Video summarization compresses massive, long-duration video data into a controllable time span, making it convenient for people to browse videos. Video summarization requires an efficient background detection method to detect the background of the video image, so that the foreground moving objects in the video can be captured as the objects to be tracked by the summary; background detection has therefore become a research hotspot within video summarization technology. Meanwhile, other fields of video image processing, such as face recognition and video compression, also need background detection technology to distinguish the background from the foreground. Background detection thus occupies an important position in video image processing across many applications and is currently a mainstream research direction in the image processing field. The background detection methods in wide use at present mainly include the background difference method, the kernel density estimation detection method, the Gaussian mixture background modeling detection method, and the codebook background modeling detection method.
Heikkilä et al., in the literature "Heikkilä, J. and O. Silvén. A real-time system for monitoring of cyclists and pedestrians. Fort Collins, Colorado: IEEE, 1999. 74-81" and the survey "Piccardi, M. Background subtraction techniques: a review. In: IEEE International Conference on Systems, Man and Cybernetics 2004, The Hague, Netherlands: IEEE, 2004. 3099-3104 vol. 4", describe the background difference method. The algorithm uses background subtraction: a background image given in advance is subtracted from the video image sequence to be processed, and a binarization method is then applied to separate the moving foreground. The algorithm is easy to implement, has low complexity, consumes almost no computing resources, and acquires the motion foreground quickly, so it has been applied in real-time video summary generation systems. However, the algorithm needs a complete background image given in advance; such an image is not easy to obtain, depends on external input, and cannot be updated over time, so large errors appear in the later stages of video processing.
Elgammal et al., in the literature "Piccardi, M. Background subtraction techniques: a review. In: IEEE International Conference on Systems, Man and Cybernetics 2004, The Hague, Netherlands: IEEE, 2004. 3099-3104 vol. 4" and the literature "Elgammal, A., D. Harwood, and L. Davis. Non-parametric model for background subtraction. Computer Vision - ECCV 2000, 2000: p. 751-767", proposed a nonparametric background modeling method that estimates the probability density of background pixel values over a time sequence and, through a window of set length and a window function, estimates the probability that a pixel value belongs to the foreground or the background, thereby determining whether each pixel of an image is a background pixel. The algorithm conveniently accommodates the addition of new training samples, which facilitates online learning of the density estimate. However, the algorithm is too complex to apply in a real-time motion detection system, and it is not robust under dynamic backgrounds or abrupt illumination changes.
Wren et al, in "Wren, c.r., et al, pfinder: real-time tracking of the human body IEEEtransactions on Pattern Analysis and Machine understanding, 1997.19 (7): p.780-785, a single Gaussian model is used for modeling the background, the limitation that the background needs to be input externally is overcome, and the detection effect is good in indoor and other single-peak environments, but in complex multi-peak environments such as a fluctuating lake surface and swinging leaves, the model is difficult to be used for accurately modeling the background environment. In order to solve these problems, Stauffer et al propose a mixed gaussian background modeling detection method (abbreviated as MOG method), which considers the temporal continuity of the pixel points, assumes that the distribution of the background pixels on the time series is a mixed gaussian model, assumes that the three components of the RGB space are mutually independent, and gives the distribution characteristics of the background pixels, i.e., the background pixels present a spherical distribution in the RGB space. However, the three components in the RGB space are not independent from each other, so the MOG method is not very accurate in describing the distribution characteristics of the background pixels, resulting in an increase in the detection error of the moving foreground.
Kim et al in "Kim, K., et al," Real-time for background segmentation using codebook model. Real-time imaging, 2005.11 (3): p.172-185 "and the document" Chalidabhongse, T.H., et al.A. circulation method for evaluating background analysis algorithms in: joint IEEEInternational work on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.2003.Nice, France: citeseer proposes a structured codebook background modeling detection method, and obtains a better image background detection effect in a multimodal environment. Wu et al, in "Wu, M.and X.Peng, spread-temporal context for coded-based dynamic background section, AEU-International Journal of Electronics and Communications, 2010.64 (8): p.739-747 "and Qiu et al in the literature" Tu, Q., Y.xu, and M.Zhou.Box-based codebook model for real-time object detection. in: 7th World consistency on Intelligent Control and Automation. Chongqing, China: IEEE 2008.7621-7625 "partially improves the codebook background modeling detection method proposed by Kim, etc., and further improves the accuracy of background detection to a certain extent. A codebook background modeling detection method is based on brightness change of pixels of a video image, and realizes the distinguishing of a background and a foreground by defining the upper and lower boundaries of the brightness of a background model, thereby reducing the influence of global and local illumination change on the distinguishing of the background, having better background detection effect compared with an MOG method, compressing the background model by operation on the basis of not influencing the foreground detection effect, greatly reducing the demand and the calculated amount on an internal memory, and having better processing effect and operation efficiency than the three background detection methods. However, the codebook background modeling detection method is proposed based on luminance statistical observation of the video image pixel points, and the RGB component distribution of the video image pixel points is not considered, so that the distinction of the image background and the image foreground is not accurate enough in many cases, which causes more noise in background detection and foreground capture.
Disclosure of Invention
Aiming at the problems in the prior art, the invention considers and analyzes the RGB component distribution characteristics of video image pixel points and measures those characteristics with the Mahalanobis distance algorithm, providing a Mahalanobis-distance-based video image background detection method with higher background detection accuracy.
In order to achieve the purpose, the invention adopts the following technical means:
the method for detecting the background of the video image based on the Mahalanobis distance comprises the following steps:
a) extracting F frames of background images from the video as a training sample set, where 80 ≤ F ≤ L and L denotes the total number of frames of the video;
b) computing the RGB mean matrix of each pixel point of the images in the training sample set:

$$\bar{X}_k = \frac{1}{F}\sum_{i=1}^{F} X_k(i), \qquad k = 1, 2, \ldots, (M \times N);$$

where k denotes the serial number of a pixel point in an image of the training sample set, and M × N denotes the resolution of the video image; $\bar{X}_k$ denotes the RGB mean matrix of the k-th pixel point of the images in the training sample set; $X_k(i) = [R_k(i), G_k(i), B_k(i)]$ denotes the RGB matrix of the k-th pixel point of the i-th frame image in the training sample set, 1 ≤ i ≤ F, and $R_k(i)$, $G_k(i)$ and $B_k(i)$ respectively denote the red, green and blue component values of the k-th pixel point of the i-th frame image in the training sample set;
c) respectively computing the RGB covariance matrix of each pixel point of the images in the training sample set:

$$\mathrm{cov}(X_k) = \frac{1}{F-1}\sum_{i=1}^{F}\left[(X_k(i) - \bar{X}_k) \times (X_k(i) - \bar{X}_k)^T\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $\mathrm{cov}(X_k)$ denotes the RGB covariance matrix of the k-th pixel point of the images in the training sample set, and T is the matrix transposition symbol;
d) respectively determining the background boundary threshold of each pixel point according to the RGB mean matrix and the RGB covariance matrix of each pixel point of the images in the training sample set:

$$TH_k = \max\left[\mathrm{Dis}(X_k(i), \bar{X}_k) \mid i \in \{1, 2, \ldots, F\}\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $TH_k$ denotes the background boundary threshold of the k-th pixel point of the video image; $\mathrm{Dis}(X_k(i), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(i)$ of the k-th pixel point of the i-th frame image in the training sample set and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set:

$$\mathrm{Dis}(X_k(i), \bar{X}_k) = (X_k(i) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(i) - \bar{X}_k);$$

T is the matrix transposition symbol; $TH_k$ is thus the maximum among the Mahalanobis distances between the RGB matrix of the k-th pixel point of every frame image in the training sample set and the RGB mean matrix $\bar{X}_k$;
e) for the J frames of images serving as background detection objects in the video, 1 ≤ J ≤ L, respectively obtaining the Mahalanobis distance between the RGB matrix of each pixel point of each such frame and the RGB mean matrix of the pixel point with the corresponding serial number in the training sample set:

$$\mathrm{Dis}(X_k(j), \bar{X}_k) = (X_k(j) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(j) - \bar{X}_k),$$
$$k = 1, 2, \ldots, (M \times N), \qquad j = 1, 2, \ldots, J;$$

where $\mathrm{Dis}(X_k(j), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(j)$ of the k-th pixel point of the j-th frame image serving as a background detection object and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set; $X_k(j) = [R_k(j), G_k(j), B_k(j)]$ denotes the RGB matrix of the k-th pixel point of the j-th frame image serving as a background detection object, and $R_k(j)$, $G_k(j)$ and $B_k(j)$ respectively denote its red, green and blue component values; T is the matrix transposition symbol;
f) for the J frames of images serving as background detection objects in the video: if $\mathrm{Dis}(X_k(j), \bar{X}_k) \le TH_k$, the k-th pixel point of the j-th frame image serving as a background detection object is judged to be a background pixel point; otherwise it is judged to be a foreground pixel point. In this way, whether each pixel point of the J frames serving as background detection objects is a background pixel point is detected, and the background detection of the J frames serving as background detection objects is completed.
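For concreteness, the following is a minimal sketch of the training stage (steps a) to d)) and the detection stage (steps e) and f)) in Python/NumPy. The function names, the (F, M×N, 3) array layout and the small covariance regularization term are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def train_background_model(frames):
    """Background training, steps a)-d).

    frames: float array of shape (F, M*N, 3) holding F background
    frames, each pixel as an [R, G, B] vector.
    Returns the per-pixel RGB mean (M*N, 3), the inverse covariance
    (M*N, 3, 3) and the background boundary threshold TH (M*N,).
    """
    F = frames.shape[0]
    mean = frames.mean(axis=0)                   # step b): RGB mean matrix
    diff = frames - mean                         # deviations, (F, M*N, 3)
    # step c): per-pixel 3x3 RGB covariance, unbiased (divide by F - 1)
    cov = np.einsum('fki,fkj->kij', diff, diff) / (F - 1)
    cov += 1e-6 * np.eye(3)                      # guard against singular cov (an addition)
    cov_inv = np.linalg.inv(cov)
    # step d): Mahalanobis distance of every training sample to its mean;
    # the threshold TH_k is the maximum distance seen during training
    dist = np.einsum('fki,kij,fkj->fk', diff, cov_inv, diff)
    return mean, cov_inv, dist.max(axis=0)

def detect_background(frame, mean, cov_inv, th):
    """Steps e)-f): Boolean mask, True where a pixel is background."""
    diff = frame - mean                          # (M*N, 3)
    dist = np.einsum('ki,kij,kj->k', diff, cov_inv, diff)
    return dist <= th
```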
Compared with the prior art, the invention has the following beneficial effects:
1. The Mahalanobis-distance-based video image background detection method of the invention considers and analyzes the RGB component distribution characteristics of video image pixel points; in accordance with the RGB component distribution characteristics of the pixel points discovered through research and analysis, it measures those characteristics with the Mahalanobis distance algorithm, obtains the true rugby-ball-shaped RGB component distribution profile of the background pixel points in the video image, and performs background detection in combination with a threshold method, so the accuracy of background detection is improved.
2. The computational load of the Mahalanobis distance algorithm adopted in the Mahalanobis-distance-based video image background detection method is essentially equivalent to that of the algorithm used in the codebook background modeling detection method, so the background detection method of the invention essentially retains an operating efficiency equivalent to the codebook method's and still has good real-time performance and robustness.
3. The Mahalanobis-distance-based video image background detection method produces fewer background detection noise points, its detection result is closer to the actual distinction between background and foreground, and the background detection precision is markedly improved over the prior art. Even if a small amount of noise remains in the background detection result, it is mainly distributed near the foreground pixel points, which fully meets the practicality and accuracy requirements of video image background identification and foreground capture in real applications; the method is therefore particularly suitable for practical technologies requiring background identification and foreground capture, such as surveillance video summary tracking and face recognition.
Drawings
FIG. 1 is a flow chart of the Mahalanobis-distance-based video image background detection method of the invention;
FIG. 2 shows the distributions obtained by projecting the RGB components of four different pixel points of sampled images of the Wallflower video into the RGB three-dimensional coordinate space;
FIG. 3 compares the detection results obtained in the experimental example by performing background detection on a sampled image of a self-shot video with the Mahalanobis-distance-based video image background detection method of the invention, the codebook background modeling detection method, and the Gaussian mixture background modeling detection method.
Detailed Description
Aiming at the insufficient accuracy of existing background detection methods, the invention provides a Mahalanobis-distance-based video image background detection method. It considers and analyzes the RGB component distribution characteristics of video image pixel points, adopts background/foreground boundary conditions based on those distribution characteristics, and measures the RGB component distribution characteristics of the pixel points with the Mahalanobis distance algorithm, realizing more accurate background detection while retaining good real-time performance and robustness.
1. RGB component distribution characteristics of video image pixel points.
The Mahalanobis-distance-based video image background detection method of the invention considers and analyzes the RGB component distribution characteristics of video image pixel points and, on the basis of analyzing the RGB component distribution characteristics common to image pixel points, measures those characteristics with the Mahalanobis distance algorithm.
In the codebook background modeling detection method, the distinction between background and foreground is made by defining upper and lower bounds on the brightness of the background model, based on the brightness variation of video image pixels. The brightness bri(X) of a pixel point X is calculated as:

$$bri(X) = \sqrt{R^2 + G^2 + B^2};$$

where R, G and B respectively denote the red, green and blue component values of the pixel point X. This brightness calculation amounts to defining the distribution profile of the three RGB components of a background pixel point in RGB space as a cylinder; it does not consider the actual RGB component distribution of the video image pixel points, and pixel points of different hues or gray levels with similar brightness values are easily mistaken for the same pixel point and are hard to distinguish, which harms the background detection accuracy of the codebook background modeling detection method.
In order to improve the accuracy of background detection, the invention takes the three RGB components of the video image pixel points as the three main components of background detection, and therefore collects a large amount of video image data for RGB component distribution statistics, so as to learn the general distribution of the RGB components of video image pixel points.
The general distribution of the three RGB components of pixel points in a video image is described by observing, frame by frame, the distribution of the RGB components of pixel points in the Wallflower video (see "Toyama K, Krumm J, Brumitt B, Meyers B. Wallflower: Principles and practice of background maintenance. In: Proceedings of the 7th IEEE International Conference on Computer Vision. Corfu, Greece: IEEE, 1999. 255") as an example. After sampling the Wallflower video, the distributions obtained by projecting the RGB components of four different pixel points of the image samples into the RGB three-dimensional coordinate space are shown in the group diagram of FIG. 2. The resolution of the Wallflower video image is 120 (rows) × 160 (columns); the row-column coordinates of the four pixel points are (10, 10), (24, 142), (50, 50) and (112, 50), and their positions in the Wallflower video image are marked in panels 2A, 2B, 2C and 2D of FIG. 2. Specifically:
1) the pixel point with row-column coordinates (10, 10) has RGB component values that stay within a narrow region of the RGB three-dimensional coordinate space in every frame of the video (panel 2E) and approximately obey the same Gaussian distribution on each of the three components R, G, B (panels 2I, 2M and 2Q); such an RGB component distribution appears as a rugby-ball shape in the RGB three-dimensional coordinate space (panel 2E);
2) for the pixel points with row-column coordinates (24, 142) and (112, 50), the image texture at their positions is complex and strongly affected by light changes, so their RGB component values vary within a certain range, but the distributions of the three components R, G, B within that range still obey Gaussian distributions well (panels 2J, 2L, 2N, 2P, 2R and 2T), so the RGB component distributions of these two pixel points likewise appear as rugby-ball shapes in the RGB three-dimensional coordinate space (panels 2F and 2H);
3) the pixel point with row-column coordinates (50, 50) lies on the diagonal stripe in the middle of the image, so its value sometimes shows the darker stripe pixel, sometimes the lighter sky pixel, and sometimes an intermediate state between the two; its RGB component distribution is therefore multimodal (panel 2G), and the distributions of the three components R, G, B are likewise multimodal (panels 2K, 2O and 2S), but each peak profile still has a certain Gaussian character, so the RGB component distribution of this pixel point still presents an approximately rugby-ball shape in the RGB three-dimensional coordinate space (panel 2G).
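To reproduce this kind of inspection on other data, the per-pixel RGB time series can be pulled out of an (F, M×N, 3) training array such as the one used in the sketch above; the coordinates below are those of the first FIG. 2 pixel, and the printed summaries are merely a rough stand-in for the plotted distributions.

```python
# Pixel at row r, column c of an M x N frame (FIG. 2 used (10, 10) etc.)
r, c, N = 10, 10, 160
samples = frames[:, r * N + c, :]            # (F, 3) RGB time series
print(samples.mean(axis=0))                  # per-channel mean
print(samples.std(axis=0))                   # per-channel spread
print(np.corrcoef(samples, rowvar=False))    # R, G, B are correlated, not independent
```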
Since the three components R, G, B of a pixel point in a video image sequence each approximately follow a Gaussian distribution, the profile of the RGB components distributed in the RGB three-dimensional coordinate space takes a rugby-ball shape. This is not merely the case in the example above: a large number of experiments show that this distribution characteristic is almost universal across the pixel points of video images and reflects the true RGB component distribution characteristics of pixel points in a video image sequence. The Mahalanobis-distance-based video image background detection method of the invention distinguishes the background and foreground of the video image by exploiting this RGB component distribution characteristic of the pixel points: it measures the characteristic with the Mahalanobis distance algorithm, realizes background detection, and obtains higher background detection precision.
2. The Mahalanobis-distance-based video image background detection method.
In order to overcome the detection-accuracy limitations of the Gaussian mixture background modeling detection method, which assumes the three components of RGB space to be independent, and of the codebook background modeling detection method, which confines the RGB component distribution profile of a background pixel point to a cylinder, the invention considers and analyzes the RGB component distribution characteristics of video image pixel points. In accordance with the RGB component distribution characteristics discovered through this research and analysis, it measures those characteristics with the Mahalanobis distance algorithm, obtains the true rugby-ball-shaped RGB component distribution profile of the background pixel points in the video image, and performs background detection in combination with a threshold method. This not only changes the boundary condition between background and foreground and achieves a more accurate background detection effect than the codebook method, but also essentially retains an operating efficiency equivalent to the codebook method's, with good real-time performance and robustness; it achieved good results in a series of experiments.
The background images of different video data differ, so the RGB component distribution profile of the background pixel points in the video image must be obtained through background training; a background boundary threshold is then determined from the RGB component distribution characteristics of the video image pixel points, and background detection is carried out with the help of that trained threshold. The Mahalanobis-distance-based video image background detection method therefore consists mainly of a background training stage and a background detection stage.
The following specifically describes the detection process of the mahalanobis distance-based video image background detection method of the present invention.
The flow chart of the video image background detection method based on the Mahalanobis distance is shown in FIG. 1, and the specific steps are as follows:
A. Background training stage:
Step a): extracting F frames of background images from the video as a training sample set, where 80 ≤ F ≤ L and L denotes the total number of frames of the video.
the method comprises the steps of selecting a training sample set from a video to serve as a detection and identification basis for whether each pixel point in a video image is a background pixel point. The images extracted as the training sample set are all background images in the video; the background image described in the invention refers to an image in which each pixel point in a video is displayed as a background scene. The specific identification mode of the background image can be that all pixel points in the video are identified as background pixel points through prior detection, namely the background image is identified, and the background image can also be identified through artificial naked eyes. As for the specific position of the F frame background image extracted as the training sample set in the video, the specific position can be determined according to the actual situation of the background image in the video; in most cases, the F frame images that are continuous at the beginning of the video can be directly regarded as background images and selected as a training sample set, and certainly, the F frame background images can also be selected as the training sample set in the video through prior detection or artificial identification. However, the number F of the background image frames of the training sample set needs to be at least 80 frames, so that the training sample set can be ensured to embody the real RGB component distribution profile of the video image background pixel points; if the number of the background image frames of the training sample set is too small, the RGB component distribution profile of the background pixel points of the video image is difficult to accurately obtain, and the accuracy of background detection is inevitably influenced. Of course, since there is 80 ≦ F ≦ J, that is, the total number of frames of the video for which the method of the present invention is directed also needs to be greater than 80 frames and contains at least 80 background images. Videos smaller than 80 frames are too short, and actual needs of background identification and foreground extraction are not separately carried out; if the background image contained in the video is less than 80 frames, the RGB component distribution profile of the background pixel points in the image is difficult to obtain accurately, and the effect of background detection is affected to a certain extent.
Step b): respectively computing the RGB mean matrix of each pixel point of the images in the training sample set:

$$\bar{X}_k = \frac{1}{F}\sum_{i=1}^{F} X_k(i), \qquad k = 1, 2, \ldots, (M \times N);$$

where k denotes the serial number of a pixel point in an image of the training sample set, and M × N denotes the resolution of the video image; $\bar{X}_k$ denotes the RGB mean matrix of the k-th pixel point of the images in the training sample set; $X_k(i) = [R_k(i), G_k(i), B_k(i)]$ denotes the RGB matrix of the k-th pixel point of the i-th frame image in the training sample set, 1 ≤ i ≤ F, and $R_k(i)$, $G_k(i)$ and $B_k(i)$ respectively denote the red, green and blue component values of the k-th pixel point of the i-th frame image in the training sample set.
The Mahalanobis-distance-based video image background detection method takes the pixel points of the video image as the objects of detection and identification and judges whether each is a background pixel point, thereby realizing video image background detection; background training therefore also takes pixel points as training objects. In this step, k is taken from 1 to M × N so that the RGB mean matrix of every pixel point of the images in the training sample set is obtained. The aim is to use the RGB mean matrix of each pixel point as the distribution center of that pixel point's RGB components, so that the actual RGB component distribution characteristics of each pixel point can be measured against this center and the boundary condition between background and foreground can then be determined.
Step c): respectively computing the RGB covariance matrix of each pixel point of the images in the training sample set:

$$\mathrm{cov}(X_k) = \frac{1}{F-1}\sum_{i=1}^{F}\left[(X_k(i) - \bar{X}_k) \times (X_k(i) - \bar{X}_k)^T\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $\mathrm{cov}(X_k)$ denotes the RGB covariance matrix of the k-th pixel point of the images in the training sample set, and T is the matrix transposition symbol.

In this step, each resulting RGB covariance matrix $\mathrm{cov}(X_k)$ is a 3 × 3 data matrix; taking k from 1 to M × N yields the RGB covariance matrix of every pixel point of the images in the training sample set, in preparation for determining the background boundary threshold of each pixel point of the video image in the next step.
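As a quick sanity check (illustrative, not part of the method), the 3 × 3 covariance of step c) for a single pixel can be cross-checked against NumPy's unbiased estimator, using the `frames` array from the loading sketch above:

```python
import numpy as np

k = 0                                        # any pixel serial number
samples = frames[:, k, :]                    # (F, 3) RGB samples of pixel k
cov_k = np.cov(samples, rowvar=False)        # unbiased: divides by F - 1
# should match the step-c) formula computed directly:
diffs = samples - samples.mean(axis=0)
assert np.allclose(cov_k, diffs.T @ diffs / (len(samples) - 1))
```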
Step d): respectively determining the background boundary threshold of each pixel point according to the RGB mean matrix and the RGB covariance matrix of each pixel point of the images in the training sample set:

$$TH_k = \max\left[\mathrm{Dis}(X_k(i), \bar{X}_k) \mid i \in \{1, 2, \ldots, F\}\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $TH_k$ denotes the background boundary threshold of the k-th pixel point of the video image; $\mathrm{Dis}(X_k(i), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(i)$ of the k-th pixel point of the i-th frame image in the training sample set and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set:

$$\mathrm{Dis}(X_k(i), \bar{X}_k) = (X_k(i) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(i) - \bar{X}_k);$$

T is the matrix transposition symbol; $TH_k$ is the maximum among the Mahalanobis distances between the RGB matrix of the k-th pixel point of every frame image in the training sample set and the RGB mean matrix $\bar{X}_k$.
The Mahalanobis distances between the RGB matrices of the image pixel points in the training sample set and the RGB mean matrices of the pixel points with the corresponding serial numbers are computed because the Mahalanobis distance algorithm fully reflects both the independence and the correlation among the three components R, G, B of a pixel point. Every pixel point of the images in the training sample set is a true background pixel point of the video image, and the RGB mean matrix of each pixel point serves as the distribution center of that pixel point's RGB components; the Mahalanobis distance therefore reflects well the difference between the RGB component distribution characteristics of the background pixel points in the video image and the RGB component distribution centers of the pixel points with the corresponding serial numbers, which amounts to measuring the RGB component distribution characteristics of the background pixel points with the Mahalanobis distance algorithm. For any k-th pixel point of the video image, the larger the value of the Mahalanobis distance $\mathrm{Dis}(X_k(i), \bar{X}_k)$, the larger the distribution difference between the RGB matrix $X_k(i)$ of the k-th pixel point of the i-th frame image in the training sample set and the RGB component distribution center $\bar{X}_k$ of that pixel point. Accordingly, the maximum among the Mahalanobis distances between the RGB matrix of the k-th pixel point of every frame image in the training sample set and the RGB mean matrix $\bar{X}_k$ is taken as the background boundary threshold of the k-th pixel point of the video image, truly embodying the RGB component distribution boundary of the background image at the k-th pixel position. In this step, taking k from 1 to M × N yields the background boundary threshold of every pixel point of the video image, determining the RGB component distribution profile of the background image at every pixel position in the video.
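A per-pixel version of the threshold computation (a sketch; the helper name is illustrative) makes the "maximum Mahalanobis distance over the training frames" explicit:

```python
import numpy as np

def pixel_threshold(samples, mean_k, cov_inv_k):
    """TH_k: the largest Mahalanobis distance from any of the F training
    samples of pixel k to that pixel's RGB mean."""
    diffs = samples - mean_k                                  # (F, 3)
    dists = np.einsum('fi,ij,fj->f', diffs, cov_inv_k, diffs)
    return dists.max()
```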
At this point, the processing steps of the background training phase are completed. Next, a background detection stage is performed for each frame image in the video that is the object of background detection.
B. Background detection stage:
Step e): for the J frames of images serving as background detection objects in the video, 1 ≤ J ≤ L, respectively obtaining the Mahalanobis distance between the RGB matrix of each pixel point of each such frame and the RGB mean matrix of the pixel point with the corresponding serial number in the training sample set:

$$\mathrm{Dis}(X_k(j), \bar{X}_k) = (X_k(j) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(j) - \bar{X}_k),$$
$$k = 1, 2, \ldots, (M \times N), \qquad j = 1, 2, \ldots, J;$$

where $\mathrm{Dis}(X_k(j), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(j)$ of the k-th pixel point of the j-th frame image serving as a background detection object and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set; $X_k(j) = [R_k(j), G_k(j), B_k(j)]$ denotes the RGB matrix of the k-th pixel point of the j-th frame image serving as a background detection object, and $R_k(j)$, $G_k(j)$ and $B_k(j)$ respectively denote its red, green and blue component values; T is the matrix transposition symbol.
The number J of image frames to be background-detected is determined entirely by the actual need. The image to be detected may be a single frame of the video (J = 1), several continuous or discrete frames (1 < J < L), or even all frames of the video (J = L).
In this step, j is taken from 1 to J to ensure that the operation is performed on every frame image serving as a background detection object, and for each value of j, k is taken from 1 to M × N, so that the Mahalanobis distance between the RGB matrix of every pixel point of every such frame and the RGB mean matrix of the pixel point with the corresponding serial number in the training sample set is obtained. For the k-th pixel point of the j-th frame image of the background detection object, the larger the value of the Mahalanobis distance $\mathrm{Dis}(X_k(j), \bar{X}_k)$, the larger the distribution difference between the RGB matrix $X_k(j)$ of that pixel point and its RGB component distribution center $\bar{X}_k$; this measures whether the RGB component distribution difference exceeds the RGB component distribution boundary of the background pixel point.
Step f): for the J frames of images serving as background detection objects in the video: if $\mathrm{Dis}(X_k(j), \bar{X}_k) \le TH_k$, the k-th pixel point of the j-th frame image serving as a background detection object is judged to be a background pixel point; otherwise it is judged to be a foreground pixel point. In this way, whether each pixel point of the J frames serving as background detection objects is a background pixel point is detected, and the background detection of those J frames is completed.

In this step, for the k-th pixel point of the j-th frame image serving as a background detection object, the background boundary threshold $TH_k$ of that pixel point is used as the boundary condition between background and foreground. If $\mathrm{Dis}(X_k(j), \bar{X}_k) \le TH_k$, the RGB component distribution difference between the k-th pixel point of the j-th frame and the corresponding actual background pixel point does not exceed the background boundary condition, so the pixel point is judged to be a background pixel point; if $\mathrm{Dis}(X_k(j), \bar{X}_k) > TH_k$, the difference has exceeded the background boundary condition and satisfies the foreground identification condition, so the pixel point is judged to be a foreground pixel point. Through this background/foreground detection of every pixel point of every frame serving as a background detection object, the background detection of the J frames is completed.
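Tying the stages together, here is a hedged end-to-end usage example building on the earlier sketches (the file name is hypothetical and the 120 × 160 resolution is borrowed from the experimental example below):

```python
import numpy as np

M, N = 120, 160                                     # frame resolution (rows, cols)
train = load_training_frames("self_shot_video.avi", F=80)
mean, cov_inv, th = train_background_model(train)

# Classify some later frame (here simply the 81st; any frame j works)
frame_j = load_training_frames("self_shot_video.avi", F=81)[-1]
bg = detect_background(frame_j, mean, cov_inv, th)  # True = background pixel

# Render as in the experimental example: background black, foreground white
mask = np.where(bg, 0, 255).astype(np.uint8).reshape(M, N)
```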
As can be seen from the steps above, the computational load of the Mahalanobis distance algorithm is essentially equivalent to that of the algorithm used in the codebook background modeling detection method, so the background detection method of the invention essentially retains an operating efficiency equivalent to the codebook method's, with good real-time performance and robustness. Different applications require background detection on different numbers and sequences of frames, but for any frame of the video serving as a background detection object, detection can be completed by the steps above. For example, to detect the background of frames 21 to 50 of a video, steps e) to f) of the detection stage are run with the 21st to 50th frames as the background detection objects; to detect the background of every frame, steps e) to f) are run with frames 1 to L as the background detection objects, where L denotes the total number of frames of the video.
If the same video contains several segments with different background images, the new background images of a segment can be taken as a new training sample set and background training re-run following steps a) to d) of the training stage (i.e., the training samples are updated for the new background); the segment containing the new background is then background-detected following steps e) to f) of the detection stage. In this way, video segments with different background images are detected separately, which extends the applicable range of the background detection method to a certain extent.
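A sketch of that per-segment usage, with `video_segments` as a hypothetical iterable of (background frames, frames to detect) pairs shaped as in the earlier sketches:

```python
def detect_per_segment(video_segments):
    """Re-train on each segment's background and detect its frames."""
    results = []
    for seg_train, seg_frames in video_segments:
        mean, cov_inv, th = train_background_model(seg_train)    # steps a)-d)
        results.append([detect_background(f, mean, cov_inv, th)  # steps e)-f)
                        for f in seg_frames])
    return results
```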
3. Experimental example.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
In order to verify the actual effect of the Mahalanobis-distance-based video image background detection method, the method was implemented with the Matlab 2010b programming tool; the experimental hardware environment was an Intel Pentium(R) 4 processor with a main frequency of 3.0 GHz and 2.0 GB of memory. To exhibit the advantages of the method under background conditions with complex textures, sampled images from a self-shot video were selected as background detection objects. The image resolution is 120 (rows) × 160 (columns); the background of the self-shot video contains richly textured background objects such as sky, trees and floors, plus parts of buildings; among the 2800-odd frames of the self-shot video, 386 frames are background images, and the remaining frames have moving human foregrounds in front of the background. In the experiment, background detection of the sampled images of the self-shot video was performed with the Mahalanobis-distance-based video image background detection method (hereinafter, the method of the invention), the codebook background modeling detection method (hereinafter, the codebook method) and the Gaussian mixture background modeling detection method (hereinafter, the MOG method). 80 frames of background images were selected as the training sample set; the parameter n of the method of the invention was 3; the codebook method's control parameters were α = 0.4 and β = 1.5, with background boundary radius ξ = 100 (see "Kim, K., et al. Real-time foreground-background segmentation using codebook model. Real-Time Imaging, 2005. 11(3): p. 172-185"); the MOG method used 3 Gaussian components with a learning rate of 0.005 (see "Wren, C.R., et al. Pfinder: real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997. 19(7): p. 780-785"). The RGB component values of detected background pixel points were set to 0, 0, 0 (black) and those of foreground pixel points to 255, 255, 255 (white) to distinguish them; the background detection results are shown in FIG. 3. In FIG. 3, panel 3A is an original sampled image of the self-shot video, whose two human figures are the foreground; panel 3B is the actual background/foreground reference; panel 3C is the background detection result of the method of the invention; panel 3D is that of the codebook method; and panel 3E is that of the MOG method. Comparing panels 3B, 3C, 3D and 3E of FIG. 3 against the actual background/foreground reference shows that the MOG method and the codebook method produce relatively more noise points (a noise point is a background pixel point falsely detected as foreground, or a foreground pixel point falsely detected as background), and their noise is scattered; in practical applications, such scattered noise easily harms the accuracy of background identification and foreground capture. Compared with the MOG method and the codebook method, the method of the invention produces fewer noise points, its detection result is closer to the actual background/foreground distinction, and the background detection precision is markedly improved over the prior art; if more frames are extracted as the training sample set, the detection accuracy can be even higher with still fewer noise points. In the background detection result of the method of the invention shown in panel 3C, although some noise points remain, they are mainly distributed near the foreground pixel points, which fully meets the practicality and accuracy requirements of video image background identification and foreground capture in real applications; the method is therefore particularly suitable for practical technologies requiring background identification and foreground capture, such as surveillance video summary tracking and face recognition.
Finally, the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solution of the invention without departing from its spirit and scope, and all such modifications are intended to be covered by the claims of the invention.

Claims (1)

1. The Mahalanobis distance-based video image background detection method is characterized by comprising the following steps of:
a) extracting F frames of background images from the video as a training sample set, where 80 ≤ F ≤ L and L denotes the total number of frames of the video;
b) respectively computing the RGB mean matrix of each pixel point of the images in the training sample set:

$$\bar{X}_k = \frac{1}{F}\sum_{i=1}^{F} X_k(i), \qquad k = 1, 2, \ldots, (M \times N);$$

where k denotes the serial number of a pixel point in an image of the training sample set, and M × N denotes the resolution of the video image; $\bar{X}_k$ denotes the RGB mean matrix of the k-th pixel point of the images in the training sample set; $X_k(i) = [R_k(i), G_k(i), B_k(i)]$ denotes the RGB matrix of the k-th pixel point of the i-th frame image in the training sample set, 1 ≤ i ≤ F, and $R_k(i)$, $G_k(i)$ and $B_k(i)$ respectively denote the red, green and blue component values of the k-th pixel point of the i-th frame image in the training sample set;
c) respectively computing the RGB covariance matrix of each pixel point of the images in the training sample set:

$$\mathrm{cov}(X_k) = \frac{1}{F-1}\sum_{i=1}^{F}\left[(X_k(i) - \bar{X}_k) \times (X_k(i) - \bar{X}_k)^T\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $\mathrm{cov}(X_k)$ denotes the RGB covariance matrix of the k-th pixel point of the images in the training sample set, and T is the matrix transposition symbol;
d) respectively determining the background boundary threshold of each pixel point according to the RGB mean matrix and the RGB covariance matrix of each pixel point of the images in the training sample set:

$$TH_k = \max\left[\mathrm{Dis}(X_k(i), \bar{X}_k) \mid i \in \{1, 2, \ldots, F\}\right], \qquad k = 1, 2, \ldots, (M \times N);$$

where $TH_k$ denotes the background boundary threshold of the k-th pixel point of the video image; $\mathrm{Dis}(X_k(i), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(i)$ of the k-th pixel point of the i-th frame image in the training sample set and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set:

$$\mathrm{Dis}(X_k(i), \bar{X}_k) = (X_k(i) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(i) - \bar{X}_k);$$

T is the matrix transposition symbol; $TH_k$ is the maximum among the Mahalanobis distances between the RGB matrix of the k-th pixel point of every frame image in the training sample set and the RGB mean matrix $\bar{X}_k$;
e) for the J frames of images serving as background detection objects in the video, 1 ≤ J ≤ L, respectively obtaining the Mahalanobis distance between the RGB matrix of each pixel point of each such frame and the RGB mean matrix of the pixel point with the corresponding serial number in the training sample set:

$$\mathrm{Dis}(X_k(j), \bar{X}_k) = (X_k(j) - \bar{X}_k)^T \, \mathrm{cov}(X_k)^{-1} \, (X_k(j) - \bar{X}_k),$$
$$k = 1, 2, \ldots, (M \times N), \qquad j = 1, 2, \ldots, J;$$

where $\mathrm{Dis}(X_k(j), \bar{X}_k)$ denotes the Mahalanobis distance between the RGB matrix $X_k(j)$ of the k-th pixel point of the j-th frame image serving as a background detection object and the RGB mean matrix $\bar{X}_k$ of the k-th pixel point of the images in the training sample set; $X_k(j) = [R_k(j), G_k(j), B_k(j)]$ denotes the RGB matrix of the k-th pixel point of the j-th frame image serving as a background detection object, and $R_k(j)$, $G_k(j)$ and $B_k(j)$ respectively denote the red, green and blue component values of the k-th pixel point of the j-th frame image serving as a background detection object; T is the matrix transposition symbol;
f) for the J frames of images serving as background detection objects in the video: if $\mathrm{Dis}(X_k(j), \bar{X}_k) \le TH_k$, the k-th pixel point of the j-th frame image serving as a background detection object is judged to be a background pixel point; otherwise it is judged to be a foreground pixel point. In this way, whether each pixel point of the J frames serving as background detection objects is a background pixel point is detected, and the background detection of the J frames serving as background detection objects is completed.
CN 201110328046 2011-10-25 2011-10-25 Mahalanobis-distance-based video image background detection method Expired - Fee Related CN102340620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110328046 CN102340620B (en) 2011-10-25 2011-10-25 Mahalanobis-distance-based video image background detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110328046 CN102340620B (en) 2011-10-25 2011-10-25 Mahalanobis-distance-based video image background detection method

Publications (2)

Publication Number Publication Date
CN102340620A CN102340620A (en) 2012-02-01
CN102340620B true CN102340620B (en) 2013-06-19

Family

ID=45516107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110328046 Expired - Fee Related CN102340620B (en) 2011-10-25 2011-10-25 Mahalanobis-distance-based video image background detection method

Country Status (1)

Country Link
CN (1) CN102340620B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281779B (en) * 2013-06-13 2015-08-12 北京空间飞行器总体设计部 Based on the radio frequency tomography method base of Background learning
CN106162092B (en) * 2016-07-29 2019-03-15 深圳创维-Rgb电子有限公司 A kind of method and system sport video acquisition and played
CN106846336B (en) * 2017-02-06 2022-07-15 腾讯科技(上海)有限公司 Method and device for extracting foreground image and replacing image background
CN108257188A (en) * 2017-12-29 2018-07-06 重庆锐纳达自动化技术有限公司 A kind of moving target detecting method
CN110827287B (en) * 2018-08-14 2023-06-23 阿里巴巴(上海)有限公司 Method, device and equipment for determining background color confidence and image processing
CN113473628B (en) * 2021-08-05 2022-08-09 深圳市虎瑞科技有限公司 Communication method and system of intelligent platform
CN116861224B (en) * 2023-09-04 2023-12-01 鲁东大学 Intermittent process soft measurement modeling system based on intermittent process soft measurement modeling method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7176963B2 (en) * 2003-01-03 2007-02-13 Litton Systems, Inc. Method and system for real-time image fusion
CN101420533B (en) * 2008-12-02 2011-11-09 上海电力学院 Embedded image fusion system and method based on video background detection
CN101783015B (en) * 2009-01-19 2013-04-24 北京中星微电子有限公司 Equipment and method for tracking video
CN101883209B (en) * 2010-05-31 2012-09-12 中山大学 Method for integrating background model and three-frame difference to detect video background

Also Published As

Publication number Publication date
CN102340620A (en) 2012-02-01

Similar Documents

Publication Publication Date Title
CN102340620B (en) Mahalanobis-distance-based video image background detection method
CN108492319B (en) Moving target detection method based on deep full convolution neural network
Zhang et al. Motion human detection based on background subtraction
CN106886216B (en) Robot automatic tracking method and system based on RGBD face detection
Jia et al. A two-step approach to see-through bad weather for surveillance video quality enhancement
CN111882810B (en) Fire identification and early warning method and system
CN112800860B (en) High-speed object scattering detection method and system with coordination of event camera and visual camera
CN109685045B (en) Moving target video tracking method and system
CN103942557B (en) A kind of underground coal mine image pre-processing method
Chu et al. Object tracking algorithm based on camshift algorithm combinating with difference in frame
KR100572768B1 (en) Automatic detection method of human facial objects for the digital video surveillance
CN102510437B (en) Method for detecting background of video image based on distribution of red, green and blue (RGB) components
CN117132510B (en) Monitoring image enhancement method and system based on image processing
CN102982537A (en) Scene change detection method and scene change detection system
CN113139489A (en) Crowd counting method and system based on background extraction and multi-scale fusion network
CN102509076B (en) Principal-component-analysis-based video image background detection method
CN113688741A (en) Motion training evaluation system and method based on cooperation of event camera and visual camera
Yoshinaga et al. Real-time people counting using blob descriptor
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN111582036A (en) Cross-view-angle person identification method based on shape and posture under wearable device
CN114155278A (en) Target tracking and related model training method, related device, equipment and medium
CN109241932A (en) A kind of thermal infrared human motion recognition method based on movement variogram phase property
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance
CN111582076A (en) Picture freezing detection method based on pixel motion intelligent perception
TWI381735B (en) Image processing system and method for automatic adjustment of image resolution for image surveillance apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130619

Termination date: 20151025

EXPY Termination of patent right or utility model