CN118038310A - Video background elimination method, system, equipment and storage medium - Google Patents

Video background elimination method, system, equipment and storage medium

Info

Publication number: CN118038310A
Application number: CN202410055542.2A
Authority: CN (China)
Prior art keywords: background, new, class, image, support vector
Legal status: Pending (assumed status; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 许少睿, 黄练, 赵金洪, 李茜茜
Current Assignee: Guangdong Mechanical and Electrical College
Original Assignee: Guangdong Mechanical and Electrical College
Priority date: 2024-01-12
Filing date: 2024-01-12
Publication date: 2024-05-14
Application filed by Guangdong Mechanical and Electrical College

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video background elimination method, system, device and storage medium, belonging to the technical field of background elimination and solving the technical problem of the low processing speed of traditional methods. The method comprises the following steps: step 1, constructing a multi-class support vector machine model; step 2, constructing a color model for training the multi-class support vector machine model; step 3, determining the criteria by which the multi-class support vector machine model judges class attribution; step 4, constructing an automatic background learning updating rule; step 5, obtaining pre-training images and inputting them to the multi-class support vector machine model to learn the background image; and step 6, acquiring an image to be segmented, performing foreground segmentation on it with the trained multi-class support vector machine model, and outputting the segmented image. The background elimination algorithm based on the multi-class support vector machine extracts a binary image of moving objects from the background, achieves processing speed and segmentation quality superior to traditional algorithms, and shows strong robustness.

Description

Video background elimination method, system, equipment and storage medium
Technical Field
The present invention relates to the field of background elimination, and more particularly, to a method, a system, a device, and a storage medium for eliminating video background.
Background
Video-based computer vision applications must first detect moving objects, separating the background from the foreground with background elimination (background subtraction) techniques, as shown in fig. 1. Background elimination is a common preprocessing method for video target detection: with a still camera, a background model is built from the video and subtracted from the current frame to produce one or more foreground masks containing the pixels of moving objects. Changes in indoor and outdoor environments pose the following difficulties for background elimination:
(1) Illumination change: illumination varies over time, so pixel values in the RGB color space may change dramatically, and such variations can be confused with moving objects.
(2) Human shadows: a walking person is often accompanied by shadows, including self-shadows and cast shadows, which cause local illumination changes.
(3) Background object relocation: when a background object is moved, for example a trash can that was treated as part of the reference image during background modeling is removed from the scene or relocated elsewhere, the area it originally occupied may be mistaken for a moving object.
(4) Camouflage: similarity between the colors of a moving object and the background makes them hard to distinguish; for example, a person in black clothes walking in a dark outdoor scene is difficult to segment satisfactorily because the pixel values are so similar.
(5) Non-static background: trees in the wind, computer displays, running fans and other dynamic backgrounds are difficult to model with simple statistics.
(6) Rapid background change: when the background changes at high frequency, the background model cannot adapt quickly enough and misclassification occurs.
Existing background elimination algorithms can be classified as follows:
(1) By mathematical concept: statistical models, fuzzy models and Dempster-Shafer models.
(2) By machine learning concept: reconstructive, discriminative and mixed subspace learning models; matrix- or tensor-decomposition subspace learning; support vector machines; neural networks and deep learning.
(3) By signal processing model: Wiener filters, Kalman filters, entropy filters and Chebyshev filters.
(4) By classification model: clustering algorithms.
Statistical, fuzzy and Dempster-Shafer models account for imprecision, uncertainty and incompleteness in the data; machine learning approaches learn background pixel representations in a supervised or unsupervised manner; signal processing models estimate background values; and classification models attempt to label pixels as background or foreground. However, existing background elimination techniques suffer from heavy computation and low processing speed.
Disclosure of Invention
The first object of the present invention is to solve the above technical problems of the prior art by providing a video background elimination method with high processing speed, good segmentation quality and strong robustness.
The second object of the present invention is to provide a video background elimination system with high processing speed, good segmentation quality and strong robustness.
The third object of the present invention is to provide a computer device.
The fourth object of the present invention is to provide a computer storage medium.
In order to achieve the first object, the present invention provides a video background elimination method comprising the following steps:
step 1, constructing a multi-class support vector machine model;
step 2, constructing a color model for training the multi-class support vector machine model;
using the YUV color space and the HSI color space, which separate color information from luminance, a pixel is represented by the four parameters Y, U, V and H, and the following color model is obtained:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.1B;
in the t-th frame image, the deviation of the chromaticities U and V at the ith pixel from the nth class of the background B is measured by a distance D(n), where B(U, n) and B(V, n) represent the values of U and V, respectively, in the nth class of background B;
If the value of D(n) falls within the threshold TH_D, which is a global threshold for all images, the class is defined as a matching class; TH_D can be adjusted to control the probability of false detection; once a matching class is found, H and Y are used as two further conditions for the final decision; when a new object appears at the corresponding position, the change in the H value is larger than a threshold TH_H; when the change in Y is less than a threshold TH_Y, the pixel is a reliable background; a new pixel is classified as background when all of the following criteria are met, and as foreground otherwise:
D(n) ≤ TH_D
H ≤ TH_H
Y ≤ TH_Y
where TH_D, TH_H and TH_Y are empirical values;
step 3, determining the criteria by which the multi-class support vector machine model judges class attribution;
separating (U, V) into U and V, and treating U and V as two independent features; let u and v be observed values of U and V; when processing u data, (u + TH_D) and (u - TH_D) are both treated as features of one class, and their boundaries (u + TH_D + c) and (u - TH_D - c) are treated as features within the background range; values outside this range are considered foreground features;
when processing v data, (v + TH_D) and (v - TH_D) are likewise treated as features of one class, and their boundaries (v + TH_D + c) and (v - TH_D - c) are treated as features within the background range;
when the pixel values u and v belong to the same class, the pixel is regarded as background; otherwise it is regarded as foreground;
step 4, constructing an automatic background learning updating rule;
Automatic background learning updating is realized through two steps of retraining data and updating the weight of each class;
The retraining of data is specifically: define (u_new, v_new) as the center of the new class, (u_c, v_c) as the center of the class nearest to the new class, D(new) as the radius of the new class, and D(C) as the radius of that nearest class; the radii along u and v are calculated independently; along the u dimension, D(new) is determined according to the following rules:
if the distance from u_new to u_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from u_new to u_c is greater than D(C) but less than 2TH_D, D(new) is set to |u_new - u_c| - D(C) - 2c;
if the distance from u_new to u_c is less than D(C), D(new) is set to D(C);
along the v dimension, D(new) is determined according to the following rules:
if the distance from v_new to v_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from v_new to v_c is greater than D(C) but less than 2TH_D, D(new) is set to |v_new - v_c| - D(C) - 2c;
if the distance from v_new to v_c is less than D(C), D(new) is set to D(C);
The updating of the weight of each class is specifically: in the t-th frame image, the weight ω_{n,i,t} of the nth class of the ith pixel's background is updated by an update function in which N_i is the number of classes in the ith pixel's background, α is the update rate, M_{n,i,t} equals 1 for the matching class and 0 for the remaining classes, and f_c(x) is the decision function;
ω_{n,i,t} is determined by f_c(x) and its value remains smaller than 1;
Step 5, obtaining pre-training images, and using them to train the multi-class support vector machine model to learn the background image;
and step 6, acquiring an image to be segmented, performing foreground segmentation on it with the trained multi-class support vector machine model, and outputting the segmented image.
As a further improvement, in step 3, c is set to 0.01.
Further, step 5 includes:
Step 51, obtaining and storing the color information (u, v, y, h) of each pixel in the pre-training image;
Step 52, inputting the stored data into the multi-class support vector machine model for training, and simultaneously obtaining the weight ω_{n,i,t} of each pixel of the image;
Step 53, after training on the pre-training images is completed, prediction and classification of image data begins: after a new video image is acquired, the data of each pixel is put in turn into the trained multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground;
If it belongs to the background, it is skipped without processing;
If it belongs to the foreground, the data is stored for the current pixel, all data of the current pixel are retrained to update the model, and the pixel information newly added to the background receives new weight initialization information ω_{n+1,i,t}.
Further, step 6 includes:
Step 61, after a new video image is acquired, putting the data of each pixel in turn into the multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground;
If it belongs to the background, performing automatic background learning updating according to step 4, and updating and storing the weight ω_{n,i,t} of the background pixel;
if it belongs to the foreground, marking it as foreground;
Step 62, after the background of the whole image is eliminated, enhancing the segmentation result with morphological operations;
Step 63, continuing to process newly acquired images in a loop according to steps 61 to 62 and outputting the results.
Further, the pre-training image is part of the image to be segmented.
In order to achieve the second object, the present invention provides a video background elimination system, including:
the model construction module is used for constructing a multi-class support vector machine model;
The color module is used for constructing a color model for training the multi-class support vector machine model;
the attribution judging module is used for determining the criteria by which the multi-class support vector machine model judges class attribution;
the learning module is used for constructing an automatic background learning updating rule;
The pre-training module is used for acquiring pre-training images and using them to train the multi-class support vector machine model to learn the background image;
The segmentation module is used for acquiring an image to be segmented, performing foreground segmentation on the image to be segmented by using a trained multi-class support vector machine model, and outputting the segmented image.
In order to achieve the third object, the present invention provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor implements a video background elimination method as described above when executing the computer program.
In order to achieve the fourth object, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video background elimination method described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
The background elimination algorithm based on the multi-class support vector machine extracts a binary image of moving objects from the background, achieves processing speed and segmentation quality superior to traditional algorithms, and exhibits stronger robustness.
Drawings
FIG. 1 is a background elimination schematic;
FIG. 2 is a flowchart of a background image learning training process in the present invention;
FIG. 3 is a flow chart of a background image segmentation process in the present invention;
fig. 4 is an effect diagram of the segmented image of the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments in the drawings.
The support vector machine (Support Vector Machine, SVM) is a commonly used data classification method in machine learning. SVMs were originally designed for two-class data classification; to classify data belonging to multiple classes, they are extended to multi-class support vector machines, which use a decision function, representing how strongly test data belongs to a class, to determine a single class. The decision function f(x) is expressed as:
f(x) = Σ_i α_i y_i K(x_i, x) + b
where x_i represents training set data, α_i represents a Lagrange multiplier, y_i ∈ {1, -1}, b represents a bias value, and K(x_i, x) represents a kernel function.
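For illustration, a minimal Python sketch of evaluating this decision function; the RBF kernel and all names below are illustrative assumptions, since the patent does not fix a particular kernel:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # Gaussian (RBF) kernel, an assumed choice for K(x_i, x).
    xi, x = np.asarray(xi, float), np.asarray(x, float)
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def decision_function(support_vectors, alphas, labels, b, x):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    return sum(a * y * rbf_kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b
```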
The multi-class support vector machine comprises three basic methods: one-to-one, one-to-many, and DAGSVM. In the one-to-one approach, each class is trained against every other class in turn, building k(k-1)/2 classifiers. For training data of class i and class j, in order to maximize the classification margin, (1/2)‖ω^{ij}‖² must be minimized, and the objective function is expressed as:
min_{ω^{ij}, b^{ij}, ξ^{ij}}  (1/2)‖ω^{ij}‖² + C Σ_t ξ_t^{ij}
where ξ ≥ 0 is a slack (relaxation) variable, C ≥ 0 is a penalty parameter, ω is the normal vector, and b is a bias variable.
The one-to-one multi-class support vector machine is practical for real-world problems, so the technical scheme of the invention is based on the one-to-one multi-class support vector machine.
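As a point of reference (not part of the patent), scikit-learn's SVC trains exactly these k(k-1)/2 pairwise classifiers internally, so a one-to-one multi-class SVM can be sketched with toy data as follows:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 3-class data: SVC builds k(k-1)/2 = 3 pairwise classifiers (one-vs-one).
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array([0, 0, 1, 1, 2, 2])
clf = SVC(kernel="rbf", C=1.0, decision_function_shape="ovo").fit(X, y)
print(clf.decision_function([[1.05]]).shape)  # (1, 3): one score per class pair
```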
Specifically, referring to fig. 2 to fig. 4, the present invention provides a method for eliminating video background, which includes the following steps 1 to 6:
Step 1, constructing a multi-class support vector machine model, namely a one-to-one multi-class support vector machine model.
Step 2, constructing a color model for training the multi-class support vector machine model.
The color model plays a vital role in a background elimination algorithm: a well-behaved color model should be insensitive to noise while still distinguishing the current pixel from the background. The RGB color model is too sensitive to noise and to variations in lighting conditions, although that same sensitivity makes it easy for it to register differences between the current pixel and the background. The invention therefore selects the YUV color space and the HSI color space, which separate color information from brightness; a pixel is represented by the four parameters Y, U, V and H, where Y, U and V represent the luminance and chrominance of the YUV color space and H represents the hue of the HSI color space, giving the following color model:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.1B.
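A small sketch of this feature extraction, assuming 8-bit RGB input; the hue term uses the standard HSI hue formula, which is an assumption since the patent states only the YUV equations:

```python
import numpy as np

def rgb_to_yuvh(rgb):
    # Convert one RGB pixel (values in [0, 255]) to the (Y, U, V, H) features.
    r, g, b = (float(c) for c in rgb)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    # Standard HSI hue in degrees (undefined for pure gray; the epsilon guards it).
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    h = theta if b <= g else 360.0 - theta
    return y, u, v, h
```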
In the multi-class SVM, both U and V carry the color information used for training and prediction. In the t-th frame image, the deviation of the chromaticities U and V at the ith pixel from the nth class of the background B is measured by a distance D(n), where B(U, n) and B(V, n) represent the values of U and V, respectively, in the nth class of background B.
If the value of D(n) falls within the threshold TH_D, which is a global threshold for all images, the class is defined as a matching class. TH_D can be adjusted to control the probability of false detection. Once a matching class is found, H and Y are used as two further conditions for the final decision. When a new object appears at the corresponding position, the change in the H value is larger than a threshold TH_H; when the change in Y is less than a threshold TH_Y, the pixel is a reliable background. A new pixel is classified as background when all of the following criteria are met, and as foreground otherwise:
D(n) ≤ TH_D
H ≤ TH_H
Y ≤ TH_Y
where TH_D, TH_H and TH_Y are empirical values.
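The three-part criterion reduces to a conjunction of threshold tests. A minimal sketch, where d_n, delta_h and delta_y stand for D(n) and the changes in H and Y, and the numeric defaults are illustrative stand-ins for the empirical thresholds:

```python
def is_reliable_background(d_n, delta_h, delta_y, th_d=10.0, th_h=15.0, th_y=20.0):
    # All three conditions must hold; otherwise the pixel is treated as foreground.
    return d_n <= th_d and delta_h <= th_h and delta_y <= th_y
```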
Step 3, determining the criteria by which the multi-class support vector machine model judges class attribution.
To reduce computational complexity, the invention separates (U, V) into U and V and treats them as two independent features. Let u and v be observed values of U and V: when processing u data, (u + TH_D) and (u - TH_D) are both treated as features of one class, and their boundaries (u + TH_D + c) and (u - TH_D - c) are treated as features within the background range, where c is set to 0.01; values outside this range are considered foreground features.
Similarly, when processing v data, (v + TH_D) and (v - TH_D) are both treated as features of one class, and their boundaries (v + TH_D + c) and (v - TH_D - c) are treated as features within the background range, with the same c = 0.01.
After the data are trained, when a new pixel value (u, v, y, h) is observed, the multi-class support vector machine model outputs two predictions, one for u and one for v. The observed pixel is considered background only if u and v belong to the same class; otherwise it is foreground.
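A sketch of this same-class test under the stated assumptions; the interval lookup below is an illustrative stand-in for the per-channel SVM predictions, using the class centers and the radius TH_D:

```python
def predict_class(value, centers, th_d):
    # Index of the matching class, or None if the value lies outside every
    # interval [center - TH_D, center + TH_D] (a foreground feature).
    for idx, center in enumerate(centers):
        if abs(value - center) <= th_d:
            return idx
    return None

def pixel_is_background(u, v, u_centers, v_centers, th_d):
    cu = predict_class(u, u_centers, th_d)
    cv = predict_class(v, v_centers, th_d)
    # Background only when both chroma values match, and match the same class.
    return cu is not None and cu == cv
```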
Step 4, constructing an automatic background learning updating rule.
Automatic background learning updating enables the algorithm to adapt intelligently to changes in ambient light and to the relocation of background objects. In the present invention, automatic background learning updating is realized through two steps: retraining data and updating the weight of each class.
To avoid the intersection of two classes causing an erroneous class decision in the prediction step, a simple mechanism can be employed. The retraining of data is specifically: define (u_new, v_new) as the center of the new class, (u_c, v_c) as the center of the class nearest to the new class, D(new) as the radius of the new class, and D(C) as the radius of that nearest class. The radii along u and v are calculated independently; along the u dimension, D(new) is determined according to the following rules:
if the distance from u_new to u_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from u_new to u_c is greater than D(C) but less than 2TH_D, D(new) is set to |u_new - u_c| - D(C) - 2c;
if the distance from u_new to u_c is less than D(C), D(new) is set to D(C).
Similarly, along the v dimension, D(new) is determined according to the following rules:
if the distance from v_new to v_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from v_new to v_c is greater than D(C) but less than 2TH_D, D(new) is set to |v_new - v_c| - D(C) - 2c;
if the distance from v_new to v_c is less than D(C), D(new) is set to D(C).
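Reading these rules as fixing the radius of the newly created class along each chroma dimension (an interpretation; the source text is ambiguous on this point), a sketch:

```python
def new_class_radius(center_new, center_nearest, radius_nearest, th_d, c=0.01):
    # Radius assigned to a new class along one chroma dimension (u or v).
    dist = abs(center_new - center_nearest)
    if dist > 2 * th_d:
        return th_d                           # far from every class: default radius
    if dist > radius_nearest:                 # between D(C) and 2*TH_D:
        return dist - radius_nearest - 2 * c  # shrink so the classes do not overlap
    return radius_nearest                     # inside the nearest class: copy D(C)
```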
Another key step of automatic background learning updating is updating the weight of each class, which quantifies the evidence that a class belongs to the background. In the t-th frame image, the weight ω_{n,i,t} of the nth class of the ith pixel's background is updated by an update function in which N_i is the number of classes in the ith pixel's background, α is the update rate, M_{n,i,t} equals 1 for the matching class and 0 for the remaining classes, and f_c(x) is the decision function.
ω_{n,i,t} is determined by f_c(x) and its value remains smaller than 1. The weight update function lets classes learn at an appropriate speed and slowly phases out outdated classes stored for pixels of the background image.
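The patent text does not reproduce the update formula itself. The sketch below assumes the running-average form w <- (1 - alpha) * w + alpha * M that is common in mixture background models, with M = 1 for the matched class and 0 otherwise, and omits the f_c(x) term whose exact role is not specified:

```python
def update_weights(weights, matched_idx, alpha=0.05):
    # Assumed running-average update: weights stay below 1, and classes that
    # stop matching are phased out slowly, as described above.
    return [(1.0 - alpha) * w + alpha * (1.0 if n == matched_idx else 0.0)
            for n, w in enumerate(weights)]
```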
Step 5, obtaining pre-training images, and using them to train the multi-class support vector machine model to learn the background image.
Step 6, acquiring an image to be segmented, performing foreground segmentation on it with the trained multi-class support vector machine model, and outputting the segmented image.
The purpose of step 5 is to learn the background and initialize the parameters; as shown in fig. 2, step 5 includes:
Step 51, after a video image is acquired, judging whether it belongs to the pre-training images; if so, obtaining and storing the color information (u, v, y, h) of each pixel in the pre-training image;
Step 52, inputting the stored data into the multi-class support vector machine model for training, and simultaneously obtaining the weight ω_{n,i,t} of each pixel of the image;
Step 53, after training on the pre-training images is completed, prediction and classification of image data begins. After a new video image is acquired, the data of each pixel is put in turn into the trained multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground; the judgment follows the attribution criteria of step 3.
If it belongs to the background, it is skipped without processing;
If it belongs to the foreground, the data is stored for the current pixel, all data of the current pixel are retrained to update the model, and the pixel information newly added to the background receives new weight initialization information ω_{n+1,i,t}; a minimal sketch of this loop follows.
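Putting steps 51 to 53 together, a per-pixel pre-training loop might look as follows; rgb_to_yuvh is the feature extractor sketched earlier, and train_pixel_model is a hypothetical stand-in for training one pixel's one-to-one multi-class SVM:

```python
import numpy as np

def train_pixel_model(samples):
    # Hypothetical stand-in: records the chroma means as one background class.
    arr = np.asarray(samples)
    return {"centers": [(arr[:, 1].mean(), arr[:, 2].mean())], "weights": [1.0]}

def pretrain(frames):
    # frames: iterable of H x W x 3 RGB arrays from the pre-training segment.
    samples = None
    for frame in frames:
        h, w, _ = frame.shape
        if samples is None:
            samples = [[] for _ in range(h * w)]
        for i, (r, g, b) in enumerate(frame.reshape(-1, 3).astype(float)):
            samples[i].append(rgb_to_yuvh((r, g, b)))  # (Y, U, V, H) per pixel
    # One model (and one weight vector) per pixel position.
    return [train_pixel_model(s) for s in samples]
```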
Foreground segmentation is divided into two stages, as shown in fig. 3, mainly involving the classification of background and foreground and the automatic background learning update. Step 6 includes:
Step 61, after a new video image is acquired, putting the data of each pixel in turn into the multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground; the judgment follows the attribution criteria of step 3;
If it belongs to the background, performing automatic background learning updating according to step 4, and updating and storing the weight ω_{n,i,t} of the background pixel;
if it belongs to the foreground, marking it as foreground;
Step 62, after the background of the whole image is eliminated, enhancing the segmentation result with morphological operations (a sketch follows after this list);
Step 63, continuing to process newly acquired images in a loop according to steps 61 to 62 and outputting the results; the final effect is shown in fig. 4.
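One common realization of the morphological enhancement in step 62, sketched with OpenCV; the kernel size and the opening-then-closing order are assumed choices, not specified by the patent:

```python
import cv2

def enhance_mask(mask):
    # mask: binary uint8 foreground image (255 = foreground, 0 = background).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle noise
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # fill small holes
```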
The pre-training images may be a portion of the sequence to be segmented, for example its opening segment.
A video background elimination system, comprising:
the model construction module is used for constructing a multi-class support vector machine model;
the color module is used for constructing a color model for training a multi-class support vector machine model;
The attribution judging module is used for determining the criteria by which the multi-class support vector machine model judges class attribution;
the learning module is used for constructing an automatic background learning updating rule;
The pre-training module is used for acquiring pre-training images and using them to train the multi-class support vector machine model to learn the background image;
the segmentation module is used for acquiring an image to be segmented, performing foreground segmentation on the image to be segmented by using the trained multi-class support vector machine model, and outputting the segmented image.
A computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the video background elimination method described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a video background elimination method as described above.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention, and these do not affect the implementation of the invention or the utility of the patent.

Claims (8)

1. A video background elimination method, comprising the following steps:
step 1, constructing a multi-class support vector machine model;
step 2, constructing a color model for training the multi-class support vector machine model;
using the YUV color space and the HSI color space, which separate color information from luminance, a pixel is represented by the four parameters Y, U, V and H, and the following color model is obtained:
Y=0.299R+0.587G+0.114B;
U=-0.147R-0.289G+0.436B;
V=0.615R-0.515G-0.1B;
in the t-th frame image, the deviation of the chromaticities U and V at the ith pixel from the nth class of the background B is measured by a distance D(n), where B(U, n) and B(V, n) represent the values of U and V, respectively, in the nth class of background B;
If the value of D(n) falls within the threshold TH_D, which is a global threshold for all images, the class is defined as a matching class; TH_D can be adjusted to control the probability of false detection; once a matching class is found, H and Y are used as two further conditions for the final decision; when a new object appears at the corresponding position, the change in the H value is larger than a threshold TH_H; when the change in Y is less than a threshold TH_Y, the pixel is a reliable background; a new pixel is classified as background when all of the following criteria are met, and as foreground otherwise:
D(n) ≤ TH_D
H ≤ TH_H
Y ≤ TH_Y
where TH_D, TH_H and TH_Y are empirical values;
step 3, determining the criteria by which the multi-class support vector machine model judges class attribution;
separating (U, V) into U and V, and treating U and V as two independent features; let u and v be observed values of U and V; when processing u data, (u + TH_D) and (u - TH_D) are both treated as features of one class, and their boundaries (u + TH_D + c) and (u - TH_D - c) are treated as features within the background range; values outside this range are considered foreground features;
when processing v data, (v + TH_D) and (v - TH_D) are likewise treated as features of one class, and their boundaries (v + TH_D + c) and (v - TH_D - c) are treated as features within the background range;
when the pixel values u and v belong to the same class, the pixel is regarded as background; otherwise it is regarded as foreground;
step 4, constructing an automatic background learning updating rule;
Automatic background learning updating is realized through two steps of retraining data and updating the weight of each class;
The retraining of data is specifically: define (u_new, v_new) as the center of the new class, (u_c, v_c) as the center of the class nearest to the new class, D(new) as the radius of the new class, and D(C) as the radius of that nearest class; the radii along u and v are calculated independently; along the u dimension, D(new) is determined according to the following rules:
if the distance from u_new to u_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from u_new to u_c is greater than D(C) but less than 2TH_D, D(new) is set to |u_new - u_c| - D(C) - 2c;
if the distance from u_new to u_c is less than D(C), D(new) is set to D(C);
along the v dimension, D(new) is determined according to the following rules:
if the distance from v_new to v_c is greater than 2TH_D, D(new) is set to TH_D;
if the distance from v_new to v_c is greater than D(C) but less than 2TH_D, D(new) is set to |v_new - v_c| - D(C) - 2c;
if the distance from v_new to v_c is less than D(C), D(new) is set to D(C);
The updating of the weight of each class is specifically: in the t-th frame image, the weight ω_{n,i,t} of the nth class of the ith pixel's background is updated by an update function in which N_i is the number of classes in the ith pixel's background, α is the update rate, M_{n,i,t} equals 1 for the matching class and 0 for the remaining classes, and f_c(x) is the decision function;
ω_{n,i,t} is determined by f_c(x) and its value remains smaller than 1;
Step 5, obtaining pre-training images, and using them to train the multi-class support vector machine model to learn the background image;
and step 6, acquiring an image to be segmented, performing foreground segmentation on it with the trained multi-class support vector machine model, and outputting the segmented image.
2. The video background elimination method according to claim 1, wherein in step 3, c is set to 0.01.
3. The method of claim 1, wherein step 5 comprises:
Step 51, obtaining and storing the color information (u, v, y, h) of each pixel in the pre-training image;
Step 52, inputting the stored data into the multi-class support vector machine model for training, and simultaneously obtaining the weight ω_{n,i,t} of each pixel of the image;
Step 53, after training on the pre-training images is completed, prediction and classification of image data begins: after a new video image is acquired, the data of each pixel is put in turn into the trained multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground;
if it belongs to the background, it is skipped without processing;
if it belongs to the foreground, the data is stored for the current pixel, all data of the current pixel are retrained to update the model, and the pixel information newly added to the background receives new weight initialization information ω_{n+1,i,t}.
4. The method of claim 1, wherein step 6 comprises:
Step 61, after a new video image is acquired, putting the data of each pixel in turn into the multi-class support vector machine model for class attribution judgment, deciding whether the pixel belongs to the background or the foreground;
if it belongs to the background, performing automatic background learning updating according to step 4, and updating and storing the weight ω_{n,i,t} of the background pixel;
if it belongs to the foreground, marking it as foreground;
Step 62, after the background of the whole image is eliminated, enhancing the segmentation result with morphological operations;
Step 63, continuing to process newly acquired images in a loop according to steps 61 to 62 and outputting the results.
5. The method of claim 1, wherein the pre-training image is a portion of the image to be segmented.
6. A video background elimination system, comprising:
the model construction module is used for constructing a multi-class support vector machine model;
The color module is used for constructing a color model for training the multi-class support vector machine model;
the attribution judging module is used for determining the criteria by which the multi-class support vector machine model judges class attribution;
the learning module is used for constructing an automatic background learning updating rule;
The pre-training module is used for acquiring pre-training images and using them to train the multi-class support vector machine model to learn the background image;
The segmentation module is used for acquiring an image to be segmented, performing foreground segmentation on the image to be segmented by using a trained multi-class support vector machine model, and outputting the segmented image.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements a video background elimination method according to any of claims 1-5 when executing the computer program.
8. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a video background elimination method according to any of claims 1-5.
CN202410055542.2A, filed 2024-01-12: Video background elimination method, system, equipment and storage medium. Pending. CN118038310A (en)

Priority Applications (1)

Application Number: CN202410055542.2A | Priority Date: 2024-01-12 | Filing Date: 2024-01-12 | Title: Video background elimination method, system, equipment and storage medium

Publications (1)

Publication Number: CN118038310A | Publication Date: 2024-05-14

Family

ID=90986919

Country Status (1)

Country: CN | Publication: CN118038310A (en)

Citations (4)

* Cited by examiner, † Cited by third party

Patent Citations (4)

Publication number | Priority date | Publication date | Assignee | Title
CN101482923A * | 2009-01-19 | 2009-07-15 | 刘云 | Human body target detection and gender recognition method in video monitoring
WO2018130016A1 * | 2017-01-10 | 2018-07-19 | 哈尔滨工业大学深圳研究生院 (Harbin Institute of Technology Shenzhen Graduate School) | Parking detection method and device based on monitoring video
CN111161307A * | 2019-12-19 | 2020-05-15 | 深圳云天励飞技术有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.) | Image segmentation method and device, electronic equipment and storage medium
US20230005254A1 * | 2019-11-22 | 2023-01-05 | Huawei Technologies Co., Ltd. | Image detection method and apparatus, and electronic device

Non-Patent Citations (2)

Bohyung Han et al., "Density-based multifeature background subtraction with support vector machine", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, 31 May 2012, XP011436802, DOI: 10.1109/TPAMI.2011.243 *
Zhang Chi et al., "Video flame detection algorithm based on improved GMM and multi-feature fusion", Laser & Optoelectronics Progress, vol. 58, no. 4, 25 February 2021 *

Similar Documents

Publication Publication Date Title
CN108460356B (en) Face image automatic processing system based on monitoring system
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN106164980B (en) Information processing apparatus and information processing method
CN111178183B (en) Face detection method and related device
US7526101B2 (en) Tracking objects in videos with adaptive classifiers
EP3358504A1 (en) A method and system for tracking an object
Shah et al. Video background modeling: recent approaches, issues and our proposed techniques
CN110298297B (en) Flame identification method and device
US8374440B2 (en) Image processing method and apparatus
CA3077517A1 (en) Method and system for classifying an object-of-interest using an artificial neural network
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Sengar et al. Detection of moving objects based on enhancement of optical flow
CN111886600A (en) Device and method for instance level segmentation of image
EP3582181A1 (en) Method, device and system for determining whether pixel positions in an image frame belong to a background or a foreground
Song et al. Background subtraction based on Gaussian mixture models using color and depth information
Haque et al. A hybrid object detection technique from dynamic background using Gaussian mixture models
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN111881915A (en) Satellite video target intelligent detection method based on multiple prior information constraints
CN115049954A (en) Target identification method, device, electronic equipment and medium
CN110334703B (en) Ship detection and identification method in day and night image
Bakr et al. Mask R-CNN for moving shadow detection and segmentation
CN118038310A (en) Video background elimination method, system, equipment and storage medium
El Baf et al. Fuzzy foreground detection for infrared videos
Takahara et al. Making background subtraction robust to various illumination changes
Kim et al. Background modeling using adaptive properties of hybrid features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination