CN102054270B - Method and device for extracting foreground from video image - Google Patents

Method and device for extracting foreground from video image

Info

Publication number
CN102054270B
CN102054270B (Application CN2009102108310A / CN200910210831A)
Authority
CN
China
Prior art keywords
neural network
network model
background
image
color space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009102108310A
Other languages
Chinese (zh)
Other versions
CN102054270A (en)
Inventor
高辉
***
罗引
牛彩卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
University of Electronic Science and Technology of China
Original Assignee
Huawei Technologies Co Ltd
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, University of Electronic Science and Technology of China filed Critical Huawei Technologies Co Ltd
Priority to CN2009102108310A priority Critical patent/CN102054270B/en
Publication of CN102054270A publication Critical patent/CN102054270A/en
Application granted granted Critical
Publication of CN102054270B publication Critical patent/CN102054270B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method and device for extracting a foreground from a video image. The method comprises the following steps: eliminating the background of a video-frame image into which a foreground has entered by using a Gaussian model established according to a preset number of background-frame images, thus obtaining a background-eliminated video-frame image; generating the values of the input parameters of a pre-trained neural network model according to the information of the background-eliminated video-frame image and the information of the preset number of background-frame images; generating a corresponding output value of the neural network model according to the input-parameter values and the pre-trained neural network model; and detecting, according to the relationship between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image. The method of the embodiment improves the accuracy of shadow detection and the resistance to light interference, and reduces the processing complexity, thereby meeting the application requirements of real-time scenes.

Description

Method and device for extracting foreground from a video image
Technical field
The present invention relates to computer graphics and image processing, and in particular to a method and device for extracting foreground from a video image.
Background technology
Video foreground/background separation refers to separating the foreground objects in a video from the background. The technique is widely used; the most typical scenario is extracting the person in a video call and compositing the separated foreground onto another background image, which provides both privacy protection and entertainment effects in video communication. Foreground/background separation has been studied extensively, but two characteristics of shadows make shadow detection a hard problem during separation: first, a shadow, like the object itself, differs significantly from the background; second, in most cases a shadow is adjacent to the object that casts it and moves with the same pattern, so the two are usually merged into one region during segmentation. Separating shadows with an effective method therefore has both research significance and practical value.
The prior-art method for foreground/background separation and shadow handling is as follows:
First, the foreground is extracted by background differencing; a Gaussian mixture model is commonly used for foreground/background separation.
Second, shadow regions are detected and removed using the luminance information of the image: for the same region with and without shadow coverage, the chrominance of a pixel does not change significantly, whereas its luminance changes greatly, that is, a shadow-covered region is much darker than the same region without shadow. It follows that, within the foreground area, a pixel whose luminance is not lower than that of the corresponding background pixel is not a shadow pixel. Specifically, let x = (r, g, b) be a pixel of the current frame and X = (R, G, B) the corresponding background pixel. The RGB-to-luminance conversion is
I = 0.3 × R + 0.59 × G + 0.11 × B    (1)
where I denotes the luminance value. Let I_x denote the luminance of the current-frame pixel and I_X the luminance of the background pixel, and let M_shadow be the set of shadow pixels, initialized to contain pixel x. The luminance of each pixel is computed from its R, G, B color channels by formula (1), and the membership of pixel x is then decided: if I_x > I_X, pixel x is removed from M_shadow. The prior art suffers from low foreground-extraction accuracy and poor real-time performance.
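As a hedged illustration of this prior-art luminance test (not the claimed method), the following Python sketch applies formula (1) and removes foreground pixels that are brighter than the background from the shadow candidate set; the array layout and function names are assumptions for illustration only.

```python
import numpy as np

def luminance(img_rgb):
    # Formula (1): I = 0.3*R + 0.59*G + 0.11*B, applied per pixel.
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    return 0.3 * r + 0.59 * g + 0.11 * b

def prior_art_shadow_mask(current_rgb, background_rgb, foreground_mask):
    """Return a boolean mask of candidate shadow pixels inside the foreground.

    A foreground pixel stays in the shadow set only if it is not brighter
    than the corresponding background pixel (shadows darken the background).
    """
    i_x = luminance(current_rgb.astype(np.float32))
    i_bg = luminance(background_rgb.astype(np.float32))
    shadow = foreground_mask.copy()
    shadow[i_x > i_bg] = False          # brighter than background -> not a shadow pixel
    return shadow
```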
Summary of the invention
In view of this, an object of the present invention is to provide a method and device for extracting foreground from a video image, so as to overcome the low shadow-detection accuracy and poor real-time performance of the prior art and to extract an ideal foreground area from a video image.
In one aspect, an embodiment of the present invention provides a method for extracting foreground from a video image, the method comprising: eliminating, by using a Gaussian model established according to a predetermined number of background-frame images, the background of a video-frame image into which a foreground has entered, to obtain a background-eliminated video-frame image; generating values of the input parameters of a pre-trained neural network model according to the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images; generating a corresponding output value according to the input-parameter values and the trained neural network model; and detecting, according to the relation between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image.
In another aspect, an embodiment of the present invention provides a device for extracting foreground from a video image, the device comprising: a background elimination unit, configured to eliminate, by using a Gaussian model established according to a predetermined number of background-frame images, the background of a video-frame image into which a foreground has entered, to obtain a background-eliminated video-frame image; an input-parameter generation unit, configured to generate values of the input parameters of a pre-trained neural network model according to the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images; an output-value generation unit, configured to generate a corresponding output value according to the input-parameter values and the trained neural network model; and a shadow and noise processing unit, configured to detect, according to the relation between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image.
With the method and device for extracting foreground from a video image according to the embodiments of the present invention, the values of the input parameters of the pre-trained neural network model are generated from the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images, and the shadow points and illumination noise points contained in the background-eliminated video-frame image are processed according to the relation between the input-parameter values of the neural network model and its output value. Shadow points and illumination noise points contained in the video image can thus be detected or eliminated effectively and quickly, which facilitates extracting an ideal foreground area from the video image. The method and device of the embodiments improve the accuracy of shadow detection and the resistance to light interference, and the neural network model of the embodiments has low computational complexity at run time, so the application requirements of real-time scenes can be met.
Description of drawings
Fig. 1 is an overall flowchart of a method for extracting foreground from a video image according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of a method for extracting foreground from a video image according to an embodiment of the present invention;
Fig. 3 is a structural representation of the BP neural network model of an embodiment of the present invention;
Fig. 4 is a flowchart of a method for establishing and training the neural network model according to an embodiment of the present invention;
Fig. 5 is a flowchart of a method for acquiring training data of the neural network model according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the training process of the BP neural network according to an embodiment of the present invention;
Fig. 7 is a functional block diagram of a device for extracting foreground from a video image according to an embodiment of the present invention;
Fig. 8 is a refined functional block diagram of the input-parameter generation unit according to an embodiment of the present invention.
Embodiment
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical schemes in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.
In the prior art, when the light source changes, a large number of misjudged points are easily produced, so the accuracy of foreground extraction drops sharply; moreover, the prior-art algorithm for judging shadow points has high complexity, is unsuitable for applications in real-time scenes, and has poor real-time performance.
An embodiment of the present invention provides a method for extracting foreground from a video image.
Fig. 1 is an overall flowchart of a method for extracting foreground from a video image according to an embodiment of the present invention. As shown in Fig. 1, the method of the embodiment comprises:
S101, eliminating, by using a Gaussian model established according to a predetermined number of background-frame images, the background of a video-frame image into which a foreground has entered, to obtain a background-eliminated video-frame image;
S102, generating values of the input parameters of a pre-trained neural network model according to the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images;
S103, generating a corresponding output value according to the input-parameter values and the trained neural network model;
S104, detecting, according to the relation between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image.
An embodiment in practical application is described below, showing the detailed process of using the method of Fig. 1 to process, in real time, the shadow points and illumination noise points contained in an input video image.
Fig. 2 is a detailed flowchart of a method for extracting foreground from a video image according to an embodiment of the present invention. Fig. 2 shows the artificial-neural-network-based shadow-detection and light-interference-resistance method of the embodiment. As shown in Fig. 2, the method comprises:
S201, inputting a video sequence composed of a plurality of video frames;
S202, judging whether the frame number of the input video frame is within the background-modeling frame count; if so, executing step S203, i.e., establishing a Gaussian model using the input video frames; if not, executing step S204, i.e., separating the background of the video frame using the Gaussian model;
Optionally, the range of video frames used for background modeling may be set to 1-100 frames: if the frame number is within the first 100 frames, step S203 is executed, that is, the first 100 frames of the input video sequence are used to establish the Gaussian model; otherwise step S204 is executed, that is, background separation is performed on the video frames starting from frame 101. The method of establishing the Gaussian model from these 100 frames is described in detail in a later embodiment and is not expanded here.
S203, establishing the Gaussian model;
S204, separating the background of the video-frame image using the Gaussian model established in advance;
Specifically, the background of the video-frame image is eliminated to obtain a background-eliminated video-frame image, which contains the foreground image, illumination noise points and shadow points.
S205, performing light-interference-resistance processing on the video-frame image obtained in S204 using the pre-trained light-interference-resistance neural network model, that is, detecting and removing the illumination noise points contained in the video-frame image to obtain a video-frame image from which the illumination noise points have been eliminated; after this processing, the video-frame image contains only the foreground image and shadow points;
S206, performing shadow-detection processing on the video-frame image obtained in S205 using the pre-trained shadow-elimination neural network model, that is, detecting and removing the shadow points contained in the video-frame image obtained in S205 to obtain a video-frame image from which the shadow points have been eliminated; this video-frame image contains only the foreground image;
S207, outputting the foreground image from which the shadow points and illumination noise points have been eliminated.
It should be noted that the light-interference-resistance neural network model and the shadow-elimination neural network model are established and trained before the foreground-extraction operation is performed on the input video frames. By the method shown in Fig. 2, the illumination noise points, shadow points and background image contained in the video-frame image into which a foreground has entered are eliminated, so a foreground image of better quality is extracted. The above is only one embodiment of the present invention; in other embodiments, the order of eliminating illumination noise points with the light-interference-resistance neural network model and eliminating shadow points with the shadow-elimination neural network model may be exchanged, and the embodiment of the present invention does not limit the processing order of these two neural network models.
The embodiment of the present invention uses neural network models to process the shadow points and illumination noise points contained in the video image, and detects and eliminates the shadow points and illumination noise points contained in the background-eliminated video-frame image according to the relation between the information of that image and the output value of the neural network model, thereby improving the accuracy of shadow detection and the resistance to light interference while meeting the application requirements of real-time scenes; at run time the neural network model of the embodiment has low computational complexity.
The embodiment of the present invention also provides another method for extracting foreground from a video image. By combining the advantages of principal component analysis (PCA) and the BP (error back-propagation, Back Propagation) neural network, the method removes the shadows produced by the foreground in the image and the light noise points produced by light-source changes, thereby extracting the ideal foreground area in the image.
Principal component analysis (PCA), as used in the embodiments of the present invention, relies on the idea of dimensionality reduction to convert many indicators into a few comprehensive indicators. It was first introduced by Pearson in 1901 in biological theory research and has since been widely used in multivariate statistics. The essence of PCA is a rotation of the coordinates of an n-dimensional space that does not change the sample data structure; the principal components obtained are linear combinations of the original variables, are pairwise uncorrelated, and reflect the information contained in the original variables to the greatest extent, so that choosing the K more important principal components by some criterion simplifies the original multidimensional problem. In the embodiments of the present invention, the non-orthogonal RGB color space is transformed into an orthogonal color space by PCA; this orthogonal color space obtained through PCA processing is denoted the SP color space.
Fig. 3 is a structural representation of the BP neural network model of an embodiment of the present invention. As shown in Fig. 3, the input layer of the BP neural network model has 6 input parameters: R', G', B', Y'-Y, U'-U, V'-V. The values of R', G', B' are the SP coordinate values obtained after the pixels of the background image are converted from the non-orthogonal RGB color space into the orthogonal SP color space. Y'-Y, U'-U, V'-V are the other 3 input parameters, where Y, U, V are the YUV coordinate values obtained after the pixels of the background image are converted from the RGB color space into the YUV color space, and Y', U', V' are the YUV coordinate values obtained after the corresponding pixels of the current-frame image are converted from the RGB color space into the YUV color space. The BP neural network model has 3 expected output parameters: R'', G'', B'', whose values are the SP coordinate values obtained after the pixels of the current-frame image are converted from the RGB color space into the SP color space.
In the RGB color space, the three color components R, G, B are correlated to a certain extent. If the BP neural network model were established directly on this RGB color space, the lack of orthogonality would cause the prediction accuracy of the model to decline. Therefore, the embodiment of the present invention first uses principal component analysis to eliminate the correlation between the R, G, B components and establish an orthogonal color space, and then obtains training data on this orthogonal color space to train the BP neural network model. The specific method for transforming from the RGB color space to the SP color space based on principal component analysis is as follows:
For example, several pictures may be taken by a camera or other imaging device, obtaining the R, G, B component values of n pixels in total.
1. Let x_ij be the j-th color component of the i-th point, where i = 1, 2, ..., n and j = 1, 2, 3. Standardize the raw data:
z_ij = (x_ij - x̄_j) / sqrt(var(x_j))    (2)
where x̄_j = Σ_{i=1..n} x_ij / n is the mean of the j-th component over the n points, and var(x_j) = (1/(n-1)) Σ_{i=1..n} (x_ij - x̄_j)² is its variance.
2. Compute the autocorrelation matrix R = (r_ij)_{3×3}, where
r_ij = (1/(n-1)) Σ_{k=1..n} (z_ki - z̄_i)(z_kj - z̄_j), i = 1, 2, 3; j = 1, 2, 3.    (3)
3. Let |R - λI| = 0, and solve for the characteristic roots λ_1, λ_2, λ_3 of the autocorrelation matrix R and the corresponding eigenvectors e_1, e_2, e_3.
4. Compute the contribution rate (weight) of each principal-component characteristic root:
C_j = λ_j / Σ_{k=1..3} λ_k    (4)
The eigenvectors (e_1, e_2, e_3) obtained by principal component analysis thus form the new orthogonal color space SP.
5. The coordinate-transform formula that maps a pixel from the original RGB space to the SP space is:
(R', G', B') = (R, G, B)(e_1, e_2, e_3)^T    (5)
Optionally, the aforementioned conversion from the RGB color space to the YUV color space may be performed according to the following formula:
[Y, U, V]^T = [16, 128, 128]^T + M [R, G, B]^T    (6)
where
M = |  65.481  128.533   24.966 |
    | -37.797  -74.204  112.000 |
    | 112.000  -93.786  -18.214 |
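A minimal sketch of formula (6), under the assumption that R, G, B are normalized to [0, 1] as in the usual BT.601-style definition (the normalization is an assumption, not stated in the text):

```python
import numpy as np

# Matrix and offset of formula (6), copied from the text above.
_YUV_MATRIX = np.array([[ 65.481, 128.533,  24.966],
                        [-37.797, -74.204, 112.000],
                        [112.000, -93.786, -18.214]])
_YUV_OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_yuv(img_rgb):
    """img_rgb: (h, w, 3) array with R, G, B assumed normalized to [0, 1]."""
    flat = img_rgb.reshape(-1, 3).astype(np.float64)
    yuv = flat @ _YUV_MATRIX.T + _YUV_OFFSET
    return yuv.reshape(img_rgb.shape)
```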
Since the light-interference-resistance neural network model and the shadow-elimination neural network model must be established and trained in advance, before the image is processed in real time, the establishment and training process of these two models is described in detail below.
Fig. 4 is a flowchart of a method for establishing and training the neural network model according to an embodiment of the present invention. As shown in Fig. 4, the method comprises:
S401, determining the input parameters and expected output parameters of the neural network model;
The neural network model includes the shadow-elimination neural network model and the light-interference-resistance neural network model; in this embodiment both are BP neural network models. The process specifically comprises:
determining the input parameters of the shadow-elimination neural network model, which include: the SP coordinate values, in the orthogonal SP color space, of the pixels of the background image corresponding to the shadow region of the image into which a foreground has entered; and the YUV coordinate differences, in the YUV color space, between the shadow-region pixels of the image into which a foreground has entered and the corresponding pixels of the background image; the expected output parameters of the shadow-elimination neural network model include: the SP coordinate values, in the orthogonal SP color space, of the shadow-region pixels of the image into which a foreground has entered;
determining the input parameters of the light-interference-resistance neural network model, which include: the SP coordinate values, in the orthogonal SP color space, of the pixels of the reference image generated under the reference light intensity; and the YUV coordinate differences, in the YUV color space, between the pixels of the current image generated under another light intensity and the corresponding pixels of the reference image; the expected output parameters of the light-interference-resistance neural network model include: the SP coordinate values of the pixels of the current image in the orthogonal SP color space.
S402, determining the hidden-layer neuron transfer function, the output-layer neuron transfer function, the number of hidden-layer nodes, and the training method;
Specifically, the process may comprise: determining that the hidden-layer neuron transfer function of the neural network model is a hyperbolic-tangent sigmoid function, for example the tansig() function; and determining that the output-layer neuron transfer function is a pure linear transfer function, for example the purelin() function. The purelin() and tansig() functions may be interchanged.
Optionally, the hidden-layer and output-layer transfer functions of the neural network model may also be a log-sigmoid function, for example the logsig() function; the purelin(), logsig() and tansig() functions may replace one another. Preferably, the hidden-layer and output-layer transfer functions take the tansig/purelin combination. The hidden-layer neuron transfer function establishes the connection between the input layer and the hidden layer of the neural network model, and the output-layer neuron transfer function establishes the connection between the hidden layer and the output layer.
The number of hidden-layer nodes of the neural network model, i.e., the number of hidden-layer neurons, is determined according to the number of input parameters. Considering the complexity of processing images, the embodiment of the present invention adopts a 2-layer neural network structure. The number of hidden-layer nodes directly affects the ability of the network to map the actual complex problem: too many hidden-layer nodes make the learning time too long without necessarily minimizing the error, and also affect the fault tolerance of the network; too few hidden-layer nodes trap learning in a local minimum, so reliable results cannot be obtained. The setting of the number of hidden-layer nodes therefore plays a crucial role in the convergence speed and error during training. The embodiment of the present invention adopts the "2m+1" method proposed by Hecht-Nielsen, where m is the number of input-layer nodes of the BP neural network model. Since the embodiment selects 6 parameters as the input of the BP neural network model, the number of hidden-layer nodes is 13.
The training method of the neural network model is determined to be the scaled conjugate gradient method (trainscg). Optional methods also include the traingda variable-learning-rate gradient descent algorithm and the traingdx variable-learning-rate momentum gradient descent algorithm; for details see "Application of the conjugate gradient method in BP networks", Zhou Jianhua, Computer Engineering and Applications, 1999, pp. 17-18. Adopting these training methods helps reduce the error of the neural network model during training and makes the model reach stability as early as possible.
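A minimal sketch, under the assumption of a 6-13-3 fully connected network with tansig hidden units and purelin outputs as described above; the weight initialization and forward pass below are illustrative only and do not reproduce the training procedure itself (which uses the scaled conjugate gradient method).

```python
import numpy as np

rng = np.random.default_rng(0)

# 6 inputs (R', G', B', Y'-Y, U'-U, V'-V) -> 13 hidden nodes (tansig) -> 3 outputs (purelin).
W1, b1 = rng.standard_normal((13, 6)) * 0.1, np.zeros(13)
W2, b2 = rng.standard_normal((3, 13)) * 0.1, np.zeros(3)

def tansig(x):
    return np.tanh(x)                      # hyperbolic-tangent sigmoid transfer function

def forward(x6):
    """x6: length-6 input vector; returns the 3 predicted SP components."""
    hidden = tansig(W1 @ x6 + b1)
    return W2 @ hidden + b2                # purelin: identity transfer on the output layer
```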
It should be noted that these are only some preferred implementations of the embodiments of the present invention, not essential ones.
S403, acquiring the training data of the neural network model;
Fig. 5 is a flowchart of a method for acquiring the training data of the neural network model according to an embodiment of the present invention. As shown in Fig. 5, the method comprises:
S501, collecting images used for training the neural network model;
The process specifically comprises:
For training the light-interference-resistance neural network model, m different scenes may be chosen. Each scene is selected in turn, the light-source intensity is adjusted, and k images of the scene are taken under different light intensities. These k images belong to the same scene, and the difference between their pixels at the same position is mainly a change of brightness. One image is taken arbitrarily from each scene, and the RGB coordinate values of each pixel in the image are transformed according to formula (5). The pictures of the different scene groups after the coordinate transform are denoted P_11, P_12, ..., P_mk.
For training the shadow-elimination neural network model, images are collected as follows: for the same scene, an image is collected when no foreground has entered; after a foreground enters, a shadowed image is formed, this shadowed image is collected, and the shadow region is marked by manual intervention. The data collection may be carried out in many scenes to ensure that the neural network model obtained from the training data has wider adaptability.
S502, performing the PCA-based orthogonal SP color-space transform and the YUV color-space transform on the images to generate the training data of the neural network model;
For training the light-interference-resistance neural network model, for the i-th scene group (i = 1, 2, ..., m): compute, according to formula (6), the YUV coordinate values of all pixels in image P_i1 (the reference image) after conversion from the RGB color space into the YUV color space, denoted Y, U, V; compute, according to formula (6), the YUV coordinate values of all pixels in image P_ij (the current-frame image, j = 2, 3, ..., k), denoted Y', U', V'; then compute the differences Y'-Y, U'-U, V'-V of the YUV coordinate values of the corresponding pixels of P_ij and P_i1; compute, according to formula (5), the SP coordinate values of all pixels in image P_i1 (the reference image) after conversion from the RGB color space into the SP color space, denoted R', G', B'; compute, according to formula (5), the SP coordinate values of all pixels in image P_ij (j = 2, 3, ..., k), denoted R'', G'', B''; finally, take R', G', B', Y'-Y, U'-U, V'-V as the input parameters of the light-interference-resistance neural network model and R'', G'', B'' as its expected output parameters, so as to train the model.
Regarding the choice of the reference image, the embodiment of the present invention is not limited to the first image collected under the first illumination; other choices are possible, for example the image formed by averaging several images taken under the same illumination may also serve as the reference image.
For the shadow-elimination neural network model, m groups of different scenes are chosen and each scene is selected in turn. Under a fixed light source, an image of the scene is taken; when a foreground enters, it blocks the light and produces a shadow. The shadow region ZONE_S is collected manually, together with the corresponding region ZONE_B in the scene image without foreground. The pixels in ZONE_B are transformed by formula (5) to generate three of the input parameters of the shadow-elimination BP neural network: R', G', B'. The YUV coordinate values of the pixels in ZONE_S obtained from the original RGB color space via formula (6) are denoted Y', U', V'; those of the pixels in ZONE_B are denoted Y, U, V. The differences of the YUV coordinate values of corresponding pixels in ZONE_S and ZONE_B, denoted Y'-Y, U'-U, V'-V, are taken as the other three input parameters of the shadow-elimination BP neural network; the input of the shadow-elimination BP neural network is thus again R', G', B', Y'-Y, U'-U, V'-V. The corresponding pixels in ZONE_S are transformed by formula (5) to obtain R'', G'', B'', which are taken as the expected output parameters for training the shadow-elimination neural network model.
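The following Python sketch assembles one set of training pairs for the shadow-elimination network from a background image and a shadowed image with a marked shadow mask, reusing the rgb_to_sp and rgb_to_yuv helpers sketched above (assumed to be in scope); the function name and mask format are assumptions for illustration.

```python
import numpy as np

def build_shadow_training_pairs(background_rgb, shadowed_rgb, shadow_mask, eigvecs):
    """Return (inputs, targets): inputs are (N, 6) rows (R', G', B', Y'-Y, U'-U, V'-V),
    targets are (N, 3) rows (R'', G'', B'') for the manually marked shadow pixels."""
    sp_bg = rgb_to_sp(background_rgb, eigvecs)       # ZONE_B in the SP space
    sp_sh = rgb_to_sp(shadowed_rgb, eigvecs)         # ZONE_S in the SP space
    yuv_bg = rgb_to_yuv(background_rgb)
    yuv_sh = rgb_to_yuv(shadowed_rgb)

    idx = shadow_mask.astype(bool)                   # manually marked shadow region
    inputs = np.hstack([sp_bg[idx], yuv_sh[idx] - yuv_bg[idx]])
    targets = sp_sh[idx]
    return inputs, targets
```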
S503, performing uniform-sampling processing on the images.
Specifically: to effectively prevent the neural network model from over-adapting locally and impairing its accuracy, uniform sampling needs to be considered when collecting training samples. From the characteristic roots λ_1, λ_2, λ_3 obtained in the PCA-based color-space transform described above, the weight C_j (j = 1, 2, 3) of each component in the SP color-coordinate space is known (see formula (4)). Since the components have different weights, the fineness of homogenization also differs. Let
W_i = C_i × 256, i = 1, 2, 3
where W_i is the sampling fineness of the corresponding color component in the SP space. Suppose the sample space is a cube V of side length 256, divided into small cuboids of length 256/W_1, width 256/W_2 and height 256/W_3. Uniform sampling means mapping the R', G', B' values of each point set P_j (j = 1, ..., m) to coordinates in the cube V: if a subset of the point set P_j can be chosen such that every small cuboid of the cube V contains one and only one sampled point, the sampling is considered uniform; otherwise picture-taking and sampling continue.
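A hedged sketch of this uniformity check: points are binned into the W_1 × W_2 × W_3 grid of cells, and the sampling is accepted once every cell is occupied, keeping one point per cell; reading "contains one and only one sampled point" as "keep exactly one point per occupied cell" is an interpretation of the text, and the value range [0, 256) is assumed.

```python
import numpy as np

def uniform_subsample(sp_points, contrib):
    """sp_points: (N, 3) R', G', B' values in the SP space; contrib: weights C_1..C_3.

    Bins the points into a W_1 x W_2 x W_3 grid (W_i = C_i * 256).  If every cell
    is occupied, returns one point per cell (a uniform subset); otherwise returns
    None, meaning more pictures must be taken.
    """
    w = np.maximum(1, np.round(np.asarray(contrib) * 256)).astype(int)   # W_i = C_i * 256
    cell = np.floor(sp_points / (256.0 / w)).astype(int)
    cell = np.clip(cell, 0, w - 1)
    flat = np.ravel_multi_index(cell.T, w)
    _, first_idx = np.unique(flat, return_index=True)
    if len(first_idx) < int(np.prod(w)):
        return None                                   # some cells are still empty
    return sp_points[first_idx]
```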
S404, generating the values of the input parameters and of the expected output parameters of the neural network model according to the training data;
For example, for training the light-interference-resistance neural network model, take the first scene (i = 1) as an example: P_11 is the reference image and contains a plurality of pixels, one of which, Pixel1, is used for the description. P_1j (j = 2, 3, ..., k) is the current-frame image; here P_12 is taken as the current-frame image, which also contains a plurality of corresponding pixels, and Pixel1', the pixel at the position corresponding to Pixel1, is used for the description.
The SP coordinate values R', G', B' of Pixel1 are taken as the values of three of the input parameters of the light-interference-resistance neural network model; the YUV coordinate differences Y'-Y, U'-U, V'-V of Pixel1' and Pixel1 are taken as the values of the other three input parameters; and the SP coordinate values R'', G'', B'' of Pixel1' are taken as the values of the expected output parameters of the light-interference-resistance neural network model.
S405, training the neural network model according to the hidden-layer neuron transfer function, the output-layer neuron transfer function, the number of hidden-layer nodes, the training method, and the values of the input parameters and expected output parameters of the neural network model.
For example, the coordinate values (R', G', B', Y'-Y, U'-U, V'-V) are input into the neural network model; the input values are then computed according to the hidden-layer neuron transfer function tansig(), the output-layer neuron transfer function purelin(), the 13 hidden-layer nodes, and the scaled conjugate gradient method determined in the above steps, giving an output value of the neural network model. This output value is compared with the value of the expected output parameters, the error between the two is back-propagated through the output layer, the hidden layer and the input layer in turn, and the weights of each layer are revised or adjusted, which completes one training pass of the neural network model. Through a large number of similar training passes, an effective neural network model can be obtained. Since the detailed training process is prior art, it is not repeated here.
Optionally, the flow shown in Fig. 4 may further include:
S406, setting an error-threshold range and/or a training-step threshold range for the neural network model; the training process ends when the error of the neural network model is within the error-threshold range and/or when the number of training steps of the neural network model is within the training-step threshold range.
In the process of training the neural network model, the actual output value and the expected output value of the model are compared to obtain the error value of the model, and this error value is then compared with the preset error threshold to judge whether the model has met the requirement or reached stability.
The error-threshold range of the neural network model may be 0.00001-20, and the training-step range may be 10^6-10^7 steps. Optionally, in the training process a preferred training error value is 20 and a preferred number of training steps is 1,000,000. During training, if the error of the neural network model becomes less than the set error of 20, or the number of training steps reaches the set 1,000,000 steps, training stops.
After a period of computation the neural network gradually stabilizes; at this point the network is considered trained and is saved. See Fig. 6, a schematic diagram of the training process of the BP neural network of the embodiment of the present invention: the vertical axis represents the error value of the training model and the horizontal axis represents the number of training steps. The trained network is saved and the training process of the BP neural network ends. The light-interference-resistance BP neural network model is denoted NET_Light and is used as the detection model for eliminating illumination noise; the shadow-elimination neural network model is denoted NET_Shadow and is used as the detection model when eliminating shadows.
After the light-interference-resistance neural network model and the shadow-elimination neural network model have been trained, the artificial-neural-network-based shadow-detection and light-interference-resistance method of the embodiment of the present invention is executed to extract a better foreground from the video image. The real-time processing flow specifically comprises:
1. Performing Gaussian-model modeling;
This process establishes a Gaussian-distribution model for each point in the background frames. Because the dynamic environment is complicated, some external factors easily cause the background to change: illumination changes, background perturbation, and moving targets entering can all cause background change, and these changes are noise points. It is assumed that the noise at each point of the video image is statistically independent, that the noise of each point in the current frame is independent of its position, and that this noise does not depend on the noise of the same point in the previous frame.
Based on these assumptions, a statistical model of the background image can be obtained by measuring the noise distribution of the video image, and statistical analysis is applied to remove noise. The background model is established on the premise that the background changes slowly. To eliminate noise, 2n frames of background images (i.e., frames without any foreground) are collected in advance from the input video sequence in the RGB color space, and the first n frames serve as the data sample for establishing the background model. There is a certain correlation between color values and noise; in general, dark pixels contain more noise. That is, although in the background frames the value of each pixel satisfies a Gaussian distribution on each color channel, under the influence of noise the pixel values do not necessarily follow a Gaussian distribution on every color channel. Let x be an arbitrary point in the image matrix; each point has three color components R, G, B. Taking the R component as an example, let
μ_x^1, μ_x^2, ..., μ_x^n
be the R values of a certain pixel X of the background image over n consecutive frames, and let
δ_x^1, δ_x^2, ..., δ_x^n
be the noise of this point X over the n consecutive frames. Then for every i ∈ [1..n], μ_x^i = μ_x + δ_x^i holds, where μ_x is the true value of this point.
Suppose the random variable δ_x describing the noise has mean 0, i.e.
E(δ_x) = E({δ_x^1, δ_x^2, ..., δ_x^n, ...}) = 0;
According to the central limit theorem, with enough frame samples the value at this point approximately follows a Gaussian distribution, i.e.
(1/n) Σ_{i=1..n} μ_x^i ~ N(μ̄_x, σ_x²)
where μ̄_x = (1/n) Σ_{i=1..n} μ_x^i ≈ μ_x is the mathematical expectation of this statistic and σ_x² = (1/(n-1)) Σ_{i=1..n} (μ_x^i - μ̄_x)² is its variance. For each point, the mean μ̄_x and variance σ_x² are computed from these n frames of images.
After the mean-variance matrix model has been established from the first n frames of background images, frames n+1 to 2n of the background images are used as training samples to train the threshold N_x of each point of the background model. Taking one of these background frames as an example, for any point X let μ_x^i be the R value of point X of the background image in frame i, and let N_x be the threshold parameter to be trained. From |μ_x^i - μ̄_x| = N_x × σ_x, i ∈ [n+1, ..., 2n], the N_x of pixel X is obtained; since the training samples comprise n background frames, pixel X has n values of N_x, and the maximum of these n values is taken as the final threshold N_max. Since each pixel has a threshold, a threshold matrix N is obtained. The background modeling is now complete.
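A minimal sketch of this per-pixel modeling under the assumptions above: the first n background frames give the per-pixel mean and standard deviation, and frames n+1 to 2n give the per-pixel threshold matrix N as the maximum normalized deviation; the array shapes and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def build_background_model(frames):
    """frames: (2n, h, w, 3) array of background frames (no foreground present).

    Returns (mean, std, threshold) per pixel and per channel.  The first n frames
    give mean and std; frames n+1..2n train the threshold matrix N as the maximum
    of |mu_x^i - mean| / std over the training frames.
    """
    frames = frames.astype(np.float64)
    n = frames.shape[0] // 2
    mean = frames[:n].mean(axis=0)
    std = frames[:n].std(axis=0, ddof=1) + 1e-6         # avoid division by zero
    deviation = np.abs(frames[n:] - mean) / std         # N_x candidate per training frame
    threshold = deviation.max(axis=0)                   # final threshold matrix N
    return mean, std, threshold
```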
2. Performing background separation using the Gaussian model;
When a foreground enters, the video-frame sequence into which the foreground has entered is collected; for every frame image, the above Gaussian model is used for judgment, and the foreground area is detected by background differencing.
Each input video frame is compared with the established background model. If a pixel feature, pixel-region feature or other feature differs to a certain degree from that of the corresponding position of the background image (usually the mean and variance are used for the comparison), the pixels or pixel regions at these positions in the new video frame constitute the foreground; otherwise they are background.
The concrete steps are as follows: for any point X of the i-th frame image in the input video stream, let μ_x be its color intensity and N_x the value of the threshold matrix N at the position of point X. The background-differencing detection rule is:
if on two or more color components the absolute difference |μ_x - μ̄_x| exceeds the product of the threshold N_x and σ_x, the pixel is foreground; otherwise it is background and is eliminated from the current image. The image after the background area has been eliminated is denoted P_Gauss.
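A hedged sketch of this decision rule using the model built above; the "two or more channels" vote follows the text, while the helper name and array layout are assumptions.

```python
import numpy as np

def subtract_background(frame, mean, std, threshold):
    """Return a boolean foreground mask for one video frame.

    A pixel is foreground when, on at least two of its three color channels,
    |mu_x - mean| exceeds threshold * std (the trained per-pixel rule above).
    """
    deviation = np.abs(frame.astype(np.float64) - mean)
    exceeds = deviation > threshold * std               # per-channel test
    return exceeds.sum(axis=-1) >= 2                     # vote over the R, G, B channels
```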
3. Eliminating shadows using the shadow-elimination neural network model;
After the background is separated, P_Gauss is the remainder after background elimination; at this point P_Gauss can be divided into three parts: the object part, the noise points produced by light-source changes, and the shadow produced by the foreground.
For any pixel P_f(x, y) in P_Gauss, the SP color-space transform of formula (5) is applied, giving P_F = (R_f, G_f, B_f). For the mean-value image P_Mean formed from the predetermined number of background-frame images (P_Mean is formed by averaging the pixel values of the corresponding pixels of the predetermined number of background frames), the pixel P_b(x, y) at this position is also transformed by formula (5), giving P_B = (R, G, B). The YUV coordinate differences between this foreground point of P_Gauss and the corresponding pixel of P_Mean are computed as Y'-Y_B, U'-U_B, V'-V_B. With L = (R, G, B, Y'-Y_B, U'-U_B, V'-V_B) as the input value, the trained shadow-elimination BP neural network model computes an output denoted R'_o, G'_o, B'_o. The predicted output values R'_o, G'_o, B'_o are then compared with the SP color-space coordinate values P_F = (R_f, G_f, B_f) of the foreground point of P_Gauss.
When |R'_o - R_f| < R*T_Shadow, and |G'_o - G_f| < G*T_Shadow, and |B'_o - B_f| < B*T_Shadow are satisfied, P(x, y) is detected as a shadow point. The shadow points are then removed, and the image obtained after shadow elimination is denoted P_Shadow.
4. Eliminating illumination noise using the light-interference-resistance neural network model.
After the shadow-detection processing of the shadow-elimination BP neural network, the region contained in P_Shadow still has two parts: the object part and the noise points produced by light-source changes. Because the external environment is complex and the light source often changes, some noise points caused by light-source changes are produced.
For the pixels in P_Shadow, the SP color-space transform of formula (5) is applied to obtain the SP coordinate values of their color components, denoted P_F = (R_f, G_f, B_f). The color components of the pixel P_b(x, y) at the corresponding position in P_Mean are denoted P_B = (R, G, B). The YUV color-space coordinate differences between the pixels in P_Shadow and the corresponding pixels of the mean-value image P_Mean formed from the predetermined number of background frames are computed as Y'-Y_B, U'-U_B, V'-V_B.
With L = (R, G, B, Y'-Y_B, U'-U_B, V'-V_B) as the input value, the trained light-interference-resistance BP neural network model computes an output denoted R'_o, G'_o, B'_o. The predicted output values R'_o, G'_o, B'_o are compared with the SP color-space coordinate values P_F = (R_f, G_f, B_f) of P_Shadow.
If |R'_o - R_f| < R*T_Light, and |G'_o - G_f| < G*T_Light, and |B'_o - B_f| < B*T_Light are satisfied, P(x, y) is detected as a noise point produced by light-source changes. The detected illumination noise points are then eliminated, and the target foreground object to be extracted is obtained.
A large number of experiments show that good detection results are obtained when the thresholds are set to T_Shadow = 10% and T_Light = 25%. It should be noted that the embodiment of the present invention does not limit the above processing order; the illumination-noise elimination may also be performed first, followed by the shadow elimination.
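A minimal sketch of the two threshold tests above (shared code, since they differ only in the model and the coefficient T); the pixel-wise predict callback and the per-channel absolute value on the background term are illustrative assumptions, while T_Shadow = 0.10 and T_Light = 0.25 follow the values reported in the text.

```python
import numpy as np

def detect_points(sp_fg, sp_bg, yuv_diff, predict, t):
    """Mark pixels whose predicted SP value is close to the observed one.

    sp_fg:    (N, 3) SP values of the candidate foreground pixels (R_f, G_f, B_f)
    sp_bg:    (N, 3) SP values of the mean background image (R, G, B)
    yuv_diff: (N, 3) YUV differences (Y'-Y_B, U'-U_B, V'-V_B)
    predict:  trained network, maps a 6-vector to the predicted (R'_o, G'_o, B'_o)
    t:        relative threshold, e.g. 0.10 for shadows, 0.25 for light noise
    """
    inputs = np.hstack([sp_bg, yuv_diff])
    predicted = np.array([predict(v) for v in inputs])
    close = np.abs(predicted - sp_fg) < np.abs(sp_bg) * t   # per-channel test
    return close.all(axis=1)       # True -> shadow point / illumination noise point
```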
The inventors have found through many experiments that, compared with the prior-art method, the embodiment of the present invention can effectively remove shadows and the noise points produced by light-source changes, and the extraction result is better. In addition, once the neural network model of the embodiment has been established (off-line), its computational complexity in practical application is low, only linear time O(n), so illumination noise and shadows can be eliminated in real time.
The embodiment of the present invention eliminates shadows and illumination noise by combining principal component analysis and the BP neural network. First, principal component analysis is used to eliminate the correlation of the R, G, B color space; then a BP neural network model is trained to fit the relation between normal background points on the one hand and light-disturbed noise points and shadow points on the other; this neural network model is used to compute the predicted value of a point to be detected, and the predicted value is compared with the actual value to judge whether the point is a shadow point or a light-interference noise point; finally, the ideal foreground is extracted by eliminating the shadow points and illumination-interference noise points contained in the background-separated image to be detected.
Comparing the extraction results of the embodiment of the present invention with those of the prior art shows that the PCA-based BP neural network of the embodiment significantly improves the extraction of moving objects. The embodiment improves the accuracy of shadow detection and light-interference resistance by using the PCA-based BP neural network prediction method. Moreover, many experiments show that the shadow-elimination and light-interference-resistance neural networks established off-line after principal component analysis and uniform sampling have very wide adaptability and obtain good results in different environments. The method of this embodiment also does not need to establish a feature library for shadows and light interference.
An embodiment of the present invention also provides a device for extracting foreground from a video image. Fig. 7 is a functional block diagram of a device for extracting foreground from a video image according to an embodiment of the present invention. As shown in Fig. 7, the device 70 comprises:
a background elimination unit 701, configured to eliminate, by using a Gaussian model established according to a predetermined number of background-frame images, the background of a video-frame image into which a foreground has entered, to obtain a background-eliminated video-frame image;
an input-parameter generation unit 702, configured to generate values of the input parameters of a pre-trained neural network model according to the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images;
an output-value generation unit 703, configured to generate a corresponding output value according to the input-parameter values and the trained neural network model;
a shadow and noise processing unit 704, configured to detect, according to the relation between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image.
Optionally, the device 70 may further include a training unit 705, configured to train the neural network model. The specific method for establishing and training the neural network model is described in detail in the embodiments above and is not repeated here.
Fig. 8 is a refined functional block diagram of the input-parameter generation unit of an embodiment of the present invention. As shown in Fig. 8, in a concrete application the input-parameter generation unit 702 may comprise: a mean-value-image generation subunit 7031, configured to generate a mean-value image from the predetermined number of background-frame images; an SP color-space-value generation subunit 7032, configured to generate orthogonal SP color-space values by performing principal component analysis on the RGB color-space values; a first computation subunit 7033, configured to generate the YUV coordinate differences, in the YUV color space, between the pixels of the background-eliminated video-frame image and the corresponding pixels of the mean-value image; a second computation subunit 7034, configured to generate the SP coordinate values, in the orthogonal SP color space, of the corresponding pixels of the mean-value image; and an input subunit 7035, configured to input the SP coordinate values and the YUV coordinate differences into the pre-trained neural network model.
Optionally, the shadow and noise processing unit 704 is further configured to detect, according to the relation between the information of the background-eliminated video-frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video-frame image, wherein:
the relation between the information of the background-eliminated video-frame image and the output value satisfies the following relational expression:
|R'_o - R_f| < R*T, and |G'_o - G_f| < G*T, and |B'_o - B_f| < B*T
wherein the coordinate values of a pixel of the background-eliminated video-frame image in the orthogonal SP color space are expressed as R_f, G_f, B_f; correspondingly, the coordinate values of the pixel of the mean-value image of the predetermined number of background-frame images in the orthogonal SP color space are expressed as R, G, B; the output value of the neural network model is expressed as R'_o, G'_o, B'_o; and T denotes a processing coefficient;
the pixel of the background-eliminated video-frame image is determined to be a shadow point or an illumination noise point, and the shadow point or illumination noise point is eliminated.
Optionally, for eliminating shadows the value T = T_Shadow = 10% is used; or,
optionally, for eliminating illumination noise the value T = T_Light = 25% is used.
The specific working process of the device of the embodiment of the present invention is described in detail in the above embodiments and is not repeated here.
With the method and device for extracting foreground from a video image according to the embodiments of the present invention, the values of the input parameters of the pre-trained neural network model are generated from the information of the background-eliminated video-frame image and the information of the predetermined number of background-frame images, and the shadow points and illumination noise points contained in the background-eliminated video-frame image are processed according to the relation between the input-parameter values of the neural network model and its output value; the shadow points and illumination noise points contained in the video image can thus be detected or eliminated effectively and quickly, which facilitates extracting an ideal foreground area from the video image. The device of the embodiments improves the accuracy of shadow detection and the resistance to light interference, and the neural network model of the embodiments has low computational complexity at run time, so the application requirements of real-time scenes can be met.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above-described embodiment methods may be implemented by instructing the relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above embodiments are only intended to illustrate the technical schemes of the embodiments of the present invention, not to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical schemes recorded in the foregoing embodiments can still be modified, or some of the technical features can be replaced by equivalents; such modifications or replacements do not make the essence of the corresponding technical schemes depart from the spirit and scope of the technical schemes of the embodiments of the present invention.

Claims (11)

1. A method for extracting foreground from a video image, characterized in that the method comprises:
eliminating, by using a Gaussian model established according to a predetermined number of background frame images, the background of a video frame image which a foreground has entered, to obtain a background-eliminated video frame image;
generating the value of an input parameter of a pre-trained neural network model according to the information of the background-eliminated video frame image and the information of the predetermined number of background frame images;
generating a corresponding output value according to the value of the input parameter and the trained neural network model; and
detecting, according to the relation between the information of the background-eliminated video frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video frame image; wherein the detecting, according to the relation between the information of the background-eliminated video frame image and the output value of the neural network model, of the shadow points and illumination noise points contained in the background-eliminated video frame image comprises:
determining that the relation between the information of the background-eliminated video frame image and the output value of the neural network model satisfies the following relational expression:
$\left|\overline{R_o'}-\overline{R}_f\right|<\overline{R}\cdot T$, and $\left|\overline{G_o'}-\overline{G}_f\right|<\overline{G}\cdot T$, and $\left|\overline{B_o'}-\overline{B}_f\right|<\overline{B}\cdot T$
wherein the pixel of the background-eliminated video frame image is expressed by its coordinate values in the orthogonal SP color space as $\overline{R}_f, \overline{G}_f, \overline{B}_f$; the corresponding pixel of the mean-value image of the predetermined number of background frame images is expressed by its SP coordinate values in the orthogonal SP color space as $\overline{R}, \overline{G}, \overline{B}$; the output value of the neural network model is expressed as $\overline{R_o'}, \overline{G_o'}, \overline{B_o'}$; and T represents a processing coefficient; wherein the non-orthogonal RGB color space is transformed into an orthogonal color space by principal component analysis (PCA), and the orthogonal color space obtained after the PCA processing is denoted the SP color space; and
determining that the pixel of the background-eliminated video frame image is a shadow point or an illumination noise point.
2. The method according to claim 1, characterized in that generating the value of the input parameter of the pre-trained neural network model according to the information of the background-eliminated video frame image and the information of the predetermined number of background frame images comprises:
generating a mean-value image according to the predetermined number of background frame images;
generating orthogonal SP color space values by performing principal component analysis on RGB color space values;
generating the YUV coordinate difference, in the YUV color space, between the pixel of the background-eliminated video frame image and the corresponding pixel of the mean-value image;
generating the SP coordinate values, in the orthogonal SP color space, of the corresponding pixel of the mean-value image; and
inputting the SP coordinate values and the YUV coordinate difference into the pre-trained neural network model.
3. The method according to claim 2, characterized in that the method further comprises: training the neural network model, wherein training the neural network model comprises:
determining the input parameters and the desired output parameters of the neural network model;
determining a hidden-layer neuron transfer function, an output-layer neuron transfer function, the number of hidden layers, and a training method;
obtaining training data for the neural network model;
generating the values of the input parameters and the values of the desired output parameters of the neural network model according to the training data; and
training the neural network model according to the hidden-layer neuron transfer function, the output-layer neuron transfer function, the number of hidden layers, the training method, and the values of the input parameters and the desired output parameters of the neural network model.
4. The method according to claim 3, characterized in that the method further comprises: setting an error threshold range and a training-epoch threshold range for the neural network model; and ending the training process when the error of the neural network model is within the error threshold range, or when the number of training epochs of the neural network model is within the training-epoch threshold range.
5. The method according to any one of claims 1-4, characterized in that the neural network model comprises a shadow-elimination neural network model and an anti-light-interference neural network model; the input parameters of the shadow-elimination neural network model comprise:
the SP coordinate values, in the orthogonal SP color space, of the pixels of the background image corresponding to the shadow region of the image which the foreground has entered; and the YUV coordinate difference, in the YUV color space, between the shadow-region pixels of the image which the foreground has entered and the corresponding pixels of the background image;
the desired output parameters of the shadow-elimination neural network model comprise: the SP coordinate values, in the orthogonal SP color space, of the shadow-region pixels of the image which the foreground has entered;
the input parameters of the anti-light-interference neural network model comprise:
the SP coordinate values, in the orthogonal SP color space, of the pixels of a reference image generated according to a reference light intensity; and the YUV coordinate difference, in the YUV color space, between the pixels of a current image generated according to another light intensity and the pixels of the reference image; and
the desired output parameters of the anti-light-interference neural network model comprise: the coordinate values, in the orthogonal SP color space, of the pixels of the current image.
6. The method according to claim 5, characterized in that determining the hidden-layer neuron transfer function, the output-layer neuron transfer function, the number of hidden layers, and the training method of the neural network model comprises:
determining that the hidden-layer neuron transfer function of the neural network model comprises a hyperbolic tangent sigmoid (S-type) function;
determining that the output-layer neuron transfer function of the neural network model comprises a pure linear transfer function;
determining the number of hidden layers of the neural network model according to the number of input parameters of the neural network model; and
determining that the training method of the neural network model comprises the scaled conjugate gradient method.
7. The method according to claim 3, characterized in that obtaining the training data for the neural network model comprises:
collecting images for training the neural network model; and
performing, on the images, an orthogonal SP color space transform based on principal component analysis (PCA) and a YUV color space transform, to generate the training data for the neural network model.
8. The method according to claim 1, characterized in that
the value T = 10% is used for eliminating shadow; or
the value T = 25% is used for eliminating illumination noise.
9. A device for extracting foreground from a video image, characterized in that the device comprises:
a background elimination unit, configured to eliminate, by using a Gaussian model established according to a predetermined number of background frame images, the background of a video frame image which a foreground has entered, to obtain a background-eliminated video frame image;
an input parameter generation unit, configured to generate the value of an input parameter of a pre-trained neural network model according to the information of the background-eliminated video frame image and the information of the predetermined number of background frame images;
an output value generation unit, configured to generate a corresponding output value according to the value of the input parameter and the trained neural network model; and
a shadow and noise processing unit, configured to detect, according to the relation between the information of the background-eliminated video frame image and the output value of the neural network model, the shadow points and illumination noise points contained in the background-eliminated video frame image; the shadow and noise processing unit being specifically configured to determine that the relation between the information of the background-eliminated video frame image and the output value of the neural network model satisfies the following relational expression:
$\left|\overline{R_o'}-\overline{R}_f\right|<\overline{R}\cdot T$, and $\left|\overline{G_o'}-\overline{G}_f\right|<\overline{G}\cdot T$, and $\left|\overline{B_o'}-\overline{B}_f\right|<\overline{B}\cdot T$
wherein the pixel of the background-eliminated video frame image is expressed by its coordinate values in the orthogonal SP color space as $\overline{R}_f, \overline{G}_f, \overline{B}_f$; the corresponding pixel of the mean-value image of the predetermined number of background frame images is expressed by its SP coordinate values in the orthogonal SP color space as $\overline{R}, \overline{G}, \overline{B}$; the output value of the neural network model is expressed as $\overline{R_o'}, \overline{G_o'}, \overline{B_o'}$; and T represents a processing coefficient; wherein the non-orthogonal RGB color space is transformed into an orthogonal color space by principal component analysis (PCA), and the orthogonal color space obtained after the PCA processing is denoted the SP color space; and
to determine that the pixel of the background-eliminated video frame image is a shadow point or an illumination noise point.
10. The device according to claim 9, characterized in that the input parameter generation unit comprises:
a mean-value image generation subunit, configured to generate a mean-value image according to the predetermined number of background frame images;
an SP color space value generation subunit, configured to generate orthogonal SP color space values by performing principal component analysis on RGB color space values;
a first computation subunit, configured to generate the YUV coordinate difference, in the YUV color space, between the pixel of the background-eliminated video frame image and the corresponding pixel of the mean-value image;
a second computation subunit, configured to generate the SP coordinate values, in the orthogonal SP color space, of the corresponding pixel of the mean-value image; and
an input subunit, configured to input the SP coordinate values and the YUV coordinate difference into the pre-trained neural network model.
11. The device according to claim 9 or 10, characterized in that the device further comprises: a training unit, configured to train the neural network model.
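For illustration of the network structure named in claims 3 and 6 (a hidden layer with a hyperbolic tangent sigmoid transfer function and a pure linear output layer, mapping the six inputs of claim 2 to the three SP output coordinates), a minimal Python sketch follows. SciPy's plain conjugate-gradient optimizer is used here as a stand-in for the scaled conjugate gradient method specified in claim 6, and all sizes, names, and data handling are illustrative assumptions, not part of the patented method.

```python
import numpy as np
from scipy.optimize import minimize

N_IN, N_HID, N_OUT = 6, 10, 3   # assumed sizes: 6 inputs, 3 SP outputs

def unpack(theta):
    # Split a flat parameter vector into layer weights and biases.
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = theta[i:i + N_HID]; i += N_HID
    W2 = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)        # hyperbolic tangent sigmoid hidden layer
    return h @ W2 + b2              # pure linear output layer

def loss(theta, X, Y):
    return np.mean((forward(theta, X) - Y) ** 2)

def train(X, Y, max_iter=500):
    n_params = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT
    theta0 = 0.1 * np.random.randn(n_params)
    # Conjugate-gradient optimization of the mean squared error (stand-in for SCG).
    res = minimize(loss, theta0, args=(X, Y), method='CG',
                   options={'maxiter': max_iter})
    return res.x
```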
CN2009102108310A 2009-11-10 2009-11-10 Method and device for extracting foreground from video image Expired - Fee Related CN102054270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102108310A CN102054270B (en) 2009-11-10 2009-11-10 Method and device for extracting foreground from video image


Publications (2)

Publication Number Publication Date
CN102054270A CN102054270A (en) 2011-05-11
CN102054270B true CN102054270B (en) 2013-06-05

Family

ID=43958556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102108310A Expired - Fee Related CN102054270B (en) 2009-11-10 2009-11-10 Method and device for extracting foreground from video image

Country Status (1)

Country Link
CN (1) CN102054270B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509076B (en) * 2011-10-25 2013-01-02 重庆大学 Principal-component-analysis-based video image background detection method
CN102930719B (en) * 2012-10-09 2014-12-10 北京航空航天大学 Video image foreground detection method for traffic intersection scene and based on network physical system
CN103116754B (en) * 2013-01-24 2016-05-18 浙江大学 Batch images dividing method and system based on model of cognition
CN104349115B (en) * 2013-08-06 2017-09-22 北大方正集团有限公司 Video Conference System and the method and apparatus that virtual environment is set wherein
CN104349116A (en) * 2013-08-06 2015-02-11 北大方正集团有限公司 Method and device for dividing functional region of screen of network video conference system
CN103971368B (en) * 2014-05-12 2017-03-15 内蒙古科技大学 A kind of moving target foreground extracting method based on aberration
CN104008554B (en) * 2014-06-17 2017-02-22 深圳市赛为智能股份有限公司 Railway line small target detecting method and system based on video
CN107507188B (en) * 2016-06-13 2022-09-27 北京理工大学 Method and device for extracting image information based on machine learning
US10595039B2 (en) 2017-03-31 2020-03-17 Nvidia Corporation System and method for content and motion controlled action video generation
CN107230221B (en) * 2017-05-25 2019-07-09 武汉理工大学 One kind being based on convolutional neural networks adaptive background modeling object detecting method
CN108550163A (en) * 2018-04-19 2018-09-18 湖南理工学院 Moving target detecting method in a kind of complex background scene
CN111209771A (en) * 2018-11-21 2020-05-29 晶睿通讯股份有限公司 Neural network identification efficiency improving method and relevant identification efficiency improving device thereof
CN109886984B (en) * 2019-01-22 2021-01-08 浙江大学 Image accurate segmentation method using foreground and background gray difference and deep learning network
CN111950727B (en) * 2020-08-06 2022-10-04 中科智云科技有限公司 Neural network training and testing method and device for image data
CN112634382B (en) * 2020-11-27 2024-03-19 国家电网有限公司大数据中心 Method and device for identifying and replacing images of unnatural objects
CN113618888B (en) * 2021-08-20 2022-12-06 北京好运达智创科技有限公司 External mold cleaning and polishing control system
CN115760986B (en) * 2022-11-30 2023-07-25 北京中环高科环境治理有限公司 Image processing method and device based on neural network model
CN116681992B (en) * 2023-07-29 2023-10-20 河南省新乡生态环境监测中心 Ammonia nitrogen detection method based on neural network
CN116757964B (en) * 2023-08-16 2023-11-03 山东省地质矿产勘查开发局第八地质大队(山东省第八地质矿产勘查院) Image correction method for geographical information display
CN117173343B (en) * 2023-11-03 2024-02-23 北京渲光科技有限公司 Relighting method and relighting system based on nerve radiation field

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529506A (en) * 2003-09-29 2004-09-15 Shanghai Jiao Tong University Video target dividing method based on motion detection
CN101098462A (en) * 2007-07-12 2008-01-02 上海交通大学 Chroma deviation and brightness deviation combined video moving object detection method
CN101216888A (en) * 2008-01-14 2008-07-09 浙江大学 A video foreground extracting method under conditions of view angle variety based on fast image registration


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP 2008-129864 A (Japanese laid-open patent publication) 2008.06.05

Also Published As

Publication number Publication date
CN102054270A (en) 2011-05-11


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130605

Termination date: 20191110